TL;DR
- Cluster: A group of servers providing high availability and distributed processing
- Node: A single Elasticsearch server in the cluster (Master, Data, Coordinating roles)
- Index: A logical collection of documents (similar to RDB table)
- Document: A JSON data unit (similar to RDB row)
- Shard: A horizontal partition of an index (consists of Primary/Replica)
Target Audience: Developers new to Elasticsearch
Prerequisites: JSON basics, REST API concepts
Understand the roles and relationships of Elasticsearch’s core components: Cluster, Node, Index, Document, and Shard.
Overall Architecture#
```mermaid
flowchart TB
    subgraph Cluster["Cluster"]
        subgraph Node1["Node 1 (Master)"]
            subgraph Index1["products Index"]
                P0["Primary Shard 0"]
                P1["Primary Shard 1"]
            end
        end
        subgraph Node2["Node 2 (Data)"]
            R0["Replica Shard 0"]
            R1["Replica Shard 1"]
        end
    end
    P0 -.Replication.-> R0
    P1 -.Replication.-> R1
```

Diagram: A Master node and a Data node in one cluster, with the Primary shards of the products index replicated to Replica shards on the other node.
Cluster#
A cluster is a group of Elasticsearch servers consisting of one or more nodes.
Key Characteristics#
- Identified by a unique name (default: elasticsearch)
- Nodes with the same cluster name automatically connect
- Distributes data and load across multiple nodes
Cluster Status#
| Status | Meaning | Action |
|---|---|---|
| 🟢 Green | All shards healthy | Normal operation |
| 🟡 Yellow | Primary OK, some Replicas unassigned | Consider adding nodes |
| 🔴 Red | Some Primary shards unassigned | Immediate action required |
```
# Check cluster health
GET /_cluster/health
```

```json
{
  "cluster_name": "my-cluster",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 10,
  "active_shards": 20
}
```

Key Points
- A cluster is identified by its unique name, and nodes with the same name automatically connect
- Cluster status (Green/Yellow/Red) allows you to grasp the overall system health at a glance
- Check current status with the /_cluster/health API
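As a minimal Python sketch, the Green/Yellow/Red table above can be turned into a lookup on the `status` field of a `/_cluster/health` response. The helper name and sample response below are illustrative, not part of the Elasticsearch API:

```python
# Interpret the "status" field of a /_cluster/health response using the
# Green/Yellow/Red table above. Illustrative helper, not an official API.

def health_action(health: dict) -> str:
    """Map a cluster health response to a recommended action."""
    actions = {
        "green": "Normal operation",
        "yellow": "Consider adding nodes",   # some replicas unassigned
        "red": "Immediate action required",  # some primaries unassigned
    }
    return actions[health["status"]]

sample = {
    "cluster_name": "my-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "active_primary_shards": 10,
    "active_shards": 10,  # replicas cannot be placed on a one-node cluster
}
print(health_action(sample))  # → Consider adding nodes
```

A single-node cluster is a common real-world Yellow case: the replica of each shard has nowhere to go, because a replica is never placed on the same node as its primary.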
Node#
A node is a single Elasticsearch server that is part of a cluster.
Node Roles#
```mermaid
flowchart LR
    subgraph Cluster
        M[Master Node<br>Cluster Management]
        D1[Data Node<br>Data Storage/Search]
        D2[Data Node<br>Data Storage/Search]
        C[Coordinating Node<br>Request Routing]
    end
    Client --> C
    C --> D1
    C --> D2
    M -.Management.-> D1
    M -.Management.-> D2
```

Diagram: Client requests are routed through the Coordinating node to the Data nodes, while the Master node manages the cluster.
| Role | Description | Configuration |
|---|---|---|
| Master | Cluster state management, index creation/deletion | node.roles: [master] |
| Data | Data storage, search/aggregation execution | node.roles: [data] |
| Coordinating | Route search requests, merge results | node.roles: [] |
| Ingest | Pre-process data before indexing | node.roles: [ingest] |
In small clusters, a single node may perform multiple roles.
Check Node Information#
```
GET /_nodes
```

Key Points
- Nodes can perform various roles such as Master, Data, Coordinating, and Ingest
- In small clusters, a single node performs multiple roles; in large clusters, roles are separated
- Check node information with the /_nodes API
Index#
An index is a collection of documents with similar characteristics. It’s analogous to a table in RDB.
RDB vs Elasticsearch#
| RDB | Elasticsearch |
|---|---|
| Database | Cluster |
| Table | Index |
| Row | Document |
| Column | Field |
| Schema | Mapping |
Creating an Index#
```
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "integer" },
      "category": { "type": "keyword" }
    }
  }
}
```

Index Settings#
| Setting | Default | Description |
|---|---|---|
| number_of_shards | 1 | Number of Primary shards (cannot change after creation) |
| number_of_replicas | 1 | Number of Replica shards (dynamically changeable) |
| refresh_interval | 1s | Interval before new documents become searchable |
Index Management#
```
# List indices
GET /_cat/indices?v

# Index information
GET /products

# Delete index
DELETE /products
```

Key Points
- An index is similar to an RDB table and has a Mapping (schema)
- number_of_shards cannot be changed after creation, so choose carefully
- number_of_replicas can be changed dynamically
Document#
A document is a JSON data unit stored in an index. It’s analogous to a Row in RDB.
Document Structure#
```json
{
  "_index": "products",   // Parent index
  "_id": "1",             // Unique document ID
  "_version": 1,          // Version (increments on update)
  "_source": {            // Actual data
    "name": "MacBook Pro",
    "price": 2390000,
    "category": "Laptop"
  }
}
```

Document CRUD#
```
# Create (with ID)
PUT /products/_doc/1
{
  "name": "MacBook Pro",
  "price": 2390000
}

# Create (auto-generate ID)
POST /products/_doc
{
  "name": "iPad"
}

# Read
GET /products/_doc/1

# Update
POST /products/_update/1
{
  "doc": {
    "price": 2290000
  }
}

# Delete
DELETE /products/_doc/1
```

Key Points
- Documents are stored in JSON format and uniquely identified by _id
- Concurrency control is possible with _version
- CRUD operations are performed via RESTful API (PUT/POST/GET/DELETE)
Shard#
A shard is a horizontal partition of an index. It enables distributed storage and parallel processing.
Primary Shard vs Replica Shard#
```mermaid
flowchart LR
    subgraph Index["products Index (3 Primary, 1 Replica)"]
        direction TB
        subgraph Node1
            P0[Primary 0]
            R2[Replica 2]
        end
        subgraph Node2
            P1[Primary 1]
            R0[Replica 0]
        end
        subgraph Node3
            P2[Primary 2]
            R1[Replica 1]
        end
    end
    P0 -.-> R0
    P1 -.-> R1
    P2 -.-> R2
```

Diagram: The 3 Primary shards are distributed across different nodes, and each Primary's Replica is placed on a different node for fault tolerance.
| Type | Role | Characteristic |
|---|---|---|
| Primary | Stores original data | Fixed count at index creation |
| Replica | Copy of Primary | Improves read performance, failover backup |
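The fault-tolerance property from the diagram above -- a replica never lives on the same node as its primary -- can be sketched with a toy round-robin allocator. This is a simplification for illustration, not Elasticsearch's actual allocation algorithm:

```python
# Toy shard allocator illustrating the replica-placement rule: a replica
# is never on the node that holds its primary (otherwise losing that one
# node would lose both copies). Not Elasticsearch's real allocator.

def allocate(num_shards: int, nodes: list[str]) -> dict[str, list[str]]:
    placement: dict[str, list[str]] = {n: [] for n in nodes}
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]  # always a different node
        placement[primary_node].append(f"P{shard}")
        placement[replica_node].append(f"R{shard}")
    return placement

print(allocate(3, ["node-1", "node-2", "node-3"]))
# Each of P0/R0, P1/R1, P2/R2 lands on two different nodes,
# matching the 3-Primary / 1-Replica layout in the diagram.
```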
How Shards Work#
Write:
- Calculate a hash from the document ID
- Determine the responsible Primary shard: shard = hash(id) % number_of_shards
- Write to the Primary, then replicate to its Replica
Read:
- Coordinating node receives request
- Parallel requests to all relevant shards (Primary or Replica)
- Merge and return results
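The write-path routing formula can be sketched in a few lines of Python. Elasticsearch actually applies murmur3 to the `_routing` value (which defaults to `_id`); `zlib.crc32` stands in here only so the example is self-contained:

```python
# Sketch of document routing: shard = hash(id) % number_of_shards.
# zlib.crc32 is a stand-in for Elasticsearch's real murmur3 hash.
import zlib

NUMBER_OF_SHARDS = 3  # fixed at index creation

def route(doc_id: str) -> int:
    """Return the primary shard responsible for this document ID."""
    return zlib.crc32(doc_id.encode()) % NUMBER_OF_SHARDS

for doc_id in ["1", "2", "42"]:
    print(doc_id, "-> shard", route(doc_id))
```

Because the same ID must always route to the same shard, changing `NUMBER_OF_SHARDS` would break every existing lookup -- which is exactly why number_of_shards cannot be changed after index creation.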
Shard Count Guidelines#
| Data Scale | Recommended Primary Shards |
|---|---|
| Few GB | 1 |
| Tens of GB | 2-5 |
| Hundreds of GB | 5-10 |
| TB or more | 10+ (consider node count) |
Rule of Thumb: 20-40GB per shard is optimal.
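The guideline table and the 20-40GB rule of thumb can be combined into a rough estimate. The helper below is an illustrative back-of-the-envelope calculation, not an official sizing formula:

```python
# Rough shard-count estimate from the 20-40GB-per-shard rule of thumb.
# Illustrative only; real sizing also considers node count and query load.
import math

def recommended_shards(total_gb: float, target_gb: float = 30.0) -> int:
    """Aim for ~30GB per shard (midpoint of the 20-40GB guideline)."""
    return max(1, math.ceil(total_gb / target_gb))

print(recommended_shards(5))    # few GB -> 1
print(recommended_shards(150))  # hundreds of GB -> 5
```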
Check Shard Information#
```
# Shard allocation status
GET /_cat/shards/products?v

# Example output
index    shard prirep state   docs store node
products 0     p      STARTED 100  50mb  node-1
products 0     r      STARTED 100  50mb  node-2
products 1     p      STARTED 120  55mb  node-2
products 1     r      STARTED 120  55mb  node-1
```

Key Points
- Primary Shards cannot be changed after creation, Replicas can be changed dynamically
- The responsible shard is determined by the hash of the document ID: shard = hash(id) % number_of_shards
- 20-40GB per shard is the optimal size
Inverted Index#
The core principle behind Elasticsearch’s fast search.
Forward Index vs Inverted Index#
Forward Index:

```
doc1 → [MacBook, Pro, 14-inch]
doc2 → [MacBook, Air, 13-inch]
```

Inverted Index:

```
MacBook → [doc1, doc2]
Pro     → [doc1]
Air     → [doc2]
14-inch → [doc1]
13-inch → [doc2]
```

Search Process#
When searching “MacBook Pro”:
- “MacBook” → [doc1, doc2]
- “Pro” → [doc1]
- Intersection: doc1
Key Point: Finds results directly from the inverted index without scanning all documents.
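The search process above can be reproduced with a dictionary of posting lists and a set intersection, using the same two documents as the example:

```python
# Minimal inverted index matching the example above: build term -> doc
# sets, then answer "MacBook Pro" as an intersection (AND search).
from collections import defaultdict

docs = {
    "doc1": ["MacBook", "Pro", "14-inch"],
    "doc2": ["MacBook", "Air", "13-inch"],
}

inverted: dict[str, set[str]] = defaultdict(set)
for doc_id, terms in docs.items():
    for term in terms:
        inverted[term].add(doc_id)

def search_all(*terms: str) -> set[str]:
    """AND search: intersect the posting lists of every term."""
    result = inverted[terms[0]].copy()
    for term in terms[1:]:
        result &= inverted[term]
    return result

print(search_all("MacBook", "Pro"))  # → {'doc1'}
```

Note that the lookup cost depends on the number of query terms and the size of their posting lists, not on the total number of documents -- the "no full scan" property the Key Point describes.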
Key Points
- The inverted index is structured as “term → document list”
- It’s called “inverted” because it’s the opposite direction of a forward index (document → terms)
- Fast search is possible through intersection/union operations on search terms
Lucene Internals (Advanced)#
Elasticsearch internally uses the Apache Lucene library. Each shard is a Lucene index.
Segment Structure#
```mermaid
flowchart TB
    subgraph Shard["Shard (= Lucene Index)"]
        subgraph Segments["Segments"]
            S1["Segment 1<br>(Immutable)"]
            S2["Segment 2<br>(Immutable)"]
            S3["Segment 3<br>(Immutable)"]
        end
        Commit["Commit Point<br>(Segment list)"]
        Translog["Translog<br>(Uncommitted changes)"]
    end
    S1 --> Commit
    S2 --> Commit
    S3 --> Commit
```

Diagram: A shard contains multiple immutable Segments, with a Commit Point tracking the active segment list and the Translog storing uncommitted changes.
| Component | Role | Characteristic |
|---|---|---|
| Segment | File containing the actual inverted index | Immutable - once written, never modified |
| Commit Point | List of currently active segments | Tracks segments for search |
| Translog | Change log | For crash recovery, protects data before fsync |
Document Indexing Process#
```mermaid
sequenceDiagram
    participant Client
    participant ES as Elasticsearch
    participant Buffer as In-Memory Buffer
    participant Segment
    participant Translog
    Client->>ES: Index document
    ES->>Buffer: Add to memory buffer
    ES->>Translog: Write to Translog
    ES-->>Client: Success response
    Note over Buffer,Segment: Refresh (default 1 second)
    Buffer->>Segment: Create new segment
    Note right of Segment: Now searchable
    Note over Segment,Translog: Flush (periodic)
    Segment->>Segment: fsync to disk
    Translog->>Translog: Clear Translog
```

Diagram: Indexing writes to the memory buffer and Translog; Refresh turns the buffer into a new segment (making it searchable), and Flush persists segments to disk.
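The buffer-then-refresh behavior in the sequence above can be modeled with a toy class. The names (`ToyShard`, `refresh`, `search`) are illustrative, not Elasticsearch internals:

```python
# Toy model of the indexing path: documents land in an in-memory buffer
# (and translog) first, and only become searchable after refresh() turns
# the buffer into a new immutable segment. Illustrative, heavily simplified.

class ToyShard:
    def __init__(self):
        self.buffer: list[dict] = []          # in-memory buffer
        self.translog: list[dict] = []        # crash-recovery log
        self.segments: list[list[dict]] = []  # immutable, searchable

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)
        self.translog.append(doc)  # durable before any segment exists

    def refresh(self) -> None:
        if self.buffer:
            self.segments.append(self.buffer)  # buffer -> new segment
            self.buffer = []

    def search(self, key, value) -> list[dict]:
        # Search covers segments only -- buffered docs are invisible.
        return [d for seg in self.segments for d in seg if d.get(key) == value]

shard = ToyShard()
shard.index({"name": "MacBook Pro"})
print(shard.search("name", "MacBook Pro"))  # → [] (not yet refreshed)
shard.refresh()
print(shard.search("name", "MacBook Pro"))  # → [{'name': 'MacBook Pro'}]
```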
Why are Segments Immutable?#
- Concurrency Guarantee: Read without locks
- Cache Efficiency: Maximize OS file system cache
- Stability: Reduced risk of data corruption
Downside: Deletes/updates don’t actually remove data, just mark as “deleted” → Merged later
Segment Merge#
Too many segments degrade performance. Elasticsearch merges segments in the background:
```mermaid
flowchart LR
    subgraph Before["Before Merge"]
        S1["Seg 1<br>100 docs"]
        S2["Seg 2<br>50 docs"]
        S3["Seg 3<br>30 docs"]
    end
    subgraph After["After Merge"]
        SM["Merged Seg<br>180 docs"]
    end
    S1 --> SM
    S2 --> SM
    S3 --> SM
```

Diagram: Multiple small segments (Seg 1, 2, 3) are merged into one larger segment.
What happens during merge:
- Documents marked as deleted are physically removed
- Multiple segments → One larger segment
- I/O load generated (monitor in production)
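The two effects listed above -- combining segments and physically dropping soft-deleted documents -- can be sketched in a few lines. This is illustrative only; Lucene's real merge policy is far more involved:

```python
# Toy merge: several small segments become one larger segment, and
# documents marked as deleted are physically dropped during the merge.
# Illustrative only, not Lucene's actual merge policy.

def merge(segments: list[list[dict]]) -> list[dict]:
    """Merge segments into one, discarding soft-deleted documents."""
    return [doc for seg in segments for doc in seg if not doc.get("deleted")]

segments = [
    [{"id": 1}, {"id": 2, "deleted": True}],  # Seg 1
    [{"id": 3}],                              # Seg 2
    [{"id": 4, "deleted": True}, {"id": 5}],  # Seg 3
]
merged = merge(segments)
print([d["id"] for d in merged])  # → [1, 3, 5]
```

Until such a merge runs, "deleted" documents still occupy disk space and are merely filtered out of search results.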
Refresh vs Flush vs Merge#
| Operation | Trigger | Result | Performance Impact |
|---|---|---|---|
| Refresh | Every 1 second (default) | Buffer → New segment (searchable) | Low |
| Flush | Periodic / translog size | Segment fsync + translog clear | Medium |
| Merge | Background | Multiple segments → One | High (I/O intensive) |
Near Real-Time (NRT) Search#
Elasticsearch provides “Near Real-Time (NRT)” search, not true real-time:
Document indexed → (up to 1 second wait) → Refresh → Searchable

- Default refresh_interval: 1 second
- For immediate search: ?refresh=true parameter (watch performance)
- For bulk indexing: disable with refresh_interval: -1, then refresh manually when complete
```
# Bulk indexing optimization
PUT /products/_settings
{ "refresh_interval": "-1" }

# After indexing complete
POST /products/_refresh
PUT /products/_settings
{ "refresh_interval": "1s" }
```

Key Points
- Segments are immutable, so deletes/updates only “mark as deleted” and are merged later
- Refresh (1 second) makes searchable, Flush permanently saves to disk
- Elasticsearch provides "Near Real-Time (NRT)" search; use ?refresh=true for immediate search (watch performance)
Summary#
```mermaid
flowchart TB
    A[Cluster] --> B[Node]
    B --> C[Index]
    C --> D[Shard]
    D --> E[Document]
    A2["Collection of nodes<br>Provides high availability"] -.-> A
    B2["Physical server<br>Separable by role"] -.-> B
    C2["Logical group of documents<br>Similar to RDB table"] -.-> C
    D2["Physical partition of index<br>Unit of distributed processing"] -.-> D
    E2["JSON data<br>Similar to RDB row"] -.-> E
```

Diagram: The hierarchy Cluster → Node → Index → Shard → Document, with the role of each component.
Next Steps#
| Goal | Recommended Document |
|---|---|
| Schema design | Data Modeling |
| Write search queries | Query DSL |
| Hands-on practice | Basic Examples |