TL;DR
- Cluster: A group of servers providing high availability and distributed processing
- Node: A single Elasticsearch server in the cluster (Master, Data, Coordinating roles)
- Index: A logical collection of documents (similar to RDB table)
- Document: A JSON data unit (similar to RDB row)
- Shard: A horizontal partition of an index (consists of Primary/Replica)
Target Audience: Developers new to Elasticsearch Prerequisites: JSON basics, REST API concepts
Understand the roles and relationships of Elasticsearch’s core components: Cluster, Node, Index, Document, and Shard.
Overall Architecture#
flowchart TB
subgraph Cluster["Cluster"]
subgraph Node1["Node 1 (Master)"]
subgraph Index1["products Index"]
P0["Primary Shard 0"]
P1["Primary Shard 1"]
end
end
subgraph Node2["Node 2 (Data)"]
R0["Replica Shard 0"]
R1["Replica Shard 1"]
end
end
P0 -.Replication.-> R0
P1 -.Replication.-> R1Diagram: Shows the structure where a Master node and Data node exist in the cluster, with Primary Shards of the products index being replicated to Replica Shards on a different node.
Cluster#
Why not just run a single Elasticsearch server? What happens if you only operate one server? When the disk fills up, you can no longer store data, and if the server goes down, the entire service stops. A cluster groups multiple servers together to solve this single point of failure problem.
A cluster is a group of Elasticsearch servers consisting of one or more nodes.
Key Characteristics#
- Identified by a unique name (default:
elasticsearch) - Nodes with the same cluster name automatically connect
- Distributes data and load across multiple nodes
Cluster Status#
| Status | Meaning | Action |
|---|---|---|
| 🟢 Green | All shards healthy | Normal operation |
| 🟡 Yellow | Primary OK, some Replicas unassigned | Consider adding nodes |
| 🔴 Red | Some Primary shards unassigned | Immediate action required |
# Check cluster health
GET /_cluster/health{
"cluster_name": "my-cluster",
"status": "green",
"number_of_nodes": 3,
"active_primary_shards": 10,
"active_shards": 20
}Key Points
- A cluster is identified by its unique name, and nodes with the same name automatically connect
- Cluster status (Green/Yellow/Red) allows you to grasp the overall system health at a glance
- Check current status with the
/_cluster/healthAPI
Node#
Why separate node roles? What problems arise if all nodes do the same work? When every server simultaneously handles cluster state management, data storage, and request routing, resource contention intensifies and the blast radius of failures widens. Separating roles lets each node focus on its specialty, improving both stability and performance.
A node is a single Elasticsearch server that is part of a cluster.
Node Roles#
flowchart LR
subgraph Cluster
M[Master Node<br>Cluster Management]
D1[Data Node<br>Data Storage/Search]
D2[Data Node<br>Data Storage/Search]
C[Coordinating Node<br>Request Routing]
end
Client --> C
C --> D1
C --> D2
M -.Management.-> D1
M -.Management.-> D2Diagram: Shows the flow where client requests are routed through the Coordinating node to Data nodes, with the Master node managing everything.
| Role | Description | Configuration |
|---|---|---|
| Master | Cluster state management, index creation/deletion | node.roles: [master] |
| Data | Data storage, search/aggregation execution | node.roles: [data] |
| Coordinating | Route search requests, merge results | node.roles: [] |
| Ingest | Pre-process data before indexing | node.roles: [ingest] |
In small clusters, a single node may perform multiple roles.
Check Node Information#
GET /_nodesKey Points
- Nodes can perform various roles such as Master, Data, Coordinating, and Ingest
- In small clusters, a single node performs multiple roles; in large clusters, roles are separated
- Check node information with the
/_nodesAPI
Index#
Why not store everything in one place? What happens if you mix product data and log data together? Every search must scan irrelevant data, and field type conflicts cause mapping errors. An index logically separates data of similar characteristics to prevent such confusion.
An index is a collection of documents with similar characteristics. It’s analogous to a table in RDB.
RDB vs Elasticsearch#
| RDB | Elasticsearch |
|---|---|
| Database | Cluster |
| Table | Index |
| Row | Document |
| Column | Field |
| Schema | Mapping |
Creating an Index#
PUT /products
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"name": { "type": "text" },
"price": { "type": "integer" },
"category": { "type": "keyword" }
}
}
}Index Settings#
| Setting | Default | Description |
|---|---|---|
number_of_shards | 1 | Number of Primary shards (cannot change after creation; can be recreated via split/shrink API) |
number_of_replicas | 1 | Number of Replica shards (dynamically changeable) |
refresh_interval | 1s | Interval until searchable |
Index Management#
# List indices
GET /_cat/indices?v
# Index information
GET /products
# Delete index
DELETE /productsKey Points
- An index is similar to an RDB table and has a Mapping (schema)
number_of_shardscannot be changed after creation (can be recreated via split/shrink API), so choose carefullynumber_of_replicascan be changed dynamically
Document#
A document is a JSON data unit stored in an index. It’s analogous to a Row in RDB.
Document Structure#
{
"_index": "products", // Parent index
"_id": "1", // Unique document ID
"_version": 1, // Version (increments on update)
"_source": { // Actual data
"name": "MacBook Pro",
"price": 2390000,
"category": "Laptop"
}
}Document CRUD#
# Create (with ID)
PUT /products/_doc/1
{
"name": "MacBook Pro",
"price": 2390000
}
# Create (auto-generate ID)
POST /products/_doc
{
"name": "iPad"
}
# Read
GET /products/_doc/1
# Update
POST /products/_update/1
{
"doc": {
"price": 2290000
}
}
# Delete
DELETE /products/_doc/1Key Points
- Documents are stored in JSON format and uniquely identified by
_id- Concurrency control is possible with
_version- CRUD operations are performed via RESTful API (PUT/POST/GET/DELETE)
Shard#
Why split an index into shards? What happens if you store a 100GB index on a single node? Disk I/O becomes a bottleneck, search speed drops dramatically, and if that node goes down, all data is lost. Shards split an index into multiple pieces for distributed storage, achieving both parallel processing and fault recovery.
A shard is a horizontal partition of an index. It enables distributed storage and parallel processing.
Primary Shard vs Replica Shard#
flowchart LR
subgraph Index["products Index (3 Primary, 1 Replica)"]
direction TB
subgraph Node1
P0[Primary 0]
R2[Replica 2]
end
subgraph Node2
P1[Primary 1]
R0[Replica 0]
end
subgraph Node3
P2[Primary 2]
R1[Replica 1]
end
end
P0 -.-> R0
P1 -.-> R1
P2 -.-> R2Diagram: Shows the structure where 3 Primary Shards are distributed across different nodes, with each Primary’s Replica placed on a different node for fault tolerance.
| Type | Role | Characteristic |
|---|---|---|
| Primary | Stores original data | Fixed count at index creation (can be recreated via split/shrink API) |
| Replica | Copy of Primary | Improves read performance, failover backup |
How Shards Work#
Write:
- Calculate hash from document ID
- Determine responsible Primary shard:
shard = hash(id) % number_of_shards - Write to Primary, then replicate to Replica
Read:
- Coordinating node receives request
- Parallel requests to all relevant shards (Primary or Replica)
- Merge and return results
Shard Count Guidelines#
| Data Scale | Recommended Primary Shards |
|---|---|
| Few GB | 1 |
| Tens of GB | 2-5 |
| Hundreds of GB | 5-10 |
| TB or more | 10+ (consider node count) |
Rule of Thumb: 20-40GB per shard is optimal.
Check Shard Information#
# Shard allocation status
GET /_cat/shards/products?v
# Example output
index shard prirep state docs store node
products 0 p STARTED 100 50mb node-1
products 0 r STARTED 100 50mb node-2
products 1 p STARTED 120 55mb node-2
products 1 r STARTED 120 55mb node-1Key Points
- Primary Shards cannot be changed after creation (can be recreated via split/shrink API), Replicas can be changed dynamically
- The responsible shard is determined by the hash value of the document ID:
shard = hash(id) % number_of_shards- 20-40GB per shard is the optimal size
Inverted Index#
Why do we need an inverted index? How would you find documents containing “MacBook” among 1 million documents? Scanning every document one by one (Full Scan) slows linearly as data grows. An inverted index pre-builds a “term -> list of documents containing that term” mapping, enabling instant results at search time.
The core principle behind Elasticsearch’s fast search.
Forward Index vs Inverted Index#
Forward Index:
doc1 → [MacBook, Pro, 14-inch]
doc2 → [MacBook, Air, 13-inch]Inverted Index:
MacBook → [doc1, doc2]
Pro → [doc1]
Air → [doc2]
14-inch → [doc1]
13-inch → [doc2]Search Process#
When searching “MacBook Pro”:
- “MacBook” → [doc1, doc2]
- “Pro” → [doc1]
- Intersection: doc1
Key Point: Finds results directly from the inverted index without scanning all documents.
Key Points
- The inverted index is structured as “term → document list”
- It’s called “inverted” because it’s the opposite direction of a forward index (document → terms)
- Fast search is possible through intersection/union operations on search terms
Lucene Internals (Advanced)#
Elasticsearch internally uses the Apache Lucene library. Each shard is a Lucene index.
Segment Structure#
flowchart TB
subgraph Shard["Shard (= Lucene Index)"]
subgraph Segments["Segments"]
S1["Segment 1<br>(Immutable)"]
S2["Segment 2<br>(Immutable)"]
S3["Segment 3<br>(Immutable)"]
end
Commit["Commit Point<br>(Segment list)"]
Translog["Translog<br>(Uncommitted changes)"]
end
S1 --> Commit
S2 --> Commit
S3 --> CommitDiagram: Shows the structure where multiple immutable Segments exist inside a shard, with a Commit Point tracking the active segment list and Translog storing uncommitted changes.
| Component | Role | Characteristic |
|---|---|---|
| Segment | File containing the actual inverted index | Immutable - once written, never modified |
| Commit Point | List of currently active segments | Tracks segments for search |
| Translog | Change log | For crash recovery, protects data before fsync |
Document Indexing Process#
sequenceDiagram
participant Client
participant ES as Elasticsearch
participant Buffer as In-Memory Buffer
participant Segment
participant Translog
Client->>ES: Index document
ES->>Buffer: Add to memory buffer
ES->>Translog: Write to Translog
ES-->>Client: Success response
Note over Buffer,Segment: Refresh (default 1 second)
Buffer->>Segment: Create new segment
Note right of Segment: Now searchable
Note over Segment,Translog: Flush (periodic)
Segment->>Segment: fsync to disk
Translog->>Translog: Clear TranslogDiagram: Shows the process where document indexing writes to memory buffer and Translog, then Refresh creates a new segment making it searchable, and Flush permanently saves to disk.
Why are Segments Immutable?#
- Concurrency Guarantee: Read without locks
- Cache Efficiency: Maximize OS file system cache
- Stability: Reduced risk of data corruption
Downside: Deletes/updates don’t actually remove data, just mark as “deleted” → Merged later
Segment Merge#
Too many segments degrade performance. Elasticsearch merges segments in the background:
flowchart LR
subgraph Before["Before Merge"]
S1["Seg 1<br>100 docs"]
S2["Seg 2<br>50 docs"]
S3["Seg 3<br>30 docs"]
end
subgraph After["After Merge"]
SM["Merged Seg<br>180 docs"]
end
S1 --> SM
S2 --> SM
S3 --> SMDiagram: Shows the process where multiple small segments (Seg 1, 2, 3) are merged into one larger segment.
What happens during merge:
- Documents marked as deleted are physically removed
- Multiple segments → One larger segment
- I/O load generated (monitor in production)
Refresh vs Flush vs Merge#
| Operation | Trigger | Result | Performance Impact |
|---|---|---|---|
| Refresh | Every 1 second (default) | Buffer → New segment (searchable) | Low |
| Flush | Periodic / translog size | Segment fsync + translog clear | Medium |
| Merge | Background | Multiple segments → One | High (I/O intensive) |
Near Real-Time (NRT) Search#
Elasticsearch provides “Near Real-Time (NRT)” search, not true real-time:
Document indexed → (up to 1 second wait) → Refresh → Searchable- Default refresh_interval: 1 second
- For immediate search:
?refresh=trueparameter (watch performance) - For bulk indexing: Disable with
refresh_interval: -1, then manual refresh when complete
// Bulk indexing optimization
PUT /products/_settings
{ "refresh_interval": "-1" }
// After indexing complete
POST /products/_refresh
PUT /products/_settings
{ "refresh_interval": "1s" }Key Points
- Segments are immutable, so deletes/updates only “mark as deleted” and are merged later
- Refresh (1 second) makes searchable, Flush permanently saves to disk
- Elasticsearch provides “Near Real-Time (NRT)” search; use
?refresh=truefor immediate search (watch performance)
Summary#
flowchart TB
A[Cluster] --> B[Node]
B --> C[Index]
C --> D[Shard]
D --> E[Document]
A2["Collection of nodes<br>Provides high availability"] -.-> A
B2["Physical server<br>Separable by role"] -.-> B
C2["Logical group of documents<br>Similar to RDB table"] -.-> C
D2["Physical partition of index<br>Unit of distributed processing"] -.-> D
E2["JSON data<br>Similar to RDB row"] -.-> EDiagram: Summarizes the hierarchical structure Cluster → Node → Index → Shard → Document and the role of each component.
Next Steps#
| Goal | Recommended Document |
|---|---|
| Schema design | Data Modeling |
| Write search queries | Query DSL |
| Hands-on practice | Basic Examples |