TL;DR
- Index/Document/Field: Corresponds to Table/Row/Column in RDB
- Shard/Replica: Basic units of data distribution and replication
- Analyzer/Tokenizer: Breaks text into searchable tokens
- Query/Filter Context: Search modes distinguished by whether a Score is calculated
- Sorted alphabetically, each term links to related concept documents
Quick reference for Elasticsearch core terms. For detailed explanations, see the Concepts section.
A-E#
Aggregation#
Feature for grouping search results and calculating statistics. Similar to SQL’s GROUP BY. Three types: Bucket/Metric/Pipeline. → Aggregations | Query DSL
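As a rough illustration of a Bucket aggregation with a nested Metric aggregation, the sketch below builds a search request body as a plain Python dict. The field names `category` and `price` and the aggregation names `by_group` and `avg_value` are hypothetical, chosen only for this example.

```python
import json

def terms_with_avg(group_field: str, metric_field: str, size: int = 10) -> dict:
    """Build a search body: bucket by group_field, then average metric_field per bucket."""
    return {
        "size": 0,  # return aggregation results only, no search hits
        "aggs": {
            "by_group": {  # Bucket aggregation (hypothetical name)
                "terms": {"field": group_field, "size": size},
                "aggs": {
                    # Metric aggregation computed inside each bucket (hypothetical name)
                    "avg_value": {"avg": {"field": metric_field}}
                },
            }
        },
    }

# "category" and "price" are assumed field names for illustration.
body = terms_with_avg("category", "price")
print(json.dumps(body, indent=2))
```

This is roughly the counterpart of `SELECT category, AVG(price) ... GROUP BY category` in SQL.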
Alias#
An alternative name for an Index. Useful for zero-downtime index switching and multi-index search. Used with ILM. → Indexing Strategy
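Zero-downtime switching usually relies on removing and adding an Alias in one atomic request. The sketch below only builds the body for `POST /_aliases`; the alias and index names are hypothetical.

```python
def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions are sent in one request, so readers of the alias
    never observe a moment where it points at no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical names for illustration.
actions = swap_alias_actions("products", "products-v1", "products-v2")
print(actions)
```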
Analyzer#
Component that breaks text into Terms. Processes in order: Character Filter → Tokenizer → Token Filter. Use Nori analyzer for Korean. → Data Modeling
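The three-stage order can be pictured with a toy pipeline in plain Python. This is not how Elasticsearch implements analysis; the tag-stripping character filter, whitespace tokenizer, and lowercase token filter are simplified stand-ins.

```python
import re

def char_filter(text: str) -> str:
    """Stage 1: rewrite raw characters (here: replace HTML-like tags with spaces)."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenizer(text: str) -> list[str]:
    """Stage 2: split the filtered text into tokens (here: on whitespace)."""
    return text.split()

def token_filter(tokens: list[str]) -> list[str]:
    """Stage 3: transform individual tokens (here: lowercase them)."""
    return [t.lower() for t in tokens]

def analyze(text: str) -> list[str]:
    # Character Filter -> Tokenizer -> Token Filter
    return token_filter(tokenizer(char_filter(text)))

print(analyze("<b>Quick</b> Brown FOX"))  # ['quick', 'brown', 'fox']
```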
BM25 (Best Matching 25)#
Elasticsearch’s default Score calculation algorithm. Based on TF and IDF, with field-length normalization. Resulting Scores can be adjusted with Boosting. → Search Relevance
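To make the TF and IDF roles concrete, here is a sketch of the textbook BM25 score for a single term, using the commonly cited defaults `k1 = 1.2` and `b = 0.75`. The function name and parameters are illustrative, and the Lucene implementation behind Elasticsearch may differ in constant factors, so treat the absolute values as indicative only.

```python
import math

def bm25_term_score(tf: float, doc_len: float, avg_doc_len: float,
                    n_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one term to one document's score."""
    # IDF: rarer terms (small doc_freq) get a larger weight.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # TF saturates: each extra occurrence adds less than the previous one.
    return idf * tf * (k1 + 1) / (tf + norm)

rare = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
common = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=500)
print(rare > common)  # True: the rarer term contributes more
```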
Boosting#
Technique of adding weight to the Score of specific fields or conditions. → Search Relevance
Bulk API#
API for indexing multiple Documents in a single request. Essential for indexing throughput, since it avoids one round trip per Document. Pair with Refresh interval control. → Indexing Strategy
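The Bulk API takes newline-delimited JSON: one action line followed by one source line per Document, with a trailing newline at the end. The helper below is a minimal sketch that only builds that body; the function name, index name, and documents are hypothetical.

```python
import json

def build_bulk_body(index: str, docs: list[tuple[str, dict]]) -> str:
    """Build an NDJSON body for POST /_bulk from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the Document itself.
        lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical index and documents for illustration.
bulk_docs = [("1", {"title": "first"}), ("2", {"title": "second"})]
bulk_body = build_bulk_body("products", bulk_docs)
print(bulk_body, end="")
```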
Cluster#
A group of Elasticsearch servers consisting of one or more Nodes. State managed by Master Node. → Core Components | Cluster Management
Coordinating Node#
Node that receives search requests, distributes to Data Nodes, and merges results. All nodes perform this role by default. → Cluster Management
Data Node#
Node that stores actual data and performs search/Aggregation. Shards are assigned to it. → Cluster Management
Document#
JSON data unit stored in Elasticsearch. Similar to a Row in RDB. Stored within an Index. → Core Components
DSL (Domain Specific Language)#
JSON-based language for writing Elasticsearch Queries. Provides various queries like Bool, Match, Term. → Query DSL
F-M#
Field#
Individual data item within a Document. Similar to a Column in RDB. Type defined by Mapping. → Data Modeling
Filter Context#
Performs yes/no condition matching without Score calculation. Frequently used filters are cached, which makes them fast. Used with Query Context in Bool queries. → Query DSL
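A Bool query can mix both contexts: clauses under `must` run in Query Context and contribute to the Score, while clauses under `filter` run in Filter Context and only include or exclude Documents. The sketch below builds such a body; the field names `title`, `status`, and `price` are hypothetical.

```python
def scored_and_filtered(text: str, status: str, min_price: float) -> dict:
    """Search body combining Query Context (must) and Filter Context (filter)."""
    return {
        "query": {
            "bool": {
                # Query Context: affects the relevance Score.
                "must": [{"match": {"title": text}}],
                # Filter Context: yes/no matching, no Score contribution.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"price": {"gte": min_price}}},
                ],
            }
        }
    }

# Hypothetical values for illustration.
bool_body = scored_and_filtered("wireless keyboard", "in_stock", 10)
print(bool_body)
```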
Flush#
Operation that permanently stores memory buffer data to disk (a Lucene commit) and then clears the Translog. Distinct from Refresh, which only makes data searchable. → Indexing Strategy
IDF (Inverse Document Frequency)#
Indicator of how rare a word is across all Documents. Component of BM25. → Search Relevance
ILM (Index Lifecycle Management)#
Automatic management of Index lifecycle from creation to deletion. Hot → Warm → Cold → Frozen → Delete phases. → Indexing Strategy
Index#
Collection of Documents. Similar to a Table in RDB. Distributed storage via Shards. → Core Components
Inverted Index#
Data structure mapping Terms → Document locations. Core of fast search. → Core Components
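A toy version of the structure in plain Python shows why lookups are fast: finding the Documents for a Term is a single dictionary access instead of a scan over every Document. Real inverted indexes also store positions and frequencies, which this sketch omits.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase whitespace-separated term to the IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

sample_docs = {1: "quick brown fox", 2: "lazy brown dog"}
inverted = build_inverted_index(sample_docs)
print(sorted(inverted["brown"]))  # [1, 2]
print(sorted(inverted["fox"]))    # [1]
```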
kNN (k-Nearest Neighbors)#
Vector similarity-based search. Algorithm that finds the k closest documents in Vector Search. → Vector Search
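The idea can be sketched as an exact, brute-force search over a handful of vectors using cosine similarity. This is only an illustration: at scale, Elasticsearch typically relies on an approximate index rather than comparing the query against every vector, and the document IDs and vectors below are made up.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k vectors most similar to the query (exact search)."""
    ranked = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical 2-dimensional embeddings for illustration.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(knn([1.0, 0.0], vectors, k=2))  # ['a', 'b']
```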
Mapping#
Defines how Documents and Fields are stored/indexed. Similar to Schema in RDB. Dynamic/Explicit methods. → Data Modeling
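An Explicit Mapping can be sketched as the body sent when creating an Index. The field names below are hypothetical; `"dynamic": "strict"` is shown as one way to reject Documents containing fields that were not declared.

```python
def product_mapping() -> dict:
    """Index-creation body with an explicit mapping (field names are hypothetical)."""
    return {
        "mappings": {
            "dynamic": "strict",  # reject fields not declared below
            "properties": {
                "title": {"type": "text"},        # analyzed for full-text search
                "status": {"type": "keyword"},    # exact-match values, not analyzed
                "price": {"type": "float"},
                "created_at": {"type": "date"},
            },
        }
    }

mapping_body = product_mapping()
print(sorted(mapping_body["mappings"]["properties"]))
```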
Master Node#
Node that manages Cluster state and handles Index creation/deletion. Recommended to separate from Data Node. → Cluster Management
N-R#
Node#
Single Elasticsearch server that forms a Cluster. Roles include Master, Data, Coordinating. → Core Components | Cluster Management
Nori#
Official Elasticsearch Korean morphological Analyzer. Provides nori_tokenizer, nori_part_of_speech filter. Used for autocomplete, initial consonant search. → Korean Search Optimization
Primary Shard#
Shard where original data is stored. Count is fixed at Index creation; changing it requires the Split or Shrink API, or a Reindex into a new Index. Source of Replica Shard. → Core Components
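The reason the Primary Shard count is hard to change can be seen from a simplified routing rule: a Document's shard is derived from a hash of its routing key modulo the number of Primary Shards. The sketch below uses `zlib.crc32` purely as a stand-in hash; Elasticsearch uses a different hash function and a somewhat more involved formula, so this shows only the principle.

```python
import zlib

def route(routing_key: str, num_primary_shards: int) -> int:
    """Simplified routing: hash(routing_key) % number of primary shards.

    zlib.crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

# Hypothetical document IDs used as routing keys.
keys = [f"doc-{i}" for i in range(100)]
moved = sum(1 for k in keys if route(k, 3) != route(k, 4))
print(f"{moved} of {len(keys)} keys map to a different shard after 3 -> 4")
```

Because most keys would land on a different shard under the new modulus, existing Documents could no longer be found where they were written, which is why the data has to be rewritten rather than the setting simply edited.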
Query Context#
Calculates relevance Score between search term and Document. Used with Filter Context in Bool queries. → Query DSL
Refresh#
Operation that makes memory buffer data searchable. Runs every 1 second by default (index.refresh_interval setting). Lengthening or disabling the interval is recommended during large Bulk API loads. Distinct from Flush. → Indexing Strategy | Performance Tuning
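A common pattern is to pause periodic Refresh before a large load and restore it afterwards. The sketch below only builds the two bodies for `PUT /<index>/_settings`; the helper name is hypothetical, and the value `"-1"` disables periodic refresh.

```python
def refresh_settings(interval: str) -> dict:
    """Body for PUT /<index>/_settings that sets the refresh interval."""
    return {"index": {"refresh_interval": interval}}

before_bulk = refresh_settings("-1")  # pause periodic refresh during the load
after_bulk = refresh_settings("1s")   # restore the 1-second default afterwards
print(before_bulk, after_bulk)
```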
Reindex#
Copies (and optionally transforms) an existing Index into a new Index. Used for Mapping changes and data migration. → Indexing Strategy
Replica Shard#
Copy of Primary Shard. Improves read throughput and provides failover. Never placed on the same Node as its Primary Shard. → High Availability
S-Z#
Score#
Number indicating relevance between search term and Document. Calculated by BM25 algorithm. Adjustable with Boosting. → Search Relevance
Segment#
Immutable Lucene file unit that makes up a Shard (and therefore an Index). Created during Refresh. Consolidated by Merge. → Performance Tuning
Shard#
Horizontal partition of an Index. Unit of distributed storage and parallel processing. Divided into Primary and Replica. → Core Components
Snapshot#
Backup of Index state at a specific point. Stored in remote storage (S3, GCS, etc.). Automated with SLM. → High Availability
TF (Term Frequency)#
Number of times a Term appears in a Document. Component of BM25. → Search Relevance
Term#
Individual token generated after Analyzer processing. Stored in Inverted Index. → Data Modeling
Tokenizer#
Component of Analyzer that breaks text into tokens. Standard, Whitespace, Nori, etc. → Data Modeling
Translog#
Write-Ahead Log for preventing data loss. Used for recovery until Flush. → High Availability
Vector Search#
Search that compares embedding vectors instead of exact Terms. Uses kNN algorithm. Used for semantic search and similar product recommendations. → Vector Search
Abbreviations#
| Abbr | Full Name | Meaning | Reference |
|---|---|---|---|
| BM25 | Best Matching 25 | Default scoring algorithm | Search Relevance |
| CCR | Cross-Cluster Replication | Real-time cross-cluster replication | High Availability |
| DSL | Domain Specific Language | Query language | Query DSL |
| IDF | Inverse Document Frequency | Word rarity indicator | Search Relevance |
| ILM | Index Lifecycle Management | Index lifecycle management | Indexing Strategy |
| kNN | k-Nearest Neighbors | k-nearest neighbor search | Vector Search |
| SLM | Snapshot Lifecycle Management | Snapshot lifecycle management | High Availability |
| TF | Term Frequency | Word frequency indicator | Search Relevance |
Next Steps#
- Concepts - Elasticsearch core concepts
- Quick Start - Quick start guide
- References - Official docs, blogs
- FAQ - Frequently asked questions