TL;DR
  • Index/Document/Field: Corresponds to Table/Row/Column in RDB
  • Shard/Replica: Basic units of data distribution and replication
  • Analyzer/Tokenizer: Breaks text into searchable tokens
  • Query/Filter Context: Search methods distinguished by scoring calculation
  • Sorted alphabetically, each term links to related concept documents

Quick reference for Elasticsearch core terms. For detailed explanations, see the Concepts section.

A-E#

Aggregation#

Feature for grouping search results and calculating statistics. Similar to SQL’s GROUP BY. Three types: Bucket/Metric/Pipeline. → Aggregations | Query DSL
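As a rough illustration of the GROUP BY analogy, the sketch below builds a search request body that nests a Metric aggregation inside a Bucket aggregation. The field names `category` and `price` and the aggregation names `by_category` and `avg_price` are hypothetical.

```python
import json

# Hypothetical fields: "category" (keyword) and "price" (numeric).
agg_request = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "by_category": {
            # Bucket aggregation: one bucket per distinct category value
            "terms": {"field": "category"},
            "aggs": {
                # Metric aggregation: computed separately inside each bucket
                "avg_price": {"avg": {"field": "price"}}
            },
        }
    },
}

print(json.dumps(agg_request, indent=2))
```

This is roughly comparable to `SELECT category, AVG(price) ... GROUP BY category` in SQL.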

Alias#

An alternative name for an Index. Useful for zero-downtime index switching and multi-index search. Used with ILM. → Indexing Strategy
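A minimal sketch of zero-downtime switching: the body below is meant for the `_aliases` endpoint, which applies all listed actions as one atomic operation. The alias and index names (`products`, `products_v1`, `products_v2`) are hypothetical.

```python
import json

def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = alias_swap_actions("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Clients that query the alias never need to know which concrete Index sits behind it.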

Analyzer#

Component that breaks text into Terms. Processes in order: Character Filter → Tokenizer → Token Filter. Use Nori analyzer for Korean. → Data Modeling
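The three stages can be sketched in plain Python. This is a toy model of the pipeline order only, not how Elasticsearch implements its analyzers; the stop-word list and the tag-stripping rule are assumptions chosen for illustration.

```python
import re

def char_filter(text):
    # Character Filter stage: replace HTML-like tags with a space before tokenizing
    return re.sub(r"<[^>]+>", " ", text)

def tokenizer(text):
    # Tokenizer stage: split on runs of non-word characters, dropping empty pieces
    return [t for t in re.split(r"\W+", text) if t]

def token_filter(tokens):
    # Token Filter stage: lowercase each token and drop a tiny stop-word list
    stop_words = {"the", "a", "is"}
    return [t.lower() for t in tokens if t.lower() not in stop_words]

def analyze(text):
    # Stages run in the fixed order: Character Filter -> Tokenizer -> Token Filter
    return token_filter(tokenizer(char_filter(text)))

terms = analyze("<b>The</b> Quick brown fox is FAST")
print(terms)  # ['quick', 'brown', 'fox', 'fast']
```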

BM25 (Best Matching 25)#

Elasticsearch’s default Score calculation algorithm. Based on TF and IDF. Can be adjusted with Boosting. → Search Relevance
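To make the TF and IDF roles concrete, here is a sketch of the textbook BM25 formula for a single term. The defaults `k1=1.2` and `b=0.75` match Elasticsearch's default similarity settings; the exact constant factors in Lucene's implementation may differ from this form, so treat the absolute values as illustrative.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term for one document."""
    # IDF: rarer terms (small doc_freq) get a larger weight
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # TF part: saturates as tf grows, and is penalized for longer-than-average documents
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm

rare = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
common = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=500)
print(rare > common)  # True: the rarer term contributes more to the Score
```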

Boosting#

Technique of adding weight to the Score of specific fields or conditions. → Search Relevance
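One common way to boost a field is the `^` suffix in a `multi_match` query. The sketch below assumes hypothetical `title` and `description` fields and an arbitrary boost factor of 3.

```python
import json

boosted_query = {
    "query": {
        "multi_match": {
            "query": "wireless keyboard",
            # "title^3" weights matches in title three times as heavily as description
            "fields": ["title^3", "description"],
        }
    }
}

print(json.dumps(boosted_query, indent=2))
```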

Bulk API#

API for indexing multiple Documents at once. Essential for performance. Use with Refresh control. → Indexing Strategy
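The Bulk API takes a newline-delimited JSON body in which each document is preceded by an action line. The helper below is a minimal sketch of that serialization; the function name `build_bulk_body` and the index name `products` are hypothetical, and sending the request is left out.

```python
import json

def build_bulk_body(index_name, docs):
    """Serialize (id, source) pairs into a newline-delimited Bulk API body."""
    lines = []
    for doc_id, source in docs:
        # Action line: says what to do and where
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the Document itself
        lines.append(json.dumps(source))
    # The body must end with a newline character
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", [("1", {"name": "pen"}), ("2", {"name": "ink"})])
print(body)
```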

Cluster#

A group of Elasticsearch servers consisting of one or more Nodes. State managed by Master Node. → Core Components | Cluster Management

Coordinating Node#

Node that receives search requests, distributes to Data Nodes, and merges results. All nodes perform this role by default. → Cluster Management

Data Node#

Node that stores actual data and performs search/Aggregation. Shards are assigned to it. → Cluster Management

Document#

JSON data unit stored in Elasticsearch. Similar to a Row in RDB. Stored within an Index. → Core Components

DSL (Domain Specific Language)#

JSON-based language for writing Elasticsearch Queries. Provides various queries like Bool, Match, Term. → Query DSL


F-M#

Field#

Individual data item within a Document. Similar to a Column in RDB. Type defined by Mapping. → Data Modeling

Filter Context#

Performs condition matching without Score calculation. Frequently used filters are cached, which makes repeated matching fast. Used with Query Context in Bool queries. → Query DSL
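A sketch of how the two contexts sit side by side in a Bool query: clauses under `must` run in Query Context and affect the Score, while clauses under `filter` only include or exclude Documents. The field names `title`, `status`, and `price` are hypothetical.

```python
import json

bool_query = {
    "query": {
        "bool": {
            # Query Context: contributes to the relevance Score
            "must": [
                {"match": {"title": "wireless keyboard"}}
            ],
            # Filter Context: yes/no matching with no Score contribution
            "filter": [
                {"term": {"status": "in_stock"}},
                {"range": {"price": {"lte": 100}}},
            ],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```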

Flush#

Operation to permanently store memory buffer data to disk. Clears Translog. Distinct from Refresh. → Indexing Strategy

IDF (Inverse Document Frequency)#

Indicator of how rare a word is across all Documents. Component of BM25. → Search Relevance

ILM (Index Lifecycle Management)#

Automatic management of Index lifecycle from creation to deletion. Hot → Warm → Cold → Frozen → Delete phases. → Indexing Strategy
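For orientation, a lifecycle policy body with hot, warm, cold, and delete phases might look like the sketch below. All ages, sizes, and the choice of actions are arbitrary assumptions for illustration, not recommendations.

```python
import json

ilm_policy = {
    "policy": {
        "phases": {
            # Hot: actively written; roll over to a new Index by age or shard size
            "hot": {"actions": {"rollover": {"max_age": "7d", "max_primary_shard_size": "50gb"}}},
            # Warm: no longer written; merge down to a single Segment
            "warm": {"min_age": "30d", "actions": {"forcemerge": {"max_num_segments": 1}}},
            # Cold: rarely queried; lower the recovery priority
            "cold": {"min_age": "90d", "actions": {"set_priority": {"priority": 0}}},
            # Delete: remove the Index entirely
            "delete": {"min_age": "180d", "actions": {"delete": {}}},
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```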

Index#

Collection of Documents. Similar to a Table in RDB. Distributed storage via Shards. → Core Components

Inverted Index#

Data structure mapping Terms to Document locations. Core of fast search. → Core Components
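The idea can be sketched with a dictionary from each term to the IDs of the Documents containing it. This toy version splits on whitespace and lowercases only; a real index stores terms produced by an Analyzer along with additional information such as positions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick dog"}
inverted = build_inverted_index(docs)
print(inverted["brown"])  # [1, 2]
print(inverted["quick"])  # [1, 3]
```

Looking up a term is a single dictionary access, which is why search does not need to scan every Document.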

kNN (k-Nearest Neighbors)#

Vector similarity-based search. Algorithm that finds the k closest documents in Vector Search. → Vector Search
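An exact, brute-force version of kNN can be sketched as follows, using cosine similarity as the distance measure. This is only meant to show what "k closest" means; production vector search typically relies on approximate index structures rather than comparing against every vector.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query_vector, vectors, k):
    """Return the IDs of the k vectors most similar to query_vector (exact search)."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query_vector, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
nearest = knn([1.0, 0.0], vectors, k=2)
print(nearest)  # ['a', 'b']
```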

Mapping#

Defines how Documents and Fields are stored/indexed. Similar to Schema in RDB. Dynamic/Explicit methods. → Data Modeling
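An explicit mapping could be sketched as the index-creation body below. The field names are hypothetical, and `"dynamic": "strict"` is one possible choice that rejects Documents containing undeclared Fields.

```python
import json

explicit_mapping = {
    "mappings": {
        "dynamic": "strict",  # reject Documents with Fields not declared below
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "category": {"type": "keyword"},  # exact values: filtering, sorting, aggregations
            "price": {"type": "float"},
            "created_at": {"type": "date"},
        },
    }
}

print(json.dumps(explicit_mapping, indent=2))
```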

Master Node#

Node that manages Cluster state and handles Index creation/deletion. Recommended to separate from Data Node. → Cluster Management


N-R#

Node#

Single Elasticsearch server that forms a Cluster. Roles include Master, Data, Coordinating. → Core Components | Cluster Management

Nori#

Official Elasticsearch Korean morphological Analyzer. Provides nori_tokenizer, nori_part_of_speech filter. Used for autocomplete, initial consonant search. → Korean Search Optimization
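A custom analyzer built from these components might be configured as in the sketch below. It assumes the `analysis-nori` plugin is installed; the names `korean_tokenizer`, `korean_pos`, and `korean` are hypothetical, and the `stoptags` shown are just two example part-of-speech tags.

```python
import json

nori_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # decompound_mode controls how compound nouns are split
                "korean_tokenizer": {"type": "nori_tokenizer", "decompound_mode": "mixed"}
            },
            "filter": {
                # Drop tokens tagged as endings (E) and particles (J)
                "korean_pos": {"type": "nori_part_of_speech", "stoptags": ["E", "J"]}
            },
            "analyzer": {
                "korean": {
                    "type": "custom",
                    "tokenizer": "korean_tokenizer",
                    "filter": ["korean_pos", "lowercase"],
                }
            },
        }
    }
}

print(json.dumps(nori_settings, indent=2))
```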

Primary Shard#

Shard where original data is stored. Count is fixed at Index creation; changing it requires a Reindex or the split/shrink APIs. Source of Replica Shard. → Core Components
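One reason the Primary Shard count matters at creation time is routing: a Document's shard is derived from a hash of its routing value modulo the number of Primary Shards. The sketch below shows the simplified form of that rule. CRC32 is a stand-in chosen for illustration; Elasticsearch uses its own hash function.

```python
import zlib

def route_to_shard(routing_key, num_primary_shards):
    # Stand-in hash (assumption): Elasticsearch does not use CRC32 here
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards

doc_ids = [f"doc-{i}" for i in range(1000)]
# Count how many documents would land on a different shard if the count went from 5 to 6
moved = sum(route_to_shard(d, 5) != route_to_shard(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most Documents would resolve to a different shard under the new count, existing data could no longer be found where routing expects it.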

Query Context#

Calculates relevance Score between search term and Document. Used with Filter Context in Bool queries. → Query DSL

Refresh#

Operation to make memory buffer data searchable. Runs every 1 second by default. Recommended to adjust the interval when using Bulk API. Distinct from Flush. → Indexing Strategy | Performance Tuning
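A common pattern for large Bulk loads is to turn periodic Refresh off and restore it afterwards. The sketch below only builds the two index settings bodies; the helper name `refresh_settings` is hypothetical.

```python
import json

def refresh_settings(interval):
    """Build an index settings body that sets refresh_interval."""
    return {"index": {"refresh_interval": interval}}

before_bulk = refresh_settings("-1")  # "-1" disables periodic Refresh during the load
after_bulk = refresh_settings("1s")   # restore the 1-second interval afterwards

print(json.dumps(before_bulk))
print(json.dumps(after_bulk))
```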

Reindex#

Copies or transforms an existing Index into a new Index. Used for Mapping changes and data migration. → Indexing Strategy
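In its simplest form, a `_reindex` request body names a source and a destination. The sketch below assumes hypothetical index names and that the destination Index has already been created with the new Mapping.

```python
import json

def reindex_body(source_index, dest_index):
    """Build a _reindex request body that copies all Documents from source to dest."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }

body = reindex_body("products_v1", "products_v2")
print(json.dumps(body, indent=2))
```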

Replica Shard#

Copy of a Primary Shard. Improves read throughput and enables failover. Always placed on a different Node from its Primary. → High Availability


S-Z#

Score#

Number indicating relevance between search term and Document. Calculated by BM25 algorithm. Adjustable with Boosting. → Search Relevance

Segment#

Immutable unit of index data; each Shard is made up of Segments. Created during Refresh. Consolidated by Merge. → Performance Tuning

Shard#

Horizontal partition of an Index. Unit of distributed storage and parallel processing. Divided into Primary and Replica. → Core Components

Snapshot#

Backup of Index state at a specific point. Stored in remote storage (S3, GCS, etc.). Automated with SLM. → High Availability

TF (Term Frequency)#

Frequency of Term appearing in a Document. Component of BM25. → Search Relevance

Term#

Individual token generated after Analyzer processing. Stored in Inverted Index. → Data Modeling

Tokenizer#

Component of Analyzer that breaks text into tokens. Standard, Whitespace, Nori, etc. → Data Modeling

Translog#

Write-Ahead Log for preventing data loss. Used for recovery until Flush. → High Availability

Vector Search#

Similarity search using embedding vectors. Uses the kNN algorithm. Used for semantic search and similar-product recommendations. → Vector Search
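A request for this kind of search might be sketched as below, assuming the top-level `knn` search option available in Elasticsearch 8.x and a hypothetical `dense_vector` field named `embedding`. The three-dimensional query vector is a placeholder; a real one would come from an embedding model and match the field's dimensions.

```python
import json

def knn_search_body(field, query_vector, k, num_candidates):
    """Build a search body using the top-level knn option."""
    if num_candidates < k:
        raise ValueError("num_candidates should be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,                            # number of nearest neighbors to return
            "num_candidates": num_candidates,  # candidates examined per shard
        }
    }

body = knn_search_body("embedding", [0.12, -0.53, 0.98], k=10, num_candidates=100)
print(json.dumps(body, indent=2))
```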


Abbreviations#

| Abbr | Full Name | Meaning | Reference |
|------|-----------|---------|-----------|
| BM25 | Best Matching 25 | Default scoring algorithm | Search Relevance |
| CCR | Cross-Cluster Replication | Real-time cross-cluster replication | High Availability |
| DSL | Domain Specific Language | Query language | Query DSL |
| IDF | Inverse Document Frequency | Word rarity indicator | Search Relevance |
| ILM | Index Lifecycle Management | Index lifecycle management | Indexing Strategy |
| kNN | k-Nearest Neighbors | k-nearest neighbor search | Vector Search |
| SLM | Snapshot Lifecycle Management | Snapshot lifecycle management | High Availability |
| TF | Term Frequency | Word frequency indicator | Search Relevance |

Next Steps#