Version Information
This guide is based on the following versions:
- Elasticsearch: 8.11.x
- Kibana: 8.11.x
- Spring Boot: 3.2.x
- Spring Data Elasticsearch: 5.2.x
- Java: 17+
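Some APIs and configuration options differ in other versions. In particular, Elasticsearch 8.x enables security (authentication and TLS) by default and replaces the deprecated High Level REST Client with the Java API Client, so 7.x examples often do not work unchanged.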
What is Elasticsearch?
Elasticsearch is a distributed search and analytics engine. It enables fast full-text search across large volumes of data and near-real-time analysis.
Why is Elasticsearch Needed?
What problems arise when searching with LIKE '%keyword%' in an RDB?
| RDB Search Limitations | After Elasticsearch Adoption |
|---|---|
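| A leading wildcard cannot use a B-Tree index, forcing a full table scan | Millisecond-level search via an inverted index |
| No morphological analysis | The analyzer splits “Samsung Electronics” into the terms “Samsung” and “Electronics”, so documents containing either term can match |
| No typo tolerance | Fuzzy search still finds “Samsung” when the user types “Samsng” |
| No relevance sorting | Results ranked by relevance score (BM25 by default) |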
| Single server limitations | Horizontal scaling via sharding, handles billions of documents |
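To make the first and third rows concrete, here is a toy inverted index in plain Java. This is a minimal sketch, not how Lucene actually stores data. It makes three assumptions. First, a simple lowercase tokenizer that splits on non-letters stands in for a real analyzer. Second, the fuzzy lookup uses plain Levenshtein distance, whereas Elasticsearch's fuzzy matching defaults to Damerau-Levenshtein with `fuzziness: AUTO`. Third, the class name `ToyInvertedIndex` is hypothetical.

```java
import java.util.*;

// Hypothetical toy class: maps each term to the sorted set of document IDs that contain it.
public class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Stand-in analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Term lookup: a dictionary access per query term instead of scanning every document
    // the way LIKE '%keyword%' does. Any matching term is enough (OR semantics).
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySortedSet()));
        }
        return hits;
    }

    // Fuzzy lookup: scans the term dictionary (not the documents) for terms within maxEdits.
    public SortedSet<Integer> fuzzySearch(String word, int maxEdits) {
        String q = word.toLowerCase(Locale.ROOT);
        SortedSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, SortedSet<Integer>> e : postings.entrySet()) {
            if (levenshtein(q, e.getKey()) <= maxEdits) hits.addAll(e.getValue());
        }
        return hits;
    }

    // Classic dynamic-programming edit distance (insertion, deletion, substitution).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Samsung Electronics Galaxy phone");
        index.add(2, "LG Electronics TV");
        index.add(3, "Samsung washing machine");
        System.out.println(index.search("Samsung Electronics")); // [1, 2, 3]
        System.out.println(index.fuzzySearch("Samsng", 1));      // [1, 3]
    }
}
```

The key point is that query cost depends on the size of the term dictionary and posting lists rather than on re-reading every row. Real engines also compress posting lists and use automata to make the fuzzy step much cheaper than this linear scan.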
Elasticsearch solves these problems while also providing near-real-time search, complex aggregations, and high availability.
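The “relevance score” in the table above is computed with BM25 by default, which Elasticsearch has used as its default similarity since version 5.0. For reference, the standard BM25 formula is given below for a query $Q$ with terms $q_1, \dots, q_n$ and a document $D$. The symbols are defined as follows:

- $f(q_i, D)$ is the term frequency in $D$.
- $|D|$ is the document length in terms.
- $\mathrm{avgdl}$ is the average document length.
- $N$ is the number of documents.
- $n(q_i)$ is the number of documents containing $q_i$.

Lucene's defaults are $k_1 = 1.2$ and $b = 0.75$. Its implementation may differ from this textbook form in constant factors that do not change the ranking.

```latex
\begin{aligned}
\mathrm{score}(D, Q) &= \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\dfrac{|D|}{\mathrm{avgdl}}\right)} \\
\mathrm{IDF}(q_i) &= \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
\end{aligned}
```

Intuitively, rare terms (high IDF) weigh more, and repeated occurrences of a term saturate through $k_1$. Longer documents are penalized through $b$.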
When Should You Use Elasticsearch?
Suitable cases:
- When full-text search for products, posts, etc. is needed
- When time-series analysis of logs or metrics is needed
- When real-time aggregations for dashboards are needed
- When autocomplete, typo correction, or synonym handling is needed
- When RDB search becomes slow due to large data volume
May be overkill:
- When only simple CRUD is needed (RDB is sufficient)
- When transaction integrity is critical (Elasticsearch does not support multi-document ACID transactions)
- When data volume is small and search requirements are simple
- When your team lacks the capacity to operate additional infrastructure
Elasticsearch Limitations and Considerations
Realistic drawbacks you should know before adopting Elasticsearch:
| Limitation | Description | Mitigation |
|---|---|---|
| Operational Complexity | Cluster management, shard rebalancing, JVM tuning required | Dedicated personnel or managed services |
| Cost | Memory-intensive; at least 4 GB of memory per node is recommended | Capacity planning based on data scale |
| Data Consistency | Near real-time: new documents become searchable only after a refresh (every 1 second by default) | Use `refresh=wait_for` on writes or tune `refresh_interval` when immediate visibility is critical |
| Schema Changes | The type of an existing field cannot be changed; reindexing is required | Design the initial Mapping carefully |
| No JOIN Support | No JOINs across indices; denormalization is required | Redesign the data model or join on the application side |
| No Transaction Support | No multi-document ACID transactions | Keep the RDB as the primary store and use ES for search only |
| Learning Curve | Query DSL, Mapping, Analyzers require learning | Consider team capabilities |
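The “Data Consistency” row is easiest to see with a toy model of near-real-time search. The sketch below is a hypothetical simulation, not Elasticsearch code. Indexed documents sit in a buffer and become visible to search only after `refresh()`, mirroring the default 1-second `refresh_interval`. In a real cluster, the corresponding controls are the index's `refresh_interval` setting and the `refresh` parameter on write requests (`true` or `wait_for`).

```java
import java.util.*;

// Hypothetical toy model of near-real-time (NRT) visibility; real Lucene segments work differently.
public class NrtSimulation {
    private final List<String> buffer = new ArrayList<>();     // indexed, not yet searchable
    private final List<String> searchable = new ArrayList<>(); // visible to search

    public void index(String doc) {
        buffer.add(doc);
    }

    // Stand-in for a refresh: makes every buffered document visible to search.
    public void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    public long count(String keyword) {
        return searchable.stream().filter(d -> d.contains(keyword)).count();
    }

    public static void main(String[] args) {
        NrtSimulation index = new NrtSimulation();
        index.index("new product: wireless earbuds");
        System.out.println(index.count("earbuds")); // 0: written, but not refreshed yet
        index.refresh(); // in Elasticsearch this happens every refresh_interval (1s by default)
        System.out.println(index.count("earbuds")); // 1
    }
}
```

Note that a get-by-ID request in Elasticsearch is real-time by default. The refresh delay applies to search.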
Practical Advice: The safest pattern is to keep the RDB as the primary store and use Elasticsearch as a “search-only secondary store”. That way, core service functionality keeps working even if ES fails.
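Below is a minimal sketch of the “RDB as primary, Elasticsearch as secondary” pattern in plain Java. `ProductRepository`, `SearchIndex`, and `ProductService` are hypothetical types. In a Spring Boot app they might be backed by a Spring Data JPA repository and a Spring Data Elasticsearch repository. The point is the control flow:

- The RDB write is the source of truth.
- Indexing failures are tolerated and queued for retry.
- Search falls back to the RDB when Elasticsearch is unavailable.

```java
import java.util.*;

public class SecondaryStoreExample {

    record Product(long id, String name) {}

    // Hypothetical primary store (e.g. backed by an RDB).
    interface ProductRepository {
        void save(Product p);
        List<Product> findByNameContaining(String keyword); // slow LIKE '%keyword%' fallback
    }

    // Hypothetical search store (e.g. backed by Elasticsearch).
    interface SearchIndex {
        void index(Product p);
        List<Product> search(String keyword);
    }

    static class ProductService {
        private final ProductRepository repository;
        private final SearchIndex searchIndex;
        final List<Product> pendingReindex = new ArrayList<>(); // to be retried later, e.g. by a scheduler

        ProductService(ProductRepository repository, SearchIndex searchIndex) {
            this.repository = repository;
            this.searchIndex = searchIndex;
        }

        void create(Product p) {
            repository.save(p);       // source of truth: failures here propagate
            try {
                searchIndex.index(p); // best effort: an ES failure must not fail the write
            } catch (RuntimeException e) {
                pendingReindex.add(p);
            }
        }

        List<Product> search(String keyword) {
            try {
                return searchIndex.search(keyword);
            } catch (RuntimeException e) {
                return repository.findByNameContaining(keyword); // degraded but functional
            }
        }
    }

    public static void main(String[] args) {
        List<Product> table = new ArrayList<>();
        ProductRepository rdb = new ProductRepository() {
            public void save(Product p) { table.add(p); }
            public List<Product> findByNameContaining(String k) {
                return table.stream().filter(p -> p.name().contains(k)).toList();
            }
        };
        SearchIndex downEs = new SearchIndex() { // simulates an unavailable cluster
            public void index(Product p) { throw new IllegalStateException("ES down"); }
            public List<Product> search(String k) { throw new IllegalStateException("ES down"); }
        };
        ProductService service = new ProductService(rdb, downEs);
        service.create(new Product(1, "Galaxy phone"));
        System.out.println(service.search("Galaxy"));      // served from the RDB fallback
        System.out.println(service.pendingReindex.size()); // 1
    }
}
```

In production, the retry queue would usually be durable, for example an outbox table or a message queue, rather than an in-memory list.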
Alternative Technology Comparison
Elasticsearch is not the only option:
| Technology | Characteristics | Best For |
|---|---|---|
| Elasticsearch | Full-stack search/analytics, most features | Large-scale search, log analysis, complex aggregations |
| OpenSearch | Fork of Elasticsearch 7.10 (Apache 2.0 license); available as an AWS managed service | AWS environments, licensing concerns |
| Apache Solr | Long history, proven stability | Traditional enterprise environments |
| Meilisearch | Simple setup, quick start | Small scale, prototypes, instant search |
| Typesense | Easy configuration, built-in typo tolerance | Small services, rapid implementation |
| PostgreSQL FTS | No separate system needed | Simple search, already using PG |
Selection Criteria: If you have fewer than about 1 million documents and simple search requirements, PostgreSQL FTS or Meilisearch may suffice. Elasticsearch is better suited to large-scale or complex requirements.
RDB vs Elasticsearch
| Concept | RDB | Elasticsearch |
|---|---|---|
| Storage Unit | Row | Document (JSON) |
| Schema | Table Schema | Mapping |
| Table | Table | Index |
| Column | Column | Field |
| Index | B-Tree Index | Inverted Index |
| Join | JOIN | Nested, Parent-Child (limited) |
| Transaction | ACID | Eventually Consistent |
Key Difference: RDB is optimized for accurate retrieval of normalized data, while Elasticsearch is optimized for fast search of denormalized data.
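To illustrate what “denormalized” means in practice, the sketch below starts from two normalized rows: an order and its customer, which an RDB would JOIN at query time. It flattens them into a single JSON-like document at write time, which is the shape you would index into Elasticsearch. The record types and field names are hypothetical. A `LinkedHashMap` stands in for the JSON document so the example needs no libraries.

```java
import java.util.*;

public class DenormalizeExample {
    // Hypothetical normalized RDB rows: orders.customer_id -> customers.id
    record Customer(long id, String name, String city) {}
    record Order(long id, long customerId, String product, int amount) {}

    // Application-side "JOIN" at write time: copy customer fields into the order document.
    static Map<String, Object> toOrderDocument(Order order, Map<Long, Customer> customersById) {
        Customer c = customersById.get(order.customerId());
        if (c == null) throw new IllegalArgumentException("unknown customer " + order.customerId());

        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("id", c.id());
        customer.put("name", c.name());
        customer.put("city", c.city());

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("order_id", order.id());
        doc.put("product", order.product());
        doc.put("amount", order.amount());
        doc.put("customer", customer); // embedded object instead of a foreign key
        return doc;
    }

    public static void main(String[] args) {
        Map<Long, Customer> customers = Map.of(10L, new Customer(10, "Kim", "Seoul"));
        Order order = new Order(1, 10, "Galaxy phone", 2);
        System.out.println(toOrderDocument(order, customers));
        // {order_id=1, product=Galaxy phone, amount=2, customer={id=10, name=Kim, city=Seoul}}
    }
}
```

The trade-off is that when a customer's details change, every order document embedding them must be reindexed. That is the price of avoiding JOINs at query time.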
What This Guide Covers
Quick Start
Store and search data in Elasticsearch in 5 minutes. See it working before diving into concepts.
Concepts
These documents explain not just how to use each feature, but why it works the way it does.
| Topic | What You’ll Learn |
|---|---|
| Core Components | Roles and relationships of Cluster, Node, Index, Document, Shard |
| Data Modeling | Mapping, Field Type, Analyzer design |
| Query DSL | Writing search queries with Match, Term, Bool |
| Search Relevance | Improving search quality with Score, BM25, Boosting |
| Aggregations | Data analysis with Bucket and Metric aggregations |
| Indexing Strategy | Bulk indexing, Refresh, ILM settings |
| Cluster Management | Node configuration, shard allocation, status monitoring |
| Performance Tuning | Query optimization, caching, JVM settings |
| High Availability | Replica, Snapshot, failure response |
Hands-on Examples
Runnable example code based on Spring Boot.
- Environment Setup - Docker Elasticsearch + Kibana configuration
- Basic Examples - Document CRUD and basic search implementation
- Product Search System - Korean search, autocomplete, filtering implementation
Appendix
- Glossary - Quick reference for Elasticsearch terms
- FAQ - Frequently asked questions
- References - Official docs and additional learning resources
Prerequisites
- Required: REST API basics, JSON format understanding
- Helpful: Java/Spring Boot experience, RDB usage experience
Suggested Learning Path
If you're new: Quick Start → Core Components → Data Modeling → Basic Examples
Search implementation: Query DSL → Search Relevance → Product Search System
Data analysis: Aggregations → Indexing Strategy
Operations prep: Cluster Management → Performance Tuning → High Availability
Each document can be read independently, but if you’re new, we recommend following the order above.