TL;DR
- Mapping: Schema defining document structure (similar to RDB table definitions)
- text: For full-text search, tokenized by Analyzer
- keyword: For exact value matching, sorting/aggregation
- Analyzer: Converts text into searchable tokens (use Nori for Korean)
- Denormalization: Include related data in one document since there’s no JOIN
Target Audience: Developers looking to use Elasticsearch search features
Prerequisites: Core Components, basic JSON syntax
This document covers Mapping, Field Type, and Analyzer design for effectively storing and searching data in Elasticsearch.
What is Mapping?#
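Why define a Mapping upfront? What happens if you index documents without a Mapping? Elasticsearch infers each field's type from the first value it sees. A date written as “01/15/2024” does not match the default date detection formats, so it is mapped as a string instead of a date. A numeric ID is mapped as long even if you only ever look it up by exact value, where keyword would be the better fit. Changing a field's type later requires reindexing all data. Mapping is the schema definition that prevents these problems from the start.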
Mapping is a schema that defines how documents and fields are stored and indexed.
RDB vs Elasticsearch#
| RDB | Elasticsearch |
|---|---|
| CREATE TABLE | PUT /index (mapping) |
| Column Type | Field Type |
| Schema Required | Dynamic Mapping possible |
| ALTER TABLE | Limited (requires reindexing) |
Mapping Example#
PUT /products
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "standard"
},
"category": {
"type": "keyword"
},
"price": {
"type": "integer"
},
"created_at": {
"type": "date"
},
"in_stock": {
"type": "boolean"
}
}
}
}
Key Points
- Mapping is defined when creating an index, and field type changes are limited afterward
- Dynamic Mapping allows automatic type inference, but explicit definition is recommended for production
- Schema changes require reindexing
Field Types#
String Types#
text vs keyword#
| Property | text | keyword |
|---|---|---|
| Purpose | Full-text search | Exact value matching |
| Analysis | Tokenized by Analyzer | No analysis |
| Search | match query | term query |
| Sort/Aggregation | Not possible (by default) | Possible |
| Examples | Product description, post content | Category, status, ID |
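The difference in the table above can be illustrated with a toy inverted index. This is a minimal sketch in plain Python, not Elasticsearch code: `analyze`, `match`, and `term` are hypothetical helpers that only approximate the standard analyzer and the match and term queries.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

docs = {1: "MacBook Pro", 2: "MacBook Air"}

text_index = {}     # token -> doc ids (how a text field is indexed)
keyword_index = {}  # whole value -> doc ids (how a keyword field is indexed)
for doc_id, title in docs.items():
    for tok in analyze(title):
        text_index.setdefault(tok, set()).add(doc_id)
    keyword_index.setdefault(title, set()).add(doc_id)

def match(query):
    """Analyze the query, then return docs containing any of its tokens."""
    hits = set()
    for tok in analyze(query):
        hits |= text_index.get(tok, set())
    return hits

def term(value):
    """Look up the exact, unanalyzed value."""
    return keyword_index.get(value, set())

print(sorted(match("macbook")))      # [1, 2]
print(sorted(term("MacBook Pro")))   # [1]
print(sorted(term("macbook")))       # [] because the keyword value is case-sensitive
```

The last line shows why a term query against a keyword field must use the exact stored value.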
{
"properties": {
"title": {
"type": "text" // "MacBook Pro" → ["macbook", "pro"]
},
"category": {
"type": "keyword" // "Laptop" → "Laptop" (as-is)
}
}
}
Multi-field#
Index a single field in multiple ways:
{
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword" // Access via name.keyword
}
}
}
}
}
# Full-text search
GET /products/_search
{ "query": { "match": { "name": "MacBook" } } }
# Exact value aggregation
GET /products/_search
{
"aggs": {
"names": { "terms": { "field": "name.keyword" } }
}
}
Numeric Types#
| Type | Range | Use Case |
|---|---|---|
| byte | -128 ~ 127 | Small integers |
| short | -32,768 ~ 32,767 | Small integers |
| integer | -2³¹ ~ 2³¹-1 | General integers |
| long | -2⁶³ ~ 2⁶³-1 | Large integers, IDs |
| float | 32-bit floating point | Approximate values |
| double | 64-bit floating point | Approximate values with higher precision |
| scaled_float | Floating point stored as a long, multiplied by a fixed scaling factor | Prices (scaling_factor: 100) |
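The scaled_float arithmetic can be sketched as follows. This is an illustration under the assumption that the stored value is the input multiplied by scaling_factor and rounded to the nearest integer; `to_scaled` and `from_scaled` are hypothetical helper names, not Elasticsearch APIs.

```python
def to_scaled(value: float, scaling_factor: int = 100) -> int:
    """Approximate the stored representation: value * factor, rounded to an integer."""
    return round(value * scaling_factor)

def from_scaled(stored: int, scaling_factor: int = 100) -> float:
    """Recover the value a reader would see for the stored integer."""
    return stored / scaling_factor

print(to_scaled(23900.00))            # 2390000
print(from_scaled(to_scaled(19.99)))  # 19.99
print(to_scaled(19.999))              # 2000: digits beyond the scaling factor are lost
```

With scaling_factor 100, anything finer than two decimal places is rounded away, which is usually what you want for prices.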
{
"properties": {
"price": {
"type": "scaled_float",
"scaling_factor": 100 // 23900.00 → 2390000 stored
},
"quantity": {
"type": "integer"
}
}
}
Date Type#
{
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
}
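}
Supported formats (these examples follow the default format, strict_date_optional_time||epoch_millis; with the custom format above, a date-time must use a space instead of T, as in 2024-01-15 10:30:00):
- 2024-01-15
- 2024-01-15T10:30:00
- 2024-01-15T10:30:00+09:00
- 1705300200000 (epoch millis)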
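How the `||`-separated alternatives of the custom format in the mapping above are tried one after another can be sketched in Python. This is only an approximation: `parse_date` is a hypothetical helper, the `strptime` patterns are stand-ins for `yyyy-MM-dd HH:mm:ss` and `yyyy-MM-dd`, and values without a time zone are assumed to be UTC.

```python
from datetime import datetime, timezone

# strptime equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd"
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]

def parse_date(value):
    """Try each alternative in order; integers are treated as epoch_millis."""
    if isinstance(value, int):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this alternative did not match, try the next one
    raise ValueError(f"no configured format matches {value!r}")

print(parse_date("2024-01-15 10:30:00"))
print(parse_date("2024-01-15"))
print(parse_date(1705300200000))

try:
    parse_date("2024-01-15T10:30:00")  # ISO form with T is not in the custom format
except ValueError as err:
    print("rejected:", err)
```

The rejected value shows that a custom format replaces the default one rather than extending it.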
Boolean Type#
{
"properties": {
"in_stock": {
"type": "boolean" // true, false, "true", "false" all accepted
}
}
}
Complex Types#
Object#
Nested JSON objects:
{
"properties": {
"seller": {
"properties": {
"name": { "type": "keyword" },
"rating": { "type": "float" }
}
}
}
}
// Document
{
"seller": {
"name": "Official Store",
"rating": 4.8
}
}
// Search
GET /products/_search
{
"query": {
"match": { "seller.name": "Official Store" }
}
}
Nested#
Problem with Object type:
// Document
{
"options": [
{ "color": "black", "size": "M" },
{ "color": "white", "size": "L" }
]
}
Object type flattens arrays:
options.color: ["black", "white"]
options.size: ["M", "L"]
→ Searching “black AND L” incorrectly matches, even though no single option is both black and L!
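The false match can be reproduced with a small simulation. This is a sketch in plain Python, not Elasticsearch code; `flatten`, `object_match`, and `nested_match` are hypothetical helpers that mimic how the two field types treat an array of objects.

```python
def flatten(objects):
    """Mimic the object type: collect each key's values into one list per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat

def object_match(objects, color, size):
    """Both values only need to appear somewhere in the flattened lists."""
    flat = flatten(objects)
    return color in flat.get("color", []) and size in flat.get("size", [])

def nested_match(objects, color, size):
    """Both values must appear in the same object, as with the nested type."""
    return any(o.get("color") == color and o.get("size") == size for o in objects)

options = [
    {"color": "black", "size": "M"},
    {"color": "white", "size": "L"},
]

print(flatten(options))                     # {'color': ['black', 'white'], 'size': ['M', 'L']}
print(object_match(options, "black", "L"))  # True  (false positive)
print(nested_match(options, "black", "L"))  # False
print(nested_match(options, "black", "M"))  # True
```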
Use Nested type:
{
"properties": {
"options": {
"type": "nested",
"properties": {
"color": { "type": "keyword" },
"size": { "type": "keyword" }
}
}
}
}
// Accurate nested query
GET /products/_search
{
"query": {
"nested": {
"path": "options",
"query": {
"bool": {
"must": [
{ "term": { "options.color": "black" } },
{ "term": { "options.size": "M" } }
]
}
}
}
}
}
Key Points
- text: For full-text search, use match query
- keyword: For exact values, sorting/aggregation, use term query
- Multi-field: Can index a single field as both text and keyword (name.keyword)
- Nested: Use when relationships between objects in an array need to be preserved (Object type flattens)
Analyzer#
Why do we need an Analyzer? If you search for “galaxy” in a document containing “I purchased a Samsung Galaxy,” will it return results? Without analysis, the field value is indexed as one whole string, so the query “galaxy” matches neither the full sentence nor the capitalized word “Galaxy” inside it, and the search fails. An Analyzer breaks text into normalized tokens (here: i, purchased, a, samsung, galaxy) so that the query and the document can match on the same token.
An Analyzer converts text into searchable tokens.
Analysis Process#
flowchart LR
A["Input Text<br>The Quick Brown Fox"]
--> B["Character Filter<br>(HTML removal, etc.)"]
--> C["Tokenizer<br>(word separation)"]
--> D["Token Filter<br>(lowercase, etc.)"]
--> E["Tokens<br>[the, quick, brown, fox]"]
Diagram: The process of converting input text into final tokens through Character Filter, Tokenizer, and Token Filter.
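The three stages in the diagram can be sketched as three small functions chained together. This is a toy model in plain Python: the regular expressions are simplified stand-ins for the html_strip character filter, the standard tokenizer, and the lowercase token filter, and all function names are illustrative.

```python
import re

def html_strip(text):
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: normalize every token to lowercase."""
    return [tok.lower() for tok in tokens]

def analyze(text):
    """Run the stages in the order shown in the diagram."""
    return lowercase(tokenize(html_strip(text)))

print(analyze("<b>The Quick</b> Brown Fox"))  # ['the', 'quick', 'brown', 'fox']
print("galaxy" in analyze("I purchased a Samsung Galaxy"))  # True
```

Because the same analysis is applied to the query and to the document, “galaxy” and “Galaxy” end up as the same token.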
Built-in Analyzers#
| Analyzer | Behavior | Example Result |
|---|---|---|
| standard | Word separation + lowercase | “Quick Brown” → [quick, brown] |
| simple | Extract letters only + lowercase | “Quick-Brown” → [quick, brown] |
| whitespace | Split by whitespace | “Quick Brown” → [Quick, Brown] |
| keyword | No analysis | “Quick Brown” → [Quick Brown] |
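To compare the four behaviors on a single input, the table can be approximated in a few lines of Python. These functions are rough imitations written for illustration only (for example, the real standard analyzer follows Unicode segmentation rules, not a simple regular expression).

```python
import re

def standard(text):
    """Split on non-word characters and lowercase (digits are kept)."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def simple(text):
    """Keep runs of letters only and lowercase (digits are dropped)."""
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]

def whitespace(text):
    """Split on whitespace only; case and punctuation are untouched."""
    return text.split()

def keyword(text):
    """No analysis: the whole input is one token."""
    return [text]

sample = "Quick-Brown Fox 2"
print(standard(sample))    # ['quick', 'brown', 'fox', '2']
print(simple(sample))      # ['quick', 'brown', 'fox']
print(whitespace(sample))  # ['Quick-Brown', 'Fox', '2']
print(keyword(sample))     # ['Quick-Brown Fox 2']
```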
Testing Analyzers#
GET /_analyze
{
"analyzer": "standard",
"text": "The Quick Brown Fox"
}
{
"tokens": [
{ "token": "the", "position": 0 },
{ "token": "quick", "position": 1 },
{ "token": "brown", "position": 2 },
{ "token": "fox", "position": 3 }
]
}
Korean Analyzer (Nori)#
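Korean text cannot be properly tokenized by whitespace alone, because particles and endings attach directly to words. Nori is provided by the analysis-nori plugin, which must be installed before the configuration below will work.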
// Standard Analyzer
"삼성전자가 스마트폰을 출시했다"
→ ["삼성전자가", "스마트폰을", "출시했다"]
// Nori Analyzer
"삼성전자가 스마트폰을 출시했다"
→ ["삼성", "전자", "스마트폰", "출시"]
Nori Configuration#
PUT /products
{
"settings": {
"analysis": {
"analyzer": {
"korean": {
"type": "custom",
"tokenizer": "nori_tokenizer",
"filter": ["nori_part_of_speech"]
}
},
"tokenizer": {
"nori_tokenizer": {
"type": "nori_tokenizer",
"decompound_mode": "mixed"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "korean"
}
}
}
}
decompound_mode Options#
| Mode | “삼성전자” Result |
|---|---|
| none | [삼성전자] |
| discard | [삼성, 전자] |
| mixed | [삼성전자, 삼성, 전자] |
Recommended: mixed. Both compound words and separated words are searchable.
Custom Analyzer#
PUT /products
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["html_strip"],
"tokenizer": "standard",
"filter": ["lowercase", "my_synonym"]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms": [
"노트북, 랩탑",
"핸드폰, 스마트폰, 휴대폰"
]
}
}
}
}
}
Key Points
- Analyzer = Character Filter + Tokenizer + Token Filter
- Nori Analyzer is recommended for Korean (decompound_mode: mixed)
- Use /_analyze API to test analysis results
- Synonym handling is configured with Custom Analyzer
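What the my_synonym filter in the Custom Analyzer example does to a token stream can be sketched as follows. This is a simplified model in Python: `build_synonym_map` and `expand` are hypothetical helpers, and the real filter also tracks token positions, which this sketch ignores.

```python
def build_synonym_map(rules):
    """Turn comma-separated rules into a lookup: term -> all terms in its group."""
    mapping = {}
    for rule in rules:
        group = [term.strip() for term in rule.split(",")]
        for term in group:
            mapping[term] = group
    return mapping

def expand(tokens, mapping):
    """Replace each token by its whole synonym group; other tokens pass through."""
    expanded = []
    for tok in tokens:
        expanded.extend(mapping.get(tok, [tok]))
    return expanded

rules = ["노트북, 랩탑", "핸드폰, 스마트폰, 휴대폰"]
mapping = build_synonym_map(rules)

print(expand(["랩탑", "추천"], mapping))  # ['노트북', '랩탑', '추천']
print(expand(["휴대폰"], mapping))        # ['핸드폰', '스마트폰', '휴대폰']
```

A document containing only 랩탑 therefore also becomes findable through 노트북.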
Dynamic Mapping#
Why does Dynamic Mapping exist? Manually defining every field’s type is tedious, especially during prototyping, when schemas change frequently. Dynamic Mapping lets Elasticsearch automatically infer types just by inserting documents, enabling rapid development. However, be cautious in production: an incorrectly inferred type can only be corrected by reindexing.
If Mapping is not defined, Elasticsearch automatically infers types.
Automatic Type Inference#
| JSON Value | Inferred Type |
|---|---|
| "hello" | text + keyword |
| 123 | long |
| 12.34 | float |
| true | boolean |
| "2024-01-15" | date |
| { "a": 1 } | object |
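The inference rules in the table can be mirrored by a small Python function. This is a sketch under simplifying assumptions: `infer_type` is a hypothetical helper, and date detection is reduced to a regular expression for ISO 8601 style strings, which is narrower than what Elasticsearch actually checks.

```python
import re

# Simplified stand-in for date detection: yyyy-MM-dd with an optional time part
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)

def infer_type(value):
    """Return the field type the table above lists for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text + keyword"
    raise TypeError(f"unsupported value: {value!r}")

for sample in ["hello", 123, 12.34, True, "2024-01-15", {"a": 1}, "01/15/2024"]:
    print(repr(sample), "->", infer_type(sample))
```

The last sample shows the risk: a date in an unexpected format is silently treated as a string.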
Controlling Dynamic Mapping#
PUT /products
{
"mappings": {
"dynamic": "strict", // false: ignore, strict: error
"properties": {
"name": { "type": "text" }
}
}
}
| Setting | Behavior |
|---|---|
| true | Auto-add new fields (default) |
| false | Store new fields but don’t index |
| strict | Error on new fields |
Production recommendation: strict or explicit Mapping definition
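The three settings can be compared with a small simulation of what happens when a document carries a field that is missing from the mapping. This is an illustrative model in Python; `index_document` is a hypothetical function and the placeholder type `"inferred"` stands in for real type inference.

```python
def index_document(mapping, doc, dynamic="true"):
    """Return the fields that become searchable; may extend `mapping` in place."""
    searchable = {}
    for field, value in doc.items():
        if field in mapping:
            searchable[field] = value
        elif dynamic == "true":
            mapping[field] = "inferred"  # new field is added to the mapping
            searchable[field] = value
        elif dynamic == "false":
            continue                     # kept in the stored document only
        elif dynamic == "strict":
            raise ValueError(f"field [{field}] is not defined in the mapping")
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return searchable

doc = {"name": "MacBook Pro", "color": "silver"}

print(index_document({"name": "text"}, doc, "true"))   # {'name': 'MacBook Pro', 'color': 'silver'}
print(index_document({"name": "text"}, doc, "false"))  # {'name': 'MacBook Pro'}
try:
    index_document({"name": "text"}, doc, "strict")
except ValueError as err:
    print("rejected:", err)
```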
Key Points
- Dynamic Mapping is convenient during development, but risks unexpected type inference in production
- Setting dynamic: strict throws an error when a document contains undefined fields
- dynamic: false stores new fields but doesn’t index them (not searchable)
Modeling Patterns#
Pattern 1: Denormalization#
Elasticsearch doesn’t support JOIN, so include related data in a single document.
// RDB Normalized (2 tables)
// products: id, name, category_id
// categories: id, name
// Elasticsearch Denormalized (1 document)
{
"name": "MacBook Pro",
"category": {
"id": 1,
"name": "Laptop"
}
}
Pros: Fast search, simple queries
Cons: All documents need updating when category changes
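The trade-off can be made concrete with a short sketch that builds the denormalized document from the two normalized tables and then lists what must be reindexed after a category change. The data and the helpers `denormalize` and `docs_to_reindex` are hypothetical and exist only for illustration.

```python
# Normalized source data, as it would live in two RDB tables (sample values)
categories = {1: {"id": 1, "name": "Laptop"}}
products = [
    {"id": 10, "name": "MacBook Pro", "category_id": 1},
    {"id": 11, "name": "Example Notebook", "category_id": 1},
]

def denormalize(product, categories):
    """Embed a copy of the category into the product document."""
    category = categories[product["category_id"]]
    return {
        "name": product["name"],
        "category": {"id": category["id"], "name": category["name"]},
    }

def docs_to_reindex(products, category_id):
    """Every product that embeds the category must be rewritten when it changes."""
    return [p["id"] for p in products if p["category_id"] == category_id]

print(denormalize(products[0], categories))
# {'name': 'MacBook Pro', 'category': {'id': 1, 'name': 'Laptop'}}
print(docs_to_reindex(products, 1))  # [10, 11]
```

Renaming one category touches every product document that embeds it, which is the cost listed under Cons.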
Pattern 2: Application-Side Join#
Manage frequently changing data in separate indices:
// 1. Search products
List<Product> products = productRepository.search(query);
// 2. Fetch inventory info (separate index)
List<String> productIds = products.stream().map(Product::getId).toList();
Map<String, Stock> stocks = stockRepository.findByIds(productIds);
// 3. Combine
products.forEach(p -> p.setStock(stocks.get(p.getId())));
Pattern 3: Nested vs Parent-Child#
| Property | Nested | Parent-Child (Join) |
|---|---|---|
| Performance | Fast | Slow |
| Update | Re-index entire document | Update child only |
| Query Complexity | Low | High |
| Recommended For | Rarely changing relations | Frequently changing 1:N |
Key Points
- Denormalization is the default strategy since Elasticsearch doesn’t support JOIN
- Consider Application-Side Join for frequently changing data
- Nested has good performance but requires full document re-indexing; Parent-Child allows individual updates
Best Practices#
1. Use text for search fields, keyword for filter/aggregation fields#
{
"name": {
"type": "text",
"fields": { "keyword": { "type": "keyword" } }
},
"status": { "type": "keyword" }
}
2. Use keyword for numeric IDs#
{
"user_id": { "type": "keyword" } // Not long!
}
If no range queries are needed, keyword is more efficient: exact-value (term) lookups on keyword fields are often faster than on numeric fields.
3. Exclude unnecessary fields from indexing#
{
"raw_data": {
"type": "object",
"enabled": false // Store only, not searchable
}
}
4. Use Index Templates#
PUT /_index_template/logs
{
"index_patterns": ["logs-*"],
"template": {
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text" }
}
}
}
}
Key Points
- Configure search fields as text + keyword Multi-field
- Numeric IDs are more efficient as keyword if no range queries
- Exclude fields from indexing with enabled: false if not searching
- Apply consistent Mapping with index templates
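Which new indices pick up the logs template shown earlier is decided by its index_patterns, where `*` acts as a wildcard. The sketch below uses Python's `fnmatchcase` as a stand-in for that matching, on the assumption that only the `*` wildcard matters here; `matching_templates` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Template name -> index_patterns, mirroring the logs template above
TEMPLATES = {"logs": ["logs-*"]}

def matching_templates(index_name, templates=TEMPLATES):
    """Return the names of templates whose patterns match the new index name."""
    return [
        name
        for name, patterns in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in patterns)
    ]

print(matching_templates("logs-2024.01.15"))  # ['logs']
print(matching_templates("products"))         # []
```

An index created as logs-2024.01.15 would get the @timestamp and message mappings without any explicit mapping in the create request.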
Next Steps#
| Goal | Recommended Document |
|---|---|
| Write search queries | Query DSL |
| Improve search quality | Search Relevance |
| Hands-on practice | Basic Examples |