TL;DR
  • Mapping: Schema defining document structure (similar to RDB table definitions)
  • text: For full-text search, tokenized by Analyzer
  • keyword: For exact value matching, sorting/aggregation
  • Analyzer: Converts text into searchable tokens (use Nori for Korean)
  • Denormalization: Include related data in one document, since there is no RDB-style JOIN

Target Audience: Developers looking to use Elasticsearch search features
Prerequisites: Core Components, basic JSON syntax

This document covers Mapping, Field Type, and Analyzer design for effectively storing and searching data in Elasticsearch.

What is Mapping?#

Mapping is a schema that defines how documents and fields are stored and indexed.

RDB vs Elasticsearch#

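| RDB | Elasticsearch |
| --- | --- |
| CREATE TABLE | PUT /index (mapping) |
| Column Type | Field Type |
| Schema Required | Dynamic Mapping possible |
| ALTER TABLE | Limited (new fields can be added; changing an existing field's type requires reindexing) |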

Mapping Example#

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "integer"
      },
      "created_at": {
        "type": "date"
      },
      "in_stock": {
        "type": "boolean"
      }
    }
  }
}
Key Points
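  • Mapping is normally defined when creating an index; new fields can be added later, but the type of an existing field cannot be changed in place
  • Dynamic Mapping allows automatic type inference, but explicit definition is recommended for production
  • Changing an existing field's type requires creating a new index with the new mapping and reindexing the data into it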

Field Types#

String Types#

text vs keyword#

| Property | text | keyword |
| --- | --- | --- |
| Purpose | Full-text search | Exact value matching |
| Analysis | Tokenized by Analyzer | No analysis |
| Search | match query | term query |
| Sort/Aggregation | Not possible (by default) | Possible |
| Examples | Product description, post content | Category, status, ID |

{
  "properties": {
    "title": {
      "type": "text"          // "MacBook Pro" → ["macbook", "pro"]
    },
    "category": {
      "type": "keyword"       // "Laptop" → "Laptop" (as-is)
    }
  }
}
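
To make the difference concrete, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch is implemented: `TinyIndex` and `analyze` are hypothetical names, and the regex-based `analyze` only roughly approximates the standard analyzer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class TinyIndex:
    """Toy inverted index holding one set of document IDs per term."""

    def __init__(self, analyzed):
        self.analyzed = analyzed  # True behaves like text, False like keyword
        self.postings = defaultdict(set)

    def index(self, doc_id, value):
        terms = analyze(value) if self.analyzed else [value]
        for term in terms:
            self.postings[term].add(doc_id)

    def term(self, value):
        """term query: look the value up exactly as given, without analysis."""
        return set(self.postings.get(value, set()))

    def match(self, query):
        """match query: analyze the query first, then OR the resulting terms."""
        terms = analyze(query) if self.analyzed else [query]
        hits = set()
        for term in terms:
            hits |= self.postings.get(term, set())
        return hits


title = TinyIndex(analyzed=True)      # like "type": "text"
category = TinyIndex(analyzed=False)  # like "type": "keyword"
title.index(1, "MacBook Pro")
category.index(1, "Laptop")

print(title.match("macbook"))    # found through the lowercased token
print(category.term("laptop"))   # empty: the stored term is "Laptop"
print(category.term("Laptop"))   # found: exact value
```

The same reasoning explains why a term query against a text field often finds nothing: the stored terms are the lowercased tokens, not the original string.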

Multi-field#

Index a single field in multiple ways:

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"   // Access via name.keyword
        }
      }
    }
  }
}
# Full-text search
GET /products/_search
{ "query": { "match": { "name": "MacBook" } } }

# Exact value aggregation
GET /products/_search
{
  "aggs": {
    "names": { "terms": { "field": "name.keyword" } }
  }
}
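
As an illustration of why aggregations target name.keyword rather than name, the following Python sketch indexes the same values both ways and counts buckets. The product names and the helper `index_multi_field` are made up for the example, and `analyze` is only an approximation of the standard analyzer.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def index_multi_field(name):
    """Return what each sub-field would hold for one value of `name`."""
    return {"name": analyze(name), "name.keyword": name}


docs = [index_multi_field(n) for n in ["MacBook Pro", "MacBook Pro", "MacBook Air"]]

# Terms aggregation on name.keyword: one bucket per whole value.
keyword_buckets = Counter(doc["name.keyword"] for doc in docs)

# The same count over analyzed tokens: buckets are word fragments,
# which is rarely what a "top product names" report wants.
token_buckets = Counter(token for doc in docs for token in doc["name"])

print(keyword_buckets.most_common())
print(token_buckets.most_common())
```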

Numeric Types#

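| Type | Range | Use Case |
| --- | --- | --- |
| byte | -128 ~ 127 | Small integers |
| short | -32,768 ~ 32,767 | Small integers |
| integer | -2³¹ ~ 2³¹-1 | General integers |
| long | -2⁶³ ~ 2⁶³-1 | Large integers, IDs |
| float | 32-bit floating point | Approximate values |
| double | 64-bit floating point | Approximate values needing higher precision |
| scaled_float | Floating point stored as a long, scaled by scaling_factor | Prices (scaling_factor: 100) |
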
{
  "properties": {
    "price": {
      "type": "scaled_float",
      "scaling_factor": 100    // 23900.00 → 2390000 stored
    },
    "quantity": {
      "type": "integer"
    }
  }
}
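
The arithmetic behind scaled_float can be sketched in a few lines of Python. This assumes the indexed value is the input multiplied by scaling_factor and rounded to the nearest long; the function names are hypothetical, and the original value in _source is not affected by this rounding.

```python
import math


def encode_scaled(value, scaling_factor):
    """Indexed representation: value * scaling_factor, rounded half up to an integer."""
    return math.floor(value * scaling_factor + 0.5)


def decode_scaled(stored, scaling_factor):
    """Value seen by queries and aggregations after scaling back."""
    return stored / scaling_factor


print(encode_scaled(23900.00, 100))  # the example from the mapping comment
print(encode_scaled(19.99, 100))
# Digits beyond the scaling factor are lost: 0.126 is indexed as 13 (i.e. 0.13).
print(decode_scaled(encode_scaled(0.126, 100), 100))
```

A scaling_factor of 100 therefore keeps two decimal places, which is why it suits prices.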

Date Type#

{
  "properties": {
    "created_at": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    }
  }
}

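The custom format above accepts only values such as 2024-01-15 10:30:00, 2024-01-15, or epoch milliseconds. If format is omitted, the default (strict_date_optional_time||epoch_millis) applies, which accepts: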

  • 2024-01-15
  • 2024-01-15T10:30:00
  • 2024-01-15T10:30:00+09:00
  • 1705300200000 (epoch millis)
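
The `||` separator in a format string means "try each alternative in turn". The Python sketch below mimics that for the custom format `yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis`. The strptime patterns are hand-translated equivalents and `parse_date` is a hypothetical helper. Values without a time zone are treated as UTC, as Elasticsearch does.

```python
from datetime import datetime, timezone

# strptime equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd"
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_date(value):
    """Try each alternative in order; raise ValueError if none applies."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    if text.isdigit():  # epoch_millis (non-negative values only in this sketch)
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
    raise ValueError(f"failed to parse date field [{text}]")


print(parse_date("2024-01-15 10:30:00"))
print(parse_date("2024-01-15"))
print(parse_date(1705300200000))

try:
    parse_date("2024-01-15T10:30:00")  # ISO "T" form is not in the custom format
except ValueError as error:
    print("rejected:", error)
```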

Boolean Type#

{
  "properties": {
    "in_stock": {
      "type": "boolean"   // true, false, "true", "false" all accepted
    }
  }
}

Complex Types#

Object#

Nested JSON objects:

{
  "properties": {
    "seller": {
      "properties": {
        "name": { "type": "keyword" },
        "rating": { "type": "float" }
      }
    }
  }
}
// Document
{
  "seller": {
    "name": "Official Store",
    "rating": 4.8
  }
}

// Search
GET /products/_search
{
  "query": {
    "match": { "seller.name": "Official Store" }
  }
}

Nested#

Problem with Object type:

// Document
{
  "options": [
    { "color": "black", "size": "M" },
    { "color": "white", "size": "L" }
  ]
}

Object type flattens arrays:

options.color: ["black", "white"]
options.size: ["M", "L"]

→ A search for color = black AND size = L incorrectly matches, even though no single option is both black and L.

Use Nested type:

{
  "properties": {
    "options": {
      "type": "nested",
      "properties": {
        "color": { "type": "keyword" },
        "size": { "type": "keyword" }
      }
    }
  }
}
// Accurate nested query
GET /products/_search
{
  "query": {
    "nested": {
      "path": "options",
      "query": {
        "bool": {
          "must": [
            { "term": { "options.color": "black" } },
            { "term": { "options.size": "M" } }
          ]
        }
      }
    }
  }
}
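
The flattening problem can be reproduced without a cluster. The Python sketch below is a simplified model, not Elasticsearch internals: `object_match` checks each field against the flattened value lists, while `nested_match` checks each array element on its own. All function names are hypothetical.

```python
def flatten(objects):
    """Model of the Object type: collect each field's values across all array elements."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def object_match(objects, **criteria):
    """Each criterion is checked independently against the flattened lists."""
    flat = flatten(objects)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def nested_match(objects, **criteria):
    """Model of the Nested type: all criteria must hold within the same element."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


options = [
    {"color": "black", "size": "M"},
    {"color": "white", "size": "L"},
]

print(flatten(options))
print(object_match(options, color="black", size="L"))  # false positive
print(nested_match(options, color="black", size="L"))  # correctly rejected
print(nested_match(options, color="black", size="M"))  # real combination
```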
Key Points
  • text: For full-text search, use match query
  • keyword: For exact values, sorting/aggregation, use term query
  • Multi-field: Can index a single field as both text and keyword (name.keyword)
  • Nested: Use when relationships between objects in an array need to be preserved (Object type flattens)

Analyzer#

An Analyzer converts text into searchable tokens.

Analysis Process#

flowchart LR
    A["Input Text<br>The Quick Brown Fox"]
    --> B["Character Filter<br>(HTML removal, etc.)"]
    --> C["Tokenizer<br>(word separation)"]
    --> D["Token Filter<br>(lowercase, etc.)"]
    --> E["Tokens<br>[the, quick, brown, fox]"]

Diagram: The process of converting input text into final tokens through Character Filter, Tokenizer, and Token Filter.
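
The three stages can be sketched as plain Python functions chained together. This is only an approximation under simple assumptions: a regex stands in for the html_strip character filter and for the standard tokenizer, and all function names are hypothetical.

```python
import html
import re


def char_filter(text):
    """Character filter: replace HTML tags with spaces and decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def tokenizer(text):
    """Tokenizer: split the filtered text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the stages in the same order as the diagram."""
    return token_filter(tokenizer(char_filter(text)))


print(analyze("<b>The Quick</b> Brown Fox"))
```

The order matters: character filters see the raw string, the tokenizer sees the filtered string, and token filters only ever see individual tokens.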

Built-in Analyzers#

| Analyzer | Behavior | Example Result |
| --- | --- | --- |
| standard | Word separation + lowercase | “Quick Brown” → [quick, brown] |
| simple | Extract letters only + lowercase | “Quick-Brown” → [quick, brown] |
| whitespace | Split by whitespace | “Quick Brown” → [Quick, Brown] |
| keyword | No analysis | “Quick Brown” → [Quick Brown] |
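
For a quick feel of how these four differ, here are rough Python approximations. They are regex-based stand-ins chosen for illustration, not the actual Lucene implementations, and the `ANALYZERS` dictionary is a hypothetical name.

```python
import re

# Approximations only: real analyzers follow Unicode segmentation rules.
ANALYZERS = {
    "standard": lambda text: [t.lower() for t in re.findall(r"\w+", text)],
    "simple": lambda text: [t.lower() for t in re.findall(r"[^\W\d_]+", text)],
    "whitespace": lambda text: text.split(),
    "keyword": lambda text: [text],
}

for name, analyzer in ANALYZERS.items():
    print(name, analyzer("Quick-Brown Route66"))
```

The input "Route66" shows the difference between standard and simple: the former keeps the digits in the token, the latter drops them because it keeps letters only.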

Testing Analyzers#

GET /_analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Fox"
}

Response (abbreviated; each token in the actual response also carries start_offset, end_offset, and type):

{
  "tokens": [
    { "token": "the", "position": 0 },
    { "token": "quick", "position": 1 },
    { "token": "brown", "position": 2 },
    { "token": "fox", "position": 3 }
  ]
}

Korean Analyzer (Nori)#

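Korean text cannot be properly tokenized by whitespace alone, because particles and endings attach directly to word stems. The Nori analyzer performs morphological analysis instead; it is provided by the analysis-nori plugin, which must be installed before use.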

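// Input sentence: "Samsung Electronics released a smartphone"

// Standard Analyzer: particles and endings stay attached
"삼성전자가 스마트폰을 출시했다"
→ ["삼성전자가", "스마트폰을", "출시했다"]

// Nori Analyzer: stems are extracted, particles and endings removed
"삼성전자가 스마트폰을 출시했다"
→ ["삼성", "전자", "스마트폰", "출시"]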

Nori Configuration#

PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "korean": {
          "type": "custom",
          "tokenizer": "nori_tokenizer",
          "filter": ["nori_part_of_speech"]
        }
      },
      "tokenizer": {
        "nori_tokenizer": {
          "type": "nori_tokenizer",
          "decompound_mode": "mixed"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "korean"
      }
    }
  }
}

decompound_mode Options#

| Mode | “삼성전자” Result |
| --- | --- |
| none | [삼성전자] |
| discard | [삼성, 전자] |
| mixed | [삼성전자, 삼성, 전자] |

Recommended: mixed - Both compound words and separated words are searchable
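
The three modes can be modelled with a small Python function. The `COMPOUNDS` dictionary is a hypothetical stand-in for Nori's built-in dictionary, so this only illustrates which tokens each mode emits, not how decomposition is actually computed.

```python
# Hypothetical decomposition table; the real tokenizer uses its own dictionary.
COMPOUNDS = {"삼성전자": ["삼성", "전자"]}


def decompound(token, mode):
    """Return the tokens emitted for `token` under the given decompound_mode."""
    parts = COMPOUNDS.get(token)
    if parts is None or mode == "none":
        return [token]              # keep the compound as one token
    if mode == "discard":
        return list(parts)          # parts only, compound dropped
    if mode == "mixed":
        return [token, *parts]      # compound and its parts
    raise ValueError(f"unknown decompound_mode: {mode}")


for mode in ("none", "discard", "mixed"):
    print(mode, decompound("삼성전자", mode))
```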

Custom Analyzer#

PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "노트북, 랩탑",
            "핸드폰, 스마트폰, 휴대폰"
          ]
        }
      }
    }
  }
}
Key Points
  • Analyzer = Character Filter + Tokenizer + Token Filter
  • Nori Analyzer is recommended for Korean (decompound_mode: mixed)
  • Use /_analyze API to test analysis results
  • Synonyms are handled by adding a synonym token filter to a Custom Analyzer
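
To show what a comma-separated synonym rule does to a token stream, here is a small Python sketch. It assumes the default behaviour in which every term in a group expands to all terms of that group; `build_synonym_map` and `apply_synonyms` are hypothetical helpers, and the token "case" is made-up sample input.

```python
SYNONYM_RULES = [
    "노트북, 랩탑",
    "핸드폰, 스마트폰, 휴대폰",
]


def build_synonym_map(rules):
    """Map every term to the full list of terms in its synonym group."""
    mapping = {}
    for rule in rules:
        group = [term.strip() for term in rule.split(",")]
        for term in group:
            mapping[term] = group
    return mapping


def apply_synonyms(tokens, mapping):
    """Return one list of terms per token position."""
    return [mapping.get(token, [token]) for token in tokens]


synonyms = build_synonym_map(SYNONYM_RULES)
print(apply_synonyms(["랩탑", "case"], synonyms))
```

Because all terms of a group end up at the same position, a search for any one of them can find documents containing any other.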

Dynamic Mapping#

If Mapping is not defined, Elasticsearch automatically infers types.

Automatic Type Inference#

| JSON Value | Inferred Type |
| --- | --- |
| "hello" | text + keyword |
| 123 | long |
| 12.34 | float |
| true | boolean |
| "2024-01-15" | date |
| { "a": 1 } | object |
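
The inference rules in the table can be approximated in Python as follows. This is a simplified model: `infer_type` is a hypothetical function, and date detection here only tries two patterns, whereas Elasticsearch's date detection is configurable and stricter.

```python
from datetime import datetime


def looks_like_date(text):
    """Very rough date detection using two ISO-style patterns."""
    for pattern in ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"):
        try:
            datetime.strptime(text, pattern)
            return True
        except ValueError:
            pass
    return False


def infer_type(value):
    """Return the field type dynamic mapping would pick for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if looks_like_date(value) else "text + keyword"
    raise TypeError(f"unsupported value: {value!r}")


for sample in ["hello", 123, 12.34, True, "2024-01-15", {"a": 1}]:
    print(repr(sample), "->", infer_type(sample))
```

The sketch also hints at the production risk: a string field whose first value happens to look like a date is mapped as date, and later non-date strings are then rejected.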

Controlling Dynamic Mapping#

PUT /products
{
  "mappings": {
    "dynamic": "strict",    // false: ignore, strict: error
    "properties": {
      "name": { "type": "text" }
    }
  }
}
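| Setting | Behavior |
| --- | --- |
| true | Auto-add new fields (default) |
| false | Keep new fields in _source, but don’t index them or add them to the mapping |
| strict | Reject documents that contain new fields |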

Production recommendation: strict or explicit Mapping definition

Key Points
  • Dynamic Mapping is convenient during development, but risks unexpected type inference in production
  • With dynamic: strict, indexing a document that contains an unmapped field fails with an error
  • dynamic: false stores new fields but doesn’t index them (not searchable)
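
The three settings can be compared with a small Python model. It is a sketch under simplifying assumptions: the mapping is a flat dict of field names, `index_document` and `StrictDynamicMappingError` are hypothetical names, and type inference is replaced by a placeholder.

```python
class StrictDynamicMappingError(Exception):
    """Raised when dynamic is "strict" and a document has an unmapped field."""


def index_document(mapping, doc, dynamic="true"):
    """Return (indexed_fields, source); may extend `mapping` when dynamic is "true"."""
    indexed = {}
    for field, value in doc.items():
        if field not in mapping:
            if dynamic == "strict":
                raise StrictDynamicMappingError(f"unmapped field [{field}]")
            if dynamic == "false":
                continue  # kept in source below, but not indexed
            mapping[field] = "inferred"  # placeholder for the inferred type
        indexed[field] = value
    return indexed, dict(doc)


doc = {"name": "MacBook Pro", "color": "silver"}

mapping = {"name": "text"}
print(index_document(mapping, doc, dynamic="true"), mapping)

mapping = {"name": "text"}
print(index_document(mapping, doc, dynamic="false"), mapping)

try:
    index_document({"name": "text"}, doc, dynamic="strict")
except StrictDynamicMappingError as error:
    print("rejected:", error)
```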

Modeling Patterns#

Pattern 1: Denormalization#

Elasticsearch has no RDB-style JOIN across indices at search time, so include related data in a single document.

// RDB Normalized (2 tables)
// products: id, name, category_id
// categories: id, name

// Elasticsearch Denormalized (1 document)
{
  "name": "MacBook Pro",
  "category": {
    "id": 1,
    "name": "Laptop"
  }
}

Pros: Fast search, simple queries
Cons: All documents need updating when category changes
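
The update cost is easy to see in a small Python sketch. The documents and the `rename_category` helper are made up for illustration; in a real cluster the same effect would need a bulk update of every affected document.

```python
def rename_category(docs, category_id, new_name):
    """Rewrite the embedded category name in every affected document; return the count."""
    updated = 0
    for doc in docs:
        if doc["category"]["id"] == category_id:
            doc["category"]["name"] = new_name
            updated += 1
    return updated


products = [
    {"name": "MacBook Pro", "category": {"id": 1, "name": "Laptop"}},
    {"name": "MacBook Air", "category": {"id": 1, "name": "Laptop"}},
    {"name": "Magic Mouse", "category": {"id": 2, "name": "Accessory"}},
]

# One logical change touches every document that embeds the category.
print(rename_category(products, 1, "Notebook"))
```

In a normalized RDB the same rename is a single-row update, which is the trade-off to weigh before denormalizing data that changes often.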

Pattern 2: Application-Side Join#

Manage frequently changing data in separate indices:

// 1. Search products
List<Product> products = productRepository.search(query);

// 2. Fetch inventory info (separate index)
List<String> productIds = products.stream().map(Product::getId).toList();
Map<String, Stock> stocks = stockRepository.findByIds(productIds);

// 3. Combine
products.forEach(p -> p.setStock(stocks.get(p.getId())));

Pattern 3: Nested vs Parent-Child#

| Property | Nested | Parent-Child (Join) |
| --- | --- | --- |
| Performance | Fast | Slow |
| Update | Re-index entire document | Update child only |
| Query Complexity | Low | High |
| Recommended For | Rarely changing relations | Frequently changing 1:N |

Key Points
  • Denormalization is the default strategy since Elasticsearch has no RDB-style JOIN
  • Consider Application-Side Join for frequently changing data
  • Nested has good performance but requires full document re-indexing; Parent-Child allows individual updates

Best Practices#

1. Use text for search fields, keyword for filter/aggregation fields#

{
  "name": {
    "type": "text",
    "fields": { "keyword": { "type": "keyword" } }
  },
  "status": { "type": "keyword" }
}

2. Use keyword for numeric IDs#

{
  "user_id": { "type": "keyword" }  // Not long!
}

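Numeric types are optimized for range queries, while keyword is optimized for term lookups. If an ID is only ever matched exactly and never used in range queries, keyword is usually the more efficient choice.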

3. Exclude unnecessary fields from indexing#

{
  "raw_data": {
    "type": "object",
    "enabled": false    // Store only, not searchable
  }
}

4. Use Index Templates#

PUT /_index_template/logs
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}
Key Points
  • Configure search fields as text + keyword Multi-field
  • Numeric IDs are more efficient as keyword if no range queries
  • Exclude fields from indexing with enabled: false if not searching
  • Apply consistent Mapping with index templates
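
How a template is selected for a new index can be sketched with wildcard matching. This Python model assumes only `*` wildcards are used and ignores template priority; `TEMPLATES` and `matching_templates` are hypothetical names, and Python's `fnmatchcase` supports a few extra wildcard forms that index patterns do not.

```python
from fnmatch import fnmatchcase

# In-memory stand-in for the template registered above.
TEMPLATES = {
    "logs": {
        "index_patterns": ["logs-*"],
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
            }
        },
    }
}


def matching_templates(index_name):
    """Return the names of templates whose patterns match the index name."""
    return [
        name
        for name, template in TEMPLATES.items()
        if any(fnmatchcase(index_name, pattern) for pattern in template["index_patterns"])
    ]


print(matching_templates("logs-2024.01.15"))  # picks up the logs template
print(matching_templates("metrics-2024"))     # no template applies
```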

Next Steps#

| Goal | Recommended Document |
| --- | --- |
| Write search queries | Query DSL |
| Improve search quality | Search Relevance |
| Hands-on practice | Basic Examples |