TL;DR
  • Mapping: Schema defining document structure (similar to RDB table definitions)
  • text: For full-text search, tokenized by Analyzer
  • keyword: For exact value matching, sorting/aggregation
  • Analyzer: Converts text into searchable tokens (use Nori for Korean)
  • Denormalization: Include related data in one document since there’s no JOIN

Target Audience: Developers looking to use Elasticsearch search features
Prerequisites: Core Components, basic JSON syntax

This document covers Mapping, Field Type, and Analyzer design for effectively storing and searching data in Elasticsearch.

What is Mapping?

Why define a Mapping upfront? What happens if you index documents without a Mapping? Elasticsearch might infer “2024-01-15 10:30:00” as a string instead of a date, because default date detection does not recognize that format, or map a numeric ID as long even though it is only ever used for exact lookups. Changing the type of an existing field later requires reindexing all data. Mapping is the schema definition that prevents these problems from the start.

Mapping is a schema that defines how documents and fields are stored and indexed.

RDB vs Elasticsearch

| RDB | Elasticsearch |
| --- | --- |
| CREATE TABLE | PUT /index (mapping) |
| Column Type | Field Type |
| Schema Required | Dynamic Mapping possible |
| ALTER TABLE | Limited (changing a field type requires reindexing) |

Mapping Example

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "integer"
      },
      "created_at": {
        "type": "date"
      },
      "in_stock": {
        "type": "boolean"
      }
    }
  }
}
Key Points
  • Mapping is defined when creating an index, and field type changes are limited afterward
  • Dynamic Mapping allows automatic type inference, but explicit definition is recommended for production
  • Changing the type of an existing field requires reindexing; adding a new field does not

Field Types

String Types

text vs keyword

| Property | text | keyword |
| --- | --- | --- |
| Purpose | Full-text search | Exact value matching |
| Analysis | Tokenized by Analyzer | No analysis |
| Search | match query | term query |
| Sort/Aggregation | Not possible (by default) | Possible |
| Examples | Product description, post content | Category, status, ID |

{
  "properties": {
    "title": {
      "type": "text"          // "MacBook Pro" → ["macbook", "pro"]
    },
    "category": {
      "type": "keyword"       // "Laptop" → "Laptop" (as-is)
    }
  }
}
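
To make the difference concrete, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch is implemented: `analyze_text`, `index_text`, and `index_keyword` are hypothetical helpers, and the regex tokenizer only approximates the standard analyzer.

```python
import re
from collections import defaultdict

def analyze_text(value):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", value.lower())

def index_text(docs):
    """text field: every token points back to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze_text(value):
            index[token].add(doc_id)
    return index

def index_keyword(docs):
    """keyword field: the whole value is kept as a single, unchanged term."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[value].add(doc_id)
    return index

docs = {1: "MacBook Pro", 2: "MacBook Air"}
text_index = index_text(docs)
keyword_index = index_keyword(docs)

# A match-style lookup analyzes the query as well, so case does not matter.
match_hits = set.union(*(text_index.get(t, set()) for t in analyze_text("macbook")))
# A term-style lookup compares against the exact stored value.
term_hits_exact = keyword_index.get("MacBook Pro", set())
term_hits_lower = keyword_index.get("macbook pro", set())
```

In this sketch the analyzed lookup finds both documents, while the exact lookup only succeeds when the query equals the stored value character for character.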

Multi-field

Index a single field in multiple ways:

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"   // Access via name.keyword
        }
      }
    }
  }
}
# Full-text search
GET /products/_search
{ "query": { "match": { "name": "MacBook" } } }

# Exact value aggregation
GET /products/_search
{
  "aggs": {
    "names": { "terms": { "field": "name.keyword" } }
  }
}

Numeric Types

| Type | Range | Use Case |
| --- | --- | --- |
| byte | -128 ~ 127 | Small integers |
| short | -32,768 ~ 32,767 | Small integers |
| integer | -2³¹ ~ 2³¹-1 | General integers |
| long | -2⁶³ ~ 2⁶³-1 | Large integers, IDs |
| float | 32-bit floating point | Approximate values |
| double | 64-bit floating point | Approximate values needing higher precision |
| scaled_float | Scaled value | Prices (scaling_factor: 100) |

{
  "properties": {
    "price": {
      "type": "scaled_float",
      "scaling_factor": 100    // 23900.00 → 2390000 stored
    },
    "quantity": {
      "type": "integer"
    }
  }
}
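
The arithmetic behind scaled_float can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual implementation: `encode_scaled_float` and `decode_scaled_float` are hypothetical names, and rounding half up is assumed for the sketch.

```python
import math

def encode_scaled_float(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer (half rounds up)."""
    return math.floor(value * scaling_factor + 0.5)

def decode_scaled_float(stored, scaling_factor):
    """Recover the (possibly rounded) value from the stored integer."""
    return stored / scaling_factor

stored = encode_scaled_float(23900.00, 100)
restored = decode_scaled_float(stored, 100)

# Digits beyond the precision of the scaling factor are lost.
lossy = decode_scaled_float(encode_scaled_float(19.999, 100), 100)
```

With a scaling_factor of 100, two decimal places survive; a third decimal place is rounded away, which is usually acceptable for prices.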

Date Type

{
  "properties": {
    "created_at": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    }
  }
}

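Values accepted by the format above:

  • 2024-01-15 10:30:00
  • 2024-01-15
  • 1705300200000 (epoch millis)

If format is omitted, the default (strict_date_optional_time||epoch_millis) applies and accepts:

  • 2024-01-15
  • 2024-01-15T10:30:00
  • 2024-01-15T10:30:00+09:00
  • 1705300200000 (epoch millis)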
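
With a custom format such as the one in the mapping above, a value is accepted only if it matches one of the `||`-separated patterns. The following Python sketch approximates that fallback logic. `PATTERNS` and `parse_date` are hypothetical names, the strptime patterns are only rough equivalents of the Java-style patterns, and values without a time zone are treated as UTC.

```python
from datetime import datetime, timezone

# Rough Python equivalents of "yyyy-MM-dd HH:mm:ss" and "yyyy-MM-dd".
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]

def parse_date(value):
    """Try each pattern in order, then epoch_millis; raise if nothing matches."""
    text = str(value)
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    if text.lstrip("-").isdigit():
        # epoch_millis: milliseconds since 1970-01-01T00:00:00Z
        return datetime.fromtimestamp(int(text) / 1000, tz=timezone.utc)
    raise ValueError(f"no configured format matches {value!r}")
```

In this sketch a value such as "2024-01-15T10:30:00" raises an error, because none of the three configured patterns contains the `T` separator.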

Boolean Type

{
  "properties": {
    "in_stock": {
      "type": "boolean"   // true, false, "true", "false" all accepted
    }
  }
}

Complex Types

Object

Nested JSON objects:

{
  "properties": {
    "seller": {
      "properties": {
        "name": { "type": "keyword" },
        "rating": { "type": "float" }
      }
    }
  }
}
// Document
{
  "seller": {
    "name": "Official Store",
    "rating": 4.8
  }
}

// Search
GET /products/_search
{
  "query": {
    "match": { "seller.name": "Official Store" }
  }
}

Nested

Problem with Object type:

// Document
{
  "options": [
    { "color": "black", "size": "M" },
    { "color": "white", "size": "L" }
  ]
}

Object type flattens arrays:

options.color: ["black", "white"]
options.size: ["M", "L"]

→ Searching “black AND L” incorrectly matches!
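
The false match can be reproduced with a small Python simulation. This is a sketch of the semantics only: `flatten`, `matches_flattened`, and `matches_nested` are hypothetical helpers, with the last one showing the per-element matching that the nested type provides.

```python
def flatten(objects):
    """Mimic how an object-typed array is indexed: one merged value list per field."""
    flat = {}
    for obj in objects:
        for field, value in obj.items():
            flat.setdefault(field, []).append(value)
    return flat

def matches_flattened(objects, color, size):
    """Object type: each condition is checked against the merged value lists."""
    flat = flatten(objects)
    return color in flat.get("color", []) and size in flat.get("size", [])

def matches_nested(objects, color, size):
    """Nested type: both conditions must hold within the same array element."""
    return any(o.get("color") == color and o.get("size") == size for o in objects)

options = [
    {"color": "black", "size": "M"},
    {"color": "white", "size": "L"},
]
```

Here `matches_flattened(options, "black", "L")` is true even though no single option is black and size L, while `matches_nested` rejects that combination and still accepts black and M.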

Use Nested type:

{
  "properties": {
    "options": {
      "type": "nested",
      "properties": {
        "color": { "type": "keyword" },
        "size": { "type": "keyword" }
      }
    }
  }
}
// Accurate nested query
GET /products/_search
{
  "query": {
    "nested": {
      "path": "options",
      "query": {
        "bool": {
          "must": [
            { "term": { "options.color": "black" } },
            { "term": { "options.size": "M" } }
          ]
        }
      }
    }
  }
}
Key Points
  • text: For full-text search, use match query
  • keyword: For exact values, sorting/aggregation, use term query
  • Multi-field: Can index a single field as both text and keyword (name.keyword)
  • Nested: Use when relationships between objects in an array need to be preserved (Object type flattens)

Analyzer

Why do we need an Analyzer? If you search for “galaxy” in a document containing “I purchased a Samsung Galaxy,” will it return results? Without analysis, the whole sentence is indexed as a single term, so the query “galaxy” matches neither the full string nor the capitalized word “Galaxy” inside it, and the search fails. An Analyzer splits text into tokens and normalizes them (for example, by lowercasing) so that such queries match.

An Analyzer converts text into searchable tokens.

Analysis Process

flowchart LR
    A["Input Text<br>The Quick Brown Fox"]
    --> B["Character Filter<br>(HTML removal, etc.)"]
    --> C["Tokenizer<br>(word separation)"]
    --> D["Token Filter<br>(lowercase, etc.)"]
    --> E["Tokens<br>&#91;the, quick, brown, fox&#93;"]

Diagram: The process of converting input text into final tokens through Character Filter, Tokenizer, and Token Filter.
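
The three stages in the diagram can be sketched as plain Python functions. This is a simplified illustration, not Elasticsearch's implementation: the function names are hypothetical, and the regular expressions only approximate the html_strip character filter and the standard tokenizer.

```python
import re

def char_filter(text):
    """Character filter: strip HTML tags before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenizer(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)

def token_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

def analyze(text):
    """Run the three stages in the order shown in the diagram."""
    return token_filter(tokenizer(char_filter(text)))

tokens = analyze("<p>The Quick Brown Fox</p>")
```

Each stage receives the previous stage's output, which is why the order matters: lowercasing before tag removal would still work here, but tokenizing before tag removal would emit `p` as a token.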

Built-in Analyzers

| Analyzer | Behavior | Example Result |
| --- | --- | --- |
| standard | Word separation + lowercase | “Quick Brown” → [quick, brown] |
| simple | Split on non-letter characters + lowercase | “Quick-Brown” → [quick, brown] |
| whitespace | Split by whitespace | “Quick Brown” → [Quick, Brown] |
| keyword | No analysis | “Quick Brown” → [Quick Brown] |
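
As a rough mental model, the four analyzers behave like the Python functions below. These are approximations written for this comparison, not the real implementations; in particular, the actual standard analyzer uses Unicode text segmentation rather than a `\w+` regex.

```python
import re

def standard(text):
    """Approximation: runs of word characters, lowercased."""
    return re.findall(r"\w+", text.lower())

def simple(text):
    """Letters only, lowercased; digits and punctuation act as separators."""
    return re.findall(r"[^\W\d_]+", text.lower())

def whitespace(text):
    """Split on whitespace and keep the original case."""
    return text.split()

def keyword(text):
    """Emit the whole input as a single token."""
    return [text]
```

The difference between standard and simple shows up with digits: for the hypothetical input "Wi-Fi 6", this sketch of standard keeps the token `6`, while simple drops it.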

Testing Analyzers

GET /_analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Fox"
}
{
  "tokens": [
    { "token": "the", "position": 0 },
    { "token": "quick", "position": 1 },
    { "token": "brown", "position": 2 },
    { "token": "fox", "position": 3 }
  ]
}

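Korean Analyzer (Nori)

Korean text cannot be properly tokenized by whitespace alone, because particles and endings attach directly to words. Nori is provided as the analysis-nori plugin, which must be installed on every node.

// Standard Analyzer
"삼성전자가 스마트폰을 출시했다"
→ ["삼성전자가", "스마트폰을", "출시했다"]

// Nori Analyzer
"삼성전자가 스마트폰을 출시했다"
→ ["삼성", "전자", "스마트폰", "출시"]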

Nori Configuration

PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "korean": {
          "type": "custom",
          "tokenizer": "nori_tokenizer",
          "filter": ["nori_part_of_speech"]
        }
      },
      "tokenizer": {
        "nori_tokenizer": {
          "type": "nori_tokenizer",
          "decompound_mode": "mixed"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "korean"
      }
    }
  }
}

decompound_mode Options

| Mode | “삼성전자” Result |
| --- | --- |
| none | [삼성전자] |
| discard | [삼성, 전자] |
| mixed | [삼성전자, 삼성, 전자] |

Recommended: mixed - Both compound words and separated words are searchable
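
The three modes can be restated as a small Python function. This is only a sketch: `decompound` is a hypothetical helper, and the split of the compound into its parts is supplied by hand here, whereas Nori derives it from its dictionary.

```python
def decompound(compound, parts, mode):
    """Return the tokens a compound noun yields under each decompound_mode."""
    if mode == "none":
        return [compound]            # keep the compound only
    if mode == "discard":
        return list(parts)           # keep the parts only
    if mode == "mixed":
        return [compound, *parts]    # keep both the compound and its parts
    raise ValueError(f"unknown decompound_mode: {mode}")

tokens = decompound("삼성전자", ["삼성", "전자"], "mixed")
```

With mixed, a query for either "삼성전자" or "삼성" finds a matching token, at the cost of indexing more terms per document.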

Custom Analyzer

PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonym"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "노트북, 랩탑",
            "핸드폰, 스마트폰, 휴대폰"
          ]
        }
      }
    }
  }
}
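
What the synonym filter does to a token stream can be sketched as follows. This is a simplified model: `build_synonym_map` and `expand` are hypothetical helpers, token positions are ignored, and the sample token "가방" is made up for the example.

```python
def build_synonym_map(rules):
    """Parse 'a, b, c' rules into a map from each term to its full synonym group."""
    synonym_map = {}
    for rule in rules:
        group = [term.strip() for term in rule.split(",")]
        for term in group:
            synonym_map[term] = group
    return synonym_map

def expand(tokens, synonym_map):
    """Replace each token with its synonym group; other tokens pass through unchanged."""
    expanded = []
    for token in tokens:
        expanded.extend(synonym_map.get(token, [token]))
    return expanded

rules = ["노트북, 랩탑", "핸드폰, 스마트폰, 휴대폰"]
synonym_map = build_synonym_map(rules)
expanded = expand(["랩탑", "가방"], synonym_map)
```

Because every term in a comma-separated rule maps to the whole group, a document that mentions "랩탑" also becomes findable through "노트북".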
Key Points
  • Analyzer = Character Filter + Tokenizer + Token Filter
  • Nori Analyzer is recommended for Korean (decompound_mode: mixed)
  • Use /_analyze API to test analysis results
  • Synonym handling is configured with Custom Analyzer

Dynamic Mapping

Why does Dynamic Mapping exist? Manually defining every field’s type is tedious, especially during prototyping, when schemas change frequently. Dynamic Mapping lets Elasticsearch infer types automatically as documents are inserted, which enables rapid development. Be cautious in production, however: an incorrectly inferred type can break queries and can only be fixed by reindexing.

If Mapping is not defined, Elasticsearch automatically infers types.

Automatic Type Inference

| JSON Value | Inferred Type |
| --- | --- |
| "hello" | text + keyword |
| 123 | long |
| 12.34 | float |
| true | boolean |
| "2024-01-15" | date |
| { "a": 1 } | object |
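
The inference rules in the table can be written as a short Python function. This is a sketch of the table only: `infer_type` is a hypothetical name, and `DATE_PATTERN` is a deliberately simplified stand-in for date detection that recognizes just the yyyy-MM-dd form.

```python
import json
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplified date detection

def infer_type(value):
    """Map a parsed JSON value to the field type listed in the table above."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text + keyword"
    raise TypeError(f"unsupported JSON value: {value!r}")

doc = json.loads(
    '{"a": "hello", "b": 123, "c": 12.34, "d": true, "e": "2024-01-15", "f": {"a": 1}}'
)
inferred = {field: infer_type(value) for field, value in doc.items()}
```

Note that a date-like string in an unrecognized layout, such as "2024-01-15 10:30:00", falls through to text + keyword in this sketch, which is the kind of surprise an explicit mapping avoids.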

Controlling Dynamic Mapping

PUT /products
{
  "mappings": {
    "dynamic": "strict",    // false: ignore, strict: error
    "properties": {
      "name": { "type": "text" }
    }
  }
}

| Setting | Behavior |
| --- | --- |
| true | Auto-add new fields (default) |
| false | Keep new fields in _source but don’t index them |
| strict | Error on new fields |

Production recommendation: strict or explicit Mapping definition
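
The three settings can be compared with a small Python simulation. This models the behavior in the table only: `index_document` is a hypothetical helper, and the `color` field is sample data introduced for the example.

```python
def index_document(mapping, doc, dynamic):
    """Return (mapped field names, indexed field names) after one document arrives."""
    fields = set(mapping)
    indexed = set()
    for field in doc:
        if field in fields:
            indexed.add(field)
        elif dynamic == "true":
            fields.add(field)      # new field is added to the mapping and indexed
            indexed.add(field)
        elif dynamic == "false":
            continue               # kept in _source only, not indexed or searchable
        elif dynamic == "strict":
            raise ValueError(f"strict mapping: unknown field {field!r}")
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return fields, indexed

mapping = {"name": {"type": "text"}}
doc = {"name": "MacBook Pro", "color": "silver"}
fields_true, indexed_true = index_document(mapping, doc, "true")
fields_false, indexed_false = index_document(mapping, doc, "false")
```

With strict, the same document raises an error instead of being accepted, which surfaces schema drift at write time rather than at query time.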

Key Points
  • Dynamic Mapping is convenient during development, but risks unexpected type inference in production
  • Setting dynamic: strict will throw an error when undefined fields are input
  • dynamic: false stores new fields but doesn’t index them (not searchable)

Modeling Patterns

Pattern 1: Denormalization

Elasticsearch doesn’t support JOIN, so include related data in a single document.

// RDB Normalized (2 tables)
// products: id, name, category_id
// categories: id, name

// Elasticsearch Denormalized (1 document)
{
  "name": "MacBook Pro",
  "category": {
    "id": 1,
    "name": "Laptop"
  }
}

Pros: Fast search, simple queries
Cons: Every document that embeds the category must be updated when the category changes
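
The update cost of denormalization can be illustrated with a short Python sketch. The in-memory list stands in for an index, `rename_category` is a hypothetical helper, and the extra products are sample data added for the example.

```python
products = [
    {"name": "MacBook Pro", "category": {"id": 1, "name": "Laptop"}},
    {"name": "MacBook Air", "category": {"id": 1, "name": "Laptop"}},
    {"name": "iPhone", "category": {"id": 2, "name": "Phone"}},
]

def rename_category(docs, category_id, new_name):
    """Rewrite every document that embeds the category; return how many changed."""
    updated = 0
    for doc in docs:
        if doc["category"]["id"] == category_id:
            doc["category"]["name"] = new_name
            updated += 1
    return updated

updated_count = rename_category(products, 1, "Notebook")
```

A single rename in the normalized model becomes one write per affected product here, so this pattern suits data that is read often and changed rarely.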

Pattern 2: Application-Side Join

Manage frequently changing data in separate indices:

// 1. Search products
List<Product> products = productRepository.search(query);

// 2. Fetch inventory info (separate index)
List<String> productIds = products.stream().map(Product::getId).toList();
Map<String, Stock> stocks = stockRepository.findByIds(productIds);

// 3. Combine
products.forEach(p -> p.setStock(stocks.get(p.getId())));

Pattern 3: Nested vs Parent-Child

| Property | Nested | Parent-Child (Join) |
| --- | --- | --- |
| Performance | Fast | Slow |
| Update | Re-index entire document | Update child only |
| Query Complexity | Low | High |
| Recommended For | Rarely changing relations | Frequently changing 1:N |

Key Points
  • Denormalization is the default strategy since Elasticsearch doesn’t support JOIN
  • Consider Application-Side Join for frequently changing data
  • Nested has good performance but requires full document re-indexing; Parent-Child allows individual updates

Best Practices

1. Use text for search fields, keyword for filter/aggregation fields

{
  "name": {
    "type": "text",
    "fields": { "keyword": { "type": "keyword" } }
  },
  "status": { "type": "keyword" }
}

2. Use keyword for numeric IDs

{
  "user_id": { "type": "keyword" }  // Not long!
}

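If you never run range queries on the ID, keyword is usually the more efficient choice, because term lookups on keyword fields tend to be faster than on numeric fields.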

3. Exclude unnecessary fields from indexing

{
  "raw_data": {
    "type": "object",
    "enabled": false    // Store only, not searchable
  }
}

4. Use Index Templates

PUT /_index_template/logs
{
  "index_patterns": ["logs-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}
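
Which indices the template applies to depends on the wildcard in index_patterns. The Python sketch below approximates that matching with `fnmatchcase`; `template_applies` is a hypothetical helper, the index names are made up, and `fnmatchcase` also understands `?` and `[]`, which goes beyond the simple `*` wildcard used here.

```python
from fnmatch import fnmatchcase

def template_applies(index_name, index_patterns):
    """Return True if the index name matches any of the template's patterns."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)

patterns = ["logs-*"]
applies_to_daily_index = template_applies("logs-2024.01.15", patterns)
applies_to_other_index = template_applies("metrics-2024.01.15", patterns)
```

Any new index whose name starts with `logs-` would pick up the template's mappings at creation time, while an index named just `logs` would not match the pattern.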
Key Points
  • Configure search fields as text + keyword Multi-field
  • Numeric IDs are more efficient as keyword if no range queries
  • Exclude fields from indexing with enabled: false if not searching
  • Apply consistent Mapping with index templates

Next Steps

| Goal | Recommended Document |
| --- | --- |
| Write search queries | Query DSL |
| Improve search quality | Search Relevance |
| Hands-on practice | Basic Examples |