Search Relevance

Prerequisites
Before reading this document, understand these concepts first:
Query DSL - match, bool query basics
Data Modeling - Analyzer operation principles

Learn relevance tuning methods including Score, BM25, and Boosting to improve search result quality.

The core value of a search engine is enabling users to find what they want on the first page. However quickly results come back, they are of little use if the document the user wants sits on page 10. Search relevance is the concept that addresses this problem.

Elasticsearch uses the BM25 algorithm by default to assign a relevance score to each document. However, default scores alone often cannot meet business requirements. Requirements like “prioritize promotional products,” “give bonus points to recent products,” or “deprioritize out-of-stock items” are implemented through Boosting and Function Score. This document covers various tuning techniques to improve search quality.

What is Score?

Why do we need scoring? When you search for “MacBook” and thousands of documents match, which ones should be shown first? Sorting by insertion order or alphabetically could bury the most relevant result on page 10. Score quantifies how relevant each document is to the search query so that the best matches surface at the top.

Score is a number indicating how relevant a document is to the search query. Documents with higher scores rank higher in search results.

GET /products/_search
{
  "query": {
    "match": { "name": "MacBook Pro" }
  }
}

Response:

{
  "hits": {
    "hits": [
      {
        "_score": 2.876,          // Relevance score
        "_source": { "name": "MacBook Pro 14-inch" }
      },
      {
        "_score": 1.234,
        "_source": { "name": "MacBook Air" }
      }
    ]
  }
}

BM25 Algorithm

Elasticsearch uses BM25 (Best Matching 25) as its default scoring algorithm.

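BM25 Core Elements

For each query term, the score combines IDF with a term-frequency component that is normalized by field length:

Score(term) = IDF × freq × (k1 + 1) / (freq + k1 × (1 - b + b × fieldLength / avgFieldLength))

Here freq is the number of times the term occurs in the field, and the defaults are k1 = 1.2 and b = 0.75. Field length sits in the denominator, so a longer field lowers the score. For a multi-term query, the per-term scores are summed.

Element	Meaning	Example
TF (Term Frequency)	How often the term occurs in the field	“MacBook” appears 3 times → score increases, with diminishing returns
IDF (Inverse Document Frequency)	How rare the term is across all documents	Rare words → score increases
Field Length	Field length relative to the average field length	Match in shorter field → score increases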

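TF (Term Frequency)

More occurrences of the search term mean higher relevance, but each additional occurrence adds less (the TF component saturates rather than growing linearly):

“MacBook” appears 1 time → score 1.0
“MacBook” appears 3 times → score ~1.6 (relative TF component with k1 = 1.2 in an average-length field)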

IDF (Inverse Document Frequency)

Rarer words are considered more important:

“the” (appears in 1M documents) → low IDF
“MacBook” (appears in 10K documents) → high IDF

Field Length Normalization

Matches in shorter fields are considered more relevant:

“MacBook Pro” (2-word field) → high score
“Apple’s latest MacBook Pro model released…” (10-word field) → lower score
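
The three elements above can be sketched in a few lines of Python. This is a minimal illustration of the textbook BM25 formula with the default parameters k1 = 1.2 and b = 0.75, not Lucene's actual implementation, which may differ by a constant factor that does not affect ranking. The function names and the corpus size of 2,000,000 documents are assumptions made for the example.

```python
import math

def idf(total_docs, docs_with_term):
    """BM25 inverse document frequency: rarer terms get larger values."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def tf_norm(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Saturating term-frequency component, normalized by field length."""
    length_factor = 1 - b + b * field_len / avg_field_len
    return freq * (k1 + 1) / (freq + k1 * length_factor)

def bm25(freq, field_len, avg_field_len, total_docs, docs_with_term):
    """Score of a single query term for a single field."""
    return idf(total_docs, docs_with_term) * tf_norm(freq, field_len, avg_field_len)

# TF saturates: tripling the occurrences does not triple the score.
print(round(tf_norm(1, 10, 10), 2), round(tf_norm(3, 10, 10), 2))  # 1.0 1.57

# IDF: a term found in 10K documents outweighs one found in 1M documents
# (hypothetical corpus of 2,000,000 documents).
print(idf(2_000_000, 10_000) > idf(2_000_000, 1_000_000))  # True

# Field length: the same match scores higher in a 2-word field than in a 10-word field.
print(tf_norm(1, 2, 10) > tf_norm(1, 10, 10))  # True
```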

Score Analysis

Explain API

See how the score was calculated:

GET /products/_search
{
  "explain": true,
  "query": {
    "match": { "name": "MacBook" }
  }
}

Response (simplified):

{
  "_explanation": {
    "value": 1.234,
    "description": "weight(name:MacBook)",
    "details": [
      {
        "value": 0.876,
        "description": "idf, computed as..."
      },
      {
        "value": 1.41,
        "description": "tf, computed as freq=1.0..."
      }
    ]
  }
}
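
Real explain output is deeply nested, so it can help to flatten it into an indented outline. The sketch below assumes the response has already been parsed into a Python dict with the `value`, `description`, and `details` keys shown above; `flatten_explanation` is a hypothetical helper, not part of any Elasticsearch client.

```python
def flatten_explanation(node, depth=0):
    """Return one indented line per node of an `_explanation` tree."""
    lines = ["{}{:.3f}  {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines

# The simplified response from above, as a Python dict.
explanation = {
    "value": 1.234,
    "description": "weight(name:MacBook)",
    "details": [
        {"value": 0.876, "description": "idf, computed as..."},
        {"value": 1.41, "description": "tf, computed as freq=1.0..."},
    ],
}

for line in flatten_explanation(explanation):
    print(line)
```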

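Profile API

Analyze query execution performance. The profile output reports how much time each query component took on each shard; it does not explain scores: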

GET /products/_search
{
  "profile": true,
  "query": {
    "match": { "name": "MacBook" }
  }
}

Boosting

Why isn’t the default BM25 score enough? BM25’s default scoring alone cannot reflect business requirements. Requirements like “surface promotional products higher” or “name matches should be more important than description matches” are beyond what the default algorithm can know. Boosting assigns weights to specific fields or conditions to incorporate business logic into search rankings.

Adjust scores by adding weights to specific conditions.

Field Boosting

Add weight to important fields:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "MacBook Pro",
      "fields": [
        "name^3",           // name field 3x weight
        "description"       // description field 1x (default)
      ]
    }
  }
}

Boosting in Bool Query

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "MacBook" } }
      ],
      "should": [
        {
          "term": {
            "is_promotion": {
              "value": true,
              "boost": 2.0        // Promotion clause counts 2x when added to the score
            }
          }
        },
        {
          "range": {
            "rating": {
              "gte": 4.5,
              "boost": 1.5        // High-rating clause counts 1.5x when added to the score
            }
          }
        }
      ]
    }
  }
}
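
A bool query adds the scores of its matching `must` and `should` clauses, so a `boost` scales only that clause's contribution rather than the document's whole score. The sketch below illustrates the arithmetic; the clause scores (2.0 for the name match, 0.5 for the unboosted promotion clause, 1.0 for the range clause) are hypothetical numbers, and `bool_score` is an illustrative helper.

```python
def bool_score(must_scores, should_matches):
    """Sum must-clause scores plus boosted scores of the should clauses that matched.

    should_matches: list of (clause_score, boost) pairs.
    """
    return sum(must_scores) + sum(score * boost for score, boost in should_matches)

name_match = 2.0  # hypothetical score of the `match` clause

# Promoted and highly rated product: both should clauses match.
promoted = bool_score([name_match], [(0.5, 2.0), (1.0, 1.5)])

# Ordinary product: only the must clause matches.
ordinary = bool_score([name_match], [])

print(promoted, ordinary)  # 4.5 2.0
```

Both products are returned because `should` clauses are optional here; they only change the order.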

Negative Boosting

Lower scores for certain conditions:

GET /products/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "name": "MacBook" }
      },
      "negative": {
        "term": { "condition": "refurbished" }
      },
      "negative_boost": 0.5    // Reduce score to 50%
    }
  }
}
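
The effect on ranking can be sketched as follows: documents matching the `negative` query stay in the results, but their score is multiplied by `negative_boost`. The product names and scores are made up for illustration.

```python
def boosting_score(positive_score, matches_negative, negative_boost=0.5):
    """Demote, but do not exclude, documents that match the negative query."""
    return positive_score * negative_boost if matches_negative else positive_score

# (name, positive score, is refurbished) - hypothetical documents
docs = [
    ("MacBook Pro (refurbished)", 3.0, True),
    ("MacBook Air", 2.0, False),
]

ranked = sorted(docs, key=lambda d: boosting_score(d[1], d[2]), reverse=True)
print([name for name, _, _ in ranked])  # ['MacBook Air', 'MacBook Pro (refurbished)']
```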

Function Score Query

Why do we need Function Score Query? Simple Boosting cannot express complex scoring like “give bonus points to recent products but halve the score after 30 days, add extra boost for items with 100+ sales, and throw in some randomness.” Function Score Query combines multiple functions to incorporate such sophisticated business logic into search rankings.

Implement complex scoring logic.

Basic Structure

GET /products/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "laptop" } },
      "functions": [
        // Score adjustment functions
      ],
      "score_mode": "sum",      // How to combine function results
      "boost_mode": "multiply"  // How to combine with original score
    }
  }
}

score_mode Options

Value	Description
`multiply`	Multiply function results
`sum`	Sum function results
`avg`	Average of function results
`max`	Maximum of function results
`min`	Minimum of function results
`first`	Use only first function result

boost_mode Options

Value	Description
`multiply`	Original score × function result
`replace`	Replace with function result
`sum`	Original score + function result
`avg`	Average of original score and function result
`max`	Maximum of both
`min`	Minimum of both
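
The two options apply in sequence: `score_mode` first reduces the function results to one number, then `boost_mode` merges that number with the query score. A minimal Python sketch of the two tables, assuming plain unweighted function values (Elasticsearch computes a weighted average for `avg` when functions carry weights, which this sketch ignores):

```python
import math
from statistics import mean

# How the individual function results are reduced to one value.
SCORE_MODES = {
    "multiply": math.prod,
    "sum": sum,
    "avg": mean,
    "max": max,
    "min": min,
    "first": lambda values: values[0],
}

# How that value is merged with the original query score.
BOOST_MODES = {
    "multiply": lambda query, func: query * func,
    "replace": lambda query, func: func,
    "sum": lambda query, func: query + func,
    "avg": lambda query, func: (query + func) / 2,
    "max": max,
    "min": min,
}

def function_score(query_score, function_values, score_mode="multiply", boost_mode="multiply"):
    combined = SCORE_MODES[score_mode](function_values)
    return BOOST_MODES[boost_mode](query_score, combined)

# Query score 2.0, two functions returning 2.0 and 0.5:
print(function_score(2.0, [2.0, 0.5], "sum", "multiply"))  # 5.0
print(function_score(2.0, [2.0, 0.5], "multiply", "sum"))  # 3.0
print(function_score(2.0, [2.0, 0.5], "max", "replace"))   # 2.0
```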

Practical Example: Product Search Ranking

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "match": { "name": "laptop" } }
          ],
          "filter": [
            { "term": { "in_stock": true } }
          ]
        }
      },
      "functions": [
        {
          // Boost popular products
          "filter": { "range": { "sales_count": { "gte": 100 } } },
          "weight": 2
        },
        {
          // Boost recent products (decay by date)
          "gauss": {
            "created_at": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.5
            }
          }
        },
        {
          // Factor in rating
          "field_value_factor": {
            "field": "rating",
            "factor": 1.2,
            "modifier": "sqrt",
            "missing": 3
          }
        },
        {
          // Add randomness (diversity)
          "random_score": {
            "seed": 12345,
            "field": "_seq_no"
          },
          "weight": 0.1
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}
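
To see how the four functions add up, the arithmetic for a single document can be traced by hand. The sketch below assumes a hypothetical product (250 sales, created 30 days ago, rating 4.8), a query score of 1.5, and a random value of 0.42; in a real request Elasticsearch supplies the query score and the random value. `rank_score` is an illustrative helper that mirrors `score_mode: sum` and `boost_mode: multiply`.

```python
import math

def gauss_decay(distance, scale, decay=0.5):
    """Gaussian decay that equals `decay` when distance == scale."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-distance ** 2 / (2 * sigma_sq))

def rank_score(query_score, sales_count, age_days, rating, random_value):
    popularity = 2.0 if sales_count >= 100 else 0.0   # filter + weight 2
    freshness = gauss_decay(age_days, scale=30)       # gauss on created_at, scale 30d
    rating_value = 3 if rating is None else rating    # "missing": 3
    rating_factor = math.sqrt(1.2 * rating_value)     # factor 1.2, modifier sqrt
    diversity = 0.1 * random_value                    # random_score with weight 0.1
    function_sum = popularity + freshness + rating_factor + diversity  # score_mode: sum
    return query_score * function_sum                 # boost_mode: multiply

# 2.0 + 0.5 + 2.4 + 0.042 = 4.942, times the query score 1.5
print(round(rank_score(1.5, 250, 30, 4.8, 0.42), 3))  # 7.413
```

Because the functions are summed, the popularity weight and the rating factor dominate this example, while the random component only breaks near-ties.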

Decay Functions

Decrease scores based on distance or time.

graph LR
    A[origin<br>Reference point] --> B[scale<br>Decay range]
    B --> C[decay<br>Decay rate]

This diagram shows how the three Decay Function parameters (origin, scale, decay) are applied to score reduction.

Function	Decay Shape
`linear`	Linear decay
`exp`	Exponential decay
`gauss`	Gaussian curve

{
  "gauss": {
    "location": {
      "origin": { "lat": 37.5, "lon": 127.0 },
      "scale": "5km",
      "decay": 0.5
    }
  }
}

→ Score decays to 50% at 5km from origin
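
The three shapes differ in how quickly the score falls beyond `scale`. The sketch below follows the decay formulas given in the Elasticsearch documentation, with distances in kilometres as plain numbers; `decay_score` is an illustrative helper, and `offset` (a real parameter not used in the example above) defaults to 0.

```python
import math

def decay_score(shape, distance, scale, decay=0.5, offset=0.0):
    """Score multiplier in [0, 1]; equals `decay` at distance == offset + scale."""
    d = max(0.0, abs(distance) - offset)
    if shape == "gauss":
        sigma_sq = -scale ** 2 / (2 * math.log(decay))
        return math.exp(-d ** 2 / (2 * sigma_sq))
    if shape == "exp":
        return math.exp(math.log(decay) / scale * d)
    if shape == "linear":
        zero_point = scale / (1 - decay)  # distance at which the score reaches 0
        return max((zero_point - d) / zero_point, 0.0)
    raise ValueError("unknown decay shape: " + shape)

for shape in ("linear", "exp", "gauss"):
    at_scale = decay_score(shape, 5, scale=5)
    at_double = decay_score(shape, 10, scale=5)
    print(shape, round(at_scale, 4), round(at_double, 4))
```

All three return 0.5 at 5 km. At 10 km, `linear` has already dropped to 0, `exp` gives 0.25, and `gauss` gives about 0.06, so the choice of shape mostly matters for documents well beyond `scale`.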

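Search Quality Improvement Techniques

1. Synonym Handling

Configure a custom analyzer with a synonym filter (see Analyzer basics). The settings below only define the analyzer; a field mapping must reference it before it affects search: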

PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "laptop, notebook, portable computer",
            "phone, smartphone, mobile"
          ]
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "synonym_filter"]
        }
      }
    }
  }
}

2. Autocomplete (Prefix Matching)

GET /products/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": {
        "query": "MacBook P",
        "max_expansions": 10
      }
    }
  }
}

3. Typo Correction (Fuzzy)

GET /products/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Macbok",
        "fuzziness": "AUTO"
      }
    }
  }
}
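
With `"fuzziness": "AUTO"`, the number of allowed edits depends on the term length: terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. The sketch below illustrates that rule with a plain Levenshtein distance. It is a simplification: Elasticsearch also counts a transposition of two adjacent characters as a single edit by default, which this version does not, and the terms are assumed to be already lowercased by the analyzer.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Maximum edits allowed by AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_match(query_term, indexed_term):
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)

print(fuzzy_match("macbok", "macbook"))   # True: one missing letter, two edits allowed
print(fuzzy_match("macbok", "notebook"))  # False: too many edits
```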

4. Highlighting

GET /products/_search
{
  "query": {
    "match": { "description": "M3 chip" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<strong>"],
        "post_tags": ["</strong>"],
        "fragment_size": 150
      }
    }
  }
}

Practical Tips

1. Use filter Actively

Put conditions that don’t need a score in filter. Filter clauses do not contribute to the score, and Elasticsearch can cache their results:

{
  "bool": {
    "must": [
      { "match": { "name": "MacBook" } }      // Needs score
    ],
    "filter": [
      { "term": { "category": "Laptop" } },   // No score needed
      { "term": { "in_stock": true } }
    ]
  }
}

2. A/B Test Field Weights

Find optimal weights through testing:

// Experiment A
"fields": ["name^3", "description^1"]

// Experiment B
"fields": ["name^2", "description^2"]

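3. Score Normalization

Use max_boost in function_score to cap the combined result of the functions (the cap applies to the function result, not to the final score):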

{
  "function_score": {
    "max_boost": 10,    // Limit maximum boost
    "functions": [...]
  }
}
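
The sketch below shows where the cap is understood to apply: to the combined function result, before `boost_mode` merges it with the query score. This ordering is an assumption based on the documented behavior, and `capped_function_score` is an illustrative helper covering only three boost modes.

```python
def capped_function_score(query_score, function_result, max_boost, boost_mode="multiply"):
    """Cap the function result at max_boost, then combine it with the query score."""
    capped = min(function_result, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "replace":
        return capped
    raise ValueError("unsupported boost_mode: " + boost_mode)

# A function result of 25 is capped at 10, but the final score can still exceed 10.
print(capped_function_score(3.0, 25.0, max_boost=10))  # 30.0
```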

4. Search Quality Metrics

Precision: Ratio of relevant results among returned results
Recall: Ratio of returned results among all relevant documents
NDCG: Quality score considering ranking
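
These metrics can be computed offline from a ranked result list and a set of relevance judgments. The sketch below uses one common NDCG formulation (gain divided by log2 of the position plus one); the document IDs and graded gains are made up for illustration.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k results that are relevant."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of all relevant documents that appear in the top k."""
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / len(relevant_ids)

def dcg(gains):
    """Discounted cumulative gain: later positions count less."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))

def ndcg_at_k(ranked_ids, gains_by_id, k):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    actual = [gains_by_id.get(doc, 0) for doc in ranked_ids[:k]]
    ideal = sorted(gains_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg else 0.0

ranked = ["a", "b", "c", "d"]      # search results, best first
gains = {"a": 3, "c": 1, "e": 2}   # graded relevance; "e" was never returned
relevant = set(gains)

print(precision_at_k(ranked, relevant, 4))        # 0.5
print(round(recall_at_k(ranked, relevant, 4), 3)) # 0.667
print(round(ndcg_at_k(ranked, gains, 4), 3))      # 0.735
```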

Summary

Technique	Use Case	Example
Field Boost	Emphasize important fields	`name^3`
Bool should	Optional boosting	Prioritize promotion items
Negative Boost	Lower scores	Deprioritize refurbished items
Function Score	Complex logic	Popularity, freshness factors
Decay	Distance/time based	Recency, proximity

Next Steps

Goal	Recommended Document
Data analysis	Aggregations
Practical implementation	Product Search System
Performance optimization	Performance Tuning