Prerequisites

Before reading this document, understand these concepts first:

Learn how to analyze data and extract statistics using Elasticsearch Aggregations.

While Elasticsearch is a search engine, it’s also a powerful real-time analytics engine. When you search for “laptop” on an e-commerce site, the brand filters, price range filters, and rating distributions that appear on the left side are all results of Aggregations. Implementing these features with RDBMS would require complex subqueries and GROUP BY combinations, and performance issues would be severe with large datasets.

Aggregations can group data and calculate statistics simultaneously with search queries, allowing you to get both search results and filter facets in a single request. This improves both user experience and server efficiency. Aggregations play a core role in various areas such as dashboard building, business intelligence, and log analysis.

What is Aggregation?#

Aggregation is a feature for grouping search results and calculating statistics. Similar to SQL’s GROUP BY and aggregate functions (COUNT, SUM, AVG).

Basic Structure#

GET /products/_search
{
  "size": 0,                    // Don't need search results
  "aggs": {
    "aggregation_name": {
      "aggregation_type": {
        // Aggregation settings
      }
    }
  }
}

Bucket Aggregations#

Divide data into groups (buckets). Similar to SQL’s GROUP BY.

terms#

Group by field values:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10           // Top 10 buckets
      }
    }
  }
}

SQL: SELECT category, COUNT(*) FROM products GROUP BY category

Response:

{
  "aggregations": {
    "categories": {
      "buckets": [
        { "key": "Laptop", "doc_count": 150 },
        { "key": "Tablet", "doc_count": 80 },
        { "key": "Smartphone", "doc_count": 200 }
      ]
    }
  }
}

Sorting Options#

{
  "terms": {
    "field": "category",
    "order": { "_count": "asc" }     // Ascending by document count
    // "order": { "_key": "desc" }   // Descending by key name
  }
}

range#

Group by numeric ranges:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 500000, "key": "Under $5,000" },
          { "from": 500000, "to": 1000000, "key": "$5,000-$10,000" },
          { "from": 1000000, "to": 2000000, "key": "$10,000-$20,000" },
          { "from": 2000000, "key": "Over $20,000" }
        ]
      }
    }
  }
}

date_range#

Group by date ranges:

{
  "aggs": {
    "sales_periods": {
      "date_range": {
        "field": "created_at",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "from": "2024-01-01", "to": "2024-04-01", "key": "Q1" },
          { "from": "2024-04-01", "to": "2024-07-01", "key": "Q2" }
        ]
      }
    }
  }
}

histogram#

Group by fixed intervals:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 500000,      // $5,000 intervals
        "min_doc_count": 1       // Only buckets with at least 1 doc
      }
    }
  }
}

date_histogram#

Group by time intervals:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "orders_over_time": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month",  // day, week, month, year
        "format": "yyyy-MM"
      }
    }
  }
}

interval Options#

TypeOptionExample
calendar_intervalday, week, month, quarter, yearCalendar-based
fixed_interval1d, 12h, 30mFixed time

filter / filters#

Group by conditions:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "stock_status": {
      "filters": {
        "filters": {
          "in_stock": { "term": { "in_stock": true } },
          "out_of_stock": { "term": { "in_stock": false } }
        }
      }
    }
  }
}

Metric Aggregations#

Calculate statistics on numeric data.

Basic Metrics#

GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "max_price": { "max": { "field": "price" } },
    "min_price": { "min": { "field": "price" } },
    "total_price": { "sum": { "field": "price" } },
    "product_count": { "value_count": { "field": "price" } }
  }
}

stats#

Multiple statistics at once:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}

Response:

{
  "aggregations": {
    "price_stats": {
      "count": 100,
      "min": 100000,
      "max": 5000000,
      "avg": 1500000,
      "sum": 150000000
    }
  }
}

extended_stats#

Additional statistics including standard deviation:

{
  "aggs": {
    "price_extended": {
      "extended_stats": { "field": "price" }
    }
  }
}

Additional fields: variance, std_deviation, std_deviation_bounds

cardinality#

Unique value count (DISTINCT):

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "unique_customers": {
      "cardinality": {
        "field": "customer_id"
      }
    }
  }
}

SQL: SELECT COUNT(DISTINCT customer_id) FROM orders

percentiles#

Percentile values:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 90, 99]
      }
    }
  }
}

Nested Aggregations#

Why aren’t single aggregations enough? Just knowing “product count by category” is insufficient. You need multi-dimensional analysis like “average price by category” or “sales by brand within each category,” but a single aggregation cannot express such hierarchical analysis. Nested aggregations place additional aggregations inside buckets to perform multi-level analysis in a single request.

Perform additional aggregations within buckets.

Bucket + Metric#

Average price by category:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}

SQL:

SELECT category, AVG(price), MAX(price)
FROM products
GROUP BY category

Multi-level Nesting#

Category > Brand > Statistics:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "by_brand": {
          "terms": { "field": "brand" },
          "aggs": {
            "price_stats": {
              "stats": { "field": "price" }
            }
          }
        }
      }
    }
  }
}

Pipeline Aggregations#

Why do we need pipeline aggregations? You can get “monthly sales totals” with Bucket + Metric aggregations. But questions like “how much did this month’s sales increase compared to last month?” or “what is the monthly average sales?” require post-processing of aggregation results. Pipeline Aggregations take the output of other aggregations as input to perform this kind of secondary analysis.

Use results from other aggregations as input.

avg_bucket#

Average of buckets:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "monthly_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } }
      }
    },
    "avg_monthly_sales": {
      "avg_bucket": {
        "buckets_path": "monthly_sales>total_sales"
      }
    }
  }
}

derivative#

Calculate change:

{
  "aggs": {
    "monthly_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": { "sum": { "field": "amount" } },
        "sales_change": {
          "derivative": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}

cumulative_sum#

Cumulative sum:

{
  "aggs": {
    "daily_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "sales": { "sum": { "field": "amount" } },
        "cumulative_sales": {
          "cumulative_sum": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}

Combining Search and Aggregations#

Aggregate on Filtered Data#

GET /products/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "created_at": { "gte": "2024-01-01" } } }
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": { "field": "category" }
    }
  }
}

global Aggregation#

Ignore query and aggregate on all data:

GET /products/_search
{
  "query": {
    "match": { "name": "MacBook" }
  },
  "aggs": {
    "filtered_count": {
      "value_count": { "field": "_id" }
    },
    "all_products": {
      "global": {},
      "aggs": {
        "total_count": {
          "value_count": { "field": "_id" }
        }
      }
    }
  }
}

Practical Examples#

Dashboard Statistics#

GET /orders/_search
{
  "size": 0,
  "query": {
    "range": {
      "order_date": {
        "gte": "now-30d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "total_revenue": { "sum": { "field": "amount" } },
    "order_count": { "value_count": { "field": "_id" } },
    "avg_order_value": { "avg": { "field": "amount" } },
    "unique_customers": { "cardinality": { "field": "customer_id" } },
    "daily_trend": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    },
    "top_products": {
      "terms": {
        "field": "product_id",
        "size": 5
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    }
  }
}

Product Filter Facets#

GET /products/_search
{
  "size": 10,
  "query": {
    "match": { "name": "laptop" }
  },
  "aggs": {
    "brands": {
      "terms": { "field": "brand", "size": 20 }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 1000000, "key": "Under $10,000" },
          { "from": 1000000, "to": 2000000, "key": "$10,000-$20,000" },
          { "from": 2000000, "key": "Over $20,000" }
        ]
      }
    },
    "ratings": {
      "terms": { "field": "rating" }
    }
  }
}

Performance Tips#

1. Use size: 0#

Exclude search results if only aggregations needed:

{ "size": 0, "aggs": {...} }

2. Only Necessary Buckets#

{
  "terms": {
    "field": "category",
    "size": 10,          // Only as many as needed
    "shard_size": 25     // Per-shard collection (accuracy vs performance)
  }
}

3. Composite Aggregation#

Pagination for large data:

{
  "aggs": {
    "my_composite": {
      "composite": {
        "size": 1000,
        "sources": [
          { "category": { "terms": { "field": "category" } } },
          { "brand": { "terms": { "field": "brand" } } }
        ],
        "after": { "category": "Laptop", "brand": "Apple" }  // Next page
      }
    }
  }
}

SQL Comparison#

SQLAggregation
COUNT(*)value_count
COUNT(DISTINCT x)cardinality
SUM(x)sum
AVG(x)avg
MIN(x)min
MAX(x)max
GROUP BY xterms
GROUP BY CASE WHENrange / filters

Next Steps#

GoalRecommended Document
Indexing optimizationIndexing Strategy
Practical implementationProduct Search System
Performance optimizationPerformance Tuning