Quick Navigation: A | C | E | F | G | H | I | L | M | O | P | R | S | T | U | W
A#
Alerting Rules#
Prometheus rules that define alert conditions as PromQL expressions. When a condition is met, Prometheus sends the alert to Alertmanager.
Alertmanager#
A component that receives alerts from Prometheus, then deduplicates, groups, inhibits, and routes them to notification receivers.
C#
Cardinality#
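The number of unique time series, where each distinct combination of metric name and label values counts as one series. Cardinality grows multiplicatively with the number of values each label can take.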
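To make the effect of label combinations concrete, here is a minimal sketch that computes an upper bound on the series count for one metric. The label names and value counts are hypothetical, and real cardinality is often lower because not every combination actually occurs.

```python
from math import prod


def max_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each label multiplies the number of possible series by the
    number of distinct values it can take.
    """
    return prod(label_value_counts.values())


# Hypothetical label set for a single metric (illustrative numbers only).
labels = {"method": 4, "status": 5, "instance": 10}
print(max_series(labels))  # 4 * 5 * 10 = 200
```

Adding a label with many values, such as a user ID, multiplies this bound again, which is why such labels are usually avoided.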
Context Propagation#
A mechanism for passing Trace ID and Span ID between services in distributed systems.
Counter#
A metric type whose value only increases, apart from resetting to zero when the process restarts. Used for request counts, error counts, etc. Calculate the per-second rate of change with the rate() function.
E#
Exemplar#
A reference to a sample trace (typically a Trace ID) attached to a metric data point. Allows direct navigation from metrics to related traces.
Exporter#
A component that exposes application/system metrics in Prometheus format.
F#
Four Golden Signals#
Four core metrics proposed by Google SRE: Latency, Traffic, Errors, Saturation.
G#
Gauge#
A metric type representing a current value that can go up or down. Used for CPU usage, temperature, etc.
H#
Histogram#
A metric type that counts observations in cumulative buckets to capture the distribution of values, such as response times. Estimate percentiles with histogram_quantile().
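The sketch below illustrates how a percentile can be estimated from cumulative buckets by linear interpolation, which is the general idea behind histogram_quantile(). It is a simplified approximation and not Prometheus's actual implementation. The function name, bucket bounds, and counts are assumptions for illustration, and it assumes at least two buckets with the last one being +Inf.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with the +Inf bucket. The lowest bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # Cannot interpolate into +Inf; fall back to the
                # highest finite bucket bound.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical response-time buckets in seconds.
example = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(estimate_quantile(0.95, example))
```

Because values are assumed to be evenly spread within a bucket, the estimate is only as precise as the bucket boundaries allow.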
I#
Instrumentation#
Adding observability data collection code to applications. Can be automatic or manual instrumentation.
irate()#
A PromQL function that calculates instantaneous rate using only the last two samples.
L#
Label#
Key-value metadata attached to metrics. Used for filtering and grouping.
LogQL#
Grafana Loki’s query language, with syntax similar to PromQL.
Loki#
Grafana’s log aggregation system. Lightweight because it indexes only labels, not the full log content.
M#
Micrometer#
A metrics facade for JVM applications. Supports various backends such as Prometheus and Datadog.
O#
OpenTelemetry (OTel)#
A vendor-neutral, open-source observability framework and standard for generating and collecting metrics, logs, and traces.
OTLP#
OpenTelemetry Protocol. A standard protocol for transmitting observability data.
P#
Percentile#
The value below which a given percentage of observations in a distribution fall. P99 is the value that 99% of observations are at or below.
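As a small illustration, the following sketch computes a percentile with the nearest-rank method. This is one of several common percentile definitions, chosen here for simplicity, and the function name and sample data are assumptions.

```python
import math


def percentile_nearest_rank(values: list[float], p: int) -> float:
    """Return the p-th percentile (1 <= p <= 100) using nearest rank.

    The result is the smallest value such that at least p% of the
    observations are less than or equal to it.
    """
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))
print(percentile_nearest_rank(latencies_ms, 99))  # 99
print(percentile_nearest_rank(latencies_ms, 50))  # 50
```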
PromQL#
Prometheus Query Language. A language for querying and analyzing time series data.
Pull Model#
A method where Prometheus actively scrapes targets for metrics (opposite of Push).
R#
rate()#
A PromQL function that calculates the average per-second rate of increase for Counters.
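The difference between rate() and irate() can be sketched over raw samples as follows. This is a simplified model and not the PromQL implementation: it handles counter resets but omits the extrapolation to window boundaries that Prometheus performs. The function names and sample values are assumptions.

```python
Sample = tuple[float, float]  # (timestamp in seconds, counter value)


def increase_with_resets(samples: list[Sample]) -> float:
    """Total increase across samples, treating a drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts at zero, so the new
        # value itself is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: list[Sample]) -> float:
    """Average per-second increase over the whole window."""
    duration = samples[-1][0] - samples[0][0]
    return increase_with_resets(samples) / duration


def simple_irate(samples: list[Sample]) -> float:
    """Per-second increase using only the last two samples."""
    return simple_rate(samples[-2:])


# Hypothetical window with a counter reset between t=15 and t=30.
window = [(0, 100), (15, 130), (30, 10), (45, 70)]
print(simple_rate(window))   # (30 + 10 + 60) / 45 seconds
print(simple_irate(window))  # 60 / 15 seconds = 4.0
```

The averaged rate smooths out short spikes, while the two-sample variant reacts quickly but is noisier.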
Recording Rules#
Prometheus rules that pre-calculate complex queries and store them as new metrics.
RED Method#
A microservice monitoring methodology measuring Rate, Errors, Duration.
S#
Sampling#
A technique that stores only a portion of all traces. Used for cost optimization.
Scrape#
The act of Prometheus collecting metrics from targets.
Service Level Indicator (SLI)#
A quantitative measure of service level. Examples: P99 response time, error rate.
Service Level Objective (SLO)#
Target values for SLIs. Examples: P99 < 500ms, 99.9% availability.
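The relationship between an SLI and its SLO can be sketched as a simple comparison. The function names and request counts below are hypothetical, and the availability SLI is assumed to be the percentage of successful requests.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI as a percentage of successful requests."""
    return 100.0 * good_requests / total_requests


def meets_slo(sli_percent: float, slo_target_percent: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli_percent >= slo_target_percent


# Hypothetical measurement window: 100,000 requests, 50 failures.
sli = availability_sli(99_950, 100_000)
print(sli, meets_slo(sli, 99.9))
```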
Span#
A single unit of work in distributed tracing. A Trace consists of multiple Spans.
T#
Tail-based Sampling#
A sampling method that decides whether to keep a trace after the request completes, so that errors and slow requests can be prioritized for storage.
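A tail-based sampling decision might look like the following minimal sketch. The `CompletedTrace` type, the 500 ms threshold, and the 5% baseline rate are all assumptions for illustration and do not reflect any specific collector's policy format.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompletedTrace:
    """Hypothetical summary of a finished trace."""
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(
    trace: CompletedTrace,
    slow_threshold_ms: float = 500.0,
    baseline_rate: float = 0.05,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide after completion whether a trace should be stored."""
    # Always keep the traces that matter most for debugging.
    if trace.has_error or trace.duration_ms >= slow_threshold_ms:
        return True
    # Keep a small random share of ordinary traces as a baseline.
    return rng() < baseline_rate


print(keep_trace(CompletedTrace("a1", 42.0, has_error=True)))  # True
```

Because the decision needs the full trace, all of its Spans have to be buffered until the request finishes, which is the main cost of this approach.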
Tempo#
Grafana’s distributed tracing backend. Optimized for large-scale trace storage.
Three Pillars#
The three pillars of Observability: Metrics, Logs, Traces.
Trace#
The complete path of a single request through a distributed system. Consists of multiple Spans.
Trace ID#
A unique ID identifying a trace. All Spans share the same Trace ID.
U#
USE Method#
A resource monitoring methodology measuring Utilization, Saturation, Errors.
W#
W3C Trace Context#
An HTTP header standard for propagating trace context between services. Uses the traceparent header, plus an optional tracestate header for vendor-specific data.