These documents explain not just “how to use it” but also “why it was designed this way”.

Learning Path

Foundational Concepts

If you’re new to observability, read these in order.

  1. Three Pillars of Observability - Roles of Metrics, Logs, Traces and their interconnections
  2. Metrics Fundamentals - Understanding Counter, Gauge, Histogram, Summary types
  3. Prometheus Architecture - Pull model, time series DB, service discovery

PromQL Deep Dive

A deep exploration of the Prometheus query language.

  1. PromQL Overview - PromQL learning roadmap
  2. Syntax Basics - Selectors, label matching, time ranges
  3. Aggregation Operators - sum, avg, count, topk, by/without
  4. rate and increase - Core of Counter metric processing
  5. histogram_quantile - Calculating percentiles (P50/P95/P99)
  6. Recording Rules - Pre-computing complex queries
  7. Alerting Rules - Writing alerting rules
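
To give a feel for why item 4 calls rate and increase the core of Counter processing, here is a minimal Python sketch of the underlying idea: a counter only goes up, so any drop is treated as a restart from zero. The names `counter_increase` and `counter_rate` are hypothetical helpers for illustration, not Prometheus APIs, and the sketch deliberately omits the extrapolation to window boundaries that Prometheus itself applies.

```python
from typing import Sequence, Tuple

# One scraped sample: (timestamp in seconds, counter value).
Sample = Tuple[float, float]


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase across the samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            # Normal case: the counter grew (or stayed flat) between scrapes.
            total += curr - prev
        else:
            # The value dropped, so assume the process restarted and the
            # counter began again from zero; count everything since then.
            total += curr
    return total


def counter_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return counter_increase(samples) / elapsed


if __name__ == "__main__":
    # Hypothetical scrape results, 15 seconds apart, with a restart before t=30.
    scraped = [(0.0, 100.0), (15.0, 130.0), (30.0, 10.0), (45.0, 40.0)]
    print(counter_increase(scraped))
    print(counter_rate(scraped))
```

Reading the raw counter value directly would show a misleading drop at the restart, which is why queries on counters are normally wrapped in a function such as rate or increase.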

SRE Golden Signals

Applying the four core indicators proposed by Google SRE to each type of service.

  1. Golden Signals Overview - Introduction to 4 signals and USE/RED methods
  2. Latency - Latency measurement strategies
  3. Traffic - Traffic/throughput monitoring
  4. Errors - Error rate definition and classification
  5. Saturation - Measuring how full each resource is (resource utilization)
  6. Application by Service Type - Guide for Web API, Kafka, DB

Logging and Tracing

Integrating logs and distributed tracing beyond metrics.

  1. Log Aggregation - Loki vs ELK comparison, log design patterns
  2. Distributed Tracing - Span, Trace ID, Context Propagation
  3. OpenTelemetry - Observability standards and integration methods

Operations

Practical knowledge for effective operations.

  1. Dashboard Design - Effective visualization principles

Document Structure Pattern

Each concept document follows this structure:

1. TL;DR - Key summary (within 5 lines)
2. Why is it needed? - Problem situation and solution
3. Core Concepts - Detailed explanation + diagrams
4. Practical Examples - Code ready to apply
5. Trade-offs - Pros/cons and selection criteria
6. Next Steps - Related document links

Learning path overview (Mermaid diagram):

```mermaid
graph TD
    subgraph "Beginner (1-2 hours)"
        A["Three Pillars"] --> B["Metrics Fundamentals"]
        B --> C["Prometheus Architecture"]
    end

    subgraph "PromQL Deep Dive (2-3 hours)"
        D["Syntax Basics"] --> E["Aggregation Operators"]
        E --> F["rate/increase"]
        F --> G["histogram_quantile"]
        G --> H["Recording Rules"]
        H --> I["Alerting Rules"]
    end

    subgraph "SRE Perspective (1-2 hours)"
        J["Golden Signals Overview"] --> K["4 Signals Deep Dive"]
        K --> L["Application by Service Type"]
    end

    C --> D
    C --> J
    I --> L
```