These documents explain not just “how to use it” but also “why it was designed this way”.
Learning Path#
Foundational Concepts#
If you’re new to Observability, follow this order.
- Three Pillars of Observability - Roles of Metrics, Logs, Traces and their interconnections
- Metrics Fundamentals - Understanding Counter, Gauge, Histogram, Summary types
- Prometheus Architecture - Pull model, time series DB, service discovery
PromQL Deep Dive#
A deep exploration of the Prometheus query language.
- PromQL Overview - PromQL learning roadmap
- Syntax Basics - Selectors, label matching, time ranges
- Aggregation Operators - sum, avg, count, topk, by/without
- rate and increase - Core of Counter metric processing
- histogram_quantile - Calculating percentiles (P50/P95/P99)
- Recording Rules - Pre-computing complex queries
- Alerting Rules - Writing alerting rules
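
As a preview of what the rate/increase and histogram_quantile documents cover, the sketch below reimplements the core ideas in plain Python. It is a simplified illustration, not Prometheus's actual implementation: the function names (`simple_increase`, `simple_rate`, `bucket_quantile`) and the sample data are hypothetical, the extrapolation to window boundaries that Prometheus applies in `rate` is omitted, and the quantile function assumes `0 < q <= 1` and a lowest bucket starting at 0.

```python
import math


def simple_increase(samples):
    """Total increase of a counter over a list of (timestamp_s, value) samples.

    A drop in value is treated as a counter reset (e.g. a process restart),
    so the value seen after the reset is counted as new increase.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def simple_rate(samples):
    """Per-second average increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed


def bucket_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets [(upper_bound, count), ...].

    Buckets must be sorted by upper bound and end with math.inf,
    mirroring the `le` label of a classic Prometheus histogram.
    """
    total = buckets[-1][1]
    if total == 0 or not 0 < q <= 1:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical counter scraped every 15 s, with a reset between t=15 and t=30.
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0)]
# Hypothetical latency histogram: cumulative counts per upper bound (seconds).
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]

print(simple_increase(samples))
print(simple_rate(samples))
print(bucket_quantile(0.95, buckets))
```

The interpolation step is why bucket boundaries matter: the estimated P95 can only be as precise as the bucket it falls into, which is one of the trade-offs the histogram_quantile document discusses.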
SRE Golden Signals#
Applying the four golden signals proposed by Google SRE to each service type.
- Golden Signals Overview - Introduction to 4 signals and USE/RED methods
- Latency - Latency measurement strategies
- Traffic - Traffic/throughput monitoring
- Errors - Error rate definition and classification
- Saturation - Resource utilization and how close a service is to its limits
- Application by Service Type - Guide for Web API, Kafka, DB
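
To make the four signals concrete, here is a minimal sketch that derives each one from a small batch of request records. Everything in it is an assumption for illustration: the `Request` record, the `golden_signals` function, the choice of 5xx responses as "errors", the nearest-rank P95, and in-flight requests divided by capacity as the saturation measure. Real systems typically compute these from metrics rather than raw records.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # how long the request took
    status: int        # HTTP status code


def golden_signals(requests, window_s, in_flight, capacity):
    """Compute the four golden signals for one observation window."""
    durations = sorted(r.duration_s for r in requests)
    # Latency: nearest-rank 95th percentile of request durations.
    p95_index = math.ceil(0.95 * len(durations)) - 1
    # Errors: share of requests that ended in a server error (5xx).
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_s": durations[p95_index],
        "traffic_rps": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        # Saturation: how full the service is relative to its limit.
        "saturation": in_flight / capacity,
    }


# Hypothetical 5-second window of traffic for a web API.
durations = [0.05, 0.07, 0.08, 0.1, 0.12, 0.15, 0.2, 0.25, 0.4, 1.2]
statuses = [200, 200, 200, 200, 200, 200, 200, 200, 500, 503]
requests = [Request(d, s) for d, s in zip(durations, statuses)]

signals = golden_signals(requests, window_s=5, in_flight=30, capacity=40)
print(signals)
```

What counts as an error or as saturation differs for a Kafka consumer or a database, which is the subject of the Application by Service Type guide.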
Logging and Tracing#
Integrating logs and distributed tracing beyond metrics.
- Log Aggregation - Loki vs ELK comparison, log design patterns
- Distributed Tracing - Span, Trace ID, Context Propagation
- OpenTelemetry - Observability standards and integration methods
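
Context propagation is easier to picture with a concrete header. The sketch below builds and forwards a W3C Trace Context `traceparent` value (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags): a child span keeps the trace ID and gets a fresh span ID. The helper names `new_traceparent` and `child_traceparent` are hypothetical, and the sketch does not reject the all-zero IDs that the specification treats as invalid. In practice the OpenTelemetry SDK's propagators handle this for you.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex trace flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace: fresh trace ID and fresh root span ID."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent_header):
    """Build the header a service sends downstream for its own span.

    The trace ID and flags are carried over unchanged; only the span ID
    is replaced, so every hop stays linked to the same trace.
    """
    match = TRACEPARENT_RE.match(parent_header)
    if match is None:
        raise ValueError(f"not a valid traceparent header: {parent_header!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Example header value in the format used by the W3C specification.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(outgoing)
```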
Operations#
Practical knowledge for effective operations.
- Dashboard Design - Effective visualization principles
Document Structure Pattern#
Each concept document follows this structure:
1. TL;DR - Key summary (within 5 lines)
2. Why is it needed? - Problem situation and solution
3. Core Concepts - Detailed explanation + diagrams
4. Practical Examples - Code ready to apply
5. Trade-offs - Pros/cons and selection criteria
6. Next Steps - Related document links
Recommended Learning Path#
graph TD
    subgraph "Beginner (1-2 hours)"
        A["Three Pillars"] --> B["Metrics Fundamentals"]
        B --> C["Prometheus Architecture"]
    end
    subgraph "PromQL Deep Dive (2-3 hours)"
        D["Syntax Basics"] --> E["Aggregation Operators"]
        E --> F["rate/increase"]
        F --> G["histogram_quantile"]
        G --> H["Recording Rules"]
        H --> I["Alerting Rules"]
    end
    subgraph "SRE Perspective (1-2 hours)"
        J["Golden Signals Overview"] --> K["4 Signals Deep Dive"]
        K --> L["Application by Service Type"]
    end
    C --> D
    C --> J
    I --> L