Official Documentation

Prometheus

| Resource | Link | Description |
|---|---|---|
| Prometheus Official Docs | https://prometheus.io/docs/ | Configuration, PromQL, operations guide |
| PromQL Reference | https://prometheus.io/docs/prometheus/latest/querying/basics/ | Query language details |
| Alerting Rules | https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/ | Writing alerting rules |

Grafana

| Resource | Link | Description |
|---|---|---|
| Grafana Official Docs | https://grafana.com/docs/grafana/latest/ | Dashboard and panel configuration |
| Loki Docs | https://grafana.com/docs/loki/latest/ | Log collection and querying |
| Tempo Docs | https://grafana.com/docs/tempo/latest/ | Distributed tracing |

OpenTelemetry

| Resource | Link | Description |
|---|---|---|
| OpenTelemetry Official | https://opentelemetry.io/docs/ | Concepts, SDK, Collector |
| Java Instrumentation | https://opentelemetry.io/docs/languages/java/ | Java automatic and manual instrumentation |
| Semantic Conventions | https://opentelemetry.io/docs/concepts/semantic-conventions/ | Standardized attribute names |

Books

Essential

| Book | Author | Content |
|---|---|---|
| Site Reliability Engineering | Google SRE Team | SRE principles, the four golden signals, SLOs |
| Observability Engineering | Charity Majors, Liz Fong-Jones, George Miranda | Modern observability concepts |
| The SRE Workbook | Google SRE Team | Practical application of SRE |

| Book | Author | Content |
|---|---|---|
| Prometheus: Up & Running | Brian Brazil | In-depth Prometheus guide |
| Distributed Tracing in Practice | Austin Parker et al. | Advanced distributed tracing |
| Database Reliability Engineering | Laine Campbell, Charity Majors | Database observability |

Blogs & Articles

Prometheus/Grafana

SRE/Observability


Videos

Conferences

| Video | Link | Content |
|---|---|---|
| PromCon | https://www.youtube.com/@PrometheusIo | Prometheus conference |
| GrafanaCon | https://www.youtube.com/@Grafana | Grafana conference |
| KubeCon | https://www.youtube.com/@cncf | Kubernetes and observability sessions |

Tutorials


Online Courses

| Course | Platform | Description |
|---|---|---|
| Prometheus & Grafana | Udemy | Hands-on focused |
| Site Reliability Engineering | Coursera | Google’s SRE course |
| Observability with OpenTelemetry | Linux Foundation | Introduction to OTel |

Community

Slack

GitHub

| Project | Link |
|---|---|
| Prometheus | https://github.com/prometheus/prometheus |
| Grafana | https://github.com/grafana/grafana |
| Loki | https://github.com/grafana/loki |
| Tempo | https://github.com/grafana/tempo |
| OpenTelemetry | https://github.com/open-telemetry |

Dashboards & Rules

Grafana Dashboards

| ID | Name | Purpose |
|---|---|---|
| 1860 | Node Exporter Full | Server monitoring |
| 3662 | Prometheus Stats | Prometheus self-monitoring |
| 4701 | JVM (Micrometer) | Spring Boot JVM metrics |
| 7362 | MySQL Overview | MySQL monitoring |
| 7587 | PostgreSQL | PostgreSQL monitoring |
| 11074 | Kafka Exporter | Kafka monitoring |

Import any of these into Grafana by entering its ID on the dashboard import page. Browse and search for more at https://grafana.com/grafana/dashboards/

Alerting Rules


Tools

Testing & Validation

| Tool | Purpose |
|---|---|
| promtool | Validate Prometheus config and rule files; unit-test rules |
| amtool | Validate Alertmanager config |
| logcli | Command-line query tool for Loki |

Simulation

| Tool | Purpose |
|---|---|
| prometheus-fake-exporter | Generate fake metrics |
| hey | HTTP load testing |
| k6 | Load testing with built-in metrics |

Certifications

| Certification | Provider | Content |
|---|---|---|
| CKA/CKAD | CNCF | Kubernetes (useful background for running Prometheus on Kubernetes) |
| Prometheus Certified Associate | CNCF | Official Prometheus certification (since 2024) |
| Grafana Associate | Grafana Labs | Grafana fundamentals |