<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>How-To Guide on Advanced Beginner</title><link>https://advanced-beginner.github.io/ko/docs/observability/howto/</link><description>Recent content in How-To Guide on Advanced Beginner</description><generator>Hugo</generator><language>ko-KR</language><managingEditor>d8lzz1gpw@mozmail.com (kimbenji)</managingEditor><webMaster>d8lzz1gpw@mozmail.com (kimbenji)</webMaster><lastBuildDate>Fri, 16 Jan 2026 09:24:28 +0000</lastBuildDate><atom:link href="https://advanced-beginner.github.io/ko/docs/observability/howto/index.xml" rel="self" type="application/rss+xml"/><item><title>Diagnosing High Latency</title><link>https://advanced-beginner.github.io/ko/docs/observability/howto/debug-high-latency/</link><pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate><author>d8lzz1gpw@mozmail.com (kimbenji)</author><guid>https://advanced-beginner.github.io/ko/docs/observability/howto/debug-high-latency/</guid><description>&lt;blockquote class='book-hint '&gt;
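&lt;p&gt;&lt;strong&gt;Situation&lt;/strong&gt;: P99 response time exceeds the SLA (500ms)
&lt;strong&gt;Goal&lt;/strong&gt;: Find and fix the bottleneck
&lt;strong&gt;Time required&lt;/strong&gt;: 15 to 30 minutes (depending on the complexity of the problem)
&lt;strong&gt;Success criterion&lt;/strong&gt;: P99 response time is back at or below the SLA threshold (500ms)&lt;/p&gt;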
&lt;/blockquote&gt;&lt;h2 id="문제-상황"&gt;문제 상황&lt;a class="anchor" href="#%eb%ac%b8%ec%a0%9c-%ec%83%81%ed%99%a9"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Alert: HighP99Latency
Service: order-service
P99: 2.5s (Threshold: 500ms)
Duration: 10 minutes&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="진단-워크플로우"&gt;Diagnostic Workflow&lt;a class="anchor" href="#%ec%a7%84%eb%8b%a8-%ec%9b%8c%ed%81%ac%ed%94%8c%eb%a1%9c%ec%9a%b0"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;pre class="mermaid"&gt;graph TD
 A[&amp;#34;1. Determine scope&amp;lt;br&amp;gt;Which service? Since when?&amp;#34;]
 B[&amp;#34;2. Analyze segments&amp;lt;br&amp;gt;Where is it slow?&amp;#34;]
 C[&amp;#34;3. Check resources&amp;lt;br&amp;gt;CPU/memory/DB?&amp;#34;]
 D[&amp;#34;4. Find root cause&amp;lt;br&amp;gt;Code? Query? External?&amp;#34;]
 E[&amp;#34;5. Fix&amp;#34;]

 A --&amp;gt; B --&amp;gt; C --&amp;gt; D --&amp;gt; E&lt;/pre&gt;&lt;h2 id="step-1-범위-파악"&gt;Step 1: Determine the Scope&lt;a class="anchor" href="#step-1-%eb%b2%94%ec%9c%84-%ed%8c%8c%ec%95%85"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id="영향-범위-확인"&gt;영향 범위 확인&lt;a class="anchor" href="#%ec%98%81%ed%96%a5-%eb%b2%94%ec%9c%84-%ed%99%95%ec%9d%b8"&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 어떤 서비스가 느린가?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;topk&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;histogram_quantile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;le&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kr"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;http_request_duration_seconds_bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;5m&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 언제부터 느려졌나?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;histogram_quantile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;le&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kr"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;http_request_duration_seconds_bucket&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;&amp;#34;&lt;/span&gt;&lt;span class="s"&gt;order-service&lt;/span&gt;&lt;span class="p"&gt;&amp;#34;}[&lt;/span&gt;&lt;span class="s"&gt;5m&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# → Time range: Last 1 hour&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="특정-엔드포인트-확인"&gt;특정 엔드포인트 확인&lt;a class="anchor" href="#%ed%8a%b9%ec%a0%95-%ec%97%94%eb%93%9c%ed%8f%ac%ec%9d%b8%ed%8a%b8-%ed%99%95%ec%9d%b8"&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-promql" data-lang="promql"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 엔드포인트별 P99&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kr"&gt;histogram_quantile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;le&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kr"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;http_request_duration_seconds_bucket&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;&amp;#34;&lt;/span&gt;&lt;span class="s"&gt;order-service&lt;/span&gt;&lt;span class="p"&gt;&amp;#34;}[&lt;/span&gt;&lt;span class="s"&gt;5m&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;결과&lt;/strong&gt;: &lt;code&gt;/orders&lt;/code&gt; POST 엔드포인트가 느림&lt;/p&gt;</description></item><item><title>메트릭 카디널리티 최적화</title><link>https://advanced-beginner.github.io/ko/docs/observability/howto/reduce-cardinality/</link><pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate><author>d8lzz1gpw@mozmail.com (kimbenji)</author><guid>https://advanced-beginner.github.io/ko/docs/observability/howto/reduce-cardinality/</guid><description>&lt;blockquote class='book-hint '&gt;
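&lt;p&gt;&lt;strong&gt;Situation&lt;/strong&gt;: Prometheus memory and storage usage spike, and queries are slow
&lt;strong&gt;Goal&lt;/strong&gt;: Cut costs by removing unnecessary time series
&lt;strong&gt;Time required&lt;/strong&gt;: 30 minutes to 1 hour (depending on how complex the analysis and fixes are)
&lt;strong&gt;Success criterion&lt;/strong&gt;: The series count drops below the target and memory usage stabilizes&lt;/p&gt;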
&lt;/blockquote&gt;&lt;h2 id="문제-상황"&gt;문제 상황&lt;a class="anchor" href="#%eb%ac%b8%ec%a0%9c-%ec%83%81%ed%99%a9"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Alert: PrometheusHighCardinality
Active Series: 2,500,000 (Threshold: 1,000,000)
Memory Usage: 32GB&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="카디널리티란"&gt;What Is Cardinality?&lt;a class="anchor" href="#%ec%b9%b4%eb%94%94%eb%84%90%eb%a6%ac%ed%8b%b0%eb%9e%80"&gt;#&lt;/a&gt;&lt;/h2&gt;
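&lt;p&gt;&lt;strong&gt;Cardinality = the number of unique time series&lt;/strong&gt;, that is, the number of distinct combinations of metric name and label values.&lt;/p&gt;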
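&lt;p&gt;Cardinality can be measured directly with PromQL. A minimal sketch, assuming the &lt;code&gt;http_requests_total&lt;/code&gt; metric and the &lt;code&gt;path&lt;/code&gt; label from the example below; on a server holding millions of series, a selector that matches every series is itself an expensive query, so run it sparingly.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# Number of series with a recent sample (roughly the active series)
count({__name__=~&amp;#34;.+&amp;#34;})

# Top 10 metric names by series count
topk(10, count by (__name__) ({__name__=~&amp;#34;.+&amp;#34;}))

# Distinct path values on one metric (metric and label names assumed)
count(count by (path) (http_requests_total))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the last query returns a number that grows with your user count, the &lt;code&gt;path&lt;/code&gt; label is carrying unbounded values such as IDs.&lt;/p&gt;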
&lt;pre tabindex="0"&gt;&lt;code&gt;http_requests_total{method=&amp;#34;GET&amp;#34;, status=&amp;#34;200&amp;#34;, path=&amp;#34;/api/users&amp;#34;} # 1개
http_requests_total{method=&amp;#34;GET&amp;#34;, status=&amp;#34;200&amp;#34;, path=&amp;#34;/api/users/123&amp;#34;} # 또 1개!
http_requests_total{method=&amp;#34;GET&amp;#34;, status=&amp;#34;200&amp;#34;, path=&amp;#34;/api/users/456&amp;#34;} # 또 1개!&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;문제&lt;/strong&gt;: path에 user_id가 들어가면 사용자 수만큼 시계열 생성&lt;/p&gt;</description></item><item><title>알림 피로도 관리</title><link>https://advanced-beginner.github.io/ko/docs/observability/howto/manage-alert-fatigue/</link><pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate><author>d8lzz1gpw@mozmail.com (kimbenji)</author><guid>https://advanced-beginner.github.io/ko/docs/observability/howto/manage-alert-fatigue/</guid><description>&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&lt;strong&gt;Situation&lt;/strong&gt;: Dozens to hundreds of alerts fire every day, and important ones get missed
&lt;strong&gt;Goal&lt;/strong&gt;: Receive only alerts that actually require action
&lt;strong&gt;Time required&lt;/strong&gt;: 1 to 2 hours (analyzing and revising alert rules)
&lt;strong&gt;Success criterion&lt;/strong&gt;: Daily alert volume drops to a manageable level (e.g., 10 or fewer)&lt;/p&gt;
&lt;/blockquote&gt;&lt;h2 id="시작하기-전에"&gt;시작하기 전에&lt;a class="anchor" href="#%ec%8b%9c%ec%9e%91%ed%95%98%ea%b8%b0-%ec%a0%84%ec%97%90"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id="필요-환경"&gt;필요 환경&lt;a class="anchor" href="#%ed%95%84%ec%9a%94-%ed%99%98%ea%b2%bd"&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;Version&lt;/th&gt;
 &lt;th&gt;How to check&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Prometheus&lt;/td&gt;
 &lt;td&gt;2.40+&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;prometheus --version&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Alertmanager&lt;/td&gt;
 &lt;td&gt;0.25+&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;alertmanager --version&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;amtool&lt;/td&gt;
 &lt;td&gt;0.25+&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;amtool --version&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="필요-권한"&gt;필요 권한&lt;a class="anchor" href="#%ed%95%84%ec%9a%94-%ea%b6%8c%ed%95%9c"&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;ul&gt;
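&lt;li&gt;Write access to the Prometheus configuration file (&lt;code&gt;prometheus.yml&lt;/code&gt;) and the alert rule files it references&lt;/li&gt;
&lt;li&gt;Write access to the Alertmanager configuration file (&lt;code&gt;alertmanager.yml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Permission to restart or reload Prometheus and Alertmanager&lt;/li&gt;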
&lt;/ul&gt;
&lt;h3 id="환경-확인"&gt;환경 확인&lt;a class="anchor" href="#%ed%99%98%ea%b2%bd-%ed%99%95%ec%9d%b8"&gt;#&lt;/a&gt;&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Prometheus 상태 확인&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -s http://localhost:9090/-/healthy &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Prometheus OK&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Alertmanager 상태 확인&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;curl -s http://localhost:9093/-/healthy &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Alertmanager OK&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# amtool 설정 확인&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;amtool config show --alertmanager.url&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:9093&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="문제-상황"&gt;문제 상황&lt;a class="anchor" href="#%eb%ac%b8%ec%a0%9c-%ec%83%81%ed%99%a9"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# 어제 발생한 알림 요약
Critical: 15건 (HighCPU 8건, HighMemory 7건)
Warning: 87건 (SlowResponse 45건, HighLatency 32건, PodRestart 10건)
총: 102건

# 실제 장애: 1건
# 놓친 알림: 1건 (HighCPU 알림에 묻힘)&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;알림 피로도(Alert Fatigue)가 발생하면:&lt;/p&gt;</description></item></channel></rss>