Target Audience: Backend developers who want to configure auto-scaling in Kubernetes
Prerequisites: Deployment, resource management concepts
After reading this: You will understand auto-scaling methods using HPA and VPA
TL;DR
- HPA (Horizontal Pod Autoscaler) automatically adjusts Pod count
- VPA (Vertical Pod Autoscaler) automatically adjusts Pod resource requests
- In most cases, consider HPA first
Scaling Methods Comparison
Kubernetes offers two ways to scale Pods automatically.
| Method | Description | Suitable For |
|---|---|---|
| Horizontal scaling (HPA) | Adjust Pod count | Stateless applications |
| Vertical scaling (VPA) | Adjust Pod resources | Stateful, resource optimization |
```mermaid
flowchart LR
    subgraph Horizontal[Horizontal Scaling]
        H1[Pod] --> H2[Pod]
        H2 --> H3[Pod]
    end
    subgraph Vertical[Vertical Scaling]
        V1[Pod<br>256Mi] --> V2[Pod<br>512Mi]
    end
```

HPA (Horizontal Pod Autoscaler)
HPA automatically adjusts Pod count based on metrics (CPU, memory, custom).
How HPA Works
```mermaid
flowchart LR
    MS[Metrics Server] -->|collect metrics| HPA
    HPA -->|current vs target| CALC[Calculate]
    CALC -->|scale decision| D[Deployment]
    D --> P1[Pod]
    D --> P2[Pod]
    D --> PN[Pod N]
```

HPA operation sequence:
| Step | Action |
|---|---|
| 1 | Query current metrics from the Metrics API served by Metrics Server (every 15s by default) |
| 2 | Compare target metric with current metric |
| 3 | Calculate required replica count |
| 4 | Adjust Deployment’s replicas |
Creating an HPA
Create it with a command:
```bash
kubectl autoscale deployment my-app \
  --cpu-percent=50 \
  --min=2 \
  --max=10
```

Create it with YAML:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Key fields explained:
| Field | Description |
|---|---|
| scaleTargetRef | Scaling target (Deployment, etc.) |
| minReplicas | Minimum Pod count |
| maxReplicas | Maximum Pod count |
| metrics | Scaling criteria metrics |
| averageUtilization | Target average utilization, as a percentage of the Pods' resource requests |
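The `metrics` list is not limited to CPU and memory. The manifest below is a minimal sketch that scales on a per-Pod request rate. It assumes a custom metrics adapter (for example, Prometheus Adapter) is installed and serves the `custom.metrics.k8s.io` API. The metric name `http_requests_per_second` and the target of 100 are hypothetical placeholders, not values from this guide.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric exposed by a custom metrics adapter
      target:
        type: AverageValue
        averageValue: "100"              # hypothetical target: average of 100 req/s per Pod
```

If no adapter serves this metric, the HPA cannot read it and will not scale on it.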
Multi-Metric HPA
An HPA that scales on both CPU and memory:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

With multiple metrics, HPA calculates a replica count for each metric and applies the largest one.
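Here is how the largest count wins, using a hypothetical state: 3 Pods at 70% CPU (target 50%) and 60% memory (target 70%). The rule ceil(currentReplicas × currentMetric / targetMetric) is applied to each metric:

$$
\text{desiredReplicas} = \max\left(\left\lceil 3 \times \tfrac{70}{50} \right\rceil,\ \left\lceil 3 \times \tfrac{60}{70} \right\rceil\right) = \max\left(\lceil 4.2 \rceil,\ \lceil 2.57 \rceil\right) = \max(5,\ 3) = 5
$$

Memory alone would shrink the Deployment to 3 Pods, but CPU calls for 5, so HPA runs 5.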
HPA Scaling Calculation
HPA calculates the required replica count with this formula:
```
desiredReplicas = ceil(currentReplicas × (currentMetric / targetMetric))
```

Examples:
| Current State | Calculation | Result |
|---|---|---|
| 3 Pods, CPU 70%, target 50% | 3 × (70/50) = 4.2 | 5 Pods |
| 5 Pods, CPU 30%, target 50% | 5 × (30/50) = 3 | 3 Pods |
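The formula omits one detail: the controller skips scaling when the ratio currentMetric / targetMetric is close to 1. The default tolerance is 0.1, and it can be changed with the kube-controller-manager flag `--horizontal-pod-autoscaler-tolerance`. A hypothetical example with 4 Pods at 53% CPU against a 50% target:

$$
\left|\tfrac{53}{50} - 1\right| = 0.06 \le 0.1 \;\Rightarrow\; \text{desiredReplicas} = 4 \quad \left(\text{not } \lceil 4 \times 1.06 \rceil = 5\right)
$$

This keeps small metric fluctuations from adding or removing Pods.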
Controlling Scaling Behavior
The `behavior` field limits how quickly HPA scales in each direction. This prevents the replica count from flapping when metrics fluctuate.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # act on the highest recommendation from the last 5 minutes
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60              # remove at most 10% of Pods per minute
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up as soon as metrics call for it
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15              # at most double the Pod count every 15s
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Comparing scale-up and scale-down policies:
| Item | Scale Up | Scale Down |
|---|---|---|
| Speed | Fast | Slow |
| Reason | Respond to traffic surge | Prevent unnecessary scale down |
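Each direction can also list several policies and choose between them with `selectPolicy`. The fragment below is illustrative, not a recommendation. Scale-up combines a percentage and a fixed Pod count and takes whichever allows the bigger change. Scale-down is disabled entirely, a hypothetical choice for workloads you prefer to shrink by hand. To use it, replace the `behavior` block under `spec` of an HPA like the one above.

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100            # allow doubling the Pod count...
      periodSeconds: 15
    - type: Pods
      value: 4              # ...or adding 4 Pods...
      periodSeconds: 15
    selectPolicy: Max       # ...whichever permits the larger increase in each period
  scaleDown:
    selectPolicy: Disabled  # never remove Pods automatically
```

The scale-up values here match the Kubernetes defaults. Setting `selectPolicy: Min` instead picks the policy that allows the smaller change.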
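VPA (Vertical Pod Autoscaler)
VPA automatically adjusts Pod resource requests. By default, it also scales limits in proportion to the requests.
VPA is not installed by default. Install it separately by following the instructions in the VPA project on GitHub (the vertical-pod-autoscaler directory of the kubernetes/autoscaler repository).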
VPA Components
| Component | Role |
|---|---|
| Recommender | Analyze resource usage, calculate recommendations |
| Updater | Evict Pods whose requests differ significantly from the recommendation, so they restart with new resources |
| Admission Controller | Set the recommended resource requests on Pods as they are created |
VPA Example
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        memory: "128Mi"
        cpu: "100m"
      maxAllowed:
        memory: "2Gi"
        cpu: "2"
```

Comparing updateMode options:
| Mode | Behavior |
|---|---|
| Off | Provide recommendations only, don’t apply |
| Initial | Apply only when creating new Pods |
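| Recreate | Apply on Pod creation, and evict existing Pods so they restart with new resources |
| Auto | Apply on creation and update existing Pods using the preferred mechanism (currently equivalent to Recreate) |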
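When the right resource values are unknown, a recommendation-only VPA is a low-risk starting point. The manifest below is a minimal sketch that assumes VPA is installed and targets the same `my-app` Deployment. The name `my-app-vpa-recommender` is hypothetical. Use it in place of the Auto-mode VPA above, not alongside it, since two VPA objects should not target the same Deployment.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommender   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # quoted so YAML parsers don't read it as a boolean; recommend only, never change Pods
```

Recommendations then appear in the object's status, for example in the output of `kubectl describe vpa my-app-vpa-recommender`.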
HPA vs VPA Selection Guide
Which scaling method should you choose? Use this flowchart as a guide.
```mermaid
flowchart TD
    START[Need scaling] --> Q1{Application<br>type?}
    Q1 -->|Stateless<br>web server, API| Q2{Traffic fluctuation<br>large?}
    Q1 -->|Stateful<br>DB, cache| VPA_REC[VPA recommended]
    Q2 -->|Yes| HPA_REC[HPA recommended]
    Q2 -->|No| Q3{Resource configuration<br>appropriate?}
    Q3 -->|Don't know| VPA_OFF[Check recommendations with VPA<br>updateMode: Off]
    Q3 -->|Need optimization| VPA_REC
    HPA_REC --> Q4{Resource optimization<br>also needed?}
    Q4 -->|Yes| BOTH[Use HPA + VPA together<br>HPA for CPU, VPA for memory]
    Q4 -->|No| HPA_ONLY[Use HPA only]
    VPA_REC --> VPA_ONLY[Use VPA]
    VPA_OFF --> MANUAL[Adjust settings manually]
```

| Criteria | HPA | VPA |
|---|---|---|
| Stateless applications | ✓ | |
| Stateful applications | | ✓ |
| Respond to traffic fluctuation | ✓ | |
| Resource optimization | | ✓ |
| Immediate application | ✓ | ✗ (requires Pod restart) |
Real-World Selection Examples
| Situation | Recommended | Reason |
|---|---|---|
| REST API server, high traffic fluctuation | HPA | Respond quickly with Pod count |
| Batch job server | VPA | Optimize resource usage |
| Server with limited DB connections | HPA (with a carefully chosen maxReplicas) | Total DB connections grow with Pod count, so maxReplicas must keep them under the DB limit |
| New service, don’t know appropriate resources | VPA (Off mode) | Collect recommendations then configure |
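For the DB-connection case, the connection budget gives a rough upper bound on `maxReplicas`. This example assumes, hypothetically, a database limit of 200 connections and a pool of 20 connections per Pod:

$$
\text{maxReplicas} \le \left\lfloor \frac{\text{DB connection limit}}{\text{pool size per Pod}} \right\rfloor = \left\lfloor \frac{200}{20} \right\rfloor = 10
$$

In practice, leave headroom below this bound. A rolling update can briefly run extra Pods (up to `maxSurge`), and other clients may share the same database.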
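Using HPA and VPA Together
Take care when applying HPA and VPA to the same Deployment. HPA computes utilization relative to resource requests, so if VPA changes the requests of the resource HPA scales on, the two controllers work against each other. Avoid having both act on the same resource. A common split is to let VPA manage only memory while HPA scales on CPU.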
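The manifest below sketches that split as a variant of the VPA example above. It assumes the CPU-only `my-app-hpa` from earlier is already in place and that the container is named `app`. `controlledResources` restricts VPA to memory, so CPU requests stay fixed, and with them HPA's utilization baseline.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]   # VPA leaves CPU requests untouched
      minAllowed:
        memory: "128Mi"
      maxAllowed:
        memory: "2Gi"
```

With this split, the HPA should target CPU only. An HPA that also scales on memory, like the multi-metric example, would conflict with this VPA.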
Prerequisites
HPA requires the following:
| Requirement | Description |
|---|---|
| Metrics Server | Must be installed in the cluster to serve CPU/memory metrics |
| Resource requests | Containers must set CPU/memory requests, because Utilization targets are percentages of the request |
Check Metrics Server Installation
```bash
# Check that Metrics Server is working
kubectl top nodes
kubectl top pods
```

If you get `error: Metrics API not available`, install Metrics Server.
```bash
# Minikube
minikube addons enable metrics-server

# Other environments
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

Practice: Configuring and Testing HPA
Deploy Test Application
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example   # k8s.gcr.io is frozen; registry.k8s.io replaces it
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    app: php-apache
```

Create HPA
```bash
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
```

Load Test
Generate load in a new terminal:
```bash
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
```

Check HPA Operation
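```bash
# Monitor HPA status
kubectl get hpa php-apache --watch

# Example output (exact columns vary by kubectl version; newer versions
# add an AGE column and may prefix targets with the metric name, e.g. "cpu: 250%/50%"):
# NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS
# php-apache   Deployment/php-apache   0%/50%     1         10        1
# php-apache   Deployment/php-apache   250%/50%   1         10        1
# php-apache   Deployment/php-apache   250%/50%   1         10        5
```

When you stop the load generator (Ctrl+C), the Pod count drops back down once the scale-down stabilization window has passed (5 minutes by default).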
Next Steps
Once you understand scaling, proceed to the next steps:
| Goal | Recommended Doc |
|---|---|
| Configure health checks | Health Checks |
| Resource optimization | Resource Optimization |
| Actual deployment practice | Spring Boot Deployment |