Target Audience: Backend developers who want to configure auto-scaling in Kubernetes
Prerequisites: Deployment, resource management concepts
After reading this: You will understand auto-scaling methods using HPA and VPA

TL;DR
  • HPA (Horizontal Pod Autoscaler) automatically adjusts Pod count
  • VPA (Vertical Pod Autoscaler) automatically adjusts Pod resource requests
  • In most cases, consider HPA first

Scaling Methods Comparison

Kubernetes provides two scaling methods.

| Method | Description | Suitable For |
|---|---|---|
| Horizontal scaling (HPA) | Adjust Pod count | Stateless applications |
| Vertical scaling (VPA) | Adjust Pod resources | Stateful, resource optimization |

flowchart LR
    subgraph Horizontal[Horizontal Scaling]
        H1[Pod] --> H2[Pod]
        H2 --> H3[Pod]
    end
    subgraph Vertical[Vertical Scaling]
        V1[Pod<br>256Mi] --> V2[Pod<br>512Mi]
    end

HPA (Horizontal Pod Autoscaler)

HPA automatically adjusts Pod count based on metrics (CPU, memory, custom).

HPA Operation Principle

flowchart LR
    MS[Metrics Server] -->|collect metrics| HPA
    HPA -->|current vs target| CALC[Calculate]
    CALC -->|scale decision| D[Deployment]
    D --> P1[Pod]
    D --> P2[Pod]
    D --> PN[Pod N]

HPA operation sequence:

| Step | Action |
|---|---|
| 1 | Collect current metrics from Metrics Server (15s interval) |
| 2 | Compare target metric with current metric |
| 3 | Calculate required replica count |
| 4 | Adjust Deployment’s replicas |

Creating HPA

Create with command:

kubectl autoscale deployment my-app \
  --cpu-percent=50 \
  --min=2 \
  --max=10

Create with YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Key fields explained:

| Field | Description |
|---|---|
| scaleTargetRef | Scaling target (Deployment, etc.) |
| minReplicas | Minimum Pod count |
| maxReplicas | Maximum Pod count |
| metrics | Scaling criteria metrics |
| averageUtilization | Target utilization (%) |

Multi-Metric HPA

The following HPA scales on both CPU and memory utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

With multiple metrics, the largest replica count calculated from each metric is applied.
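
As a rough illustration of that rule, the sketch below applies the replica formula ceil(currentReplicas × currentMetric / targetMetric) to each metric and keeps the larger result. The helper names and the sample utilization values (60% CPU and 90% memory on 4 Pods) are assumptions for illustration; the 50% and 70% targets come from the manifest above.

```shell
#!/bin/sh
# ceil(replicas * current / target) with integer arithmetic (hypothetical helper).
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Keep the larger of the CPU and memory proposals.
# Arguments: replicas cpu_now cpu_target mem_now mem_target
desired_for_cpu_and_memory() {
  by_cpu=$(desired_replicas "$1" "$2" "$3")
  by_mem=$(desired_replicas "$1" "$4" "$5")
  if [ "$by_cpu" -gt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

# 4 Pods at 60% CPU (target 50) and 90% memory (target 70):
# CPU proposes 5, memory proposes 6, so the result is 6.
desired_for_cpu_and_memory 4 60 50 90 70
```

In this example memory drives the decision even though CPU is also above its target.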

HPA Scaling Calculation

HPA calculates required replicas with this formula:

desiredReplicas = ceil(currentReplicas × (currentMetric / targetMetric))

Examples:

| Current State | Calculation | Result |
|---|---|---|
| 3 Pods, CPU 70%, target 50% | 3 × (70/50) = 4.2 | 5 Pods |
| 5 Pods, CPU 30%, target 50% | 5 × (30/50) = 3 | 3 Pods |
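
The formula can be checked with integer shell arithmetic. This is a minimal sketch, not the controller's implementation: the function name desired_replicas is hypothetical, metrics are passed as whole percentages, and it ignores the minReplicas/maxReplicas bounds and the small tolerance the controller applies around the target.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# (hypothetical helper) ceil(a * b / c) using integer math: (a*b + c - 1) / c
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

desired_replicas 3 70 50   # 3 x 70/50 = 4.2, rounded up to 5
desired_replicas 5 30 50   # 5 x 30/50 = 3
```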

Controlling Scaling Behavior

The behavior field limits how fast the HPA scales up or down, which prevents abrupt swings in Pod count.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 minute stabilization
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60  # Max 10% decrease per minute
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15  # Max 100% increase per 15s
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Comparing scale up/down policies:

| Item | Scale Up | Scale Down |
|---|---|---|
| Speed | Fast | Slow |
| Reason | Respond to traffic surge | Prevent unnecessary scale down |
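
To see what Percent policies like the ones above permit, the sketch below computes the replica bounds for a single policy period. It is a simplification under stated assumptions: the function names are hypothetical, each call covers one periodSeconds window, and the rounding (Pods removed rounded up, scale-up limit rounded up) follows the worked example in the Kubernetes HPA documentation and should be treated as an assumption rather than a guaranteed contract.

```shell
#!/bin/sh
# Lowest replica count reachable in one period under a Percent scaleDown policy.
# Integer division truncates, so the number of Pods removed is rounded up (assumption).
scale_down_floor() {     # args: replicas_at_period_start percent
  echo $(( $1 * (100 - $2) / 100 ))
}

# Highest replica count reachable in one period under a Percent scaleUp policy,
# rounded up (assumption).
scale_up_ceiling() {     # args: replicas_at_period_start percent
  echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

scale_down_floor 10 10    # 9: at most 1 Pod removed per 60s
scale_down_floor 72 10    # 64: 7.2 Pods rounds up to 8 removed
scale_up_ceiling 3 100    # 6: replicas can double every 15s
```

The actual result also stays within minReplicas and maxReplicas, and scale-down is further delayed by the stabilization window.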

VPA (Vertical Pod Autoscaler)

VPA automatically adjusts Pod resource requests and, by default, scales limits proportionally.

Note: VPA is not installed by default. Install it separately; see the VPA GitHub repository.

VPA Components

| Component | Role |
|---|---|
| Recommender | Analyze resource usage and calculate recommendations |
| Updater | Evict Pods so they restart with the recommended resources |
| Admission Controller | Inject recommended resources when new Pods are created |

VPA Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        memory: "128Mi"
        cpu: "100m"
      maxAllowed:
        memory: "2Gi"
        cpu: "2"

Comparing updateMode options:

| Mode | Behavior |
|---|---|
| Off | Provide recommendations only, don’t apply |
| Initial | Apply only when creating new Pods |
| Recreate | Restart existing Pods to apply |
| Auto | Automatically select most suitable method |
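
One way to try the Off mode is to generate a recommendation-only VPA manifest and pipe it to kubectl. The sketch below assumes VPA is already installed in the cluster. The helper name vpa_off_manifest and the <deployment>-vpa naming are illustrative choices, not conventions required by VPA.

```shell
#!/bin/sh
# Print a recommendation-only VPA manifest for the given Deployment name.
# (hypothetical helper; nothing is applied to the cluster by this function)
vpa_off_manifest() {
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $1-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  updatePolicy:
    updateMode: "Off"
EOF
}

# Usage (requires a cluster with VPA installed):
#   vpa_off_manifest my-app | kubectl apply -f -
#   kubectl describe vpa my-app-vpa   # recommendations appear in the status
```

Because the mode is Off, running Pods are left untouched. The recommendations can be copied into the Deployment's resource requests manually.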

HPA vs VPA Selection Guide

Which scaling method should you choose? The following flowchart walks through the decision.

flowchart TD
    START[Need scaling] --> Q1{Application<br>type?}

    Q1 -->|Stateless<br>web server, API| Q2{Traffic fluctuation<br>large?}
    Q1 -->|Stateful<br>DB, cache| VPA_REC[VPA recommended]

    Q2 -->|Yes| HPA_REC[HPA recommended]
    Q2 -->|No| Q3{Resource configuration<br>appropriate?}

    Q3 -->|Don't know| VPA_OFF[Check recommendations with VPA<br>updateMode: Off]
    Q3 -->|Need optimization| VPA_REC

    HPA_REC --> Q4{Resource optimization<br>also needed?}
    Q4 -->|Yes| BOTH[Use HPA + VPA together<br>HPA for CPU, VPA for memory]
    Q4 -->|No| HPA_ONLY[Use HPA only]

    VPA_REC --> VPA_ONLY[Use VPA]
    VPA_OFF --> MANUAL[Adjust settings manually]

| Criteria | HPA | VPA |
|---|---|---|
| Stateless applications | ✓ | |
| Stateful applications | | ✓ |
| Respond to traffic fluctuation | ✓ | |
| Resource optimization | | ✓ |
| Immediate application | ✓ | ✗ (requires restart) |

Real Selection Examples

| Situation | Recommended | Reason |
|---|---|---|
| REST API server, high traffic fluctuation | HPA | Respond quickly by changing Pod count |
| Batch job server | VPA | Optimize resource usage |
| Server with limited DB connections | HPA (careful with max) | Total connections grow with Pod count, so cap maxReplicas |
| New service, don’t know appropriate resources | VPA (Off mode) | Collect recommendations, then configure |

Using HPA and VPA Together
When HPA and VPA target the same Deployment, they can conflict if both react to the same metric. A recommended split is to let VPA adjust only memory while HPA scales on CPU.

Prerequisites

To use HPA, the following are required:

| Requirement | Description |
|---|---|
| Metrics Server | Must be installed in the cluster |
| Resource requests | Pods must have CPU/memory requests configured |

Check Metrics Server Installation

# Check Metrics Server operation
kubectl top nodes
kubectl top pods

If these commands fail with "error: Metrics API not available", install Metrics Server.

# Minikube
minikube addons enable metrics-server

# Other environments
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Practice: Configuring and Testing HPA

Deploy Test Application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    app: php-apache

Create HPA

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

Load Test

Generate load in a new terminal:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Check HPA Operation

# Monitor HPA status
kubectl get hpa php-apache --watch

# Expected output:
# NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS
# php-apache   Deployment/php-apache   0%/50%    1         10        1
# php-apache   Deployment/php-apache   250%/50%  1         10        1
# php-apache   Deployment/php-apache   250%/50%  1         10        5

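When you stop the load generator (Ctrl+C), the Pod count gradually returns to the minimum. By default, scale-down waits for a 5-minute stabilization window, so the decrease is not immediate.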
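
The numbers in the expected output above can be reproduced by hand. Utilization is measured relative to the CPU request, so a Pod using its full 500m limit against a 200m request reports 250%. The sketch below is illustrative only: the helper names are hypothetical, and the 500m usage figure is an assumption inferred from the CPU limit in the test Deployment.

```shell
#!/bin/sh
# CPU utilization as a percentage of the request (both in millicores).
utilization_percent() {   # args: usage_m request_m
  echo $(( $1 * 100 / $2 ))
}

# ceil(replicas * current / target) with integer arithmetic.
desired_replicas() {      # args: replicas current_percent target_percent
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

current=$(utilization_percent 500 200)   # 250
desired_replicas 1 "$current" 50         # 5, matching the REPLICAS column
```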


Next Steps

Once you understand scaling, proceed to the next steps:

| Goal | Recommended Doc |
|---|---|
| Configure health checks | Health Checks |
| Resource optimization | Resource Optimization |
| Actual deployment practice | Spring Boot Deployment |