How Autoscaling works in Kubernetes and how to do it
Overview
In this blog, we’ll learn how autoscaling with the Horizontal Pod Autoscaler (HPA) works in Kubernetes and how to set it up.
Why? What? and How?
The first thing to understand is WHY.
Simple answer: when traffic rises above normal, we need to make sure that all the requests are still served. How do we do that? Just increase the number of running copies of the application.
In Kubernetes this is done by the Horizontal Pod Autoscaler (HPA).
Understanding how HPA works
In simple terms, the HPA constantly asks the Metrics Server what the CPU/memory usage of the pods is. If it is more than the target we defined, it adds more pods (and it removes pods again when usage drops).
What is this Metrics Server?
It is a cluster-level aggregator of resource usage data. The Metrics Server collects node and pod resource usage from the kubelet on each node and makes this data available through the Kubernetes Metrics API (metrics.k8s.io), which the HPA queries via the API server.
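To make the "percentage" part concrete, here is a minimal sketch of how raw per-pod usage can be turned into a utilization figure. It assumes utilization is total usage divided by total requested memory across the pods, which is how I understand the upstream controller to behave. The pod names and byte values are made up for illustration.

```python
def average_utilization(usage_bytes, request_bytes):
    """Return utilization as a whole-number percentage of requested memory.

    Both arguments map pod name -> bytes. Pods without a request are
    skipped, because a percentage is undefined for them.
    """
    total_usage = 0
    total_request = 0
    for pod, usage in usage_bytes.items():
        if pod not in request_bytes:
            continue
        total_usage += usage
        total_request += request_bytes[pod]
    if total_request == 0:
        raise ValueError("no pod has a memory request; utilization is undefined")
    return (total_usage * 100) // total_request


MiB = 1024 * 1024
# Hypothetical pods: what they use now vs. what they requested.
usage = {"otel-collector-a": 400 * MiB, "otel-collector-b": 240 * MiB}
requests = {"otel-collector-a": 400 * MiB, "otel-collector-b": 400 * MiB}

print(average_utilization(usage, requests))  # 80
```

The takeaway is that a utilization target only makes sense when the containers declare resource requests.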
Control Loop Flow
┌─────────────────────────────────────────────────────────────────┐
│                  HPA Control Loop (every 15s)                   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                     ┌────────────────────────┐
                     │ 1. Fetch HPA spec from │
                     │    kube-apiserver      │
                     └───────────┬────────────┘
                                 │
                                 ▼
                     ┌────────────────────────┐
                     │ 2. Query current       │
                     │    metrics from        │
                     │    Metrics API         │
                     └───────────┬────────────┘
                                 │
                                 ▼
                     ┌────────────────────────┐
                     │ 3. Calculate desired   │
                     │    replicas using      │
                     │    scaling algorithm   │
                     └───────────┬────────────┘
                                 │
                                 ▼
                     ┌────────────────────────┐
                     │ 4. Apply scaling       │
                     │    decision            │
                     │    (if needed)         │
                     └───────────┬────────────┘
                                 │
                                 ▼
                     ┌────────────────────────┐
                     │ 5. Update HPA status   │
                     │    (current replicas,  │
                     │     current metrics)   │
                     └────────────────────────┘
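The five steps can be sketched as a single function. This is a toy simulation, not the real controller: the HPA spec is a plain dict, and `get_metric`, `get_replicas` and `set_replicas` are hypothetical callables standing in for the Metrics API and kube-apiserver.

```python
import math


def reconcile_once(hpa, get_metric, get_replicas, set_replicas):
    """Run one pass of the loop and return the status recorded in step 5."""
    # Step 1: read the HPA spec (passed in here instead of fetched from kube-apiserver).
    target, low, high = hpa["target"], hpa["minReplicas"], hpa["maxReplicas"]

    # Step 2: query the current metric value (a stand-in for the Metrics API).
    current = get_metric()

    # Step 3: calculate desired replicas, then clamp to the configured bounds.
    replicas = get_replicas()
    desired = math.ceil(replicas * current / target)
    desired = max(low, min(high, desired))

    # Step 4: apply the scaling decision only if something changed.
    if desired != replicas:
        set_replicas(desired)

    # Step 5: report what was observed and decided.
    return {"currentReplicas": replicas, "desiredReplicas": desired, "currentMetric": current}


# A fake "cluster" holding the replica count of one Deployment.
cluster = {"replicas": 2}
hpa = {"target": 50, "minReplicas": 2, "maxReplicas": 10}

status = reconcile_once(
    hpa,
    get_metric=lambda: 80,
    get_replicas=lambda: cluster["replicas"],
    set_replicas=lambda n: cluster.update(replicas=n),
)
print(status)
print(cluster)
```

In a real cluster this pass is repeated by the controller on every sync period (15 seconds by default), so the replica count keeps following the metric.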
Scaling algorithm
desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
Example Calculation
Given:
- Current replicas: 2
- Target memory: 50%
- Current memory: 80%
Calculation: desiredReplicas = ceil(2 × (80 / 50)) = ceil(2 × 1.6) = ceil(3.2) = 4
Result: Scale from 2 → 4 pods
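The same calculation as a small function, as a sketch rather than the controller's actual code. Two things are added that the formula above does not show. The first is clamping to the min/max replica bounds. The second is a tolerance band: by default the controller skips scaling when the ratio is within 10% of 1.0. The parameter names are my own.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=None, tolerance=0.1):
    """Apply ceil(currentReplicas * current / target) with bounds and tolerance."""
    ratio = current_value / target_value
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return max(desired, min_replicas)


print(desired_replicas(2, 80, 50))                  # 4, the example above
print(desired_replicas(2, 52, 50))                  # 2, inside the tolerance band
print(desired_replicas(4, 20, 50))                  # 2, scaling down
print(desired_replicas(8, 90, 50, max_replicas=10)) # 10, capped by max_replicas
```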
When multiple metrics are defined, the HPA calculates the desired replica count for each metric separately and uses the largest one (max()).
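A short sketch of that rule, using made-up CPU and memory numbers. Each metric proposes a replica count, and the largest proposal wins.

```python
import math


def desired_for_metrics(current_replicas, metrics):
    """metrics is a list of (name, current_value, target_value) tuples."""
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, current, target in metrics
    }
    # The metric asking for the most pods decides the outcome.
    winner = max(proposals, key=proposals.get)
    return proposals[winner], winner, proposals


replicas, winner, proposals = desired_for_metrics(
    2, [("memory", 80, 50), ("cpu", 90, 60)]
)
print(proposals)         # {'memory': 4, 'cpu': 3}
print(replicas, winner)  # 4 memory
```

Taking the maximum means no metric is left above its target, at the cost of sometimes running more pods than the other metrics need.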
The Interesting part: How
HPA can scale a Deployment, ReplicaSet or StatefulSet, but not a bare Pod, because a Pod has no replicas field to change.
Before we do this, make sure that the Metrics Server is installed and running in the cluster.
Just identify what needs to be scaled and use the manifest below to configure it.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-collector-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-collector
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50
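Here metadata.name is the name of the HPA, spec.scaleTargetRef defines what to scale, minReplicas and maxReplicas set the lower and upper bounds on the replica count, and metrics defines on what basis it should be scaled. With type: Utilization the target is a percentage of the memory the containers request, so the target pods need resources.requests.memory set.
That’s it, just a simple config.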
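If you need the same HPA for several workloads, the manifest can also be generated. Below is a minimal sketch using only the standard library. `build_hpa` is a hypothetical helper of my own, and the `1 <= minReplicas` check assumes the scale-to-zero feature is not enabled in your cluster.

```python
import json


def build_hpa(name, namespace, target_name, min_replicas, max_replicas,
              resource="memory", average_utilization=50, target_kind="Deployment"):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    # Assumption: scale-to-zero is off, so the lower bound must be at least 1.
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= minReplicas <= maxReplicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": resource,
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": average_utilization,
                    },
                },
            }],
        },
    }


manifest = build_hpa("otel-collector-hpa", "monitoring", "otel-collector", 2, 10)
print(json.dumps(manifest, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output can be saved to a file and applied with kubectl apply -f.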
Notes
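- HPA can also scale on custom metrics emitted by a pod/application, provided a metrics adapter (for example Prometheus Adapter) exposes them through the custom metrics API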