
How Autoscaling Works in Kubernetes and How to Do It

Overview

In this blog, we’ll learn how autoscaling with the Horizontal Pod Autoscaler (HPA) works in Kubernetes and how to configure it.

Why? What? And How?

The first thing to understand is the WHY.

Simple answer: when traffic increases beyond normal, we need to make sure all requests are still served. How? Just increase the number of application instances running.

In Kubernetes this is done by the Horizontal Pod Autoscaler (HPA).

Understanding How the HPA Works

In simple terms, the HPA constantly asks the Metrics Server for a pod’s CPU/memory usage; if it is higher than the target we defined, it adds more pods.

What Is This Metrics Server?

It is a cluster-level aggregator of resource usage data. The Metrics Server collects node and pod resource usage from each kubelet and exposes it through the Kubernetes Metrics API (`metrics.k8s.io`), which is what the HPA queries. A quick sanity check: `kubectl top pods` should return live usage numbers.

Control Loop Flow

┌─────────────────────────────────────────────────────────────────┐
│                    HPA Control Loop (every 15s)                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                 ┌────────────────────────┐
                 │ 1. Fetch HPA spec from │
                 │    kube-apiserver      │
                 └───────────┬────────────┘
                             │
                             ▼
                 ┌────────────────────────┐
                 │ 2. Query current       │
                 │    metrics from        │
                 │    Metrics API         │
                 └───────────┬────────────┘
                             │
                             ▼
                 ┌────────────────────────┐
                 │ 3. Calculate desired   │
                 │    replicas using      │
                 │    scaling algorithm   │
                 └───────────┬────────────┘
                             │
                             ▼
                 ┌────────────────────────┐
                 │ 4. Apply scaling       │
                 │    decision            │
                 │    (if needed)         │
                 └───────────┬────────────┘
                             │
                             ▼
                 ┌────────────────────────┐
                 │ 5. Update HPA status   │
                 │    (current replicas,  │
                 │     current metrics)   │
                 └────────────────────────┘
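One iteration of the loop above can be sketched in a few lines of Python. This is a simplified stand-in, not the real controller code: `reconcile` and its `spec` dict are hypothetical names, the fetch/apply steps are reduced to plain arguments and a return value, and the real controller also applies a ~10% tolerance before acting, which this sketch omits.

```python
import math

def reconcile(spec, current_replicas, current_utilization):
    """One pass of a simplified HPA control loop (runs every 15s in Kubernetes)."""
    # Step 3: calculate desired replicas using the scaling algorithm
    desired = math.ceil(current_replicas * (current_utilization / spec["target_utilization"]))
    # Clamp the result to the configured minReplicas/maxReplicas bounds
    desired = max(spec["min_replicas"], min(spec["max_replicas"], desired))
    # Step 4: apply the scaling decision only if it changes anything
    if desired != current_replicas:
        return desired  # the real controller would patch the scale subresource here
    return current_replicas

# 2 replicas at 80% memory against a 50% target
spec = {"target_utilization": 50, "min_replicas": 2, "max_replicas": 10}
print(reconcile(spec, current_replicas=2, current_utilization=80))  # 4
```

Note the clamping step: even if the raw formula asks for more (or fewer) pods, the result never leaves the `minReplicas`/`maxReplicas` window.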

Scaling algorithm

desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))

Example Calculation

Given:

  • Current replicas: 2
  • Target memory: 50%
  • Current memory: 80%

Calculation: desiredReplicas = ceil(2 × (80 / 50)) = ceil(2 × 1.6) = ceil(3.2) = 4

Result: Scale from 2 → 4 pods

When multiple metrics are defined, the HPA calculates a desired replica count for each metric and takes the max() of the results.
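The per-metric calculation and the max() across metrics can be reproduced in a few lines of Python (a sketch for illustration, not the actual controller code; `desired_replicas` is a hypothetical helper name):

```python
import math

def desired_replicas(current_replicas, metrics):
    """metrics: list of (current_value, target_value) pairs.
    Each metric is evaluated independently; the largest result wins."""
    return max(
        math.ceil(current_replicas * (current / target))
        for current, target in metrics
    )

# The example above: 2 replicas, memory at 80% vs a 50% target
print(desired_replicas(2, [(80, 50)]))  # 4

# Two metrics: CPU alone would need 3 replicas, memory needs 4, so 4 wins
print(desired_replicas(2, [(60, 50), (80, 50)]))  # 4
```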

The Interesting Part: How

The HPA can scale a Deployment, ReplicaSet, or StatefulSet, but not bare Pods, because a Pod has no replicas field.

Before we do this, make sure the Metrics Server is enabled in your cluster; if `kubectl top nodes` returns usage numbers, it is working.

Just identify what needs to be scaled and use the manifest below to configure it:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-collector-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-collector
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50

Here `metadata.name` is the HPA’s name; in `spec.scaleTargetRef` we define the target to scale; then we set the min and max replica counts and, under `metrics`, the basis on which it should be scaled.

That’s it, just a simple config. Apply it with `kubectl apply -f` and check its status with `kubectl get hpa`.

Notes

  • HPA can use custom metrics emitted by a pod/application
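For example, with a custom-metrics adapter installed (such as prometheus-adapter, which serves the `custom.metrics.k8s.io` API), the `metrics` section of the manifest above could target a Pods-type metric instead of CPU/memory. The metric name here is illustrative; it must match whatever your adapter actually exposes:

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # illustrative custom metric name
      target:
        type: AverageValue
        averageValue: "100"
```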

Further Readings

This post is licensed under CC BY 4.0 by the author.