Tuesday, February 23, 2021

Kubernetes Autoscaler

Introduction
There are three autoscalers for Kubernetes. They are as follows:
1. Horizontal Pod Autoscaler : scales the number of Pods
2. Vertical Pod Autoscaler : scales the CPU and memory assigned to Pods
3. Cluster Autoscaler : scales the cluster by adding nodes
Cluster Autoscaler
If the cluster has no resources left for a new Pod, the Horizontal Pod Autoscaler alone is of no use; in that situation the Cluster Autoscaler adds new nodes to the cluster.

Horizontal Pod Autoscaler
The component that decides how many Pods should be running is called the "Horizontal Pod Autoscaler". It evaluates a formula based on CPU and memory usage. The explanation is as follows.
Kubernetes’ Horizontal Pod Autoscaler (HPA) automatically scales the application workload by scaling the number of Pods in deployment (or replication controller, replica set, stateful set), based on observed metrics like CPU utilization, memory consumption, or with custom metrics provided by the application. 

HPA uses the following simple algorithms to determine the scaling decision, and it can scale the deployment within the defined minimum and the maximum number of replicas. 

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
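As a quick worked example of this formula (the numbers are chosen for illustration, not taken from the quoted article): with 3 current replicas, an observed average CPU utilization of 80% and a target of 50%, desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5, so the HPA scales the Deployment up to 5 Pods, bounded by maxReplicas.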
Example - CPU
We do it like this:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hello-world
  namespace: default
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1   # Deployments are served from apps/v1; extensions/v1beta1 was removed
    kind: Deployment
    name: hello-world
  targetCPUUtilizationPercentage: 50
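
The same HPA can also be created imperatively with kubectl. A minimal sketch, assuming a Deployment named hello-world already exists in the default namespace:

kubectl autoscale deployment hello-world --cpu-percent=50 --min=1 --max=10
kubectl get hpa hello-world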

Scale To Zero
Scaling down to zero Pods is not currently supported. The explanation is as follows.
Kubernetes’ default HPA is based on CPU utilization and desiredReplicas never go lower than 1, where CPU utilization cannot be zero for a running Pod. This is the same behavior for memory consumption-based autoscaling, where you cannot achieve scale to zero. However, it is possible to scale into zero replicas if you ignore CPU and memory utilization and consider other metrics to determine whether the application is idle. For example, a workload that only consumes and processes a queue can scale to zero if we can take queue length as a metric and the queue is empty for a given period of time. Of course, there should be other factors to consider like lower latency sensitivity and fast bootup time (warm-up time) of the workload to have a smooth user experience. 

But the current Kubernetes stable release (v1.19) does not support scale to zero, and you can find discussions to support this in the Kubernetes enhancements git repo.
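
To illustrate the queue-based idea from the quote, below is a minimal sketch of a queue-length-driven HPA using the autoscaling/v2beta2 API. It assumes an external metrics adapter that exposes a hypothetical queue_length metric; the names queue-worker and queue_length are made up for illustration. minReplicas: 0 is only accepted when the alpha HPAScaleToZero feature gate is enabled, otherwise the minimum stays at 1.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 0        # requires the alpha HPAScaleToZero feature gate
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_length   # hypothetical metric served by an external metrics adapter
      target:
        type: AverageValue
        averageValue: "5"    # aim for ~5 queued items per Pod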
