
Workload Autoscaling

Introduction

This tutorial provides a hands-on guide to configuring workload autoscaling in Kubernetes, focusing on concepts and practical skills relevant to the Certified Kubernetes Administrator (CKA) exam. We will cover Horizontal Pod Autoscaling (HPA), which automatically scales the number of pods in a deployment, replication controller, replica set, or stateful set based on observed CPU utilization, memory utilization, or custom metrics.

Prerequisites:

  • A running Kubernetes cluster (minikube, kind, or a cloud-based cluster).
  • kubectl configured to connect to your cluster.
  • The Kubernetes Metrics Server installed; HPA depends on it for CPU and memory metrics (on minikube: minikube addons enable metrics-server).
  • Basic understanding of Kubernetes deployments and services.

Task 1: Deploying a Sample Application

First, we’ll deploy a simple application that we can scale. We’ll use a basic HTTP server.

  1. Create a deployment configuration file named php-apache.yaml:

    NODE_TYPE // yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: php-apache
    spec:
      selector:
        matchLabels:
          run: php-apache
      replicas: 1
      template:
        metadata:
          labels:
            run: php-apache
        spec:
          containers:
          - name: php-apache
            image: k8s.gcr.io/hpa-example
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 200m
              limits:
                cpu: 400m
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: php-apache
      labels:
        run: php-apache
    spec:
      ports:
      - port: 80
        protocol: TCP
      selector:
        run: php-apache
      type: LoadBalancer
    The k8s.gcr.io/hpa-example image (also published as registry.k8s.io/hpa-example, since the old registry is frozen) is a simple PHP web server designed for demonstrating autoscaling. The CPU request is essential: HPA computes utilization as a percentage of the pod's requested CPU, so without resources.requests.cpu the autoscaler has nothing to measure against.
  2. Apply the deployment and service:

    NODE_TYPE // bash
    kubectl apply -f php-apache.yaml
    NODE_TYPE // output
    deployment.apps/php-apache created
    service/php-apache created
  3. Verify the deployment and service are running:

    NODE_TYPE // bash
    kubectl get deployment php-apache
    kubectl get service php-apache
    NODE_TYPE // output
    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    php-apache   1/1     1            1           <age>
    
    NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    php-apache   LoadBalancer   <cluster-ip>   <pending>     80:30713/TCP   <age>
    It may take a few minutes for the service to obtain an EXTERNAL-IP, especially in cloud environments. For minikube, you can use minikube service php-apache to access the service.

Task 2: Creating a Horizontal Pod Autoscaler (HPA)

Now, we’ll create an HPA that automatically scales the php-apache deployment based on CPU utilization.

  1. Create an HPA using kubectl autoscale:

    NODE_TYPE // bash
    kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
    This command creates an HPA targeting the php-apache deployment. The autoscaler adjusts the replica count (between 1 and 10) to keep the average CPU utilization across all pods near 50% of each pod's requested CPU.
  2. Verify the HPA:

    NODE_TYPE // bash
    kubectl get hpa php-apache
    NODE_TYPE // output
    NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   0%/50%    1         10        1          <age>
    The TARGETS column might show <unknown>/50% initially. This is because the metrics server needs time to collect CPU utilization data. Ensure the metrics server is properly installed in your cluster.
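The same autoscaler can also be defined declaratively, which is useful for version control and closer to how manifests appear in exam tasks. Below is a minimal sketch using the autoscaling/v2 API (available since Kubernetes 1.23; the file name php-apache-hpa.yaml is just a suggestion):

```yaml
# php-apache-hpa.yaml - declarative equivalent of the kubectl autoscale command
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:          # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization  # percentage of the pod's CPU request
        averageUtilization: 50
```

Applying it with kubectl apply -f php-apache-hpa.yaml produces the same autoscaler as the kubectl autoscale command above.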

Task 3: Generating Load and Observing Autoscaling

To trigger the autoscaling, we need to generate load on the php-apache service.

  1. Run a load generator in a separate terminal:

    NODE_TYPE // bash
    kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
  2. Inside the load-generator pod, use wget to generate traffic:

    NODE_TYPE // bash
    while true; do wget -q -O- http://php-apache; done

    Because the load-generator pod runs inside the cluster, it can reach the service by its DNS name, http://php-apache. To test from outside the cluster instead, use the service's external IP; with minikube, minikube service php-apache --url prints the URL.

  3. Observe the HPA scaling the deployment:

    NODE_TYPE // bash
    kubectl get hpa php-apache -w

    The -w flag watches for changes. You should see the REPLICAS column increase as the CPU utilization rises above 50%.

    NODE_TYPE // output
    NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   23%/50%   1         10        1          <age>
    php-apache   Deployment/php-apache   67%/50%   1         10        2          <age>
    php-apache   Deployment/php-apache   81%/50%   1         10        3          <age>
    Autoscaling may take a few minutes to kick in: the HPA controller evaluates metrics every 15 seconds by default, and the metrics pipeline itself adds some lag. Be patient and watch the TARGETS and REPLICAS columns.
  4. Verify the number of pods:

    NODE_TYPE // bash
    kubectl get pods -l run=php-apache

    You should see the number of pods increasing.

  5. Stop the load generator by exiting the pod. Type exit in the shell.
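The replica counts in the watch output follow the HPA scaling formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sketch of that arithmetic in shell, using the 67% sample value from the output above:

```shell
# HPA formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
current_replicas=1
current_cpu=67    # observed average CPU utilization (% of requested CPU)
target_cpu=50     # the --cpu-percent target

# integer ceiling division: ceil(a / b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"    # ceil(1 * 67 / 50) = 2
```

So one pod at 67% of a 50% target scales to two replicas; likewise, three pods averaging 81% would yield ceil(3 * 81 / 50) = 5.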

Task 4: Cleaning Up

After you’re done experimenting, clean up the resources:

  1. Delete the HPA:

    NODE_TYPE // bash
    kubectl delete hpa php-apache
  2. Delete the deployment and service:

    NODE_TYPE // bash
    kubectl delete -f php-apache.yaml
  3. Delete the load generator pod:

    NODE_TYPE // bash
    kubectl delete pod load-generator

Conclusion

In this tutorial, you learned how to configure workload autoscaling in Kubernetes using Horizontal Pod Autoscaling (HPA). You deployed a sample application, created an HPA based on CPU utilization, generated load to trigger autoscaling, and observed the scaling process. This hands-on experience provides a solid foundation for understanding and implementing autoscaling in Kubernetes, a crucial skill for the CKA certification. You should now be able to apply these principles to scale your own applications based on various metrics and resource requirements.
