
Workload Autoscaling

Introduction

This tutorial provides a hands-on guide to configuring workload autoscaling in Kubernetes, focusing on concepts and practical skills relevant to the Certified Kubernetes Administrator (CKA) exam. We will cover Horizontal Pod Autoscaling (HPA), which automatically scales the number of pods in a deployment, replication controller, replica set, or stateful set based on observed CPU utilization, memory utilization, or custom metrics.

Prerequisites:

  • A running Kubernetes cluster (minikube, kind, or a cloud-based cluster).
  • kubectl configured to connect to your cluster.
  • The Kubernetes Metrics Server installed; HPA depends on it for CPU and memory metrics (on minikube: minikube addons enable metrics-server).
  • Basic understanding of Kubernetes deployments and services.

Task 1: Deploying a Sample Application

First, we’ll deploy a simple application that we can scale. We’ll use a basic HTTP server.

  1. Create a deployment configuration file named php-apache.yaml:

    NODE_TYPE // yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: php-apache
    spec:
      selector:
        matchLabels:
          run: php-apache
      replicas: 1
      template:
        metadata:
          labels:
            run: php-apache
        spec:
          containers:
          - name: php-apache
            image: k8s.gcr.io/hpa-example
            ports:
            - containerPort: 80
            resources:
              requests:
                cpu: 200m
              limits:
                cpu: 400m
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: php-apache
      labels:
        run: php-apache
    spec:
      ports:
      - port: 80
        protocol: TCP
      selector:
        run: php-apache
      type: LoadBalancer
    The k8s.gcr.io/hpa-example image (also published as registry.k8s.io/hpa-example, since the old registry is frozen) is a simple PHP web server designed for demonstrating autoscaling. The CPU request is essential: HPA computes utilization as a percentage of the pod's requested CPU, so without resources.requests.cpu the autoscaler has nothing to measure against.
  2. Apply the deployment and service:

    NODE_TYPE // bash
    kubectl apply -f php-apache.yaml
    NODE_TYPE // output
    deployment.apps/php-apache created
    service/php-apache created
  3. Verify the deployment and service are running:

    NODE_TYPE // bash
    kubectl get deployment php-apache
    kubectl get service php-apache
    NODE_TYPE // output
    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    php-apache   1/1     1            1           <age>
    
    NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
    php-apache   LoadBalancer   <cluster-ip>   <pending>     80:30713/TCP   <age>
    It may take a few minutes for the service to obtain an EXTERNAL-IP, especially in cloud environments. For minikube, you can use minikube service php-apache to access the service.

Task 2: Creating a Horizontal Pod Autoscaler (HPA)

Now, we’ll create an HPA that automatically scales the php-apache deployment based on CPU utilization.

  1. Create an HPA using kubectl autoscale:

    NODE_TYPE // bash
    kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
    This command creates an HPA targeting the php-apache deployment. The autoscaler adjusts the replica count (between 1 and 10) to keep the average CPU utilization across all pods near 50% of each pod's requested CPU.
  2. Verify the HPA:

    NODE_TYPE // bash
    kubectl get hpa php-apache
    NODE_TYPE // output
    NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   0%/50%    1         10        1          <age>
    The TARGETS column might show <unknown>/50% initially. This is because the metrics server needs time to collect CPU utilization data. Ensure the metrics server is properly installed in your cluster.
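The same autoscaler can also be defined declaratively, which is useful for version control and closer to how manifests appear in exam tasks. Below is a minimal sketch using the autoscaling/v2 API (available since Kubernetes 1.23; the file name php-apache-hpa.yaml is just a suggestion):

```yaml
# php-apache-hpa.yaml - declarative equivalent of the kubectl autoscale command
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:          # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization  # percentage of the pod's CPU request
        averageUtilization: 50
```

Applying it with kubectl apply -f php-apache-hpa.yaml produces the same autoscaler as the kubectl autoscale command above.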

Task 3: Generating Load and Observing Autoscaling

To trigger the autoscaling, we need to generate load on the php-apache service.

  1. Run a load generator in a separate terminal:

    NODE_TYPE // bash
    kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
  2. Inside the load-generator pod, use wget to generate traffic:

    NODE_TYPE // bash
    while true; do wget -q -O- http://php-apache; done

    Because the load-generator pod runs inside the cluster, it can reach the service by its DNS name, http://php-apache. To test from outside the cluster instead, use the service's external IP; with minikube, minikube service php-apache --url prints the URL.

  3. Observe the HPA scaling the deployment:

    NODE_TYPE // bash
    kubectl get hpa php-apache -w

    The -w flag watches for changes. You should see the REPLICAS column increase as the CPU utilization rises above 50%.

    NODE_TYPE // output
    NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   23%/50%   1         10        1          <age>
    php-apache   Deployment/php-apache   67%/50%   1         10        2          <age>
    php-apache   Deployment/php-apache   81%/50%   1         10        3          <age>
    Autoscaling may take a few minutes to kick in: the HPA controller evaluates metrics every 15 seconds by default, and the metrics pipeline itself adds some lag. Be patient and watch the TARGETS and REPLICAS columns.
  4. Verify the number of pods:

    NODE_TYPE // bash
    kubectl get pods -l run=php-apache

    You should see the number of pods increasing.

  5. Stop the load generator by exiting the pod. Type exit in the shell.
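The replica counts in the watch output follow the HPA scaling formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sketch of that arithmetic in shell, using the 67% sample value from the output above:

```shell
# HPA formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
current_replicas=1
current_cpu=67    # observed average CPU utilization (% of requested CPU)
target_cpu=50     # the --cpu-percent target

# integer ceiling division: ceil(a / b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"    # ceil(1 * 67 / 50) = 2
```

So one pod at 67% of a 50% target scales to two replicas; likewise, three pods averaging 81% would yield ceil(3 * 81 / 50) = 5.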

Task 4: Cleaning Up

After you’re done experimenting, clean up the resources:

  1. Delete the HPA:

    NODE_TYPE // bash
    kubectl delete hpa php-apache
  2. Delete the deployment and service:

    NODE_TYPE // bash
    kubectl delete -f php-apache.yaml
  3. Delete the load generator pod:

    NODE_TYPE // bash
    kubectl delete pod load-generator

Conclusion

In this tutorial, you learned how to configure workload autoscaling in Kubernetes using Horizontal Pod Autoscaling (HPA). You deployed a sample application, created an HPA based on CPU utilization, generated load to trigger autoscaling, and observed the scaling process. This hands-on experience provides a solid foundation for understanding and implementing autoscaling in Kubernetes, a crucial skill for the CKA certification. You should now be able to apply these principles to scale your own applications based on various metrics and resource requirements.
