Scaling and Managing Workloads in Kubernetes
In this continuation of our Kubernetes blog series, we delve into scaling and managing workloads in Kubernetes. If you haven't already done so, check out the previous articles in this series before continuing.
Scaling applications is a key feature of Kubernetes, ensuring that your applications can handle varying loads efficiently. We’ll explore both manual and automatic scaling using the same deployment that we worked on in the previous episode.
Here’s that deployment file in its entirety, for your quick reference:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: hello-world
        image: [YourDockerHubUsername]/hello-k8s
        ports:
        - containerPort: 80
In this spec, you’ll notice that we asked Kubernetes to set up two replicas of our application. You can verify this by running kubectl get pods in your terminal.
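If both replicas are up, the output will look something like this (the pod name suffixes are generated by Kubernetes, so yours will differ):

NAME                           READY   STATUS    RESTARTS   AGE
hello-world-6d8f5c7b9d-4xkpl   1/1     Running   0          2m
hello-world-6d8f5c7b9d-tq2m8   1/1     Running   0          2m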
Manual Scaling
First, let’s scale our application manually. In order to do so, run the following command in the terminal:
kubectl scale deployment/hello-world --replicas=5
Next, verify the update to your deployment:
kubectl get deployment hello-world
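The deployment should now report five desired and five ready replicas, along these lines (ages will vary):

NAME          READY   UP-TO-DATE   AVAILABLE   AGE
hello-world   5/5     5            5           10m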
Did Kubernetes react accordingly? Try running another kubectl get pods command.
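This time the listing should include five hello-world pods, similar to the following (again, your pod names and ages will differ):

NAME                           READY   STATUS    RESTARTS   AGE
hello-world-6d8f5c7b9d-4xkpl   1/1     Running   0          12m
hello-world-6d8f5c7b9d-tq2m8   1/1     Running   0          12m
hello-world-6d8f5c7b9d-9znqc   1/1     Running   0          1m
hello-world-6d8f5c7b9d-h7v4d   1/1     Running   0          1m
hello-world-6d8f5c7b9d-m2wfj   1/1     Running   0          1m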
Awesome! We now have five replicas of our hello-world app running! You can probably start to appreciate the power of Kubernetes now. With a few simple commands we were able to scale up our application to service more traffic.
Auto-Scaling with Horizontal Pod Autoscaler (HPA)
While adding and removing replicas manually is impressive, we can do a lot more. Automatic scaling in Kubernetes can be achieved with the Horizontal Pod Autoscaler (HPA). We can configure it to automatically scale up or down based on observed metrics such as CPU or memory utilization. Let’s try this out.
First, we need a tool to collect metrics from our resources and pass them to the autoscaler so that it can add or remove replicas as our configuration dictates. This tool is aptly called the Kubernetes Metrics Server. To install it in our local Kubernetes cluster, let’s grab the YAML manifest from the project’s GitHub repo. However, instead of applying the template directly from GitHub, let’s download it and make one adjustment to get it working in our local environment. Here are the contents of that manifest file, below:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls # Adding this to ignore certificate errors for local explorations. Do not do this in prod.
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
The one adjustment that I made to the YAML manifest above is to add an extra argument to the metrics-server container spec: --kubelet-insecure-tls. This flag instructs the metrics server not to validate the TLS certificates presented by the kubelets. In production you would use certificates issued by a proper certificate authority, but for the purposes of this simple demo and our learning, we’ll bypass that check.
Save the above file as metrics.yaml and apply the configuration:
kubectl apply -f metrics.yaml
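If the apply succeeds, kubectl will report each object it created, along these lines:

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created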
You can verify your installation with the following command:
kubectl get deployment metrics-server -n kube-system
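If the installation worked, the metrics-server deployment should report as available within a minute or so, and kubectl top should start returning live numbers. The values below are just illustrative:

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           90s

kubectl top pods

NAME                           CPU(cores)   MEMORY(bytes)
hello-world-6d8f5c7b9d-4xkpl   1m           3Mi
hello-world-6d8f5c7b9d-tq2m8   1m           3Mi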
Next, we’ll create a Horizontal Pod Autoscaler (HPA) resource. Create a new YAML file named hello-world-hpa.yaml, add the following contents, and save it.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hello-world-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-world
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
In the manifest above, we’re instructing Kubernetes to provision a Horizontal Pod Autoscaler and target the hello-world application (the deployment of it, to be more precise) that we created and deployed in our previous session. The targetCPUUtilizationPercentage of 50 instructs the HPA to add replicas when the average CPU utilization across our pods crosses 50% of what each pod has requested. Well, how much CPU and memory is our hello-world app allowed to request? What is that set to? We haven’t done that yet; we’ll get to it in a moment.
First, let’s create the HPA by issuing the following command:

kubectl apply -f hello-world-hpa.yaml
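If you check the HPA right away, you’ll likely see <unknown> in the TARGETS column. That’s because our deployment doesn’t declare any CPU requests yet, so the autoscaler has nothing to compute a percentage against (sample output; your values will differ):

kubectl get hpa hello-world-hpa

NAME              REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
hello-world-hpa   Deployment/hello-world   <unknown>/50%   2         10        2          30s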
Remember the deployment that we did in part 2 of this series (hello-deployment.yaml)? Let’s reopen that manifest file and add resource requests to it, as shown below.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: hello-world
        image: tvaidyan/hello-k8s
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 10m
            memory: 5Mi
Here, we’re specifying resource requests for our hello-world containers: 10m (10 millicores, i.e. 1% of a CPU core) of CPU and 5Mi (5 mebibytes) of RAM. Requests tell the scheduler how much each pod needs, and they also give the HPA the baseline it uses to calculate CPU utilization percentages. Let’s apply these changes.
kubectl apply -f hello-deployment.yaml
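You can confirm that the updated pods rolled out and that the HPA now reports a real utilization figure (sample output; your numbers will vary):

kubectl rollout status deployment/hello-world
deployment "hello-world" successfully rolled out

kubectl get hpa hello-world-hpa
NAME              REFERENCE                TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hello-world-hpa   Deployment/hello-world   0%/50%    2         10        2          5m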
With these configurations in place, let’s test our auto-scaling. We can generate some artificial load against our containers by spinning up a temporary pod running busybox:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
Inside the shell of the load-generator pod, run:
while true; do wget -q -O- http://hello-world-deployment; done
Here we’re creating an endless loop that keeps requesting our hello-world app through the Service that exposes it (if you named your Service differently in the previous episode, use that name in the URL). While that’s running, open a new terminal window and issue the following command to monitor the auto-scaling:
kubectl get hpa hello-world-hpa --watch
You should see the number of replicas automatically increase as the load goes up.
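After a minute or two of sustained load, you should see the TARGETS column climb past 50% and the REPLICAS count grow, roughly like this (illustrative values):

NAME              REFERENCE                TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
hello-world-hpa   Deployment/hello-world   0%/50%     2         10        2          10m
hello-world-hpa   Deployment/hello-world   68%/50%    2         10        2          11m
hello-world-hpa   Deployment/hello-world   68%/50%    2         10        3          11m
hello-world-hpa   Deployment/hello-world   24%/50%    2         10        3          12m

When you’re done, press Ctrl+C in the load-generator shell to stop the loop and type exit to leave it (the --rm flag removes the pod once you exit). With the load gone, the HPA will gradually scale the deployment back down toward minReplicas after a few minutes.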
Conclusion
Congratulations! You have successfully learned how to manually and automatically scale applications in Kubernetes. This knowledge is crucial for managing the performance and efficiency of your applications in a Kubernetes environment.
In our next articles, we will explore other advanced Kubernetes features and practices. Stay tuned!