Misadventures in Kubernetes: Autoscaling Workers

So far, we’ve built our own custom cluster: we manually configured a Kubernetes Control Plane and joined worker nodes by hand. That was a great way to learn the components, but let’s be honest: setting up every server by hand just doesn’t scale for a real production environment, and it doesn’t give us a resilient cluster either. The main issue boils down to how we set up the initial pool of worker nodes.

If none of this sounds familiar, read our prior post first!

The Problem with Manual Nodes

Right now, our cluster is static, and honestly, it’s a bit of a liability. If worker-1 decides to take an unscheduled vacation and crashes, it’s just gone. Our capacity takes a hit, and we’re stuck in the dark until someone notices and manually provisions a replacement. In a modern setup, having to SSH into every new VM just to run a join command isn’t just tedious; it’s a bottleneck that keeps us from scaling when it actually matters. It also falls short of what we expected Kubernetes to help us with in the first place, doesn’t it?

There are a few things that we need our cluster to be able to do:

Automatic Joining: New nodes should just join the cluster the moment they boot up, no human needed.
Self-Healing: If a node dies, the system should recognize the loss and automatically spin up a healthy replacement.
Smart Scaling: The cluster needs to get bigger when the load increases and shrink back down when things quieten down to save money.

Startup Scripts

The key to our automation is ensuring a new VM joins the cluster automatically when it boots. We can’t be there to SSH in and run kubeadm join every time.

To achieve this, we will use a GCP Startup Script to run the join command for us.
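One detail worth planning for: Compute Engine runs a startup script on every boot, not just the first one, and running kubeadm join on a node that has already joined fails its preflight checks. Below is a minimal sketch of a guard, assuming kubeadm’s default behavior of writing /etc/kubernetes/kubelet.conf once a join succeeds. The function name join_cluster_once is hypothetical, not part of kubeadm.

```sh
#!/bin/sh
# Minimal sketch (assumption): kubeadm writes the kubelet.conf file after a
# successful join, so its presence means this node is already in the cluster.
# join_cluster_once is a hypothetical helper name.

join_cluster_once() {
  kubelet_conf="$1"   # usually /etc/kubernetes/kubelet.conf
  join_cmd="$2"       # the line printed by: kubeadm token create --print-join-command

  if [ -f "$kubelet_conf" ]; then
    # Startup scripts re-run on every reboot; don't try to join twice.
    echo "already joined, skipping"
    return 0
  fi

  # join_cmd is a plain command line with no quoted arguments, so letting
  # the shell split it on spaces reproduces the original command.
  $join_cmd
}
```

Inside the real startup script you would call it as join_cluster_once /etc/kubernetes/kubelet.conf "<PASTE_YOUR_JOIN_COMMAND_HERE>", so a rebooted worker simply skips the join instead of erroring out.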

Step 1: Generate a Permanent Token

Standard kubeadm tokens expire after 24 hours. For our autoscaling group that might last months, we need a permanent token.

On your Control Plane, run:

kubeadm token create --print-join-command --ttl 0

Copy the output command, as you’ll need it for the next step.
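Because that command ends up baked into every new VM, it can be worth sanity-checking what you copied before moving on (you can also confirm the token never expires with kubeadm token list, whose TTL column should read <forever>). A sketch, assuming the usual output shape kubeadm join <host>:<port> --token <6 chars>.<16 chars> --discovery-token-ca-cert-hash sha256:<64 hex chars>; the helper name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the argument looks like a
# complete join command as printed by `kubeadm token create --print-join-command`.
# The pattern encodes the bootstrap-token format ([a-z0-9]{6}.[a-z0-9]{16})
# and a sha256 CA hash (64 hex characters). Flag order is assumed to match
# kubeadm's printed output; trailing spaces are tolerated.
looks_like_join_command() {
  printf '%s\n' "$1" | grep -Eq \
    '^kubeadm join [^ ]+:[0-9]+ --token [a-z0-9]{6}\.[a-z0-9]{16} --discovery-token-ca-cert-hash sha256:[a-f0-9]{64} *$'
}
```

For example, looks_like_join_command "$(kubeadm token create --print-join-command --ttl 0)" && echo ok catches a truncated copy-paste before it reaches the instance template.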

Step 2: Create an Instance Template

An Instance Template tells GCP how to build a VM by specifying the image, machine type, and scripts to run. Note that the k8s-node-family referenced here is a custom image we built in the previous part of this series.

We use the --metadata startup-script flag to inject our join command. Keep the shebang on its own line; if the join command shares a line with #! /bin/bash, it becomes part of the shebang line and never runs.

gcloud compute instance-templates create k8s-worker-template \
--image-family=k8s-node-family \
--machine-type=e2-standard-2 \
--tags=k8s-worker \
--metadata startup-script='#! /bin/bash
<PASTE_YOUR_JOIN_COMMAND_HERE>'
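If the inline quoting feels fiddly, gcloud can also read the script from a local file via the --metadata-from-file flag. A minimal sketch of writing that file; the helper name and the startup.sh filename are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical helper: write a startup script whose first line is the
# shebang on its own, followed by the join command on the next line.
render_startup_script() {
  out_file="$1"   # e.g. startup.sh (name chosen for illustration)
  join_cmd="$2"   # output of: kubeadm token create --print-join-command

  # The heredoc delimiter is unquoted so $join_cmd is expanded into the file.
  cat > "$out_file" <<EOF
#!/bin/bash
$join_cmd
EOF
}
```

You would then swap the inline --metadata flag in the template command for --metadata-from-file=startup-script=startup.sh.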

Step 3: Create the Managed Instance Group (MIG)

We will create a Regional MIG. This means GCP will spread our nodes across multiple zones, such as us-central1-a, b, and c, for high availability.

gcloud compute instance-groups managed create k8s-worker-mig \
--template=k8s-worker-template \
--size=1 \
--region=us-central1

GCP will immediately spin up 1 node. It will boot, run the startup script, and join your cluster automatically.

Step 4: Enable Autoscaling

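Now we tell GCP to watch the CPU utilization of these nodes. The autoscaler tries to keep the group’s average CPU utilization around 60%: when it climbs above that, GCP adds nodes (up to 5), and when the load drops, it removes them again (down to 1).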

gcloud compute instance-groups managed set-autoscaling k8s-worker-mig \
--max-num-replicas=5 \
--min-num-replicas=1 \
--target-cpu-utilization=0.60 \
--region=us-central1
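To build some intuition for the 60% target and the 5-node cap, here is a rough back-of-the-envelope estimate. It assumes the autoscaler sizes the group proportionally to utilization, using the same ceil(current × actual / target) rule the Kubernetes Horizontal Pod Autoscaler documents. GCP’s real algorithm also applies stabilization and scale-in controls, so treat the numbers as approximate. The function name is hypothetical.

```sh
#!/bin/sh
# Rough estimate (assumption): desired = ceil(current * actual / target),
# clamped to [min, max]. Utilizations are whole percentages so everything
# stays in integer shell arithmetic.
estimate_group_size() {
  current="$1" actual_pct="$2" target_pct="$3" min="$4" max="$5"

  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * actual_pct + target_pct - 1) / target_pct ))

  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

Under this simplified model, one saturated node (100% CPU against a 60% target) suggests 2 nodes, two saturated nodes suggest 4, and from there the --max-num-replicas=5 cap takes over.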

Stress Testing the Autoscaler

Let’s prove it works. We’ll use a local kubectl connection to create artificial load.

1. Create a Load Generator: We’ll deploy a simple busybox container that does nothing but run an infinite loop to burn CPU.

kubectl create deployment load-generator --image=busybox -- /bin/sh -c "while true; do :; done"

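2. Request CPU: This step is absolutely critical. We must tell Kubernetes exactly how much CPU each pod requires. The GCP autoscaler itself only watches VM CPU utilization, but the scheduler uses resource requests to decide where pods fit: with a 200m request, a node accepts only as many replicas as its allocatable CPU allows, and the rest wait in Pending. Without requests, all 20 replicas would pile onto the first node, and any new VMs the autoscaler adds would sit idle instead of taking over part of the load.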

kubectl set resources deployment load-generator --requests=cpu=200m
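To see why some replicas end up Pending, it helps to do the scheduler’s arithmetic: a node fits as many 200m pods as its remaining allocatable CPU allows. A sketch, assuming you read allocatable CPU with kubectl get node <name> -o jsonpath='{.status.allocatable.cpu}' and the CPU already requested by other pods from the "Allocated resources" section of kubectl describe node. The helper names and sample numbers are illustrative.

```sh
#!/bin/sh
# Convert a Kubernetes CPU quantity ("2", "1930m") to millicores.
# Assumption: whole cores or millicore values only; fractional cores
# such as "1.5" are not handled in this sketch.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Hypothetical helper: prints "<fit> <pending>", i.e. how many pods with the
# given CPU request still fit on one node, and how many of the requested
# replicas would stay Pending if this were the only worker.
fit_and_pending() {
  allocatable_m=$(to_millicores "$1")
  already_requested_m=$(to_millicores "$2")
  request_m=$(to_millicores "$3")
  replicas="$4"

  fit=$(( (allocatable_m - already_requested_m) / request_m ))
  if [ "$fit" -lt 0 ]; then fit=0; fi
  pending=$(( replicas > fit ? replicas - fit : 0 ))
  echo "$fit $pending"
}
```

With a single e2-standard-2 worker (2 vCPUs), at most 10 of the 20 replicas can be scheduled even before counting other pods, so the rest wait in Pending until new nodes join.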

3. Scale the Load: Scale the deployment to 20 replicas.

kubectl scale deployment load-generator --replicas=20

4. Watch the Magic: Open two terminal windows. In one, watch your nodes:

kubectl get nodes -w

In the other, check the GCP instances (this command lists them once, so re-run it periodically or wrap it in watch):

gcloud compute instance-groups managed list-instances k8s-worker-mig --region=us-central1
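If you’d rather see numbers than scroll through watch output, a small helper can summarize it. A sketch, assuming the default column layout of kubectl get nodes --no-headers (NAME, STATUS, ROLES, AGE, VERSION); the function name is hypothetical. For the pod side, kubectl get pods -l app=load-generator --field-selector=status.phase=Pending --no-headers | wc -l counts the Pending replicas, relying on the app=load-generator label that kubectl create deployment adds.

```sh
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print how many nodes report exactly "Ready" in the STATUS column.
# Nodes that are NotReady (e.g. still booting) or cordoned
# ("Ready,SchedulingDisabled") are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

For example: while true; do kubectl get nodes --no-headers | count_ready_nodes; sleep 15; done prints the Ready count as new workers finish booting and joining.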

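You will see the single node fill up and the remaining pods go into Pending state. With that node’s CPU pinned well above the 60% target, the GCP Autoscaler provisions new VMs; each one runs the startup script, joins the cluster, and picks up Pending pods.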

The Easy Way (GKE)

It’s always worth pausing to appreciate just how much effort we put into this. If you were using Google Kubernetes Engine (GKE), this entire process could have been replaced by a single “easy button” command:

gcloud container clusters create k8s-easy-cluster \
--zone us-central1-a \
--num-nodes 3 \
--machine-type e2-medium \
--enable-autoscaling \
--min-nodes 1 \
--max-nodes 5

While GKE is powerful, understanding the “hard way” makes us better operators because we know exactly which components to investigate when things go wrong.

Ready for the next challenge? Follow along to the next part of the series: upgrading your control plane without downtime!

Are We Done? (Or Just Getting Started?)

At this stage, we have a functional Kubernetes cluster on Compute Engine that includes self-healing and auto-scaling capabilities. This setup provides control over the operating system, kernel, and networking without GKE management fees, demonstrating that automation can be integrated into a manual build. However, this configuration is merely the foundation for a much broader architectural exploration. What are some things that you might want this cluster to do that it doesn’t already?

Go Deeper

The Original Blueprint: This entire series is inspired by Kelsey Hightower’s foundational guide, Kubernetes The Hard Way.
Official Documentation: Get familiar with the source of truth for all components: Kubernetes.io Documentation.
Official GKE Documentation: Access the complete guide in the Google Kubernetes Engine (GKE) Documentation.

Misadventures in Kubernetes: Autoscaling Workers was originally published in Google Cloud – Community on Medium.

Source Credit: https://medium.com/google-cloud/misadventures-in-kubernetes-autoscaling-workers-40af6e0485f7?source=rss—-e52cf94d98af—4