

written in collaboration with Radhika Aggarwal
In the world of cloud computing, consistency is king. How do you ensure that every virtual machine you deploy is secure, compliant, and configured identically, every single time? The answer lies in moving away from manual setups and embracing automation. The cornerstone of this modern approach is the “golden image.”
A golden image is a standardized, pre-configured VM template that serves as the trusted foundation for all your cloud deployments. Instead of configuring a new VM from scratch after it boots, you bake your security tools, monitoring agents, and compliance settings directly into the image itself.
This isn’t just about saving time; it’s a fundamental shift in how we manage infrastructure. By adopting an immutable infrastructure model, we treat our servers as disposable. When a change is needed — whether it’s a security patch or a software update — we don’t modify a running server. We build a new, versioned golden image, destroy the old instances, and deploy fresh ones.
This article will walk you through the entire lifecycle of building, managing, and maintaining a golden image pipeline on Google Cloud Platform (GCP), turning your infrastructure management from a manual chore into a secure, repeatable, and automated powerhouse.
The build stage is where your image factory comes to life: a fully automated CI/CD pipeline that reliably turns a base operating system into a hardened, compliant, ready-to-use template.
The Anatomy of a Golden Image
A golden image is built in layers, starting with a base provided by the cloud vendor and adding your organization’s specific requirements. A typical recipe includes:
- Base Operating System: A standard OS like Red Hat Enterprise Linux (RHEL) 9. A common practice is to use a Bring-Your-Own-Subscription (BYOS) model if you have existing enterprise agreements.
- Security Tooling: All the agents your security team requires. This could include endpoint protection (e.g., Cisco AMP), vulnerability scanners (e.g., Tenable, Prisma Cloud), and SIEM agents.
- Observability & Logging: Agents for application performance monitoring (e.g., Dynatrace OneAgent) and centralized logging (e.g., syslog-ng) to ensure you have visibility from the moment an instance boots.
- Internal Dependencies: Any required packages, certificates, or configurations needed to connect to your internal services, such as a private package repository.
- Hardening Scripts: Scripts that configure the OS to meet specific compliance standards like PCI-DSS or CIS Benchmarks.
The Automated Workflow
Manually creating images is slow and prone to error. The goal is to create a fully automated CI/CD pipeline using a few key tools:
- Git Repository (e.g., GitLab, GitHub): This is your single source of truth. All configurations are stored as code here.
- HashiCorp Packer: This is the core engine of the factory. Packer reads a template file (written in HCL) that defines everything: the source image, the provisioning steps (scripts, Ansible playbooks), and where to publish the final image.
- CI/CD Orchestrator (e.g., Google Cloud Build, GitLab CI, GitHub Actions): This is the automation server that runs the Packer build process.
Here’s how the pipeline works from end to end:
- Trigger: The pipeline is initiated, either by a developer pushing a change to the Packer configuration in Git or by a nightly/weekly schedule.
- Secure Authentication: The CI/CD job authenticates to Google Cloud. Crucially, this should be done without long-lived service account keys. The best practice is to use Workload Identity Federation, which allows your CI/CD platform to securely impersonate a GCP Service Account for the duration of the job.
- Build Instance: Packer, running within the CI/CD job, uses its granted permissions to launch a temporary, private VM from a public base image (e.g., the latest RHEL 9 image).
- Provision: Packer connects to the temporary VM (ideally over a secure channel using Identity-Aware Proxy (IAP)) and runs a series of provisioners to install security agents, pull packages from your Artifact Registry or other package repository, and apply hardening scripts.
- Validate: This is a critical quality gate. Before capturing the image, the pipeline runs automated tests. This often includes a vulnerability scan using tools like Tenable to ensure you aren’t shipping a known vulnerability into production. If the scan finds a critical issue, the pipeline fails.
- Cleanup & Publish: Packer runs cleanup scripts to generalize the VM. It then creates a new, versioned custom image from the temporary VM’s disk. This new image is published to a dedicated GCP project and, most importantly, added to an Image Family.
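To make these steps concrete, here is a minimal sketch of what such a Packer template might look like (the pipeline below expects it under packer/rhel9-golden). It is illustrative rather than production-ready: the zone, machine type, script paths, and the my-org-rhel-9-golden family name are assumptions, and a real template would carry far more provisioning logic:

```hcl
packer {
  required_plugins {
    googlecompute = {
      source  = "github.com/hashicorp/googlecompute"
      version = ">= 1.1.0"
    }
  }
}

variable "project_id" {
  type = string
}

variable "image_version" {
  type = string
}

source "googlecompute" "rhel9_golden" {
  project_id          = var.project_id
  source_image_family = "rhel-9" # latest public RHEL 9 base image
  zone                = "us-central1-a"
  machine_type        = "e2-standard-2"

  # Keep the build VM private and tunnel SSH through Identity-Aware Proxy.
  omit_external_ip = true
  use_internal_ip  = true
  use_iap          = true
  ssh_username     = "packer"

  # Versioned image name, plus the family pointer that consumers rely on.
  image_name   = "rhel-9-golden-${var.image_version}"
  image_family = "my-org-rhel-9-golden"
}

build {
  sources = ["source.googlecompute.rhel9_golden"]

  # Install agents and apply hardening (script names are placeholders).
  provisioner "shell" {
    scripts = [
      "scripts/install-agents.sh",
      "scripts/cis-hardening.sh",
    ]
  }

  # Generalize the VM before the image is captured.
  provisioner "shell" {
    inline = [
      "sudo yum clean all",
      "sudo rm -rf /tmp/*",
    ]
  }
}
```

Because image_family is set, each successful build automatically becomes the newest image in the family, which is what makes the consumption model described later in this article possible.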
An example GitLab pipeline is as follows:
```yaml
---
variables:
  # Packer-specific variables
  PACKER_VERSION: "1.13.1"
  PACKER_DIR: "packer/rhel9-golden"
  # Google Cloud project details
  GCP_PROJECT: "INSERT_PROJECT"
  GCP_WORKLOAD_IDENTITY_PROVIDER: "projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/WIF_POOL/providers/WIF_PROVIDER"
  GCP_SERVICE_ACCOUNT: "gitlab-ci-sa@INSERT_PROJECT.iam.gserviceaccount.com"

stages:
  - build

.id_tokens:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://iam.googleapis.com/${GCP_WORKLOAD_IDENTITY_PROVIDER}

build_rhel9_image:
  image: google/cloud-sdk:latest
  stage: build
  extends: .id_tokens
  script:
    ## Install Packer
    - echo "Installing Packer..."
    - mkdir -p /tmp/packer-install && cd /tmp/packer-install
    - apt-get update && apt-get install -y wget unzip
    - wget https://releases.hashicorp.com/packer/${PACKER_VERSION}/packer_${PACKER_VERSION}_linux_amd64.zip
    - unzip -o packer_${PACKER_VERSION}_linux_amd64.zip
    - mv packer /usr/local/bin/packer
    - packer --version
    ## Authenticate to Google Cloud using Workload Identity Federation
    - echo "Authenticating to Google Cloud..."
    - export JWT_FILE_PATH=$(pwd)/.ci_job_jwt_file
    - export GOOGLE_CREDS_CONFIG_PATH=$(pwd)/.gcp_temp_cred.json
    - echo ${GITLAB_OIDC_TOKEN} > ${JWT_FILE_PATH}
    - gcloud iam workload-identity-pools create-cred-config "${GCP_WORKLOAD_IDENTITY_PROVIDER}" --service-account="${GCP_SERVICE_ACCOUNT}" --output-file=${GOOGLE_CREDS_CONFIG_PATH} --credential-source-file=${JWT_FILE_PATH}
    - gcloud auth login --cred-file=${GOOGLE_CREDS_CONFIG_PATH}
    - gcloud config set project ${GCP_PROJECT}
    - export GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_CREDS_CONFIG_PATH}
    ## Build Packer Image
    - echo "--- Starting Packer Build ---"
    - cd "${CI_PROJECT_DIR}/${PACKER_DIR}"
    - packer init .
    - packer validate -var-file="variables.pkrvars.hcl" .
    - packer build -var-file="variables.pkrvars.hcl" .
    - echo "--- Packer Build Completed Successfully ---"
```
Patching and Maintenance
With an immutable infrastructure, you don’t patch running servers. You replace them. This is the core of the maintenance strategy.
For stateless applications running in a Managed Instance Group (MIG), this is a seamless process. You trigger a rolling update, and the MIG automatically replaces old instances with new ones created from the latest golden image, with zero downtime.
For stateful applications, the key is to separate your data from the OS by using stateful Persistent Disks. When you perform an update, the MIG's RECREATE replacement method terminates the old VM, creates a new one from the patched image, and reattaches the original data disk.
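In Terraform, this pattern comes down to two settings: a stateful_disk block and an update policy that recreates instances in place. A minimal sketch, assuming an instance template named google_compute_instance_template.app is defined elsewhere in the configuration and built from the golden image:

```hcl
resource "google_compute_instance_group_manager" "app" {
  name               = "stateful-app-mig"
  base_instance_name = "stateful-app"
  zone               = "us-central1-a"
  target_size        = 3

  version {
    # Template built from the latest golden image (defined elsewhere).
    instance_template = google_compute_instance_template.app.id
  }

  # The data disk survives instance replacement and is reattached.
  stateful_disk {
    device_name = "data-disk"
    delete_rule = "NEVER"
  }

  update_policy {
    type                  = "PROACTIVE"
    minimal_action        = "REPLACE"
    replacement_method    = "RECREATE" # preserves instance names and stateful disks
    max_surge_fixed       = 0          # stateful MIGs require zero surge
    max_unavailable_fixed = 1
  }
}
```

Rolling out a patched image is then just a matter of pointing the instance template at the new image and applying; the MIG replaces instances one at a time, reattaching each data disk as it goes.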
Creating the image is only half the battle. You also need to ensure that your development teams use it — and only it.
Image Families: The Easy Button for Consumers
Instead of having teams update their scripts every month with a new image name (e.g., from rhel-9-golden-202506 to rhel-9-golden-202507), you use Image Families. An image family is a simple, named pointer that always references the latest, non-deprecated image.
Your pipeline automatically updates the my-org-rhel-9-golden family to point to the newest version. Application teams simply reference that family name in their Infrastructure as Code (IaC) templates, guaranteeing they always deploy the latest, approved version.
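In Terraform, consuming the family looks like the sketch below; the data source resolves to the newest non-deprecated image in the family at plan time. The image project name is a placeholder:

```hcl
# Resolves to the newest non-deprecated image in the family at plan time.
data "google_compute_image" "golden" {
  family  = "my-org-rhel-9-golden"
  project = "my-golden-images-project" # hypothetical dedicated image project
}

resource "google_compute_instance" "app" {
  name         = "app-vm"
  machine_type = "e2-standard-2"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = data.google_compute_image.golden.self_link
    }
  }

  network_interface {
    network = "default"
  }
}
```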
Enforcement Through Policy
You can enforce the use of your golden images with a powerful two-pronged approach:
- Organization Policy (constraints/compute.trustedImageProjects): This is the hard enforcement layer. You configure a GCP policy at the organization or folder level that explicitly allowlists your golden image project. Any attempt to create a VM from an image outside that project will be denied by the GCP API.
- Policy as Code (e.g., Open Policy Agent, or OPA): This provides "shift-left" validation. You integrate OPA into your IaC deployment pipeline. Before Terraform or OpenTofu even attempts to create a VM, OPA scans the plan to ensure it's using an image from an approved image family. This catches errors early and provides immediate feedback to developers.
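The organization policy half of this pair can itself be managed as code. A minimal sketch using the Terraform google provider; the organization ID and image project are placeholders:

```hcl
# Deny VM creation from any image outside the dedicated golden-image project.
resource "google_organization_policy" "trusted_images" {
  org_id     = "123456789012" # placeholder organization ID
  constraint = "constraints/compute.trustedImageProjects"

  list_policy {
    allow {
      values = ["projects/my-golden-images-project"]
    }
  }
}
```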
The “Break-Glass” Option: VM Manager
What about zero-day vulnerabilities that require an immediate fix? Waiting for a full pipeline run might be too slow. This is where Google Cloud’s VM Manager comes in.
VM Manager is a suite of tools that gives you visibility and control over your running fleet.
- OS Inventory tells you exactly what packages are installed on every VM.
- Patch Management allows you to deploy a targeted, emergency patch across your entire fleet in minutes.
- OS Policies allow you to define and automatically enforce a desired configuration state across your fleet.
Using VM Manager for a critical patch is your “break-glass” procedure. But it’s crucial that as soon as the emergency is contained, you immediately trigger your golden image pipeline to build the patch into a new, official image. This ensures your fleet quickly returns to a known-good, immutable state.
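The break-glass patch itself is usually fired off with gcloud or the console, but it can also be expressed declaratively through the OS Config API. A hedged Terraform sketch of a one-time patch deployment; the deployment ID and execution time are placeholders:

```hcl
# One-time emergency patch run targeting every VM in the project.
resource "google_os_config_patch_deployment" "emergency" {
  patch_deployment_id = "emergency-cve-fix" # placeholder ID

  instance_filter {
    all = true # narrow with labels or zones in real use
  }

  patch_config {
    reboot_config = "DEFAULT" # reboots only if the patch requires it
    yum {
      security = true # apply security updates only (RHEL/yum fleets)
    }
  }

  one_time_schedule {
    execute_time = "2025-07-15T02:00:00Z" # placeholder; must be in the future
  }
}
```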
A well-managed lifecycle includes a graceful retirement process for old images to reduce security risks and manage storage costs. GCP provides a clear, phased approach using different image states:
- DEPRECATED: When your pipeline releases a new image, it should automatically mark the previous one as DEPRECATED. The image family immediately stops pointing to it. It can still be used if specified directly (with a warning), but it's effectively "soft-retired."
- OBSOLETE: After a grace period (e.g., 30 days), the image is marked OBSOLETE. It is now impossible to create new VMs from it; this is a hard lock.
- DELETED: After a final retention period for auditing (e.g., 90–180 days), the image is permanently deleted.
This entire flow can and should be automated as part of your image factory pipeline.
By embracing a golden image strategy, you build a foundation of security and consistency that allows your teams to move faster and with greater confidence. You eliminate configuration drift, streamline compliance, and make patching a routine, low-risk event.
Source Credit: https://medium.com/google-cloud/build-a-better-vm-creating-a-golden-image-pipeline-on-gcp-1419c8f18654