
This is the cold start problem, and for AI/ML workloads on Kubernetes it is a major operational constraint. Container image optimization and node warm pools can only take you so far: when your model checkpoint holds 70 billion parameters, no clever Dockerfile trick will meaningfully reduce a multi-minute initialization to something acceptable at scale.
Google Kubernetes Engine has a direct answer to this: Pod Snapshots.
📢 GA as of May 6, 2026: GKE Pod Snapshots is now generally available on clusters running version 1.35.3-gke.1234000 or later, across all channels.
🤔 What Is a Pod Snapshot?
A GKE Pod Snapshot is a point-in-time capture of a running pod’s entire runtime state — memory, CPU state, GPU state, and filesystem changes — all frozen and uploaded to Cloud Storage. When a new replica is needed, GKE restores that pod from the snapshot. The pod picks up exactly where it left off, skipping the expensive initialization work.
This feature builds on gVisor's checkpoint/restore capabilities, which means it requires GKE Sandbox to be enabled. Because the pod runs inside gVisor's isolated application kernel, GKE can safely freeze the process and its memory.
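GKE Sandbox is enabled per pod through the gvisor RuntimeClass, so a snapshot-eligible pod spec looks roughly like the sketch below (the pod and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
  labels:
    app: llm-server
spec:
  # GKE Sandbox: run this pod inside gVisor, which provides the
  # checkpoint/restore machinery that Pod Snapshots depend on.
  runtimeClassName: gvisor
  containers:
    - name: server
      image: us-docker.pkg.dev/my-project/serving/llm-server:v1  # illustrative image
```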
📖 Note on Setup: Implementing this feature involves specific IAM permissions, GCS bucket configurations, and GKE Sandbox settings. For a step-by-step implementation guide, please refer to the official GKE Pod Snapshots documentation.
⚙️ How It Works Under the Hood
The feature is built on a declarative model using Kubernetes Custom Resource Definitions (CRDs). There are three key resources you need to understand:
PodSnapshotStorageConfig
This defines where snapshots are stored. Currently, only Cloud Storage buckets are supported. You point the config to a GCS bucket and an optional path prefix, and an agent running on each GKE node manages the snapshot lifecycle. Based on the policies you define, the agent determines when to create new snapshots and when to use existing ones to restore new pods.
```yaml
apiVersion: snapshot.gke.io/v1
kind: PodSnapshotStorageConfig
metadata:
  name: my-snapshot-storage
spec:
  snapshotStorageConfig:
    gcs:
      bucket: my-snapshot-bucket
      path: snapshots/llm-server
```
PodSnapshotPolicy
This is where the bulk of the configuration lives. The policy uses Kubernetes label selectors to identify which pods should be snapshotted, references the storage config, and defines the trigger type. There are two trigger models available:
- Workload trigger: The application running in the pod signals the GKE agent when it is ready for a snapshot. This is ideal for horizontally-scaled inference servers: the pod initializes, reaches a warm, ready state, signals the agent, and the snapshot is taken. Every subsequent replica skips that initialization entirely.
- Manual trigger: You create a PodSnapshotManualTrigger resource to trigger an on-demand snapshot for a specific pod. This is the right choice when you can't modify your application to emit a readiness signal, or when you need one-off snapshots during testing (a rough sketch follows the policy example below).
```yaml
apiVersion: snapshot.gke.io/v1
kind: PodSnapshotPolicy
metadata:
  name: llm-server-snapshot-policy
spec:
  selector:
    matchLabels:
      app: llm-server
  storageConfigName: my-snapshot-storage
  triggerConfig:
    type: Workload
```
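For the manual path, you create the trigger resource on demand. The exact schema lives in the official reference; the sketch below is only illustrative, and the spec field names are assumptions rather than the documented API:

```yaml
apiVersion: snapshot.gke.io/v1
kind: PodSnapshotManualTrigger
metadata:
  name: llm-server-manual-snapshot
spec:
  # Illustrative fields only: consult the GKE Pod Snapshots reference
  # for the actual schema before applying this.
  podName: llm-server-0                    # target pod (assumed field name)
  policyName: llm-server-snapshot-policy   # policy to apply (assumed field name)
```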
PodSnapshot
A resource representing the successful capture of a point-in-time state. You can inspect these resources to view snapshot metadata, check their status, and monitor restore progress.
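Because PodSnapshot is an ordinary custom resource, you can inspect it with standard Kubernetes tooling. Here is a small sketch using the official Python client; the plural name podsnapshots is inferred from the kind and may differ in practice:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig
# (e.g. after `gcloud container clusters get-credentials`).
config.load_kube_config()
api = client.CustomObjectsApi()

# List PodSnapshot custom resources cluster-wide.
snapshots = api.list_cluster_custom_object(
    group="snapshot.gke.io", version="v1", plural="podsnapshots"
)

for item in snapshots.get("items", []):
    meta = item["metadata"]
    status = item.get("status", {})
    print(meta["name"], meta.get("creationTimestamp"), status.get("phase", "<unknown>"))
```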
🔍 Snapshot Matching: How GKE Knows Which Snapshot to Use
This is one of the more interesting aspects of the implementation. GKE determines snapshot compatibility by generating a hash from what it calls the distilled Pod spec—a fingerprint built from container images, commands, arguments, volume mounts, security context, and other runtime-critical fields.
When a new pod is scheduled, GKE computes the same hash from the pod’s spec. If the hashes match, the pod is restored from the snapshot rather than starting cold. If they don’t match — because you updated the container image tag, changed an environment variable, or modified a volume mount — the existing snapshot is invalidated, and a new one will be taken once the updated pod reaches its ready state.
⚠️ Operational note: Every deployment that changes the pod spec will trigger a cold start for the first replica, which then gets snapshotted for subsequent replicas. Plan your rollout strategy accordingly.
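GKE's actual distillation logic is internal, but the idea is easy to illustrate: hash only the runtime-critical fields, so cosmetic changes (labels, annotations) keep the snapshot valid while functional changes invalidate it. A purely conceptual sketch, not GKE's algorithm:

```python
import hashlib
import json

def distilled_hash(pod_spec: dict) -> str:
    """Hash a subset of runtime-critical container fields, roughly
    mirroring the idea behind GKE's distilled Pod spec (illustrative only)."""
    distilled = [
        {
            "image": c.get("image"),
            "command": c.get("command"),
            "args": c.get("args"),
            "env": c.get("env"),
            "volumeMounts": c.get("volumeMounts"),
            "securityContext": c.get("securityContext"),
        }
        for c in pod_spec.get("containers", [])
    ]
    canonical = json.dumps(distilled, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

v1 = {"containers": [{"name": "server", "image": "llm-server:v1"}]}
v2 = {"containers": [{"name": "server", "image": "llm-server:v2"}]}

print(distilled_hash(v1) == distilled_hash(v1))  # True: snapshot is reused
print(distilled_hash(v1) == distilled_hash(v2))  # False: snapshot invalidated, one cold start
```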
📦 What’s Included (and What Isn’t)
Understanding the boundaries of what a snapshot captures is critical to building reliable workflows around this feature.
✅ Included in a snapshot:
- In-memory application state: all open file descriptors, threads, and memory
- The container root file system (rootfs), EmptyDir volumes, and tmpfs mounts
- Loopback connections, listening sockets, and Unix domain sockets
- GPU memory state (via NVIDIA’s cuda-checkpoint tool)
🧠 Because GPU state is written into process memory, Pod memory usage increases during snapshot and restore operations. You should account for this additional memory requirement when you set memory limits for your Pods.
❌ Not included:
- Persistent Volume Claims (PVCs) — these are treated as external mounts and not checkpointed
- Secrets and credentials stored outside application memory
- User-added network rules (iptables or nftables) and custom routes
The implication for stateful workloads: if your pod writes critical data to a PVC or a mounted volume, that data won’t be captured in the snapshot. The snapshot captures the pod’s runtime state, not its persistent storage. Design your workload accordingly.
🌍 Environment Variables After a Restore
Because environment variables are part of the process memory, they are frozen at snapshot time. If you update a secret or API key via an environment variable and deploy, the restored pod will silently continue using the old, frozen value unless your application is explicitly coded to handle this.
GKE provides an updated environment at /proc/gvisor/spec_environ in the same format as /proc/<pid>/environ. For any values that may rotate or change between deployments — such as API keys, feature flags or service endpoints — your application should read them from this path at startup rather than relying on the process environment.
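In practice that means a small startup shim that prefers the refreshed values when they exist. The format matches /proc/<pid>/environ (NUL-separated KEY=VALUE pairs), so parsing it takes only a few lines; the sketch below assumes the file is simply absent on a cold-started pod:

```python
import os

SPEC_ENVIRON = "/proc/gvisor/spec_environ"

def load_refreshed_env(path: str = SPEC_ENVIRON) -> dict:
    """Parse the NUL-separated KEY=VALUE entries GKE exposes after a restore."""
    try:
        raw = open(path, "rb").read()
    except FileNotFoundError:
        return {}  # e.g. a cold-started pod with no refreshed environment
    env = {}
    for entry in raw.split(b"\x00"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode()] = value.decode()
    return env

# Prefer the refreshed value over the (possibly stale) frozen process environment.
refreshed = load_refreshed_env()
api_key = refreshed.get("API_KEY", os.environ.get("API_KEY"))
```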
🖥️ Hardware Compatibility: Know Before You Deploy
Pod Snapshots come with specific hardware requirements that need to be planned around carefully:
- Machine series and architecture must match between snapshot creation and restore. You cannot snapshot on an N2 node and restore on an N4 node.
- CPU count and memory allocation can differ, so you have some flexibility for right-sizing replicas.
- E2 machine types are not supported, because their CPU platform is selected dynamically at VM creation and therefore can't be guaranteed to match between snapshot and restore.
- GPU workloads require matching gVisor kernel versions and GPU driver versions between snapshot and restore nodes.
For teams running mixed node pools, this means carefully choosing which pool to snapshot and ensuring restore targets are drawn from a compatible pool.
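One straightforward way to enforce this is to pin snapshot-eligible workloads to a single node pool via a nodeSelector in the pod template, so the pod that gets snapshotted and every restored replica land on compatible machines (the pool name below is illustrative):

```yaml
spec:
  # Keep snapshot and restore nodes on the same pool so machine series,
  # architecture and (for GPUs) driver/kernel versions all match.
  nodeSelector:
    cloud.google.com/gke-nodepool: llm-n2-gpu-pool
```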
🎯 When Should You Use This?
Pod Snapshots add meaningful operational overhead: you’re managing GKE Sandbox requirements, Cloud Storage buckets, IAM bindings with Workload Identity, and snapshot lifecycle policies. That overhead is worth it for the right workload profile — but it isn’t the right tool for every situation.
Use Pod Snapshots for workloads with long initialization times, such as AI inference workloads that load large models into CPU or GPU memory, or large applications that load many libraries and dependencies. Workloads with fast startup times generally won’t benefit from Pod Snapshots.
🌐 The Bigger Picture
Pod Snapshots are part of a broader set of capabilities Google has been building for AI workloads on GKE. Together with the Inference Gateway, Dynamic Resource Allocation, and Agent Sandbox, they form a cohesive set of primitives for production AI infrastructure on Kubernetes. Pod Snapshots specifically solve the replica scale-out problem: once a model is warm on one pod, snapshotting means you never pay the full initialization cost again for that deployment.
The cold start problem has long been a common challenge for Kubernetes-based AI deployments. While Pod Snapshots don’t completely eliminate it — since you still need to take that initial snapshot — they do turn it from an ongoing operational hassle into a one-time expense per deployment. For teams managing inference at scale, that’s a meaningful shift.
If you’re running LLM servers, scientific computing jobs, or any pods on GKE with startup times measured in minutes, this feature is worth a serious look.
📚 References:
- GKE Pod Snapshots — Concepts
- GKE Pod Snapshots — How-To Guide
