

Every virtual machine in Google Cloud comes with a line item: Compute Engine cores, memory, disks, egress traffic, snapshots. When projects start, we book resources “with a safety margin.” Months pass, features ship, load predictions change, but the VM shapes linger. A fleet that runs at 35–40 % CPU and half its RAM is money left on the table.
Fun fact: Google’s built-in Recommender service claims a median 15 % spending drop once its advice is applied (2024 internal study).
1. Find the sleepy machines
Imagine a security guard walking past a row of vending machines. Some hum loudly; others barely buzz. A vending machine's power bill drops when it idles, but Google charges the same for a VM whether it is busy or idle.
- How to spot them: open Cloud Monitoring → Metrics Explorer and plot instance/cpu/utilization.
- What to look for: any VM that sits under 5 % CPU for hours and hardly touches disk or network.
- Why it matters: if nobody uses the machine, you can either stop it or move it to a cheaper machine type.
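The "under 5 % CPU for hours" check can be sketched in a few lines of Python. This is a minimal illustration, not the Recommender's own logic: it assumes you have already exported utilization samples per VM (as fractions between 0.0 and 1.0, which is how the metric is reported), and the VM names, the `find_idle_vms` helper, and the 95 % cut-off are all hypothetical.

```python
from typing import Dict, List


def find_idle_vms(
    samples: Dict[str, List[float]],
    threshold: float = 0.05,
    min_fraction: float = 0.95,
) -> List[str]:
    """Return VMs whose CPU stays under `threshold` in at least
    `min_fraction` of the samples. Both defaults are assumptions."""
    idle = []
    for name, values in samples.items():
        if not values:
            continue  # no data: do not guess
        below = sum(1 for v in values if v < threshold)
        if below / len(values) >= min_fraction:
            idle.append(name)
    return sorted(idle)


# Hypothetical hourly samples of instance/cpu/utilization.
samples = {
    "batch-worker-1": [0.02, 0.03, 0.01, 0.04],
    "web-frontend-1": [0.35, 0.42, 0.38, 0.40],
    "old-sandbox": [0.01, 0.01, 0.02, 0.01],
}

idle_vms = find_idle_vms(samples)
print(idle_vms)  # ['batch-worker-1', 'old-sandbox']
```

A VM flagged here is only a candidate. Check its disk and network graphs before stopping it, as the list above suggests.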
2. Work in “waves” so nothing breaks at once
Why waves? Because turning ten things off in one night can wake up angry engineers the next morning. By moving gently, you keep the business calm and learn from small mistakes.
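One way to picture the waves is as fixed-size batches of a candidate list. The sketch below assumes the list is already ordered from least to most critical; the `plan_waves` helper, the VM names, and the wave size of three are illustrative choices, not something the article prescribes.

```python
from typing import List


def plan_waves(vm_names: List[str], wave_size: int) -> List[List[str]]:
    """Split an ordered candidate list into consecutive waves."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    return [
        vm_names[i:i + wave_size]
        for i in range(0, len(vm_names), wave_size)
    ]


# Hypothetical fleet of ten candidates, least critical first.
candidates = [f"vm-{n}" for n in range(1, 11)]
waves = plan_waves(candidates, wave_size=3)

for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Leaving a few days between waves gives you time to watch the graphs and roll back a single small batch instead of the whole fleet.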
3. Quick wins you can try this week
4. A real-world mini-case (ten VMs)
- CPU Utilization: the graph should sit at or below 40 % most of the time. If it stays well below that, downsize; if spikes hit 80 %, let the autoscaler add capacity.
- Memory Use — aim for 65–75 % used RAM. Below that you are wasting money; above that apps can crash.
- Disk I/O Latency — a rising red line in throttled_time means the disk is too slow (or too small) for the job.
- Network Egress — sudden traffic bumps may mean the VM is serving content that belongs behind Cloud CDN, which is cheaper per gigabyte.
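The CPU and memory thresholds above can be turned into a small review helper. This is a rough sketch under stated assumptions: inputs are fractions between 0.0 and 1.0, the `review_vm` function and its labels are made up for illustration, and checking spikes before the average is a design choice of this example rather than a rule from the article.

```python
from typing import List


def review_vm(avg_cpu: float, peak_cpu: float, mem_used: float) -> List[str]:
    """Map one VM's numbers to the advice in the checklist."""
    notes = []
    # Spikes win over a low average: a bursty VM should not be shrunk.
    if peak_cpu >= 0.80:
        notes.append("autoscale")
    elif avg_cpu < 0.40:
        notes.append("downsize")
    # Memory target band from the checklist: 65-75 % used.
    if mem_used < 0.65:
        notes.append("memory-oversized")
    elif mem_used > 0.75:
        notes.append("memory-pressure")
    return notes


print(review_vm(avg_cpu=0.20, peak_cpu=0.50, mem_used=0.50))
print(review_vm(avg_cpu=0.50, peak_cpu=0.90, mem_used=0.70))
print(review_vm(avg_cpu=0.50, peak_cpu=0.60, mem_used=0.70))
```

An empty list means the VM sits inside the target bands and can be left alone for this round.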
- GKE Autopilot can now scale a microservice to zero pods — no traffic, no bill.
- Recommender API v2 ships a ready-made gcloud command with each suggestion, so you can script fixes.
- Billing Export adds effective_price for every SKU, making Grafana dashboards like “dollars per vCPU-hour” easy.
- Cloud Scheduler + Cloud Functions — set a rule: “Shut down intern sandboxes at 6 PM.” They forget, the script will not.
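The 6 PM sandbox rule could look roughly like the sketch below. It assumes the schedule itself lives in Cloud Scheduler (for example the cron expression `0 18 * * *`) and that sandboxes carry a hypothetical label `env=sandbox`. To keep the example runnable without cloud credentials, the actual stop call is passed in as a function; in a real Cloud Function it would wrap the Compute Engine API's instances.stop method.

```python
from typing import Callable, Dict, Iterable, List


def sandboxes_to_stop(instances: Iterable[Dict]) -> List[str]:
    """Pick running instances labelled env=sandbox (label is an assumption)."""
    return [
        vm["name"]
        for vm in instances
        if vm.get("labels", {}).get("env") == "sandbox"
        and vm.get("status") == "RUNNING"
    ]


def shut_down_sandboxes(
    instances: Iterable[Dict],
    stop_instance: Callable[[str], None],
) -> List[str]:
    """Stop every selected sandbox and return the names that were stopped."""
    names = sandboxes_to_stop(instances)
    for name in names:
        stop_instance(name)
    return names


# Hypothetical inventory; a real function would list instances via the API.
inventory = [
    {"name": "intern-sandbox-1", "status": "RUNNING", "labels": {"env": "sandbox"}},
    {"name": "intern-sandbox-2", "status": "TERMINATED", "labels": {"env": "sandbox"}},
    {"name": "prod-api-1", "status": "RUNNING", "labels": {"env": "prod"}},
]

stopped_log: List[str] = []
stopped = shut_down_sandboxes(inventory, stop_instance=stopped_log.append)
print(stopped)  # ['intern-sandbox-1']
```

Selecting by label rather than by name keeps production machines out of reach even if someone names a VM carelessly.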
- DevOps engineers collect graphs and craft proposals.
- Developers run load tests to prove nothing slowed down.
- Finance checks the math and buys commitment contracts.
- Managers watch SLAs and feature velocity so savings never hurt customers.
Think of it as tuning a guitar: one person twists the peg, another listens for the clear note.
Optimization is not a one-off project. It is a measure → improve → verify loop you run every sprint. With Google’s Recommender as your co-pilot and a sprinkle of autoscaling plus spot instances, you keep applications quick, developers happy, and accountants relaxed.
Source Credit: https://medium.com/google-cloud/optimize-google-cloud-vms-in-2025-cut-costs-keep-performance-c2f73c925a62?source=rss—-e52cf94d98af—4