
It’s exciting to see OpenAI contribute to the open ecosystem with the release of their new open-weights models, gpt-oss. In keeping with our commitment to provide the best platform for open AI innovation, we’re announcing immediate support for deploying gpt-oss-120b and gpt-oss-20b on Google Kubernetes Engine (GKE). To help customers make informed decisions when deploying their infrastructure, we’re also publishing detailed benchmarks of gpt-oss-120b on Google Cloud accelerators. You can access them here.
This continues our support for a broad and diverse ecosystem of models, from Google’s own Gemma family, to models like Llama 4, and now OpenAI’s gpt-oss. We believe that offering choice and leveraging the best of the open community is critical for the future of AI.
Run demanding AI workloads at scale
The new gpt-oss-120b model is large and requires significant computational power, needing multiple NVIDIA H100 / H200 Tensor Core GPUs for optimal performance. This is where Google Cloud and GKE shine. GKE is designed to handle large-scale, mission-critical workloads, providing the scalability and performance needed to serve today’s most demanding models. With GKE, you can leverage Google Cloud’s advanced infrastructure, including both GPU and TPU accelerators, to power your generative AI applications.
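As a minimal sketch, serving gpt-oss-120b on a GPU-enabled GKE node pool might look like the Deployment below, using vLLM’s OpenAI-compatible server. The image tag, GPU count, and tensor-parallel setting are illustrative assumptions, not official recommendations; size them against the published benchmarks for your workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpt-oss-120b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpt-oss-120b
  template:
    metadata:
      labels:
        app: gpt-oss-120b
    spec:
      nodeSelector:
        # Assumes a GKE node pool with NVIDIA H100 GPUs attached
        cloud.google.com/gke-accelerator: nvidia-h100-80gb
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest   # assumed serving image
        args:
        - "--model"
        - "openai/gpt-oss-120b"
        - "--tensor-parallel-size"       # shard across the node's GPUs
        - "2"
        resources:
          limits:
            nvidia.com/gpu: "2"          # multiple GPUs, per the sizing note above
        ports:
        - containerPort: 8000
```

Once the pod is running, a `kubectl port-forward` to port 8000 exposes a standard OpenAI-compatible chat completions endpoint for testing before you put a Service or Gateway in front of it.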
Source Credit: https://cloud.google.com/blog/products/containers-kubernetes/run-openais-new-gpt-oss-model-at-scale-with-gke/