
Wednesday April 2, 2025 13:00 – 13:20 BST
Level 1 | Solutions Showcase | Hall Entrances S8 – S9 | Demo Theater
AI workloads are getting bigger and trickier to run. Large-scale training jobs need Kubernetes clusters that can scale beyond what has been possible until now. In this demonstration we will showcase what it takes to scale Kubernetes up to 65,000 nodes and handle the most demanding workloads, whether you are training AI models or running complex simulations. We will also look at open-source tools like Kueue and at how Kueue efficiently manages a wide range of AI workload scenarios for a large, hypothetical AI company running its AI workloads on a 65,000-node Google Kubernetes Engine (GKE) cluster. We’ll showcase Kueue’s capabilities in handling training/inference fungibility, priority-based preemption, fair sharing, and topology-aware scheduling, providing insights into its effectiveness at this massive scale.
Google Cloud will be delivering multiple sessions at KubeCon. Below is a small selection of the sessions you won’t want to miss. For a full list, check our website and the KubeCon schedule.
The Future of Data on Kubernetes From Database Management To AI Foundation: The Data on Kubernetes ecosystem has expanded beyond persistent storage to support critical data workloads including databases and AI/ML operations. In this panel, experts will discuss the current state and future of data on Kubernetes.
Yes You Can Run LLMs on Kubernetes: As LLMs become increasingly powerful and ubiquitous, the need to deploy and scale these models in production environments grows. In this session we’ll cover the key considerations and best practices for packaging LLM inference services as containerized applications using popular OSS inference servers like TGI, vLLM and Ollama, and deploying them on Kubernetes.
KubeCon FamilyFortune, Episode 2: Join us for a rousing game of Family Fortune (Family Feud to our friends across the pond)! We will have silly questions with even sillier answers, as we try to guess what our global community of Kubernauts think.
AI Beyond Autocomplete: Using LLMs To Create 1000 Kubernetes Controllers: LLMs can generate React apps, poems, and even music. But can they rise to the ultimate challenge: writing reliable Kubernetes controllers? The Config Connector team say “yes!” We are successfully using AI to write production controllers for a thousand Google Cloud resources. Join us to learn lessons that will apply as your project embraces the AI-assisted future.
Scalable DNS With CoreDNS Plugins: A Deep Dive: CoreDNS is a highly flexible and extensible DNS server widely recognized as the default DNS solution in Kubernetes. We will dive deep into CoreDNS’s extensive plugin ecosystem, examining several plugins that significantly enhance DNS scalability in Kubernetes. We’ll also walk through developing a Go-based demo plugin that leverages source IP for service discovery.
A Practical Guide To Kubernetes Policy as Code: Policies play a critical role in ensuring Kubernetes security, compliance, and governance in your clusters. In this session, the speakers explain what Policy as Code (PaC) is and why it’s essential, and demonstrate how to effectively use built-in Kubernetes features like ValidatingAdmissionPolicy and MutatingAdmissionPolicy alongside CNCF policy engines such as OPA/Gatekeeper and Kyverno to manage your PaC lifecycle.
Simplifying the Networking and Security Stack With Cilium, Hubble, and Tetragon: Join us as we celebrate nearly a decade of Cilium. This session provides updates on the latest Cilium release and showcases how its unified eBPF-powered stack is transforming Kubernetes environments by replacing fragmented toolchains with seamless, secure, scalable, and simplified solutions.
The Next Generation of DaemonSet Autoscaling: Imagine you have small 4-core nodes and larger 64-core nodes in the same cluster, and a DaemonSet that does much more work on the larger nodes. How do you set resource requests and limits appropriately? In this talk we discuss our case studies, why this feature is useful, and how our prototype implements per-pod VPA for DaemonSets to improve resource efficiency and stability and to eliminate the need for manual tuning.
Making the Leap: What Gateway API Needs To Support Ingress-NGINX Users: Ingress-NGINX has been the cornerstone of Kubernetes Ingress for years. In this talk, Rob and James explore the critical challenges of migrating from Ingress to Gateway. They highlight commonly used Ingress-NGINX features that are not yet supported in Gateway API and discuss how the community can drive the evolution of Gateway API to meet the needs of Ingress-NGINX users.
Encryption, Identities, and Everything in Between; Building Secure Kubernetes Networks: As the scale of your clusters grows, so does the complexity of securing your networks. The stakes are high: Inadequate encryption or identity management solutions can leave clusters vulnerable to a range of security risks. In this session, Lior and Igor explore the landscape of network encryption, AuthN and AuthZ solutions grounded in the principles of defense-in-depth and least privilege.
Defusing the Kubernetes API Performance Minefield: Kubernetes enables a wide landscape of CNCF projects and organizations to build upon its foundation and extend its functionality through custom controllers. But anyone who has deployed an operator at scale quickly discovers that the Kubernetes API is a performance minefield where one small mistake can lead to reliability and performance issues. Join us to learn how these changes mitigate risks, boost performance, and contribute to a more stable and reliable Kubernetes experience.
Source Credit: https://cloud.google.com/blog/products/containers-kubernetes/google-cloud-at-kubecon-europe-2025/