Building Cloud AI Copilots: Multi-Agent Systems on Google Cloud Run with ADK

Building Cloud AI Copilots: Multi-Agent Systems on Google Cloud Run with Gemini 2.5, ADK, and the A2A Protocol

The Problem: DevOps Complexity is Real

If you’ve ever managed Kubernetes clusters in production, you know the drill. You’re juggling multiple terminals, switching between kubectl commands, checking Grafana dashboards, and diving into GCP billing reports to understand why your cloud bill jumped 30% last month.

The reality? Most DevOps tasks require expertise across multiple domains:

Kubernetes orchestration
Cloud cost optimization
Security and compliance
Observability and monitoring

What if we could have specialized AI agents handle each domain — and let them collaborate when needed?

That’s exactly what I built for the Google Cloud Run Hackathon.

Introducing Cloud AI Copilots

Cloud AI Copilots is a multi-agent platform powered by Google’s Agent Development Kit (ADK) and Gemini 2.5 Flash, deployed entirely on Cloud Run. Think of it as having a team of expert DevOps engineers available 24/7 through natural language conversation.

The platform consists of three services:

K8s Copilot — Your Kubernetes management expert
Cost Optimization Copilot — Your cloud cost analyst
Copilot Dashboard — A unified interface to interact with both agents

But here’s what makes it special: these agents can talk to each other using Google’s A2A (Agent-to-Agent) protocol.

*The unified dashboard provides access to both specialized AI agents*

Architecture: Serverless Multi-Agent System

Let me break down the technical architecture:

User → Dashboard (Cloud Run)
         ↓
    ┌────┴────┐
    ↓         ↓
K8s Copilot  Cost Copilot (both on Cloud Run)
    ↕         ↕
    A2A Protocol (Agent Collaboration)
    ↓         ↓
  GKE API   GCP Billing API

Each service runs independently on Cloud Run:

Auto-scales from 0 to 3 instances
512Mi memory, single CPU
Deployed in europe-west1
Docker images stored in Artifact Registry
Secrets managed via Google Secret Manager

CI/CD Pipeline: GitHub → Cloud Build → Artifact Registry → Cloud Run
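The per-service settings listed above map onto flags of `gcloud run deploy`. The following is a minimal sketch of that mapping, not the project's actual deployment code: the `ServiceSpec` class and the `IMAGE_URL` placeholder are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical container for the Cloud Run settings described above."""
    name: str
    image: str
    region: str = "europe-west1"
    memory: str = "512Mi"
    cpu: int = 1
    min_instances: int = 0   # scale to zero when idle
    max_instances: int = 3

    def deploy_args(self) -> list[str]:
        """Build the argument list for `gcloud run deploy`."""
        return [
            "gcloud", "run", "deploy", self.name,
            f"--image={self.image}",
            f"--region={self.region}",
            f"--memory={self.memory}",
            f"--cpu={self.cpu}",
            f"--min-instances={self.min_instances}",
            f"--max-instances={self.max_instances}",
        ]


# IMAGE_URL is a placeholder for the Artifact Registry image path.
spec = ServiceSpec(name="k8s-copilot", image="IMAGE_URL")
print(" ".join(spec.deploy_args()))
```

Keeping the settings in one place like this makes it easy to see that all three services share the same scaling and resource limits.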

The Magic: Agent-to-Agent Communication

Here’s where things get interesting. The A2A protocol lets agents discover each other through a published agent card and then exchange messages over HTTP.

Example workflow:

You ask K8s Copilot:

“Are there any pods consuming excessive resources that might be increasing our costs?”

What happens behind the scenes:

K8s Copilot recognizes this query spans both Kubernetes and cost domains
It discovers Cost Copilot via the A2A protocol (/.well-known/agent-card.json)
K8s Copilot queries the Kubernetes API for resource usage
It sends the data to Cost Copilot via A2A protocol
Cost Copilot analyzes the financial impact
Both agents collaborate to provide a comprehensive answer
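The discovery step in this workflow can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the function names and the example hostname are hypothetical, and the `skills` field is read as it appears in A2A agent cards as I understand the spec.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

AGENT_CARD_PATH = "/.well-known/agent-card.json"


def agent_card_url(base_url: str) -> str:
    """Return the well-known agent card URL for a service's base URL."""
    # An absolute path in urljoin replaces any path on the base URL.
    return urljoin(base_url, AGENT_CARD_PATH)


def fetch_agent_card(base_url: str, timeout: float = 5.0) -> dict:
    """Download and parse a remote agent's card (performs a network call)."""
    with urlopen(agent_card_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


def skill_names(card: dict) -> list[str]:
    """List the skill names a card advertises; empty if none are declared."""
    return [skill.get("name", "") for skill in card.get("skills", [])]


# Hypothetical hostname, used only to show the URL that would be requested.
print(agent_card_url("https://cost-copilot.example.com"))
```

An agent that wants to delegate a cost question could call `fetch_agent_card` on the Cost Copilot URL and check `skill_names` before sending any data.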

*K8s Copilot analyzing cluster resources and providing intelligent responses*

This is true multi-agent collaboration — not just multiple chatbots, but specialized agents working together.

Tech Stack Deep Dive

Backend (Python):

Google ADK — Framework for building production-grade AI agents
Gemini 2.5 Flash — Fast, efficient LLM for real-time responses
Kubernetes Python Client — Direct API access to GKE clusters
GCP Billing API — Real-time cost data retrieval
Uvicorn — High-performance ASGI server

Frontend (TypeScript):

TanStack Start — Modern full-stack React framework with SSR
React 19 — Latest version with improved performance
TypeScript — Type safety throughout
Tailwind CSS — Rapid UI development
shadcn/ui — Beautiful, accessible components

Infrastructure:

Cloud Run — Serverless container platform
Artifact Registry — Docker image storage
Secret Manager — Secure credential management
Cloud Build — Automated CI/CD pipeline

Building the Agents: A Practical Guide

Let me walk you through building an AI agent with Google ADK:

Step 1: Define Your Agent’s Tools

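from kubernetes import client, config

# Assumption: with the published google-adk package, a plain function with
# type hints and a docstring can be passed directly in an agent's tools list,
# so no decorator import is used here.
def list_pods(namespace: str = "default") -> dict:
    """List all pods in a Kubernetes namespace."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace)

    def requests_of(pod) -> dict:
        # resources.requests is None when the first container sets no requests
        return pod.spec.containers[0].resources.requests or {}

    return {
        "namespace": namespace,
        "pods": [
            {
                "name": pod.metadata.name,
                "status": pod.status.phase,
                # Only the first container's requests are reported
                "cpu": requests_of(pod).get("cpu"),
                "memory": requests_of(pod).get("memory"),
            }
            for pod in pods.items
        ],
    }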
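The tool above returns CPU and memory requests as Kubernetes quantity strings such as "250m" or "512Mi". To rank pods by resource usage, those strings need converting to numbers first. Here is a minimal sketch of that conversion; the helper names are hypothetical and it covers only the common suffixes.

```python
from typing import Optional

# Binary and decimal suffixes used in Kubernetes memory quantities.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: Optional[str]) -> float:
    """Convert a CPU quantity to cores; a missing request counts as 0."""
    if not quantity:
        return 0.0
    if quantity.endswith("m"):          # millicores, e.g. "250m"
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: Optional[str]) -> int:
    """Convert a memory quantity to bytes; a missing request counts as 0."""
    if not quantity:
        return 0
    for suffix, multiplier in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * multiplier)
    return int(float(quantity))         # plain byte count


def heaviest_pods(pods: list[dict], limit: int = 3) -> list[str]:
    """Return the names of the pods with the largest CPU requests."""
    ranked = sorted(pods, key=lambda p: parse_cpu(p.get("cpu")), reverse=True)
    return [p["name"] for p in ranked[:limit]]
```

Passing the `pods` list from `list_pods` to `heaviest_pods` gives the agent a short list of candidates to discuss, rather than raw strings it has to interpret itself.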

Step 2: Create Your Agent

from google.adk.agents import Agent

# list_pods, get_deployments, and scale_deployment are tool functions like
# the one in Step 1 (the latter two are not shown in this post).
agent = Agent(
    name="k8s_copilot",  # ADK agent names must be valid Python identifiers
    model="gemini-2.5-flash",
    tools=[list_pods, get_deployments, scale_deployment],
    instruction="""
    You are a Kubernetes expert assistant. Help users manage their 
    clusters through natural language. Use the provided tools to 
    query and modify cluster resources.
    """
)
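Step 2 lists `scale_deployment` among the agent's tools without showing it. Below is one way such a tool might look, as a sketch under assumptions rather than the project's implementation: the `MAX_REPLICAS` guardrail and the shape of the returned dictionary are hypothetical.

```python
MAX_REPLICAS = 10  # hypothetical guardrail so the model cannot scale without limit


def scale_deployment(name: str, replicas: int, namespace: str = "default") -> dict:
    """Set the replica count of a Deployment and report the outcome."""
    if not 0 <= replicas <= MAX_REPLICAS:
        return {
            "status": "error",
            "message": f"replicas must be between 0 and {MAX_REPLICAS}",
        }

    # Imported lazily so the validation above can be exercised without a cluster.
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )
    return {
        "status": "ok",
        "deployment": name,
        "namespace": namespace,
        "replicas": replicas,
    }
```

Validating the input before touching the cluster matters more for write operations than for reads, since the caller is a language model acting on a user's phrasing.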

Step 3: Enable A2A Protocol

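import uvicorn
from google.adk.a2a.utils.agent_to_a2a import to_a2a

# Wrap the agent in an A2A-compatible ASGI app. This import path follows the
# google-adk A2A utilities as I understand them; check it against your ADK version.
app = to_a2a(agent, port=8080)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)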

That’s it! Your agent now serves its agent card at /.well-known/agent-card.json, which is how other agents discover it over A2A.
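One detail worth handling for Cloud Run: the platform passes the port to listen on through the `PORT` environment variable, which defaults to 8080. A small helper can read it so the server keeps working if that value is changed. The helper name `resolve_port` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_port(env: Optional[Mapping[str, str]] = None, default: int = 8080) -> int:
    """Return the port from the PORT variable, or the default if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    return int(raw) if raw.isdigit() else default


print(resolve_port({"PORT": "9000"}))
```

The server start would then become `uvicorn.run(app, host="0.0.0.0", port=resolve_port())`.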

*Cost Copilot provides actionable cost-saving recommendations*

Deployment: From Code to Cloud Run

The deployment process is fully automated with Cloud Build:

cloudbuild.yaml:

steps:
  # Build Docker image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'europe-west1-docker.pkg.dev/$PROJECT_ID/k8s-copilot-repo/k8s-copilot:$COMMIT_SHA', '.']
  
  # Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'europe-west1-docker.pkg.dev/$PROJECT_ID/k8s-copilot-repo/k8s-copilot:$COMMIT_SHA']
  
  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    # The cloud-sdk image needs gcloud set as the entrypoint for these args
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'k8s-copilot'
      - '--image=europe-west1-docker.pkg.dev/$PROJECT_ID/k8s-copilot-repo/k8s-copilot:$COMMIT_SHA'
      - '--region=europe-west1'
      - '--platform=managed'
      - '--allow-unauthenticated'

Trigger deployment:

git push origin main
# Cloud Build automatically builds and deploys

*Agents collaborating via A2A protocol to answer complex queries*

Challenges & Solutions

Challenge 1: Secret Management

Problem: Different agents need different secrets (kubeconfig, API keys)

Solution: Used Google Secret Manager with granular IAM permissions. Each Cloud Run service only mounts the secrets it needs.
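Cloud Run can expose a Secret Manager secret to a service either as a mounted file or as an environment variable. Here is a minimal sketch of reading a secret that works with both. The `/secrets` mount directory, the secret name, and the name-to-variable mapping are assumptions for illustration, not the project's actual configuration.

```python
import os
from pathlib import Path


def load_secret(name: str, mount_dir: str = "/secrets") -> str:
    """Read a secret from a mounted file, falling back to an environment variable."""
    path = Path(mount_dir) / name
    if path.is_file():
        return path.read_text().strip()

    # Hypothetical convention: "gemini-api-key" maps to GEMINI_API_KEY.
    env_name = name.upper().replace("-", "_")
    value = os.environ.get(env_name)
    if value is None:
        raise RuntimeError(f"secret {name!r} not found at {path} or in ${env_name}")
    return value
```

Failing loudly when a secret is missing makes a misconfigured IAM binding show up at startup instead of on the first user request.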

Challenge 2: A2A Protocol Implementation

Problem: Limited examples of A2A protocol in production

Solution: Studied the spec thoroughly, implemented agent discovery, and tested cross-agent communication extensively.

Challenge 3: Cost Data Without BigQuery

Problem: Setting up BigQuery billing export requires org-level permissions

Solution: Used GCP Billing API and Cloud Asset Inventory for real-time cost data instead.

Performance & Cost Efficiency

Response Times:

K8s Copilot: ~2–3 seconds for Kubernetes queries
Cost Copilot: ~3–4 seconds for billing analysis
A2A collaboration: +1–2 seconds overhead

Cost Breakdown (Monthly):

Cloud Run (3 services, light usage): ~$5–10
Gemini API calls (1M tokens): ~$0.10
Artifact Registry storage: ~$1
Secret Manager: ~$0.06

Total: ~$6–11/month for a production-ready multi-agent platform 🎉
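The total follows directly from the line items above. Here is the arithmetic as a short script, using only the figures listed in the breakdown; single-value items are entered with the same low and high bound.

```python
# (low, high) monthly estimates in USD, copied from the breakdown above.
MONTHLY_COSTS_USD = {
    "Cloud Run (3 services, light usage)": (5.00, 10.00),
    "Gemini API calls (1M tokens)": (0.10, 0.10),
    "Artifact Registry storage": (1.00, 1.00),
    "Secret Manager": (0.06, 0.06),
}


def total_range(costs: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high bounds separately, rounded to cents."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 2), round(high, 2)


low, high = total_range(MONTHLY_COSTS_USD)
print(f"Estimated total: ${low:.2f} to ${high:.2f} per month")
```

The sums come to $6.16 and $11.16, which round to the range quoted above. Cloud Run accounts for nearly all of it.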

Key Learnings

Specialization > Generalization — Focused agents with domain expertise outperform general-purpose chatbots
A2A is the future — Agent collaboration opens up entirely new possibilities
Serverless = Cost-Efficient AI — Cloud Run’s auto-scaling keeps costs low
Tools are everything — The quality of your agent’s tools determines its usefulness
Static-first UIs — Pre-configured agent metadata beats dynamic discovery for UX

What’s Next?

Short-term:

BigQuery billing export integration
Multi-cluster support
Conversation history
Slack/Teams integration

Long-term:

More specialized agents (Security, Observability, CI/CD)
Agent orchestration for complex workflows
Multi-cloud support (AWS, Azure)
Open-source agent framework

Try It Yourself

The entire project is open source!

🔗 GitHub: https://github.com/m3rryqold/cloud-copilot
🎥 Demo Video: https://youtu.be/RTIN7x1M5zI
🚀 Live Demo: https://agents.hunt3r.dev/
🏆 Hackathon: https://run.devpost.com/

Quick Start:

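git clone https://github.com/m3rryqold/cloud-copilot
cd cloud-copilot

# Deploy to your GCP project
# COMMIT_SHA is only filled in automatically for trigger-based builds,
# so pass it explicitly when submitting by hand
gcloud builds submit --config cloudbuild.yaml --substitutions=COMMIT_SHA=$(git rev-parse HEAD)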

Conclusion

Building Cloud AI Copilots taught me that the future of AI isn’t about building one giant model that does everything — it’s about creating specialized agents that collaborate to solve complex problems.

With Google ADK, the A2A protocol, and Cloud Run, we can now build production-grade multi-agent systems that are:

✅ Cost-efficient (serverless auto-scaling)
✅ Intelligent (Gemini 2.5 Flash)
✅ Collaborative (A2A protocol)
✅ Production-ready (Cloud Run + Secret Manager)

The question isn’t “Can AI agents replace DevOps engineers?”

The real question is: “How can AI agents augment DevOps teams to move faster and make better decisions?”

Cloud AI Copilots is my answer.

Connect & Discuss

I’d love to hear your thoughts on multi-agent systems and AI for DevOps!

💬 Comment below with your questions

Built something with Google ADK or the A2A protocol? Share your project — I’d love to check it out!

Building Cloud AI Copilots: Multi-Agent Systems on Google Cloud Run with ADK was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source Credit: https://medium.com/google-cloud/building-cloud-ai-copilots-multi-agent-systems-on-google-cloud-run-with-adk-d4a2a18628e3?source=rss----e52cf94d98af---4
