Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Building Distributed Multi-Agent Systems with Google’s AI Stack series:
- Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
- Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
- Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
- Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem ← You are here
- Part 5: External Tool Integration via Model Context Protocol (MCP)
- Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine
Welcome Back!
In Part 3, we built an intelligent orchestrator that coordinates 5 specialist agents. It works beautifully… until it doesn’t.
The Problem
You test a complete 5-agent campaign workflow:
✅ Agent 1 (Brand Strategist): Complete - 2,000 tokens output
✅ Agent 2 (Copywriter): Complete - 2,500 tokens output
✅ Agent 3 (Designer): Complete - 1,800 tokens output
❌ Agent 4 (Critic): Workflow stops!
❌ Agent 5 (Project Manager): Never reached!
What happened? You hit the token output limit.
In this article, we’ll solve this with Lazy Context Compaction, an elegant strategy that:
- Summarizes older agent outputs intelligently
- Preserves recent context quality
- Scales workflows to 10+ agents
- Reduces token costs
Let’s fix it!
Understanding the Token Limit Problem
What Are Token Limits?
LLMs have two token limits:
- Input limit: How much context they can read (e.g., 128K tokens)
- Output limit: How much they can generate (e.g., 8,192 tokens)
Our problem is the output limit.
Why Multi-Agent Workflows Hit Limits
User Brief: 200 tokens
↓
Agent 1 Output: 2,000 tokens
Agent 2 Output: 2,500 tokens
Agent 3 Output: 1,800 tokens
-----------------------------------
Orchestrator's response so far: 6,500 tokens
Agent 4 tries to start...
❌ Would exceed 8,192 token limit!
Workflow stops prematurely.
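The failure above is simple arithmetic. Here is a quick budget sketch; the 200-token per-agent allowance for the orchestrator's own narration is an assumption for illustration, not a measured number:

```python
# Rough output-budget check for the 5-agent workflow above.
# NARRATION_OVERHEAD is an assumed per-agent allowance for the
# orchestrator's own commentary (plan announcements, "✓ complete" notes).
OUTPUT_LIMIT = 8_192
NARRATION_OVERHEAD = 200

agent_outputs = [
    ("Brand Strategist", 2_000),
    ("Copywriter", 2_500),
    ("Designer", 1_800),
    ("Critic", 1_500),
    ("Project Manager", 2_000),
]

used = 0
failed_at = None
for name, tokens in agent_outputs:
    cost = tokens + NARRATION_OVERHEAD
    if used + cost > OUTPUT_LIMIT:
        failed_at = name  # the workflow stops here
        break
    used += cost

print(f"Stopped at: {failed_at} after {used} tokens")
# Stopped at: Critic after 6900 tokens
```

With these estimates, the fourth agent is exactly where the budget runs out, matching the failure we saw in testing.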
Why This Happens
The orchestrator presents the full output from each agent to maintain transparency. After 3 agents, it’s already used most of its output budget!
Traditional solutions:
1. Increase max_output_tokens → Still fails with more agents
2. Summarize everything → Loses important context
3. Reduce agent outputs → Loses quality
Our solution: Lazy Context Compaction
What is Lazy Context Compaction?
Lazy Context Compaction is a strategy that:
- Compacts only when needed (after N agents)
- Summarizes older outputs (saves tokens)
- Preserves recent outputs (maintains quality)
- Uses LLM for summarization (intelligent compression)
The Strategy
Agents 1-3: Full context preserved
↓
After Agent 3: Compaction triggered
↓
Agents 1-2: Summarized → ~500 tokens
Agent 3: Full output preserved → ~1,800 tokens
↓
Agents 4-5: Execute with room to spare!

Result: Workflow completes successfully with high-quality outputs.
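The strategy itself fits in a few lines. This is a plain-Python sketch of the idea, not the ADK's internal implementation; `fake_summarize` stands in for the LLM summarizer we wire up below:

```python
# Sketch of lazy compaction: once `interval` outputs have accumulated,
# everything except the last `overlap` outputs collapses into one
# summary entry; before that threshold, nothing happens (hence "lazy").

def compact(outputs, interval, overlap, summarize):
    """outputs: list of (agent_name, text). Returns a new context list."""
    if len(outputs) < interval:
        return list(outputs)  # lazy: below the threshold, do nothing
    old, recent = outputs[:-overlap], outputs[-overlap:]
    summary = summarize([text for _, text in old])
    return [("summary", summary)] + list(recent)

# Stand-in for the LLM summarizer used in the real system.
def fake_summarize(texts):
    return f"summary of {len(texts)} outputs"

ctx = [("strategist", "research..."), ("copywriter", "posts..."),
       ("designer", "visuals...")]
print(compact(ctx, interval=3, overlap=1, summarize=fake_summarize))
# [('summary', 'summary of 2 outputs'), ('designer', 'visuals...')]
```

The two knobs, `interval` and `overlap`, map directly onto the `compaction_interval` and `overlap_size` settings we configure next.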
Implementing Context Compaction with ADK
Step 1: Import Required Components
# agents/creative_director/agent.py
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.apps.app import EventsCompactionConfig
from google.adk.apps import App
from google.adk.models import Gemini
Step 2: Create Summarizer
# Use fast model for summarization (cost-efficient)
summarization_llm = Gemini(model="gemini-2.5-flash")
summarizer = LlmEventSummarizer(llm=summarization_llm)
Why Gemini Flash?
- Fast summarization
- Cost-efficient
- High-quality summaries
- Same model family as main agent
Step 3: Configure Compaction
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Summarize after every 3 agents
    overlap_size=1,         # Keep most recent agent's full output
)
Configuration explained:
- compaction_interval=3: Compact after 3 agent completions
- overlap_size=1: Keep 1 most recent agent full (preserve quality)
Step 4: Wrap Agent in App
def create_creative_director():
    # ... (agent creation code from Part 3) ...
    agent = Agent(
        name="creative_director",
        model="gemini-2.5-flash",
        tools=agent_tools,
        instruction=system_instruction,
        generate_content_config=GenerateContentConfig(
            max_output_tokens=20000,  # Increased from 8,192
            temperature=0.2,
        ),
    )

    # Wrap the agent in an App with the compaction config
    app = App(
        name="creative_director",
        root_agent=agent,
        events_compaction_config=compaction_config,
    )

    logger.info("✅ App created with lazy context compaction")
    logger.info("   Compaction interval: 3 agents")
    logger.info("   Overlap size: 1 agent")
    logger.info("   Context will be summarized only when necessary")

    return app

# Create the app (not just the agent)
root_agent = create_creative_director()
Important: We return an App, not just an Agent!
How It Works: Step by Step
5-Agent Workflow Example
Phase 1: Agents 1–3 (No Compaction)
User: "Create complete Instagram campaign"
Orchestrator announces plan:
"I'll coordinate our team:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline"
Agent 1 (Brand Strategist) executes:
→ Output: 2,000 tokens (FULL)
→ Total context: 2,000 tokens
Agent 2 (Copywriter) executes:
→ Output: 2,500 tokens (FULL)
→ Total context: 4,500 tokens
Agent 3 (Designer) executes:
→ Output: 1,800 tokens (FULL)
→ Total context: 6,300 tokens
Status: No compaction yet. All outputs preserved.
Phase 2: After Agent 3 (Compaction Triggered)
Compaction interval reached (3 agents)
↓
Summarizer analyzes:
- Agent 1 output (2,000 tokens)
- Agent 2 output (2,500 tokens)
↓
Creates intelligent summary:
- Brand Strategist research: [key points] (300 tokens)
- Copywriter posts: [post summaries] (200 tokens)
Total summary: 500 tokens
↓
Keeps Agent 3 full (overlap_size=1):
- Designer visuals: [full output] (1,800 tokens)
↓
New context size: 500 + 1,800 = 2,300 tokens
Saved: 4,000 tokens! (from 6,300 → 2,300)
Phase 3: Agents 4–5 (With Compacted Context)
Agent 4 (Critic) executes:
→ Context available: 2,300 tokens
→ Has: Summary of research/posts + Full visual concepts
→ Output: 1,500 tokens
→ Total: 3,800 tokens
Agent 5 (Project Manager) executes:
→ Context available: 3,800 tokens
→ Output: 2,000 tokens
→ Total: 5,800 tokens
✅ Workflow completes successfully!
✅ Under 8,192 token limit
✅ All 5 agents executed
Configuration Strategies
Short Workflows (3–5 agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Compact after 3 agents
    overlap_size=1,         # Keep last 1 full
)
Use when:
- 3–5 agents total
- Moderate output per agent
- Quality is critical
Long Workflows (5–10 agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=4,  # Compact after 4 agents
    overlap_size=2,         # Keep last 2 full
)
Use when:
- 5–10 agents total
- Need more recent context preserved
- Complex interdependencies
Very Long Workflows (10+ agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=5,  # Compact every 5 agents
    overlap_size=2,         # Keep last 2 full
)
Use when:
- 10+ agents total
- Very complex workflows
- Multiple rounds of compaction needed
Quality-Critical Workflows
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,
    overlap_size=2,  # Keep last 2 full (more quality)
)
Use when:
- Quality > token savings
- Later agents need rich context
- Acceptable to compact more frequently
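The four recipes above can be collapsed into one helper. This is a hypothetical convenience function whose thresholds mirror the guidance in this section; it is not part of the ADK API:

```python
# Hypothetical helper: pick compaction settings from expected workflow
# length, following the strategy tables above.

def pick_compaction_params(num_agents, quality_critical=False):
    """Return kwargs for EventsCompactionConfig based on workflow size."""
    if quality_critical:
        return {"compaction_interval": 3, "overlap_size": 2}
    if num_agents <= 5:
        return {"compaction_interval": 3, "overlap_size": 1}
    if num_agents <= 10:
        return {"compaction_interval": 4, "overlap_size": 2}
    return {"compaction_interval": 5, "overlap_size": 2}

print(pick_compaction_params(5))   # {'compaction_interval': 3, 'overlap_size': 1}
print(pick_compaction_params(12))  # {'compaction_interval': 5, 'overlap_size': 2}
```

You would then splat the result into the config, e.g. `EventsCompactionConfig(summarizer=summarizer, **pick_compaction_params(5))`.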
Testing Context Compaction
Test Script
# test_context_compaction.py
import asyncio

from creative_director.agent import root_agent
from google.adk import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types


async def test_full_workflow():
    """Test the complete 5-agent workflow with compaction."""
    brief = """
    Create a complete Instagram campaign for EcoFlow smart water bottle.
    Target: Health-conscious millennials (25-34).
    Budget: $5,000. Launch in 2 weeks.
    Include research, posts, visuals, review, and full timeline.
    """

    print("=" * 70)
    print("Testing 5-Agent Workflow with Context Compaction")
    print("=" * 70)
    print(f"\nBrief: {brief}\n")

    session_service = InMemorySessionService()
    runner = Runner(
        app_name="creative_director",
        agent=root_agent,  # This is now an App, not just an Agent
        session_service=session_service,
    )

    session_id = "test_compaction"
    user_id = "test_user"
    agent_count = 0

    try:
        await session_service.create_session(
            app_name="creative_director",
            user_id=user_id,
            session_id=session_id,
        )

        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=types.Content(parts=[types.Part(text=brief)]),
        ):
            if hasattr(event, 'text') and event.text:
                text = event.text

                # Count agent completions
                if "✓" in text and "complete" in text.lower():
                    agent_count += 1
                    print(f"\n[Agent {agent_count} completed]")

                # Detect compaction
                if "summariz" in text.lower():
                    print("\n[!] Context compaction triggered")

                print(text, end='', flush=True)

        print(f"\n\n{'=' * 70}")
        print("✅ Workflow complete!")
        print(f"   Agents executed: {agent_count}/5")
        print(f"{'=' * 70}")

        if agent_count == 5:
            print("✅ SUCCESS: All 5 agents completed (compaction worked!)")
        else:
            print(f"❌ PARTIAL: Only {agent_count}/5 agents completed")
    finally:
        await runner.close()


if __name__ == "__main__":
    asyncio.run(test_full_workflow())
Expected Output
======================================================================
Testing 5-Agent Workflow with Context Compaction
======================================================================
creative_director > I'll coordinate our team to create your campaign:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline
Let's begin!
[Agent 1 completed]
✓ Research complete. I received audience insights...
[Agent 2 completed]
✓ Copywriting complete. I received 5 Instagram posts...
[Agent 3 completed]
✓ Design complete. I received image concepts...
[!] Context compaction triggered
[Agent 4 completed]
✓ Review complete. Quality score: 8.5/10...
[Agent 5 completed]
✓ Timeline complete. Project plan created...
======================================================================
✅ Workflow complete!
Agents executed: 5/5
======================================================================
✅ SUCCESS: All 5 agents completed (compaction worked!)
Token Usage Comparison
Without Compaction
Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
-----------------------------------
Total: 6,300 tokens
Agent 4: ❌ Cannot start (would exceed 8,192)
Result: FAILURE (3/5 agents completed)
With Compaction (interval=3, overlap=1)
Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
Total before compaction: 6,300 tokens
→ Compaction triggered
Agents 1-2 summarized: 500 tokens
Agent 3 preserved: 1,800 tokens
Total after compaction: 2,300 tokens
Agent 4: 1,500 tokens output (2,300 → 3,800 total)
Agent 5: 2,000 tokens output (3,800 → 5,800 total)
-----------------------------------
Final: 5,800 tokens (under 8,192 limit)
Result: ✅ SUCCESS (5/5 agents completed)
Token savings: 4,000 tokens from compaction
Workflow success: 100% (vs 60% without)
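The comparison is easy to re-derive. A quick check using the per-agent numbers above (the 500-token summary size is the illustrative figure from this walkthrough, not a guaranteed compression ratio):

```python
# Re-deriving the with/without-compaction comparison, using the
# per-agent output sizes from the walkthrough above.
outputs = [2_000, 2_500, 1_800, 1_500, 2_000]

without = sum(outputs[:3])               # workflow stalls after agent 3
summary = 500                            # assumed size of the 2-agent summary
after_compaction = summary + outputs[2]  # agent 3 kept in full (overlap=1)
with_compaction = after_compaction + outputs[3] + outputs[4]

print(without)                     # 6300
print(after_compaction)            # 2300
print(with_compaction)             # 5800 (under the 8,192 limit)
print(without - after_compaction)  # 4000 tokens reclaimed
```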
Quality Preservation
What Gets Summarized?
The summarizer preserves key information:
Original Agent 1 Output (2,000 tokens):
**Audience Insights:**
Health-conscious millennials (25-34) are increasingly seeking products...
[1,500 words of detailed analysis]
**Competitive Analysis:**
1. Hydro Flask - Established brand with strong loyalty...
[800 words of competitor details]
**Trending Topics:**
1. #SustainableLiving - 2.3M posts, growing 15% monthly...
[700 words of trend analysis]
Summarized Version (300 tokens):
Research Summary: Target audience is health-conscious millennials (25-34)
valuing sustainability and smart features. Main competitors: Hydro Flask
(premium, no tech), S'well (design-focused), HidrateSpark (smart but basic).
Key trends: sustainable living, hydration tracking, minimalist aesthetics.
Opportunity: premium sustainable + smart features gap in market.
Key points preserved:
- Target audience demographics
- Main competitors identified
- Key trends listed
- Strategic opportunity highlighted
Details lost:
- Full competitor analysis
- Detailed trend statistics
- Extended audience behaviors
Quality vs Efficiency Trade-off
overlap_size=0: Maximum compression, minimal quality
overlap_size=1: Balanced (recommended)
overlap_size=2: High quality, less compression
overlap_size=3: Maximum quality, minimal compression
Recommendation: Start with overlap_size=1, increase if quality issues arise.
When NOT to Use Compaction
Scenario 1: Short Workflows
# 2-agent workflow
brief = "Research the market and write 3 posts"
# No compaction needed - output is small
Scenario 2: Small Outputs
# Each agent outputs < 500 tokens
# Total for 5 agents: 2,500 tokens
# Well under limit - compaction unnecessary
Scenario 3: Context-Critical Tasks
# Legal document review where every detail matters
# Better to split into multiple sessions than compress
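A rough pre-flight check captures all three scenarios: skip compaction when the estimated total output comfortably fits the limit. This helper is illustrative, not part of the ADK; the 80% margin is an assumption:

```python
# Illustrative guard: only bother with compaction when the expected
# total output approaches the model's output limit.
def compaction_needed(estimated_outputs, output_limit=8_192, margin=0.8):
    """True if the expected total exceeds `margin` of the output limit."""
    return sum(estimated_outputs) > output_limit * margin

print(compaction_needed([400, 450, 500]))                      # False: small outputs
print(compaction_needed([2_000, 2_500, 1_800, 1_500, 2_000]))  # True: our 5-agent case
```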
Advanced: Multiple Compaction Rounds
For very long workflows (15+ agents), multiple compaction rounds occur:
Agents 1-3: Full
→ Compaction 1: Agents 1-2 summarized, Agent 3 kept
Agents 4-6: Execute
→ Compaction 2: Agents 1-5 summarized, Agent 6 kept
Agents 7-9: Execute
→ Compaction 3: Agents 1-8 summarized, Agent 9 kept
... and so on
Each round further compresses older context while preserving recent work.
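The schedule above (with interval=3, overlap=1) can be simulated with a few lines. This is a sketch of the schedule only, counting the interval in newly finished agents since the previous round, which is an assumption about the trigger semantics:

```python
# Simulate which agents are summarized vs. kept full at each
# compaction round, for a given interval and overlap.
def compaction_schedule(total_agents, interval=3, overlap=1):
    """Return (last_summarized, last_kept) agent numbers per round."""
    rounds, since_last = [], 0
    for agent in range(1, total_agents + 1):
        since_last += 1
        if since_last == interval:  # round triggers
            rounds.append((agent - overlap, agent))
            since_last = 0
    return rounds

for last_summarized, last_kept in compaction_schedule(9):
    print(f"agents 1-{last_summarized} summarized, "
          f"agents {last_summarized + 1}-{last_kept} kept full")
```

For 9 agents this yields three rounds, triggered after agents 3, 6, and 9, each one folding everything but the most recent agent into the running summary.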
Troubleshooting
Issue 1: Workflow Still Stops Early
Solution: Reduce compaction_interval:
compaction_interval=2 # Compact more frequently
Issue 2: Quality Degradation
Solution: Increase overlap_size:
overlap_size=2 # Keep more recent context
Issue 3: Too Much Compaction
Solution: Increase compaction_interval:
compaction_interval=4 # Compact less frequently
Cost Analysis
Without Compaction (Failed Workflow)
Agents executed: 3/5
Input tokens: 6,300 (wasted partial context)
Output tokens: 6,300
Cost: ~$0.05 (but incomplete workflow)
Value: $0 (workflow failed)
With Compaction (Successful Workflow)
Agents executed: 5/5 ✅
Input tokens: 8,000 (including summarization)
Output tokens: 5,800
Summarization cost: ~$0.01
Total cost: ~$0.07
Value: Complete campaign delivered ✅
ROI: 40% more cost, but 100% success vs failure!
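The 40% figure comes straight from the two totals (the dollar amounts themselves are illustrative estimates; real pricing varies by model and usage):

```python
# Checking the cost delta quoted above.
cost_without = 0.05  # failed 3-agent run, no deliverable
cost_with = 0.07     # full 5-agent run, incl. ~$0.01 summarization

increase = (cost_with - cost_without) / cost_without
print(f"{increase:.0%} more spend")  # 40% more spend
```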
Our agents can now scale to handle complex workflows. But what about integrating with external services?
In Part 5, we’ll add Model Context Protocol (MCP) integration to the Project Manager agent.
Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun
Next: Part 5: External Tool Integration via MCP →
Thanks for reading! I hope this helps you on your journey. If you found value in this, please clap, leave a comment, and star the GitHub repo. Hit the Follow button to get notified about my next article, so don’t forget to Subscribe to the email list and let’s connect on LinkedIn!
Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 4 was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
