Scaling Multi-Agent Workflows: Solving the Token Limit Problem
Building Distributed Multi-Agent Systems with Google’s AI Stack series:
- Part 1: From Monolithic AI to Distributed Intelligence: Building Your First Multi-Agent System
- Part 2: Making Agents Talk: Agent-to-Agent (A2A) Protocol Deep Dive
- Part 3: Building the Orchestrator: Coordinating Agents with the AgentTool Pattern
- Part 4: Scaling Multi-Agent Workflows: Solving the Token Limit Problem ← You are here
- Part 5: External Tool Integration via Model Context Protocol (MCP)
- Part 6: Deploying to Cloud: Cloud Run and Vertex AI Agent Engine
Welcome Back!
In Part 3, we built an intelligent orchestrator that coordinates 5 specialist agents. It works beautifully… until it doesn’t.
The Problem
You test a complete 5-agent campaign workflow:
✅ Agent 1 (Brand Strategist): Complete - 2,000 tokens output
✅ Agent 2 (Copywriter): Complete - 2,500 tokens output
✅ Agent 3 (Designer): Complete - 1,800 tokens output
❌ Agent 4 (Critic): Workflow stops!
❌ Agent 5 (Project Manager): Never reached!
What happened? You hit the token output limit.
In this article, we’ll solve this with Lazy Context Compaction, an elegant strategy that:
- Summarizes older agent outputs intelligently
- Preserves recent context quality
- Scales workflows to 10+ agents
- Reduces token costs
Let’s fix it!
Understanding the Token Limit Problem
What Are Token Limits?
LLMs have two token limits:
- Input limit: How much context they can read (e.g., 128K tokens)
- Output limit: How much they can generate (e.g., 8,192 tokens)
Our problem is the output limit.
Why Multi-Agent Workflows Hit Limits
User Brief: 200 tokens
↓
Agent 1 Output: 2,000 tokens
Agent 2 Output: 2,500 tokens
Agent 3 Output: 1,800 tokens
-----------------------------------
Orchestrator's response so far: 6,500 tokens
Agent 4 tries to start...
❌ Would exceed 8,192 token limit!
Workflow stops prematurely.
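The failure above is simple arithmetic. Here is a quick budget sketch; the 200-token per-agent allowance for the orchestrator's own narration is an assumption for illustration, not a measured number:

```python
# Rough output-budget check for the 5-agent workflow above.
# NARRATION_OVERHEAD is an assumed per-agent allowance for the
# orchestrator's own commentary (plan announcements, "✓ complete" notes).
OUTPUT_LIMIT = 8_192
NARRATION_OVERHEAD = 200

agent_outputs = [
    ("Brand Strategist", 2_000),
    ("Copywriter", 2_500),
    ("Designer", 1_800),
    ("Critic", 1_500),
    ("Project Manager", 2_000),
]

used = 0
failed_at = None
for name, tokens in agent_outputs:
    cost = tokens + NARRATION_OVERHEAD
    if used + cost > OUTPUT_LIMIT:
        failed_at = name  # the workflow stops here
        break
    used += cost

print(f"Stopped at: {failed_at} after {used} tokens")
# Stopped at: Critic after 6900 tokens
```

With these estimates, the fourth agent is exactly where the budget runs out, matching the failure we saw in testing.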
Why This Happens
The orchestrator presents the full output from each agent to maintain transparency. After 3 agents, it’s already used most of its output budget!
Traditional solutions:
1. Increase max_output_tokens → Still fails with more agents
2. Summarize everything → Loses important context
3. Reduce agent outputs → Loses quality
Our solution: Lazy Context Compaction
What is Lazy Context Compaction?
Lazy Context Compaction is a strategy that:
- Compacts only when needed (after N agents)
- Summarizes older outputs (saves tokens)
- Preserves recent outputs (maintains quality)
- Uses LLM for summarization (intelligent compression)
The Strategy
Agents 1-3: Full context preserved
↓
After Agent 3: Compaction triggered
↓
Agents 1-2: Summarized → ~500 tokens
Agent 3: Full output preserved → ~1,800 tokens
↓
Agents 4-5: Execute with room to spare!

Result: Workflow completes successfully with high-quality outputs.
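The strategy itself fits in a few lines. This is a plain-Python sketch of the idea, not the ADK's internal implementation; `fake_summarize` stands in for the LLM summarizer we wire up below:

```python
# Sketch of lazy compaction: once `interval` outputs have accumulated,
# everything except the last `overlap` outputs collapses into one
# summary entry; before that threshold, nothing happens (hence "lazy").

def compact(outputs, interval, overlap, summarize):
    """outputs: list of (agent_name, text). Returns a new context list."""
    if len(outputs) < interval:
        return list(outputs)  # lazy: below the threshold, do nothing
    old, recent = outputs[:-overlap], outputs[-overlap:]
    summary = summarize([text for _, text in old])
    return [("summary", summary)] + list(recent)

# Stand-in for the LLM summarizer used in the real system.
def fake_summarize(texts):
    return f"summary of {len(texts)} outputs"

ctx = [("strategist", "research..."), ("copywriter", "posts..."),
       ("designer", "visuals...")]
print(compact(ctx, interval=3, overlap=1, summarize=fake_summarize))
# [('summary', 'summary of 2 outputs'), ('designer', 'visuals...')]
```

The two knobs, `interval` and `overlap`, map directly onto the `compaction_interval` and `overlap_size` settings we configure next.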
Implementing Context Compaction with ADK
Step 1: Import Required Components
# agents/creative_director/agent.py
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.apps.app import EventsCompactionConfig
from google.adk.apps import App
from google.adk.models import Gemini
Step 2: Create Summarizer
# Use fast model for summarization (cost-efficient)
summarization_llm = Gemini(model="gemini-2.5-flash")
summarizer = LlmEventSummarizer(llm=summarization_llm)
Why Gemini Flash?
- Fast summarization
- Cost-efficient
- High-quality summaries
- Same model family as main agent
Step 3: Configure Compaction
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Summarize after every 3 agents
    overlap_size=1,         # Keep most recent agent's full output
)
Configuration explained:
- compaction_interval=3: Compact after 3 agent completions
- overlap_size=1: Keep 1 most recent agent full (preserve quality)
Step 4: Wrap Agent in App
def create_creative_director():
    # ... (agent creation code from Part 3) ...
    agent = Agent(
        name="creative_director",
        model="gemini-2.5-flash",
        tools=agent_tools,
        instruction=system_instruction,
        generate_content_config=GenerateContentConfig(
            max_output_tokens=20000,  # Increased from 8,192
            temperature=0.2,
        ),
    )

    # Wrap the agent in an App with the compaction config
    app = App(
        name="creative_director",
        root_agent=agent,
        events_compaction_config=compaction_config,
    )

    logger.info("✅ App created with lazy context compaction")
    logger.info("   Compaction interval: 3 agents")
    logger.info("   Overlap size: 1 agent")
    logger.info("   Context will be summarized only when necessary")

    return app

# Create the app (not just the agent)
root_agent = create_creative_director()
Important: We return an App, not just an Agent!
How It Works: Step by Step
5-Agent Workflow Example
Phase 1: Agents 1–3 (No Compaction)
User: "Create complete Instagram campaign"
Orchestrator announces plan:
"I'll coordinate our team:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline"
Agent 1 (Brand Strategist) executes:
→ Output: 2,000 tokens (FULL)
→ Total context: 2,000 tokens
Agent 2 (Copywriter) executes:
→ Output: 2,500 tokens (FULL)
→ Total context: 4,500 tokens
Agent 3 (Designer) executes:
→ Output: 1,800 tokens (FULL)
→ Total context: 6,300 tokens
Status: No compaction yet. All outputs preserved.
Phase 2: After Agent 3 (Compaction Triggered)
Compaction interval reached (3 agents)
↓
Summarizer analyzes:
- Agent 1 output (2,000 tokens)
- Agent 2 output (2,500 tokens)
↓
Creates intelligent summary:
- Brand Strategist research: [key points] (300 tokens)
- Copywriter posts: [post summaries] (200 tokens)
Total summary: 500 tokens
↓
Keeps Agent 3 full (overlap_size=1):
- Designer visuals: [full output] (1,800 tokens)
↓
New context size: 500 + 1,800 = 2,300 tokens
Saved: 4,000 tokens! (from 6,300 → 2,300)
Phase 3: Agents 4–5 (With Compacted Context)
Agent 4 (Critic) executes:
→ Context available: 2,300 tokens
→ Has: Summary of research/posts + Full visual concepts
→ Output: 1,500 tokens
→ Total: 3,800 tokens
Agent 5 (Project Manager) executes:
→ Context available: 3,800 tokens
→ Output: 2,000 tokens
→ Total: 5,800 tokens
✅ Workflow completes successfully!
✅ Under 8,192 token limit
✅ All 5 agents executed
Configuration Strategies
Short Workflows (3–5 agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,  # Compact after 3 agents
    overlap_size=1,         # Keep last 1 full
)
Use when:
- 3–5 agents total
- Moderate output per agent
- Quality is critical
Long Workflows (5–10 agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=4,  # Compact after 4 agents
    overlap_size=2,         # Keep last 2 full
)
Use when:
- 5–10 agents total
- Need more recent context preserved
- Complex interdependencies
Very Long Workflows (10+ agents)
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=5,  # Compact every 5 agents
    overlap_size=2,         # Keep last 2 full
)
Use when:
- 10+ agents total
- Very complex workflows
- Multiple rounds of compaction needed
Quality-Critical Workflows
compaction_config = EventsCompactionConfig(
    summarizer=summarizer,
    compaction_interval=3,
    overlap_size=2,  # Keep last 2 full (more quality)
)
Use when:
- Quality > token savings
- Later agents need rich context
- Acceptable to compact more frequently
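The four recipes above can be collapsed into one helper. This is a hypothetical convenience function whose thresholds mirror the guidance in this section; it is not part of the ADK API:

```python
# Hypothetical helper: pick compaction settings from expected workflow
# length, following the strategy tables above.

def pick_compaction_params(num_agents, quality_critical=False):
    """Return kwargs for EventsCompactionConfig based on workflow size."""
    if quality_critical:
        return {"compaction_interval": 3, "overlap_size": 2}
    if num_agents <= 5:
        return {"compaction_interval": 3, "overlap_size": 1}
    if num_agents <= 10:
        return {"compaction_interval": 4, "overlap_size": 2}
    return {"compaction_interval": 5, "overlap_size": 2}

print(pick_compaction_params(5))   # {'compaction_interval': 3, 'overlap_size': 1}
print(pick_compaction_params(12))  # {'compaction_interval': 5, 'overlap_size': 2}
```

You would then splat the result into the config, e.g. `EventsCompactionConfig(summarizer=summarizer, **pick_compaction_params(5))`.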
Testing Context Compaction
Test Script
# test_context_compaction.py
import asyncio

from creative_director.agent import root_agent
from google.adk import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types


async def test_full_workflow():
    """Test the complete 5-agent workflow with compaction."""
    brief = """
    Create a complete Instagram campaign for EcoFlow smart water bottle.
    Target: Health-conscious millennials (25-34).
    Budget: $5,000. Launch in 2 weeks.
    Include research, posts, visuals, review, and full timeline.
    """

    print("=" * 70)
    print("Testing 5-Agent Workflow with Context Compaction")
    print("=" * 70)
    print(f"\nBrief: {brief}\n")

    session_service = InMemorySessionService()
    runner = Runner(
        app_name="creative_director",
        agent=root_agent,  # This is now an App, not just an Agent
        session_service=session_service,
    )

    session_id = "test_compaction"
    user_id = "test_user"
    agent_count = 0

    try:
        await session_service.create_session(
            app_name="creative_director",
            user_id=user_id,
            session_id=session_id,
        )

        async for event in runner.run_async(
            user_id=user_id,
            session_id=session_id,
            new_message=types.Content(parts=[types.Part(text=brief)]),
        ):
            if hasattr(event, 'text') and event.text:
                text = event.text

                # Count agent completions
                if "✓" in text and "complete" in text.lower():
                    agent_count += 1
                    print(f"\n[Agent {agent_count} completed]")

                # Detect compaction
                if "summariz" in text.lower():
                    print("\n[!] Context compaction triggered")

                print(text, end='', flush=True)

        print(f"\n\n{'=' * 70}")
        print("✅ Workflow complete!")
        print(f"   Agents executed: {agent_count}/5")
        print(f"{'=' * 70}")

        if agent_count == 5:
            print("✅ SUCCESS: All 5 agents completed (compaction worked!)")
        else:
            print(f"❌ PARTIAL: Only {agent_count}/5 agents completed")
    finally:
        await runner.close()


if __name__ == "__main__":
    asyncio.run(test_full_workflow())
Expected Output
======================================================================
Testing 5-Agent Workflow with Context Compaction
======================================================================
creative_director > I'll coordinate our team to create your campaign:
1. Brand Strategist → research
2. Copywriter → posts
3. Designer → visuals
4. Critic → review
5. Project Manager → timeline
Let's begin!
[Agent 1 completed]
✓ Research complete. I received audience insights...
[Agent 2 completed]
✓ Copywriting complete. I received 5 Instagram posts...
[Agent 3 completed]
✓ Design complete. I received image concepts...
[!] Context compaction triggered
[Agent 4 completed]
✓ Review complete. Quality score: 8.5/10...
[Agent 5 completed]
✓ Timeline complete. Project plan created...
======================================================================
✅ Workflow complete!
Agents executed: 5/5
======================================================================
✅ SUCCESS: All 5 agents completed (compaction worked!)
Token Usage Comparison
Without Compaction
Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
-----------------------------------
Total: 6,300 tokens
Agent 4: ❌ Cannot start (would exceed 8,192)
Result: FAILURE (3/5 agents completed)
With Compaction (interval=3, overlap=1)
Agent 1: 2,000 tokens output
Agent 2: 2,500 tokens output
Agent 3: 1,800 tokens output
Total before compaction: 6,300 tokens
→ Compaction triggered
Agents 1-2 summarized: 500 tokens
Agent 3 preserved: 1,800 tokens
Total after compaction: 2,300 tokens
Agent 4: 1,500 tokens output (2,300 → 3,800 total)
Agent 5: 2,000 tokens output (3,800 → 5,800 total)
-----------------------------------
Final: 5,800 tokens (under 8,192 limit)
Result: ✅ SUCCESS (5/5 agents completed)
Token savings: 4,000 tokens from compaction
Workflow success: 100% (vs 60% without)
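The comparison is easy to re-derive. A quick check using the per-agent numbers above (the 500-token summary size is the illustrative figure from this walkthrough, not a guaranteed compression ratio):

```python
# Re-deriving the with/without-compaction comparison, using the
# per-agent output sizes from the walkthrough above.
outputs = [2_000, 2_500, 1_800, 1_500, 2_000]

without = sum(outputs[:3])               # workflow stalls after agent 3
summary = 500                            # assumed size of the 2-agent summary
after_compaction = summary + outputs[2]  # agent 3 kept in full (overlap=1)
with_compaction = after_compaction + outputs[3] + outputs[4]

print(without)                     # 6300
print(after_compaction)            # 2300
print(with_compaction)             # 5800 (under the 8,192 limit)
print(without - after_compaction)  # 4000 tokens reclaimed
```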
Quality Preservation
What Gets Summarized?
The summarizer preserves key information:
Original Agent 1 Output (2,000 tokens):
**Audience Insights:**
Health-conscious millennials (25-34) are increasingly seeking products...
[1,500 words of detailed analysis]
**Competitive Analysis:**
1. Hydro Flask - Established brand with strong loyalty...
[800 words of competitor details]
**Trending Topics:**
1. #SustainableLiving - 2.3M posts, growing 15% monthly...
[700 words of trend analysis]
Summarized Version (300 tokens):
Research Summary: Target audience is health-conscious millennials (25-34)
valuing sustainability and smart features. Main competitors: Hydro Flask
(premium, no tech), S'well (design-focused), HidrateSpark (smart but basic).
Key trends: sustainable living, hydration tracking, minimalist aesthetics.
Opportunity: premium sustainable + smart features gap in market.
Key points preserved:
- Target audience demographics
- Main competitors identified
- Key trends listed
- Strategic opportunity highlighted
Details lost:
- Full competitor analysis
- Detailed trend statistics
- Extended audience behaviors
Quality vs Efficiency Trade-off
overlap_size=0: Maximum compression, minimal quality
overlap_size=1: Balanced (recommended)
overlap_size=2: High quality, less compression
overlap_size=3: Maximum quality, minimal compression
Recommendation: Start with overlap_size=1, increase if quality issues arise.
When NOT to Use Compaction
Scenario 1: Short Workflows
# 2-agent workflow
brief = "Research the market and write 3 posts"
# No compaction needed - output is small
Scenario 2: Small Outputs
# Each agent outputs < 500 tokens
# Total for 5 agents: 2,500 tokens
# Well under limit - compaction unnecessary
Scenario 3: Context-Critical Tasks
# Legal document review where every detail matters
# Better to split into multiple sessions than compress
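A rough pre-flight check captures all three scenarios: skip compaction when the estimated total output comfortably fits the limit. This helper is illustrative, not part of the ADK; the 80% margin is an assumption:

```python
# Illustrative guard: only bother with compaction when the expected
# total output approaches the model's output limit.
def compaction_needed(estimated_outputs, output_limit=8_192, margin=0.8):
    """True if the expected total exceeds `margin` of the output limit."""
    return sum(estimated_outputs) > output_limit * margin

print(compaction_needed([400, 450, 500]))                      # False: small outputs
print(compaction_needed([2_000, 2_500, 1_800, 1_500, 2_000]))  # True: our 5-agent case
```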
Advanced: Multiple Compaction Rounds
For very long workflows (15+ agents), multiple compaction rounds occur:
Agents 1-3: Full
→ Compaction 1: Agents 1-2 summarized, Agent 3 kept
Agents 4-6: Execute
→ Compaction 2: Agents 1-5 summarized, Agent 6 kept
Agents 7-9: Execute
→ Compaction 3: Agents 1-8 summarized, Agent 9 kept
... and so on
Each round further compresses older context while preserving recent work.
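The schedule above (with interval=3, overlap=1) can be simulated with a few lines. This is a sketch of the schedule only, counting the interval in newly finished agents since the previous round, which is an assumption about the trigger semantics:

```python
# Simulate which agents are summarized vs. kept full at each
# compaction round, for a given interval and overlap.
def compaction_schedule(total_agents, interval=3, overlap=1):
    """Return (last_summarized, last_kept) agent numbers per round."""
    rounds, since_last = [], 0
    for agent in range(1, total_agents + 1):
        since_last += 1
        if since_last == interval:  # round triggers
            rounds.append((agent - overlap, agent))
            since_last = 0
    return rounds

for last_summarized, last_kept in compaction_schedule(9):
    print(f"agents 1-{last_summarized} summarized, "
          f"agents {last_summarized + 1}-{last_kept} kept full")
```

For 9 agents this yields three rounds, triggered after agents 3, 6, and 9, each one folding everything but the most recent agent into the running summary.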
Troubleshooting
Issue 1: Workflow Still Stops Early
Solution: Reduce compaction_interval:
compaction_interval=2 # Compact more frequently
Issue 2: Quality Degradation
Solution: Increase overlap_size:
overlap_size=2 # Keep more recent context
Issue 3: Too Much Compaction
Solution: Increase compaction_interval:
compaction_interval=4 # Compact less frequently
Cost Analysis
Without Compaction (Failed Workflow)
Agents executed: 3/5
Input tokens: 6,300 (wasted partial context)
Output tokens: 6,300
Cost: ~$0.05 (but incomplete workflow)
Value: $0 (workflow failed)
With Compaction (Successful Workflow)
Agents executed: 5/5 ✅
Input tokens: 8,000 (including summarization)
Output tokens: 5,800
Summarization cost: ~$0.01
Total cost: ~$0.07
Value: Complete campaign delivered ✅
ROI: 40% more cost, but 100% success vs failure!
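The 40% figure comes straight from the two totals (the dollar amounts themselves are illustrative estimates; real pricing varies by model and usage):

```python
# Checking the cost delta quoted above.
cost_without = 0.05  # failed 3-agent run, no deliverable
cost_with = 0.07     # full 5-agent run, incl. ~$0.01 summarization

increase = (cost_with - cost_without) / cost_without
print(f"{increase:.0%} more spend")  # 40% more spend
```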
Our agents can now scale to handle complex workflows. But what about integrating with external services?
In Part 5, we’ll add Model Context Protocol (MCP) integration to the Project Manager agent.
Code Repository: https://github.com/Saoussen-CH/ai-creative-studio-adk-a2a-mcp-vertexai-cloudrun
Next: Part 5: External Tool Integration via MCP →
Thanks for reading! I hope this helps you on your journey. If you found value in this, please clap, leave a comment, and star the GitHub repo. Hit the Follow button to get notified about my next article, so don’t forget to Subscribe to the email list and let’s connect on LinkedIn!
Building Distributed Multi-Agent Systems with Google’s AI Stack: Part 4 was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
