How to reconcile the stateful nature of LLM reasoning with the stateless reality of cloud-native infrastructure using LangGraph, Vertex AI, and MCP.
The modern Security Operations Center (SOC) is fundamentally a massive real-time stream-processing engine. When your infrastructure emits billions of telemetry events per second, triaging anomalies is no longer a human-scale problem. We are rapidly moving toward the era of Data Agents: autonomous, LLM-backed microservices capable of context-aware investigation, telemetry querying, and automated remediation.
However, deploying these agents exposes a massive architectural impedance mismatch.
Most AI agent frameworks are designed as monolithic, long-running processes that hold their conversational state, tool schemas, and reasoning scratchpads in local RAM. But the modern cloud is ephemeral. To handle erratic, spiky event streams without aggressive over-provisioning, we must deploy to serverless infrastructure like Google Cloud Run.
This raises the central challenge: how do you build a highly stateful, context-heavy AI agent on a fundamentally stateless, ephemeral runtime?
This is the Stateless Paradox, and this article lays out an architectural blueprint for resolving it.
Enterprise Use Cases: Beyond the Prototype
When you move this architecture out of the lab and into the enterprise, the pattern of “stateless compute over an external ledger” unlocks massive, highly concurrent security and engineering workflows:
- Autonomous Threat Hunting (SOC): Instead of paging an L1 analyst for anomalous outbound network traffic, an event triggers the agent. It retrieves past interactions regarding the specific subnet (Long-Term Memory), uses an MCP tool to execute a read-only query against VPC flow logs, and cross-references the destination IPs against an internal Threat Intelligence API.
- Zero Trust Access Triage: When a user requests elevated, just-in-time access to a critical production database, the agent wakes up statelessly. It checks the user’s recent Okta authentication context, queries the Jira API via MCP to verify an open change-management ticket, and automatically grants or denies the temporary credential.
- Cloud Security Posture Management (CSPM): If a developer accidentally exposes an S3 bucket or misconfigures a Kubernetes cluster, the event stream triggers the agent. It pulls the infrastructure-as-code history, identifies the PR that caused the drift, and automatically pushes a remediation commit — maintaining the investigation thread so human engineers can review the automated fix.
The Architecture: Decoupling Compute from the State Machine
If you deploy a stateful LangChain/LangGraph agent into a standard Kubernetes pod, you are building a ticking time bomb. A sudden Kafka spike of 10,000 alerts will result in immediate CPU throttling, aggressive OOM (Out of Memory) kills, and the catastrophic loss of the agent’s internal reasoning context mid-investigation.
To survive on serverless, containerized infrastructure, the agent must be modeled as a stateless transition engine executing over an externalized ledger. While a container may remain warm to handle concurrent streams, each request must be treated as an atomic operation:
Hydrate(State) → Reason(LLM) → Mutate(Ledger).
This ensures that even as the Cloud Run autoscaler aggressively spins up and tears down containers, the ‘intelligence’ of the investigation remains preserved in the shared state layer, effectively decoupling the Compute Lifecycle from the Inquiry Lifecycle.
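The atomic request pattern above can be sketched in plain Python. This is a hypothetical, in-memory illustration: `LEDGER` stands in for the external state store and `reason` stands in for the LLM call; none of these names come from LangGraph.

```python
from dataclasses import dataclass, field


@dataclass
class InvestigationState:
    thread_id: str
    messages: list = field(default_factory=list)


# Stands in for the external ledger (Firestore/Postgres). In production this
# survives container restarts, which is the whole point of the pattern.
LEDGER: dict = {}


def hydrate(thread_id: str) -> InvestigationState:
    # 1. Hydrate(State): load the investigation, or start a fresh one
    return LEDGER.get(thread_id, InvestigationState(thread_id))


def reason(state: InvestigationState, event: str) -> InvestigationState:
    # 2. Reason(LLM): placeholder for the model call
    state.messages.append(("user", event))
    state.messages.append(("agent", f"triaged: {event}"))
    return state


def mutate(state: InvestigationState) -> None:
    # 3. Mutate(Ledger): durable write before the response is returned
    LEDGER[state.thread_id] = state


def handle_request(thread_id: str, event: str) -> str:
    state = hydrate(thread_id)
    state = reason(state, event)
    mutate(state)
    return state.messages[-1][1]
```

Because every turn re-reads the ledger, a second request for the same `thread_id` continues the investigation even if it lands on a brand-new container.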
The Cloud Run container holds zero memory between invocations. To achieve human-like recall without violating this constraint, we must strictly delineate our memory tiers based on their required read/write latency and consistency guarantees:
- L1: Ephemeral Execution Context (RAM): The agent’s internal chain-of-thought and intermediate tool outputs. This lives entirely in the container’s memory and is strictly bound to the lifecycle of a single HTTP request. It vanishes when the response is returned.
- L2: Session State (External Checkpointer): The conversational ledger of the investigation. We use LangGraph’s checkpointer mechanism backed by a highly available, CP (Consistent/Partition-tolerant) database like Google Cloud Firestore or PostgreSQL. State is hydrated at the start of a container execution and atomically persisted at the end.
- L3: Semantic Index (Vector Database): Episodic knowledge (e.g., threat intel, historical post-mortems). This is a read-heavy, eventually consistent tier retrieved via Vector Search only when the agent encounters a recognized Indicator of Compromise (IOC).
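A hydration helper that respects these tiers might look like the following sketch. The `checkpointer` and `vector_index` objects are hypothetical stand-ins for the L2 and L3 clients, and the IOC detection is deliberately crude:

```python
import re

# Crude IPv4 matcher standing in for real IOC extraction
IOC_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hydrate_context(thread_id, event_text, checkpointer, vector_index):
    """Assemble the request-scoped L1 context from the external tiers."""
    # L2: always load the conversational ledger for this thread
    session = checkpointer.get(thread_id) or []
    # L3: hit the read-heavy semantic index only when the event contains
    # something shaped like an IOC, keeping the common path cheap
    iocs = IOC_PATTERN.findall(event_text)
    history = vector_index.search(" ".join(iocs), k=3) if iocs else []
    return {"session": session, "history": history}
```

The asymmetry is intentional: the L2 read happens on every invocation, while the eventually consistent L3 lookup is gated behind IOC recognition.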

This diagram is the visual manifestation of the “Stateless Paradox”: it illustrates exactly how a system can be both ephemeral and stateful. Here is the narrative flow it represents:
- Group 1: Ingress & Serverless Scaling: This shows the volatile nature of serverless compute. As the User Request Stream spikes, the Orchestrator violently scales out (Instances 1, 2… N). Each instance is completely ignorant of the others and spins up with an empty L1 Cache (RAM).
- Group 2: Hydration (Solving the Paradox): Because the instance wakes up with amnesia, its very first I/O task is to look outward. It hits the L2 Cache (e.g., Firestore/Postgres) to grab the current conversational thread and the L3 Cache (Vector DB) to grab any historical context. It loads this external state into its local L1 Scratchpad.
- Group 3: Execution & Intelligence: Now fully hydrated, the instance passes the L1 context to the Foundational LLM. If the LLM decides to use a tool, the instance executes it via the Isolated MCP Server, appending the results back into its local L1 RAM.
- Group 4: Dehydration: Before the container finishes the HTTP request, it takes the updated L1 Scratchpad and atomically writes it back to the L2 Shared Ledger. Only after this durable write succeeds does it return the 200 OK to the user.
By grouping these components visually, the reader instantly understands that compute (the containers) and memory (the databases) live in completely different failure domains.
The Tooling Bottleneck: Serverless vs. MCP
Hardcoding database credentials and API keys into your agent tightly couples your orchestration logic to your infrastructure. The Model Context Protocol (MCP) elegantly solves this by decoupling tool execution into isolated, standardized servers.
But in a serverless environment, this introduces a severe latency penalty. If a Cloud Run container must perform a full MCP handshake (via SSE or Stdio) on every single invocation just to discover available tool schemas, you introduce a massive network bottleneck before the LLM even generates its first token.
The Solution: Late-Binding Tool Execution. We treat MCP tool schemas as static configurations. We cache these definitions and inject them into the LLM’s system prompt at container start. We only establish the heavy MCP TCP connection if — and only if — the LLM explicitly emits a tool_call token.
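Late binding can be reduced to a small runner whose schemas arrive as static config at startup, while the connection factory is invoked only when a tool call actually appears. The class and its names are illustrative, not part of any MCP SDK:

```python
class LateBindingToolRunner:
    def __init__(self, schemas, connect):
        self.schemas = schemas        # cached tool schemas, injected at startup
        self._connect = connect       # factory for the heavy MCP session
        self.connections_opened = 0   # instrumented for illustration

    def run_turn(self, llm_output):
        tool_calls = llm_output.get("tool_calls", [])
        if not tool_calls:
            return []                 # pure-text turn: zero connection cost
        session = self._connect()     # bind the expensive connection only now
        self.connections_opened += 1
        return [session.call_tool(tc["name"], tc["args"]) for tc in tool_calls]
```

Turns where the LLM answers directly never pay the handshake; only turns that emit a tool call do.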
Implementation: The Asynchronous Graph
Let’s examine the Python implementation. Because we are orchestrating network I/O across LLM APIs, external databases, and MCP servers, the graph must execute entirely asynchronously. Blocking the serverless container’s event loop with synchronous I/O will immediately degrade throughput.
The following implementation is a distilled architectural slice focusing on the state transition logic. For a complete, reproducible environment — including the Dockerfile, Cloud Run manifests, and MCP server configurations — explore the security-agent repository on GitHub.
1. Defining the Bounded Context (Async State)
First, we define the state that the checkpointer will hydrate at the start of each invocation and persist at the end.
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class ThreatAgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    historical_context: str
2. The Agent Node & Schema Contracts
When giving an agent access to a database via an MCP tool, you will inevitably run into Database Hallucinations. The agent knows how to execute a query, but without a map of the schema it will hallucinate table names or use the wrong SQL dialect (e.g., applying Postgres date math to a SQLite database).
To fix this, we strictly define the schema and dialect in the system prompt.
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model="gemini-2.5-pro")

def call_model(state: ThreatAgentState):
    # Cached schema prevents MCP connection latency during startup
    mcp_tool_schemas = [
        {
            "name": "read_query",
            "description": "Run a read-only SQL query against the database.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ]
    llm_with_tools = llm.bind_tools(mcp_tool_schemas)

    # CRITICAL: Prevent Database Hallucinations
    system_prompt = f"""Context: {state.get('historical_context', '')}
You are an elite SOC Data Agent.
CRITICAL RULES FOR SQL QUERIES:
1. DIALECT: You MUST use strictly SQLite-compatible syntax.
2. SCHEMA: There is exactly ONE table available named `network_logs`.
3. COLUMNS: The table has exactly two columns: `ip` (TEXT) and `action` (TEXT). Do NOT invent column names.
"""
    messages = [{"role": "system", "content": system_prompt}] + list(state["messages"])
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}
3. The Dynamic MCP Tool Node
This node intercepts the tool call from the LLM, dynamically establishes the secure MCP session, executes the query, and closes the connection.
from langchain_core.messages import ToolMessage
from app.tools.mcp_client import mcp_session  # Your custom MCP context manager

async def execute_mcp_tools(state: ThreatAgentState):
    last_message = state["messages"][-1]
    if not last_message.tool_calls:
        return {"messages": []}

    tool_responses = []
    # Only establish the connection if a tool was called
    async with mcp_session() as session:
        for tool_call in last_message.tool_calls:
            mcp_result = await session.call_tool(
                tool_call["name"],
                arguments=tool_call["args"],
            )
            tool_responses.append(
                ToolMessage(
                    content=str(mcp_result.content[0].text),
                    name=tool_call["name"],
                    tool_call_id=tool_call["id"],
                )
            )
    return {"messages": tool_responses}
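The webhook below compiles a workflow imported from app.agent.graph, but the article never shows its wiring. A plausible version, assuming a standard LangGraph layout with a conditional edge between the agent and tool nodes, centers on a small router like this (END and the message type are stubbed here so the routing logic stands alone):

```python
END = "__end__"  # stand-in for langgraph.graph.END

def should_continue(state):
    """Route to the MCP tool node only when the LLM emitted a tool call."""
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END

# In the real graph, this router would sit between the two nodes, roughly:
#   workflow = StateGraph(ThreatAgentState)
#   workflow.add_node("agent", call_model)
#   workflow.add_node("tools", execute_mcp_tools)
#   workflow.add_edge(START, "agent")
#   workflow.add_conditional_edges("agent", should_continue)
#   workflow.add_edge("tools", "agent")
```

The `tools → agent` back-edge is what lets the model read the ToolMessage results and decide whether to query again or finish.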
4. The Serverless Webhook (Ingress FastAPI)
Finally, we wrap the compiled graph in FastAPI. We use the application lifespan to manage the asynchronous connection pool for the L2 Session State database (e.g., Postgres or SQLite). This is critical: if we initialized the checkpointer on every request, we would exhaust database connection limits the moment Cloud Run scales out to 1,000 instances.
from contextlib import asynccontextmanager
from fastapi import FastAPI
from pydantic import BaseModel
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
from app.agent.graph import workflow

agent_app = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open the async connection on container startup; `yield` stays inside
    # the context manager so the checkpointer remains open while serving
    async with AsyncSqliteSaver.from_conn_string("short_term_memory.db") as saver:
        global agent_app
        agent_app = workflow.compile(checkpointer=saver)
        yield
    # Connection is safely closed on container teardown

api = FastAPI(lifespan=lifespan)

class AlertPayload(BaseModel):
    incident_id: str
    telemetry: str

@api.post("/webhook/triage")
async def triage_alert(payload: AlertPayload):
    # Hydrate the specific thread's state statelessly
    config = {"configurable": {"thread_id": payload.incident_id}}
    user_message = {"role": "user", "content": payload.telemetry}
    final_state = await agent_app.ainvoke({"messages": [user_message]}, config=config)
    return {
        "incident_id": payload.incident_id,
        "agent_response": final_state["messages"][-1].content,
    }
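Once deployed, each alert is a single POST to the webhook. A minimal stdlib client sketch follows; the base URL is a placeholder for your Cloud Run service, and reusing the incident ID as the thread ID is what lets separate containers continue the same investigation:

```python
import json
import urllib.request


def build_triage_request(base_url: str, incident_id: str, telemetry: str):
    """Build the POST for the /webhook/triage endpoint defined above."""
    body = json.dumps({"incident_id": incident_id, "telemetry": telemetry})
    return urllib.request.Request(
        f"{base_url}/webhook/triage",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(base_url: str, incident_id: str, telemetry: str) -> dict:
    # Fire the request and decode the agent's JSON response
    with urllib.request.urlopen(build_triage_request(base_url, incident_id, telemetry)) as resp:
        return json.load(resp)
```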

Architectural Challenges (The “Gotchas”)
Treating your agent as a state transition function introduces specific distributed systems friction points. You must engineer around these failure modes for production readiness:
- The Concurrency Nightmare (OCC & Idempotency): If two events for the same incident_id hit Cloud Run simultaneously, two containers will hydrate the same state and attempt to write back diverging realities.
- The Fix: Your L2 checkpointer must implement Optimistic Concurrency Control (OCC). LangGraph’s production checkpointers achieve this via sequential checkpoint_id tracking, rejecting writes if the state has mutated mid-flight.
- The Guardrail: Beyond the database, your API ingress must enforce an X-Idempotency-Key. Because Cloud Run is a CaaS (Container-as-a-Service) environment, a network glitch can trigger a client retry. Without an idempotency key, your agent might execute a destructive tool action, such as updating a firewall rule or disabling a user account, twice. The key ensures the transition from State(t) to State(t+1) is processed exactly once, regardless of how many times the request is retried.
- Database Connection Exhaustion: Serverless scaling is violent. 1,000 concurrent network alerts result in 1,000 Cloud Run instances. If each instance opens a raw TCP connection for memory hydration, your operational database will crash. The Fix: Terminate your container connections at a multiplexer like PgBouncer or the Cloud SQL Auth Proxy, or utilize HTTP-native NoSQL stores like Google Cloud Firestore to offload connection management.
- Infinite Reasoning Loops (Poison Pills): If an agent hallucinates a malformed SQL query, the MCP tool returns an error. If the LLM lacks the reasoning capacity to correct the syntax, it will loop endlessly until the Cloud Run container times out, failing to save the state. The Fix: Implement max-step circuit breakers in your LangGraph configuration, and route failed investigations to a Dead Letter Queue (DLQ) for human review.
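The idempotency-key guardrail can be sketched independently of any framework. In this illustration the dedupe store is an in-process dict; a production deployment would back it with Firestore or Redis and attach a TTL:

```python
class IdempotencyGuard:
    def __init__(self):
        self._results = {}   # key -> cached result; swap for Firestore/Redis

    def run_once(self, key, handler, payload):
        """Execute handler exactly once per X-Idempotency-Key; replayed
        retries get the cached result instead of re-firing side effects."""
        if key in self._results:
            return self._results[key]
        result = handler(payload)
        self._results[key] = result   # record only after the handler commits
        return result
```

A retried request carrying the same key gets the original response back, and the destructive tool action fires once.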
Beyond Security: The Universal Data Agent
While the SOC is the perfect crucible for testing this architecture, the pattern of “stateless compute over an external ledger” scales to any data-heavy enterprise domain. By swapping the MCP tools, the same Cloud Run architecture can power entirely different departments:
- Site Reliability Engineering (SRE): When a PagerDuty alert fires at 3:00 AM for a database latency spike, a stateless agent wakes up first. It uses MCP to query Datadog for the exact metric, checks GitHub for any merged PRs in the last hour, and queries Kubernetes pod logs. By the time the on-call engineer opens their laptop, the agent has already diagnosed a bad database index and attached the root cause to the ticket.
- FinOps & Cloud Economics: Instead of relying on passive dashboards, an agent listens to GCP or AWS billing pub/sub alerts. When a sudden cost anomaly occurs, the agent hydrates its state, queries the Cloud Asset Inventory via MCP to find an orphaned GPU cluster, cross-references Jira to see who deployed it, and pings the engineering manager in Slack with a summarized report and a “Click to Destroy” button.
- Tier 3 Customer Support: When a VIP enterprise customer submits a Zendesk ticket containing a massive stack trace, an agent intercepts it. It uses MCP to query Elasticsearch for the customer’s specific tenant ID, correlates the error with known Jira bugs, and drafts a highly technical response — maintaining the thread state so a human support engineer can seamlessly review and send it.
- Supply Chain Logistics: If an IoT sensor in a warehouse detects a temperature failure in a refrigeration unit, an agent is triggered. It queries the SAP/ERP system via MCP to identify which perishable goods are at risk, checks the inventory of nearby facilities, and automatically generates a rerouting manifest for the logistics team.
The Staff’s Pick: The Ultimate Enterprise Agent Stack
If you are building this for an enterprise-grade engineering or security environment, here is the curated stack to ensure massive scale and reliability:
- Orchestration Engine: LangGraph. It is currently unmatched for treating multi-agent workflows as strictly typed, persistable state machines.
- Short-Term Memory Storage: Google Cloud Firestore. Single-digit millisecond latency, scales to zero, and operates over HTTP/gRPC, completely avoiding the serverless connection-pooling nightmare.
- Telemetry Database: ClickHouse. When your agent needs to execute an MCP tool to comb through terabytes of VPC flow logs or application traces, ClickHouse is the undisputed king of real-time analytical speed.
- Foundational Model: Gemini 2.5 Pro (via Vertex AI). With its massive context window, it can absorb heavily retrieved long-term memory payloads without losing track of the immediate investigative thread.
Conclusion: State is a Liability
If there is one universal truth in distributed systems, it is this: state is a liability. When we treat AI agents as magical, long-running monolithic brains, we violate the core tenets of reliability engineering. Stateful Kubernetes pods executing volatile, memory-heavy LLM workloads are ticking time bombs. A sudden spike in events leads to CPU throttling, OOM (Out of Memory) kills, and the catastrophic loss of the agent’s internal reasoning context. Your Mean Time To Recovery (MTTR) plummets, and your operational toil skyrockets.
The architecture of the “Stateless Paradox” is not just a clever workaround for serverless environments; it is a necessary evolution. By enforcing a strict separation of concerns — ephemeral compute via Cloud Run, durable ledger persistence via async databases, and isolated I/O via late-bound MCP tools — we transform brittle AI scripts into hardened enterprise microservices.
We limit the blast radius of any single failure. If a container dies mid-thought, the system doesn’t lose the investigation; a new container simply spins up, hydrates the state from the last successful checkpoint, and picks up exactly where the dead node left off.
We are moving past the era of the monolithic AI process. The next generation of enterprise automation belongs to fleets of decoupled, distributed, and entirely stateless Data Agents — systems engineered to scale violently, fail gracefully, and recover instantly.
Building stateful Data Agents on stateless infrastructure is no longer a paradox; it is an established agentic design pattern that balances elastic scale with cognitive consistency. To start deploying this blueprint to your own Google Cloud environment, check out the demo code and deployment manifests at shuvajyotikar13/security-agent.
Opinions expressed are my own, in my personal capacity as a leader in the Data Agents ecosystem, and do not represent the views, policies, or positions of my current or former employers (or anyone else's), or of their subsidiaries or affiliates.
The Stateless Paradox: Engineering Stateful Data Agents on Ephemeral Compute was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
