The Temporal Identity Paradox: Architecting Zero-Trust for Long-Running Data Agents on Ephemeral Compute
The enterprise AI landscape is undergoing a massive shift. We are moving away from passive conversational interfaces toward autonomous, transactional AI agents. But as we transition from static automation to Agentic AI, the industry is headed towards an identity crisis. The problem isn’t the LLM's reasoning capabilities; it's its identity.
If your engineering organization is provisioning static or rotating API keys, or traditional service accounts, for language models, you are in for a rough ride and an explosive security risk. AI agents differ fundamentally from traditional software: they can act autonomously on external services, exhibiting non-deterministic, flexible behaviour that adapts in real time.
This article discusses why traditional IAM is fundamentally inadequate for the agentic era, why solutions built for simple chatbots collapse under the weight of long-running autonomous agents and swarms, and how to architect a “Re-hydration” loop to securely run stateful agents on ephemeral serverless compute.
The Imperative: Why an Agent Needs an Identity
Traditional software clients — web or mobile — operate on structured, unambiguous user inputs. These actions represent a clear, auditable grant of intent. Agents, however, interpret unstructured data and dynamically formulate their own execution graphs. In short, agents (LLMs and their associated harness) typically have a probabilistic path to the goal, and it’s this stochasticity that causes auditability nightmares.
Currently, agents often impersonate users in ways that are opaque to external services, creating significant accountability gaps. If an agent that hallucinates a network automation script runs with sweeping administrative credentials, the blast radius can be catastrophic, and there have been real-life incidents that justify the fear.
To secure this workflow, we must shift from implicit Impersonation to explicit Delegation.
True delegation requires explicit “on-behalf-of” (OBO) flows in which agents prove their delegated scope while remaining identifiable as distinct entities from the user they represent. An agent must have its own cryptographically verifiable identity to ensure non-repudiation and enforce strict capability boundaries.
Let’s take an example to understand it better.
Implicit Impersonation is like handing your personal L6 engineering badge to a junior contractor so they can go restart a server for you.
To the security system, the contractor is you. If they get confused and accidentally power down the primary billing database, the audit logs show that you did it. There is no proof that the contractor was even there. This is exactly how we currently treat agents: we inject our sweeping system API keys right into the agent’s environment. The agent uses the LLM to reason, but it executes actions using your admin privileges. If the model hallucinates a destructive DROP TABLE command, the agent executes it with no least-privilege constraint, and the blast radius is tied directly to you.
Explicit Delegation (OBO), on the other hand, is like the contractor going to the security desk to get their own temporary badge. This visitor badge explicitly states, “I am Contractor X, and I am authorized by Engineer Y to enter only Rack 4 for the next 60 minutes.”
So there is active spatio-temporal fencing around their access rights. If the contractor tries to swipe into Rack 5, the door flashes red. The system enforces the boundary. More importantly, when the security audit happens, the logs don’t just say Engineer Y was in the building; they explicitly record: “Contractor X (Acting on behalf of: Engineer Y) accessed Rack 4.”
In our cloud-native architecture, the JWT must act exactly like that visitor badge. Instead of just a sub (Subject) claim that identifies the human user, the token must carry an act (Actor) claim that represents the specific AI agent instance. This cryptographic proof ensures that the agent can only execute the limited scope you delegated to it, and every action it takes is immutably logged as a machine acting on your behalf.
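As a minimal sketch of the visitor badge in token form, the payload below pairs the standard `sub` claim with the `act` (actor) claim defined by OAuth 2.0 Token Exchange (RFC 8693). The identifiers, scope strings, and the `describeActor` and `permits` helpers are illustrative assumptions, not any particular IdP's API, and signature verification is deliberately left out.

```typescript
// Illustrative shape of a delegated (on-behalf-of) token payload.
// `sub`, `exp`, and `scope` are standard JWT/OAuth claims; `act` is the actor
// claim from OAuth 2.0 Token Exchange (RFC 8693).
interface DelegatedClaims {
  sub: string;           // the human who delegated authority
  act: { sub: string };  // the agent instance acting on their behalf
  scope: string;         // space-delimited scopes granted to the agent
  exp: number;           // expiry, in seconds since the Unix epoch
}

// Hypothetical example values mirroring the visitor-badge analogy.
const badge: DelegatedClaims = {
  sub: "engineer-y@example.com",
  act: { sub: "agent:contractor-x" },
  scope: "rack4:enter",
  exp: 1700003600,
};

// Render the audit line: who acted, and on whose behalf.
function describeActor(claims: DelegatedClaims): string {
  return `${claims.act.sub} (acting on behalf of: ${claims.sub})`;
}

// True only if the token is unexpired and carries the requested scope.
function permits(claims: DelegatedClaims, scope: string, nowSeconds: number): boolean {
  const granted = claims.scope.split(" ");
  return nowSeconds < claims.exp && granted.includes(scope);
}
```

With these values, a request for `rack5:enter` is refused even though the token itself is valid, which is the "door flashes red" behaviour from the analogy.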
The Baseline: How We Normally Do It
For the first generation of AI agents, the security architecture was relatively straightforward. Existing OAuth 2.1 frameworks, when used with AI agents, work well within single trust domains with synchronous agent operations.

In this standard pattern, an enterprise user interacts with a chatbot that needs to query an internal tool. We bind the agent to the Model Context Protocol (MCP). The agent authenticates via an enterprise Identity Provider (IdP), receives a short-lived JSON Web Token (JWT), and executes its tool call.

This works because the operation is synchronous and short-lived. The agent spins up, grabs the context, uses the short-lived token to pull from the database, and returns the answer to the user before the token expires.
The Pivot: Short-Lived vs. Long-Running Agents
The paradigm breaks down as we move toward scaled autonomy. There is a massive architectural difference between a stateless RAG chatbot and an asynchronous Agent-to-Agent (A2A) swarm.
Consider a threat-hunting agent. A severity-1 alert fires for a DNS rebinding attack at 2:00 AM. A human analyst deploys a swarm to analyze 48 hours of VPC flow logs, cross-reference them against real-time threat feeds, and isolate compromised endpoints.
This is not a synchronous chat request. This is a highly autonomous, multi-step execution graph that might take minutes or hours to complete.
The Problem: The Identity Paradox of Ephemeral Compute
When we run these long-running agents on cloud-native, serverless infrastructure (such as Google Cloud Run), we encounter a fundamental paradox.
IAM models based on short-lived, user-session-bound access tokens are fundamentally incompatible with long-running, asynchronous agent operations. If we adhere strictly to zero-trust principles and use 15-minute On-Behalf-Of (OBO) tokens, the token will expire right in the middle of the agent’s cognitive graph. The MCP server will throw a 401 Unauthorized, and the container will crash, destroying hours of reasoning context.
To bypass this, developers often resort to an anti-pattern: injecting long-lived Refresh Tokens or static API keys directly into the agent’s execution sandbox.
But granting a non-deterministic model a persistent, highly privileged token that can self-refresh indefinitely completely destroys the principle of least privilege. If the agent hallucinates or falls victim to prompt injection, you have armed it with an unstoppable credential.
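To put rough numbers on the expiry problem, here is a small sketch that counts how many short-lived tokens an uninterrupted task would consume. The function names and the three-hour example are assumptions chosen for illustration.

```typescript
// Number of tokens a task consumes if each token lives `ttlMinutes`
// and the task runs for `taskMinutes` without pausing.
function tokensNeeded(taskMinutes: number, ttlMinutes: number): number {
  if (taskMinutes <= 0 || ttlMinutes <= 0) {
    throw new RangeError("durations must be positive");
  }
  return Math.ceil(taskMinutes / ttlMinutes);
}

// Expirations that land mid-task: every token boundary strictly before the end.
function midTaskExpirations(taskMinutes: number, ttlMinutes: number): number {
  return tokensNeeded(taskMinutes, ttlMinutes) - 1;
}
```

Under these assumptions, a three-hour hunt on 15-minute tokens needs 12 tokens and hits 11 mid-task expirations, while a 10-minute chatbot query never crosses a token boundary.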
The Proposed Solution: Durable Identity + The Re-Hydration Loop
To solve this, we must stop fighting the ephemeral nature of Cloud Compute. We need to decouple the agent’s cognitive state from its execution compute.
The agent requires a durable, delegated identity that is a first-class citizen in the IAM system, distinct from the initiating user, allowing it to authenticate independently over extended periods. We map this durable identity to a temporal memory architecture using the Sense and Act (SnAC) pattern, effectively externalising the agent’s hippocampus.
System Design: The Zero-Trust SnAC Architecture
The architecture is divided into four highly isolated components:

The diagram above maps out the system design for a complete zero-trust ecosystem that safely runs autonomous agents without granting them permanent admin keys. It shows how a human initially delegates authority through an Identity Control Plane, which provisions a short-lived token to an agent running inside an isolated Compute Sandbox. Every action the agent attempts is strictly intercepted and evaluated by the SnAC IAM Mesh (the policy enforcer), while its ongoing “thought process” and execution states are continuously streamed into a Temporal Auditing Backplane (powered by Kafka and ClickHouse). Crucially, if the agent’s short-lived token expires mid-task, or if it attempts a high-risk action that requires out-of-band human approval via a mobile push notification, the backplane safely pauses the agent’s state. A background orchestrator then steps in to verify the permissions and “re-hydrate” a fresh container so the agent can seamlessly resume its work. Let’s look at each of the components:
- The Identity Control Plane (SCIM & IdP): We extend the System for Cross-domain Identity Management (SCIM) schema to treat the agent as a non-human employee. The IdP holds the durable delegation from the human, but only mints short-lived OBO tokens for the execution sandbox.
- The Compute Isolation Layer (Cloud Run/WebAssembly): The agent runs within a strict, ephemeral serverless boundary. It never holds persistent secrets.
- The Temporal Auditing Backplane (Kafka to ClickHouse): As the agent processes its graph, every thought, reasoning trace, and action is streamed as Change Data Capture (CDC) via Kafka directly into ClickHouse. This provides sub-second temporal state and bypasses the “read amplification wall.”
- The MCP Gateway (Policy Enforcement Point): The MCP server sits between the compute layer and the external tools, dynamically validating the short-lived tokens and intercepting capability requests.
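The gateway's role as a policy enforcement point can be sketched as a pure decision function. This is a sketch under stated assumptions: the `Decision` and `TokenView` types, the tool names, and the scope strings are hypothetical, and returning 403 for a missing scope follows the general OAuth bearer-token convention (RFC 6750) rather than anything the MCP specification mandates.

```typescript
// Hypothetical decision returned by the gateway for each tool call.
type Decision =
  | { status: 200 }
  | { status: 401; reason: "token_expired" }
  | { status: 403; reason: "missing_scope"; requiredScope: string };

interface TokenView {
  actor: string;     // agent identity (from the `act` claim)
  scopes: string[];  // scopes granted to this token
  exp: number;       // expiry, in seconds since the Unix epoch
}

// Assumed mapping from tool name to the scope it requires.
const requiredScopes: Record<string, string> = {
  "flowlogs.query": "logs:read",
  "node.quarantine": "network:quarantine",
};

function authorize(token: TokenView, tool: string, nowSeconds: number): Decision {
  // An expired token is rejected before any scope check.
  if (nowSeconds >= token.exp) {
    return { status: 401, reason: "token_expired" };
  }
  const requiredScope: string | undefined = requiredScopes[tool];
  // Unknown tools and missing scopes are both denied (fail closed).
  if (requiredScope === undefined || !token.scopes.includes(requiredScope)) {
    return { status: 403, reason: "missing_scope", requiredScope: requiredScope ?? "unknown" };
  }
  return { status: 200 };
}
```

Keeping the decision separate from transport code makes the policy easy to test, and the two denial codes give the agent a way to tell an expired token apart from a capability it was never granted.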
The Control Flow: Executing the Re-Hydration Loop
When a token expires mid-execution, we treat it as an expected lifecycle event rather than a fatal error. Here is the exact control flow:

Phase 1: Execution and Suspension
1. The agent executes inside Cloud Run, using a short-lived (e.g., 15-minute) token to interact with the MCP Gateway.
2. It continuously streams its ongoing context to ClickHouse via Kafka.
3. The short-lived token expires. The MCP Gateway returns a 401 Unauthorized.
4. The agent catches the exception, pushes a final STATE_SUSPENDED marker to Kafka (containing its exact pointer in the execution graph), and gracefully terminates the container.
Phase 2: Verification and Re-Hydration
5. A secure background orchestrator detects the STATE_SUSPENDED event in ClickHouse.
6. It queries the IdP to ensure the human’s original durable delegation has not been revoked.
7. If valid, the Identity Plane mints a fresh 15-minute token.
8. The orchestrator spins up a new Cloud Run container, passes in the fresh token, and instructs the agent to pull its prior cognitive state from ClickHouse. The agent resumes exactly where it left off.
```typescript
// Inside the ephemeral agent worker
try {
  const result = await mcpClient.executeTool(action.tool, action.params);
  this.state = updateState(this.state, result);
  await flushStateToKafka('STATE_ACTIVE', this.state);
} catch (error: any) {
  if (error.statusCode === 401) {
    console.log('[AUTH_BOUNDARY] Token expired. Halting execution.');
    // Persist the execution pointer before the container exits
    await flushStateToKafka('STATE_SUSPENDED', this.state);
    // Graceful exit; the orchestrator will re-hydrate.
    process.exit(0);
  }
  // Anything other than an expired token is a real failure: surface it.
  throw error;
}
```
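The orchestrator side of Phase 2 could look roughly like the sketch below. The `SuspendedEvent` shape and the `IdentityPlane` and `Launcher` interfaces are hypothetical stand-ins for the IdP and the compute platform, not real SDK calls; a production version would call those services' actual APIs and handle their failures.

```typescript
// Hypothetical event shape written by the agent before it exits.
interface SuspendedEvent {
  type: "STATE_SUSPENDED";
  runId: string;
  delegationId: string;  // the human's durable delegation held by the IdP
  graphPointer: string;  // where in the execution graph to resume
}

// Stand-ins for the IdP and the container launcher.
interface IdentityPlane {
  isDelegationActive(delegationId: string): boolean;
  mintToken(delegationId: string, ttlSeconds: number): string;
}
interface Launcher {
  start(runId: string, token: string, resumeFrom: string): void;
}

type Outcome = "rehydrated" | "delegation_revoked";

function rehydrate(event: SuspendedEvent, idp: IdentityPlane, launcher: Launcher): Outcome {
  // Step 6: never resume work whose delegation has since been revoked.
  if (!idp.isDelegationActive(event.delegationId)) {
    return "delegation_revoked";
  }
  // Step 7: mint a fresh short-lived token (15 minutes, as in the flow above).
  const token = idp.mintToken(event.delegationId, 15 * 60);
  // Step 8: start a new container that resumes from the saved pointer.
  launcher.start(event.runId, token, event.graphPointer);
  return "rehydrated";
}
```

The revocation check runs on every re-hydration, so under this design revoking the durable delegation stops the agent at its next token boundary at the latest.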
The Engineering Use Case: CIBA and Swarm Escalation
This architecture natively solves one of the most complex engineering use cases: human-in-the-loop bottlenecks during high-stakes threat hunting.
Let’s return to our swarm isolating a DNS rebinding attack. Halfway through the autonomous execution, the Triage Agent identifies a compromised node and delegates the task of quarantining it to a Remediation Sub-Agent.
Quarantining a node is a high-consequence capability. The MCP Gateway recognizes that the sub-agent’s token lacks this specific scope.
Instead of failing, the system utilizes the Client-Initiated Backchannel Authentication (CIBA) protocol. The agent fires a CIBA request to the IdP, pushes its state to ClickHouse as STATE_SUSPENDED, and exits.
The CIBA flow enables the agent to request secure, out-of-band authorisation from the appropriate human decision-maker. The on-call engineer receives a push notification on their mobile device containing the exact context. They approve the escalation. The authorisation server triggers the Re-Hydration loop, but this time the newly minted token includes a cryptographically verified scope to drop the firewall rule.
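A minimal sketch of the escalation step follows. The request fields `scope`, `login_hint`, and `binding_message` come from the OpenID Connect CIBA Core specification; the helper names, the scope strings, and the way approval feeds into the next token's scopes are assumptions made for illustration.

```typescript
// Subset of a CIBA backchannel authentication request
// (OpenID Connect Client-Initiated Backchannel Authentication Core 1.0).
interface CibaRequest {
  scope: string;            // must include "openid", plus the scopes being requested
  login_hint: string;       // identifies the human who should approve
  binding_message: string;  // short human-readable text shown to the approver
}

// Build the request for a capability the current token lacks.
function buildEscalation(approver: string, missingScope: string, context: string): CibaRequest {
  return {
    scope: `openid ${missingScope}`,
    login_hint: approver,
    binding_message: context,
  };
}

// Scopes for the re-hydrated token: unchanged on denial, extended on approval.
function scopesAfterDecision(current: string[], requested: string, approved: boolean): string[] {
  if (!approved || current.includes(requested)) {
    return [...current];
  }
  return [...current, requested];
}
```

If the on-call engineer declines, the re-hydrated agent comes back with exactly the scopes it had before, so the high-consequence action stays out of reach.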
By architecting identity as a dynamic, re-hydratable state rather than a static credential, we eliminate the false dichotomy between security and autonomy. We unlock the ability to deploy self-contained data agents that execute massive, multi-day workloads across distributed systems — securely bounded by the strict, unforgiving rules of cloud-native infrastructure.
Conclusion: Returning to Distributed Systems Basics
We’ve all done it: injected a long-lived admin token into a Python script, wrapped it around an LLM, and called it an agent. It’s the fastest way to get a prototype working. But as these systems graduate from internal demos to production workloads handling real infrastructure, that shortcut becomes a massive liability. We cannot keep handing static, sweeping credentials to non-deterministic models and simply hoping for the best.
The Re-Hydration architecture isn’t just a clever trick to bypass 15-minute token expirations. It’s an acknowledgement of how we actually need to build stateful systems on stateless cloud infrastructure. Instead of fighting the ephemeral nature of serverless compute by poking holes in our IAM policies or extending token lifetimes, we lean into it.
By separating the agent’s memory (persisted safely in ClickHouse) from its execution (ephemeral Cloud Run containers) and enforcing explicit delegation via On-Behalf-Of tokens, we establish a clear separation of concerns. We are accepting a practical reality: we can’t guarantee exactly what an LLM will decide to do next. But by surrounding it with standard IAM guardrails and out-of-band approvals like CIBA, we can guarantee what the system will actually permit it to execute. The security boundaries become explicit, auditable, and tied to the principle of least privilege.
Have you implemented an IAM strategy for your agents yet? Share your strategies, and any incidents where they broke, in the comments.
Footnote
The architecture is based on a discussion with Ayesha Dissanayaka, after her session “Who’s Calling? Bringing Identity To the MCP Host” that focused on treating the MCP host as a first-class identity through four disciplines: Administer (lifecycle, credentials), Authenticate (how a host proves itself), Authorize (delegation vs. impersonation, token exchange, actor claims), and Audit (trails that separate agent action from user intent).
And thanks to her pioneering work on Identity Management for Agentic AI, which sets the tone for many of the techniques that are discussed above.
Opinions expressed are my own in my personal capacity and do not represent the views, policies, or positions of my current and/or ex-employer(s) (we all love our exes 😉), their subsidiaries, or affiliates.
The Temporal Identity Paradox: Architecting Zero-Trust for Long-Running Data Agents on Ephemeral… was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
Source Credit: https://medium.com/google-cloud/zero-trust-iam-agentic-ai-architecture-3f30aa94c281?source=rss—-e52cf94d98af—4
