
To help detail what we believe are the core issues, we’ve published a comprehensive guide covering our approach to securing AI agents that addresses concerns for both AI agent developers and security practitioners. Our goal is to provide a clear and actionable foundation for building secure and trustworthy AI agent systems that benefit society.
We cover the security challenges of agent architecture, the specific risks of rogue actions and sensitive data disclosure, and detail the three fundamental agent security principles: well-defined human controllers, limited agent powers, and observable agent actions.
- Agents must have well-defined human controllers: Agents must operate under clear human oversight, with the ability to distinguish authorized user instructions from other inputs.
- Agent powers must have limitations: Agent actions and resource access must be carefully limited and dynamically aligned with their intended purpose and user risk tolerance. This emphasizes the least-privilege principle.
- Agent actions and planning must be observable: Agent activities must be transparent and auditable through robust logging and clear action characterization.
Google’s hybrid approach: Agentic defense-in-depth
Google advocates for a hybrid defense-in-depth approach that combines the strengths of both traditional (deterministic) and reasoning-based (dynamic) security measures. This creates layered defenses that can help prevent catastrophic outcomes while preserving agent usefulness.
We believe that the most effective and efficient defense-in-depth path forward secures agents with both classic and AI controls. Our approach advocates for two distinct layers:
- Layer 1: Use traditional, deterministic measures, such as runtime policy enforcement. Runtime policy engines act as external guardrails, monitoring and controlling agent actions before execution based on predefined rules. These engines use action manifests to capture the security properties of agent actions, such as dependency types, effects, authentication, and data types.
- Layer 2: Deploy reasoning-based defense strategies. This layer uses the AI model’s own reasoning to enhance security. Techniques such as adversarial training and using specialized models as security analysts can help the agent distinguish legitimate commands from malicious ones, making it more resilient against attacks, data theft, and even model theft.
Of course, each of the above two layers should have their own layers of defense. For example, model-based input filtering coupled with adversarial training and other techniques can help reduce the risk of prompt injection, but not completely eliminate it. Similarly, these defense measures would make data theft more difficult, but would also need to be enhanced by traditional controls such as rule-based and algorithmic threat detection.
Key risks, limitations, and challenges
Traditional security paradigms, designed for static software or general AI, are insufficient for AI agents. They often lack the contextual awareness needed to know what the agent is reasoning about and can overly restrict an agent’s utility.
Similarly, relying solely on a model’s judgment for security is also inadequate because of the risk posed by vulnerabilities such as prompt injection, which can compromise the integrity and functionality of an agent over time.
In the wide universe of risks to AI, two risks associated with AI agents stand out from the crowd by being both more likely to manifest and more damaging if ignored.
Rogue actions are unintended, harmful, and policy-violating behaviors an agent might exhibit. They can stem from several factors, including the stochastic nature of underlying models, the emergence of unexpected behaviors, and challenges in aligning agent actions with user intent. Prompt injections are a significant vector for inducing rogue actions.
For example, imagine an agent designed to automate tasks in a cloud environment. A user intends to use the agent to deploy a virtual machine. However, due to a prompt injection attack, the agent instead attempts to delete all databases. A runtime policy engine, acting as a guardrail, would detect the “delete all databases” action (from its action manifest) and block it because it violates predefined rules.
Sensitive data disclosure involves the unauthorized revelation of private or confidential information by agents. Security measures would help ensure that access to sensitive data is strictly controlled.
For example, an agent in the cloud might have access to customer data to generate reports. If not secured, the agent might retain this sensitive data and then be coaxed to expose it. A malicious user could then ask a follow-up question that triggers the agent to inadvertently disclose some of that retained data.
However, securing AI agents is inherently challenging due to four factors:
- Unpredictability (non-deterministic nature)
- Emergent behaviors
- Autonomy in decision-making
- Alignment issues (ensuring actions match user intent)
Practical security considerations
Our recommended hybrid approach addresses several critical areas.
- Agent/plugin user controls: Emphasizes human confirmation for critical and irreversible actions, clear distinction between user input and other data, and verifiable sharing of agent configurations.
- Agent permissions: Adherence to the least-privilege principle, confining agent actions to its domain, limiting permissions, and allowing for user authority revocation. This level of granular control often surprises security leaders because such a traditional 1980s-style security control delivers high value for securing 2020s AI agents.
- Orchestration and tool calls: The intricate relationship between AI agents and external tools and services they use for orchestration can present unique security risks, especially with “Actions as Code.” Robust authentication, authorization, and semantic tool definitions are crucial risk mitigations here.
- Agent memory: Data stored in an agent’s memory can lead to persistent prompt injections and information leakage.
- Response rendering: Safely rendering AI agent outputs into user-readable content is vital to prevent classic web vulnerabilities.
Assurance and future directions
Continuous assurance efforts are essential to validate agent security. This includes regression testing, variant analysis, red teaming, user feedback, and external research programs to ensure security measures remain effective against evolving threats.
Securing AI agents requires a multi-faceted, hybrid approach that carefully balances the utility of these systems with the imperative to mitigate their inherent risks. Google Cloud offers controls in Agentspace that follow these guidelines, such as authentication and authorization, model safeguards, posture assessment, and of course logging and detection.
To learn more about how Google is approaching securing AI agents, please read our research paper.
Source Credit: https://cloud.google.com/blog/products/identity-security/cloud-ciso-perspectives-how-google-secures-ai-agents/