Google AI Agent Security. Based on Google’s recently released on… | by Ruban Siva | Google Cloud - Community

Based on Google’s recently released on their AI Agent Security article from a CISO perspective…

AI Agent Security Guidelines

Agents must have well-defined human controllers: Agents must operate under clear human oversight, with the ability to distinguish authorized user instructions from other inputs.
Agent powers must have limitations: Agent actions and resource access must be carefully limited and dynamically aligned with their intended purpose and user risk tolerance. This emphasizes the least-privilege principle.
Agent actions and planning must be observable: Agent activities must be transparent and auditable through robust logging and clear action characterization.

Google’s Suggested Approach

Google advocates for two distinct layers:

Layer 1: Use traditional, deterministic measures, such as runtime policy enforcement. Runtime policy engines act as external guardrails, monitoring and controlling agent actions before execution based on predefined rules. These engines use action manifests to capture the security properties of agent actions, such as dependency types, effects, authentication, and data types.
Layer 2: Deploy reasoning-based defense strategies. This layer uses the AI model’s own reasoning to enhance security. Techniques such as adversarial training and using specialized models as security analysts can help the agent distinguish legitimate commands from malicious ones, making it more resilient against attacks, data theft, and even model theft.

Initial Base AI Agent Concept

Key Security Risks

Risk 1: Rogue Actions

Summary: This is when an AI agent performs harmful or unintended actions. This can happen because it is tricked by malicious instructions hidden in data (like an email or website) or because it misunderstands ambiguous user commands or misinterprets how to use an external tool.

Risk 2: Sensitive Data Disclosure

Summary: This is when an AI agent improperly reveals private information. Attackers can achieve this by tricking the agent into sending data to an external location (e.g., in a URL) or by manipulating its response to include the sensitive data directly.

Core principles for AI Agent security

Now here’s how Google advises on risk mitigation…

Principle 1: Human Oversight and Control

Statement: Every AI agent must be clearly linked to a human controller, who must approve any critical or irreversible actions, ensuring the system can always distinguish between the user’s commands and other data.

Principle 2: Limited and Purpose-Aligned Powers

Statement: An agent’s abilities must be strictly limited to its intended purpose, with permissions dynamically adjusted for each specific task, and users must always be able to review and revoke its access.

Principle 3: Observability and Transparency

Statement: All agent actions and reasoning steps must be securely logged and made observable, with a user interface that provides clear insight into the agent’s thought process to build trust and allow for security audits.