The technology stack
This solution effectively combines Google Cloud’s managed AI services with open-source frameworks:
- Agent Development Kit (ADK): The open-source Python framework used to define the agent’s complex logic, including multi-step orchestration, state management, and integration with various services (a minimal sketch follows this list).
- Vertex AI Agent Engine: The fully managed, serverless environment that hosts and executes the ADK-based agent, handling scaling, security, and operational overhead.
- Vertex AI RAG Engine: Generates contextually grounded responses. The engine is configured to use Vertex AI Search as its retrieval backend, efficiently pulling relevant information from internal documents to inform the language model.
- Gemini Models: Provide the advanced reasoning and language synthesis capabilities required to generate high-quality, human-readable answers from the retrieved data.
- Cloud Pub/Sub: Functions as a durable messaging queue that decouples the agent from the final write-back process, increasing the overall resilience and reliability of the architecture.
- Cloud Storage: Serves as storage for the unstructured customer documents needed to answer the DOR questions.
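To make the stack concrete, here is a minimal sketch of how an ADK agent grounded through Vertex AI RAG Engine might be defined. The model name, corpus resource path, and instruction text are illustrative placeholders, not the production configuration described in this post:

```python
# Illustrative sketch only -- the corpus path, model, and instruction are
# placeholders, not values from the actual Palo Alto Networks solution.
from google.adk.agents import Agent
from google.adk.tools.retrieval import VertexAiRagRetrieval
from vertexai.preview import rag

# Retrieval tool backed by a Vertex AI RAG Engine corpus (hypothetical corpus ID).
retrieve_customer_docs = VertexAiRagRetrieval(
    name="retrieve_customer_docs",
    description="Retrieves passages from internal customer documents.",
    rag_resources=[
        rag.RagResource(
            rag_corpus="projects/PROJECT_ID/locations/us-central1/ragCorpora/CORPUS_ID"
        )
    ],
    similarity_top_k=10,
)

# The agent answers one DOR question at a time, grounded in retrieved context.
root_agent = Agent(
    name="dor_question_agent",
    model="gemini-2.0-flash",  # any available Gemini model
    instruction=(
        "Answer the single question you are given using only information "
        "returned by the retrieval tool. Cite the source documents."
    ),
    tools=[retrieve_customer_docs],
)
```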
Overcoming challenges
The journey to automate DOR creation with an AI agent was not without its hurdles. Several key challenges were encountered and successfully addressed, highlighting important architectural and deployment considerations for similar agentic AI solutions.
1. Agent context management and scaling:
Initially, the design passed all 140+ questions to the agent at once, expecting it to iterate and manage its own progress. This approach led to severe memory pressure and “Out of Memory” (OOM) errors: the agent’s internal context window, which grew with every reasoning step and every accumulated answer, quickly became unmanageable.
The solution involved shifting the responsibility for state management to a FastAPI server acting as an orchestrator. Instead of receiving all questions upfront, the agent was designed to process questions one by one. The FastAPI server now maintains the overall context and the accumulating document, passing individual questions to the agent and storing the agent’s responses. This compartmentalization of context dramatically improved the agent’s stability and allowed for more efficient scaling.
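A minimal sketch of that orchestration pattern follows. The endpoint shape and the `query_agent` helper are hypothetical; the helper stands in for the call to the agent on Vertex AI Agent Engine (a sketch of that call appears in the next section):

```python
# Minimal orchestrator sketch: the server owns the question list and the
# accumulating answers; the agent only ever sees one question at a time.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

DOR_QUESTIONS = ["What is the customer's current SIEM?", "..."]  # 140+ in practice

class DorRequest(BaseModel):
    customer_id: str

def query_agent(question: str, customer_id: str) -> str:
    """Placeholder for the call to the agent on Vertex AI Agent Engine."""
    raise NotImplementedError

@app.post("/dor")
def build_dor(req: DorRequest) -> dict:
    answers: dict[str, str] = {}  # state lives on the server, not in the agent
    for question in DOR_QUESTIONS:
        answers[question] = query_agent(question, req.customer_id)
    return {"customer_id": req.customer_id, "answers": answers}
```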
2. Deployment architecture and resource management:
Determining the optimal deployment architecture for both the backend orchestrator (FastAPI server) and the agent on Vertex AI Agent Engine posed another challenge. Early experiments with deploying both components within a single Google Kubernetes Engine (GKE) cluster resulted in frequent pod crashes, primarily due to the agent’s context and memory demands. The decision was made to decouple the FastAPI server from the agent’s runtime. The FastAPI server is deployed as a standalone service on GKE, which then makes calls to the agent separately deployed on Vertex AI Agent Engine. This separation leverages Vertex AI Agent Engine’s fully managed and scalable environment for the agent, while providing the flexibility of a custom backend orchestrator.
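Under that split, the GKE-hosted orchestrator reaches the agent over the network through the Vertex AI SDK. A sketch, assuming an ADK agent already deployed to Agent Engine; the project, location, and resource name are placeholders, and the streamed event shape may vary by SDK version:

```python
# Sketch of the GKE-side client for an agent deployed on Vertex AI Agent Engine.
import vertexai
from vertexai import agent_engines

vertexai.init(project="PROJECT_ID", location="us-central1")

remote_agent = agent_engines.get(
    "projects/PROJECT_ID/locations/us-central1/reasoningEngines/ENGINE_ID"
)

def query_agent(question: str, customer_id: str) -> str:
    """One question in, one grounded answer out (fills in the stub above)."""
    session = remote_agent.create_session(user_id=customer_id)
    chunks = []
    for event in remote_agent.stream_query(
        user_id=customer_id, session_id=session["id"], message=question
    ):
        # Streamed events are dicts; text parts live under content.parts
        # (treat this parsing as an assumption about the payload shape).
        for part in event.get("content", {}).get("parts", []):
            if part.get("text"):
                chunks.append(part["text"])
    return "".join(chunks)
```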
3. Performance optimization for LLM calls:
Generating answers with Gemini models involves multiple API calls for each of the 140+ questions, which initially resulted in a runtime of approximately 2.5 hours per DOR. Recognizing that these calls were I/O-bound, the process was optimized through parallelization: multi-threading within the FastAPI orchestrator allowed multiple Gemini calls to execute concurrently, and Vertex AI Agent Engine’s horizontal scaling supported the parallel load. This architectural change drastically reduced the overall processing time.
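Because the per-question calls spend most of their time waiting on the network, the sequential loop shown earlier can be swapped for a thread pool. This sketch reuses the hypothetical `query_agent` helper from above; the worker count is illustrative and would depend on quota and how far Agent Engine scales out:

```python
# Parallelizing the I/O-bound per-question calls with a thread pool.
from concurrent.futures import ThreadPoolExecutor

def answer_all(questions: list[str], customer_id: str) -> dict[str, str]:
    # I/O-bound work: threads (not processes) are enough to overlap the calls.
    with ThreadPoolExecutor(max_workers=16) as pool:  # worker count illustrative
        answers = pool.map(lambda q: query_agent(q, customer_id), questions)
    return dict(zip(questions, answers))
```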
Business outcomes
The implementation of this AI agent has delivered significant, measurable results for Palo Alto Networks:
- Increased Efficiency: The time required to create a comprehensive DOR has been dramatically reduced.
- Improved Consistency and Quality: By standardizing on a 140-question framework, every DOR now meets a uniform, high standard of quality and completeness.
- Enhanced Accuracy: Grounding the agent’s answers in a trusted RAG system minimizes the risk of human error and ensures the information is drawn from the latest internal documentation.
- Strategic Re-focus of Personnel: Automating this task allows expert employees to dedicate more time to high-value activities like customer strategy and direct engagement.
- Visibility into Documentation Gaps: Surfacing areas where answers are weak or absent lets the pre-sales teams coordinating the effort emphasize those topics and build a more complete understanding of the customer.
This use case demonstrates a practical and powerful application of agentic AI in the enterprise, showcasing how a combination of open-source frameworks and managed cloud services can solve complex business challenges and drive operational efficiency.
The team would like to thank Googlers Hugo Selbie (GSD AI Incubation team) and Casey Justus (Professional Services) for their support and technical leadership on agents and agent frameworks as well as their deep expertise in ADK and Agent Engine.
Source Credit: https://cloud.google.com/blog/topics/partners/palo-alto-networks-customer-intelligence-agentic-design/
