Design private connectivity for RAG AI apps

The flexibility of Google Cloud allows enterprises to build secure and reliable architectures for their AI workloads. In this blog post, we look at a reference architecture for private connectivity for retrieval-augmented generation (RAG)-capable generative AI applications. This architecture is for scenarios where all communication within the system must use private IP addresses and must not traverse the internet.

The power of RAG

RAG is a technique for improving the output of large language models (LLMs) by grounding them in specific, authoritative knowledge bases outside of their original training data. With RAG, an application retrieves relevant information from your documents, data sources, or databases in real time. This retrieved context is then provided to the model alongside the user’s query, which improves response quality and reduces hallucinations by helping to ensure that the AI’s responses are accurate, verifiable, and relevant to your business.

This approach is helpful because it allows you to direct generative AI to use a designated source of truth, rather than relying solely on the model’s pre-existing knowledge, and without needing to retrain or fine-tune the model itself.
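To make the retrieve-then-augment flow concrete, here is a minimal sketch in Python. It is a toy illustration only: the keyword-overlap scoring stands in for whatever retrieval method your application uses (production systems typically rely on vector embeddings), and the function names, sample documents, and prompt wording are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share at least one token with the query.

    Toy scoring: the number of tokens a document shares with the query.
    This is an assumption for illustration, not a real retrieval algorithm.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context with the user's query into one prompt."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base and user query.
documents = [
    "The refund window is 30 days from purchase.",
    "Support hours are 9am to 5pm on weekdays.",
    "Shipping takes 3 to 5 business days.",
]
query = "What is the refund window?"

context = retrieve(query, documents)
prompt = build_prompt(query, context)
```

The prompt, rather than the bare query, is what would be sent to the model. The model itself is never retrained; only the context passed to it changes.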

Design pattern example

To see how to set up your network for private connectivity for a RAG application in a regional design, let’s look at the following design pattern.

The setup comprises an external network (on-premises and other clouds) and Google Cloud environments consisting of a routing project, a Shared VPC host project for RAG, and three specialized service projects: data ingestion, serving, and frontend.

This design utilizes the following services to provide an end-to-end solution:

Cloud Interconnect or Cloud VPN: Securely connects your on-premises environment or other clouds to the routing VPC network
Network Connectivity Center: Acts as an orchestration framework that manages connectivity between the routing VPC network and the RAG VPC network via VPC spokes and hybrid spokes
Cloud Router: In the routing project, facilitates dynamic BGP route exchange between the external network and Google Cloud
Private Service Connect: Provides a private endpoint in the routing VPC network to reach the Cloud Storage bucket for data ingestion without traversing the public internet
Shared VPC: Host project architecture that allows multiple service projects to use a common, centralized VPC network
Google Cloud Armor and Application Load Balancer: Placed in the frontend service project to provide security and traffic management for user interaction
VPC Service Controls: Creates a managed security perimeter around all resources to mitigate data exfiltration risks
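
Since the design requires every hop to use private IP addresses, one way to sanity-check a planned address layout is a small script like the sketch below. It assumes your design restricts itself to the RFC 1918 IPv4 ranges, which is a narrower rule than the architecture itself demands. The endpoint names and addresses are hypothetical placeholders, not values from this reference architecture.

```python
import ipaddress

# RFC 1918 private IPv4 ranges. Assumption: the design uses only these.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the IPv4 address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def find_public_endpoints(endpoints: dict[str, str]) -> list[str]:
    """Return the sorted names of endpoints whose address is not RFC 1918."""
    return sorted(
        name for name, address in endpoints.items() if not is_rfc1918(address)
    )


# Hypothetical endpoints standing in for components of the design.
planned_endpoints = {
    "psc-storage-endpoint": "10.10.0.5",
    "frontend-load-balancer": "10.20.0.10",
    "onprem-client": "192.168.1.20",
}

violations = find_public_endpoints(planned_endpoints)
```

An empty violations list means every planned address is in a private range. A check like this only validates the addressing plan. It does not replace controls such as VPC Service Controls, which enforce the perimeter at runtime.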