
Semantic search and RAG
In semantic search, we are able to leverage the underlying relationships between words to find relevant results beyond a simple keyword match. Embeddings are vector representations of data, ranging from text to videos, that capture these relationships. In a semantic search, we find embeddings that are mathematically close to our search query, allowing us to surface words or search terms that are close in meaning but may not have shown up in a keyword search. Databases such as AlloyDB allow us to combine this unstructured search with a structured search to provide high-quality, relevant results. For example, the prompt “Show me all pictures of sunsets I took in the past month” includes a structured part (the date is within the past month) and an unstructured part (the picture contains a sunset).
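As a rough illustration, the sketch below shows what such a combined query could look like against an AlloyDB (PostgreSQL) instance with the pgvector extension. The photos table, its uri, taken_at, and embedding columns, and the connection handling are illustrative assumptions, not the schema from this post.

```python
# Hedged sketch: a hybrid structured + semantic query against AlloyDB
# (PostgreSQL with pgvector). Table and column names are assumptions.
import psycopg  # psycopg 3

def find_recent_sunset_photos(conn: psycopg.Connection, query_embedding: list[float], limit: int = 10):
    """Combine a structured filter (past month) with a vector similarity search."""
    # pgvector accepts a bracketed literal such as '[0.1,0.2,...]'.
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT uri, taken_at
            FROM photos
            WHERE taken_at >= now() - interval '1 month'  -- structured part
            ORDER BY embedding <=> %s::vector             -- unstructured part: cosine distance
            LIMIT %s
            """,
            (embedding_literal, limit),
        )
        return cur.fetchall()
```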
In many RAG applications, embeddings play an important role in retrieving the relevant context from a knowledge base (such as a database) to ground the responses of large language models (LLMs). RAG systems can perform a semantic search on a vector database, such as AlloyDB, or pull data directly from the database, providing the retrieved results as context to the LLM so that it has access to the information needed to generate informative answers.
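A minimal sketch of that flow is shown below; embed, search_vector_db, and call_llm are placeholders for whichever embedding model, database client, and LLM the application uses.

```python
# Hedged sketch of the retrieval-augmented flow: embed the question, fetch the
# nearest chunks from the vector database, and ground the LLM prompt with them.
# embed(), search_vector_db(), and call_llm() are placeholder functions.

def answer_with_rag(question: str) -> str:
    query_embedding = embed(question)                # 1. embed the user query
    chunks = search_vector_db(query_embedding, k=5)  # 2. semantic search over stored embeddings
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)                          # 3. the LLM generates a grounded answer
```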
Knowledge ingestion pipelines
A knowledge ingestion pipeline takes unstructured content, such as product catalogs with free-form descriptions, support ticket dialogs, and legal documents, processes it into embeddings, and then pushes these embeddings into a vector database. The source of this knowledge can vary widely, from files stored in cloud storage buckets (like Google Cloud Storage) and information stored in databases like AlloyDB, to streaming sources such as Google Cloud Pub/Sub or Google Cloud Managed Service for Apache Kafka. For streaming sources, the data itself might be raw content (e.g., plain text) or URIs pointing to documents. A key consideration when designing knowledge ingestion pipelines is how to ingest and process knowledge, whether in a batch or streaming fashion.
- Streaming vs. batch: To provide the most up-to-date and relevant search results, and thus a superior user experience, embeddings should be generated in real time for streaming data. This applies to new documents being uploaded or new product images, where current knowledge holds significant business value. For less time-sensitive applications and operational tasks like backfilling, a batch pipeline is suitable. Crucially, the chosen framework must support both streaming and batch processing without requiring business logic re-implementation (the sketch after this list reuses the same logic for both).
- Chunking: Regardless of the data source, there is normally a preprocessing step after reading the data. For simple raw text, this might mean basic cleaning. However, for larger documents or more complex content, chunking is a crucial step. Chunking breaks the source material down into smaller, manageable units. The best chunking strategy varies depending on the specific data and application (the sketch after this list uses a naive fixed-size split as a stand-in).
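To make both points concrete, here is a condensed Apache Beam sketch (the framework behind Dataflow) in which the same naive chunking logic is applied whether documents arrive from a batch source (text files in Cloud Storage) or a streaming source (Pub/Sub). The bucket, topic, and 500-character chunk size are placeholder assumptions.

```python
# Hedged sketch: one chunking function reused by a batch and a streaming
# pipeline. Bucket/topic names and the chunk size are assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def chunk_text(text: str, size: int = 500):
    """Naive fixed-size chunking; real pipelines often split on document structure."""
    for start in range(0, len(text), size):
        yield text[start:start + size]

def add_chunking(documents):
    # The business logic is identical for batch and streaming inputs.
    return documents | "Chunk" >> beam.FlatMap(chunk_text)

# Batch: read documents from Cloud Storage, e.g. for a backfill.
with beam.Pipeline(options=PipelineOptions()) as p:
    docs = p | "ReadFiles" >> beam.io.ReadFromText("gs://my-bucket/docs/*.txt")
    chunks = add_chunking(docs)

# Streaming: read raw content (or URIs) from Pub/Sub with the same logic.
with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    messages = p | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/docs")
    texts = messages | "Decode" >> beam.Map(lambda b: b.decode("utf-8"))
    chunks = add_chunking(texts)
```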
Introducing Dataflow's MLTransform for embeddings
Dataflow ML provides many out-of-the-box capabilities that simplify the entire process of building and running a streaming or batch knowledge ingestion pipeline, allowing you to implement these pipelines in a few lines of code. An ingestion pipeline typically has four phases: reading from data sources, preprocessing the data to make it ready for embedding, generating the embeddings, and finally writing the correctly shaped records to the vector database. The new capabilities in MLTransform add support for chunking, generation of embeddings using Vertex AI or bring-your-own (BYO) models, and specialized writers for persisting embeddings to databases such as AlloyDB.
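A condensed sketch of the embedding phase with MLTransform is shown below. The Vertex AI model name, the chunk column, the artifact location, and the commented-out write step are assumptions for illustration; the chunking transform and the AlloyDB vector writer described in this post would slot in before and after the embedding step.

```python
# Hedged sketch: generating embeddings with Beam's MLTransform (runnable on
# Dataflow). Model name, column name, and artifact location are placeholders.
import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAITextEmbeddings

embedder = VertexAITextEmbeddings(
    model_name="text-embedding-004",  # assumed Vertex AI embedding model
    columns=["chunk"],                # the text field to embed
)

with beam.Pipeline() as p:
    _ = (
        p
        | "ReadChunks" >> beam.Create(
            [{"chunk": "AlloyDB combines structured and vector search."}]
        )
        | "Embed" >> MLTransform(
            write_artifact_location="gs://my-bucket/mltransform-artifacts"
        ).with_transform(embedder)
        # | "WriteToVectorDB" >> ...  # e.g. the AlloyDB vector writer described in this post
    )
```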
Source Credit: https://cloud.google.com/blog/topics/developers-practitioners/create-and-retrieve-embeddings-with-a-few-lines-of-dataflow-ml-code/