

You’ve built the perfect vector search engine. You’ve fine-tuned your embeddings model. You’re throwing queries at it, and… crickets. Or worse, plausible-sounding but ultimately irrelevant results. Sound familiar? The dirty secret of vector search isn’t just building it, it’s knowing if your vector matches are actually good. We’ve all been there, staring blankly at a list of results, wondering, ‘Is this thing even working?!’ Let’s dive into how to actually evaluate the quality of your vector matches.
“So what changed in RAG?”, you ask? Everything! For years, Retrieval Augmented Generation (RAG) felt like a promising but elusive goal. Now, finally, we have the tools to build RAG applications with the performance and the reliability needed for mission-critical tasks.
First of all, for those new to the party: generative models, with their large context windows and impressive output quality, are transforming AI. RAG provides a systematic way to inject context into AI applications and agents, grounding them in structured databases or information from various media. This contextual data is crucial for truthful, accurate output. But how accurate are those results? Does your business depend largely on the accuracy and relevance of these contextual matches? Then this project is going to tickle you!
By now, we already have a foundational understanding of three things:
- What contextual search means for your agent, and how to accomplish it using Vector Search.
- How to attain Vector Search within the scope of your data, that is, within your database itself (all Google Cloud databases support this, if you didn’t already know!).
- How to accomplish such a lightweight Vector Search RAG capability with high performance and quality, using AlloyDB’s Vector Search powered by the ScaNN index.
If you haven’t gone through those basic, intermediate and slightly advanced RAG experiments, I’d encourage you to read the three of them here, here and here, in that order.
Many specialized vector databases today require you to build complex pipelines and applications just to get the data you need. AlloyDB for PostgreSQL offers Google Research’s state-of-the-art vector search index, ScaNN, enabling you to optimize end-to-end retrieval of the freshest, most relevant data with a single SQL statement.
Let me refresh your memory with this example: Consider the Patent Search application we built here. The picture below shows the overall flow of what’s happening in this application.
Follow step 3 in this codelab to create the AlloyDB cluster, instance and database. Then follow step 4 to load patents data into the newly created table. Follow step 5 to create embeddings for the patent abstract data, and step 6 to run your Vector Search. That’s it. In one query, you can execute one of the most common complex contextual-match use cases. Take a look at the query here:
SELECT id || ' - ' || title as title, abstract
FROM patents_data
ORDER BY abstract_embeddings <=>
embedding('text-embedding-005', 'Eventually consistent synchronous data replication in a storage system')::vector LIMIT 10;
That’s it. In one query, we retrieve the 10 most contextually relevant matches to the user’s search text using cosine similarity.
Later on, we enabled ScaNN index for Vector Search fields that allowed users to get matching results faster and at a reasonable quality. Read about it here.
A good number of things were introduced. But I’m going to talk about two in particular — the two things that, as a developer, you would love to see:
1. Inline Filtering
Previously, as a developer, you would perform the Vector Search query and then deal with the filtering and recall yourself. The AlloyDB query optimizer makes choices about how to execute a query with filters. Inline filtering is a new query optimization technique that lets the AlloyDB query optimizer evaluate both the metadata filtering conditions and the vector search together, leveraging both vector indexes and indexes on the metadata columns. This improves recall and performance, and lets developers take full advantage of what AlloyDB offers out of the box!
You can enable inline filtering from your Cluster settings console or in a SQL statement: (Don’t run the statement yet).
SET scann.enable_inline_filtering = on
Inline filtering is best for cases with medium selectivity. As AlloyDB searches through the vector index, it computes distances only for vectors that match the metadata filtering conditions (the functional filters in your query, usually handled in the WHERE clause). This massively improves performance for such queries, complementing the advantages of post-filtering or pre-filtering.
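To make “medium selectivity” concrete, here is a sketch of the kind of query inline filtering targets, combining a metadata filter with a vector ordering (the num_claims filter matches the patents schema used later in this post):

```sql
-- A vector search combined with a metadata filter. With inline filtering,
-- AlloyDB computes distances only for vectors whose rows satisfy the
-- WHERE clause, instead of fully pre-filtering or post-filtering.
SELECT id || ' - ' || title AS title, abstract
FROM patents_data
WHERE num_claims >= 15          -- metadata filter (medium selectivity)
ORDER BY abstract_embeddings <=>
  embedding('text-embedding-005', 'sentiment analysis')::vector
LIMIT 10;
```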
2. Recall Evaluator
Recall in similarity search is the percentage of relevant instances that were retrieved from a search, i.e. the number of true positives. This is the most common metric used for measuring search quality. One source of recall loss comes from the difference between approximate nearest neighbor search, or aNN, and k (exact) nearest neighbor search, or kNN. Vector indexes like AlloyDB’s ScaNN implement aNN algorithms, allowing you to speed up vector search on large datasets in exchange for a small tradeoff in recall. Now, AlloyDB provides you with the ability to measure this tradeoff directly in the database for individual queries and ensure that it is stable over time. You can update query and index parameters in response to this information to achieve better results and performance.
What is the logic behind recall of search results?
In the context of vector search, recall refers to the percentage of vectors that the index returns which are true nearest neighbors. For example, if a nearest neighbor query for the 20 nearest neighbors returns 19 of the ground truth nearest neighbors, then the recall is 19/20×100 = 95%.
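The arithmetic above is simple enough to sketch in a few lines of Python (the IDs here are made up purely for illustration):

```python
def recall_at_k(retrieved_ids, ground_truth_ids):
    """Fraction of the true nearest neighbors present in the retrieved set."""
    hits = len(set(retrieved_ids) & set(ground_truth_ids))
    return hits / len(ground_truth_ids)

# The ANN index returned 19 of the 20 ground-truth neighbors,
# plus one result (id 99) that is not a true neighbor.
ann_results = list(range(19)) + [99]
ground_truth = list(range(20))
print(recall_at_k(ann_results, ground_truth))  # 0.95, i.e. 95% recall
```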
1. Install or update the pgvector extension:
CREATE EXTENSION IF NOT EXISTS vector WITH VERSION '0.8.0.google-3';
2. If the pgvector extension is already installed, upgrade the vector extension to version 0.8.0.google-3 or later to get recall evaluator capabilities:
ALTER EXTENSION vector UPDATE TO '0.8.0.google-3';
3. To create ScaNN indexes, install the alloydb_scann extension:
CREATE EXTENSION IF NOT EXISTS alloydb_scann;
4. Let’s see the impact of Inline Filtering & ScaNN index:
First run the Vector Search Query without the index and without the Inline Filter enabled:
Run EXPLAIN ANALYZE on it (with no index and no Inline Filtering):
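The baseline query appears as a screenshot in the original post; based on the search used throughout this article, a sketch of the run would look something like this:

```sql
-- Baseline: no ScaNN index, no inline filtering. The planner falls back
-- to a sequential scan with exact distance computation for every row
-- that passes the filter.
EXPLAIN ANALYZE
SELECT id || ' - ' || title AS title, abstract
FROM patents_data
WHERE num_claims >= 15
ORDER BY abstract_embeddings <=>
  embedding('text-embedding-005', 'sentiment analysis')::vector
LIMIT 25;
```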
Let’s create a regular index on the num_claims field so we can filter by it:
CREATE INDEX idx_patents_data_num_claims ON patents_data (num_claims);
Let’s create the ScaNN index for our Patent Search application. Run the following from your AlloyDB Studio:
CREATE INDEX patent_index ON patents_data
USING scann (abstract_embeddings cosine)
WITH (num_leaves=32);
Enable Inline Filtering on the ScaNN index:
SET scann.enable_inline_filtering = on
Now, let’s run the same query with filter and Vector Search in it:
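This run is also shown as a screenshot in the post; a sketch of the same filtered query, now served by the ScaNN index with inline filtering, would be:

```sql
-- Same query as before, but now the ScaNN index on abstract_embeddings
-- and the btree index on num_claims are evaluated together.
EXPLAIN ANALYZE
SELECT id || ' - ' || title AS title, abstract
FROM patents_data
WHERE num_claims >= 15
ORDER BY abstract_embeddings <=>
  embedding('text-embedding-005', 'sentiment analysis')::vector
LIMIT 25;
```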
As you can see, the execution time is reduced significantly for the same Vector Search. The ScaNN index with Inline Filtering has made this possible!
You can find the recall for a vector query on a vector index for a given configuration using the evaluate_query_recall function. This function lets you tune your parameters to achieve the vector query recall results that you want. Recall is the metric used for search quality, defined as the percentage of the returned results that are objectively closest to the query vectors. The evaluate_query_recall function is turned on by default.
Let’s see how to use it:
1. Set the Enable Index Scan flag on the ScaNN Index:
SET scann.enable_indexscan = on
2. Run the following query in AlloyDB Studio:
SELECT *
FROM evaluate_query_recall($$
  SELECT
    id || ' - ' || title AS title,
    abstract
  FROM patents_data
  WHERE num_claims >= 15
  ORDER BY abstract_embeddings <=> embedding('text-embedding-005',
    'sentiment analysis')::vector
  LIMIT 25 $$,
  '{"scann.num_leaves_to_search":1, "scann.pre_reordering_num_neighbors":10}',
  ARRAY['scann']);
The evaluate_query_recall function takes the query as a parameter and returns its recall. I’m using the same query that I used to check performance as the function’s input query, and I’ve specified scann as the index method. For more parameter options, refer to the documentation.
The recall for this Vector Search query we have been using:
I see that the RECALL is 70%. Now I can use this information to change the index parameters, methods and query parameters and improve my recall for this Vector Search!
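For example (a sketch, not the exact tuning performed in the post), you could widen the search by raising scann.num_leaves_to_search and re-measure the recall:

```sql
-- Searching more leaves and re-ordering more candidates trades a little
-- latency for higher recall; re-run evaluate_query_recall to confirm.
SELECT *
FROM evaluate_query_recall($$
  SELECT
    id || ' - ' || title AS title,
    abstract
  FROM patents_data
  WHERE num_claims >= 15
  ORDER BY abstract_embeddings <=> embedding('text-embedding-005',
    'sentiment analysis')::vector
  LIMIT 25 $$,
  '{"scann.num_leaves_to_search":8, "scann.pre_reordering_num_neighbors":50}',
  ARRAY['scann']);
```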
I modified the number of rows in the result set to 7 (from 10 previously), and I see an improved RECALL of 86%.
This means that, in real time, I can vary the number of matches my users see, improving the relevance of the matches in accordance with the users’ search context.
Isn’t this a significant leap in the world of Vector Search, particularly in the agentic era? You are now in control of the number of matches the end user sees, and you can improve match quality according to the user’s context. What you are building now are truly relevant matches, not arbitrary ones. Doesn’t this have significant impact in the RAG realm, where accuracy of truth is of the utmost importance to MCPs and A2As?
I have put together a quality-controlled, multi-tool agentic application that uses ADK and all of the AlloyDB capabilities we discussed here to create a high-performing, high-quality Patent Vector Search & Analyzer Agent, which you can view here: https://youtu.be/Y9fvVY0yZTY
Wondering how I built this agent? That is for a follow-up lab. Stay tuned!
Source Credit: https://medium.com/google-cloud/quality-controlled-patent-agent-adk-alloydb-inline-search-recall-evaluation-supercharges-rag-96fbc9164bbf?source=rss—-e52cf94d98af—4