Standard e-commerce search struggles with nuance. If you ask for a “bold red for a steak dinner,” keyword matching often fails because databases store technical specs, not sensory experiences. In this article, I’ll show you how we built a hyper-personalized “AI Sommelier” entirely within BigQuery.
We went beyond simple similarity search. By leveraging Gemini models to enrich raw metadata and Vector Search to capture semantic intent, we transformed a flat product catalog into a dynamic sensory engine. You will learn how we implemented a sophisticated recommendation pipeline that adapts to individual customer taste profiles, prioritizes local inventory, and respects budget constraints, all optimized for sub-second latency without ever leaving the data warehouse. 🍷✨

Table of Contents
· 🤔 The “Subjectivity” problem
· 🍇 Phase 1: tasting the data
· 🔍 Phase 2: the “blind taste test” (Vector Search)
· 🧠 Phase 3: learning the palate (personalization)
· 📦 Phase 4: the “Inventory & Budget” reality
· 🚀 Optimization: serving it fast
· 🍷 Conclusion
🤔 The “Subjectivity” problem
Imagine walking into a high-end wine shop. You usually don’t walk up to the counter and ask for “a fermented grape beverage with 13.5% alcohol and a pH of 3.5.”
Instead, you say: “I need a bold red for a steak dinner,” or “I want a crisp, mineral white that reminds me of my vacation in Santorini.”
A human sommelier understands this instantly. A standard SQL database? Not so much.
In traditional e-commerce search, if a user searches for “cozy weekend wine” and the product description doesn’t explicitly contain the word “cozy,” the result set is empty. Standard keyword search (WHERE description LIKE '%cozy%') fails because it lacks semantic understanding.
We faced this exact challenge with a massive catalog of premium wines. Our database was pristine but clinically cold, filled with structured attributes like “Tuscany,” “Sangiovese,” and “2016”. But our customers don’t buy “fermented Sangiovese”; they buy “a warm evening by the fire” or “a prestigious bottle to impress a client.” When the search bar fails to understand that nuance, the sale is lost.
This is the “Subjectivity problem”: the massive gap between how data is stored (structured specs) and how humans experience it (sensory and emotional).
To bridge this gap, we needed an “AI Sommelier”: a system capable of translating vague, sensory human intent into specific Product IDs. Traditionally, building this level of intelligence meant a complex architectural sprawl: exporting terabytes of data to a specialized Vector Database (like Chroma or Milvus), managing fragile ETL pipelines, and dealing with synchronization lag.
But we asked ourselves: Why move the data to the model, when we can bring the model to the data?
BigQuery now supports remote models (like Gemini) and vector search natively. So, we decided to build the entire brain, from generating sensory tags to calculating personalized “taste profiles”, right where the data already lived.
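Both capabilities are exposed as remote models that you register once with a CREATE MODEL statement. A minimal sketch of how the two models referenced throughout this article might be created (the connection name and dataset are placeholders; your endpoint choice may differ):

/* Registering remote Gemini and embedding models over a BigQuery connection (illustrative names) */
CREATE OR REPLACE MODEL `project.dataset.gemini_model`
  REMOTE WITH CONNECTION `project.us.vertex_ai_connection`
  OPTIONS (ENDPOINT = 'gemini-2.5-flash');

CREATE OR REPLACE MODEL `project.dataset.embedding_model`
  REMOTE WITH CONNECTION `project.us.vertex_ai_connection`
  OPTIONS (ENDPOINT = 'text-embedding-005');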
🍇 Phase 1: tasting the data
You can’t search for “velvety finish” or “romantic evening” if your database only says “Pinot Noir, 2018, France.” To fix this, we enriched our raw data with AI.GENERATE_TABLE in BigQuery, using a Materialized View as our stable base.
We chose a Materialized View for a specific reason: consistency. Our product data was scattered across multiple tables: translations, categories, and business rules. Re-joining these every time we ran an expensive LLM operation would be inefficient. The Materialized View gave us a frozen, pre-computed snapshot of the catalog, ensuring the “Sommelier” always tasted the exact same vintage of data.
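For illustration, the snapshot might look something like this (a minimal sketch with hypothetical table and column names, assuming the join pattern is compatible with BigQuery’s materialized-view restrictions):

/* Illustrative catalog snapshot joining product and translation tables (names are placeholders) */
CREATE MATERIALIZED VIEW `project.dataset.MV_WINE_CATALOG` AS
SELECT
  p.product_id,
  t.WINE_NAME,
  t.DESCRIPTION,
  p.TECHNICAL_DETAILS,
  p.GRAPE_COMPOSITION
FROM `project.dataset.PRODUCTS` AS p
JOIN `project.dataset.PRODUCT_TRANSLATIONS` AS t
  ON p.product_id = t.product_id AND t.language = 'en';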
The Structured Prompt
Instead of dealing with messy text parsing, we defined a strict Output Schema directly in the SQL. This forced the Gemini 2.5 model to return structured arrays for our new columns: IDEAL_OCCASION, FLAVOR_PROFILE, GIFT_RECIPIENT, and DRINKER_PERSONA.
/* Calling Gemini to enrich wine data with structured output */
CREATE OR REPLACE TABLE `project.dataset.ENRICHED_WINE_DATA` AS
SELECT * FROM
  AI.GENERATE_TABLE(
    MODEL `project.dataset.gemini_model`,
    (
      SELECT
        w.product_id,
        w.WINE_NAME,
        w.GRAPE_COMPOSITION,
        -- 1. Construct the detailed prompt
        CONCAT(
          'You are a Master Sommelier and wine consultant. ',
          'Analyze the following detailed wine information. ',
          'Your entire response MUST be **a single, valid JSON object, and nothing else** (no leading/trailing text, no explanations, no code fences like ```json). ',
          'The JSON keys MUST be exactly: "IDEAL_OCCASION", "FLAVOR_PROFILE", "GIFT_RECIPIENT", and "DRINKER_PERSONA". ',
          '--- REQUIRED JSON STRUCTURE AND CONTENT ---',
          '1. "IDEAL_OCCASION": An ARRAY of up to 10 ideal use occasions (e.g., "Dinner party", "Quiet evening", "Summer picnic"). ',
          '2. "FLAVOR_PROFILE": An ARRAY of up to 10 adjectives describing the taste (e.g., "Oaky", "Full-bodied", "Citrusy"). ',
          '3. "GIFT_RECIPIENT": An ARRAY of ideal recipients for a gift (e.g., "Father\'s birthday", "Boss", "Collector"). ',
          '4. "DRINKER_PERSONA": A single sentence, maximum 150 characters, describing the typical drinker. ',
          'PRODUCT INFORMATION: --- ',
          'NAME: ', w.WINE_NAME, ' | ',
          'DESCRIPTION: ', w.DESCRIPTION, ' | ',
          'DETAILS: ', w.TECHNICAL_DETAILS, ' | ',
          'GRAPE: ', w.GRAPE_COMPOSITION,
          ' ---'
        ) AS prompt
      FROM
        `project.dataset.MV_WINE_CATALOG` AS w
    ),
    -- 2. DEFINE THE OUTPUT SCHEMA
    -- This forces the LLM to return actual BigQuery arrays, not just text!
    STRUCT(
      "IDEAL_OCCASION ARRAY<STRING>, FLAVOR_PROFILE ARRAY<STRING>, GIFT_RECIPIENT ARRAY<STRING>, DRINKER_PERSONA STRING" AS output_schema,
      0.5 AS temperature,
      8192 AS max_output_tokens
    )
  );
Technical Tip: We learned that setting max_output_tokens is critical. Initially, low limits caused "silent failures" where JSON generation was cut off. We bumped this to 8192 tokens to ensure complete responses.
Taming the Creative Chaos
Generative AI can be too creative. We found thousands of variations like “Crisp,” “Bone Dry,” and “Sec” for the same flavor. To fix this, we used a second LLM pass to map these raw descriptors into a standardized “Taxonomy of 100” canonical tags. This ensured that a search for “Dry” retrieved all dry wines, regardless of the AI’s specific phrasing.
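Conceptually, that second pass can be another AI.GENERATE_TABLE call that remaps each wine’s raw tags onto the fixed taxonomy. A minimal sketch, assuming a hypothetical CANONICAL_TAXONOMY table holding the ~100 approved tags:

/* Second pass: normalize free-form flavor tags against a fixed taxonomy (illustrative sketch) */
CREATE OR REPLACE TABLE `project.dataset.NORMALIZED_WINE_DATA` AS
SELECT * FROM
  AI.GENERATE_TABLE(
    MODEL `project.dataset.gemini_model`,
    (
      SELECT
        e.product_id,
        CONCAT(
          'Map each of the following raw flavor descriptors to the closest tag from this fixed list: ',
          (SELECT STRING_AGG(tag, ', ') FROM `project.dataset.CANONICAL_TAXONOMY`), '. ',
          'Only use tags from the list. RAW DESCRIPTORS: ',
          ARRAY_TO_STRING(e.FLAVOR_PROFILE, ', ')
        ) AS prompt
      FROM `project.dataset.ENRICHED_WINE_DATA` AS e
    ),
    STRUCT(
      "FLAVOR_PROFILE_CANONICAL ARRAY<STRING>" AS output_schema,
      0.0 AS temperature
    )
  );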
🔍 Phase 2: the “blind taste test” (Vector Search)
To translate “human” descriptions into machine logic, we used Google’s text-embedding-005 model directly within BigQuery. This transforms our enriched text into high-dimensional vectors, where a "Barolo" sits mathematically closer to "Truffle Dinner" than to "Poolside."
Generating the Embeddings
We didn’t just embed the technical specs. We concatenated our new sensory tags with the product details to create a holistic “Product Profile” vector.
/* Generating embeddings for the entire wine catalog */
CREATE OR REPLACE TABLE `project.dataset.WINE_EMBEDDINGS` AS
SELECT
  * EXCEPT(content, ml_generate_embedding_result),
  ml_generate_embedding_result AS embedding
FROM
  ML.GENERATE_EMBEDDING(
    MODEL `project.dataset.embedding_model`,
    (
      SELECT
        *,
        -- Concatenate all sensory and technical data into one string
        -- We use COALESCE to ensure NULL fields don't break the concatenation
        CONCAT(
          'Wine Name: ', COALESCE(WINE_NAME, ''), '. ',
          'Description: ', COALESCE(SOMMELIER_NOTES, ''), '. ',
          'Ideal for occasions like: ', COALESCE(ARRAY_TO_STRING(IDEAL_OCCASION, ', '), ''), '. ',
          'The flavor profile is: ', COALESCE(ARRAY_TO_STRING(FLAVOR_PROFILE, ', '), ''), '. ',
          'Perfect gift for: ', COALESCE(ARRAY_TO_STRING(GIFT_RECIPIENT, ', '), ''), '. ',
          'Drinker Persona: ', COALESCE(DRINKER_PERSONA, ''), '. ',
          'Grape: ', COALESCE(GRAPE_COMPOSITION, ''), '. '
        ) AS content
      FROM
        `project.dataset.ENRICHED_WINE_DATA`
    )
  );
The Search Logic
Now, when a user asks, “I’m looking for a complex, earthy red to pair with a truffle risotto,” we don’t just look for the word “risotto.”
We convert that user query into a vector and calculate the Cosine Distance between the query and every bottle in our warehouse. The smaller the distance, the better the match.
/* Performing the semantic search */
DECLARE user_query_text STRING DEFAULT 'complex earthy red for truffle risotto';

SELECT
  WINE_NAME,
  PRICE,
  -- Calculate semantic similarity (lower distance = better match)
  ML.DISTANCE(
    (
      SELECT ml_generate_embedding_result
      FROM ML.GENERATE_EMBEDDING(
        MODEL `project.dataset.embedding_model`,
        (SELECT user_query_text AS content)
      )
    ),
    product.embedding,
    'COSINE'
  ) AS distance
FROM
  `project.dataset.WINE_EMBEDDINGS` AS product
ORDER BY
  distance ASC
LIMIT 5;
The “variety” rule
In our initial testing, we found a UX flaw: if a user asked a broad question like “What’s good for dinner?”, the model might return 5 very similar Cabernet Sauvignons. To mimic a helpful sommelier, we refined the query to enforce category diversity.
We used ROW_NUMBER() OVER(PARTITION BY CATEGORY …) to ensure the top results included a mix, perhaps one Red, one White, and one Sparkling, giving the user a curated flight rather than a repetitive list.
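Conceptually, the diversification step looks like this (a simplified sketch; semantic_results stands for the ranked output of the previous query, and CATEGORY is assumed to be a column on the enriched catalog):

/* Enforce category diversity: surface the best match per category before repeating categories */
WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY CATEGORY ORDER BY distance ASC) AS rank_in_category
  FROM semantic_results  -- e.g., the vector search results above, materialized as a CTE
)
SELECT *
FROM ranked
ORDER BY rank_in_category ASC, distance ASC
LIMIT 5;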
🧠 Phase 3: learning the palate (personalization)
A good sommelier recommends what you like, not just what’s popular. We achieved this by calculating a Centroid for every customer: averaging the embeddings of their purchases from the last 3 years to create a mathematical “Palate Vector”.
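Because BigQuery stores embeddings as ARRAY<FLOAT64>, the centroid can be computed by unnesting each purchased wine’s vector and averaging element-wise. A minimal sketch, assuming a hypothetical PURCHASES table with customer_id, product_id, and purchase_date:

/* Building a per-customer "Palate Vector" by averaging purchased wine embeddings element-wise */
CREATE OR REPLACE TABLE `project.dataset.CUSTOMER_PALATE_VECTORS` AS
SELECT
  customer_id,
  ARRAY_AGG(dim_avg ORDER BY dim_offset) AS palate_embedding
FROM (
  SELECT
    pu.customer_id,
    dim_offset,
    AVG(dim_value) AS dim_avg
  FROM `project.dataset.PURCHASES` AS pu
  JOIN `project.dataset.WINE_EMBEDDINGS` AS w
    ON pu.product_id = w.product_id
  CROSS JOIN UNNEST(w.embedding) AS dim_value WITH OFFSET AS dim_offset
  WHERE pu.purchase_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR)
  GROUP BY pu.customer_id, dim_offset
)
GROUP BY customer_id;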
The Hybrid Score Formula
We rank search results using a weighted formula that balances the user’s immediate question with their long-term taste:
/* Hybrid Scoring: Balancing the Query and the User's Palate */
SELECT
  product.*,
  (
    (ML.DISTANCE(product.embedding, query_vec, 'COSINE') * 0.5) +    -- 50% Importance: The Question
    (ML.DISTANCE(product.embedding, user_palate_vec, 'COSINE') * 0.5) -- 50% Importance: The User's Taste
  ) AS final_ranking_score
FROM ...
The “white wine” paradox (vector centroids vs. stats)
We hit a snag with specific preferences like wine type (red vs. white). In vector space, a “Fruity Pinot Noir” (Red) and a “Fruity Chardonnay” (White) can be surprisingly close: they share descriptors like “fruit-forward” and “French.”
However, if a customer hates white wine, they don’t care about semantic similarity. We found that averaging vectors for distinct categories (Red + White = Rosé?) was a mistake. Instead, we switched to explicit stats. We count the user’s purchases by category. If a user has a strong history of buying Red, we apply a mathematical penalty to white wines in the final ranking, overriding the vector similarity.
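In practice, this becomes a simple join against per-customer category counts, with a penalty term added to the hybrid score. An illustrative sketch (the 0.3 weight, the hybrid_results placeholder, and the PURCHASES table are assumptions):

/* Penalize categories the customer almost never buys, overriding pure vector similarity */
WITH customer_category_share AS (
  SELECT
    customer_id,
    CATEGORY,
    COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY customer_id) AS category_share
  FROM `project.dataset.PURCHASES`
  JOIN `project.dataset.ENRICHED_WINE_DATA` USING (product_id)
  GROUP BY customer_id, CATEGORY
)
SELECT
  r.*,
  -- Lower score = better; add a penalty when the customer's share for this category is low
  r.final_ranking_score + (1 - COALESCE(s.category_share, 0)) * 0.3 AS adjusted_score
FROM hybrid_results AS r  -- e.g., the hybrid scoring results above, materialized as a CTE
LEFT JOIN customer_category_share AS s
  ON s.customer_id = r.customer_id AND s.CATEGORY = r.CATEGORY
ORDER BY adjusted_score ASC;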
📦 Phase 4: the “Inventory & Budget” reality
In the real world, the perfect wine recommendation is useless if it’s out of stock or costs €5,000 when your budget is €50.
We added a final layer to our SQL query. This doesn’t just search for semantic similarity; it joins the vector search results with the live Inventory Table for the user’s specific local store (e.g., Store_ID = 'NY01') and applies a strict Budget Filter.
SELECT
  product.*
FROM
  `project.dataset.WINE_EMBEDDINGS` AS product
JOIN
  `project.dataset.STORE_INVENTORY` AS inventory
  ON product.product_id = inventory.product_id
WHERE
  inventory.store_id = 'NY01' -- User's Local Store
  AND inventory.quantity > 0
  AND (budget IS NULL OR inventory.price <= budget) -- Budget Check
ORDER BY
  final_ranking_score ASC -- computed by the hybrid scoring step above
LIMIT 5;
🚀 Optimization: serving it fast
Our initial prototype was functionally impressive but computationally heavy. The original query calculated the user’s “Palate Vector” on the fly, scanning their entire 3-year purchase history every time they asked a question.
While accurate, this approach created a noticeable lag in the chatbot experience.
The Fix: Pre-calculation
We refactored the architecture to split the workload. We implemented a recurring job that pre-calculates the centroid and category preferences for every active customer, storing the results in a clustered lookup table (CUSTOMER_PRECALCULATED_PROFILES).
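A minimal sketch of that recurring job, run for example as a nightly scheduled query (table names, columns, and the clustering key are illustrative; CUSTOMER_PALATE_VECTORS is the centroid table sketched in Phase 3):

/* Nightly job: pre-compute each active customer's palate centroid and category stats */
CREATE OR REPLACE TABLE `project.dataset.CUSTOMER_PRECALCULATED_PROFILES`
CLUSTER BY customer_id AS
SELECT
  c.customer_id,
  c.palate_embedding,     -- the "Palate Vector" centroid from Phase 3
  s.preferred_category,   -- most-purchased category, used for the stats-based penalty
  s.purchase_count
FROM `project.dataset.CUSTOMER_PALATE_VECTORS` AS c
JOIN (
  SELECT
    customer_id,
    CATEGORY AS preferred_category,
    COUNT(*) AS purchase_count,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY COUNT(*) DESC) AS rn
  FROM `project.dataset.PURCHASES`
  JOIN `project.dataset.ENRICHED_WINE_DATA` USING (product_id)
  GROUP BY customer_id, CATEGORY
) AS s
  ON c.customer_id = s.customer_id AND s.rn = 1;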
The live search query now simply joins to this table instead of re-aggregating millions of rows. The performance gains were dramatic:
- Elapsed Time: Reduced by 50% (4s -> 2s).
- Slot Time Consumed: Decreased by ~71% (2m 57s -> 51s), significantly lowering query costs.
- Bytes Shuffled: Plummeted by ~96% (163.22 MB -> 5.56 MB), eliminating data shuffling bottlenecks.
This optimization turned a multi-second “thinking” pause into a snappy, real-time conversation.
🍷 Conclusion
We set out to solve a specific problem: bridging the gap between the rigid, technical data in our warehouse and the fluid, emotional way customers actually shop.
By building this “AI Sommelier,” we proved that you don’t need a complex, fragmented stack to deploy state-of-the-art Generative AI. We didn’t need to export terabytes of data to a specialized vector database, manage external API keys for embeddings, or worry about data synchronization.
We did it all inside BigQuery.
From enriching raw text with Gemini 2.5 to performing high-speed Vector Search and calculating personalized Taste Centroids, the entire pipeline lives where the data lives. The result is a system that understands the nuance between a “Barolo” and a “Bordeaux”, and more importantly, knows exactly which one you would prefer for your anniversary.
References
- BigQuery ML
- BigQuery Vector Search Documentation
- Gemini Models in Vertex AI
- Google Cloud Model Garden
- AI.GENERATE_TABLE function
