For years, the “AI revolution” has had a “Keep Out” sign for many data professionals. If you didn’t know Python, weren’t comfortable managing Docker containers, or couldn’t wrangle specialized ML libraries, you were stuck on the sidelines.
That wall has crumbled.
If you know standard SQL, you can now wield the most powerful AI models Google has to offer. By combining BigQuery with Vertex AI (Google’s unified AI platform), you can perform complex generative AI tasks — like sentiment analysis, text summarization, and entity extraction — without writing a single line of Python.
In this post, we’re going to build a scalable sentiment analysis pipeline using nothing but SQL.
The Architecture: How it Works
Traditionally, sentiment analysis meant exporting your data from your warehouse, running it through an external script (usually Python), and loading the results back in. It was slow, brittle, and insecure.
BigQuery ML’s Remote Models change this. They act as a bridge. When you run a SQL query, BigQuery securely sends the data to a Vertex AI endpoint (where Gemini lives), gets the response, and joins it back to your table in real-time.
Prerequisites: The One-Time Setup
Before we write SQL, we need to open the bridge between BigQuery and Vertex AI. You only have to do this once per project.
1. Enable APIs
Ensure the BigQuery, BigQuery Connection, and Vertex AI APIs are enabled in your Google Cloud project.
2. Create a Cloud Resource Connection
This is the “identity” BigQuery will use to talk to Vertex AI.
- Go to BigQuery in the Google Cloud console.
- Click + ADD > Connections to external data sources.
- Choose Vertex AI remote models, remote functions and BigLake (Cloud Resource).
- Give it a Connection ID (e.g., gemini-conn) and select your region (e.g., US or EU).
3. Grant Permissions (Crucial Step!)
Once created, click on your new connection in the BigQuery Explorer. You will see a “Service Account ID” that looks like an email address.
- Copy that Service Account ID.
- Go to IAM & Admin in the console.
- Click Grant Access, paste the Service Account ID, and assign it the role: Vertex AI User.
Step 1: Create the Gemini Model in SQL
Now, we treat Gemini just like a table or a view. We will “create” the model using standard DDL (Data Definition Language).
Run this in your BigQuery console (replace your_project and your_dataset with your actual IDs):
CREATE OR REPLACE MODEL `your_project.your_dataset.gemini_flash`
REMOTE WITH CONNECTION `us.gemini-conn`
OPTIONS (ENDPOINT = 'gemini-1.5-flash-002');
Note: We are using gemini-1.5-flash-002 here because it is blazing fast and highly cost-effective for high-volume text tasks like sentiment analysis.
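Before moving on, it's worth confirming the connection and model are wired up correctly. Here's a quick smoke-test sketch (it assumes the model and connection names from the step above; if it fails with a permissions error, revisit the IAM grant in the prerequisites):

```sql
-- Quick smoke test: one prompt in, one response out.
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your_project.your_dataset.gemini_flash`,
  (SELECT 'Reply with the single word OK.' AS prompt),
  STRUCT(TRUE AS flatten_json_output));
```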
Step 2: The Data
Let’s imagine we have a table of customer reviews. If you don’t have one, here is a quick script to generate some dummy data:
CREATE OR REPLACE TABLE `your_dataset.customer_reviews` AS
SELECT
  1 AS review_id,
  "The app crashes every time I try to upload a photo. Fix this!" AS review_text
UNION ALL
SELECT 2, "I love the new dark mode, but the search function is a bit slow now."
UNION ALL
SELECT 3, "Customer support was completely unhelpful. I'm cancelling my subscription."
UNION ALL
SELECT 4, "Best purchase I've made all year. Highly recommended!";
Step 3: The Magic Query
Now for the fun part. We will use the ML.GENERATE_TEXT function.
We won’t just ask for sentiment; we will ask Gemini to be specific. We want a score between 1 and 5, plus a short reason of three words or fewer.
SELECT
  review_id,
  review_text,
  ml_generate_text_llm_result AS ai_analysis
FROM
  ML.GENERATE_TEXT(
    MODEL `your_dataset.gemini_flash`,
    (
      SELECT
        review_id,
        review_text,
        CONCAT(
          'Analyze the sentiment of the following customer review. ',
          'Return ONLY a JSON object with two fields: "score" (integer 1-5) and "reason" (max 3 words). ',
          'Review: ',
          review_text) AS prompt
      FROM `your_dataset.customer_reviews`
    ),
    STRUCT(
      0.0 AS temperature,          -- Keep it deterministic
      150 AS max_output_tokens,
      TRUE AS flatten_json_output  -- Makes parsing easier
    ));
Level Up: Structured Output
The previous query returns the AI’s answer as a string. To make this truly useful for dashboards (like Looker), we should parse that string into real columns.
Because we asked Gemini for JSON, we can use BigQuery’s native JSON functions to extract the data cleanly:
WITH raw_ai_results AS (
  -- (Insert the previous ML.GENERATE_TEXT query here)
)
SELECT
  review_id,
  review_text,
  -- Safely parse the JSON string into actual data types
  SAFE_CAST(JSON_VALUE(SAFE.PARSE_JSON(ai_analysis), '$.score') AS INT64) AS sentiment_score,
  JSON_VALUE(SAFE.PARSE_JSON(ai_analysis), '$.reason') AS sentiment_reason
FROM raw_ai_results;
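One caveat: depending on the model version, Gemini sometimes wraps its JSON answer in markdown code fences, which makes PARSE_JSON choke. A defensive sketch (the fence-stripping regex is an assumption about the model's formatting, not guaranteed behavior) cleans the string first:

```sql
-- Defensive cleanup: strip ```json ... ``` fences before parsing.
SELECT
  review_id,
  SAFE.PARSE_JSON(
    TRIM(REGEXP_REPLACE(ai_analysis, r'^```(json)?|```$', ''))) AS parsed_analysis
FROM raw_ai_results;
```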
Boom. You just turned unstructured text into structured metrics ready for visualization using only SQL.
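And once the scores are real columns, ordinary SQL aggregations take over. For example, a hypothetical dashboard query (it assumes you persisted the parsed output as a `your_dataset.sentiment_results` table with the columns above):

```sql
-- Review volume and average score per reason (hypothetical saved table).
SELECT
  sentiment_reason,
  COUNT(*) AS reviews,
  ROUND(AVG(sentiment_score), 2) AS avg_score
FROM `your_dataset.sentiment_results`
GROUP BY sentiment_reason
ORDER BY reviews DESC;
```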
Source Credit: https://medium.com/google-cloud/sql-is-all-you-need-analyzing-sentiment-with-bigquery-and-gemini-ba05b6bdac97
