

BigQuery is mind-blowing! Let me break this down for you.
The title just gave it away. But still, read on, it’s a short blog. Imagine being able to build a whole application, with search, chat, and what not, using only SQL queries? That’s BigQuery for you. Would you believe me if I said I no longer write heavy application code for AI or agentic features in my data-driven use cases?
You would think this level of architectural simplification would require a lot of complex configuration, setup, and nested SQL constructs.
You probably still remember when we had to create remote functions and remote model definitions to do anything generative with BigQuery, like the time I wrote this blog. While you can still do that, we have since evolved to using direct AI.GENERATE constructs with advanced features.
There are way too many things I could say about this in terms of business value and developer impact, but I’d rather skip the preamble. Let me get to the point and break this down for you!!! Also, the database fanatic in me wants to get to the SQL.
Let’s start with creating tables generatively from your data source. This could be data in any format: text, tabular, multimodal, etc. Imagine being able to plug your application into an MCP server, send the data to BigQuery to process, perform all the AI you want, and store the response in a table that is ready to be queried and used downstream. I want to cry knowing how far databases (don’t get me started on the “BigQuery is a Data Warehouse” argument now) have evolved.
Construct: AI.GENERATE_TABLE
It lets you perform generative natural language tasks using any combination of text and unstructured data from BigQuery standard tables, and lets you specify a schema to format the model’s response.
The function works by sending requests to a BigQuery ML remote model that represents a Vertex AI Gemini model, and then returning that model’s response. The function supports remote models over any of the generally available or preview Gemini models.
Note: A remote model needs to be created first, pointing at the Gemini model of your choice (from the list of available ones mentioned in the above statement). You can follow one of my earlier blogs to create a remote model; however, note that the steps involved in creating a remote connection have changed, so you should reference the documentation for that.
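For reference, once your Cloud resource connection exists (and its service account has Vertex AI access), the remote model itself is a short DDL. A minimal sketch, where the dataset, connection, and model names are placeholders for your own:

```sql
-- Assumes dataset `toys` and an existing Cloud resource connection
-- `us.bq-conn-2025`; swap in your own names and region.
CREATE OR REPLACE MODEL `toys.gemini_remote_model`
  REMOTE WITH CONNECTION `us.bq-conn-2025`
  OPTIONS (endpoint = 'gemini-2.0-flash');
```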
The following query in BigQuery creates a table (strictly speaking, a set of columns from which you can create a table if you want) from an image, based on your prompt:
SELECT
  *
FROM
  AI.GENERATE_TABLE(
    MODEL `toys.gemini_remote_model`,
    (
      SELECT
        ('Analyze the following image and identify the following from it: image description, category of the toy in the image (plush or electronics or plastic or other hard material etc., color of the toy body, color of the outfit of the toy, accessories in the toy like jewels or tie or bow or bowtie or shoes or watch or makeup or other toys that the main toy may have etc., any recommendation for suitability with children, other details not covered in other fields like toy ethnicity or any cultural significance of the toy in good taste )',
         OBJ.GET_ACCESS_URL(
           OBJ.FETCH_METADATA(
             OBJ.MAKE_REF('gs://img_toys/image (14).png', 'us-central1.bq-conn-2025')),
           'r')) AS prompt
      -- , * FROM toys.toys
    ),
    STRUCT("image_description STRING, toy_category STRING, toy_color STRING, toy_outfit_color STRING, toy_accessories STRING, toy_recommendations STRING, toy_significance STRING"
      AS output_schema));
Check out the result:
Result in JSON:
[{
"image_description": "A close-up shot features a red plush teddy bear seated on a wooden surface. The bear has a furry texture, a black nose and eyes, and wears a red and white polka-dot bow tie. It is posed in a sitting position with its paws resting in front. In the background, there are books stacked on top of each other and blurred shelves, suggesting an indoor setting. The lighting is warm and soft, enhancing the cozy atmosphere of the image.",
"toy_accessories": null,
"toy_category": "plush",
"toy_color": "red",
"toy_outfit_color": "red and white",
"toy_recommendations": "Suitable for children of all ages.",
"toy_significance": "Teddy bears are often associated with comfort, security, and childhood nostalgia, making them a common and culturally significant toy.",
"full_response": "{\"candidates\":[{\"avg_logprobs\":-0.4095088649440456,\"content\":{\"parts\":[{\"text\":\"{\\n \\\"image_description\\\": \\\"A close-up shot features a red plush teddy bear seated on a wooden surface. The bear has a furry texture, a black nose and eyes, and wears a red and white polka-dot bow tie. It is posed in a sitting position with its paws resting in front. In the background, there are books stacked on top of each other and blurred shelves, suggesting an indoor setting. The lighting is warm and soft, enhancing the cozy atmosphere of the image.\\\",\\n \\\"toy_category\\\": \\\"plush\\\",\\n \\\"toy_color\\\": \\\"red\\\",\\n \\\"toy_outfit_color\\\": \\\"red and white\\\",\\n \\\"toy_recommendations\\\": \\\"Suitable for children of all ages.\\\",\\n \\\"toy_significance\\\": \\\"Teddy bears are often associated with comfort, security, and childhood nostalgia, making them a common and culturally significant toy.\\\"\\n}\"}],\"role\":\"model\"},\"finish_reason\":\"STOP\",\"score\":-75.75914001464844}],\"create_time\":\"2025-06-14T04:31:42.729120Z\",\"model_version\":\"gemini-2.0-flash\",\"response_id\":\"LvtMaKDALOiVhMIP0M-y0Ak\",\"usage_metadata\":{\"billable_prompt_usage\":{\"image_count\":1,\"text_count\":550},\"candidates_token_count\":185,\"candidates_tokens_details\":[{\"modality\":\"TEXT\",\"token_count\":185}],\"prompt_token_count\":1428,\"prompt_tokens_details\":[{\"modality\":\"IMAGE\",\"token_count\":1290},{\"modality\":\"TEXT\",\"token_count\":138}],\"total_token_count\":1613,\"traffic_type\":\"ON_DEMAND\"}}",
"status": "",
"prompt": {
"_field_1": "Analyze the following image and identify the following from it: image description, category of the toy in the image (plush or electronics or plastic or other hard material etc., color of the toy body, color of the outfit of the toy, accessories in the toy like jewels or tie or bow or bowtie or shoes or watch or makeup or other toys that the main toy may have etc., any recommendation for suitability with children, other details not covered in other fields like toy ethnicity or any cultural significance of the toy in good taste )",
"_field_2": "{\"access_urls\":{\"expiry_time\":\"2025-06-14T10:31:37Z\",\"read_url\":\"https://storage.googleapis.com/img_toys/image%20%2814%29.png?X-Goog-Algorithm\u003dGOOG4-RSA-SHA256\u0026X-Goog-Credential\u003dbqcx-273845608377-xrnn%40gcp-sa-bigquery-condel.iam.gserviceaccount.com%2F20250614%2Fauto%2Fstorage%2Fgoog4_request\u0026X-Goog-Date\u003d20250614T043137Z\u0026X-Goog-Expires\u003d21600\u0026X-Goog-SignedHeaders\u003dhost\u0026generation\u003d1749843202335739\u0026X-Goog-Signature\u003d37d40cd21e85fbffb962766877a58cea0c226b149db1cf37382a3ac9cf1b09a8082a589e336ef6e6b26cbe45fceb3273c4c546ad5ea1a1556ac06f00105c560ea9f8d5532f7c6482fcc2a3006dc6fde61774fbb3743379afcfd5bc98a77ecc211e3305c5ad561a004ccef603ced40fd5049f85f0c327b1757a05b67767285cc90f999a80534866bfd59c06b631f45e13de36aae672b8aff8d2a8d068d6b1b609cb56fe7822ed2552712895ab6563a3c99ae2680cbb58508467a262fe11cbdc8e1790abd66a8fbc31afd2cdec3d1157f243053fbb9d87fc7b0cbf8c52818ac8b420b237a2db6e97f06b2b8d91f84ee37100916068e60965be79f799481ada29c3\",\"write_url\":\"\"},\"objectref\":{\"authorizer\":\"abis-345004.us-central1.bq-conn-2025\",\"details\":{\"gcs_metadata\":{\"content_type\":\"image/png\",\"md5_hash\":\"47830a3a0b31954a2dbc4479a6f556f6\",\"size\":1280376,\"updated\":1.749843202e+15}},\"uri\":\"gs://img_toys/image (14).png\",\"version\":\"1749843202335739\"}}"
}
}]
Notice in the JSON output that each output value is returned as an individual field, which means it is treated as an individual column when used in queries, DML, and DDL to select, update, or create a new table, respectively.
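Because those fields come back as top-level columns, persisting them is just a matter of wrapping the call in a CREATE TABLE AS SELECT. A minimal sketch, where the target table name is a placeholder and the prompt is shortened to plain text for brevity:

```sql
-- `toys.toy_attributes` is a hypothetical target table name.
CREATE OR REPLACE TABLE `toys.toy_attributes` AS
SELECT
  image_description, toy_category, toy_color, toy_outfit_color,
  toy_accessories, toy_recommendations, toy_significance
FROM
  AI.GENERATE_TABLE(
    MODEL `toys.gemini_remote_model`,
    (SELECT 'Describe this toy: a small red plush bear with a polka-dot bow tie.' AS prompt),
    STRUCT("image_description STRING, toy_category STRING, toy_color STRING, toy_outfit_color STRING, toy_accessories STRING, toy_recommendations STRING, toy_significance STRING"
      AS output_schema));
```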
And if you’re wondering, here is the image at the gsutil URI I used:
Now let’s try the same with text input:
The following query in BigQuery creates a table (a set of columns from which you can create a table) from a STRING you supply as a field or as a constant:
SELECT
  context, purpose, topics_covered, significance, category
FROM
  AI.GENERATE_TABLE(
    MODEL `toys.gemini_remote_model`,
    (
      SELECT
        'Analyze the summary of the following book and extract the following fields from it. Do not miss out any field. You CANNOT return NULL for any of the 5 fields. Fields: context, purpose, topics_covered, significance, category. Here is the summary you need to analyze: The Outline of History is a book that attempts to tell the story of life and mankind in a clear and comprehensive way. It is written for the general reader and aims to provide a broad understanding of history beyond the traditional focus on national histories. The book covers a wide range of topics, including the origins of life, the development of human societies, and the rise and fall of civilizations. It emphasizes the interconnectedness of human history and the importance of a common understanding of our past.' AS prompt
      -- , * FROM bookshelf.books
    ),
    STRUCT("context STRING, purpose STRING, topics_covered STRING, significance STRING, category STRING" AS output_schema));
Result:
For advanced analytics and transactions you can combine the two: treat your request as multimodal and generate results to use in your application.
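A sketch of that combined pattern, assuming a hypothetical `toys.toys` table with a `description` text column and an `img_uri` column holding GCS paths (all table, column, and output-schema names here are illustrative):

```sql
SELECT
  *
FROM
  AI.GENERATE_TABLE(
    MODEL `toys.gemini_remote_model`,
    (
      SELECT
        ('Summarize this toy using both the text and the image: ',
         description,
         OBJ.GET_ACCESS_URL(
           OBJ.FETCH_METADATA(
             OBJ.MAKE_REF(img_uri, 'us-central1.bq-conn-2025')),
           'r')) AS prompt
      FROM toys.toys
    ),
    STRUCT("summary STRING, condition STRING" AS output_schema));
```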
Now, if you are sure you are going to use one of the Vertex AI Gemini models (say, Gemini 2.0 Flash) and don’t want to create a remote model, the next option is ideal. For one-off fields that you want to populate based on generative or agentic logic, this is the low-code version (in my opinion).
Construct: AI.GENERATE
It returns a STRUCT value for each row in the table. The struct contains the following fields:
- result: a STRING value containing the model’s response to the prompt. The result is NULL if the request fails or is filtered by responsible AI. If you specify an output_schema, then result is replaced by your custom schema fields.
- full_response: a STRING value containing the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. The safety attributes are in the safety_ratings element.
- status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
The reason I elaborated on the response here is so you understand the difference between the AI.GENERATE_TABLE and AI.GENERATE constructs. With AI.GENERATE_TABLE, the main response contains all the columns specified in the output_schema argument as top-level fields. With AI.GENERATE, the main response is a single STRUCT, whether it holds the default STRING result or your output_schema fields. We will see this difference at the end of the following example.
The following query in BigQuery determines whether a selected field (or a STRING) contains profanity, along with details of the profanity:
SELECT
  AI.GENERATE(
    CONCAT('Does this word contain profanity, even if it is very minute? Respond with a Bool. Also explain the category of profanity.: ',
           'Ihateyoufcukyou'),
    connection_id => 'projects/abis-345004/locations/us/connections/bq-vx',
    endpoint => 'gemini-2.0-flash',
    output_schema => 'PROFANITY_BOOL BOOL, CATEGORY STRING') AS profanity_check;
Result in JSON:
[{
"profanity_check": {
"CATEGORY": "Offensive",
"PROFANITY_BOOL": "true",
"full_response": "{\"candidates\":[{\"avg_logprobs\":-0.005195302622658866,\"content\":{\"parts\":[{\"text\":\"{\\n \\\"CATEGORY\\\": \\\"Offensive\\\",\\n \\\"PROFANITY_BOOL\\\": true\\n}\"}],\"role\":\"model\"},\"finish_reason\":\"STOP\",\"score\":-0.10910135507583618}],\"create_time\":\"2025-06-14T06:30:57.366178Z\",\"model_version\":\"gemini-2.0-flash\",\"response_id\":\"IRdNaOKsFuLa48APtPHXoAo\",\"usage_metadata\":{\"billable_prompt_usage\":{\"text_count\":141},\"candidates_token_count\":21,\"candidates_tokens_details\":[{\"modality\":\"TEXT\",\"token_count\":21}],\"prompt_token_count\":42,\"prompt_tokens_details\":[{\"modality\":\"TEXT\",\"token_count\":42}],\"total_token_count\":63,\"traffic_type\":\"ON_DEMAND\"}}",
"status": ""
}
}]
I have received my output: the input string ‘Ihateyoufcukyou’ does contain profanity, with the category “Offensive”.
As you can see in the above JSON response for the AI.GENERATE construct, the fields defined in my schema, PROFANITY_BOOL and CATEGORY, are nested within the STRUCT object “profanity_check”. In the response for AI.GENERATE_TABLE, you observed that each output schema field was returned as its own top-level JSON field without being nested inside another object. That is why you could use that result directly in a CREATE TABLE DDL.
You can also directly return BOOL, INT64, and FLOAT64 values using the AI.GENERATE_BOOL, AI.GENERATE_INT, and AI.GENERATE_DOUBLE constructs, respectively.
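For example, the profanity check above could be rewritten to return a typed BOOL directly. A sketch using the same connection and endpoint as the earlier query:

```sql
SELECT
  AI.GENERATE_BOOL(
    CONCAT('Does this word contain profanity, even if it is very minute?: ',
           'Ihateyoufcukyou'),
    connection_id => 'projects/abis-345004/locations/us/connections/bq-vx',
    endpoint => 'gemini-2.0-flash').result AS is_profane;
```

The `.result` field here is a true BOOL, so it can feed a WHERE clause or a typed column without any casting.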
Tell me another database (OK, FINE!) or data warehouse outside of Google Cloud that can do anything remotely close to this!!!
Source Credit: https://medium.com/google-cloud/the-sql-fanatics-dream-ai-agents-without-the-application-code-67779e4cca81?source=rss—-e52cf94d98af—4