Grounding Gemini: One-Time BigQuery Ingestion with Terraform

This guide covers the Terraform configuration for One-Time Ingestion to ground a Gemini Enterprise Agent in BigQuery data.

For Periodic Ingestion (Automated Sync), refer to the companion guide.

When to Use One-Time Ingestion

One-Time Ingestion is ideal when:

Your source data in BigQuery is static or rarely changes.
You want to control exactly when data is refreshed (on-demand) rather than relying on an automated schedule.
You are testing the integration and want the data to be available immediately without waiting for the 1-hour initial sync delay associated with managed BigQuery Data Connectors.

Prerequisites

Before getting started, make sure you have:

1. Google Cloud Project & APIs

You must have an active Google Cloud project with the following APIs enabled:
Discovery Engine API (discoveryengine.googleapis.com): Powers the core search indexing, data store capabilities, and conversational infrastructure required for Gemini Enterprise.
BigQuery API (bigquery.googleapis.com): Enables Discovery Engine connectors to query, read, and sync tabular data from your BigQuery datasets.
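If you would rather enable these APIs from Terraform than from the console, a minimal sketch might look like the following. It assumes the identity running Terraform is allowed to enable services on the project. It reuses `var.project_id` and the `google-beta` provider from the configuration later in this guide. The resource name `required_apis` is illustrative and not part of the original configuration.

```hcl
# Sketch: enable the two APIs listed above (resource name is hypothetical).
resource "google_project_service" "required_apis" {
  provider = google-beta
  for_each = toset([
    "discoveryengine.googleapis.com",
    "bigquery.googleapis.com",
  ])

  project = var.project_id
  service = each.value

  # Leave the APIs enabled if this configuration is destroyed later.
  disable_on_destroy = false
}
```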

2. Licensing & Tooling

Gemini Enterprise Subscription: Your Google Cloud project or organization must have an active, purchased Gemini Enterprise subscription tier.
Your Google Cloud Organization ID
Terraform v1.4+ installed (the terraform_data resource used in main.tf was introduced in Terraform 1.4)

3. IAM Permissions

Your identity (or the service account running Terraform) will need:

Project level:

Discovery Engine Admin (roles/discoveryengine.admin)
Project IAM Admin (roles/resourcemanager.projectIamAdmin)
BigQuery Data Viewer (roles/bigquery.dataViewer) access to the target BigQuery dataset and table.

Cross-Project BigQuery Ingestion (If Applicable):

If you intend to import data from a source Google Cloud project that is different from the project hosting your Gemini Enterprise data store, you must configure cross-project permissions.
Locate the Discovery Engine service account in the project that hosts your Gemini Enterprise data store:

service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com

Grant this service account the following IAM roles in the source Google Cloud project, that is, the external project containing your BigQuery data:
BigQuery Job User (roles/bigquery.jobUser)
BigQuery Data Editor (roles/bigquery.dataEditor)

Terraform Configuration

We divide the configuration into three files: variables.tf, provider.tf, and main.tf.

variables.tf

To keep our configuration reusable, we define variables for project-specific details.

variable "project_id" {
  type        = string
  description = "The Google Cloud Project ID."
}

variable "location" {
  type        = string
  default     = "global"
  description = "The region for the Discovery Engine (e.g., global, us)."
}

variable "datastore_id" {
  type        = string
  description = "The ID of the Discovery Engine Data Store."
}

variable "bq_dataset_id" {
  type        = string
  description = "The BigQuery dataset ID."
}

variable "bq_table_id" {
  type        = string
  description = "The BigQuery table ID."
}

variable "engine_id" {
  type        = string
  description = "The ID of the Gemini Search Engine."
}
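To supply values for these variables, one option is a terraform.tfvars file next to the three configuration files. The sketch below shows the shape of such a file. Every value is a made-up placeholder that you would replace with your own IDs.

```hcl
# terraform.tfvars (all values below are placeholders, not real resources)
project_id    = "my-gcp-project"
location      = "global"
datastore_id  = "bq-one-time-datastore"
bq_dataset_id = "my_dataset"
bq_table_id   = "my_table"
engine_id     = "bq-one-time-engine"
```

Terraform loads a file named terraform.tfvars automatically, so no extra command-line flags are needed for plan or apply.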

provider.tf

We configure the Google provider. Some Gemini Enterprise features require the `google-beta` provider, so the configuration below uses it throughout.

terraform {
  # terraform_data (used in main.tf) requires Terraform 1.4 or later
  required_version = ">= 1.4"
  required_providers {
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 5.0.0"
    }
  }
}

provider "google-beta" {
  project               = var.project_id
  user_project_override = true
  billing_project       = var.project_id
}

main.tf

Here, we declare our search infrastructure, trigger the initial data load, and layer the Gemini Enterprise configuration over it. Since this flow is for One-Time Ingestion, we use a standalone `datastore` resource rather than a continuous connector resource. To populate it, we use a Terraform provisioner that calls the ingestion API during deployment. The import itself runs asynchronously as a long-running operation, so the data store may still be indexing for a while after terraform apply finishes.

Step 1: Create the Data Store & Trigger Ingestion

We create a standalone google_discovery_engine_data_store.
To trigger the one-time import, we use a terraform_data resource with a local-exec provisioner. This provisioner executes a curl command to call the Discovery Engine API immediately upon deployment, importing your BigQuery data.

Note: This is a one-time snapshot. The import re-triggers only if you change the target BigQuery table or dataset in your Terraform configuration, or if the data store itself is recreated. It does not re-trigger when the data inside the table changes.

Step 2: Create the Gemini Search Agent

Next, we create the google_discovery_engine_search_engine.
We set the subscription tier to SUBSCRIPTION_TIER_SEARCH_AND_ASSISTANT to enable advanced Gemini features.
We link it to the standalone Data Store created in Step 1.
Crucially, we use depends_on to ensure the search engine is only provisioned after the initial data import has been successfully triggered.

Step 3: Secure the Agent

Finally, we configure the Access Control (ACL) to use Google Workspace (GSUITE) for authentication.

# Access token data source for the API call.
# Pinned to google-beta, the only provider declared in provider.tf.
data "google_client_config" "default" {
  provider = google-beta
}

# 1. Standalone Data Store
resource "google_discovery_engine_data_store" "bq_data_store" {
  provider          = google-beta
  project           = var.project_id
  location          = var.location
  data_store_id     = var.datastore_id
  display_name      = "BigQuery One-Time Data Store"
  industry_vertical = "GENERIC"
  content_config    = "NO_CONTENT"
  solution_types    = ["SOLUTION_TYPE_SEARCH"]

  lifecycle {
    ignore_changes = [document_processing_config]
  }
}

# 2. API Trigger for immediate import
resource "terraform_data" "trigger_import" {
  # Re-run the import when the source table, dataset, or data store changes
  triggers_replace = {
    table_id          = var.bq_table_id
    dataset_id        = var.bq_dataset_id
    datastore_id      = google_discovery_engine_data_store.bq_data_store.data_store_id
    datastore_created = google_discovery_engine_data_store.bq_data_store.create_time
  }

  # Requires curl on the machine running Terraform. The call returns a
  # long-running operation and does not wait for the import to finish.
  # This host serves the global location; us and eu use location-prefixed
  # hosts (e.g. us-discoveryengine.googleapis.com).
  provisioner "local-exec" {
    command = <<-EOT
      curl -f -X POST \
        -H "Authorization: Bearer ${data.google_client_config.default.access_token}" \
        -H "Content-Type: application/json" \
        -H "X-Goog-User-Project: ${var.project_id}" \
        "https://discoveryengine.googleapis.com/v1beta/projects/${var.project_id}/locations/${var.location}/collections/default_collection/dataStores/${google_discovery_engine_data_store.bq_data_store.data_store_id}/branches/default_branch/documents:import" \
        -d '{
          "bigquerySource": {
            "projectId": "${var.project_id}",
            "datasetId": "${var.bq_dataset_id}",
            "tableId": "${var.bq_table_id}",
            "dataSchema": "custom"
          },
          "reconciliationMode": "INCREMENTAL",
          "autoGenerateIds": true
        }'
    EOT
  }
}

# 3. Create the Gemini Search Agent
resource "google_discovery_engine_search_engine" "bq_search_agent" {
  provider      = google-beta
  project       = var.project_id
  engine_id     = var.engine_id
  collection_id = "default_collection"
  location      = var.location
  display_name  = "Gemini BigQuery Search Agent One-Time"
  app_type      = "APP_TYPE_INTRANET"

  # Reference the standalone data store directly
  data_store_ids = [google_discovery_engine_data_store.bq_data_store.data_store_id]

  search_engine_config {
    search_tier                = "SEARCH_TIER_ENTERPRISE"
    search_add_ons             = ["SEARCH_ADD_ON_LLM"]
    required_subscription_tier = "SUBSCRIPTION_TIER_SEARCH_AND_ASSISTANT"
  }

  # Ensure the import has been triggered before the search engine points to the store
  depends_on = [
    terraform_data.trigger_import
  ]

  # Recreate the engine whenever the underlying data store is replaced
  lifecycle {
    replace_triggered_by = [
      google_discovery_engine_data_store.bq_data_store
    ]
  }
}

# 4. Secure the Agent (ACL Config)
resource "google_discovery_engine_acl_config" "identity_provider" {
  provider = google-beta
  project  = var.project_id
  location = var.location

  idp_config {
    idp_type = "GSUITE"
  }
}
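To make the verification steps easier, you could optionally add an outputs.tf file that prints the full resource names after apply. This is a small sketch and not part of the original configuration. The output names are illustrative, and it assumes the resource names used in main.tf above.

```hcl
# outputs.tf (optional; output names are hypothetical)
output "data_store_name" {
  description = "Full resource name of the standalone data store."
  value       = google_discovery_engine_data_store.bq_data_store.name
}

output "search_engine_name" {
  description = "Full resource name of the Gemini search engine."
  value       = google_discovery_engine_search_engine.bq_search_agent.name
}
```

After a successful apply, running terraform output prints both names, which can help when locating the data store and app in the console.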

Deployment and Verification

Deploying the configuration:

1. Initialize the working directory to download the required google-beta provider plugin:

terraform init

2. Run a plan to validate the configuration and preview the changes against your live GCP environment:

terraform plan

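If the plan completes without errors and reports `Plan: 4 to add, 0 to change, 0 to destroy` (the data store, the import trigger, the search engine, and the ACL config), the configuration is valid. A clean plan does not guarantee that the import call itself will succeed, since the curl command only runs during apply.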

3. Apply the configuration to provision the backend engines:

terraform apply

Verify Ingestion

The import is triggered immediately during terraform apply and then continues in the background. You can monitor the import job status in the Google Cloud Console under Agent Builder > Data Stores > [Your Data Store] > Activity.

Test the Agent

Once the import completes, go to Gemini Enterprise > Apps, select your app, and use the preview link to test the search interface.

Key Considerations and Final Thoughts

This flow is designed for static data. Terraform triggers the import once during deployment. It will not automatically sync new rows or updates added to your BigQuery table later.
If your BigQuery data changes and you need to force a refresh of the search index, you can tell Terraform to re-run the import command by replacing the trigger resource:

terraform apply -replace="terraform_data.trigger_import"
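An alternative to the -replace flag is to make the refresh part of the configuration itself. The sketch below illustrates the idea under stated assumptions. The variable `import_revision` and the resource `trigger_import_demo` are hypothetical, and the echo command stands in for the real curl call. In the actual main.tf, you would add the `import_revision` key to the existing `triggers_replace` map of `terraform_data.trigger_import`.

```hcl
# Hypothetical variable: bump this value whenever a re-import is wanted.
variable "import_revision" {
  type        = string
  default     = "1"
  description = "Arbitrary marker; changing it forces the trigger to be replaced."
}

# Demo resource showing the mechanism. The provisioner runs again
# each time import_revision changes, because the resource is replaced.
resource "terraform_data" "trigger_import_demo" {
  triggers_replace = {
    import_revision = var.import_revision
  }

  provisioner "local-exec" {
    command = "echo 'import would run here (revision ${var.import_revision})'"
  }
}
```

With this in place, running terraform apply -var="import_revision=2" replaces the trigger and re-runs the provisioner, which leaves a record of each refresh in the configuration or variable history.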

Use this one-time flow for archives, documentation, or static datasets. If your data changes frequently (e.g., daily updates), you should use the Periodic Ingestion flow instead, which handles automatic syncing on a schedule.

Happy building! 🚀

Grounding Gemini: One-Time BigQuery Ingestion with Terraform was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source Credit: https://medium.com/google-cloud/grounding-gemini-one-time-bigquery-ingestion-with-terraform-913f1ea49f02?source=rss----e52cf94d98af---4

Deven Goratela
