Build a Custom Connector for Gemini Enterprise | by Sascha Heyer | Google Cloud - Community

How to integrate any system into Gemini Enterprise using a Custom Connector.

Gemini Enterprise enables you to connect your organization’s knowledge sources directly to Gemini, making them searchable and accessible via natural language.

In today’s article, we’ll build a custom connector that ingests local Markdown, CSV, and TXT files into Discovery Engine, enforces fine-grained access control (ACLs), and keeps data up to date with live updates.

This guide is for both companies using Gemini Enterprise and partners integrating external data sources such as SaaS platforms or internal repositories. It explains not just the code but also how the connector talks to what is commonly called the Gemini Enterprise API, which is in fact the Discovery Engine API. The connector also has to work with several related APIs, which we cover later in the article.

Full Reference Implementation on GitHub

The full implementation of the local filesystem connector used in this article is available on GitHub.

It contains scripts for document import, real-time file watching, and setup automation. You can explore the Python source code to understand how the Discovery Engine, DataStoreService, and Identity Mapping Store APIs are used together.

What Is a Custom Connector?

A custom connector bridges external data and Gemini Enterprise. It performs three key actions:

Fetch
Collect content and metadata from your source system (APIs, file systems, databases).
Transform
Convert that data into Discovery Engine’s Document format and assign access controls.
Sync
Upload it to a Gemini Enterprise data store and keep it updated over time.
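
The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the code from the reference repository: the function names `fetch`, `transform`, and `sync`, the `SourceFile` record, and the shape of the returned dict are all hypothetical, and `sync` only hands the payloads to a callable instead of calling Discovery Engine.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterator

# File types the connector in this article ingests.
SUPPORTED_SUFFIXES = {".md", ".csv", ".txt"}


@dataclass
class SourceFile:
    """One item fetched from the source system (hypothetical record type)."""
    relative_path: str
    text: str
    last_modified: float


def fetch(root: Path) -> Iterator[SourceFile]:
    """Fetch: collect content and metadata from the source (here, a directory)."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            yield SourceFile(
                relative_path=path.relative_to(root).as_posix(),
                text=path.read_text(encoding="utf-8"),
                last_modified=path.stat().st_mtime,
            )


def transform(item: SourceFile, readers: list[str]) -> dict:
    """Transform: shape the item into a document-like dict with access controls."""
    return {
        "path": item.relative_path,
        "text": item.text,
        "last_modified": item.last_modified,
        "readers": list(readers),
    }


def sync(documents: list[dict], upload: Callable[[list[dict]], None]) -> int:
    """Sync: hand the documents to an uploader and report how many were sent."""
    if documents:
        upload(documents)
    return len(documents)
```

In a real connector, `upload` would wrap the Discovery Engine import call, and `fetch` would talk to the source system's API instead of the local disk.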

Once indexed, Gemini uses this content to answer questions securely, respecting your organization’s access controls.

Architecture Overview

The connector communicates directly with the Google Cloud APIs and endpoints that together power Gemini Enterprise. It primarily uses the Discovery Engine API to manage and upload documents for indexing, and it also interacts with the Identity Mapping Store for user and group mappings and with the DataStoreService for data store creation and configuration. Together, these APIs handle the ingestion, metadata management, and access control layers of Gemini Enterprise. With several endpoints involved, bringing everything together takes some care.

Local Filesystem 
→ Discovery Engine 
→ Gemini Enterprise
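
To make the endpoint handling more concrete, here is a small sketch of how the API host and resource names might be derived from a project, a location, and a data store ID. The host pattern (`discoveryengine.googleapis.com` for `global`, `{location}-discoveryengine.googleapis.com` for regional locations such as `us` or `eu`) and the `default_collection` and `default_branch` path segments reflect my understanding of the Discovery Engine API, so treat them as assumptions and check them against the current API reference. The helper names are hypothetical.

```python
def api_endpoint(location: str) -> str:
    """Return the API host for a location (assumed host naming pattern)."""
    if location == "global":
        return "discoveryengine.googleapis.com"
    return f"{location}-discoveryengine.googleapis.com"


def data_store_name(
    project: str,
    location: str,
    data_store_id: str,
    collection: str = "default_collection",  # assumed default collection name
) -> str:
    """Build the full resource name of a data store."""
    return (
        f"projects/{project}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store_id}"
    )


def branch_name(
    project: str,
    location: str,
    data_store_id: str,
    branch: str = "default_branch",  # assumed default branch name
) -> str:
    """Build the branch resource name that document imports are sent to."""
    return f"{data_store_name(project, location, data_store_id)}/branches/{branch}"


if __name__ == "__main__":
    # Values taken from the watcher command used later in this article.
    print(api_endpoint("global"))
    print(branch_name("sascha-playground-doit", "global", "demo_local_docs_acl_v1"))
```

With the Python client library, the host would typically be passed through `client_options` when constructing the client, and the branch name would be used as the `parent` of an import request.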

Why Companies Build Custom Connectors

Custom connectors allow organizations to make their private data searchable inside Gemini Enterprise. Google provides a set of pre-built connectors for popular enterprise systems, but the catalog is still limited and many connectors are in preview with uncertain release timelines.

This approach gives both customers and software vendors the opportunity to build their own integrations with full control over data models, permissions, and sync logic.

Key benefits:

Unify scattered systems under Gemini search.
Control data access with ACLs and identity mapping.
Support multiple formats and enterprise repositories.
Automate updates through APIs or scheduled syncs.

The Local Filesystem Connector in This Article

To keep the concept accessible, I chose a local filesystem connector as the reference implementation. It avoids the complexity of authenticating against third-party APIs and products and focuses instead on how Gemini Enterprise ingests and indexes data.

By syncing files from a local directory into Gemini Enterprise, we can clearly show how content, metadata, and ACLs are uploaded, stored, and later used for secure search.

This approach allows anyone to experiment with the fundamentals before tackling a more complex production integration.

How the Connector Interacts with Gemini Enterprise

The connector uses the Discovery Engine DocumentService API, the same interface Gemini Enterprise uses internally. Each document is uploaded as a structured payload containing content, metadata, and ACLs.

The workflow:

Build a DocumentServiceClient and set the endpoint for your region. This uses the Discovery Engine DocumentService API to connect to the right data store endpoint.
Create a Document object containing:
content.raw_bytes (UTF‑8 text or extracted data)
struct_data (owners, tags, file path, lastModified)
acl_info.readers (users, groups, or mapped identities)
The structure of the Document aligns with the schema managed through the DataStoreService API, which defines and configures the data store.
Call import_documents with reconciliation_mode=INCREMENTAL to upsert data using the Discovery Engine DocumentService API.
Discovery Engine indexes the data, and Gemini Enterprise applies ACLs at query time. ACL validation may also reference mappings created through the Identity Mapping Store API if external groups or identities are involved.
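
The workflow above can be sketched without any client library by building the JSON body that an import request would carry. This is an illustration under assumptions, not the repository's code: the camelCase field names (`structData`, `rawBytes`, `aclInfo`, `inlineSource`, `reconciliationMode`) follow the REST representation of the Discovery Engine API as I understand it, bytes are base64-encoded as is usual for JSON, and the helper names `build_document` and `build_import_request` are hypothetical.

```python
import base64
import json


def build_document(
    doc_id: str,
    text: str,
    struct_data: dict,
    reader_user_ids: list[str],
    reader_group_ids: list[str],
) -> dict:
    """Build a Document payload with content, metadata, and reader ACLs."""
    principals = [{"userId": user} for user in reader_user_ids]
    principals += [{"groupId": group} for group in reader_group_ids]
    return {
        "id": doc_id,
        "structData": struct_data,
        "content": {
            "mimeType": "text/plain",
            # JSON cannot carry raw bytes, so the UTF-8 text is base64-encoded.
            "rawBytes": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        },
        "aclInfo": {"readers": [{"principals": principals}]},
    }


def build_import_request(documents: list[dict]) -> dict:
    """Wrap documents in an inline import body that upserts incrementally."""
    return {
        "inlineSource": {"documents": documents},
        "reconciliationMode": "INCREMENTAL",
    }


if __name__ == "__main__":
    doc = build_document(
        doc_id="doc-readme",
        text="# Onboarding\nWelcome to the team.",
        struct_data={"path": "README.md", "tags": ["onboarding"]},
        reader_user_ids=["alice@example.com"],
        reader_group_ids=["engineering@example.com"],
    )
    print(json.dumps(build_import_request([doc]), indent=2))
```

With the Python client, the same data would go into a `Document` object and be sent through `DocumentServiceClient.import_documents`, which returns a long-running operation that you wait on for the result.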

This design ensures that the connector and Gemini share a consistent data model while coordinating multiple APIs: Discovery Engine for document ingestion, DataStoreService for configuration, and Identity Mapping Store for user and group resolution.

Live Syncing Data Sources

Live syncing uses the same Discovery Engine API but focuses on detecting and propagating changes.

Best practices for production:

Webhooks
Use events from the source system (e.g., Zendesk or Linear ticket updates) to trigger upserts immediately.
Scheduled polling
If the system lacks webhooks, schedule jobs or cron tasks to query recently updated data or to do a full sync.
Hybrid model
Combine real-time updates with periodic full syncs to catch missing data.
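
As a rough sketch of the polling and hybrid models, the function below compares two snapshots of the source (document key mapped to a last-modified value) and decides what to upsert and what to delete. The snapshot shape and the name `plan_sync` are hypothetical. A webhook handler would skip the comparison and upsert the single item named in the event.

```python
def plan_sync(
    previous: dict[str, float],
    current: dict[str, float],
    full: bool = False,
) -> tuple[list[str], list[str]]:
    """Return (keys to upsert, keys to delete) for one sync run.

    previous: snapshot stored after the last successful sync.
    current:  snapshot just read from the source system.
    full:     when True, re-send everything (the periodic full sync of the
              hybrid model); otherwise send only new or modified items.
    """
    if full:
        to_upsert = sorted(current)
    else:
        to_upsert = sorted(
            key
            for key, modified in current.items()
            if previous.get(key) != modified
        )
    # Anything that existed before but is gone now must be removed from the index.
    to_delete = sorted(key for key in previous if key not in current)
    return to_upsert, to_delete


if __name__ == "__main__":
    before = {"a.md": 1.0, "b.md": 2.0, "c.md": 3.0}
    after = {"a.md": 1.0, "b.md": 2.5, "d.md": 4.0}
    print(plan_sync(before, after))
    print(plan_sync(before, after, full=True))
```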

Considerations for Live Syncing

Use stable document IDs derived from the source system’s primary key.
Use INCREMENTAL mode for regular updates and FULL mode for periodic cleanups.
Handle deletions either by calling the delete API or by running a FULL import from Cloud Storage.
Respect rate limits, batch updates efficiently, and log results.
Keep ACL and identity mapping data in sync with your content.
Monitor job status, track error samples, and retry failures.
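
Three of these considerations (stable IDs, batching, and retries) lend themselves to small helpers. The sketch below is one possible approach rather than the repository's implementation. The helper names are hypothetical, hashing the primary key is just one way to get a deterministic ID made only of lowercase letters, digits, and a hyphen, and the default batch size of 100 is the recommended maximum I recall for inline imports, so verify it against the current quotas.

```python
import hashlib
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def stable_document_id(source_key: str, prefix: str = "doc") -> str:
    """Derive a deterministic document ID from the source system's primary key."""
    digest = hashlib.sha256(source_key.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"


def batched(items: Sequence[T], size: int = 100) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Because the ID depends only on the source key, re-importing the same ticket or file updates the existing document instead of creating a duplicate. A production connector would likely narrow the `except` clause to the transient error types of its client library.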

For local testing, you can run the included watcher:

python tools/watch_local_docs.py \
--project sascha-playground-doit \
--location global \
--data-store-id demo_local_docs_acl_v1 \
--content-root demo-content \
--verbose

In production, this logic should run as webhook handlers on Cloud Run or as a scheduled Cloud Run Job.

Access Control Strategies

Direct User ACLs
Define access per user.
Workspace Groups
Grant access by Google Workspace group.
External Identity Mapping
Map external users or groups (for example, Zendesk teams) to Google identities. This works through the Identity Mapping Store API, which allows external identities such as usernames, legacy IDs, or group labels to be associated with Google Workspace users or groups. The connector imports these mappings into a dedicated identity store in Discovery Engine. When a user makes a query in Gemini Enterprise, the system resolves these external identities through the mapping store to determine which documents the user is allowed to access. We cover this topic in a dedicated article that explores the Identity Mapping Store API and advanced access control configurations in more depth.

These controls ensure Gemini Enterprise only surfaces data that each user is authorized to see.
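
To show how the three strategies differ, the sketch below builds one principal of each kind and then simulates the access decision locally. It is a conceptual model only: Gemini Enterprise performs this resolution on the server side, the `externalEntityId` field name is my assumption about how external identities appear in a document's ACL, and `principal`, `can_read`, and the `identity_map` dict are hypothetical stand-ins for the real Identity Mapping Store.

```python
from typing import Iterable, Mapping

# Assumed REST field names for the three kinds of reader principal.
_PRINCIPAL_FIELDS = {
    "user": "userId",
    "group": "groupId",
    "external": "externalEntityId",
}


def principal(kind: str, value: str) -> dict:
    """Build a single ACL principal of kind 'user', 'group', or 'external'."""
    return {_PRINCIPAL_FIELDS[kind]: value}


def can_read(
    user_email: str,
    user_groups: Iterable[str],
    principals: Iterable[Mapping[str, str]],
    identity_map: Mapping[str, Iterable[str]],
) -> bool:
    """Local simulation: may this user read a document with these readers?

    identity_map maps an external identity to the Google users or groups
    it has been associated with.
    """
    groups = set(user_groups)
    identities = groups | {user_email}
    for entry in principals:
        if entry.get("userId") == user_email:
            return True  # direct user ACL
        if entry.get("groupId") in groups:
            return True  # Workspace group ACL
        external = entry.get("externalEntityId")
        if external is not None and identities & set(identity_map.get(external, ())):
            return True  # external identity resolved through the mapping
    return False


if __name__ == "__main__":
    readers = [
        principal("user", "alice@example.com"),
        principal("group", "support@example.com"),
        principal("external", "zendesk:team-billing"),
    ]
    mapping = {"zendesk:team-billing": ["billing@example.com"]}
    print(can_read("carol@example.com", ["billing@example.com"], readers, mapping))
```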

For Companies Integrating Custom Data Sources

When planning an enterprise integration, follow this discovery checklist:

Define the use case: Which data, personas, and access rules are required?
Plan data sync: How will updates flow (webhooks, exports, APIs)?
Gather sample data and decide on ACL representation.
Provision a Discovery Engine datastore with ACL enforcement.
Transform content into Document payloads.
Import documents through the API and automate syncing.
Register the datastore in Gemini Enterprise and link it to an assistant workspace.
Validate end-to-end permissions and adjust metadata as needed.

Building production-grade connectors for complex systems such as Zendesk or Salesforce requires significant engineering work, often taking weeks to handle schema mapping, authentication, scaling, and monitoring. This complexity does not come from the Google way of integrating but rather from the systems you are integrating.

References and Further Reading

For additional details and official guidance, check the Google Cloud Gemini Enterprise Custom Connector documentation. It provides explanations, best practices, and API references that complement this article.

Conclusion

Gemini Enterprise is a reasoning layer for your organization’s data. By building a custom connector, you make any data source searchable and intelligent within Gemini. I like this way of integrating data and APIs, though I wish everything were unified under a single API with clearer and more consistent endpoints.

You Made It To The End.

I hope you enjoyed the article.

Got thoughts? Feedback? Discovered a bug while running the code? I’d love to hear about it.


Source Credit: https://medium.com/google-cloud/build-a-custom-connector-for-gemini-enterprise-ad3aab884645?source=rss----e52cf94d98af---4
