
Forget vibe coding, vibe Business Intelligence is here! Reasoning, in-context learning and code generation capabilities of Gemini enable a new category of AI Agents for Data Analytics and Business Intelligence.
Data Agents are one of the most popular agent categories today and it’s easy to understand why. Organizations have been chasing the goal of data democratization for decades, making small advancements through operational reporting and analytical dashboards.
Traditionally, these dashboards are carefully crafted and iterated on through advanced data engineering and business intelligence skills. Even simple questions like “how much did I spend last month?” or “who are my top 10 customers?” often require knowledge of data models and sophisticated coding skills. Any follow ups to these questions can be as simple as a dashboard modification or as complex as a full pipeline development.
While core data engineering and business intelligence skills are still the lifeblood for a well-informed organization, effective Conversational Analytics is finally a reality thanks to Data Agents.
Those of us working at the intersection of Data Analytics and AI have all dreamt of simply asking our data a question and getting a clear, insightful answer, maybe even with a neat chart. With modern language models and AI agents, that dream is closer than ever. But, as with many things in tech, the devil is in the details. It’s not just about hooking up an LLM to a database.
For many software engineers like ourselves, the best way to learn is by reading and re-using code.
The CRM Data Q&A Agent sample is here to help. This agent uses a Retrieval-Augmented Generation (RAG) agentic workflow with NL2SQL to tackle natural language queries over Salesforce data replicated in BigQuery. The sample illustrates the principles we discuss below.
To quickly deploy a demo as a publicly available web app on Cloud Run, click here.
Or you can watch Lucia explaining the agent in under 3 minutes.
But after that, let’s dive into why this is a big deal and how you can build similar solutions.
For years, Business Intelligence (BI) has promised to empower decision-makers with data-driven insights. Yet, many organizations still grapple with significant challenges. Here is our opinionated ranking of the top problems.
We face problems like:
- Data Quality: Garbage in, garbage out. If the underlying data isn’t clean and reliable, no amount of fancy AI can save it.
- Data Governance and Compliance: Critical, but often complex to implement in large organizations, especially with evolving regulations.
- Performance and Scalability: As data volumes explode and multiple sources are incorporated, traditional BI systems can struggle.
- Skills and Expertise Gaps: There’s often a chasm between the business users who have the questions and the technical teams who know how to get the answers. Decision-makers, Business Analysts, Data Analysts, and Data Engineers all speak slightly different languages and have different views of the problem.
- User Adoption: If tools are too complex or don’t provide relevant answers quickly, users won’t use them. The same applies if the results or methods are not yet credible, which can be a barrier to adoption for nascent technologies like generative AI.
- Data Complexity and Modeling: Real-world data is messy and often requires sophisticated modeling and understanding of the context to make sense of it.
The first 3 problems are there to be solved by the right people, processes and tools (AI included). Google Cloud offers plenty of tools and services to help.
We will focus on the other 3: Skills Gap, User Adoption, and Data Complexity. In fact, User Adoption is the consequence of data complexity combined with the lack of data engineering skills.
A lot of people across various business functions (Sales, Finance, Product, HR) already benefit from Business Intelligence over the company’s data. Still, barriers remain between the hypotheses they formulate from their knowledge of the business and the answers that come from tapping the data. They have analytical skills, but they often don’t know enough about the applications that produce the data (ERP, CRM). And on top of that, most of them don’t know how that data lands in their analytical data warehouse or how it’s exposed in analytical views.
What we end up with is a set of pre-designed dashboards that typically serve only a fraction of users, in a static context. And a minor change often requires a long, rigorous process.
But what if we had an analytical tool that could take that first pass at answering our analytical questions? And what if we could ask those questions in our own professional lingo, without even having to think about where the data comes from?
What if we had a virtual unicorn employee that can help each and every role involved in the process of answering a business question using data analysis?
Data Agents are well suited to make this happen. Let’s see what it takes to create such data agents.
The initial promise of Generative AI for data analytics often revolved around Natural Language to SQL (NL2SQL) — the idea that you could simply ask a question in English, and the AI would magically write the SQL query to get your answer.
The traditional NL2SQL approach usually involves the following (sketched in code right after the list):
- Database Schema Injection: Feeding the database schema to the LLM.
- Relevant Table Identification: Some preprocessing to help the LLM figure out which tables to use.
- SQL Generation: Often a multi-turn process to refine the query.
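To make this concrete, here is a minimal sketch of that traditional flow: dump the schema into a single prompt and ask the model for SQL in one shot. It is illustrative only; the google-genai client call, the project ID, and the get_schema_ddl() helper are assumptions rather than code from the sample repository.

```python
# A minimal, single-shot NL2SQL sketch (assumes the google-genai SDK on Vertex AI).
from google import genai

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

def naive_nl2sql(question: str, schema_ddl: str) -> str:
    """Inject the whole schema into one prompt and ask for SQL back."""
    prompt = (
        "You write BigQuery SQL.\n"
        f"Database schema:\n{schema_ddl}\n\n"
        f"Question: {question}\n"
        "Return only one SQL query, no explanations."
    )
    response = client.models.generate_content(model="gemini-2.5-pro", contents=prompt)
    return response.text

# sql = naive_nl2sql("Who are our top 10 customers?", get_schema_ddl("crm_dataset"))
```

This works for toy schemas and single-table questions.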
But as many of us have discovered, this “traditional” NL2SQL often doesn’t quite cut it in the real world. Here’s why:
- Schema Overload: Real-world database schemas (especially for enterprise systems like Salesforce) are often massive. They’re too large for the context windows of many small- and even medium-context models. Simply dumping the whole schema into a prompt isn’t feasible or effective.
This is where techniques like using vector stores for schema retrieval come in, but those have their limits…
- Identifying Relevant Tables Is Not Trivial: When a business question requires joining multiple tables (which is almost always the case for meaningful insights), simple vector-based retrieval of relevant tables and fields can be weak. The LLM needs a deeper understanding of entities, relationships, dimensions, and metrics.
- SQL Complexity & “Span of Attention”: Complex business questions translate into complex SQL queries.
And there are always things that junior data engineers and language models tend to overlook (e.g. currency conversion when aggregating across many currencies).
- The Semantic Gap: what is “customer”, how are “sales” reflected in the data? Which “regions” are relevant or how is “this year” defined in the fiscal calendar?
Business questions are NOT just English-language versions of SQL queries. There’s a significant semantic leap required.
Take a question like “Who are our most valuable customers in the US this year?” It’s packed with contextual nuances that an AI needs to understand (the SQL it implies is sketched after the list):
- “Customers”: Which table represents customers? (e.g., Accounts in Salesforce).
- “Value”: What does “value” mean in this business context? Is it total revenue, profit margin, number of deals? How is it calculated? (e.g., “sum of all opportunities won, weighted by opportunity amount”).
- “US”: How is “US” represented in the geo dimension? USA, US, United States? Which table and column hold this information?
- “This year”: What’s the relevant time dimension? Which table and date field should be filtered? (e.g., CloseDate on Opportunities).
- Currency Conversion: If opportunity amounts are in different currencies, they need to be converted to a common currency (e.g. `USD`) using a DatedConversionRate table.
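For illustration, here is the shape of BigQuery SQL the agent ultimately has to arrive at for that question. This is a hand-written sketch, not output from the sample: the `my-project.crm` dataset, the exact replicated field names, and the direction of the DatedConversionRate conversion are assumptions that depend on how your Salesforce data lands in BigQuery.

```python
# A sketch of the target query, executed with the google-cloud-bigquery client.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  a.Name AS customer,
  -- conversion direction depends on your corporate currency setup
  SUM(o.Amount / r.ConversionRate) AS total_won_usd
FROM `my-project.crm.Opportunity` AS o
JOIN `my-project.crm.Account` AS a
  ON a.Id = o.AccountId
JOIN `my-project.crm.DatedConversionRate` AS r
  ON r.IsoCode = o.CurrencyIsoCode
 AND o.CloseDate >= r.StartDate
 AND o.CloseDate <  r.NextStartDate
WHERE o.StageName = 'Closed Won'
  AND EXTRACT(YEAR FROM o.CloseDate) = EXTRACT(YEAR FROM CURRENT_DATE())
  AND a.BillingCountry IN ('US', 'USA', 'United States')  -- geo naming variants
GROUP BY customer
ORDER BY total_won_usd DESC
LIMIT 10
"""
top_customers = list(client.query(query).result())
```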
A simple NL2SQL model will likely choke on this. It needs a lot more context and reasoning capability, even if it uses pre-defined semantic models. This is where Gemini shines.
Gemini models, especially Gemini 1.5 Pro and the new Gemini 2.5 Pro, are built to handle exactly these kinds of complex, context-rich scenarios.
Two key features make a massive difference:
- Massive Context Windows (up to 2 Million Tokens with Gemini 1.5 Pro!): This means Gemini can comprehend and consistently retrieve information from incredibly large data models, including detailed annotations and even source code. Imagine feeding it your entire (well-annotated) database schema, relevant business glossaries, and examples of good queries. The “Les Misérables” example, where Gemini 1.5 Pro processed the entire novel (1382 pages, 732k tokens) to locate a scene from a hand-drawn sketch, is a testament to this power.
- Powerful In-Context Learning: Gemini can learn and reason about topics and domains it wasn’t explicitly pre-trained on, directly from the information provided in the prompt. It can recognize coding practices, understand business-specific jargon, and apply conceptual formulas if you provide them. The MTOB benchmark, where Gemini learned to translate Kalamang (a language with fewer than 200 speakers and ~250k tokens of instructional material) is a prime example.
And with Gemini 2.5 Pro, we’re seeing state-of-the-art performance in enhanced reasoning and code generation, crucial for building effective data agents.
These capabilities allow us to move beyond simplistic NL2SQL and build agents that can truly understand the business context.
Agents
The reasoning capabilities of modern generative models truly shine in multi-agent scenarios. The trick is simple: describe the team working on your problem (these are your agents), and carefully describe each agent as a persona, with a role, experience, problem-solving principles, and constraints.
Each agentic teammate starts speaking the professional lingo specific to its role, and users can always digest the results in a way that is familiar to them.
This approach also drastically improves the quality of the results each agent produces compared with a one-chat-does-it-all approach, even after multiple turns.
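As a flavour of what such persona descriptions can look like, here is a sketch of two specialist instructions written as plain strings. The wording is ours, not taken from the sample repository; the real prompts there are richer.

```python
# Illustrative persona instructions for two specialist agents (hypothetical wording).
BA_PERSONA = """You are a senior CRM Business Analyst.
You translate ambiguous business questions into a concrete analysis plan:
the metrics, dimensions, filters, and CRM objects (Accounts, Opportunities, Leads)
needed to answer them. You never write SQL; you hand the plan to a Data Engineer.
If a term like "best customers" is ambiguous, state the interpretation you chose."""

DE_PERSONA = """You are a senior Data Engineer working in BigQuery.
You receive an analysis plan and an annotated schema, and you write correct,
performant BigQuery SQL. You always handle currency conversion and
geographical naming variants explicitly. Return only SQL."""
```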
Data Models and Data Engineering
Software people often design the underlying data models of their apps in a way that is optimized for consumption by the apps, not by people. When the time comes to analyze that data, we face the problem of translating business rules into operational and analytical data models.
The nature of language models lies in their training data, which is mostly well-written natural language. The biggest consequence is that these models understand concepts better when they are clearly explained in natural language.
So, to make our data model make the most sense to a language model, we need a structure that is logically clear and reflects the underlying business processes. Basically, optimize your data models for people; language models will thank you too.
Often it’s too late to build a good data model for our business app from scratch. Or it’s not even our app. In that case, build a good data model on top of your original one, and annotate it in detail.
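As a minimal sketch of what that can look like: a thin analytical view with business-friendly names and a description on top of the raw replicated table. The dataset, column names, and filter are assumptions for illustration.

```python
# Creating a business-friendly, documented view on top of a raw replicated table.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE OR REPLACE VIEW `my-project.crm_analytics.customers`
OPTIONS (description = 'One row per customer (Salesforce Account), with billing geography.')
AS
SELECT
  Id             AS customer_id,
  Name           AS customer_name,
  Industry       AS industry,
  BillingCountry AS billing_country
FROM `my-project.crm_raw.Account`
WHERE IsDeleted = FALSE
"""
client.query(ddl).result()
```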
How to annotate? That’s another ingredient of the secret sauce.
Metadata
Before you start developing your data agent, plan these 2 pieces:
- Data Model: The agent needs access to a data model that “makes sense” in business terms. This might not be your raw OLTP schema but rather a representation tailored for analytics, perhaps in your BigQuery data warehouse.
- Annotations or Metadata: This is your greatest IP. Every table and field needs to be annotated in two ways:
- Business Meaning: What does this field represent in the real world? What are its constraints, relationships, and typical uses? What are possible synonyms? For example, is “stock” and “inventory” the same?
- Data Engineering Details: Data types, nullability, specific formatting rules.
Crucially, you need to identify and provide “company facts” — your company name, industries, top KPIs, and their conceptual formulas (e.g., Customer Lifetime Value = …). Gemini can then use its in-context learning to apply these.
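To give a feel for what this can look like in practice, here is a sketch of annotations and company facts as plain structures that get rendered into the agents’ prompts. Every name, formula, and fact below is a made-up placeholder.

```python
# Hypothetical annotation and "company facts" structures used as prompt context.
TABLE_ANNOTATIONS = {
    "Opportunity": {
        "business_meaning": "A potential or closed deal with a customer account.",
        "fields": {
            "Amount": {
                "business_meaning": "Deal value in the opportunity's own currency.",
                "engineering": "NUMERIC, nullable; convert via DatedConversionRate before aggregating.",
                "synonyms": ["deal size", "revenue"],
            },
            "StageName": {
                "business_meaning": "Sales stage; 'Closed Won' marks a recognized sale.",
                "engineering": "STRING, constrained to the stage picklist values.",
            },
        },
    },
}

COMPANY_FACTS = """
Company: Example Corp, a B2B software vendor.
Fiscal year starts February 1st.
Top KPIs:
  Win Rate = won opportunities / all closed opportunities.
  Pipeline Coverage = open pipeline amount / quarterly target.
"""
```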
Many business applications provide rich metadata for their data models. Salesforce is a champion in it. They provide detailed structured descriptions of every object and every field, including relationships, value constraints, etc. And customizations made on your Salesforce instance will be included as well.
Agent Evaluation
Evaluating agents on your core scenarios and questions is equally important.
Unit and integration tests offer clear stability checks in traditional software engineering, but they don’t cut it for generative agents due to their non-deterministic nature. To properly assess LLMs, we need to qualitatively examine both their results and their reasoning process — the steps they take. Setting up automated evaluations for this is extra work initially, but it’s a highly recommended best practice that pays off quickly if you’re serious about moving past the prototype stage.
Evaluation must be done at the whole agent level as well as for every sub-agent.
- Plan the toolchain and techniques you will use to evaluate your agents.
- Collect evaluation data.
- Identify metrics and target intervals.
- Run evaluations as part of your DevOps/MLOps.
Both the Agent Development Kit and Vertex AI offer help with evaluating your agents.
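Before (or alongside) adopting that tooling, even a small hand-rolled harness pays off. The sketch below assumes a hypothetical ask_agent() callable that returns the computed value and the generated SQL for a question.

```python
# A minimal golden-question evaluation loop (framework-agnostic sketch).
GOLDEN_QUESTIONS = [
    {
        "question": "How many opportunities did we win last quarter?",
        "expected_value": 42,                    # known answer for the test dataset
        "must_reference_tables": ["Opportunity"],
    },
]

def evaluate(ask_agent) -> float:
    """Run every golden question and return the fraction that pass."""
    passed = 0
    for case in GOLDEN_QUESTIONS:
        result = ask_agent(case["question"])     # expected: {"value": ..., "sql": ...}
        value_ok = result["value"] == case["expected_value"]
        tables_ok = all(t in result["sql"] for t in case["must_reference_tables"])
        passed += value_ok and tables_ok
    return passed / len(GOLDEN_QUESTIONS)

# score = evaluate(my_agent.ask)
# assert score >= 0.9, "Agent regression: accuracy dropped below target"
```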
This brings us to the CRM Data Q&A Agent project. It’s an open-source example of how to build a multi-agent system using the Google Agent Development Kit (ADK), powered by Gemini on Vertex AI, to answer natural language questions about Salesforce.com data stored in BigQuery.
To quickly deploy a demo as a publicly available Cloud Run service, use this link.
Here is what our Data Agent is made of.
The core idea revolves around a Mixture of Agents, each specializing in a part of the problem-solving process (a simplified wiring sketch follows the list).
- Orchestration Agent (Root Agent): This is the conductor of our AI orchestra. It receives the user’s question and delegates tasks to specialized agents. It’s responsible for the overall flow and reasoning about the results.
- CRM Business Analyst (BA) Agent: This agent is crucial for bridging the semantic gap. It takes the user’s ambiguous business question (e.g., “Who are my best customers?”) and translates it into a concrete analysis plan. It identifies key metrics, dimensions, and conceptual CRM objects (like Customers, Opportunities, Leads) needed to answer the question. It understands that “best” can mean different things and might propose interpretations.
- Data Engineer (DE) Agent: This agent takes the BA’s plan and the annotated schema of our Salesforce data in BigQuery to write high-quality, performant BigQuery SQL. It handles complexities like currency conversion (using a DatedConversionRate table) and geographical name variations. This agent includes a SQL Validator sub-tool that uses Gemini to check and correct the generated SQL against the BigQuery schema and dialect, significantly improving reliability.
- BI Engineer Agent: Once the DE agent provides the SQL, the BI Engineer executes it against BigQuery, retrieves the data, and, importantly, generates an appropriate visualization. In our case, it creates interactive Vega-Lite 4 charts. This agent also has its own set of sub-tools:
- Vega Lite Validator: Ensures the generated chart JSON is valid.
- Dimension Filter Extractor: Populates chart parameters for interactive filtering.
- Visual Chart Evaluator: Uses Gemini’s multimodal capabilities to “look” at the rendered chart and provide feedback on its readability and effectiveness, leading to iterative chart refinement.
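To show how such a team can be wired together, here is a heavily simplified sketch using the Agent Development Kit. The instructions are abbreviated, the tool functions are empty stubs, and the exact constructor arguments may differ between ADK versions; the real prompts, tools, and validators live in the CRM Data Q&A Agent repository.

```python
# A simplified, illustrative ADK wiring of the agent team described above.
from google.adk.agents import LlmAgent

def run_bigquery(sql: str) -> list[dict]:
    """Placeholder tool: execute SQL in BigQuery and return result rows."""
    ...

def validate_vega_lite(spec_json: str) -> str:
    """Placeholder tool: check that a Vega-Lite spec is valid."""
    ...

ba_agent = LlmAgent(
    name="crm_business_analyst",
    model="gemini-2.5-pro",
    description="Turns ambiguous business questions into a concrete analysis plan.",
    instruction="Identify the metrics, dimensions, and CRM objects needed; state assumptions.",
)

de_agent = LlmAgent(
    name="data_engineer",
    model="gemini-2.5-pro",
    description="Writes correct, performant BigQuery SQL from the analysis plan.",
    instruction="Use the annotated schema; handle currencies and geo naming explicitly.",
)

bi_agent = LlmAgent(
    name="bi_engineer",
    model="gemini-2.5-pro",
    description="Executes the SQL and produces an interactive Vega-Lite chart.",
    instruction="Run the query, then build and iteratively refine a chart.",
    tools=[run_bigquery, validate_vega_lite],
)

root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.5-pro",
    description="Coordinates the specialists to answer the user's question end to end.",
    instruction="Delegate to the analyst, then the data engineer, then the BI engineer.",
    sub_agents=[ba_agent, de_agent, bi_agent],
)
```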
Building truly intelligent data agents is an exciting frontier. While NL2SQL was a starting point, the combination of advanced models like Gemini, sophisticated agentic architectures enabled by frameworks like ADK, and the scalable infrastructure of Google Cloud allows us to tackle much more complex, context-aware data conversations.
What to remember:
- Context is paramount: Simple keyword matching or schema dumping isn’t enough.
- Gemini’s long context and in-context learning are vital for understanding complex business queries and data structures.
- Multi-agent systems, where specialized agents with well defined personas and equipped with tools collaborate, offer a robust way to break down complex problems.
- Well-annotated data models are your company’s unique IP in the GenAI era.
- Google Cloud provides the end-to-end platform, from data storage in BigQuery and Cloud Storage, to model serving with Vertex AI, to application deployment with Cloud Run, and session management with Firestore.
- The Google Agent Development Kit simplifies the creation and orchestration of these agents.
This is an evolving space, and there’s still so much to explore. How do we make these agents even more intuitive? How can they proactively offer insights? How do we best manage the “art” of prompt engineering and annotation at scale?
We’d love to hear your thoughts! What challenges are you facing when trying to build conversational interfaces for your data? What use cases are you most excited about? Drop a comment below, and let’s get the discussion going.
To start building your own breakthrough solutions with Generative AI and many other services on Google Cloud, check out these resources:
And of course, feel free to explore the CRM Data Q&A Agent repository, try it out, and contribute!
— Vlad Kolesnikov and Lucia Subatin.
Source Credit: https://medium.com/google-cloud/business-intelligence-in-ai-era-how-agents-and-gemini-unlock-your-data-ce158081c678