


Welcome back to this Multimodal Live API article series. 📚 To explore previous articles in this series, head over to the article series overview. I highly recommend reading them to get up to speed.
So far, our AI can listen and respond with impressive fluidity, and we can see the transcriptions in real time. But what if we want it to do more than just talk?
What if we want it to automatically fetch information or interact with other services? That’s where the magic of tooling/function calling comes in.
In this article, we’re diving deep into how to equip our voice assistant with the power to use tools.
Before we dive into the specifics of using tools with the Live API, it’s worth understanding the broader concept if you’re new to it. Large Language Models are rapidly evolving beyond their initial training data, becoming powerful reasoning engines.
By leveraging capabilities like function calling (also known as tooling), LLMs can now interact with external APIs and real-time data, enabling them to perform actions.
You can check out my other article on the topic for a general overview, including a YouTube Video.
Now, let’s focus on the Live API and get practical. We’ll enhance our audio-to-audio assistant from the previous article by integrating the ability to check the status of an order.
A user will be able to ask, “Can you check the status of order 12345?” and our assistant will use a custom function to fetch (or, in this example, simulate fetching) that information and report back.
Before we can define and configure our tool, we need to ensure we have the necessary components imported from the Google Generative AI SDK. In addition to the imports we used for audio streaming, we now need to add FunctionDeclaration and Tool to describe our function to Gemini.
from google.genai.types import (
    LiveConnectConfig,
    SpeechConfig,
    VoiceConfig,
    PrebuiltVoiceConfig,
    FunctionDeclaration,  # New: For describing our function
    Tool,                 # New: For packaging our function(s)
)
With these in place, we’re ready to define the actual function.
A tool is just a regular Python function that performs the desired action. For our example, we’ll create a mock function that simulates looking up order details.
def get_order_status(order_id: str) -> dict:
    ...
    # You can see the full code in the GitHub repo.
    # Essentially, we simply mock an order status API
    # and return a dummy order status.
This function takes an order_id and returns an order status. The key is that it’s a callable piece of code that produces a result Gemini and the Live API can use.
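For reference, a minimal mock along those lines could look like the sketch below. The fields and values are purely illustrative and may differ from the actual code in the repo.

def get_order_status(order_id: str) -> dict:
    # Purely illustrative mock values; the repo's implementation may differ.
    return {
        "order_id": order_id,
        "status": "shipped",
        "estimated_delivery": "2025-05-10",
    }

# Quick local sanity check before wiring it into the Live API:
print(get_order_status("SH1005"))
# {'order_id': 'SH1005', 'status': 'shipped', 'estimated_delivery': '2025-05-10'}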
Next, we need to tell Gemini about this tool: what it’s called, what it does, and what parameters it expects. This is done using FunctionDeclaration and Tool objects from the google.genai.types module.
from google.genai.types import FunctionDeclaration, Tool

# Define the order status tool
order_status_tool = Tool(
    function_declarations=[
        FunctionDeclaration(
            name="get_order_status",
            description="Get the current status and details of an order when a user asks about their order. You should ask for the order ID if not provided.",
            parameters={
                "type": "OBJECT",
                "properties": {
                    "order_id": {
                        "type": "STRING",
                        "description": "The order ID to look up, usually a sequence of numbers or alphanumeric characters."
                    }
                },
                "required": ["order_id"]
            }
        )
    ]
)
Let’s break this down. The FunctionDeclaration object is used to describe a single function. Its name parameter must precisely match the name of our Python function.
The description field is crucial: the underlying model uses it to understand when it is appropriate to use the tool. A well-crafted description helps the AI make the correct decision.
Finally, the parameters section defines the input parameters that the function expects. We specify that order_id is a required string, and this schema helps Gemini structure the arguments correctly when it decides to call the function.
The Tool object then serves as a container for one or more FunctionDeclarations, and we package our order_status_tool into it.
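As a side note, since function_declarations takes a list, one Tool can bundle several functions. The sketch below shows what that could look like with a hypothetical cancel_order function that is not part of this article’s assistant.

# Hypothetical sketch: one Tool bundling two FunctionDeclarations.
# cancel_order is an invented example, not part of this article's code.
store_tools = Tool(
    function_declarations=[
        FunctionDeclaration(
            name="get_order_status",
            description="Get the current status and details of an order.",
            parameters={
                "type": "OBJECT",
                "properties": {"order_id": {"type": "STRING"}},
                "required": ["order_id"],
            },
        ),
        FunctionDeclaration(
            name="cancel_order",
            description="Cancel an order given its order ID.",
            parameters={
                "type": "OBJECT",
                "properties": {"order_id": {"type": "STRING"}},
                "required": ["order_id"],
            },
        ),
    ]
)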
Now, we need to tell our LiveConnectConfig about the tools we want to make available to Gemini. We also provide a system_instruction to further guide the AI’s behavior, especially regarding tool usage.
# Assuming client is already initialized as per previous articles
CONFIG = LiveConnectConfig(
    ...
    system_instruction="You are a helpful customer service assistant for an online store. You can help customers check the status of their orders. When asked about an order, you should first ask for the order ID if the user hasn't provided it. Then, use the get_order_status tool to retrieve the information. Be courteous, professional, and provide all relevant details about shipping, delivery dates, and current status based on the tool's output.",
    tools=[order_status_tool],  # Here's where we add our defined tool
)
Two important additions are made in this configuration.
Firstly, the tools=[order_status_tool] line explicitly passes a list containing our previously defined order_status_tool to the configuration, making it available to the model.
Secondly, the system_instruction has been carefully crafted. This instruction guides the model on how to behave as a customer service assistant, specifically mentioning the get_order_status tool and the recommended process of asking for an order_id if it’s not initially provided.
This system instruction, when combined with the detailed description within the tool itself, significantly improves the reliability and accuracy of the function calling mechanism.
This is where the new logic integrates into our existing receive_and_play asynchronous function from the previous article. This function is responsible for listening to responses from Gemini.
The core idea is that within the async for response in session.receive(): loop, we look for a response.tool_call. If it exists, it means Gemini wants us to execute one or more of our declared functions.
First, we check if the incoming response object contains a tool_call attribute. If it does, it means that the Live API, or the Gemini model behind it, has determined that a function needs to be executed.
if response.tool_call:
    print(f"📝 Tool call received from Gemini: {response.tool_call}")
    function_responses_to_send = []
Next, Gemini might request multiple functions to be called in a single turn (though it’s often just one), so we iterate through response.tool_call.function_calls. For each function call in this list, we extract the name of the function and the unique call_id.
This call_id is crucial because we need to include it when we send the function’s result back to the Live API, allowing it to match our response to its original request.
for function_call in response.tool_call.function_calls:
    name = function_call.name
    args = dict(function_call.args)
    call_id = function_call.id
    print(f"📞 Gemini wants to call function: {name} with args: {args} (Call ID: {call_id})")
Now, we check the name of the requested function and call the corresponding Python function. The cool thing is that the parameter/argument is returned by the Live API and extracted from the live conversation: if I say “I would like to know the order status of order SH1005”, the Live API, with the underlying Gemini model, extracts this parameter dynamically.
if name == "get_order_status":
order_id_value = args["order_id"]
tool_result = get_order_status(order_id=order_id_value)
print(f" ✅ Function '{name}' executed. Result: {tool_result}")
After our function is executed, we construct a response. We provide the original call_id (so the Live API knows which request this is for), the name of the function, and the response itself. This FunctionResponse is then added to our function_responses_to_send list.
# Append the successful result (or error) to send back
function_responses_to_send.append(types.FunctionResponse(
    id=call_id,  # Use the call_id from the request
    name=name,
    response={"result": tool_result}  # The actual data returned by your function
))
Finally, after iterating through the requested function_calls and collecting their results, we send them back.
After sending the tool responses, we use continue. This is important because it tells our loop to go back to session.receive() and wait for the Live API’s next message. That next message will be the Live API’s conversational reply, now informed by the results of the function(s) we just executed and sent back.
We don’t expect any playable audio or displayable text in the same server message that contained the initial tool_call.
# Send all collected function responses back to Gemini
if function_responses_to_send:
    print(f"➡️ Sending function responses to Gemini: {function_responses_to_send}")
    await session.send_tool_response(function_responses=function_responses_to_send)
    continue
After your application sends the tool response back, the Live API processes the information your function provided. It then formulates a natural language response to the user, incorporating these results. This response will come through as a regular server_content message.
For example, the Live API will return something like: “Okay, I’ve looked up order 12345 for you. It has been shipped and is estimated to arrive by May 10th.”
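To see how the pieces from this walkthrough fit together, here is a condensed sketch of the tool-aware receive loop. It is not the full script from the repo: it assumes an active session, the get_order_status function and order_status_tool from above, and a play_audio() helper as a hypothetical stand-in for the playback logic from the previous article.

from google.genai import types

async def receive_and_play(session):
    async for response in session.receive():
        # Tool call path: run the requested function(s) and send results back.
        if response.tool_call:
            function_responses_to_send = []
            for function_call in response.tool_call.function_calls:
                if function_call.name == "get_order_status":
                    tool_result = get_order_status(order_id=dict(function_call.args)["order_id"])
                    function_responses_to_send.append(types.FunctionResponse(
                        id=function_call.id,
                        name=function_call.name,
                        response={"result": tool_result},
                    ))
            if function_responses_to_send:
                await session.send_tool_response(function_responses=function_responses_to_send)
            continue  # Wait for the model's follow-up, tool-informed reply

        # Conversational path: audio and transcriptions arrive as server_content
        # messages and are handled as in the previous article.
        if response.server_content:
            play_audio(response)  # Hypothetical stand-in for the existing playback logic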
This opens up a lot of possibilities for building helpful multimodal voice-driven applications. While function calling (tooling) is not a brand-new concept, its integration into the Live API is particularly exciting.
I hope this dive into function calling with the Live API has sparked some ideas.
We’ve given our assistant ears, a voice, and hands to use tools.
What could be next? Perhaps we’ll finally tackle that 🇸🇪 Swedish furniture assembly challenge by integrating video input, allowing Gemini to see what we’re working on and combine that with its ability to call functions (maybe to pull up specific instruction steps). The possibilities are exciting.
You can find the complete Python script that includes the audio-to-audio setup from Part 2, now enhanced with the function calling capabilities for getting the order status.
If you want to better understand the change, check out the git diff (it’s not too complicated).
Either way, I hope you enjoyed the article.
Got thoughts? Feedback? Discovered a bug while running the code? I’d love to hear about it.
- Connect with me on LinkedIn. Let’s network! Send a connection request, tell me what you’re working on, or just say hi.
- AND Subscribe to my YouTube Channel ❤️
Source Credit: https://medium.com/google-cloud/multimodal-live-api-tooling-c7f018ff0291