When working with Large Language Models, token count is a constant concern. Every token affects performance, adding to cost and latency while consuming the available context window. Standard data formats like JSON, with verbose syntax like braces and quotes, can use up valuable context space.
TOON (Token-Oriented Object Notation) is a format designed to increase information density for LLMs. It’s compact, highly readable, and can be mapped back to a schema for validation.
What is TOON?
TOON is a data serialization format, similar in appearance to YAML, that’s optimized for token efficiency. Instead of using braces and brackets, TOON uses indentation to define the structure of the data, much like Python. This, combined with minimal punctuation, helps reduce the noise in prompts. Unlike JSON, it also supports comments, allowing developers to add hints or metadata directly into the data stream.
TOON is also highly readable for both humans and models. It remains schema-aware, allowing it to be mapped back to a strict JSON schema for validation. Most importantly, by reducing syntactic noise, TOON can actually improve model performance.
For example, a simple JSON object:
{
  "items": [
    { "sku": "A1", "qty": 2, "price": 9.99 },
    { "sku": "B2", "qty": 1, "price": 14.50 }
  ]
}
Becomes this in TOON:
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.50
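To make the tabular encoding concrete, here is a minimal sketch in Python of how a uniform array of flat objects maps onto that header-plus-rows layout. This is an illustrative helper, not the official TOON library, and it skips details a real encoder handles (nested objects, quoting, delimiter options, number formatting):

```python
import json

def encode_table(key, rows):
    """Encode a list of flat, uniform dicts as a TOON-style tabular block.

    Illustration only: a real TOON encoder also handles nesting,
    quoting, custom delimiters, and exact number formatting.
    """
    fields = list(rows[0].keys())
    # Header declares the key, row count, and field names once.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each row is just the values, comma-separated, indented under the header.
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

data = json.loads(
    '{"items": [{"sku": "A1", "qty": 2, "price": 9.99},'
    ' {"sku": "B2", "qty": 1, "price": 14.50}]}'
)
print(encode_table("items", data["items"]))
```

Note that the field names appear once in the header instead of being repeated for every object, which is where most of the savings come from.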
Potential for Performance Improvements
While the token savings are impressive, the benefits may go deeper. According to benchmarks from the official TOON project, the format may improve model performance in certain scenarios. In one test, TOON achieved 73.9% accuracy on a Retrieval Accuracy task, compared to JSON’s 69.7%.
The likely reason is that the “noise” of JSON punctuation dilutes the semantic signal. By removing syntactic clutter, TOON increases the information density of the context, allowing the model’s attention mechanism to focus on the relationships between data points.
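A quick way to see the density gain is to compare the two serializations of the example above. Character count is only a rough proxy for token count, but the punctuation TOON removes (braces, quotes, repeated keys) is exactly what inflates the JSON version:

```python
import json

data = {"items": [{"sku": "A1", "qty": 2, "price": 9.99},
                  {"sku": "B2", "qty": 1, "price": 14.50}]}

json_text = json.dumps(data, indent=2)
toon_text = "items[2]{sku,qty,price}:\n  A1,2,9.99\n  B2,1,14.50"

# Compare sizes; the TOON form carries the same information
# with far fewer structural characters.
print(len(json_text), len(toon_text))
```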
Supported Use Cases
Many use cases benefit from conveying the same information in fewer tokens. Let’s walk through a few examples.
Compressing retrieved documents allows more evidence to fit into the context window for RAG pipelines. This can lead to better answer quality at a lower cost.
TOON can also support agent-to-agent communication, enabling more complex, multi-step conversations by passing compact payloads.
Finally, TOON can be useful for log and telemetry analysis, as it can convert verbose JSON logs into a clean, tabular structure that helps a model spot patterns and anomalies.
The TOON MCP Server
To make it easy to use TOON in agentic workflows, I’ve built a TOON MCP Server. This MCP server implements the Model Context Protocol to expose two tools for encoding and decoding TOON data:
encode_toon
Converts standard JSON data into the compact TOON format. It also accepts parameters to control indentation, delimiters, and key folding, giving you fine-grained control over the output. This is perfect for compressing data before sending it to an LLM.
decode_toon
Converts TOON-formatted text back into standard JSON, allowing you to parse an LLM’s response when it generates TOON output.
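Under the Model Context Protocol, clients invoke tools like these through a tools/call JSON-RPC request. The exact argument names depend on the server implementation, so treat the fields below as illustrative rather than the server’s actual schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "encode_toon",
    "arguments": {
      "json": "{\"items\": [{\"sku\": \"A1\", \"qty\": 2, \"price\": 9.99}]}"
    }
  }
}
```

In practice you rarely write these requests by hand; an MCP-aware client such as Google Antigravity or the Gemini CLI constructs them for you.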
Integration with Google Antigravity
Because the Model Context Protocol is a universal standard, you can bring these capabilities directly into your IDE. Google Antigravity supports connecting custom MCP servers, giving the editor direct access to TOON encoding and decoding.
To add the TOON server to Antigravity:
- Open the MCP Store panel within the “…” dropdown at the top of the agent panel.
- Click on “Manage MCP Servers” and then “View raw config”.
- Add the TOON configuration to your JSON file:
{
  "mcpServers": {
    "toon": {
      "command": "npx",
      "args": [
        "-y",
        "git+https://github.com/kweinmeister/toon-mcp.git"
      ]
    }
  }
}
Once the server is added, you can simply ask Google Antigravity to use its tools:
“I have a large JSON dataset of products. Please encode it to TOON format to save tokens.”
Google Antigravity will find the registered toon server and use its encode_toon tool to compress your data.
Using with the Gemini CLI
You can also easily use the TOON MCP server with the Gemini CLI.
First, you need to add the server to your Gemini configuration. For a local server, you can use the npx command, which ensures you’re always running the latest version:
gemini mcp add toon npx -y "git+https://github.com/kweinmeister/toon-mcp.git"
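This command writes the server entry into your Gemini CLI settings file (typically ~/.gemini/settings.json). The resulting entry should look roughly like this, mirroring the Antigravity config shown earlier:

```json
{
  "mcpServers": {
    "toon": {
      "command": "npx",
      "args": ["-y", "git+https://github.com/kweinmeister/toon-mcp.git"]
    }
  }
}
```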
Hosting on Cloud Run
Running the MCP server locally is great for development, but for production or shared workflows, hosting it on Google Cloud Run is a practical option.
By hosting the TOON MCP server on Cloud Run, you can offload processing to a scalable serverless platform and centralize your tooling logic. You can deploy it directly using the gcloud CLI:
gcloud run deploy toon-mcp \
  --source . \
  --port 8080 \
  --allow-unauthenticated
(Note: I’m using --allow-unauthenticated for testing. You should secure your service in production.)
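One common way to lock the service down is to require authentication and call it with an identity token. The following is a sketch; the service account and project here are placeholders you’d replace with your own:

```shell
# Redeploy without public access
gcloud run deploy toon-mcp \
  --source . \
  --port 8080 \
  --no-allow-unauthenticated

# Grant a client identity permission to invoke the service
gcloud run services add-iam-policy-binding toon-mcp \
  --member="serviceAccount:my-client@my-project.iam.gserviceaccount.com" \
  --role="roles/run.invoker"

# Call the service with an ID token
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://toon-mcp-xyz.a.run.app/mcp
```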
Connecting to a Remote Server
Once deployed, you’ll get a URL. You can add this remote server to your Gemini CLI or Google Antigravity configuration just like the local one. Here’s the Gemini CLI command:
gemini mcp add toon https://toon-mcp-xyz.a.run.app/mcp -t http
What’s Next?
This guide has shown how to use the TOON MCP Server to make your agentic workflows more token-efficient. By integrating a simple, powerful tool, you can reduce costs and improve performance, especially when dealing with large volumes of structured data. The combination of TOON for payload compression and the Model Context Protocol for standardized tool use is a foundational pattern for building scalable, production-ready AI systems.
To continue exploring, you can dive into the TOON MCP Server repository on GitHub to see the full implementation, or learn more about the format itself by reading the official spec at toonformat.dev. For a hands-on guide to deploying an MCP server to Google Cloud Run, try out the codelab.
I’d love to hear how you’re using TOON and MCP in your projects. Connect with me on LinkedIn, X, or Bluesky to continue the discussion!
Source Credit: https://medium.com/google-cloud/save-tokens-with-toon-using-google-antigravity-and-the-gemini-cli-e9a641c06ea8?source=rss—-e52cf94d98af—4
