And using code2prompt, yet another CLI tool for this purpose, yields a token count of 57,160,552.
What’s our conclusion? At roughly 56–69 million tokens, we are way over our large context window targets. We need strategies to shrink this down significantly.
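If you want a quick, tool-independent sanity check on numbers like these, a common rule of thumb is roughly four characters per token. Assuming the packed output landed in a file such as repomix-output.xml (the default file name can differ between tools and versions), a rough estimate looks like this:
wc -c < repomix-output.xml | awk '{ printf "~%d tokens (rough 4-chars-per-token estimate)\n", $1 / 4 }'
This is only a ballpark; for exact figures, trust the counts reported by the packing tools or by the model provider's token-counting API.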
Many files in a typical repository aren’t actually source code relevant for high-level analysis. Think about data files (.csv, .json), static assets (.svg, .css), and various schema files (.proto).
Let’s use repomix’s filtering capabilities to exclude these file types:
repomix . --ignore "**/*.ipynb,**/*.csv,**/*.svg,**/*.txt,**/*.json,**/*.js,**/*.css,**/*.sql,**/*.pem,**/*.proto"
This command tells repomix to ignore files matching these patterns. The result? A dramatic reduction down to ~2.8 million tokens. Tools like code2prompt and gitingest also offer filtering mechanisms via configuration or command-line arguments.
We’re much closer, but still nearly 3x over a 1M token budget!
Some tools offer basic compression, often by removing comments and excessive whitespace. Repomix has a --compress flag. Let’s combine it with our filter:
repomix . --ignore "**/*.ipynb,**/*.csv,**/*.svg,**/*.txt,**/*.json,**/*.js,**/*.css,**/*.sql,**/*.pem,**/*.proto" --compress
This further reduces the size, bringing us down to ~1.8 million tokens.
Better, but still not quite there. Compression helps, but it’s often not enough on its own for very large repos.
We need to get smarter. Instead of just filtering by file type, we need to decide which files to keep and which to prioritize.
Option A: Focusing on specific areas
Is the entire repository necessary for your analysis? Perhaps you’re only interested in the App Engine samples, or the Pub/Sub examples. You can instruct tools to focus only on specific subdirectories. This drastically reduces the scope and, therefore, the token count.
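For example, to pack only the Pub/Sub samples, point repomix at that subdirectory instead of the repository root (the pubsub/ path and the trimmed ignore list here are illustrative; adapt both to your target area):
repomix pubsub/ --ignore "**/*.csv,**/*.json,**/*.txt" --compress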
Option B: Excluding low-value code
Test files (e.g. *_test.py files) are crucial for code quality, but often less important for understanding the core logic or architecture via an LLM. Excluding them can save a significant number of tokens.
repomix . --ignore "**/*.ipynb,…,**/test/*,**/*_test.py,**/test_*.py" --compress
Combining filtering, compression, and excluding tests might get us under the 1M limit for many parts of the python-docs-samples repo.
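For reference, here is the fully spelled-out command that combines the earlier ignore list with the test-exclusion patterns:
repomix . --ignore "**/*.ipynb,**/*.csv,**/*.svg,**/*.txt,**/*.json,**/*.js,**/*.css,**/*.sql,**/*.pem,**/*.proto,**/test/*,**/*_test.py,**/test_*.py" --compress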
Option C: Intelligent prioritization with yek
What if you do want a broad overview, but still need to meet a strict token limit? This is where tools like yek come in. Yek can prioritize files within a specified token budget, using Git history to infer importance along with user-defined rules. The goal is to include the most “important” or “relevant” files first, placing them later in the output since LLMs tend to pay more attention to content appearing later in the context.
Let’s try targeting our 1M token budget using yek:
yek --tokens 1024k .
Based on the experiment, yek produced an output within the budget:
DEBUG Processed 5167 files in parallel for base_path: .
DEBUG 816952 tokens generated
DEBUG 5167 files processed
DEBUG 3.3 MB generated
DEBUG 56351 lines generated
Yek successfully selected a subset of the (filtered) repository, resulting in ~817k tokens, well within our 1M target! This approach intelligently curates the codebase based on the tool’s heuristics, aiming to preserve the most valuable information within the given constraints.
- Assess: Clone the repo and get a baseline token count.
- Filter broadly: Remove common non-code or large data file types using ignore flags/configurations.
- Filter semantically (if needed): Focus on specific subdirectories relevant to your task. Exclude test directories/files.
- Prioritize (if needed): If still over budget but needing broad coverage, use a prioritization tool to automatically select the most relevant files.
- Compress (optional): Apply compression if needed and supported by the tool, usually after filtering/prioritization. (A combined sketch of these steps follows below.)
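Putting the checklist together, a minimal end-to-end sketch might look like this (commands and flags as used earlier in this post; the exact ignore patterns and the 1M-token budget are illustrative):
# 1. Assess: pack everything once to get a baseline token count.
repomix .

# 2-3. Filter broadly (data and asset types) and semantically (tests), then compress.
repomix . --ignore "**/*.ipynb,**/*.csv,**/*.svg,**/*.json,**/test/*,**/*_test.py,**/test_*.py" --compress

# 4. If the result is still over budget, let yek prioritize files within 1M tokens.
yek --tokens 1024k .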
Now you’re ready to perform all kinds of tasks, from diagramming to fixing bugs. Check out “Level up your codebase with Gemini’s long context window in Vertex AI” for ideas.
Beyond just fitting into the context window, optimization offers several benefits. It sharpens the model’s focus by helping it concentrate on the most relevant code for your query, leading to more pertinent results. Additionally, smaller prompts generally mean faster processing. Finally, minimizing input tokens directly reduces cost for APIs that charge per token, allowing developers to better utilize features like Gemini 2.5 Flash’s configurable “thinking budget” to balance performance and expense.
While the focus here is on filtering and packing a repository into the LLM’s context window, it’s worth noting two major alternative strategies for handling large codebases:
Retrieval-Augmented Generation (RAG) with frameworks: Instead of putting the entire (filtered) codebase into the prompt, frameworks like LangChain and LlamaIndex allow you to index the repository (e.g., breaking code into chunks and storing them as vector embeddings). When you ask a question, the framework retrieves only the most relevant code chunks based on your query and adds those to the LLM prompt. This is highly scalable and effective for question-answering over codebases far exceeding any context window limit. Find out how to set up RAG on your codebase in 5 easy steps.
Dynamic context management in AI coding assistants: Tools like Gemini Code Assist manage context dynamically within your IDE or terminal. They pull relevant local code into the prompt when generating code, explaining errors, or answering questions related to the immediate task. This avoids the need to manually package the entire repository. Gemini Code Assist for Individuals offers a no-cost tier, currently with 6,000 code requests and 240 chat requests daily.
Leveraging the massive context windows of models like Google’s Gemini 2.5 Pro and Gemini 2.5 Flash opens exciting possibilities for complex code analysis. However, as we’ve seen, real-world repositories often demand smart strategies to fit even within a million-token budget. Techniques like filtering by file type, focusing on specific directories, excluding tests, and using prioritization tools are essential for taming large codebases.
While these preparation steps are crucial, the best way to truly appreciate these capabilities is to experience them directly. Vertex AI Studio and Google AI Studio provide accessible web-based interfaces where you can easily experiment with different prompts, upload files, and directly interact with the latest Gemini models. Share your tips with me on LinkedIn, X, Bluesky, or Threads, and happy analyzing!
Source Credit: https://medium.com/google-cloud/optimize-your-prompt-size-for-long-context-window-llms-0a5c2bab4a0f?source=rss----e52cf94d98af---4