When we Decoded the Context in the last post, one of the observations was that the context holds the full conversation history, and that history can contain conflicting instructions about how the model should behave.
So how might we get around this?
1. Clear the context on each turn
Gemini CLI has a `/clear` command that wipes the context. Running it each turn is the easiest and cleanest way to set the exact context and instructions we need. However, it is a pain for continuity, since the most natural way of working is to build on previous commands: 1) do this, 2) now that didn’t work, so do this, 3) alright, let’s revert everything we did… etc.
So this is not an option. We can use it when a logical piece of work is done, but can’t run this between every command.
Another issue with repeatedly clearing or compressing the context is that the CLI will reload the system instructions, the GEMINI.md hierarchy, and the folder hierarchy each time, thereby costing tokens; and since each reload produces new hashes, the model cannot take advantage of context caching.
2. Send the full context / instruction on each turn
Again, a very logical approach, and probably what many of you thought of at the very first instance. If I ask the model to talk like a pirate on every turn, then there is no way it will talk like Yoda. Right?
Of course, that’s true; it would take a very, very complicated context situation for this not to work.
In more complex situations, the context / instruction might be so large that sending it over and over again bloats every turn. The advantage we have with Gemini is the 1M-token context window: you can keep throwing context at it and it won’t bat an eyelid.
What will make you bat an eyelid is the API cost you’ll be paying for the massive number of tokens being processed, and the quotas you might be burning.
But let’s keep cost out of this for now; I am focused on quality of output rather than cost.
There is a very important use case that will cause us issues even if we pad the context each and every turn.
Gemini CLI is an agentic tool and therefore might go about executing tools, looking up files, and doing a whole bunch of stuff by itself without any user input needed. So it might end up putting a lot of stuff into the context without you ever being able to intervene and re-pad the context.
Depending on what files it’s been reading and processing, the context might be “polluted” enough that it causes side-effects, especially if it runs through many folders and loads many GEMINI.md and README.md files into the context.
3. Run a new session on each turn
What if we could run a new session with a set of instructions, such that all processing happens within that session and the context doesn’t bleed back in? Sounds plausible?
A bunch of interesting videos by John Capobianco got me thinking about using Gemini CLI itself as a standalone server.
We launch Gemini CLI as the primary input interface. However, every task that we provide to the CLI is executed in a new instance of Gemini CLI. These are not mere prompts but full agentic goals. The new instance will do whatever it takes (read files, write files, analyse, run loops) until the user goal is satisfied, or report back that it failed.
The risk here is still that the token usage will explode. Maybe even more than the previous approach of repeating instructions, since entire system contexts will need to be reloaded and there will be no context sharing. However, as I said, my priority is quality of output over token usage… for now at least.
With that, I built out the superagent prompt that breaks down the user goal, lets me spawn sub-agents (sequentially, one at a time, for now), and executes them to completion.
Design Decision: Isolated Agents
After some thought (I considered options from running Gemini CLI with a listener port hook, to running Gemini CLI on Cloud Run, to building a Jules-esque system), I decided to use the shell tool to invoke Gemini CLI with a prompt. With some basic testing I was able to prove this works.
The way I thought about this was to have an orchestrator agent which I am calling Strategist and a bunch of worker agents which I am calling Specialist. The strategist spins off specialist agents on demand and as the task progresses towards the goal.
The specialist, once done with its task, hands control back to the strategist for checks and further execution.
In a sense this is a discussion around sub-agents — creating many agents and handoff between the agents. There is a lot of content around this area which might be worth a read. Incidentally my colleague Paul Datta has been exploring the idea of sub-agents too, and has written some great articles on creating multi-agent systems with Gemini CLI.
He also references his inspiration from the Anthropic post on subagents with Claude Code. I haven’t gone through this yet, so let me know what you think of sub-agents if you have been exploring this area.
* **Delegate & Execute:** Append your new turn to the `Agent Work Log`. In the `Next Step` block, write a summary of the prompt you have crafted for the specialist agent, along with the other details as indicated. As your final action for this turn, you must invoke the specialist by passing the contents of the prompt file just created to the `gemini` command via the `shell` tool, for execution in a new session.
* **Await Control:** Wait for the specialist agent process to complete its work, terminate, and hand control back to the strategist. Then repeat the process and execute the next turn.
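The delegate-and-await cycle in those two steps can be sketched in plain Python (a hypothetical illustration of the mechanics; the file names and the helper function are my own assumptions, not part of the actual superagent prompt):

```python
from pathlib import Path

def delegate_turn(workdir: str, turn: int, summary: str, prompt_text: str) -> str:
    """Sketch of one strategist turn: log the step, persist the specialist
    prompt, and return the path that a fresh `gemini` session will work from."""
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)

    # 1. Append this turn to the shared Agent Work Log.
    log = work / "session_log.md"
    entry = f"\n## Turn {turn}\n**Next Step:** {summary}\n"
    log.write_text((log.read_text() if log.exists() else "") + entry)

    # 2. Save the crafted specialist prompt to its own file, rather than
    #    inlining it on the command line.
    prompt_file = work / f"specialist_prompt_{turn:03d}.md"
    prompt_file.write_text(prompt_text)

    # 3. The strategist would now hand this path to a new `gemini` session
    #    via the shell tool; that call blocks until the specialist exits.
    return str(prompt_file)
```

Because the `shell` tool call blocks until the child `gemini` process terminates, control naturally returns to the strategist for the next turn.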
Design Decision: Dynamic Agents
While we could easily build out the isolated agents, I have always been of the opinion that agents using GenAI should be able to think, reason, and plan their approach by themselves. I have never liked providing instructions such as: “when the user wants to look for shoes, use query_product with the parameters set to shoes and the user’s preferences”.
As long as the AI knows there is a tool called query_product and has all the parameters of the tool, maybe a list of supported enums for the parameters, the AI should be able to invoke the tool whenever needed. I shouldn’t have to provide rules on when to invoke which tool and how the agent should go about its task.
Based on this very reasoning, I feel we shouldn’t be creating too many specific sub-agents. Ever since AI went mainstream, I have been thinking about how we might create new sub-agents dynamically, i.e. rather than pre-programming a set of agents, we create at runtime the agents that are needed for the task.
My reference has been that of an engineering team, a scrum team perhaps: a manager / scrum master agent that has the bigger picture, a tech lead agent that can do deep analysis and lay out an implementation plan, a developer agent or two that can do the implementation based on the plan, tester / QA agents that do verification, and back to the scrum master, who verifies that the software product has been built to the user’s specification. The team and its size are formed based on the engineering task and complexity at hand. Sometimes we’ll need UI devs, sometimes mobile devs, sometimes UXRs.
Now with Gemini CLI and this context analysis, I was able to make that work successfully.
What we are doing now is that the main agent, the strategist, takes a call on what the next task is and what kind of agent is required for it. Creating that agent takes just a prompt, and Gemini CLI will use that prompt with an agentic approach and go do what is needed. For the next task, the strategist crafts another specific agentic prompt and hands off to another Gemini CLI process.
All this is completely dynamic. New agents are spawned based on need and nothing needs to be pre-decided or pre-built. We just have to “teach” Gemini CLI how to craft a detailed prompt for any task.
* **Formulate:** Craft a precise prompt for the next specialist agent using the "Prompt Engineering Best Practices" below and save the prompt only in an appropriately named file (as suggested below) for the specialist. This should include the files the agent has to write to, instructions to update its thinking and execution results, and finally exit gracefully in order to handoff control back to the Strategist.
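As a toy illustration of that “Formulate” step, here is one shape such a dynamically crafted prompt could take (the template fields and the function are my own assumptions, not the real superagent template):

```python
def build_specialist_prompt(role: str, task: str, session_log: str,
                            deliverables: list) -> str:
    """Illustrative sketch: fill a fixed template for any dynamically
    chosen specialist role. The structure here is an assumption."""
    files = "\n".join(f"- {f}" for f in deliverables)
    return (
        f"You are a {role} specialist agent.\n\n"
        f"## Task\n{task}\n\n"
        f"## Deliverables (files you must write)\n{files}\n\n"
        f"## Session Log\nRecord your thinking and execution results in "
        f"{session_log}, appending only your own entries.\n\n"
        "When the task is complete, exit gracefully to hand control "
        "back to the Strategist.\n"
    )
```

The strategist would save this text to a file (never pass it inline) before spawning the specialist.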
The reference I used to help craft the perfect prompt was this excellent guide on prompt engineering by Anthropic.
There were a few interesting gotchas where the dynamically generated prompt was so large that I was finding it tough to reliably pass it to the Gemini CLI sub-session.
If I passed it directly to `gemini`, special characters in the prompt would cause the shell command to fail. I then wrote the prompt into a file and tried to pass that to Gemini CLI, but that failed due to command-substitution security settings. Then I tried to `cat` the file and pipe it to `gemini`, but it split the prompt at newlines and sent it line by line.
Finally, I landed on something simple. I saved the prompt to a file. Then I launched Gemini CLI with a simple prompt that told it to “read the said file and execute the instructions within”. This worked flawlessly!
The custom prompt should simply ask `Gemini CLI` to read the contents of the prompt file for the specialist agent which was just created, and execute the instructions within.
The custom prompt should never pass the contents of the prompt file. It should pass just the full path to the file, along with an instruction asking the specialist agent to execute the commands within the mentioned prompt file.
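A minimal sketch of this rule (the path and wording are illustrative): only the file path is interpolated into the command, so special characters in the prompt body can never break shell quoting.

```python
def bootstrap_command(prompt_path: str) -> str:
    # Only the file *path* goes on the command line; the prompt body
    # (which may contain quotes, backticks, $() substitutions, and
    # newlines) stays safely inside the file on disk.
    return (f'gemini -p "Read the contents of the prompt file at {prompt_path} '
            'and execute the instructions within."')
```

`gemini -p` runs the given prompt non-interactively in a fresh session, which is exactly the isolation we want for a specialist.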
Design Decision: Agents Communication
When faced with a large goal, the strategist could break down the task and invoke an army of specialists to work on bits and sections of the larger task. Then the agents have to communicate information back to the orchestrator and the orchestrator needs to send the information back to the next agent.
How do you get the agents to be independent, i.e. in terms of context, and at the same time share information and communicate with each other?
An approach would be a common session log file that each agent writes to and reads from. This acts like a scratchpad where all the agents share notes.
While this sounds simple to say, it is tough to implement reliably. The agents should modify only specific sections, should not overwrite other sections, etc. The biggest risk when working with offline / asynchronous agents is understanding what exactly they did. You come back to the file system after a long-running process and see that content has changed in ways you didn’t expect or intend.
I experimented with various formats for the file, including Markdown with YAML front matter, templates, etc. I finally settled on a Markdown-only template with some instructions included in the template.
**You are a specialist agent. This file is your Session Log. You MUST follow these rules:**

* **Agent Log Integrity:** To add your turn to the `Agent Work Log`, you MUST follow this procedure exactly:
1. **Read:** Use the `read_file` tool to read the entire content of this Session Log file.
2. **Append:** In your agent's memory, concatenate your new log entry (using the "Agent Log Entry Template") to the content you just read.
3. **Write:** Use the `write_file` tool to write the *entire*, updated content back to this Session Log file.
* **CRITICAL:** Under no circumstances should you modify, delete, or alter any part of the log file that is not your own entry. The Strategist is the ONLY agent that can modify the `Master Plan`.
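Outside the agent, that read-append-write procedure is straightforward; here is a plain-Python sketch of it (the entry format is an assumption):

```python
from pathlib import Path

def append_log_entry(log_path: str, agent_name: str, entry_body: str) -> None:
    log = Path(log_path)

    # 1. Read: load the entire current Session Log.
    current = log.read_text() if log.exists() else ""

    # 2. Append: concatenate the new entry in memory; never edit
    #    existing sections, which belong to other agents.
    new_entry = f"\n### {agent_name}\n{entry_body}\n"

    # 3. Write: persist the full, updated log in one shot.
    log.write_text(current + new_entry)
```

With specialists running sequentially, one at a time, this is safe; concurrent agents would need file locking, which the current design sidesteps.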
Design Decision: Progress Tracking
The final bit of the puzzle, and actually the thing that originally set me down this path. I am very interested in seeing the model’s thinking and the progress it makes while working on tasks. I want the model and the CLI to keep me informed at all times about what they are doing, and I would love a checklist-style output with a dynamic list of tasks to be done, with items checked off as they are completed.
As an example, below is what I had envisioned.
* [📝] **Phase 1: Goal Deconstruction & Analysis**
* [✅] Deconstruct the user's request into fundamental questions.
* [⏳] Use `glob` and `read_file` to gather initial context from the codebase.
* [ ] Formulate a verifiable understanding of the current state.
* [ ] **Phase 2: Structured Planning**
* [ ] **Phase 3: Step-by-Step Execution & Verification**
* [ ] **Phase 4: Final Alignment & Sign-Off**
This worked beautifully at times. However, there were also times when it wouldn’t output its thinking and progress at all. That’s when I started deep-diving into the context and realised there is just so much information there that the model might get confused or overwhelmed.
The instructions were also at odds with the system instructions, which tell it to be succinct and not over-explain things.
So with the new instruction set, I decided that I just want to be kept aware of what was happening. The model could decide how to present the information to me.
The interesting thing here is that since we have multiple agents, with the specialist agent being called within a shell, the inner agent will not have the instructions to output its thinking. So I had to add specific instructions to the user GEMINI.md so that any agent that spawns will have these common instructions to adhere to.
### Show Thinking and Track Progress
* While always being terse (and not verbose) let the user know how you are thinking about solving the problem / task before you start work. Output your thinking in bullets.
* While executing the task, keep the user updated on the progress of the task - what's working, what's not, whether the task is completed. Output your progress as checklists or tasklists.
Note: Output each thought / progress / action in a new line so that it is readable. All such output should be short sentences or phrases.
The other inclusion is a progress template in the session log. The strategist will update this checklist as specialists complete tasks. Opening the markdown file in a “live reloading” environment / IDE lets me keep track of the progress via the checklist, and I can also read the logs being added by the agents to see exactly what each of them did.
## Master Plan
*(Strategist Only: This is the high-level plan. The Strategist will update the status of each step here.)*

- [ ] *Step 1: Strategist will define this.*
- [ ] *Step 2: Strategist will define this.*
- [ ] *Step 2.1: Strategist will define this.*
- [ ] *Step 2.2: Strategist will define this.*
- [ ] *Step 3: Strategist will define this.*
...
Bonus: Interactive Shell
At the time I was writing this and publishing my repo, the Gemini CLI team dropped this amazing new feature — Interactive Shell.
Previously, I didn’t have much control over the sub-agent. In fact, I was running the sub-agent in YOLO mode so that it could just go and execute whatever it felt like without waiting for user input, and I hoped that the orchestrator would be able to handle deviations and mistakes.
Now with the interactive shell, I have two advantages.
- I can see everything the sub-agent is doing, since all the output it logs to the console is rendered to me in real time.
- I can actually run Gemini CLI in an interactive mode within the sub-agent, and that drops me back into a full Gemini CLI experience where I can approve tool use, approve implementation plans, and so on.
So I updated my superagent instructions to run the dynamically generated prompt in an interactive Gemini CLI session within the interactive shell.
Then invoke `gemini` in the interactive mode (-i) with a custom prompt using the `shell` tool so that it executes in a new session.
The **mandatory** syntax to be used to invoke the specialist agent is: `gemini -i ` which will invoke it in interactive mode.
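Assembling that interactive invocation safely might look like this (a sketch; `shlex.quote` guards the bootstrap prompt, and the path is hypothetical):

```python
import shlex

def interactive_command(prompt_path: str) -> str:
    # The -i flag runs the given prompt and then stays in interactive
    # mode, so tool approvals and plans can be reviewed in the sub-agent.
    bootstrap = (f"Read the contents of the prompt file at {prompt_path} "
                 "and execute the instructions within.")
    return f"gemini -i {shlex.quote(bootstrap)}"
```

The strategist runs this command through the interactive shell, dropping me into the specialist’s full Gemini CLI session.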
Finally — The Superagent Source Code
Finally, the superagent that you might be interested in taking a look at. I built this using FastMCP, which recently launched an integration with Gemini CLI, making it super easy to build and install MCP servers.
Head over to https://github.com/ksprashu/gemini-cli-mcp-servers for all the MCP servers for Gemini CLI that I am building out.
Github: https://github.com/ksprashu/gemini-cli-mcp-servers/tree/main/superagent_server
The Superagent server is in the above path, and the instructions to install it are in the README in the root of the repo.
Remember to create the folder for the scratch files and provide Gemini CLI with access to it.
Appendix — Working of SuperAgent
I’ll leave you with a few screengrabs of how I used the superagent and how it helped me build a feature (which I will talk about in the next blog).
Easter Egg: Codebase Investigator Agent
Btw, did you know that you can install the preview version of Gemini CLI via `npm install -g @google/gemini-cli@preview`?
This has some cool new features, including the “Intelligent Model Router”, which I will blog about next. Sneak peek: I will also show you a new feature that I implemented using the superagent built in this post.
I also happened to notice that there is now an in-built agent in Gemini CLI — the Codebase Investigator Agent. It’s quite cool actually! Install the preview release and try it out yourself.
Did you like this? Did you try out the Superagent MCP server? Let me know your thoughts.
Source Credit: https://medium.com/google-cloud/advanced-gemini-cli-part-3-isolated-agents-b9dbab70eeff
