

An agent based content production studio? What the heck is that? What are AI agents?
Hol’ up! Let me explain. Commercials are expensive. What if they were cheap and easy? What if you had an AI that created any commercial you dreamed up?
Well, you can, with agents!
Agents are all the rage lately. The hype train says they will replace humans for all tasks large and small, making SWEs drop to their knees and cry.
Tools are external functions or APIs that an LLM can call to perform specific actions. Think of them as Python functions available to the model, like a search engine query or a database lookup.
So an agent is an LLM that uses tools, checks what happened, and keeps going until it gets the job done.
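To make that concrete, a tool can be nothing more than a plain Python function. Here's a hypothetical sketch (the function and its tiny "database" are made up for illustration) of the kind of function an agent could call:

```python
# A hypothetical tool: a plain Python function the LLM can call.
# The name, type hints, and docstring tell the model what it does.
def lookup_population(city: str) -> dict:
    """Return the population of a city from a small local database."""
    data = {"tokyo": 37_400_000, "lagos": 15_400_000}
    population = data.get(city.lower())
    if population is None:
        return {"status": "error", "message": f"Unknown city: {city}"}
    return {"status": "success", "population": population}
```

The agent calls the function, inspects the returned dict, and decides what to do next based on the status.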
So how do we use this in the real world? Well, there are several ways, but today we are going to focus on the Google ADK.
The Android Development Kit! WUT??… no no no, not that ADK 😉
The Agent Development Kit is a new open source tool by Google that provides a framework to build and deploy your agents in GCP, or anywhere really.
Today, we’ll use Gemini’s APIs to do our dirty work. No cloud infra to manage and integrate. Sorry for the tease devops engineers, maybe next time!
Now, what does our content studio do? WE MAKE COMMERCIALS!!!!!
Sadly our studio has only one employee, our stalwart video producer, wrangler of audio and video streams, powered by Gemini 2.0 Flash.
Let’s try it out. We are going to use Python, and that means a venv. Create the virtual environment, clone the repo, pip install the requirements, authenticate, then start the web interface.
python -m venv .adkplayground
source .adkplayground/bin/activate
git clone https://github.com/byronwhitlock-google/adk-playground.git
cd adk-playground
pip install -r requirements.txt
gcloud auth application-default login
adk web
Go to http://localhost:8000 and choose the video_producer_agent
Try out a prompt. You can watch the agent make decisions and execute the plan.
Voila, a light-hearted commercial about a SaaS candy bar.
We haven’t seen the code! List the cloned directory:
ls -l ~/adk-playground
Take a look around: the video_producer_agent directory has all the code and prompts to make the magic happen.
Take a look at agent.py. This is where the root agent is defined. Think of the root agent as the entrypoint. Also notice there is a tools array. This is super important: it contains the functions this agent is allowed to call.
root_agent = Agent(
    name="video_producer_agent",
    model="gemini-2.0-flash",  # Make sure this is the correct model identifier
    instruction=prompt,
    tools=[
        # AgentTool(agent=video_generation_agent),
        gcs_uri_to_public_url,
        video_join_tool,
        video_generation_tool,
        text_to_speech,
        mux_audio,
        get_linear16_audio_duration_gcs,
    ],
)
Each of the tools is a Python function, but you should observe some differences. Don’t throw exceptions if you want the LLM to react. Returning errors rather than raising them lets the agent see the error message and try to work around it by calling the tool again with different parameters.
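Here's a minimal sketch of that error-return pattern. The validation step and output URI are made up for illustration; the real join tool calls the Transcoder API:

```python
def join_videos(gcs_uris: list[str]) -> dict:
    """Join a list of video clips into one (sketch of the error-return pattern)."""
    try:
        if not gcs_uris:
            raise ValueError("gcs_uris must contain at least one URI")
        for uri in gcs_uris:
            if not uri.startswith("gs://"):
                raise ValueError(f"Not a GCS URI: {uri!r}")
        # ... the real tool would call the Transcoder API here ...
        return {"status": "success", "output_uri": "gs://your-bucket/joined.mp4"}
    except ValueError as exc:
        # Returned, not raised: the agent reads the message and can retry
        # the call with corrected parameters.
        return {"status": "error", "message": str(exc)}
```

If the agent passes a local path by mistake, it gets back a readable error instead of a crashed run, and it can reformulate the call on its own.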
You should also notice a few *test.py files. These are super quick and dirty test scripts to make sure the functions work. The LLM is slow, so it is faster and easier to test the functions directly.
When you pass the tools to the agent, it uses the Python docstring to figure out how to call each tool. It is very important to write a good description for every function name and parameter.
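Here's a hypothetical tool (the function and its parameters are invented for illustration, not from the repo) showing the level of docstring detail the model needs, and how that text is machine-readable:

```python
import inspect

def trim_silence(audio_uri: str, threshold_db: float = -40.0) -> dict:
    """Remove leading and trailing silence from an audio clip.

    Args:
        audio_uri: GCS URI of the input LINEAR16 audio file.
        threshold_db: Audio quieter than this level (in dBFS) counts as silence.

    Returns:
        A dict with a status field and the GCS URI of the trimmed clip.
    """
    ...

# The framework reads this text to build the tool description the model sees.
schema_text = inspect.getdoc(trim_silence)
```

If a parameter description is vague, the model will guess at values, so treat docstrings here as part of your API contract with the LLM.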
The agent also takes a prompt that tells it how to do its job. Think of this as the “system prompt” for the agent.
For our content generation agent in particular we need to tell it a few things in order to effectively use the tools.
- How to break down a commercial into scenes
- How to prompt veo2 to get good results
- How to handle audio and video streams of different lengths
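The actual prompt ships with the repo; a hypothetical, heavily abridged version covering those three points might look like:

```python
# Abridged, hypothetical sketch of the producer's instruction prompt;
# the real one lives in the video_producer_agent directory.
prompt = """You are a video producer making short commercials.
1. Break the requested commercial into a few scenes.
2. Write a detailed, visual Veo 2 prompt for each scene.
3. Generate narration with text-to-speech, check the audio duration,
   and mux audio onto video, trimming or padding the shorter stream.
"""
```

This string is what gets passed as `instruction=prompt` when the root agent is defined.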
video_join_tool
This tool accepts a list of videos and joins them together using the Transcoder API. It returns the GCS URI of the joined video.
video_generation_tool
This tool takes a prompt and generates a Veo 2 video clip based on it. It returns a GCS URI of the video clip. Veo does not generate audio.
text_to_speech
This tool calls the GCP Text-to-Speech API. It abstracts voices into four categories, allowing the producer to specify different voices, and it also handles speech rate. It returns a GCS URI of the LINEAR16 audio clip.
mux_audio
This tool takes an audio URI and a video URI and muxes the audio and video together. It also takes an end offset to handle mismatched audio/video lengths. The algorithm for managing the lengths is stored as a prompt in the video producer.
get_linear16_audio_duration_gcs
This tool gets the duration of the audio stream to assist in muxing.
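The duration math for raw LINEAR16 (16-bit PCM) audio is simple. Here's a sketch of just the arithmetic, assuming mono audio at 24 kHz (a common TTS output rate; the real tool also fetches the file from GCS):

```python
def linear16_duration_seconds(payload_bytes: int,
                              sample_rate_hz: int = 24_000,
                              channels: int = 1) -> float:
    """Duration of raw LINEAR16 audio. 16-bit PCM is 2 bytes per sample,
    so duration = bytes / (sample_rate * 2 * channels)."""
    return payload_bytes / (sample_rate_hz * 2 * channels)

# One second of mono 24 kHz LINEAR16 is 48,000 bytes of payload.
```

The producer uses this duration to pick the end offset for mux_audio when the narration runs shorter or longer than the video.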
As a seasoned developer, what struck me is the elegant simplicity of the ecosystem and the ability to get going so quickly.
The ADK docs are excellent https://google.github.io/adk-docs/.
Now you have the tools to go forth and build agents to do all of the things!
Bye!
P.S. Here are some more fun videos. Have a gander!
Prompt: A commercial about the Dogwood Meditation studio. Specializing in Deep breathing, no talk therapy, and holistic education. There are lots of cushions, it is very comfortable. the hardest thing is to do it and dogwood makes it easy and fun.
Video Link
Prompt: Google Cloud Professional Service Commercial. Over the top Infomercial style. The problem is failed migrations, slow time to market, and overall friction for developers and teams moving to cloud. The solution is google PSO. explain how they do it infomercial style chaos to order because of product. End with ‘Side effects include: more high-fives, less coffee, and developers who actually like Mondays. Terms and conditions apply, Bitcoin not accepted. Double check scenes to ensure good logical flow between them while telling the story.
Video Link
Source Credit: https://medium.com/google-cloud/did-your-next-viral-commercial-just-get-built-by-googles-adk-ai-seriously-it-did-620818d54237?source=rss—-e52cf94d98af—4