
After producing dozens of videos, I’ve learned a ton. From researching the right topic to editing, each video brings unique challenges. Some tasks are not so interesting, though. Every video requires a description with a hook, timestamps, links, and hashtags. Hand-crafting it every single time can be a drag.
Copy-paste templates help, but they can’t adapt to the actual content of the video. They can’t pull timestamps from a transcript or decide which five hashtags have the broadest reach for this specific topic. I wanted something smarter. So I built an Agent Skill.
What exactly is an Agent Skill?
Think of a skill as a package of structured knowledge you hand to an AI coding agent, like the Gemini CLI or Google Antigravity, so it can reliably execute a specific task. Not a one-off prompt you paste into a chat window. A reusable, version-controlled, testable set of instructions that lives in your project like any other piece of code.
Here’s what’s inside a skill folder:
video-description/
├── SKILL.md              # The main instruction file (with YAML frontmatter)
├── references/
│   ├── TEMPLATES.md      # Platform-specific output templates
│   └── EXAMPLES.md       # Few-shot input/output pairs
├── scripts/
│   ├── validate.py       # Automated output validation
│   └── test_validate.py  # Test suite for the validator
└── assets/
    └── evaluations.json  # Regression test cases
The SKILL.md file has YAML frontmatter (a name and description so agents can discover it automatically) and a step-by-step workflow written in plain markdown. The references/ directory holds templates and examples the agent reads for context. The scripts/ directory is where things get interesting, but I’ll get to that.
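Before getting to the scripts, here is roughly what the top of a SKILL.md looks like. The wording below is my own illustrative sketch, not the file's verbatim contents:
---
name: video-description
description: Generate a platform-ready video description (YouTube, LinkedIn, or universal) from a transcript, with timestamps, links, and exactly five hashtags.
---
Everything below the frontmatter is the plain-markdown workflow, which is what the next section walks through.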
The video-description skill, dissected
The video-description skill has six steps:
1. Analyze the transcript and any supplementary docs (blog posts, spec sheets).
2. Determine the target platform: YouTube, LinkedIn, or a universal format.
3. Select a template from TEMPLATES.md based on video type and platform.
4. Draft the description using the chosen template.
5. Add metadata — timestamps, resource links, social calls-to-action.
6. Validate the output by running a script.
Simple enough. But the design decisions baked into those steps are what make it actually useful.
First-person singular, always. This skill supports videos on my personal channel, which supplements official videos like this one I’ve produced for the Google Cloud Tech channel. The skill explicitly instructs the agent to use “I” and “my” instead of “we” or “our.” This is a small thing that ensures a consistent voice.
Exactly five hashtags. Not “a few.” Not “up to ten.” Five. With broad appeal. This constraint forces the agent to be selective instead of spraying generic tags, and it’s enforced programmatically (more on that in a moment).
Platform-aware templates. The TEMPLATES.md file contains three distinct formats: a Professional template (ideal for LinkedIn), an Educational template (YouTube with timestamps), and a Product Showcase template. Each has a different structure. The agent picks the right one based on context.
Here’s what the YouTube template looks like:
# [Title: Catchy and Benefit-Oriented]
[2-sentence hook explaining what I will teach you and why it matters.]
🚀 **Try it yourself**: [Link]
## What's in this video?
- [Key Point 1]
- [Key Point 2]
- [Key Point 3]
## Timestamps
00:00 - Intro
01:30 - [Topic A]
05:45 - [Topic B]
10:20 - Summary & Next Steps
#Tag1 #Tag2 #Tag3 #Tag4 #Tag5
And here’s a real output the skill generated for one of my videos:
# Agent Discoverability and Interoperability Explained
I explore how AI agents are evolving from standalone systems to an
interconnected ecosystem. I'll walk you through the history of web
discovery, the current state of Agent Cards and UCP, and the future
of MCP Server Cards and the Unified AI Catalog.
🚀 **Learn more about Vertex AI Agents**: https://cloud.google.com/vertex-ai/docs/agents
## What's in this video?
- How web discovery patterns like `.well-known` are being adapted for AI agents.
- The role of the A2A protocol and Agent Cards in agent-to-agent communication.
- How the Universal Commerce Protocol (UCP) enables agentic commerce.
- Emerging standards like MCP Server Cards and the Unified AI Catalog.
## Timestamps
00:00 - Intro: The Need for Connection
00:20 - The Past: Web Discovery Standards
00:38 - The Present: A2A Agent Cards
00:50 - UCP: Commerce Manifests
01:10 - The Future: MCP Server Cards
01:50 - AI Catalog and Unified AI Card
02:30 - Conclusion
#AI #Agents #A2A #MCP #UCP
It pulled the timestamps from the transcript, chose the educational YouTube template, synthesized the key points from both the transcript and a companion blog post, and used first-person singular throughout. No corrections needed.
The validation layer: testing AI output like software
This is the part that changed how I think about working with AI agents.
The scripts/validate.py file runs automated checks against the generated description. It’s a standard Python script. Just regex and string operations:
import re

def validate_description(content):
    errors = []
    warnings = []

    # Length: flag if too long (>5000 chars) or too short (<100)
    char_count = len(content)
    if char_count > 5000:
        warnings.append(f"Description is very long ({char_count} chars).")
    elif char_count < 100:
        warnings.append(f"Description is very short ({char_count} chars).")

    # Timestamps: warn if none are found
    if not re.findall(r"\b\d{1,2}:\d{2}\b", content):
        warnings.append("No timestamps found.")

    # Hashtags: EXACTLY 5 required
    hashtags = re.findall(r"#\w+", content)
    if len(hashtags) != 5:
        errors.append(f"Expected exactly 5 hashtags, found {len(hashtags)}.")

    # POV: flag "we/our/us"
    if re.search(r"\b(we|our|ours|us)\b", content, re.IGNORECASE):
        warnings.append("Detected 'we/our/us'. Preference is 'I/my'.")

    return errors, warnings
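The repo presumably wires this function into a small command-line entry point. I haven't copied its exact interface here, but a minimal wrapper in the same spirit (the file argument and exit-code convention are my assumptions) could look like this:
import sys

def main():
    # Read the generated description from a file path passed on the command line
    with open(sys.argv[1], encoding="utf-8") as f:
        content = f.read()

    errors, warnings = validate_description(content)
    for warning in warnings:
        print(f"WARNING: {warning}")
    for error in errors:
        print(f"ERROR: {error}")

    # Non-zero exit status signals a hard failure (e.g. wrong hashtag count)
    sys.exit(1 if errors else 0)

if __name__ == "__main__":
    main()
A non-zero exit code on errors makes the check easy to chain in CI, and lets the agent detect a failed run and retry.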
It goes further. The test_validate.py file contains a full pytest suite that uses mock LLM responses paired with evaluation cases from evaluations.json:
[
  {
    "query": "Create a video description for a technical deep dive about AI agents.",
    "expected_behavior": [
      "Uses first-person singular 'I' and 'my'",
      "Includes exactly 5 hashtags",
      "Passes validation by scripts/validate.py"
    ]
  }
]
Each test case generates a mock output, runs it through validate_description() for structural checks, then runs a separate check_semantic_expectations() function for heuristic checks — does the output actually contain first-person pronouns? Does the hashtag count match what the behavior spec demands?
This is the pattern that matters: treat agent output as a testable artifact. You can write unit tests for your AI workflows the same way you write unit tests for your functions. Structural assertions catch formatting violations. Semantic assertions catch content drift.
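To make that concrete, here is a trimmed-down sketch of what such a test can look like. It is not the repo's actual test code: the import path, the mock output, and this simplified check_semantic_expectations are stand-ins for illustration.
import json
import re

from validate import validate_description  # assumes the test runs alongside scripts/validate.py

# A mock LLM response standing in for real agent output
MOCK_OUTPUT = """# Agent Skills in Two Minutes
I walk through how I package my workflow as a skill and show my setup.
00:00 - Intro
01:15 - The SKILL.md file
#AI #Agents #GeminiCLI #DevRel #Automation
"""

def check_semantic_expectations(content, expected_behavior):
    """Simplified heuristics; a stand-in for the skill's real helper."""
    failures = []
    for behavior in expected_behavior:
        if "first-person" in behavior and not re.search(r"\b(I|my)\b", content):
            failures.append(behavior)
        if "exactly 5 hashtags" in behavior and len(re.findall(r"#\w+", content)) != 5:
            failures.append(behavior)
    return failures

def test_evaluation_cases():
    with open("assets/evaluations.json", encoding="utf-8") as f:
        cases = json.load(f)
    for case in cases:
        # Structural checks: formatting rules enforced by the validator
        errors, _warnings = validate_description(MOCK_OUTPUT)
        assert errors == []
        # Heuristic checks: content matches the expected_behavior spec
        assert check_semantic_expectations(MOCK_OUTPUT, case["expected_behavior"]) == []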
Getting started in 30 seconds
Installing an agent skill takes one command. Clone the devrel-demos repo and symlink the skill you want:
# For Gemini CLI (global)
mkdir -p ~/.gemini/skills
ln -s ~/devrel-demos/agents/skills/video-description ~/.gemini/skills/video-description
# For a specific project
mkdir -p .gemini/skills
cp -r ~/devrel-demos/agents/skills/video-description .gemini/skills/
# For Antigravity (IDE)
# Copy to ~/.gemini/antigravity/skills/video-description
Then just ask: “Create a YouTube video description from this transcript.” The agent reads the SKILL.md, follows the workflow, applies the templates, and validates its own output.
Beyond YouTube descriptions
The video-description skill is a small, self-contained example. The pattern it demonstrates is bigger.
Agent skills give you a portable packaging format for AI behavior. They’re just folders of markdown and scripts — no proprietary format, no vendor lock-in. They work with any agent that reads SKILL.md files. This platform-agnostic approach is essential as emerging standards like Agent to Agent (A2A), the Universal Commerce Protocol (UCP), and the Model Context Protocol (MCP) define how agents discover and interact with each other. You can version them in Git, review them in PRs, and share them across teams.
What makes a good skill? Three things:
- A clear workflow. Step-by-step instructions the agent can follow sequentially. Ambiguity is the enemy.
- Few-shot examples. Concrete input/output pairs that anchor the agent’s understanding of “good” output. The difference between a mediocre skill and a great one often comes down to the quality of the examples in EXAMPLES.md.
- Machine-verifiable output. A validation script that can programmatically check whether the output meets your standards. This is the piece most people skip — and it’s the piece that makes skills reliable instead of just convenient.
To learn more, Ankur’s blog post provides step-by-step instructions with screenshots. Then, you can look at the devrel-demos skill collection for examples, including this one.
If you’ve got a repetitive content task, it’s worth encoding that pattern as a skill. Your future self (and your team) will thank you. Share the skill you’ve built with me on LinkedIn, X, or Bluesky!
