We built ClarityCam with the Google GenAI SDK and Firebase Genkit, which together gave the agent both structure and adaptability.
- The Google GenAI SDK is a developer toolkit that makes it easy to use Gemini models in your app, with structured prompts, schema-based input/output, and multimodal support (e.g. text + images).
- Firebase Genkit is a framework that connects your GenAI logic to your backend code, giving you routing, logging, secret management, and developer tooling out of the box.
Below are eight repeatable GenAI patterns, complete with code; you can also work through the accompanying codelab directly.
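Every snippet below assumes a Genkit instance (`ai`) configured with the Google AI plugin, plus Zod (`z`) for schemas. A minimal setup looks roughly like this; the model name is illustrative, not something the article specifies:

    // Minimal Genkit setup assumed by every snippet below.
    // The model name is an assumption; swap in whichever Gemini model you use.
    import { genkit, z } from 'genkit';
    import { googleAI } from '@genkit-ai/googleai';

    const ai = genkit({
      plugins: [googleAI()],                     // reads GEMINI_API_KEY from the environment
      model: googleAI.model('gemini-2.5-flash'), // default model for prompts and ai.generate
    });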
1. Classify User Intent
Turn free-form voice input into one of a fixed set of structured intents:
const classifyIntentPrompt = ai.definePrompt({
  name: 'classifyIntentPrompt',
  input: { schema: z.object({ userQuery: z.string() }) },
  output: { schema: z.object({
    intent: z.enum([
      "DescribeImage", "TakePicture", "ReadTextInImage",
      "SetDescriptionDetailed", "SetDescriptionConcise",
      "OutOfScopeRequest", "Unknown" // OutOfScopeRequest backs the fallback check in pattern 5
    ])
  }) },
  prompt: `Classify the user's intent: '{{userQuery}}'. Output one intent.`,
  config: { temperature: 0.05 } // near-deterministic output for classification
});
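A caller can then route on the structured result. This is a hypothetical sketch; the handler names are assumptions, not from the article:

    // Hypothetical routing on the classifier's structured output.
    const { output } = await classifyIntentPrompt({ userQuery: transcribedSpeech });

    switch (output?.intent) {
      case 'DescribeImage': await describeCurrentImage(); break; // assumed handler
      case 'TakePicture':   await capturePhoto(); break;         // assumed handler
      default:              speakText("Sorry, I didn't catch that.");
    }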
2. Handle Imperfect Voice Input
Gemini helps clean up noisy or vague commands:
const prompt = ai.definePrompt({
  name: 'checkTypoPrompt',
  input: { schema: CheckTypoInputSchema },
  output: { schema: CheckTypoOutputSchema },
  prompt: `
You are a helpful assistant correcting typos. If unsure, return the original.
User: {{{text}}}
Corrected:`
});
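The two schemas referenced above aren't shown in the article; a plausible shape, offered as an assumption, is:

    // Assumed shapes for the schemas referenced above.
    const CheckTypoInputSchema = z.object({
      text: z.string().describe('Raw transcribed voice input'),
    });
    const CheckTypoOutputSchema = z.object({
      correctedText: z.string().describe('Cleaned-up command, or the original text if unsure'),
    });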
3. Maintain Context & Memory
Track recent images and responses across turns:
const describeImageInput = {
  photoDataUri,
  previousAIResponse: "A red car next to a tree",
  previousUserQueryOnImage: "What's in this image?"
};
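One way to consume this context is to interpolate the previous turn into a follow-up prompt so pronouns resolve against the right image. This is a sketch; the prompt name and wording are assumptions:

    // Sketch: feed the prior turn back in so follow-ups like
    // "what color is it?" resolve against the same image.
    const followUpPrompt = ai.definePrompt({
      name: 'followUpPrompt', // hypothetical
      input: { schema: z.object({
        photoDataUri: z.string(),
        previousAIResponse: z.string(),
        previousUserQueryOnImage: z.string(),
        question: z.string(),
      }) },
      output: { schema: z.object({ answer: z.string() }) },
      prompt: `You previously answered "{{previousUserQueryOnImage}}" about this image with:
    "{{previousAIResponse}}"
    Now answer the follow-up question: {{{question}}}
    {{media url=photoDataUri}}`,
    });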
4. Use Effective Prompt Engineering
Context-aware, task-specific prompting boosts reliability:
const response = await ai.generate({
  prompt: [
    { media: { url: photoDataUri } }, // the image as a data URI
    { text: 'Provide a concise and accurate description of this image.' }
  ]
});
const description = response.text; // generated text, ready to display or speak
5. Handle Uncertainty Gracefully
Avoid hallucinations with fallback intent types:
if (intent === "OutOfScopeRequest" || intent === "Unknown") {
  speakText("Sorry, I can describe images, read text, or identify colors.");
}
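For the fallback to fire, the classifier has to be told it may refuse. One way, with wording that is an assumption rather than the article's, is to spell out the escape hatch in the classification prompt:

    // Sketch: an explicit escape hatch in the classifier prompt from pattern 1.
    const guardedClassifyPrompt = `Classify the user's intent: '{{userQuery}}'.
    If the request is unrelated to images (e.g. "play some music"),
    output "OutOfScopeRequest". If you cannot tell, output "Unknown".
    Never guess a supported intent.`;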
6. Proactively Onboard New Users
Many users, especially those using accessibility tools, may not be familiar with how to interact with an AI agent.
Imagine a user who is blind or has low vision: they might not even be aware that ClarityCam has been activated. A confusing first experience quickly leads to frustration and, ultimately, abandonment. A short spoken introduction on launch fixes this:
const introMessage = "Hi! I’m ClarityCam AI. You can say 'take a picture' or 'describe this image.'";
speakText(introMessage);
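The speakText helper used throughout isn't shown in the article; in a browser it could be a thin wrapper over the Web Speech API, sketched here as an assumption:

    // Assumed implementation of speakText using the browser's Web Speech API.
    function speakText(text: string): void {
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.rate = 1.0; // normal speaking rate; could follow a user preference
      window.speechSynthesis.speak(utterance);
    }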
7. Adapt to User Preferences
Make it simple for users to set their preferences: the agent's responses can be switched between detailed and concise to suit individual needs. This level of control gives users a smoother, more personalized experience right from the start.
const prompt = ai.definePrompt({
  name: 'describeImagePrompt',
  input: { schema: DescribeImagePromptInputSchema },
  output: { schema: DescribeImageOutputSchema },
  prompt: `You are an AI assistant helping a visually impaired user understand an image.
{{#if isDetailed}}
Provide a very detailed and comprehensive description of the image.
{{else}}
Provide a concise description of the main subjects.
{{/if}}
{{#if question}}
Also answer the user's question: {{{question}}}
{{/if}}
Here is the image:
{{media url=photoDataUri}}`
});
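Tying this back to pattern 1: the SetDescriptionDetailed and SetDescriptionConcise intents can flip a stored flag that is passed on every call. A sketch, where the preference variable is an assumption:

    // Hypothetical preference state toggled by the intents from pattern 1.
    let userPrefersDetailed = false;

    if (intent === 'SetDescriptionDetailed') userPrefersDetailed = true;
    if (intent === 'SetDescriptionConcise') userPrefersDetailed = false;

    const { output } = await prompt({
      photoDataUri,
      isDetailed: userPrefersDetailed, // drives the {{#if isDetailed}} branch above
    });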
8. Support Both Voice and Text Outputs
To ensure full accessibility, ClarityCam provides responses through both spoken audio and on-screen text.
setAnalysisResult(description); // Displayed
speakText(description); // Spoken
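In a React-style UI, a single helper can keep the two channels in sync. A sketch, assuming setAnalysisResult is a state setter from the app's UI:

    // Sketch: one helper keeps the visual and spoken channels in sync.
    // setAnalysisResult is assumed to be a React state setter.
    function announce(description: string): void {
      setAnalysisResult(description); // render on screen for sighted / low-vision users
      speakText(description);         // speak aloud for eyes-free use
    }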
Source Credit: https://medium.com/google-cloud/build-accessibility-first-multimodal-agents-with-google-genai-f3957b50f029