We built ClarityCam with the Google GenAI SDK and Firebase Genkit, which together gave the agent both structure and adaptability.
- The Google GenAI SDK is a developer toolkit that makes it easy to use Gemini models in your app, with structured prompts, schema-based input/output, and multimodal support (e.g. text + images).
- Firebase Genkit is a framework that connects your GenAI logic to your backend code, giving you routing, logging, secret management, and developer tooling out of the box.
Below are eight repeatable GenAI patterns, complete with code; you can also work through the accompanying codelab directly.
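Every snippet below assumes a Genkit instance (`ai`) configured with the Google AI plugin, plus Zod (`z`) for schemas. A minimal setup looks roughly like this; the model name is illustrative, not something the article specifies:

    // Minimal Genkit setup assumed by every snippet below.
    // The model name is an assumption; swap in whichever Gemini model you use.
    import { genkit, z } from 'genkit';
    import { googleAI } from '@genkit-ai/googleai';

    const ai = genkit({
      plugins: [googleAI()],                     // reads GEMINI_API_KEY from the environment
      model: googleAI.model('gemini-2.5-flash'), // default model for prompts and ai.generate
    });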
1. Classify User Intent
Turn free-form voice input into one of a fixed set of structured intents:
const classifyIntentPrompt = ai.definePrompt({
  name: 'classifyIntentPrompt',
  input: { schema: z.object({ userQuery: z.string() }) },
  output: { schema: z.object({
    intent: z.enum([
      "DescribeImage", "TakePicture", "ReadTextInImage",
      "SetDescriptionDetailed", "SetDescriptionConcise",
      "OutOfScopeRequest", "Unknown" // OutOfScopeRequest backs the fallback check in pattern 5
    ])
  }) },
  prompt: `Classify the user's intent: '{{userQuery}}'. Output one intent.`,
  config: { temperature: 0.05 } // near-deterministic output for classification
});
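A caller can then route on the structured result. This is a hypothetical sketch; the handler names are assumptions, not from the article:

    // Hypothetical routing on the classifier's structured output.
    const { output } = await classifyIntentPrompt({ userQuery: transcribedSpeech });

    switch (output?.intent) {
      case 'DescribeImage': await describeCurrentImage(); break; // assumed handler
      case 'TakePicture':   await capturePhoto(); break;         // assumed handler
      default:              speakText("Sorry, I didn't catch that.");
    }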
2. Handle Imperfect Voice Input
Gemini helps clean up noisy or vague commands:
const prompt = ai.definePrompt({
  name: 'checkTypoPrompt',
  input: { schema: CheckTypoInputSchema },
  output: { schema: CheckTypoOutputSchema },
  prompt: `
You are a helpful assistant correcting typos. If unsure, return the original.
User: {{{text}}}
Corrected:`
});
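The two schemas referenced above aren't shown in the article; a plausible shape, offered as an assumption, is:

    // Assumed shapes for the schemas referenced above.
    const CheckTypoInputSchema = z.object({
      text: z.string().describe('Raw transcribed voice input'),
    });
    const CheckTypoOutputSchema = z.object({
      correctedText: z.string().describe('Cleaned-up command, or the original text if unsure'),
    });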
3. Maintain Context & Memory
Track recent images and responses across turns:
const describeImageInput = {
  photoDataUri,
  previousAIResponse: "A red car next to a tree",
  previousUserQueryOnImage: "What's in this image?"
};
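One way to consume this context is to interpolate the previous turn into a follow-up prompt so pronouns resolve against the right image. This is a sketch; the prompt name and wording are assumptions:

    // Sketch: feed the prior turn back in so follow-ups like
    // "what color is it?" resolve against the same image.
    const followUpPrompt = ai.definePrompt({
      name: 'followUpPrompt', // hypothetical
      input: { schema: z.object({
        photoDataUri: z.string(),
        previousAIResponse: z.string(),
        previousUserQueryOnImage: z.string(),
        question: z.string(),
      }) },
      output: { schema: z.object({ answer: z.string() }) },
      prompt: `You previously answered "{{previousUserQueryOnImage}}" about this image with:
    "{{previousAIResponse}}"
    Now answer the follow-up question: {{{question}}}
    {{media url=photoDataUri}}`,
    });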
4. Use Effective Prompt Engineering
Context-aware, task-specific prompting boosts reliability:
const response = await ai.generate({
  prompt: [
    { media: { url: photoDataUri } }, // the image as a data URI
    { text: 'Provide a concise and accurate description of this image.' }
  ]
});
const description = response.text; // generated text, ready to display or speak
5. Handle Uncertainty Gracefully
Avoid hallucinations with fallback intent types:
if (intent === "OutOfScopeRequest" || intent === "Unknown") {
  speakText("Sorry, I can describe images, read text, or identify colors.");
}
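For the fallback to fire, the classifier has to be told it may refuse. One way, with wording that is an assumption rather than the article's, is to spell out the escape hatch in the classification prompt:

    // Sketch: an explicit escape hatch in the classifier prompt from pattern 1.
    const guardedClassifyPrompt = `Classify the user's intent: '{{userQuery}}'.
    If the request is unrelated to images (e.g. "play some music"),
    output "OutOfScopeRequest". If you cannot tell, output "Unknown".
    Never guess a supported intent.`;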
6. Proactively Onboard New Users
Many users, especially those using accessibility tools, may not be familiar with how to interact with an AI agent.
Imagine a user who is blind or has low vision: they might not even be aware that ClarityCam has been activated. A confusing first experience quickly leads to frustration and, ultimately, abandonment. A short spoken introduction on launch fixes this:
const introMessage = "Hi! I’m ClarityCam AI. You can say 'take a picture' or 'describe this image.'";
speakText(introMessage);
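The speakText helper used throughout isn't shown in the article; in a browser it could be a thin wrapper over the Web Speech API, sketched here as an assumption:

    // Assumed implementation of speakText using the browser's Web Speech API.
    function speakText(text: string): void {
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.rate = 1.0; // normal speaking rate; could follow a user preference
      window.speechSynthesis.speak(utterance);
    }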
7. Adapt to User Preferences
Make it simple for users to set their preferences: the agent's responses can be switched between detailed and concise to suit individual needs. This level of control gives users a smoother, more personalized experience right from the start.
const prompt = ai.definePrompt({
  name: 'describeImagePrompt',
  input: { schema: DescribeImagePromptInputSchema },
  output: { schema: DescribeImageOutputSchema },
  prompt: `You are an AI assistant helping a visually impaired user understand an image.
{{#if isDetailed}}
Provide a very detailed and comprehensive description of the image.
{{else}}
Provide a concise description of the main subjects.
{{/if}}
{{#if question}}
Also answer the user's question: {{{question}}}
{{/if}}
Here is the image:
{{media url=photoDataUri}}`
});
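Tying this back to pattern 1: the SetDescriptionDetailed and SetDescriptionConcise intents can flip a stored flag that is passed on every call. A sketch, where the preference variable is an assumption:

    // Hypothetical preference state toggled by the intents from pattern 1.
    let userPrefersDetailed = false;

    if (intent === 'SetDescriptionDetailed') userPrefersDetailed = true;
    if (intent === 'SetDescriptionConcise') userPrefersDetailed = false;

    const { output } = await prompt({
      photoDataUri,
      isDetailed: userPrefersDetailed, // drives the {{#if isDetailed}} branch above
    });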
8. Support Both Voice and Text Outputs
To ensure full accessibility, ClarityCam provides responses through both spoken audio and on-screen text.
setAnalysisResult(description); // Displayed
speakText(description); // Spoken
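In a React-style UI, a single helper can keep the two channels in sync. A sketch, assuming setAnalysisResult is a state setter from the app's UI:

    // Sketch: one helper keeps the visual and spoken channels in sync.
    // setAnalysisResult is assumed to be a React state setter.
    function announce(description: string): void {
      setAnalysisResult(description); // render on screen for sighted / low-vision users
      speakText(description);         // speak aloud for eyes-free use
    }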
Source Credit: https://medium.com/google-cloud/build-accessibility-first-multimodal-agents-with-google-genai-f3957b50f029