
The world of generative AI is moving fast, with models like Lyria, Imagen, and Veo now capable of producing stunningly realistic and imaginative music, images, and videos from simple text prompts. However, evaluating these models is still a steep challenge. Traditional human evaluation, while the gold standard, can be slow and costly, hindering rapid development cycles.
To address this, we’re thrilled to introduce Gecko, now available through Google Cloud’s Vertex AI Evaluation Service. Gecko is a rubric-based, interpretable auto-rater for evaluating generative AI models. It gives developers a more nuanced, customizable, and transparent way to assess the performance of image and video generation models.
The challenge of evaluating generative models with auto-raters
Creating useful, performant auto-raters becomes harder as generation quality improves dramatically. While specialized models can be efficient, they lack the interpretability developers need to understand model behavior and pinpoint areas for improvement. For instance, when evaluating how accurately a generated image depicts a prompt, a single score doesn’t reveal why a model succeeded or failed.
Introducing Gecko: Interpretable, customizable, and performant evaluation
Gecko offers a fine-grained, interpretable, and customizable auto-rater. A Google DeepMind research paper shows that such an auto-rater can reliably evaluate image and video generation across a range of skills, reducing dependence on costly human judgment. Beyond its interpretability, Gecko exhibits strong performance and has already been instrumental in benchmarking the progress of leading models like Imagen.
Gecko makes evaluation interpretable with its clear, step-by-step rubric-based approach. Let’s take an example and use Gecko to evaluate generated media depicting a cup of coffee and a croissant on a table.
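To make the contrast with a single opaque score concrete, here is a minimal sketch of how per-question rubric verdicts could be rolled up into a score while keeping the reasons visible. This is not Gecko's implementation or the Vertex AI API. The names `RubricItem` and `summarize`, the questions, and the verdicts are all hypothetical, chosen only to illustrate the idea for the coffee-and-croissant example.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    question: str  # yes/no question derived from the prompt (hypothetical)
    passed: bool   # verdict assigned by a rater, human or model


def summarize(items):
    """Return (score, failed_questions) for a list of rubric verdicts."""
    if not items:
        raise ValueError("rubric must contain at least one item")
    # Score is the fraction of rubric questions answered "yes".
    score = sum(item.passed for item in items) / len(items)
    # Keeping the failed questions is what makes the result interpretable.
    failed = [item.question for item in items if not item.passed]
    return score, failed


# Made-up verdicts for an image of a cup of coffee and a croissant on a table.
rubric = [
    RubricItem("Is there a cup of coffee?", True),
    RubricItem("Is there a croissant?", True),
    RubricItem("Is there a table?", True),
    RubricItem("Are the coffee and the croissant on the table?", False),
]

score, failed = summarize(rubric)
print(f"score = {score:.2f}")
for question in failed:
    print("failed:", question)
```

In this sketch the aggregate score alone would only say that the image is mostly right, while the list of failed questions points to the specific relationship the image got wrong.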
Source Credit: https://cloud.google.com/blog/products/ai-machine-learning/evaluate-your-gen-media-models-on-vertex-ai/