

Earlier this month, the Gemini 2.5 Computer Use model was announced. This model specializes in interacting with graphical user interfaces (UIs). That is useful when there is no structured API for the model to call (via function calling). Instead, you can use the Computer Use model to operate the user interface itself, for example by filling in and submitting forms.
It’s important to note that the model does not interact with the UI directly. As input, the model receives the user request, a screenshot of the environment, and a history of recent actions. As output, it generates a function call representing a UI action such as clicking or typing (see the full list of supported UI actions). The client-side code is responsible for executing the received action, and the process then repeats in a loop until the task is done.
You can check out the agent in the computer-use-preview repo to learn how to implement such a looping agent using Playwright or Browserbase.
This all sounds great, but I was curious how robust the Computer Use model is, so I decided to test it with a concrete use case of my own.
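To make the loop described above concrete, here is a minimal sketch in Python. It is not the code from the computer-use-preview repo: `run_agent`, `ModelTurn`, `Action`, and the scripted stand-in for the model are all hypothetical names introduced for illustration, and no real Gemini or Playwright call is made. Only the overall shape (screenshot in, function call out, client executes, repeat) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """A UI action requested by the model (hypothetical container)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ModelTurn:
    """One model response: reasoning plus either an action or a final answer."""
    reasoning: str
    action: Optional[Action]  # None means the model is done
    final_answer: str = ""


def run_agent(
    query: str,
    ask_model: Callable[[str, bytes, List[Action]], ModelTurn],
    execute: Callable[[Action], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 50,
) -> str:
    """Drive the screenshot -> function call -> execute loop."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                # current state of the UI
        turn = ask_model(query, screenshot, history)  # model proposes next step
        if turn.action is None:                       # no action: task finished
            return turn.final_answer
        execute(turn.action)                          # client performs the action
        history.append(turn.action)                   # remembered for next turn
    raise RuntimeError("Step budget exhausted before the model finished")


# A scripted stand-in for the model, so the sketch runs without any API key.
def scripted_model(query: str, screenshot: bytes, history: List[Action]) -> ModelTurn:
    script = [
        Action("open_web_browser"),
        Action("drag_and_drop", {"x": 530, "y": 521,
                                 "destination_x": 641, "destination_y": 521}),
    ]
    if len(history) < len(script):
        return ModelTurn("next scripted step", script[len(history)])
    return ModelTurn("done", None, f"finished after {len(history)} actions")


executed: List[Action] = []
answer = run_agent("find flights", scripted_model, executed.append,
                   take_screenshot=lambda: b"fake-screenshot")
print(answer)
```

In a real agent, `ask_model` would call the Computer Use model and `execute` would translate each function call into browser operations; the `max_steps` cap is a design choice of this sketch to keep a confused agent from looping forever.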
Searching Flights
As a Developer Advocate, I travel a lot for work and end up spending quite a bit of time searching for and booking flights. I have also developed certain preferences for the type of flights I want. Can I automate this with the Computer Use model?
First, I came up with a prompt that captures my flight preferences and ran it with the reference agent provided in the computer-use-preview repo:
python main.py --query "Find me top 5 flights sorted by price with the following constraints:
Flight site to use: www.google.com/travel/flights
From: London
To: Larnaca
One-way or roundtrip: One way
Date to leave: Sometime next week
Date to return: N/A
Travel preferences:
-Direct flights
-No flights before 10am
-Carry-on luggage"
As you can see, I’m looking for direct flights from London to Larnaca sometime next week, ideally with no early departures and with carry-on luggage. This is quite a specific request. Can the Computer Use model handle it?
The first thing I saw was the model requesting to open a browser:
Started local playwright.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Gemini Computer Use Reasoning ┃ Function Call(s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ │ Name: open_web_browser │
└───────────────────────────────────────┴──────────────────────────────────────┘
Under the covers, the agent uses Playwright to open an incognito browser and drives it with the function calls from the model.
The model first asked me whether I wanted to accept the cookie consent form. It’s nice that safety is built in, with a human in the loop:
Safety service requires explicit confirmation!
I see a cookie consent dialog on the Google page. Should I accept all cookies to proceed?
Do you wish to proceed? [Yes]/[No]
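A confirmation gate like the one above could be sketched as follows. This is an illustration, not the reference agent's code: `confirm_action` is a hypothetical helper, and the `decision`, `require_confirmation`, and `explanation` field names are assumptions about how the safety decision reaches the client, so check them against the official documentation before relying on them.

```python
from typing import Callable, Optional


def confirm_action(safety_decision: Optional[dict],
                   ask: Callable[[str], str] = input) -> bool:
    """Return True if the pending action may be executed.

    `safety_decision` is assumed to be a dict attached to the function call;
    the field names used here are assumptions for illustration only.
    """
    if not safety_decision:
        return True  # nothing flagged, no confirmation needed
    if safety_decision.get("decision") != "require_confirmation":
        return True  # flagged, but not in a way that needs the user
    print("Safety service requires explicit confirmation!")
    print(safety_decision.get("explanation", ""))
    reply = ask("Do you wish to proceed? [Yes]/[No] ")
    return reply.strip().lower() in ("y", "yes")


# Demo with a scripted reply instead of real keyboard input.
decision = {
    "decision": "require_confirmation",
    "explanation": "I see a cookie consent dialog on the Google page. "
                   "Should I accept all cookies to proceed?",
}
approved = confirm_action(decision, ask=lambda prompt: "Yes")
print("approved:", approved)
```

Passing `ask` as a parameter keeps the human in the loop by default (it falls back to `input`) while still letting the gate be exercised in tests.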
Once I said yes, the model continued opening Google Flights and clicking through the UI to find me the right flights. At each step, it provided a function call and its reasoning:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Gemini Computer Use Reasoning ┃ Function Call(s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ I have evaluated step 24. I have the │ Name: drag_and_drop │
│ "Times" filter open. I need to set │ Args: │
│ the "Departure" time to start from 10 │ destination_x: 641 │
│ am. I will drag the left slider for │ y: 521 │
│ "Departure" to the right until it │ destination_y: 521 │
│ shows 10:00 AM or later. It looks │ x: 530 │
│ like it moves in increments, I'll aim │ │
│ for 10 am. The slider is currently │ │
│ from x=530 to x=825. I need to move │ │
│ the left dot (x=530) to about │ │
│ x=620-630 to get near 10 am. │ │
└───────────────────────────────────────┴──────────────────────────────────────┘
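To show what the client might do with a `drag_and_drop` call like this one, here is a hedged sketch. It assumes the model reports coordinates on a normalized 0-999 grid that the client scales to real pixels (my understanding of the documentation, but treat it as an assumption), and it assumes a 1440x900 viewport. `drag_and_drop`, `denormalize`, and `RecordingMouse` are hypothetical names; the recorder stands in for a mouse object with `move`, `down`, and `up` methods, such as the one Playwright exposes on a page.

```python
from typing import List, Tuple


def denormalize(value: int, size: int, grid: int = 1000) -> int:
    """Scale a coordinate from the assumed 0-999 grid to pixels."""
    return value * size // grid


def drag_and_drop(mouse, args: dict, width: int, height: int
                  ) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Press at the start point, move to the destination, release."""
    start = (denormalize(args["x"], width), denormalize(args["y"], height))
    end = (denormalize(args["destination_x"], width),
           denormalize(args["destination_y"], height))
    mouse.move(*start)
    mouse.down()
    mouse.move(*end)
    mouse.up()
    return start, end


class RecordingMouse:
    """Test double that records calls instead of moving a real pointer."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def move(self, x: int, y: int) -> None:
        self.events.append(("move", x, y))

    def down(self) -> None:
        self.events.append(("down",))

    def up(self) -> None:
        self.events.append(("up",))


# Arguments copied from the function call shown above.
call_args = {"destination_x": 641, "y": 521, "destination_y": 521, "x": 530}
mouse = RecordingMouse()
start, end = drag_and_drop(mouse, call_args, width=1440, height=900)
print(start, end)  # (763, 468) (923, 468)
```

Because only the x coordinate changes, the drag is a purely horizontal move along the slider track, which matches the model's stated plan of pushing the left handle to the right.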
It was impressive to see that the model had no problem selecting non-stop flights and setting the departure time to 10am or later, and that it even remembered to select 1 carry-on bag as an option!
After a couple of minutes of clicking through the complicated UI, it gave me the following answer:
Final Answer:
The top 5 cheapest, direct, one-way flights from London to Larnaca departing after 10 am with carry-on luggage on Friday, October 31, 2025 (the cheapest date during the week of Oct 27-Nov 2) are:
1. 10:25 AM - 5:10 PM, Wizz Air, £42 (from LTN)
2. 3:45 PM - 10:25 PM, Jet2, £46 (from STN)
3. 4:35 PM - 11:10 PM, Wizz Air, £50 (from LGW)
4. 5:30 PM - 12:10 AM (+1 day), British Airways, £140 (from LHR)
5. 10:55 AM - 5:35 PM, British Airways, £147 (from LHR)
Impressive! You can see the full flow here:
Conclusion
Today, I tried the Gemini Computer Use model to see whether it could automate a common task for me: searching for flights with specific requirements. I was impressed that it navigated a complicated UI and completed the task with the first prompt I provided. If you want to learn more, check out the official documentation and the reference sample.
Source Credit: https://medium.com/google-cloud/search-flights-with-gemini-computer-use-model-e7707f08b7bf