
End-to-end testing is a best practice for ensuring software quality, yet it remains a pain point for many developers. Traditional frameworks often lead to brittle test suites that are expensive to maintain. They rely on specific selectors like CSS IDs, which can break with the smallest refactor. The result? Flaky tests, a slow development cycle, and a constant game of catch-up.
This article provides a technical deep-dive into building a modern, resilient testing framework powered by an AI agent. We cover the three open-source components that make this possible: browser-use, pytest, and Allure Report. Then, we explore how they are architected into a cohesive solution called AgentiTest.
At the center of our solution is browser-use, a Python library that enables an AI agent to control a web browser. It’s the engine that translates human language into machine actions.
browser-use orchestrates an interaction between an LLM and a Playwright-controlled browser. The core of its operation is a ReAct (Reason and Act) loop:
- Observe: The agent inspects the current web page, simplifying its DOM into a clean, actionable accessibility tree. This provides a structured view of all interactive elements.
- Reason: It sends this simplified view, along with the high-level task (e.g., “find the search bar, type ‘Vertex AI’, and press enter”), to the LLM. The LLM analyzes the context and decides the single best next action to perform.
- Act: The agent executes the command returned by the LLM (e.g., type_into_element(index=5, text='Vertex AI')) using its underlying Playwright driver.
This loop continues until the agent determines the original task is complete. A key feature of browser-use is its state management. The agent history tracks each action, LLM thought process, URL visited, and screenshot, providing an audit trail that we will later use for reporting.
How it works
browser-use allows you to define an agent, give it a task, and run it with a few lines of code:
import asyncio
from browser_use import Agent
from browser_use.llm import ChatGoogle

async def main():
    # Define the agent with a task and an LLM
    agent = Agent(
        task="Go to wikipedia.org and search for the 'Roman Empire'",
        llm=ChatGoogle(model="gemini-2.5-pro"),
    )
    # Run the agent and wait for the result
    result = await agent.run()
    print(result.final_result())

asyncio.run(main())
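The object returned by agent.run() is the agent history described above. Inside main(), after result = await agent.run(), you can inspect that audit trail as sketched below; the accessor names (is_done(), urls(), action_names(), model_thoughts()) are assumptions about browser-use's history API and may differ between versions.

# Continuing inside main() from the example above: the run result doubles as
# the agent history, so you can read back the full audit trail.
# The accessor names below are assumed and may vary by browser-use version.
print("Task finished:", result.is_done())        # did the agent mark the task as done?
print("Pages visited:", result.urls())           # every URL the agent navigated to
print("Actions taken:", result.action_names())   # the Playwright-backed actions executed
print("LLM reasoning:", result.model_thoughts()) # the agent's step-by-step thoughts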
While Browser Use handles the browser interaction, pytest provides the structure, organization, and execution engine for our tests.
A Simple pytest Example
If you have a file named test_example.py, pytest automatically discovers and runs the test functions within it.
# content of test_example.py
def test_always_passes():
    assert 1 == 1
You would run this from your terminal with this command: pytest
Key pytest Concepts Used
1. Fixtures: Fixtures are setup/teardown functions that provide a consistent baseline for tests. In conftest.py:
- llm(): A session-scoped fixture that initializes the ChatGoogle model once for the entire test run.
- browser_session(): A function-scoped fixture that creates a new, isolated browser session for each test. This ensures that tests do not interfere with one another.
2. Dependency Injection: pytest automatically provides the output of these fixtures to any test function that lists them as an argument, simplifying test setup significantly.
# conftest.py
@pytest.fixture(scope="session")
def llm() -> ChatGoogle:
    # ... initialization logic ...
    return ChatGoogle(model=model_name)

# test_community_website.py
async def test_search_for_term(self, llm, browser_session, term):
    # `llm` and `browser_session` are automatically provided by pytest
    ...
3. Parameterization: To avoid writing repetitive code, we use the @pytest.mark.parametrize marker to run a single test function with multiple sets of inputs. This is perfect for data-driven scenarios, like testing navigation to four different links with a single test definition.
# test_community_website.py
@pytest.mark.parametrize(
    "link_text, expected_path_segment",
    [
        ("Google Workspace", "google-workspace"),
        ("AppSheet", "appsheet"),
        # ... more parameters ...
    ],
)
async def test_main_navigation(self, llm, browser_session, link_text, expected_path_segment):
    # This test will run once for each tuple in the list above
    ...
A test run is only as good as its report. Allure Report transforms the raw pass/fail output into an interactive HTML dashboard that tells the full story of what happened during execution.
The connection between pytest and Allure is made through the allure-pytest plugin. When you run pytest, this plugin acts as a listener, hooking into the pytest execution lifecycle. It intercepts test events (like the start and end of a test) and interprets the Allure decorators and function calls.
For each event and attachment, Allure generates a corresponding JSON or attachment file in the directory specified by the --alluredir command-line option. This folder of raw result files is the universal data format that the allure command-line tool then uses to generate the final, polished HTML report.
Allure’s features allow us to structure and enrich this report:
Decorators for organizing: We use Allure decorators for pytest to organize the report:
- @allure.feature("Search Functionality"): Creates a high-level grouping.
- @allure.story("Searching for Terms"): Defines a user story within that feature.
- @allure.title("Search for '{term}'"): Sets a dynamic title for each parametrized test case.
Attachments for results: The intermediate and final outputs are included in the report with these functions:
- allure.step(): A context manager that groups a set of actions into a collapsible step in the report.
- allure.attach(): Attaches rich data, such as text, screenshots, and URLs, to the current step.
Environment for context: Allure looks for a file named environment.properties in its results directory. A pytest fixture automatically generates this file, populating the report with context like the OS, browser version, and LLM model used for the run. The sketch below pulls these pieces together.
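Here is a minimal, self-contained sketch of how these features can be wired into pytest. It is not the AgentiTest implementation; it assumes allure-pytest is installed, that tests run with pytest --alluredir=allure-results, and the fixture and test names are purely illustrative.

# conftest.py (sketch)
import os
import platform

import pytest


@pytest.fixture(scope="session", autouse=True)
def allure_environment():
    """Write environment.properties so the report shows run context."""
    results_dir = "allure-results"  # must match the --alluredir value on the CLI
    os.makedirs(results_dir, exist_ok=True)
    with open(os.path.join(results_dir, "environment.properties"), "w") as f:
        f.write(f"OS={platform.system()} {platform.release()}\n")
        f.write("LLM=gemini-2.5-pro\n")  # illustrative value


# test_allure_demo.py (sketch)
import allure
import pytest


@allure.feature("Search Functionality")
@allure.story("Searching for Terms")
@allure.title("Search for '{term}'")
@pytest.mark.parametrize("term", ["BigQuery", "Vertex AI"])
def test_search_report_structure(term):
    with allure.step(f"Build the natural-language task for '{term}'"):
        task = f"Search the community site for '{term}' and open the results page"
        # Attach the plain-language task so it appears in the report
        allure.attach(task, name="task", attachment_type=allure.attachment_type.TEXT)
    with allure.step("Verify the task mentions the term"):
        assert term in task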
Each of the components we’ve just reviewed can be integrated together into an end-to-end architecture:
Test Initiation and Orchestration (pytest)
The process begins with the developer, who writes a test using natural language and executes the pytest command. This command activates the pytest framework, which acts as the central orchestrator. Its first job is to consult the conftest.py file, which prepares the fixtures, such as the Gemini LLM instance and the browser session, and makes them available to all tests. Each test provides a natural-language task and an expected output.
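To make that concrete, here is a hedged sketch of a single test in this architecture. The llm and browser_session fixtures come from conftest.py as described above; the browser_session= argument on Agent is an assumption that may differ between browser-use versions, and running async tests assumes pytest-asyncio (or an equivalent plugin) is configured.

# test_community_website.py (sketch, not the repo's exact code)
import pytest
from browser_use import Agent


@pytest.mark.asyncio
async def test_navigate_to_appsheet(llm, browser_session):
    agent = Agent(
        task="From the Google Cloud Community home page, click the 'AppSheet' "
             "link in the main navigation and report the final URL",
        llm=llm,
        browser_session=browser_session,  # assumed parameter name
    )
    history = await agent.run()

    # The expected outcome is expressed in plain language and checked loosely
    final = (history.final_result() or "").lower()
    assert "appsheet" in final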
Agentic Execution Loop (Browser Use)
The Agent class from browser-use then takes control, inspecting the current state of the web page and sending the full context to the Gemini model. Gemini processes the context and returns a specific command, such as click(selector='text="AppSheet"').
Next, the agent executes this command using Playwright, which changes the state of the target website. This feedback loop continues, with the agent observing the new state and sending it back to the LLM to determine the next action until the overall task is complete.
Reporting and Visualization (Allure)
Running silently in parallel to this execution loop is the reporting pipeline. The allure-pytest plugin hooks into pytest's execution lifecycle, listening for test events and Allure-specific commands.
With every step the agent takes, the custom record_step hook captures the context. This includes the agent’s internal “thoughts,” the current URL, and a screenshot. It then uses allure.attach() to report this information.
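For illustration, a hook in that spirit might look like the sketch below. This is not the repository's actual implementation, and the callback signature browser-use expects (here browser_state, model_output, step_number) is an assumption that depends on the library version.

# record_step sketch: attach per-step context to the Allure report
import base64

import allure


async def record_step(browser_state, model_output, step_number):
    with allure.step(f"Agent step {step_number}"):
        # The agent's "thoughts" for this step, as returned by the LLM
        allure.attach(str(model_output), name="model output",
                      attachment_type=allure.attachment_type.TEXT)
        # The URL the agent is currently on (assumed attribute name)
        allure.attach(str(getattr(browser_state, "url", "")), name="url",
                      attachment_type=allure.attachment_type.TEXT)
        # Screenshot of the page, if present (assumed to be base64-encoded)
        screenshot_b64 = getattr(browser_state, "screenshot", None)
        if screenshot_b64:
            allure.attach(base64.b64decode(screenshot_b64), name="screenshot",
                          attachment_type=allure.attachment_type.PNG)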
After the tests conclude, the allure serve command processes the test artifacts and renders the HTML report.
To see how these concepts translate into a real-world scenario, the AgentiTest repository includes a sample test file, test_community_website.py. This suite runs against the official Google Cloud Community website. This public forum allows users, partners, and Googlers to ask questions and collaborate.
The sample tests are designed to validate user-facing functionality, and are intuitive for non-technical stakeholders. The test suite checks two main areas:
- Home Page Content: The agent verifies that clicking each primary navigation link, such as ‘Google Workspace,’ takes the user to the correct corresponding page. Then, it confirms that the key community statistics for members, online users, and solutions are all visible on the page.
- Search Functionality: The agent searches for a known term like ‘BigQuery’ and verifies it is taken to a search results page. To test the negative path, the agent searches for a random, nonsensical string. It then confirms that the website correctly displays a ‘no results’ message.
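To give a flavor of how that negative path reads in code, here is a hedged sketch; the fixture wiring and the Agent's browser_session argument carry the same assumptions as the earlier sketch, and the logic is illustrative rather than the repository's exact code.

# Negative-path search sketch: a random string should yield a 'no results' message
import uuid

import pytest
from browser_use import Agent


@pytest.mark.asyncio
async def test_search_with_no_results(llm, browser_session):
    nonsense = f"zzqx-{uuid.uuid4().hex[:12]}"  # a random string no page should match
    agent = Agent(
        task=(
            f"Search the Google Cloud Community site for '{nonsense}' and "
            "report whether a 'no results' message is displayed"
        ),
        llm=llm,
        browser_session=browser_session,  # assumed parameter name
    )
    history = await agent.run()

    final = (history.final_result() or "").lower()
    assert "no results" in final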
Altogether, we have a test automation framework that is greater than the sum of its parts. Because the agent targets elements by intent rather than brittle selectors, the tests are far more resilient to UI changes, and the whole stack is built on popular open-source testing tools that integrate easily into DevOps workflows.
Ready to dive in?
- Explore the AgentiTest Framework: Clone the source code, run the tests yourself, and adapt it for your own projects.
- Check out the Google Cloud Community: See the website used in this demonstration and imagine how you could write tests for it in natural language. Visit the Google Cloud Community
- Get Started with Google Cloud and Gemini: This project is powered by Google’s Gemini models. To get an API key and begin experimenting, you can start with the Google Cloud Free Tier, which provides free credits to explore a wide range of services. Once you’re signed up, you can learn more about the Gemini models available.
Share your ideas about the future of test automation with me on LinkedIn, X, and Bluesky.
Source Credit: https://medium.com/google-cloud/ai-native-test-automation-is-here-5b096ac12851