Browser Agent Test Report Template

Browser agent test report quality is the difference between a useful AI browser run and a fancy demo nobody can trust. I see QA teams get excited when an agent clicks through a flow, but the real question is simple: can another tester replay the evidence and decide whether the result is valid?

This guide gives you a production-ready template for reporting browser agent runs. Use it for Browser Use, Stagehand, Playwright-based agents, internal QA agents, or any workflow where an LLM controls a browser and claims that a user journey passed.

Table of Contents

Why a browser agent test report matters
The browser agent test report template
Capture the prompt and environment
Trace, screenshots, console logs, and network evidence
Assertions, oracles, and pass/fail rules
Reviewer sign-off and defect routing
Playwright TypeScript example
CI storage and retention rules
India QA team context
Key takeaways
FAQ

Why a browser agent test report matters

AI browser agents are now good enough to move through real products, not just toy pages. Browser Use describes its project as a way to make websites accessible for AI agents and reports more than 100,000 GitHub stars on its public repository. Browserbase Stagehand also keeps shipping browser-agent tooling, with the browse@0.9.0 release published in June 2026.

That growth is useful, but it creates a QA problem. A human tester leaves notes, screenshots, bug IDs, and clear observations. An agent may leave a final answer like “checkout works” unless we force it to preserve evidence. That answer is not enough for a release decision.

AI browser runs are not normal automated tests

A normal Playwright test has code, assertions, fixtures, and deterministic failure output. A browser agent run has more moving parts: prompt wording, model behavior, browser state, page timing, tool calls, retries, and sometimes hallucinated reasoning. If any one part changes, the same task can produce a different path.

That does not mean browser agents are useless. It means the report must be stricter than a normal automation log. I want the report to show exactly what the agent was asked to do, what it clicked, what it saw, what evidence it saved, and what rule converted that evidence into pass or fail.

The report is your safety rail

Without a report, browser agents become a trust exercise. With a report, they become reviewable test execution. A lead SDET can open the artifact, scan the prompt, inspect the trace, compare screenshots, and reject weak conclusions in less than 5 minutes.

I use a simple rule: if the report cannot convince a skeptical human, the agent did not finish the test. The output may be interesting research, but it is not release evidence.

The browser agent test report template

A strong browser agent test report is not a wall of logs. It is a structured artifact with seven sections. Each section answers one question that a reviewer will ask before accepting the result.

Task summary: what user journey was tested and why it matters.
Prompt and instructions: the exact agent prompt, constraints, and success criteria.
Environment: URL, build number, browser, viewport, test account, feature flags, and data setup.
Execution evidence: trace file, screenshots, video, DOM snapshots, console logs, and network notes.
Assertions and oracle: the rule that decides pass, fail, blocked, or needs review.
Defects and risks: bug links, uncertain observations, flaky steps, and missing coverage.
Reviewer sign-off: owner, decision, timestamp, and next action.

Here is the compact version you can paste into a Jira ticket, Confluence page, GitHub issue, or Markdown artifact:

# Browser Agent Test Report

## 1. Task Summary
- Feature:
- User journey:
- Business risk:
- Agent/tool:
- Run ID:

## 2. Prompt and Instructions
- Exact prompt:
- Constraints:
- Success criteria:
- Stop conditions:

## 3. Environment
- App URL:
- Build/commit:
- Browser and viewport:
- Test account:
- Feature flags:
- Data setup:

## 4. Evidence
- Trace URL:
- Screenshot folder:
- Video URL:
- Console log summary:
- Network/API failures:

## 5. Assertions
- Expected result:
- Actual result:
- Pass/fail rule:
- Confidence level:

## 6. Defects and Risks
- Bug links:
- Unknowns:
- Flaky steps:
- Follow-up tests:

## 7. Reviewer Sign-off
- Reviewer:
- Decision: Accepted / Rejected / Needs re-run
- Notes:
- Date:

Keep it boring on purpose

The template looks simple because simple reports survive real delivery pressure. A report that depends on a custom dashboard nobody opens will die in two sprints. A Markdown report with links to trace, screenshots, logs, and sign-off works in almost every team.

If you already run Playwright, connect this report with your existing HTML report. The Playwright reporters documentation explains the built-in reporter options, and those artifacts can sit beside your agent evidence.
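If filling the template by hand becomes a chore, the Markdown can be generated from structured data. The sketch below is one way to do that; `ReportSection` and `renderReport` are hypothetical names, and the field values are placeholders.

```typescript
// Hypothetical shape: each section is a heading plus label/value fields.
type ReportSection = {
  title: string;
  fields: Array<{ label: string; value: string }>;
};

// Render numbered sections in the same layout as the template.
// Empty values stay as blank bullets so a reviewer can spot missing data.
function renderReport(sections: ReportSection[]): string {
  const lines: string[] = ['# Browser Agent Test Report', ''];
  sections.forEach((section, index) => {
    lines.push(`## ${index + 1}. ${section.title}`);
    for (const field of section.fields) {
      lines.push(`- ${field.label}: ${field.value}`.trimEnd());
    }
    lines.push('');
  });
  return lines.join('\n');
}

const markdown = renderReport([
  {
    title: 'Task Summary',
    fields: [
      { label: 'Feature', value: 'Login' },
      { label: 'Run ID', value: 'run-1042' },
    ],
  },
  {
    title: 'Reviewer Sign-off',
    fields: [
      { label: 'Reviewer', value: '' },
      { label: 'Decision', value: 'Needs re-run' },
    ],
  },
]);
```

Writing `markdown` to `report.md` in the evidence folder keeps the human-readable report next to the raw artifacts.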

Capture the prompt and environment

The prompt is test input. Treat it like test data, not like a chat message. If the prompt changes, the test changed. If the environment changes, the result changed. This is where many QA teams lose auditability.

Record the exact prompt

Do not summarize the prompt in the report. Paste the exact prompt, including constraints and examples. A one-line summary like “agent tested checkout” hides the most important part of the run.

A useful prompt section includes:

The business task in plain language.
Allowed credentials or test account rules.
Pages the agent may visit.
Actions the agent must not perform, such as submitting a real payment.
Success criteria in observable terms.
Stop conditions when the agent sees errors or uncertainty.

Bad prompt: “Test login.” Good prompt: “Open staging login, sign in with qa_agent_01, verify the dashboard loads, verify the account name is visible, save a screenshot after login, and stop if a CAPTCHA or real payment page appears.”
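One way to keep prompts honest is to build them from a structured spec and refuse to run when a part is missing. This is a minimal sketch; `PromptSpec`, `missingPromptParts`, and `buildPrompt` are illustrative names, not part of any agent framework.

```typescript
// Hypothetical structure for a prompt treated as versioned test input.
type PromptSpec = {
  task: string;
  account: string;
  allowedPages: string[];
  forbiddenActions: string[];
  successCriteria: string[];
  stopConditions: string[];
};

// Return the names of the required parts that were left empty.
function missingPromptParts(spec: PromptSpec): string[] {
  const missing: string[] = [];
  if (spec.task.trim() === '') missing.push('task');
  if (spec.account.trim() === '') missing.push('account');
  if (spec.allowedPages.length === 0) missing.push('allowedPages');
  if (spec.forbiddenActions.length === 0) missing.push('forbiddenActions');
  if (spec.successCriteria.length === 0) missing.push('successCriteria');
  if (spec.stopConditions.length === 0) missing.push('stopConditions');
  return missing;
}

// Build the exact prompt text so the report can store it verbatim.
function buildPrompt(spec: PromptSpec): string {
  return [
    `Task: ${spec.task}`,
    `Account: ${spec.account}`,
    `Allowed pages: ${spec.allowedPages.join(', ')}`,
    `Do not: ${spec.forbiddenActions.join('; ')}`,
    `Success criteria: ${spec.successCriteria.join('; ')}`,
    `Stop if: ${spec.stopConditions.join('; ')}`,
  ].join('\n');
}

// "Test login." fills only the task, so five parts are reported as missing.
const vaguePrompt: PromptSpec = {
  task: 'Test login.',
  account: '',
  allowedPages: [],
  forbiddenActions: [],
  successCriteria: [],
  stopConditions: [],
};
const gaps = missingPromptParts(vaguePrompt);

const loginPrompt: PromptSpec = {
  task: 'Sign in on staging and verify the dashboard loads.',
  account: 'qa_agent_01',
  allowedPages: ['staging login', 'dashboard'],
  forbiddenActions: ['submit a real payment'],
  successCriteria: ['dashboard loads', 'account name is visible', 'screenshot saved after login'],
  stopConditions: ['a CAPTCHA appears', 'a real payment page appears'],
};
const promptText = buildPrompt(loginPrompt);
```

Store `promptText` in the report exactly as it was sent to the agent.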

Record the environment like a release note

A browser agent run without environment details is almost impossible to debug later. Include the app URL, build number, commit hash, browser, viewport, locale, test account, flags, and seed data. If your application uses experiments, write down the experiment state.

This matters because agents are sensitive to small UI differences. A changed banner, delayed API call, or feature flag can send the agent down a different path. The report should make that visible.

Include model and tool metadata

For agent-driven runs, record the model provider, model name, temperature if available, agent framework, and version. For example, Browser Use published version 0.13.2 on June 12, 2026. If a later version changes browser control behavior, old reports still need to be understandable.
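When two runs of the same task disagree, the first question is what changed between them. A rough sketch of a metadata diff follows; `RunMetadata` and `diffMetadata` are hypothetical names, and the keys and values are made-up placeholders.

```typescript
// Flat key/value metadata recorded with each run (keys are illustrative).
type RunMetadata = Record<string, string>;

// List every key whose value differs between two runs,
// including keys that exist in only one of them.
function diffMetadata(before: RunMetadata, after: RunMetadata): string[] {
  const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)])).sort();
  const changes: string[] = [];
  for (const key of keys) {
    if (before[key] !== after[key]) {
      changes.push(`${key}: ${before[key] ?? '(missing)'} -> ${after[key] ?? '(missing)'}`);
    }
  }
  return changes;
}

const changes = diffMetadata(
  { browser: 'chromium', build: '412', model: 'model-a', agentVersion: '1.0.0' },
  { browser: 'chromium', build: '413', model: 'model-a', agentVersion: '1.1.0', locale: 'en-IN' },
);
```

Here `changes` lists the agent version, the build, and the newly recorded locale, which gives a reviewer three concrete leads before opening the trace.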

Trace, screenshots, console logs, and network evidence

A browser agent test report must include evidence that can be inspected without rerunning the agent. Reruns are useful, but they are not proof of what happened in the original run. I want one evidence folder per run.

Trace is the strongest artifact

Playwright trace is one of the best artifacts for browser work because it captures timeline, actions, snapshots, console messages, network requests, and source context. The official Playwright trace viewer documentation shows how traces help inspect actions and page state after a test run.

For agent testing, trace answers the question: what did it really click? That question matters because the agent’s final text may be wrong. The trace can show whether it clicked the intended button, waited for the right state, or accidentally accepted a default option.

Screenshots need labels

Screenshots are useful only when they have names and context. A folder with 30 files named image1.png to image30.png is not evidence. Use names like 01-home-loaded.png, 02-login-filled.png, 03-dashboard-visible.png, and 04-error-toast.png.

The Playwright screenshots documentation covers page and locator screenshots. For agent reports, I prefer both full-page screenshots at important checkpoints and cropped locator screenshots for the exact element being verified.
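The naming convention is easy to enforce with a tiny helper. This is a sketch with a hypothetical `screenshotName` function; the slug rules are an assumption you can adjust.

```typescript
// Turn a step number and a short description into a sortable file name.
function screenshotName(step: number, label: string): string {
  const slug = label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric into one dash
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  const prefix = step < 10 ? `0${step}` : String(step);
  // Fall back to a generic name when the label has no usable characters.
  return `${prefix}-${slug || 'step'}.png`;
}

const names = [
  screenshotName(1, 'Home loaded'),
  screenshotName(4, 'Error toast!'),
  screenshotName(12, '   '),
];
```

The zero-padded prefix keeps files in execution order when a reviewer sorts the folder by name.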

Console and network logs catch hidden failures

A page can show a green success banner while the console is full of errors. An agent can also miss API failures if the UI does not expose them. Record console errors, failed requests, 4xx/5xx responses, and any slow endpoint that changed the user experience.

In the report, do not paste thousands of log lines. Summarize the important failures and attach the raw file. The reviewer needs a fast signal first, then full detail if needed.
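A summary can be as simple as grouping identical messages and showing the most frequent ones first. The helper below is a minimal sketch; `summarizeErrors` and the sample log lines are illustrative.

```typescript
type LogSummaryEntry = { message: string; count: number };

// Group identical messages and return the most frequent ones first.
// Ties are broken alphabetically so the output is stable between runs.
function summarizeErrors(lines: string[], limit = 5): LogSummaryEntry[] {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const message = line.trim();
    if (message === '') continue; // ignore blank lines
    counts.set(message, (counts.get(message) ?? 0) + 1);
  }
  return Array.from(counts, ([message, count]) => ({ message, count }))
    .sort((a, b) => b.count - a.count || a.message.localeCompare(b.message))
    .slice(0, limit);
}

const summary = summarizeErrors([
  'TypeError: x is undefined',
  'GET /api/cart 500',
  'TypeError: x is undefined',
  '',
  'GET /api/cart 500',
  'TypeError: x is undefined',
]);
```

The summary goes into the report body, and the raw log file stays attached for anyone who needs the full detail.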

Assertions, oracles, and pass/fail rules

An agent saying “looks good” is not an assertion. A real test report needs an oracle: a rule that decides whether the observed result is acceptable. This rule should be written before the run, not invented after the agent finishes.

Use observable rules

Good pass/fail rules are observable in the browser or API response. They should not depend on the agent’s confidence alone. Examples:

Pass if the dashboard heading contains “Welcome, QA Agent” after login.
Pass if the order summary total equals ₹1,499 and no failed network request appears during checkout.
Fail if a console message of type “error” appears after clicking Save.
Needs review if the agent reaches a CAPTCHA, payment gateway, or unclear modal.

Notice the difference. The rules point to evidence. A reviewer can verify them in trace, screenshot, DOM text, console logs, or network files.
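Rules like these can be written as a small function so the verdict never depends on the agent's own summary. The sketch below covers the login example only; `LoginEvidence` and `evaluateLogin` are hypothetical names, and the order in which the rules are checked is an assumption.

```typescript
type Verdict = 'pass' | 'fail' | 'needs-review';

// Hypothetical evidence snapshot collected during a login run.
type LoginEvidence = {
  headingText: string | null; // text of the dashboard heading, if one was found
  consoleErrors: string[];
  failedRequests: string[];
  blockedBy: 'captcha' | 'payment-gateway' | 'unclear-modal' | null;
};

// Apply the rules in a fixed order: blockers first, then failures, then the pass rule.
function evaluateLogin(evidence: LoginEvidence): { verdict: Verdict; reason: string } {
  if (evidence.blockedBy !== null) {
    return { verdict: 'needs-review', reason: `Agent stopped at ${evidence.blockedBy}` };
  }
  if (evidence.consoleErrors.length > 0) {
    return { verdict: 'fail', reason: `${evidence.consoleErrors.length} console error(s)` };
  }
  if (evidence.failedRequests.length > 0) {
    return { verdict: 'fail', reason: `${evidence.failedRequests.length} failed request(s)` };
  }
  if (evidence.headingText !== null && evidence.headingText.indexOf('Welcome, QA Agent') !== -1) {
    return { verdict: 'pass', reason: 'Dashboard heading matched' };
  }
  return { verdict: 'fail', reason: 'Dashboard heading missing or unexpected' };
}

const cleanRun = evaluateLogin({
  headingText: 'Welcome, QA Agent',
  consoleErrors: [],
  failedRequests: [],
  blockedBy: null,
});
```

Every branch returns a reason that points at a piece of evidence, so a reviewer can check it against the trace or the logs.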

Separate confidence from result

I like adding a confidence level, but I never use it as the result. The result is pass, fail, blocked, or needs review. Confidence explains how much trust the reviewer should place in the evidence.

For example, a run can be “pass with medium confidence” when the main assertion is visible but one screenshot is missing. A run can be “fail with high confidence” when the trace shows a broken API and the UI displays an error banner.
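Keeping the two values in separate fields makes it hard to confuse them. In this sketch the confidence policy (one missing artifact means medium, more than one means low) is purely an assumption for illustration, and `confidenceFor` is a hypothetical helper.

```typescript
type Result = 'pass' | 'fail' | 'blocked' | 'needs-review';
type Confidence = 'high' | 'medium' | 'low';
type Outcome = { result: Result; confidence: Confidence };

// Assumed policy: confidence depends only on how many expected artifacts are missing.
function confidenceFor(expectedArtifacts: string[], savedArtifacts: string[]): Confidence {
  const saved = new Set(savedArtifacts);
  const missing = expectedArtifacts.filter(name => !saved.has(name)).length;
  if (missing === 0) return 'high';
  if (missing === 1) return 'medium';
  return 'low';
}

const expected = ['trace.zip', '01-login-page.png', '02-dashboard-visible.png'];

// The main assertion passed, but one screenshot was never saved.
const outcome: Outcome = {
  result: 'pass',
  confidence: confidenceFor(expected, ['trace.zip', '02-dashboard-visible.png']),
};
```

The result comes from the oracle; the confidence comes from the completeness of the evidence. Neither one is derived from the other.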

Make negative evidence visible

Negative evidence is what the agent did not check. If the agent verifies login but never checks user role permissions, write that gap in the report. This prevents stakeholders from reading more coverage into the result than the agent actually performed.

Reviewer sign-off and defect routing

Browser agents can reduce repetitive exploration, but humans still own the release decision. The report should make reviewer sign-off explicit. This is not bureaucracy. It is the handoff between machine execution and human accountability.

Use three reviewer decisions

Keep the decision set small:

Accepted: evidence supports the result and no critical gaps remain.
Rejected: evidence is weak, incorrect, incomplete, or contradicted by logs.
Needs re-run: environment issue, flaky step, missing artifact, or blocked path.

Do not allow “looks okay” as a decision. That phrase creates ambiguity. A report needs a clear status so it can move through CI, release notes, or a defect triage meeting.
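If the sign-off is captured in a form or a JSON file, the decision can be validated before it is stored. A minimal sketch, with `parseDecision` as a hypothetical helper:

```typescript
const REVIEWER_DECISIONS = ['accepted', 'rejected', 'needs-re-run'] as const;
type ReviewerDecision = (typeof REVIEWER_DECISIONS)[number];

// Normalize free text and reject anything outside the three allowed decisions.
function parseDecision(input: string): ReviewerDecision {
  const normalized = input.trim().toLowerCase().replace(/\s+/g, '-');
  const match = REVIEWER_DECISIONS.find(decision => decision === normalized);
  if (match === undefined) {
    throw new Error(`Unknown reviewer decision: "${input}"`);
  }
  return match;
}

const decision = parseDecision('Needs re-run');
```

Calling `parseDecision('looks okay')` throws, which is the point: an ambiguous status never reaches CI or the release notes.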

Route defects with evidence, not opinions

When the report finds a bug, create a defect with the trace link, screenshot, console summary, network error, and exact reproduction path. The best defect from an agent run reads like a human tester prepared it.

For ScrollTest readers who already use Playwright, this connects well with the existing Playwright upgrade checklist for production E2E. Upgrade checks and browser-agent reports both need artifacts that another engineer can inspect.

Playwright TypeScript example

Here is a small Playwright TypeScript pattern that creates an evidence bundle for a browser-agent-style run. It records console errors, failed requests, screenshots, and a JSON report. You can wire this around your agent execution function.

import { test, expect, type Page } from '@playwright/test';
import fs from 'node:fs/promises';
import path from 'node:path';

type AgentReport = {
  runId: string;
  prompt: string;
  environment: Record<string, string>;
  consoleErrors: string[];
  failedRequests: string[];
  screenshots: string[];
  assertions: Array<{ name: string; status: 'pass' | 'fail'; evidence: string }>;
  reviewerDecision: 'pending' | 'accepted' | 'rejected' | 'needs-re-run';
};

async function collectEvidence(page: Page, runId: string, prompt: string) {
  const dir = path.join('agent-evidence', runId);
  await fs.mkdir(dir, { recursive: true });

  const report: AgentReport = {
    runId,
    prompt,
    environment: {
      appUrl: process.env.APP_URL ?? 'https://staging.example.com',
      build: process.env.BUILD_ID ?? 'local',
      browser: test.info().project.name,
      viewport: JSON.stringify(test.info().project.use.viewport ?? {})
    },
    consoleErrors: [],
    failedRequests: [],
    screenshots: [],
    assertions: [],
    reviewerDecision: 'pending'
  };

  page.on('console', msg => {
    if (msg.type() === 'error') report.consoleErrors.push(msg.text());
  });

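  // 'requestfailed' fires only for network-level failures (DNS, refused, aborted).
  page.on('requestfailed', request => {
    report.failedRequests.push(`${request.method()} ${request.url()} ${request.failure()?.errorText}`);
  });

  // HTTP 4xx/5xx responses complete normally, so record them separately.
  page.on('response', response => {
    if (response.status() >= 400) {
      report.failedRequests.push(`${response.request().method()} ${response.url()} HTTP ${response.status()}`);
    }
  });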

  return { dir, report };
}

test('agent verifies login journey with evidence', async ({ page }) => {
  const prompt = 'Sign in as qa_agent_01 and verify the dashboard heading is visible.';
  const runId = `login-${Date.now()}`;
  const { dir, report } = await collectEvidence(page, runId, prompt);
  // Assumes APP_URL is the site base URL, matching the value recorded in the report.
  const loginUrl = new URL('/login', process.env.APP_URL ?? 'https://staging.example.com').toString();
  let headingVisible = false;

  try {
    await page.goto(loginUrl);
    await page.screenshot({ path: path.join(dir, '01-login-page.png'), fullPage: true });
    report.screenshots.push('01-login-page.png');

    // Replace this block with your agent execution call.
    await page.getByLabel('Email').fill('qa_agent_01@example.com');
    await page.getByLabel('Password').fill(process.env.QA_PASSWORD ?? 'secret');
    await page.getByRole('button', { name: 'Sign in' }).click();

    // Wait up to 5 s and record the outcome instead of throwing, so evidence is still saved.
    headingVisible = await page
      .getByRole('heading', { name: /dashboard/i })
      .waitFor({ state: 'visible', timeout: 5000 })
      .then(() => true, () => false);
    await page.screenshot({ path: path.join(dir, '02-after-login.png'), fullPage: true });
    report.screenshots.push('02-after-login.png');

    report.assertions.push({
      name: 'Dashboard heading visible after login',
      status: headingVisible ? 'pass' : 'fail',
      evidence: '02-after-login.png'
    });

    const cleanLogs = report.consoleErrors.length === 0 && report.failedRequests.length === 0;
    report.assertions.push({
      name: 'No browser console or network failures',
      status: cleanLogs ? 'pass' : 'fail',
      evidence: 'consoleErrors and failedRequests in report.json'
    });
  } finally {
    // Write the report even when a step above throws, so a failed run still leaves evidence.
    await fs.writeFile(path.join(dir, 'report.json'), JSON.stringify(report, null, 2));
  }

  // Fail the Playwright test only after the evidence is on disk.
  expect(headingVisible, 'Dashboard heading visible after login').toBe(true);
});

Add trace in Playwright config

For CI, set trace capture in Playwright config so every failed run includes a trace. Many teams use trace: 'on-first-retry' for normal suites. For experimental agent runs, I prefer trace on every run until the workflow is stable.

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'on',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure'
  },
  reporter: [['html'], ['json', { outputFile: 'playwright-report/results.json' }]]
});

If you are building reusable agent skills, read the QASkills CLI install flow for AI QA teams. A reusable skill should not only run the browser. It should also save the report in a predictable format.

CI storage and retention rules

Reports become useful when CI stores them consistently. If each run saves artifacts in a different place, reviewers stop checking them. Decide the storage path before the rollout.

Recommended folder structure

Use a folder path that includes date, suite, run ID, and scenario. Keep it short enough for humans to scan.

agent-evidence/
  2026-06-30/
    checkout-smoke/
      run-1042/
        report.md
        report.json
        trace.zip
        01-home-loaded.png
        02-cart-filled.png
        03-checkout-summary.png
        console.log
        network-failures.json
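
A path like this is easier to keep consistent when one function builds it. The helper below is a sketch; `evidenceDir` is a hypothetical name, and using the UTC date is an assumption that may not suit every team.

```typescript
// Build "agent-evidence/<date>/<suite>/<runId>" using forward slashes.
function evidenceDir(date: Date, suite: string, runId: string): string {
  for (const segment of [suite, runId]) {
    // Reject empty segments and anything that could escape the evidence root.
    if (segment === '' || /[\\\/]|\.\./.test(segment)) {
      throw new Error(`Invalid path segment: "${segment}"`);
    }
  }
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return ['agent-evidence', day, suite, runId].join('/');
}

const dir = evidenceDir(new Date('2026-06-30T08:15:00Z'), 'checkout-smoke', 'run-1042');
```

With these inputs `dir` matches the example tree: `agent-evidence/2026-06-30/checkout-smoke/run-1042`.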

Retention should match risk

Not every report needs to live forever. For low-risk smoke runs, 14 or 30 days may be enough. For payment, compliance, healthcare, finance, or production-release evidence, keep the report as long as your audit policy requires.

For CI gates, I like this policy:

Keep all failed and rejected agent reports for 90 days.
Keep accepted smoke reports for 30 days.
Keep release-candidate reports for at least one release cycle.
Keep reports linked to defects until the defect is closed and verified.
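
The policy can be expressed as a function that a cleanup job calls for each stored report. This is a sketch only: `StoredReport` and `retentionFor` are hypothetical, and the precedence between rules (open defects first, then release candidates, then the day counts) is an assumption the list above does not specify.

```typescript
type Retention =
  | { kind: 'days'; days: number }
  | { kind: 'release-cycle' }
  | { kind: 'until-defect-verified' };

// Hypothetical fields a cleanup job would need for each stored report.
type StoredReport = {
  result: 'pass' | 'fail';
  decision: 'accepted' | 'rejected' | 'needs-re-run';
  releaseCandidate: boolean;
  openDefectIds: string[];
};

function retentionFor(report: StoredReport): Retention {
  // Assumed precedence: the open-ended rules win over the fixed day counts.
  if (report.openDefectIds.length > 0) return { kind: 'until-defect-verified' };
  if (report.releaseCandidate) return { kind: 'release-cycle' };
  if (report.result === 'fail' || report.decision === 'rejected') {
    return { kind: 'days', days: 90 };
  }
  // Everything else is treated like an accepted smoke report.
  return { kind: 'days', days: 30 };
}

const failedRun = retentionFor({
  result: 'fail',
  decision: 'rejected',
  releaseCandidate: false,
  openDefectIds: [],
});
```

The union type forces the cleanup job to handle the two open-ended cases explicitly instead of guessing a number of days for them.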

Make the report searchable

Add labels such as feature, build, model, agent version, environment, and result. Search matters when a manager asks, “Did the agent test this on Friday’s release candidate?” You should not need to open 40 folders to answer.
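Even without a database, a flat index of labels answers that kind of question. The sketch below filters an in-memory list; `ReportLabels`, `findReports`, and every value in the sample index are made up for illustration.

```typescript
// Labels stored with each report (field names are illustrative).
type ReportLabels = {
  runId: string;
  feature: string;
  build: string;
  model: string;
  agentVersion: string;
  environment: string;
  result: 'pass' | 'fail' | 'blocked' | 'needs-review';
};

// Return the reports whose labels match every field given in the query.
function findReports(index: ReportLabels[], query: Partial<ReportLabels>): ReportLabels[] {
  const keys = Object.keys(query) as Array<keyof ReportLabels>;
  return index.filter(report => keys.every(key => report[key] === query[key]));
}

const index: ReportLabels[] = [
  { runId: 'run-1041', feature: 'checkout', build: 'rc-7', model: 'model-a', agentVersion: '1.0.0', environment: 'staging', result: 'fail' },
  { runId: 'run-1042', feature: 'checkout', build: 'rc-8', model: 'model-a', agentVersion: '1.0.0', environment: 'staging', result: 'pass' },
  { runId: 'run-1043', feature: 'login', build: 'rc-8', model: 'model-a', agentVersion: '1.0.0', environment: 'staging', result: 'pass' },
];

const checkoutOnRc8 = findReports(index, { feature: 'checkout', build: 'rc-8' });
```

The same labels can later move into CI artifact metadata or a search index without changing how reviewers ask the question.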

India QA team context

In India, many QA teams still sit between service-company delivery habits and product-company release pressure. A TCS or Infosys project may need formal sign-off and defect evidence. A Bengaluru product company may care more about fast CI feedback and trace links in Slack. The same template works for both if you keep it practical.

For SDETs targeting ₹25-40 LPA roles, browser-agent reporting is a strong skill because it shows engineering maturity. Anyone can run an AI agent. Fewer people can design the evidence system that makes the result usable in a release meeting.

What managers want to see

Managers do not want another black-box automation tool. They want faster feedback without losing control. Show them the report, not the prompt alone. Show how the report catches missing screenshots, console errors, weak assertions, and unclear pass/fail logic.

If your team is already modernizing test automation, pair this with the LLM output testing guide with PromptFoo and DeepEval. Browser agents test user journeys. Prompt evals test model outputs. Both need evidence, fixtures, scoring, and review.

Key takeaways

A browser agent test report turns AI browser execution into something a QA team can review, challenge, and improve. The agent run is only one part of the system. The evidence is what makes it useful.

Save the exact prompt because it is test input.
Record environment details so reruns and failures make sense.
Attach trace, screenshots, console logs, network failures, and raw report JSON.
Use observable pass/fail rules instead of trusting the agent’s final text.
Require reviewer sign-off before using an agent result as release evidence.

My practical advice: start with one high-value smoke journey. Do not automate 50 agent scenarios on day one. Build the report template, prove that reviewers trust it, then expand.

FAQ

Is a browser agent test report different from a Playwright report?

Yes. A Playwright report focuses on deterministic test execution. A browser agent report also records prompt, model/tool metadata, agent decisions, reviewer confidence, and gaps in coverage. Use both when possible.

Should every browser agent run block CI?

No. Start with non-blocking reports until the workflow is stable. Move selected scenarios into CI gates only after you have reliable assertions, stable environments, and reviewer trust.

What is the minimum evidence I should save?

Save the exact prompt, environment, trace, final screenshot, console errors, failed requests, assertions, and reviewer decision. If you cannot save trace, at least save screenshots and structured logs.

Can manual testers use this template?

Yes. Manual testers can use it as a checklist while reviewing AI browser runs. It also helps them learn automation thinking because it connects user actions, evidence, assertions, and defects.

Which tool should I use for browser agents?

Pick the tool that fits your stack. Browser Use, Stagehand, and Playwright-based internal agents are all valid options. The reporting standard matters more than the logo on the agent framework.