Playwright Sharding CI Tutorial

Day 15 of the 21-Day Playwright + TypeScript series.

Playwright sharding is the first performance upgrade I add when a test suite becomes too slow for a serious pull request workflow. Workers make one machine faster. Shards split the same suite across multiple CI machines, so your team gets feedback in minutes instead of waiting for a long single runner to finish.

Table of Contents

What Is Playwright Sharding?
Parallelism vs Sharding
Create a Local Baseline First
Playwright Sharding in GitHub Actions
Merge Reports After Sharding
Handle Data, Auth, and Isolation
Common Pitfalls I See in Teams
Production Checklist
FAQ

What Is Playwright Sharding?

Playwright sharding means splitting one test suite into separate pieces and running each piece as an independent CI job. The official Playwright sharding guide shows the CLI syntax as --shard=current/total, for example --shard=1/4 for the first shard in a four-shard run. That small flag changes how you think about test execution at scale.

On Day 12 we built a GitHub Actions workflow for Playwright. Today we extend that idea and make the pipeline parallel across runners, not just within one runner. If you missed that setup, read Playwright CI with GitHub Actions first because sharding depends on a working CI baseline.

The simple mental model

Think of a suite with 400 tests. Without sharding, one CI job receives all 400 tests. With four shards, each CI job receives a portion of the suite. The jobs run at the same time, then you merge the reports into one final result.

Shard 1/4 runs one part of the suite.
Shard 2/4 runs another part.
Shard 3/4 and Shard 4/4 do the same.
The final report job downloads all blobs and creates one HTML report.
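
To make the mental model concrete, here is a minimal sketch of dividing a list of tests into shards. It is an illustration only: Playwright decides the real split internally, and splitIntoShards and the test names are hypothetical, not part of any Playwright API.

```typescript
// Illustrative only: Playwright decides the real split internally.
// splitIntoShards is a hypothetical helper, not a Playwright API.
function splitIntoShards<T>(items: T[], shardTotal: number): T[][] {
  if (!Number.isInteger(shardTotal) || shardTotal < 1) {
    throw new Error('shardTotal must be a positive integer');
  }
  const result: T[][] = [];
  const base = Math.floor(items.length / shardTotal);
  const remainder = items.length % shardTotal;
  let start = 0;
  for (let i = 0; i < shardTotal; i++) {
    // The first `remainder` shards take one extra item.
    const size = base + (i < remainder ? 1 : 0);
    result.push(items.slice(start, start + size));
    start += size;
  }
  return result;
}

// 400 hypothetical tests split four ways gives 100 tests per shard.
const tests = Array.from({ length: 400 }, (_, i) => `test-${i + 1}`);
const shards = splitIntoShards(tests, 4);
console.log(shards.map((shard) => shard.length));
```

The point is only that every test lands in exactly one shard, so the merged report covers the whole suite.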

Why this matters for SDETs

A slow suite damages the whole engineering loop. Developers stop waiting for test results. QA engineers get blamed for delayed releases. Managers start asking which tests can be skipped. Sharding protects the value of end-to-end tests because it makes them usable inside a pull request workflow.

Playwright is not a small niche tool anymore. The Microsoft Playwright GitHub repository shows more than 91,000 stars, and the npm downloads API reports over 158 million monthly downloads for @playwright/test. Those numbers matter because teams are standardising around Playwright, and CI design is now a core SDET skill.

Parallelism vs Sharding: Do Not Mix the Concepts

Playwright sharding is not the same as worker-level parallelism. The official Playwright parallelism documentation says Playwright Test runs test files in parallel by using worker processes. By default, test files run in parallel, while tests inside a single file run in order unless you configure them differently.

That means you already get parallelism before you add shards. Sharding adds another layer above it.

Workers run inside one machine

Workers are processes on the same runner. If your CI machine has enough CPU and memory, multiple workers can run test files together. You control this with the workers option or the --workers CLI flag.

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: process.env.CI ? 4 : undefined,
  retries: process.env.CI ? 2 : 0,
  reporter: [['html'], ['list']],
});

This is useful, but it has a ceiling. One GitHub runner has limited CPU and memory. If you keep increasing workers, browsers compete for resources and tests become slower or flaky.

Shards run across machines

Shards run as separate jobs. Each job can still use workers internally. A four-shard matrix with four workers per job can run up to 16 tests at the same time, compared with four for a single job with four workers.

# Local example: run only the second shard in a four-shard suite
npx playwright test --shard=2/4

The practical rule is simple: tune workers first, then add shards. If your tests cannot run safely with multiple workers on one machine, sharding will expose the same isolation problems faster.

When fully parallel helps

Playwright also supports running tests inside a file in parallel using test.describe.configure({ mode: 'parallel' }) or fullyParallel: true. I use it only when each test is completely independent. If tests share a user, order, cart, database row, inbox, or local storage state, keep them isolated before you turn this on.

import { test, expect } from '@playwright/test';

test.describe.configure({ mode: 'parallel' });

test('guest can open pricing', async ({ page }) => {
  await page.goto('/pricing');
  await expect(page.getByRole('heading', { name: /pricing/i })).toBeVisible();
});

test('guest can open docs', async ({ page }) => {
  await page.goto('/docs');
  await expect(page.getByRole('heading', { name: /docs/i })).toBeVisible();
});

Create a Local Baseline First

Before I add Playwright sharding to CI, I collect a baseline. This prevents guesswork. If the suite takes 12 minutes locally but 48 minutes in CI, you may have a CI dependency problem, not a sharding problem.

Run the suite with a controlled worker count

Start with one worker. Then try two, four, and the CI worker count you plan to use. Write down the numbers. Do not trust a single run, because network-heavy tests naturally vary; repeat each setting a few times.

npx playwright test --workers=1
npx playwright test --workers=2
npx playwright test --workers=4

If runtime improves from one to two workers but gets worse at eight workers, your runner is saturated. You need fewer workers per shard, not more.
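
Once the numbers are written down, a small script can compare them. The sketch below is one way I might do it, assuming you record each run by hand; BaselineRun, medianByWorkers, bestWorkerCount, and the sample durations are all hypothetical placeholders, not measurements from a real suite.

```typescript
// Hypothetical record of one baseline run; fill in your own measurements.
interface BaselineRun {
  workers: number;
  seconds: number;
}

// Median duration per worker count, so one noisy run does not decide the result.
function medianByWorkers(runs: BaselineRun[]): Map<number, number> {
  const grouped = new Map<number, number[]>();
  for (const run of runs) {
    const durations = grouped.get(run.workers) ?? [];
    durations.push(run.seconds);
    grouped.set(run.workers, durations);
  }
  const medians = new Map<number, number>();
  grouped.forEach((durations, workers) => {
    const sorted = [...durations].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    const median =
      sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    medians.set(workers, median);
  });
  return medians;
}

// Worker count with the lowest median duration.
function bestWorkerCount(runs: BaselineRun[]): number {
  let bestWorkers = 0;
  let bestSeconds = Infinity;
  medianByWorkers(runs).forEach((seconds, workers) => {
    if (seconds < bestSeconds) {
      bestSeconds = seconds;
      bestWorkers = workers;
    }
  });
  return bestWorkers;
}

// Placeholder numbers for illustration only.
const placeholderRuns: BaselineRun[] = [
  { workers: 1, seconds: 600 }, { workers: 1, seconds: 620 },
  { workers: 2, seconds: 340 }, { workers: 2, seconds: 360 },
  { workers: 4, seconds: 230 }, { workers: 4, seconds: 250 },
  { workers: 8, seconds: 300 }, { workers: 8, seconds: 320 },
];
console.log(bestWorkerCount(placeholderRuns)); // 4 for these placeholder values
```

With these placeholder values the eight-worker runs are slower than the four-worker runs, which is the saturation pattern described above.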

Tag slow and destructive tests

Some tests should not sit in the same fast pull request lane. Payment flows, destructive admin flows, and heavy visual tests can run in nightly jobs. Playwright projects, tags, and grep patterns help you separate them.

import { test, expect } from '@playwright/test';

test('@smoke user can sign in', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('Password123!');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('banner')).toContainText('Dashboard');
});

test('@nightly admin can export audit report', async ({ page }) => {
  // Keep slow export flows outside the PR lane.
});
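
The idea behind tag-based lanes can be sketched without Playwright at all: a pattern is matched against test titles, and only the matches run. The helper selectByTag and the title list are hypothetical; the real filtering is done by Playwright when you pass --grep.

```typescript
// Hypothetical titles; the @tags follow the naming used in the tests above.
const titles: string[] = [
  '@smoke user can sign in',
  '@nightly admin can export audit report',
  '@smoke guest can open pricing',
];

// Mimics the idea behind --grep: keep only titles that match the pattern.
function selectByTag(allTitles: string[], pattern: RegExp): string[] {
  return allTitles.filter((title) => pattern.test(title));
}

const prLane = selectByTag(titles, /@smoke/);
const nightlyLane = selectByTag(titles, /@nightly/);

console.log(prLane.length, nightlyLane.length); // 2 smoke titles, 1 nightly title
```

On the command line, the pull request lane would use npx playwright test --grep @smoke, and --grep-invert @nightly is an option when you want everything except the nightly tests.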

For locator strategy and assertion discipline, revisit Playwright Locators and Assertions. Sharding gives bad selectors more chances to fail, so strong locators matter.

What the baseline report should show

At this step, the Playwright HTML report should show a stable pass rate, with the same tests passing at one worker and at four workers. If the four-worker run has random failures, fix isolation before you touch shards.

Playwright Sharding in GitHub Actions

The cleanest GitHub Actions setup uses a matrix. Each matrix entry runs one shard. The workflow below uses four shards, installs browsers once per job, uploads blob reports, and leaves the final HTML merge to another job.

name: Playwright Sharded Tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npx playwright install --with-deps
      - name: Run Playwright shard
        run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }} --reporter=blob
      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report
          retention-days: 7

Notice fail-fast: false. I set it to false because I want every shard to finish and upload its report. With the default (true), if shard 1 fails and GitHub cancels the remaining jobs, you lose the complete failure picture.

Pick the right shard count

Start with two or four shards. Eight shards can be useful for large suites, but it also increases CI minutes, artifact handling, and debugging noise. A good first target is to cut the pull request suite below 10 minutes without creating flaky tests.

Measure current runtime with one CI job.
Try two shards with the same worker count.
Try four shards only if two shards are still slow.
Compare pass rate, report quality, and CI cost.
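
Before running the experiments above, I find a rough estimate useful. The sketch below assumes a deliberately simple model: tests split evenly across shards and every job pays the same fixed setup time (checkout, npm ci, browser install). Real shards are rarely perfectly even, so treat the output as a starting point. estimateShardRun and the input numbers are hypothetical.

```typescript
interface ShardEstimate {
  shards: number;
  wallClockSeconds: number; // time until a shard finishes, assuming an even split
  billedSeconds: number; // sum of all job durations, a proxy for CI cost
}

// Simple model: each job pays the setup cost, tests divide evenly.
function estimateShardRun(
  testSeconds: number,
  setupSeconds: number,
  shards: number,
): ShardEstimate {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new Error('shards must be a positive integer');
  }
  const perShard = setupSeconds + testSeconds / shards;
  return { shards, wallClockSeconds: perShard, billedSeconds: perShard * shards };
}

// Placeholder inputs: 20 minutes of tests, 2 minutes of setup per job.
for (const shards of [1, 2, 4]) {
  console.log(estimateShardRun(1200, 120, shards));
}
```

With these placeholder inputs, four shards bring the wall-clock time from 1320 to 420 seconds while billed time rises from 1320 to 1680 seconds. That is the trade-off between feedback speed and CI cost in one line.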

Use project selection with shards

If you run Chromium, Firefox, and WebKit on every pull request, sharding alone may not be enough. Many teams run Chromium smoke tests on pull requests and run full browser coverage at night.

# Pull request lane
npx playwright test --project=chromium --grep @smoke --shard=1/4

# Nightly lane
npx playwright test --project=chromium --project=firefox --project=webkit

This is not cheating. It is risk-based testing. The pull request lane answers, “Did this change break the main user journey?” The nightly lane answers, “Do we still have broad confidence across browsers?”

Merge Reports After Sharding

A sharded run is incomplete until reports are merged. Playwright supports blob reports for this exact reason. Each shard produces a blob report. A final job downloads all artifacts and runs npx playwright merge-reports.

  merge-reports:
    if: always()
    needs: [test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - name: Merge Playwright reports
        run: npx playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload HTML report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-html-report
          path: playwright-report
          retention-days: 14

After a successful run, the GitHub Actions page should show four green test jobs and one merge-reports job. The final artifact should be named playwright-html-report. When you open it, the report should list all tests, not only the tests from one shard.

Keep traces useful

Traces become more important after sharding because failure context is scattered across jobs. I set trace to 'on-first-retry' in CI, so a trace is recorded only when a failed test is retried for the first time. This gives enough data without storing huge trace files for every passing test.

import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  reporter: process.env.CI ? [['blob']] : [['html'], ['list']],
});

If traces are new for you, read Playwright Trace Viewer: Day 7 Tutorial. A trace is often faster than adding five console logs and rerunning the shard.

Handle Data, Auth, and Isolation

Most sharding failures are not caused by Playwright. They are caused by shared data. Four shards create four sets of browsers that may hit the same backend at the same time. If two tests mutate the same user, order, or environment flag, you get random failures.

Use unique data per worker

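Playwright exposes worker information through testInfo, which fixtures and tests can read. You can use testInfo.workerIndex to create unique emails, tenants, carts, or records. The index is unique only within one job, so in a sharded run combine it with a timestamp or a shard identifier. This is not optional for serious parallel execution.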

import { test as base, expect } from '@playwright/test';

type Fixtures = {
  userEmail: string;
};

export const test = base.extend<Fixtures>({
  userEmail: async ({}, use, testInfo) => {
    const email = `qa-${Date.now()}-w${testInfo.workerIndex}@example.test`;
    await use(email);
  },
});

test('new user can start checkout', async ({ page, userEmail }) => {
  await page.goto('/signup');
  await page.getByLabel('Email').fill(userEmail);
  await page.getByRole('button', { name: 'Create account' }).click();
  await expect(page.getByText(userEmail)).toBeVisible();
});
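
In a sharded run, worker indexes repeat across jobs, because every shard starts its own workers. A slightly more defensive variant of the email builder is sketched below. It assumes you pass the shard number in yourself, for example from an environment variable that the workflow sets from matrix.shardIndex; that variable and the uniqueEmail helper are hypothetical.

```typescript
// Counter that keeps emails distinct inside one worker process.
let sequence = 0;

// Hypothetical helper: combines shard, worker, time, a counter, and a random
// suffix so emails stay distinct across shards, workers, and repeated calls.
function uniqueEmail(shardIndex: number, workerIndex: number): string {
  sequence += 1;
  const random = Math.floor(Math.random() * 1e9).toString(36);
  return `qa-s${shardIndex}-w${workerIndex}-${Date.now()}-${sequence}-${random}@example.test`;
}

// Example: shard 2 of 4, worker 3.
console.log(uniqueEmail(2, 3));
```

Inside the fixture shown above, the worker argument would come from testInfo.workerIndex. The readable prefix also helps when you trace a leftover record in the database back to the job that created it.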

We covered fixtures on Day 6. If your test setup is still duplicated across files, revisit Playwright Fixtures and Hooks before scaling the suite.

Store authentication safely

Authentication state is another common problem. A single shared logged-in account can break when tests run together. Prefer one of these patterns:

Read-only user accounts for read-only tests.
Per-worker accounts for flows that modify data.
API-created users for flows that need a clean state.
Separate admin credentials for destructive admin scenarios.

If you use storage state, make sure the account behind it can support the test load. Day 10 covers this in detail: Playwright Authentication.
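
For the per-worker pattern, one approach is a fixed pool of accounts with one slot per concurrent worker. The sketch below assumes you know the shard number and the workers-per-shard setting; accountSlot, accountForWorker, and the account emails are hypothetical. The parallelIndex argument is meant to come from testInfo.parallelIndex, which ranges from 0 to workers - 1 and stays the same when a worker restarts.

```typescript
interface TestAccount {
  email: string;
}

// Maps (shard, worker) to a distinct slot in the pool. shardIndex is 1-based,
// matching the --shard=current/total syntax.
function accountSlot(shardIndex: number, workersPerShard: number, parallelIndex: number): number {
  return (shardIndex - 1) * workersPerShard + parallelIndex;
}

// Fails loudly when the pool is too small instead of silently sharing accounts.
function accountForWorker(
  pool: TestAccount[],
  shardIndex: number,
  workersPerShard: number,
  parallelIndex: number,
): TestAccount {
  const slot = accountSlot(shardIndex, workersPerShard, parallelIndex);
  if (slot < 0 || slot >= pool.length) {
    throw new Error(`No account for slot ${slot}; pool has ${pool.length} accounts`);
  }
  return pool[slot];
}

// Hypothetical pool sized for 4 shards x 4 workers.
const accountPool: TestAccount[] = Array.from({ length: 16 }, (_, i) => ({
  email: `qa-pool-${i}@example.test`,
}));

console.log(accountForWorker(accountPool, 2, 4, 3).email); // slot 7
```

Throwing on an undersized pool is deliberate. A shared account is exactly the kind of silent collision that produces random failures later.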

Clean up without hiding bugs

I prefer test data that expires automatically over aggressive cleanup in afterEach. Cleanup code can fail and hide the real product issue. For many QA environments, a daily cleanup job is safer than deleting records during the test run.
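
A daily cleanup job needs a rule for what counts as expired. The sketch below shows one possible rule, assuming each record stores who created it and when; the TestRecord shape, the 'playwright' marker, and expiredRecords are hypothetical and would map to whatever your backend actually stores.

```typescript
// Hypothetical shape of a record created by a test run.
interface TestRecord {
  id: string;
  createdBy: string; // marker your tests set, e.g. 'playwright'
  createdAt: Date;
}

// Selects only test-created records older than the allowed age.
function expiredRecords(records: TestRecord[], now: Date, maxAgeHours: number): TestRecord[] {
  const cutoff = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return records.filter(
    (record) => record.createdBy === 'playwright' && record.createdAt.getTime() < cutoff,
  );
}

const now = new Date('2024-01-02T12:00:00Z');
const sampleRecords: TestRecord[] = [
  { id: 'a', createdBy: 'playwright', createdAt: new Date('2024-01-01T00:00:00Z') },
  { id: 'b', createdBy: 'playwright', createdAt: new Date('2024-01-02T06:00:00Z') },
  { id: 'c', createdBy: 'manual', createdAt: new Date('2023-12-01T00:00:00Z') },
];

console.log(expiredRecords(sampleRecords, now, 24).map((record) => record.id)); // only 'a'
```

Because the job never touches recent records, a failed test leaves its data in place for debugging, which is the main reason I prefer this over afterEach cleanup.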

Common Pitfalls I See in Teams

Playwright sharding looks simple, so teams add it too early. Then they spend two weeks debugging flaky tests that were already unsafe. Here are the patterns I watch for during framework reviews.

Pitfall 1: Sharding a messy suite

If tests depend on order, sharding will break them. A test that assumes “create customer” ran before “edit customer” is not an independent test. Put that flow inside one test or create the required state through an API call.

Pitfall 2: Too many workers per shard

A four-shard setup with eight workers per shard creates 32 worker processes. That can crush a small staging environment. Your bottleneck may be the application under test, not the CI runner. Watch backend CPU, database connections, queue latency, and rate limits.
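
The arithmetic is worth writing down before choosing a worker count. In the sketch below, sessionBudget stands for the number of concurrent browser sessions your staging environment can handle; that number is an assumption you would have to measure yourself, and both helper names are hypothetical.

```typescript
// Peak number of worker processes hitting the backend at once.
function peakWorkers(shards: number, workersPerShard: number): number {
  return shards * workersPerShard;
}

// Largest workers-per-shard value that stays within the budget (at least 1).
function maxWorkersPerShard(shards: number, sessionBudget: number): number {
  return Math.max(1, Math.floor(sessionBudget / shards));
}

console.log(peakWorkers(4, 8)); // 32, the case described above
console.log(maxWorkersPerShard(4, 16)); // 4 workers per shard for a budget of 16
```

If the budget is the limiting factor, fewer workers per shard is usually the cheaper fix than a bigger staging environment.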

Pitfall 3: Ignoring retry data

The Playwright retries documentation explains that failed tests are retried in a fresh worker process. Retries are useful for diagnostics, but they should not become a hiding place for flaky tests. Track retry count as a quality metric. A suite that passes after retry every day is still unhealthy.
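
Tracking retries can start very small. The sketch below assumes you have already extracted one record per test from your report; the TestOutcome shape is a simplified, hypothetical format, not the schema of any Playwright reporter.

```typescript
// Simplified, hypothetical per-test record extracted from a report.
interface TestOutcome {
  title: string;
  attempts: number; // 1 means it passed or failed without a retry
  passed: boolean;
}

// A test that passed only after a retry counts as flaky.
function flakyTests(outcomes: TestOutcome[]): TestOutcome[] {
  return outcomes.filter((outcome) => outcome.passed && outcome.attempts > 1);
}

// Share of tests that needed a retry to pass, between 0 and 1.
function flakyRate(outcomes: TestOutcome[]): number {
  if (outcomes.length === 0) return 0;
  return flakyTests(outcomes).length / outcomes.length;
}

const sampleOutcomes: TestOutcome[] = [
  { title: 'user can sign in', attempts: 1, passed: true },
  { title: 'user can check out', attempts: 2, passed: true },
  { title: 'admin can export', attempts: 3, passed: false },
  { title: 'guest can open docs', attempts: 1, passed: true },
];

console.log(flakyRate(sampleOutcomes)); // 0.25 for this sample
```

Reviewing this number weekly makes the trend visible. A green pipeline with a rising flaky rate is a warning, not a success.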

Pitfall 4: No ownership model

Once a suite is sharded, failures appear in different jobs. Assign ownership by area or tag. Checkout failures go to the commerce team. Login failures go to the identity team. Visual baseline failures go to the frontend owner. Without ownership, the QA team becomes the permanent failure router.
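
Ownership by tag can be as simple as a lookup table. The tags and team names below are hypothetical examples that mirror the routing described above; ownerFor is not part of Playwright.

```typescript
// Hypothetical mapping from a title tag to the owning team.
const owners: Record<string, string> = {
  '@checkout': 'commerce',
  '@login': 'identity',
  '@visual': 'frontend',
};

// Returns the owner of the first known tag in the title, or a triage fallback.
function ownerFor(title: string): string {
  for (const tag of Object.keys(owners)) {
    if (title.includes(tag)) {
      return owners[tag];
    }
  }
  return 'qa-triage';
}

console.log(ownerFor('@checkout user can pay by card')); // commerce
console.log(ownerFor('untagged test')); // qa-triage
```

The fallback matters: untagged failures still land somewhere, but a growing qa-triage queue tells you which tests need a tag.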

Production Checklist for Playwright Sharding

Use this checklist before you mark sharding as done. It is short, but it catches most production mistakes.

The suite passes with one worker, two workers, and the planned CI worker count.
Every test creates or receives isolated data.
Shared accounts are read-only or split per worker.
The GitHub Actions matrix uses fail-fast: false.
Each shard uploads blob reports even when tests fail.
The merge job runs with if: always().
The merged HTML report contains all tests from all shards.
Traces, screenshots, and videos are retained only when useful.
Slow tests are tagged and moved to nightly when needed.
Retry count is reviewed weekly, not ignored.

A realistic India team setup

For many teams in India, especially service teams moving from Selenium to Playwright, I suggest this path: keep pull request tests under 10 minutes, run full cross-browser coverage at night, and teach every SDET to read traces. Product companies often expect this pipeline maturity from mid-level SDETs, and it shows up in interviews for ₹25-40 LPA automation roles.

Do not pitch sharding as a fancy DevOps trick. Pitch it as engineering discipline: fast feedback, stable reports, and fewer blocked merges.

FAQ

Should I use Playwright sharding on day one?

No. First make the suite stable with normal workers. Add sharding when runtime becomes a real pull request problem.

How many shards should I start with?

Start with two. Move to four when you have enough tests and a stable environment. More shards are useful only when the suite is large enough to justify the CI cost.

Does sharding fix flaky tests?

No. It usually exposes flaky tests. If a test depends on shared state, timing, or order, sharding makes the failure more visible.

Can I shard only smoke tests?

Yes. Use --grep @smoke with --shard. This is a practical pattern for fast pull request feedback.

Key Takeaways

Playwright sharding is the right upgrade when your Playwright + TypeScript suite is stable but too slow for pull requests.

Workers improve speed inside one runner; shards split work across runners.
Use GitHub Actions matrix jobs for clean shard execution.
Always merge blob reports into one HTML report.
Fix data isolation before scaling parallel execution.
Track retries because a retry-based pass is still a signal.

Tomorrow we move from execution speed to test observability: reporters, annotations, and custom metadata that make failures easier to triage.

Sources: Playwright documentation on parallelism, sharding, CI, and retries; GitHub API data for microsoft/playwright; npm downloads API for @playwright/test.