
Playwright Sharding and Docker: How I Cut My Test Suite from 47 Minutes to 8 Minutes

Last quarter, my Playwright regression suite hit 47 minutes on a single GitHub Actions runner. Developers started skipping pre-merge checks. Product managers asked why “a few tests” took longer than the build itself. I knew the suite was large, but I did not realize how much time we were lighting on fire until I measured it properly.

Three weeks later, the same suite ran in 8 minutes and 12 seconds. The change was not a new framework, a cloud vendor switch, or hiring a DevOps specialist. It was two built-in Playwright features — sharding and Docker — configured with precision instead of hope.

In this article, I show you the exact Playwright sharding and Docker setup that produced an 83% runtime reduction. I include the full GitHub Actions YAML, the mistakes that cost me two days, and the numbers that convinced my engineering manager to approve the change in one Slack message.


The Starting Point: Why 47 Minutes Is a Death Sentence

Our suite had 312 end-to-end tests across 28 spec files. Each test averaged 8.5 seconds, but sequential execution on a single runner with workers: 1 (the Playwright CI default) meant simple arithmetic: 312 tests × 8.5 seconds = 2,652 seconds, or roughly 44 minutes. Add setup time, dependency installation, and artifact upload, and we landed at 47 minutes.

Here is what 47 minutes does to a team:

  • Developers merge broken code: When a PR check takes 47 minutes, developers stop waiting. They merge, go home, and hope nothing breaks overnight.
  • Flakiness compounds: Long suites run less frequently. When they do run, they hit more environmental variance — network hiccups, staging instability, resource contention — which produces flaky failures.
  • CI bills balloon: GitHub Actions bills by the minute. A 47-minute job running 20 times a day costs 940 minutes daily. At $0.008 per minute for Teams plans, that is $7.52 per day, or $225 per month for one workflow.

The real cost is not money. It is trust. When the pipeline is slower than a coffee break, the team stops treating it as a quality gate. It becomes a ceremonial step. I see this in almost every organization where I consult. The suite grows, no one optimizes it, and eventually it dies from neglect.

Playwright now sees 134.7 million npm downloads per month as of April 2026, compared to Cypress at 30.4 million and selenium-webdriver at 8.39 million. That is a 16× gap over Selenium. With that scale comes a responsibility to run tests efficiently. Sharding is how you do it.

What Playwright Sharding Actually Does

Sharding splits your test suite into smaller chunks called shards. Each shard runs independently on its own machine. If you split a 312-test suite into 6 shards, each shard runs roughly 52 tests. On identical hardware, the total runtime drops from 47 minutes to under 8 minutes because the shards execute in parallel.

Playwright handles the splitting via the --shard=x/y flag. For example, to split into 6 shards:

npx playwright test --shard=1/6
npx playwright test --shard=2/6
npx playwright test --shard=3/6
npx playwright test --shard=4/6
npx playwright test --shard=5/6
npx playwright test --shard=6/6

Each command receives a different subset of tests. In CI, you run these commands on separate jobs, and the orchestrator — GitHub Actions, GitLab CI, Azure Pipelines — schedules them concurrently.

It is important to understand that sharding is not the same as workers. Workers run tests concurrently on a single machine using shared CPU and memory. Sharding distributes tests across multiple machines, each with its own isolated resources. You can combine both: 6 shards × 4 workers per shard = 24 tests running simultaneously across the fleet. This is the combination that produces the 83% reduction I measured.

Balancing Shards: File-Level vs Test-Level

Playwright supports two levels of sharding granularity, and choosing the wrong one can leave you with lopsided shards that defeat the purpose.

Without fullyParallel: true: Playwright assigns entire test files to shards. If one file contains 40 tests and another contains 3, the first shard is overloaded. This is the default behavior, and it is dangerous for teams with uneven test files.

With fullyParallel: true: Playwright splits individual tests across shards. Each shard receives an approximately equal number of tests regardless of file size. This is the mode I use, and it is the reason my 6-shard setup stays within 30 seconds of perfect balance.

In your playwright.config.ts:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,
  workers: process.env.CI ? 4 : undefined,
});

The workers: 4 setting tells each shard to run 4 tests concurrently on its own runner. On GitHub Actions ubuntu-latest (2 vCPU, 7 GB RAM), 4 workers is the sweet spot. More workers cause memory pressure and browser crashes. Fewer workers leave CPU cycles on the table.

Merging Shard Reports

Each shard produces its own report. To see a unified HTML report, Playwright provides the blob reporter and a merge-reports CLI command. I configure blob reporting in CI and merge the results in a downstream job. This gives me one HTML report with all 312 tests, traces, and screenshots — exactly what I would get from a single non-sharded run.

In playwright.config.ts:

reporter: process.env.CI ? 'blob' : 'html',

After all shards finish, I run:

npx playwright merge-reports --reporter html ./all-blob-reports

This produces a standard playwright-report directory with combined results.

Docker’s Role: Consistency Begets Speed

Sharding alone would have cut my time, but Docker is what made the numbers stable. Without Docker, my shards showed 15% runtime variance between runs. One shard finished in 7 minutes, another in 10. The culprit was environment drift: different browser versions, missing system dependencies, and inconsistent locale settings across GitHub Actions runners.

I switched to the official Playwright Docker image, and variance dropped to under 3%.

The Image I Use

mcr.microsoft.com/playwright:v1.59.1-noble

This image is based on Ubuntu 24.04 LTS and contains Chromium, Firefox, WebKit, and all system dependencies pre-installed. The compressed size is approximately 872 MB. The first CI run downloads it in 2–3 minutes. Subsequent runs cache the image layer and start in under 30 seconds.

Pinning is non-negotiable. If your package.json installs Playwright 1.59.1 and your Docker image runs 1.58, Playwright cannot locate browser executables. The error is cryptic. The fix is trivial: pin everything.

Docker Flags That Matter

Three flags make or break Playwright in Docker:

  • --init: Prevents zombie browser processes by handling PID 1 correctly. Without this, long suites leak Chromium instances and exhaust memory.
  • --ipc=host: Required for Chromium. Without shared memory, Chromium runs out of memory on JavaScript-heavy pages and crashes with RESULT_CODE_KILLED errors.
  • --user pwuser: Only needed for untrusted sites. For end-to-end tests on your own application, root is fine and avoids permission headaches.

My GitHub Actions container block looks like this:

container:
  image: mcr.microsoft.com/playwright:v1.59.1-noble
  options: --init --ipc=host
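For local debugging, the same image and flags translate directly to a docker run command. A minimal sketch, assuming your project sits in the current directory and installs its dependencies inside the container (paths and the shard number are illustrative):

# Mount the project, install dependencies inside the container, then run one shard
docker run --init --ipc=host --rm \
  -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.59.1-noble \
  /bin/bash -c "npm ci && npx playwright test --shard=1/6"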

If you are still installing browsers via npx playwright install --with-deps on every run, you are burning 45–90 seconds per job. The Docker image contains the browsers already. That time savings multiplies across 6 shards: 6 × 60 seconds = 6 minutes saved per pipeline run.

For a deeper walkthrough of the Docker + GitHub Actions baseline setup, see my Playwright Docker GitHub Actions CI/CD pipeline guide.

The Exact Configuration That Cut My Time

This is the complete GitHub Actions workflow I run in production. It shards across 6 jobs, uses the official Docker image, merges blob reports, and uploads artifacts. Copy it, change the shard count to match your suite size, and run it.

name: Playwright Tests (Sharded)
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  playwright-tests:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.59.1-noble
      options: --init --ipc=host
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4, 5, 6]
        shardTotal: [6]
    steps:
      - uses: actions/checkout@v5

      - name: Cache Node modules
        uses: actions/cache@v4
        with:
          path: ~/.npm
          key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      - name: Install dependencies
        run: npm ci

      - name: Run Playwright tests
        run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}

      - name: Upload blob report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report/
          retention-days: 1

  merge-reports:
    if: ${{ !cancelled() }}
    needs: [playwright-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v5
        with:
          node-version: lts/*

      - name: Install dependencies
        run: npm ci

      - name: Download blob reports
        uses: actions/download-artifact@v5
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true

      - name: Merge reports
        run: npx playwright merge-reports --reporter html ./all-blob-reports

      - name: Upload HTML report
        uses: actions/upload-artifact@v4
        with:
          name: html-report
          path: playwright-report/
          retention-days: 14

How the Math Works

Before sharding:

  • Single runner, 312 tests, workers = 1
  • Runtime: 47 minutes

After sharding:

  • 6 runners, ~52 tests per shard, workers = 4 per shard
  • Shard runtime: 7–9 minutes
  • Total pipeline time (longest shard + merge job): 8 minutes 12 seconds

The 83% reduction comes from two factors:

  1. Horizontal scaling: 6 machines instead of 1.
  2. Per-shard parallelism: 4 workers per shard instead of 1.

GitHub Actions bills by total compute minutes, not pipeline wall-clock time. The single-runner job cost 47 minutes of compute. The 6-shard job costs roughly 6 × 8 = 48 minutes of compute plus 2 minutes for the merge job. The bill is essentially identical, but the developer experience is night and day.

The Three Tweaks That Matter More Than Sharding

Sharding is the headline, but three smaller optimizations account for an additional 25% of the time savings. Without them, my shards would have run 10–11 minutes instead of 8.

1. Enable fullyParallel

I mentioned this earlier, but it deserves its own bullet. Before fullyParallel: true, my largest spec file had 38 tests. That file always landed on one shard, creating a 12-minute straggler. With test-level sharding, the load balanced to within 4% across all shards.

2. Cache Node Modules Aggressively

The actions/cache@v4 step in my workflow stores ~/.npm between runs. On a warm cache, npm ci drops from 90 seconds to 11 seconds. Across 6 shards, that is 6 × 79 seconds = 474 seconds, or nearly 8 minutes of total compute saved per run.

3. Use Blob Reports, Not HTML Reports per Shard

My first attempt uploaded an HTML report from every shard. Each report was 45–120 MB depending on trace and screenshot volume. Uploading 6 reports took 3 minutes total. Switching to blob reports cut the upload size to 8–15 MB per shard, and the merge job handles the final HTML generation in 40 seconds.

If you want to compare Playwright’s parallelization model with Selenium Grid, my Playwright vs Selenium stability analysis breaks down why file-level parallelism in Selenium often produces worse balancing than Playwright’s test-level approach.

What This Costs on GitHub Actions

I run this on public repositories where GitHub Actions is free. For private repositories, here is the math.

GitHub Actions pricing for private repos:

  • Free plan: 2,000 minutes per month
  • Team plan: $4 per user per month, includes 3,000 minutes, then $0.008 per minute
  • Enterprise: $21 per user per month, includes 50,000 minutes, then $0.008 per minute

My pre-sharding workflow consumed 47 minutes per run. At 20 runs per day (push + PR on a busy team), that is 940 minutes daily, or 18,800 minutes in a 20-workday month. A Team plan would exceed its included minutes by 15,800 minutes, costing an additional $126.40 per month.

Post-sharding, the compute consumption is roughly 50 minutes per run (6 shards × 8 minutes + merge job). Monthly consumption is 20,000 minutes. The additional cost is $136 per month. The dollar difference is negligible, but the velocity difference is massive.

For teams on the free plan, the 2,000-minute limit means you can run the sharded pipeline approximately 40 times per month. If you need more, you either pay or switch to self-hosted runners.

Self-Hosted Runners: When They Make Sense

GitHub-hosted runners cap at 2 vCPU and 7 GB RAM. If your application is memory-heavy or you need more than 4 workers per shard, self-hosted runners on AWS EC2 c6i.2xlarge instances (8 vCPU, 16 GB RAM) cost roughly $0.17 per hour on-demand. At 20 runs per day with 6 shards, each shard runs for 8 minutes. Total daily compute is 6 shards × 8 minutes × 20 runs = 960 minutes, or 16 hours. Daily cost: $2.72. Monthly cost: approximately $54 — cheaper than GitHub’s overage billing for high-volume private repos.
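If you do go self-hosted, the workflow change is small. A minimal sketch, assuming your EC2 runners are registered with the labels self-hosted, linux, and x64 — only the runs-on line differs from the workflow above:

  playwright-tests:
    timeout-minutes: 30
    runs-on: [self-hosted, linux, x64]
    container:
      image: mcr.microsoft.com/playwright:v1.59.1-noble
      options: --init --ipc=host

Container jobs also require Docker to be installed on the runner host, which is part of the maintenance burden described below.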

The trade-off is maintenance. You now manage runner pools, autoscaling groups, and AMI updates. I recommend self-hosted runners only after you have exhausted sharding and worker optimization on GitHub-hosted runners. Do not throw hardware at a configuration problem.

India Context: What Startups and Product Companies Actually Use

In Bengaluru’s product startup ecosystem, the Playwright + GitHub Actions + Docker stack is now the default for teams paying ₹20–40 LPA for SDET roles. Service companies still run Selenium Grid on self-hosted VMs, but that gap is closing.

Here is what I see in 2026:

  • Product startups (Series A to D): 80% use GitHub Actions or GitLab CI. Docker is standard. Sharding is expected once suites cross 150 tests.
  • Fintech and healthtech: These sectors keep HTML reports for 90 days and require trace files for every failed test. Artifact retention is a compliance feature, not a convenience.
  • Service companies transitioning to automation: Many are on Jenkins with on-premise agents. The migration path is Jenkins → GitHub Actions → Docker, usually over 12–18 months.

If you are an SDET interviewing in 2026, knowing how to configure sharded Playwright pipelines is table stakes for product companies. I ask candidates to whiteboard this exact workflow in my SDET career sessions. The ones who can draw the container, the cache layer, and the shard matrix get offers.

For manual testers looking to make the jump, my AI SDET roadmap covers the Docker and CI/CD skills that separate ₹8 LPA candidates from ₹25 LPA candidates.

Common Failures When Sharding Playwright Tests

After helping three teams implement this pattern, I have a short list of failures that appear again and again.

Failure 1: Imbalanced Shards Without fullyParallel

Symptom: One shard finishes in 5 minutes, another takes 14.

Cause: A single spec file contains too many tests. Playwright assigns the entire file to one shard.

Fix: Enable fullyParallel: true or split large spec files into smaller ones.

Failure 2: Missing Blob Reports on Failure

Symptom: A shard fails, but the merge job crashes because it cannot find the blob report.

Cause: The blob upload step runs under the default success() condition (or a narrow condition such as if: failure()), so shards that fail, are cancelled, or time out never upload their blob report, and the merge job cannot find it.

Fix: Use if: ${{ !cancelled() }} for all blob uploads, and set fail-fast: false on the matrix so one failed shard does not kill the others.

Failure 3: Docker Image Version Drift

Symptom: browserType.launch: Executable doesn't exist

Cause: The Docker image tag does not match the Playwright version in package.json.

Fix: Pin both to the same exact version. I use a shell script in CI that reads package.json and constructs the Docker image tag dynamically:

# Read the pinned version and strip any semver range prefix, so "^1.59.1" becomes "1.59.1"
PLAYWRIGHT_VERSION=$(node -p "require('./package.json').devDependencies['@playwright/test']" | tr -d '^~')
echo "mcr.microsoft.com/playwright:v${PLAYWRIGHT_VERSION}-noble"

Failure 4: Memory Crashes on 4 Workers

Symptom: Random browserType.launch: Protocol error or container OOM kills.

Cause: 4 workers × 3 browsers × heavy JavaScript apps exceed the 7 GB RAM limit on ubuntu-latest.

Fix: Drop workers to 2, or shard more aggressively (8–10 shards) so each shard runs fewer concurrent tests. You can also use GitHub’s larger runners (4 vCPU, 16 GB RAM) at $0.016 per minute.

Key Takeaways

  • A 47-minute Playwright suite is not a law of nature. It is a configuration problem.
  • Sharding splits tests across machines. Six shards turned my 47-minute suite into an 8-minute suite.
  • Docker eliminates environment drift. The official mcr.microsoft.com/playwright image saves 45–90 seconds per job by pre-installing browsers.
  • fullyParallel: true is essential for balanced shards. Without it, one large spec file becomes a straggler.
  • The compute cost of sharding is roughly equal to single-runner execution. The developer velocity gain is exponential.
  • For teams in India, this skill set is now expected in product companies paying ₹20–40 LPA for SDET roles.

Frequently Asked Questions

How many shards do I need?

Start with one shard per 50 tests. My 312-test suite uses 6 shards. If your suite is under 100 tests, sharding may not be worth the orchestration overhead. If it is over 500, consider 10–12 shards or self-hosted runners with more cores.

Does sharding work with GitLab CI and Azure Pipelines?

Yes. GitLab CI uses parallel: matrix and Azure Pipelines uses strategy: matrix. The --shard flag is framework-agnostic. The concepts in this article apply to any CI system that supports parallel jobs.
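For reference, a minimal GitLab CI sketch of the same six-shard idea — the job name and stage are illustrative, and this is not tuned the way the GitHub Actions workflow above is:

playwright:
  stage: test
  image: mcr.microsoft.com/playwright:v1.59.1-noble
  parallel:
    matrix:
      - SHARD_INDEX: ["1", "2", "3", "4", "5", "6"]
  script:
    - npm ci
    - npx playwright test --shard=${SHARD_INDEX}/6
  artifacts:
    when: always
    paths:
      - blob-report/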

Can I shard across different operating systems?

Yes, but you need separate blob reports per OS because screenshots and trace paths may differ. Merge them into a single HTML report using the same merge-reports command.
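In GitHub Actions terms, that means adding an os dimension to the matrix and including it in the artifact name so per-OS blob reports do not collide. A sketch of the two fragments that change (the Linux container block does not apply on Windows runners, so that leg installs browsers directly):

    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]
        shardIndex: [1, 2, 3, 4, 5, 6]
    runs-on: ${{ matrix.os }}

      - name: Upload blob report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.os }}-${{ matrix.shardIndex }}
          path: blob-report/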

What about test dependencies and global setup?

Playwright runs global setup once per shard, not once per suite. If your setup is expensive (database seeding, environment provisioning), running it 6 times may add overhead. Consider running setup in a separate job, persisting state to an artifact, and downloading it in each shard.
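A sketch of that pattern, assuming a hypothetical seed script that writes its output to storage-state.json — names and paths are illustrative:

  seed:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - run: npm ci
      - run: node scripts/seed.js   # hypothetical script that seeds data and writes storage-state.json
      - uses: actions/upload-artifact@v4
        with:
          name: seed-state
          path: storage-state.json

  playwright-tests:
    needs: seed
    steps:
      - uses: actions/download-artifact@v5
        with:
          name: seed-state
      # ...then install dependencies and run the sharded tests as in the main workflow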

Is Docker required for sharding?

No. You can shard on bare GitHub Actions runners using npx playwright install --with-deps. Docker adds consistency and saves install time. For teams without Docker expertise, start with bare runners and add Docker once you see version drift.
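The bare-runner variant of the shard job — a sketch with the container block removed and browsers installed explicitly — looks like this:

  playwright-tests:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4, 5, 6]
        shardTotal: [6]
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v5
        with:
          node-version: lts/*
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}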
