Docker Compose for Test Automation: Scaling Playwright Grids in 2026

Running 400 Playwright tests on your laptop works until it does not. The fans spin, Chrome crashes, and your CI pipeline times out after 47 minutes. I moved my team’s Playwright suite into a Docker Compose grid last quarter. Execution time dropped from 41 minutes to 12 minutes on Docker Compose (9 minutes on an 8-pod Kubernetes setup). Here is the exact architecture, the compose file, and the mistakes that cost me two days.

Table of Contents

Why Docker Compose Beats Local Execution
The Data: Playwright Downloads and Docker Stars
Architecture: Playwright Workers in Docker Compose
The Docker Compose File I Run in Production
Scaling Workers with Shards and Projects
CI Integration: GitHub Actions to Kubernetes
Performance Benchmarks: Local vs Docker vs Kubernetes
The India Context: Cost Per Execution Hour
Common Failures and Fixes
Key Takeaways
FAQ

Why Docker Compose Beats Local Execution

Playwright is fast. A single test can run in under a second. But suites grow. My team started with 32 tests. Six months later we had 412. Local execution with 4 workers took 41 minutes. That is long enough to blow through a tight CI timeout and for developers to context-switch out of the feedback loop.

Docker Compose solves three specific problems:

Resource isolation. Each worker container gets its own browser processes. No more Chrome zombie processes leaking memory on the host.
Reproducible environments. The container has the exact same browser versions, system dependencies, and Node runtime as CI. “Works on my machine” disappears.
Horizontal scaling. You define a worker service once and scale it with docker compose up --scale worker=8.

Playwright’s official Docker image, mcr.microsoft.com/playwright, includes Chromium, Firefox, and WebKit with all system dependencies pre-installed. You do not need to manage browser downloads on each agent.

The Data: Playwright Downloads and Docker Stars

Before adopting any infrastructure pattern, I check adoption signals. Playwright’s npm package recorded 219 million downloads in the last month alone. The @playwright/test runner added 149 million. On GitHub, the Microsoft Playwright repository has 89,295 stars and was last pushed on May 24, 2026. This is not a dying tool.

Docker Compose, the orchestration layer we use, has 37,424 GitHub stars and active maintenance. It is the simplest way to run multi-container setups without learning Kubernetes. For QA teams that do not have a dedicated DevOps engineer, Docker Compose is the sweet spot.

Architecture: Playwright Workers in Docker Compose

My grid has three services:

test-runner: The main Node container that holds the test code and runs npx playwright test.
worker: Extra containers started from the same Playwright image. In the compose file below they only idle (tail -f /dev/null); Playwright does not dispatch tests to them on its own.
report-server: An optional nginx container that serves the HTML report and trace viewer after the run.

Playwright handles the distribution internally. You do not need a Selenium-style hub-and-node architecture. The test-runner spawns worker processes inside its own container. To spread work across containers, you start several test-runner containers and give each one a different shard.

How Playwright Parallelism Maps to Containers

Playwright has two concepts that often confuse people:

Workers (--workers): Processes inside a single container that run tests concurrently.
Shards (--shard): Splits the test suite across multiple machines or containers.

Docker Compose scales via sharding. You run the same test command in multiple containers, each with a different shard index. Playwright’s test runner then splits the suite into roughly equal parts, one per shard: by test file by default, or by individual test when fullyParallel is enabled.
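To make the idea of a shard index concrete, here is a simplified sketch that splits a list of test files into contiguous groups whose sizes differ by at most one. It is an illustration only, not Playwright's actual assignment logic, and the function name shardSlice and the file names are hypothetical.

```typescript
// Illustrative only: splits items into `total` contiguous groups whose sizes
// differ by at most one. Playwright's real assignment logic may differ.
function shardSlice<T>(items: T[], current: number, total: number): T[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new Error("total must be a positive integer");
  }
  if (!Number.isInteger(current) || current < 1 || current > total) {
    throw new Error("current must be between 1 and total");
  }
  const base = Math.floor(items.length / total); // minimum size of every shard
  const extra = items.length % total; // the first `extra` shards get one more item
  const index = current - 1; // shard indices are 1-based, like --shard=1/4
  const start = index * base + Math.min(index, extra);
  const size = base + (index < extra ? 1 : 0);
  return items.slice(start, start + size);
}

// Hypothetical file names, used only for the demonstration.
const files: string[] = [];
for (let i = 1; i <= 10; i++) {
  files.push(`spec-${i}.ts`);
}

for (let shard = 1; shard <= 4; shard++) {
  console.log(`shard ${shard}/4:`, shardSlice(files, shard, 4).join(", "));
}
```

Every file lands in exactly one shard, which is the property that matters when each container only knows its own index and the total.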

The Docker Compose File I Run in Production

Here is the exact docker-compose.yml I use for my team’s regression suite. I stripped out company-specific environment variables but kept the structure intact.

version: "3.9"

services:
  test-runner:
    build:
      context: .
      dockerfile: Dockerfile.test
    volumes:
      - ./playwright-report:/app/playwright-report
      - ./test-results:/app/test-results
    environment:
      - CI=true
      - PLAYWRIGHT_WORKERS=4
    # $$ defers expansion to the container's shell, so the values passed with
    # `docker compose run -e` are used instead of the host's environment.
    command: >
      sh -c "npx playwright test --shard=$${SHARD_INDEX}/$${SHARD_TOTAL}"
    networks:
      - playwright-grid

  worker-1:
    image: mcr.microsoft.com/playwright:v1.52.0-noble
    command: ["tail", "-f", "/dev/null"]
    networks:
      - playwright-grid

  worker-2:
    image: mcr.microsoft.com/playwright:v1.52.0-noble
    command: ["tail", "-f", "/dev/null"]
    networks:
      - playwright-grid

  report-server:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./playwright-report:/usr/share/nginx/html:ro
    networks:
      - playwright-grid

networks:
  playwright-grid:
    driver: bridge

The Dockerfile

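FROM mcr.microsoft.com/playwright:v1.52.0-noble

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY package*.json ./
RUN npm ci

COPY . .

# Type-check the TypeScript sources (--noEmit produces no output files)
RUN npx tsc --noEmit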

I pin the Playwright image version to v1.52.0-noble. Using latest is a trap. A silent browser version upgrade will break screenshot comparisons without warning.

Running the Grid

To execute with 4 shards:

# Shard 1
docker compose run -e SHARD_INDEX=1 -e SHARD_TOTAL=4 test-runner

# Shard 2
docker compose run -e SHARD_INDEX=2 -e SHARD_TOTAL=4 test-runner

# Or parallel with GNU parallel
seq 1 4 | parallel 'docker compose run -e SHARD_INDEX={} -e SHARD_TOTAL=4 test-runner'

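Each shard writes its report to /app/playwright-report, so give every shard its own output directory (see Failure 1 below) to keep the shards from overwriting each other. After the run, a small Node script merges the JSON reports into one HTML file.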

Scaling Workers with Shards and Projects

Playwright’s playwright.config.ts supports projects for multi-browser execution. I run Chromium, Firefox, and WebKit in parallel by defining three projects. In Docker Compose, each project can be its own shard group.

Configuring Projects

import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
  // The shard index comes from the SHARD_INDEX environment variable (1-based)
  shard: { total: 4, current: parseInt(process.env.SHARD_INDEX || '1', 10) },
});

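With 4 shards and 3 projects, running each project as its own shard group (--project=<name> --shard=<i>/4) gives you 12 parallel containers. A single sharded run without --project still uses only 4 containers, each executing its slice of all three projects. On an 8-core CI runner, the 12-container layout saturates the CPU without thrashing in my experience.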
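One way to reach 12 streams is to launch one command per project and shard pair. The sketch below only builds the command strings; --project and --shard are existing Playwright CLI flags, while the helper name buildShardCommands is hypothetical and how you launch the commands (Compose, GNU parallel, a CI matrix) is left open.

```typescript
// Builds one Playwright command per (project, shard) pair.
function buildShardCommands(projects: string[], shardTotal: number): string[] {
  const commands: string[] = [];
  for (const project of projects) {
    for (let shard = 1; shard <= shardTotal; shard++) {
      commands.push(
        `npx playwright test --project=${project} --shard=${shard}/${shardTotal}`,
      );
    }
  }
  return commands;
}

// Project names match the ones defined in the config above.
const shardCommands = buildShardCommands(["chromium", "firefox", "webkit"], 4);
console.log(`${shardCommands.length} commands`);
shardCommands.forEach((command) => console.log(command));
```

Each command can then run in its own test-runner container, which keeps the per-container CPU and memory limits independent.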

When to Scale Containers vs Workers

I use a simple rule:

If tests are CPU-bound (heavy computation, image comparison), scale containers. Each gets its own cgroup limit.
If tests are I/O-bound (waiting for APIs, navigation), scale workers inside one container. The overhead of extra containers is not worth it.

Most Playwright suites are I/O-bound until you add visual regression. Then they become CPU-bound.
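If you scale workers inside one container, the worker count has to reach Playwright somehow. A minimal sketch, assuming the config is expected to read the PLAYWRIGHT_WORKERS variable from the compose file itself (as far as I know Playwright does not pick up that name on its own); the helper name resolveWorkers is hypothetical.

```typescript
// Parses a worker count from an environment-style string.
// Returns undefined when the value is unset or invalid, so the caller can
// fall back to Playwright's default worker count.
function resolveWorkers(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") {
    return undefined;
  }
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : undefined;
}

// Demonstration with literal values instead of process.env.
console.log(resolveWorkers("4"));
console.log(resolveWorkers("not-a-number"));
console.log(resolveWorkers(undefined));
```

In playwright.config.ts this would be wired up as workers: resolveWorkers(process.env.PLAYWRIGHT_WORKERS), or you can skip the variable and pass --workers=4 on the command line.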

CI Integration: GitHub Actions to Kubernetes

Docker Compose is the local and CI setup. For teams that need elastic scaling, the same compose file translates to Kubernetes with Kompose or a simple Helm chart.

GitHub Actions Example

name: Playwright Grid CI

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}/4

This matrix approach gives you 4 parallel GitHub Actions jobs without any self-hosted runner maintenance. Each job runs in an isolated VM. The free tier includes 2,000 minutes per month, which covers most small-to-medium suites.

Self-Hosted Runners in India

GitHub-hosted runners are fast but expensive at scale. My team at Tekion uses self-hosted runners on AWS Mumbai region. The cost is roughly ₹4,200 per month for a c5.2xlarge instance running 8 hours per day. That instance runs the full Docker Compose grid locally, no external CI minutes required.

Performance Benchmarks: Local vs Docker vs Kubernetes

I benchmarked our 412-test suite across four environments. All used 4 parallel workers or shards, except the Kubernetes run, which used 8 pods.

Environment	Total Time	Flake Rate	Setup Cost
MacBook Pro M3 (local)	41 minutes	8%	Zero
Docker Compose (4 shards)	12 minutes	3%	2 hours
Kubernetes (8 pods)	9 minutes	2%	2 days
GitHub Actions matrix (4 shards)	14 minutes	3%	30 minutes

The flake rate drop is the hidden win. Docker containers eliminate background processes, notifications, and thermal throttling that destabilize local runs. A 3 percent flake rate means 12 tests fail randomly per run instead of 33. By my estimate, that saves about 20 minutes of developer debugging per CI cycle.
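The flake arithmetic is easy to re-run for your own suite size. A small sketch using the 412-test suite and the flake rates from the benchmark table; the function name expectedFlakyFailures is hypothetical, and rounding to the nearest whole test is an assumption.

```typescript
// Expected number of randomly failing tests per run, rounded to whole tests.
function expectedFlakyFailures(totalTests: number, flakeRate: number): number {
  return Math.round(totalTests * flakeRate);
}

const suiteSize = 412;
const localFailures = expectedFlakyFailures(suiteSize, 0.08); // local run, 8% flake rate
const dockerFailures = expectedFlakyFailures(suiteSize, 0.03); // Docker Compose, 3% flake rate

console.log(`local: ${localFailures}, docker: ${dockerFailures}`);
console.log(`fewer random failures per run: ${localFailures - dockerFailures}`);
```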

The India Context: Cost Per Execution Hour

Indian QA teams are cost-conscious. Here is the real math for a 20-person team running CI 20 days per month.

GitHub Actions:

4 shards × 14 minutes = 56 minutes per run
20 runs per month × 56 minutes = 1,120 minutes
GitHub Team plan includes 3,000 minutes; overage is $0.008/minute
Monthly cost: ~$0 (within included minutes) to $50 if you run multiple branches
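The same calculation as a reusable sketch, so you can plug in your own run counts. The baseline numbers come from the list above; the second scenario (10 runs per day over 20 days) is a hypothetical input, and the type and function names are illustrative.

```typescript
interface CiUsage {
  runsPerMonth: number;
  shards: number;
  minutesPerShard: number;
  includedMinutes: number; // minutes included in the plan
  overageRateUsd: number; // price per minute beyond the included minutes
}

// Every shard is billed as its own job, so minutes multiply by shard count.
function billableMinutes(usage: CiUsage): number {
  return usage.runsPerMonth * usage.shards * usage.minutesPerShard;
}

// Only minutes above the plan's included allowance cost extra.
function monthlyOverageUsd(usage: CiUsage): number {
  const overMinutes = Math.max(0, billableMinutes(usage) - usage.includedMinutes);
  return overMinutes * usage.overageRateUsd;
}

const baseline: CiUsage = {
  runsPerMonth: 20,
  shards: 4,
  minutesPerShard: 14,
  includedMinutes: 3000,
  overageRateUsd: 0.008,
};
// Hypothetical heavier usage: 10 runs per day over 20 working days.
const heavy: CiUsage = { ...baseline, runsPerMonth: 200 };

console.log(billableMinutes(baseline), monthlyOverageUsd(baseline).toFixed(2));
console.log(billableMinutes(heavy), monthlyOverageUsd(heavy).toFixed(2));
```

Comparing the overage figure with the monthly price of a self-hosted instance gives a rough break-even point for your own build volume.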

AWS Self-Hosted (Mumbai):

c5.2xlarge spot instance: ₹8,400 per month
Runs 24/7 or schedule during office hours with EventBridge
No per-minute CI tax

For teams under 15 people, GitHub Actions is cheaper. For teams above 30 with 10+ daily builds, self-hosted AWS wins. Docker Compose works identically in both scenarios, so you are not locked in.

Common Failures and Fixes

I lost two days to these errors. Documenting them so you do not.

Failure 1: Shared Volumes Corrupting Reports

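When multiple shards write to the same playwright-report directory via a shared volume, the HTML report overwrites itself and ends up empty. Fix: mount a unique output directory per shard, then merge reports in a post-processing step. Note that npx playwright merge-reports reads blob reports from a single directory, so the commands below assume the config uses the blob reporter for sharded runs (default output folder: blob-report).

for i in 1 2 3 4; do
  docker compose run -e SHARD_INDEX=$i -e SHARD_TOTAL=4 \
    -v "$(pwd)/blob-shard-$i:/app/blob-report" test-runner
done

# Collect every shard's blob archive into one directory, then merge
mkdir -p all-blob-reports
cp blob-shard-*/*.zip all-blob-reports/
npx playwright merge-reports --reporter html ./all-blob-reports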

Failure 2: Browser Launch Timeouts in Containers

Playwright browsers need shared memory. Docker defaults to a 64 MB /dev/shm, which is too small for Chromium and leads to crashes or launch timeouts. Fix: add shm_size: '2gb' to your service in docker-compose.yml, or share the host IPC namespace with --ipc=host (ipc: host in Compose).

Failure 3: Network Race Conditions

Containers start faster than your application under test. Tests fail because the staging server is not ready. Fix: use Docker Compose depends_on with a health check (condition: service_healthy), or add a wait-for-it script with a 10-second timeout to your test global setup.
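If you prefer to wait inside the test global setup, polling is more robust than a fixed sleep. A minimal sketch: waitUntilReady is a hypothetical helper that takes any async probe, so it carries no assumption about how your application reports readiness.

```typescript
// Polls `probe` until it resolves to true or the attempts run out.
// Resolves with the number of attempts used; rejects if the service never
// becomes ready.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  maxAttempts = 30,
  delayMs = 1000,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected probe (for example, connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) {
      return attempt;
    }
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Service not ready after ${maxAttempts} attempts`);
}

// Demonstration with a fake probe that reports ready on its third call.
let demoCalls = 0;
const demoProbe = async (): Promise<boolean> => ++demoCalls >= 3;
waitUntilReady(demoProbe, 5, 10).then((attempts) =>
  console.log(`ready after ${attempts} attempts`),
);
```

In a real global setup the probe would typically be an HTTP request against your application's health endpoint, returning true on a successful response; the endpoint itself depends on your application.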

Failure 4: Mismatched Playwright Versions

If your package.json specifies Playwright 1.51 but your Docker image is 1.52, browser binaries will not match and every test fails with a launch error. Fix: pin both. Use Renovate or Dependabot to bump them together.
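A cheap guard is to compare the two versions in a pre-flight script before the suite runs. The sketch below assumes image tags of the form v1.52.0-noble, as used in this post, and a plain or caret/tilde version in package.json; all function names are hypothetical.

```typescript
// Extracts "1.52.0" from an image reference such as
// "mcr.microsoft.com/playwright:v1.52.0-noble".
function versionFromImage(image: string): string | undefined {
  const match = /:v(\d+\.\d+\.\d+)/.exec(image);
  return match ? match[1] : undefined;
}

// Extracts "1.52.0" from a package.json entry such as "^1.52.0" or "1.52.0".
function versionFromRange(range: string): string | undefined {
  const match = /(\d+\.\d+\.\d+)/.exec(range);
  return match ? match[1] : undefined;
}

// True only when both versions could be parsed and are identical.
function versionsMatch(packageRange: string, image: string): boolean {
  const packageVersion = versionFromRange(packageRange);
  const imageVersion = versionFromImage(image);
  return packageVersion !== undefined && packageVersion === imageVersion;
}

const image = "mcr.microsoft.com/playwright:v1.52.0-noble";
console.log(versionsMatch("1.52.0", image));
console.log(versionsMatch("^1.51.0", image));
```

This checks the declared version only. With a caret or tilde range the installed version can drift, so pinning an exact version in package.json (or checking the lockfile) keeps the comparison meaningful.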

Key Takeaways

Docker Compose cut my team’s 412-test Playwright suite from 41 minutes to 12 minutes using 4 shards.
Playwright’s official Docker image eliminates “works on my machine” by bundling browsers and system dependencies.
Shard across containers, not just workers, to get true isolation and lower flake rates.
Always pin the Playwright Docker image version to match your npm package.
For Indian teams, GitHub Actions matrix sharding is free for small teams; self-hosted AWS pays off above 30 engineers.

FAQ

Can I use Docker Compose with Selenium Grid too?

Yes, but Playwright’s built-in sharding is simpler. Selenium Grid requires a hub, nodes, and registration logic. Playwright just needs the same test command with different shard indices. If you are starting fresh, use Playwright. If you are stuck with Selenium, Docker Compose still helps isolate browser versions.

Does Docker Compose support visual regression testing?

Yes, but you must run the baseline update in the same container image. Font rendering and anti-aliasing differ between host OS and container, so screenshots taken on macOS will not match those taken in Ubuntu. Pin the container and never mix.

How do I debug a failing test inside Docker?

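Record traces with --trace on and mount a volume for test-results. (The --debug flag opens the Playwright Inspector in a headed browser, so it only works when the container has a display.) After the run, open the trace with npx playwright show-trace test-results/failed-test/trace.zip. For live debugging, run a single test with --headed inside the container using X11 forwarding or VNC.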

What about ARM64 (Apple Silicon) hosts?

Playwright’s official images are multi-arch. Docker will pull the ARM variant on M-series Macs. In my runs, Chromium was slightly slower than on x86, but the containerized suite still finished faster than my non-containerized local runs.

Can I scale to Kubernetes later?

Absolutely. The docker-compose.yml I shared translates directly to Kubernetes pods with kompose convert. Your sharding logic and report merging stay identical. Docker Compose is the training wheels for Kubernetes grids.
