Docker Compose for Test Automation: Scaling Playwright Grids in 2026
Running 400 Playwright tests on your laptop works until it does not. The fans spin, Chrome crashes, and your CI pipeline times out after 47 minutes. I moved my team’s Playwright suite into a Docker Compose grid last quarter. Execution time dropped from 41 minutes to 12 minutes. Here is the exact architecture, the compose file, and the mistakes that cost me two days.
Table of Contents
- Why Docker Compose Beats Local Execution
- The Data: Playwright Downloads and Docker Stars
- Architecture: Playwright Workers in Docker Compose
- The Docker Compose File I Run in Production
- Scaling Workers with Shards and Projects
- CI Integration: GitHub Actions to Kubernetes
- Performance Benchmarks: Local vs Docker vs Kubernetes
- The India Context: Cost Per Execution Hour
- Common Failures and Fixes
- Key Takeaways
- FAQ
Why Docker Compose Beats Local Execution
Playwright is fast. A single test can run in under a second. But suites grow. My team started with 32 tests. Six months later we had 412. Local execution with 4 workers took 41 minutes. That is longer than most CI timeout thresholds and long enough for developers to context-switch out of the feedback loop.
Docker Compose solves three specific problems:
- Resource isolation. Each worker container gets its own browser processes. No more Chrome zombie processes leaking memory on the host.
- Reproducible environments. The container has the exact same browser versions, system dependencies, and Node runtime as CI. “Works on my machine” disappears.
- Horizontal scaling. You define a worker service once and scale it with docker compose up --scale worker=8.
Playwright’s official Docker image, mcr.microsoft.com/playwright, includes Chromium, Firefox, and WebKit with all system dependencies pre-installed. You do not need to manage browser downloads on each agent.
The Data: Playwright Downloads and Docker Stars
Before adopting any infrastructure pattern, I check adoption signals. Playwright’s npm package recorded 219 million downloads in the last month alone. The @playwright/test runner added 149 million. On GitHub, the Microsoft Playwright repository has 89,295 stars and was last pushed on May 24, 2026. This is not a dying tool.
Docker Compose, the orchestration layer we use, has 37,424 GitHub stars and active maintenance. It is the simplest way to run multi-container setups without learning Kubernetes. For QA teams that do not have a dedicated DevOps engineer, Docker Compose is the sweet spot.
Architecture: Playwright Workers in Docker Compose
My grid has three services:
- test-runner: The main Node container that holds the test code and runs npx playwright test.
- worker: Playwright worker containers that execute tests in parallel. These are lightweight instances of the same image.
- report-server: An optional nginx container that serves the HTML report and trace viewer after the run.
Playwright handles the distribution internally. You do not need a Selenium-style hub-and-node architecture. Inside one container, the test-runner spawns worker processes; these are local processes and are not dispatched to other containers. To spread tests across containers, you run one test-runner container per shard.
How Playwright Parallelism Maps to Containers
Playwright has two concepts that often confuse people:
- Workers (--workers): Processes inside a single container that run tests concurrently.
- Shards (--shard): Splits the test suite across multiple machines or containers.
Docker Compose scales via sharding. You run the same test command in multiple containers, each with a different shard index. Playwright’s test runner automatically splits the suite into roughly equal slices and runs only the slice that matches the shard index.
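To make the shard index concrete, here is a minimal sketch of how an ordered test list could be cut into near-equal contiguous slices. The function name sliceForShard is hypothetical and this is an illustration of the idea, not Playwright’s own code; the real runner balances by test groups and may differ in detail.

```typescript
// Illustrative only: split an ordered list into `total` contiguous slices
// and return the slice for the 1-based shard index `current`.
function sliceForShard<T>(tests: readonly T[], current: number, total: number): T[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError(`total must be a positive integer, got ${total}`);
  }
  if (!Number.isInteger(current) || current < 1 || current > total) {
    throw new RangeError(`current must be between 1 and ${total}, got ${current}`);
  }
  const base = Math.floor(tests.length / total); // every shard gets at least this many
  const extra = tests.length % total;            // the first `extra` shards get one more
  const zeroBased = current - 1;
  const from = base * zeroBased + Math.min(extra, zeroBased);
  const to = from + base + (zeroBased < extra ? 1 : 0);
  return tests.slice(from, to);
}

// A 412-test suite over 4 shards gives 103 tests per shard.
const suite = Array.from({ length: 412 }, (_, i) => `test-${i + 1}`);
for (let shard = 1; shard <= 4; shard++) {
  console.log(`shard ${shard}/4 runs ${sliceForShard(suite, shard, 4).length} tests`);
}
```

Every test lands in exactly one slice, which is why each container can run the same command and only the shard index changes.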
The Docker Compose File I Run in Production
Here is the exact docker-compose.yml I use for my team’s regression suite. I stripped out company-specific environment variables but kept the structure intact.
```yaml
version: "3.9"

services:
  test-runner:
    build:
      context: .
      dockerfile: Dockerfile.test
    volumes:
      - ./playwright-report:/app/playwright-report
      - ./test-results:/app/test-results
    environment:
      - CI=true
      - PLAYWRIGHT_WORKERS=4
    # $$ defers expansion to the shell inside the container, so values passed
    # with `docker compose run -e` are used instead of Compose interpolation.
    command: >
      sh -c "npx playwright test --shard=$${SHARD_INDEX}/$${SHARD_TOTAL}"
    networks:
      - playwright-grid

  worker-1:
    image: mcr.microsoft.com/playwright:v1.52.0-noble
    # Stays idle (tail -f /dev/null) until you exec a command in it
    command: ["tail", "-f", "/dev/null"]
    networks:
      - playwright-grid

  worker-2:
    image: mcr.microsoft.com/playwright:v1.52.0-noble
    command: ["tail", "-f", "/dev/null"]
    networks:
      - playwright-grid

  report-server:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./playwright-report:/usr/share/nginx/html:ro
    networks:
      - playwright-grid

networks:
  playwright-grid:
    driver: bridge
```
The Dockerfile
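```dockerfile
FROM mcr.microsoft.com/playwright:v1.52.0-noble

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY package*.json ./
RUN npm ci

COPY . .

# Type-check the TypeScript sources at build time (--noEmit writes no files)
RUN npx tsc --noEmit
```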
I pin the Playwright image version to v1.52.0-noble. Using latest is a trap. A silent browser version upgrade will break screenshot comparisons without warning.
Running the Grid
To execute with 4 shards:
```bash
# Shard 1
docker compose run -e SHARD_INDEX=1 -e SHARD_TOTAL=4 test-runner

# Shard 2
docker compose run -e SHARD_INDEX=2 -e SHARD_TOTAL=4 test-runner

# Or run all four in parallel with GNU parallel
seq 1 4 | parallel 'docker compose run -e SHARD_INDEX={} -e SHARD_TOTAL=4 test-runner'
```
Each shard writes its own report fragment to /app/playwright-report. After the run, a small Node script merges the JSON reports into one HTML file.
Scaling Workers with Shards and Projects
Playwright’s playwright.config.ts supports projects for multi-browser execution. I run Chromium, Firefox, and WebKit in parallel by defining three projects. In Docker Compose, each project can be its own shard group.
Configuring Projects
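```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
  // The shard index comes from the environment and defaults to the first shard
  shard: { total: 4, current: parseInt(process.env.SHARD_INDEX || '1', 10) },
});
```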
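Sharding splits the combined test list from all three projects, so 4 shards give you 4 containers, each running its own worker processes. If you instead run each project as its own group of 4 shards (for example --project=chromium --shard=1/4), you get 12 parallel execution streams. On an 8-core CI runner, this saturates the CPU without thrashing.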
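The shard settings can also be read entirely from the environment so that a typo in the index fails fast instead of silently running the wrong slice. The following is a sketch under assumptions: shardFromEnv is a hypothetical helper, and it relies on the SHARD_INDEX and SHARD_TOTAL variable names used in the run commands earlier.

```typescript
interface ShardSettings {
  current: number;
  total: number;
}

// Hypothetical helper: returns null for an unsharded local run, a validated
// { current, total } pair otherwise, and throws on malformed values.
function shardFromEnv(env: Record<string, string | undefined>): ShardSettings | null {
  const rawIndex = env.SHARD_INDEX;
  const rawTotal = env.SHARD_TOTAL;
  if (rawIndex === undefined && rawTotal === undefined) return null;

  const current = Number(rawIndex);
  const total = Number(rawTotal);
  if (!Number.isInteger(total) || total < 1) {
    throw new Error(`SHARD_TOTAL must be a positive integer, got "${rawTotal}"`);
  }
  if (!Number.isInteger(current) || current < 1 || current > total) {
    throw new Error(`SHARD_INDEX must be between 1 and ${total}, got "${rawIndex}"`);
  }
  return { current, total };
}

console.log(shardFromEnv({ SHARD_INDEX: "2", SHARD_TOTAL: "4" }));
console.log(shardFromEnv({}));

// Possible use inside playwright.config.ts:
//   shard: shardFromEnv(process.env),
```

Returning null for the unsharded case keeps local runs working without any environment setup.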
When to Scale Containers vs Workers
I use a simple rule:
- If tests are CPU-bound (heavy computation, image comparison), scale containers. Each container runs in its own cgroup, so you can cap its CPU and memory independently.
- If tests are I/O-bound (waiting for APIs, navigation), scale workers inside one container. The overhead of extra containers is not worth it.
Most Playwright suites are I/O-bound until you add visual regression. Then they become CPU-bound.
CI Integration: GitHub Actions to Kubernetes
Docker Compose is the local and CI setup. For teams that need elastic scaling, the same compose file translates to Kubernetes with Kompose or a simple Helm chart.
GitHub Actions Example
```yaml
name: Playwright Grid CI
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}/4
```
This matrix approach gives you 4 parallel GitHub Actions jobs without any self-hosted runner maintenance. Each job runs in an isolated VM. The free tier includes 2,000 minutes per month, which covers most small-to-medium suites.
Self-Hosted Runners in India
GitHub-hosted runners are fast but expensive at scale. My team at Tekion uses self-hosted runners in the AWS Mumbai region. The cost is roughly ₹4,200 per month for a c5.2xlarge instance running 8 hours per day. That instance runs the full Docker Compose grid locally, with no external CI minutes required.
Performance Benchmarks: Local vs Docker vs Kubernetes
I benchmarked our 412-test suite across four environments. All used 4 parallel workers or shards, except the Kubernetes run, which used 8 pods.
| Environment | Total Time | Flake Rate | Setup Cost |
|---|---|---|---|
| MacBook Pro M3 (local) | 41 minutes | 8% | Zero |
| Docker Compose (4 shards) | 12 minutes | 3% | 2 hours |
| Kubernetes (8 pods) | 9 minutes | 2% | 2 days |
| GitHub Actions matrix (4 shards) | 14 minutes | 3% | 30 minutes |
The flake rate drop is the hidden win. Docker containers eliminate background processes, notifications, and thermal throttling that destabilize local runs. A 3 percent flake rate means 12 tests fail randomly per run instead of 33. That saves 20 minutes of developer debugging per CI cycle.
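The 12-versus-33 figure is simple arithmetic on the suite size. As a quick sketch (expectedFlakyTests is a hypothetical name, and it treats the flake rate as the fraction of tests that fail randomly in a run):

```typescript
// Expected number of randomly failing tests per run at a given flake rate.
function expectedFlakyTests(totalTests: number, flakeRate: number): number {
  if (totalTests < 0 || flakeRate < 0 || flakeRate > 1) {
    throw new RangeError("totalTests must be >= 0 and flakeRate within [0, 1]");
  }
  return Math.round(totalTests * flakeRate);
}

const suiteSize = 412;
console.log(expectedFlakyTests(suiteSize, 0.08)); // 33 (local run)
console.log(expectedFlakyTests(suiteSize, 0.03)); // 12 (Docker Compose, 4 shards)
```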
The India Context: Cost Per Execution Hour
Indian QA teams are cost-conscious. Here is the real math for a 20-person team running CI 20 days per month.
GitHub Actions:
- 4 shards × 14 minutes = 56 minutes per run
- 20 runs per month × 56 minutes = 1,120 minutes
- GitHub Team plan includes 3,000 minutes; overage is $0.008/minute
- Monthly cost: ~$0 (within included minutes) to $50 if you run multiple branches
AWS Self-Hosted (Mumbai):
- c5.2xlarge spot instance: ₹8,400 per month
- Runs 24/7 or schedule during office hours with EventBridge
- No per-minute CI tax
For teams under 15 people, GitHub Actions is cheaper. For teams above 30 with 10+ daily builds, self-hosted AWS wins. Docker Compose works identically in both scenarios, so you are not locked in.
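The GitHub Actions math above can be reproduced with a few lines, which makes it easy to plug in your own run counts. This is a rough sketch: monthlyActionsCost is a hypothetical helper, the plan numbers are the ones quoted in this section, and it ignores per-job rounding of billed minutes.

```typescript
interface ActionsUsage {
  shards: number;
  minutesPerShard: number;
  runsPerMonth: number;
  includedMinutes: number;
  overageUsdPerMinute: number;
}

// Billed minutes add up across shards because each matrix job is billed separately.
function monthlyActionsCost(u: ActionsUsage): { billedMinutes: number; overageUsd: number } {
  const billedMinutes = u.shards * u.minutesPerShard * u.runsPerMonth;
  const overageMinutes = Math.max(0, billedMinutes - u.includedMinutes);
  // Round to cents to avoid floating-point noise in the result.
  const overageUsd = Math.round(overageMinutes * u.overageUsdPerMinute * 100) / 100;
  return { billedMinutes, overageUsd };
}

const articleScenario = monthlyActionsCost({
  shards: 4,
  minutesPerShard: 14,
  runsPerMonth: 20,
  includedMinutes: 3000,
  overageUsdPerMinute: 0.008,
});
console.log(articleScenario); // 1,120 billed minutes, no overage
```

Raising runsPerMonth shows how quickly the included minutes run out once several branches build every day.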
Common Failures and Fixes
I lost two days to these errors. Documenting them so you do not.
Failure 1: Report Overwrites on a Shared Volume
When multiple shards write to the same playwright-report directory via a shared volume, the HTML report overwrites itself and ends up empty. Fix: mount a unique output directory per shard, then merge reports in a post-processing step.
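```bash
for i in 1 2 3 4; do
  docker compose run -e SHARD_INDEX=$i -e SHARD_TOTAL=4 \
    -v "$(pwd)/report-shard-$i:/app/playwright-report" test-runner
done

# merge-reports reads blob reports (*.zip) from a single directory. This assumes
# each shard ran with the blob reporter writing into /app/playwright-report.
mkdir -p all-blob-reports
cp report-shard-*/*.zip all-blob-reports/
npx playwright merge-reports --reporter html ./all-blob-reports
```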
Failure 2: Browser Launch Timeouts in Containers
Playwright browsers need shared memory. Docker defaults to 64 MB /dev/shm, which can crash Chromium under load. Fix: add shm_size: '2gb' to your service in docker-compose.yml, or set ipc: host on the service (the Compose equivalent of docker run --ipc=host).
Failure 3: Network Race Conditions
Containers start faster than your application under test. Tests fail because the staging server is not ready. Fix: use Docker Compose depends_on with a health check, or add a 10-second wait-for-it script in your test global setup.
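If you take the wait-in-global-setup route, a polling loop with a deadline is usually safer than a fixed sleep. Below is a minimal sketch; waitUntilReady and the probe shape are hypothetical, and the URL in the usage comment is a placeholder for your own health endpoint.

```typescript
type Probe = () => Promise<boolean>;

// Calls `probe` until it resolves to true or the timeout elapses.
// A probe that throws (for example, connection refused) counts as "not ready".
async function waitUntilReady(probe: Probe, timeoutMs = 10_000, intervalMs = 500): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const ready = await probe().catch(() => false);
    if (ready) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Service not ready after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Possible use in a Playwright global setup file (Node 18+ for global fetch):
//   export default async function globalSetup() {
//     await waitUntilReady(async () => (await fetch("http://app:3000/health")).ok);
//   }
```

Keeping the probe as a parameter means the same loop can wait on an HTTP endpoint, a database ping, or anything else that returns a boolean.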
Failure 4: Mismatched Playwright Versions
If your package.json specifies Playwright 1.51 but your Docker image is 1.52, browser binaries will not match and every test fails with a launch error. Fix: pin both. Use Renovate or Dependabot to bump them together.
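A small pre-flight check can catch the mismatch before any browser launches. This is a sketch under assumptions: playwrightVersionsMatch is a hypothetical helper, and it compares only major.minor, which you may want to tighten to an exact match.

```typescript
// Returns "major.minor" for the first x.y.z version found in the text, or null.
function majorMinor(text: string): string | null {
  const match = /(\d+)\.(\d+)\.\d+/.exec(text);
  return match ? `${match[1]}.${match[2]}` : null;
}

// Compares a package.json version range (e.g. "^1.52.0") with a Docker image
// reference (e.g. "mcr.microsoft.com/playwright:v1.52.0-noble").
function playwrightVersionsMatch(packageRange: string, dockerImage: string): boolean {
  // Only the tag after the last ":" carries the image version.
  const tag = dockerImage.slice(dockerImage.lastIndexOf(":") + 1);
  const fromPackage = majorMinor(packageRange);
  const fromImage = majorMinor(tag);
  return fromPackage !== null && fromPackage === fromImage;
}

const image = "mcr.microsoft.com/playwright:v1.52.0-noble";
console.log(playwrightVersionsMatch("^1.52.0", image)); // true
console.log(playwrightVersionsMatch("1.51.0", image));  // false
```

Running a check like this as the first CI step turns a wall of launch errors into one readable failure.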
Key Takeaways
- Docker Compose cut my team’s 412-test Playwright suite from 41 minutes to 12 minutes using 4 shards.
- Playwright’s official Docker image eliminates “works on my machine” by bundling browsers and system dependencies.
- Shard across containers, not just workers, to get true isolation and lower flake rates.
- Always pin the Playwright Docker image version to match your npm package.
- For Indian teams, GitHub Actions matrix sharding is free for small teams; self-hosted AWS pays off above 30 engineers.
FAQ
Can I use Docker Compose with Selenium Grid too?
Yes, but Playwright’s built-in sharding is simpler. Selenium Grid requires a hub, nodes, and registration logic. Playwright just needs the same test command with different shard indices. If you are starting fresh, use Playwright. If you are stuck with Selenium, Docker Compose still helps isolate browser versions.
Does Docker Compose support visual regression testing?
Yes, but you must run the baseline update in the same container image. Font rendering and anti-aliasing differ between host OS and container, so screenshots taken on macOS will not match those taken in Ubuntu. Pin the container and never mix.
How do I debug a failing test inside Docker?
Run with --trace on and mount a volume for test-results. After the run, open the trace with npx playwright show-trace test-results/failed-test/trace.zip. For live debugging, run a single test with --headed (or --debug, which opens the Playwright Inspector) inside the container using X11 forwarding or VNC.
What about ARM64 (Apple Silicon) hosts?
Playwright’s official images are multi-arch. Docker will pull the ARM variant on M-series Macs. Performance is slightly slower than x86 for Chromium but still faster than local execution because of thermal headroom.
Can I scale to Kubernetes later?
Absolutely. The docker-compose.yml I shared translates directly to Kubernetes pods with kompose convert. Your sharding logic and report merging stay identical. Docker Compose is the training wheels for Kubernetes grids.
