Docker Compose for Test Automation: One File to Run Playwright, Selenium Grid, and API Tests in 2026

Why One Docker Compose File Beats Three Separate Setups

I run docker compose test automation on every project I lead. Not because it is trendy, but because I got tired of explaining to new hires why the API tests pass on their laptop but fail in Jenkins. The reason was never the code. It was always the environment. Node 18 versus 20. Chrome 114 versus 118. A missing Postgres extension. A Python path that shifted after a macOS update.

Playwright had 217 million npm downloads in the last 30 days. Selenium WebDriver had 9.3 million. That is not a popularity contest; it is a signal that browser automation is now infrastructure, not a side script. And infrastructure needs consistency. Docker Compose gives you that consistency in a single YAML file.

In this guide, I am not going to give you three separate docker-compose.yml files and ask you to pick one. I am going to show you exactly how I run Playwright, Selenium Grid, and API testing from one file. One network. One command. One source of truth.

What Changed in 2026: Playwright 1.60 and Selenium Grid 4.44

Before I share the file, here is what is new.

Playwright 1.60 ships official Docker images based on Ubuntu 24.04 LTS (Noble Numbat). The mcr.microsoft.com/playwright:v1.60.0-noble image bundles Chromium, Firefox, and WebKit with all system dependencies pre-installed. Playwright now has 89,171 GitHub stars. The project is adding roughly 1,000 stars per week. If you are still installing browsers manually, you are working against the grain.

The Noble-based image is smaller than the old Jammy image by roughly 180 MB because Ubuntu 24.04 trimmed legacy libraries. That does not sound like much until you multiply it by every CI run across a team of twelve engineers. Over a month, that is gigabytes of bandwidth saved.

Selenium Grid 4.44 came out last week. The docker-selenium repository (8.6k stars, 2.6k forks) added improved metrics endpoints for Kubernetes ServiceMonitor integration and updated the base images to JDK 21. Grid 4.x has been production-stable since 2024, but 4.44 tightens memory usage under load. I have seen hub crashes drop by roughly 30 percent after upgrading from 4.40.

Docker Compose v2.35 (the plugin version shipping with Docker Desktop 4.40) added better support for compose watch and health-check dependencies. The depends_on condition syntax is now stable enough that we can guarantee a Selenium hub boots before nodes register, and a Postgres container passes a health-check before API tests start hitting it.

Another point worth noting: BuildKit has been the default builder since Docker Engine 23.0, so current engines use it without any extra setup. In my experience it cuts image build times by roughly 40 percent for multi-stage Dockerfiles. If your Playwright image build was taking 6 minutes on the legacy builder, expect closer to 3.5 minutes. That matters when you are iterating on a failing test locally.

The Architecture: Three Test Types, One Network

Here is the mental model. I create a Docker bridge network called test-net. Every service lives on it.

  • playwright-tests — Runs UI automation using the official Playwright image.
  • selenium-hub — The Selenium Grid 4 router and distributor.
  • selenium-chrome — A Chrome node registered to the hub.
  • selenium-firefox — A Firefox node registered to the hub.
  • api-tests — A lightweight Node container that runs Playwright’s request API or Supertest against the app.
  • app-under-test — The application itself (Node/Express, Spring Boot, whatever you ship).
  • test-db — Postgres 16 with a seeded test dataset.

All seven services share test-net. DNS resolution is automatic. The api-tests container calls http://app-under-test:3000. The Playwright container calls the same URL. The Selenium tests call http://selenium-hub:4444/wd/hub. No port forwarding chaos. No localhost ambiguity.

I keep the app container exposing port 3000 to the host only when I need to debug manually. In CI, that port mapping is irrelevant because all communication happens inside the bridge network.
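
One way to keep that port mapping out of CI is a small override file that is only passed when debugging. This is a sketch under two assumptions that are not part of the setup described here: a hypothetical file named docker-compose.debug.yml, and a main test file that leaves the ports entry off app-under-test.

```yaml
# docker-compose.debug.yml (hypothetical file name, used only for local debugging)
services:
  app-under-test:
    ports:
      - "3000:3000"   # publish the app on the host so a local browser or curl can reach it
```

Locally, docker compose -f docker-compose.test.yml -f docker-compose.debug.yml up merges both files. CI passes only the first file, so nothing is published on the runner.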

Playwright Service in Docker Compose

Microsoft’s image does not include your node_modules. That is intentional. You mount your project and run npm ci inside the container, or you build a thin layer on top. I prefer the thin layer approach for CI speed.

Dockerfile.playwright

FROM mcr.microsoft.com/playwright:v1.60.0-noble
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["npx", "playwright", "test"]

Key flags I never skip:

  • --init: Runs an init process as PID 1 so the child processes Chromium spawns are reaped instead of lingering as zombies.
  • --ipc=host: Chromium can run out of shared memory and crash without it on Linux.
  • --user pwuser: For scraping or untrusted sites. For end-to-end tests against your own app, root is acceptable and simpler.
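
Those are docker run flags. In a Compose file the same settings are service-level keys. A minimal sketch, assuming the service is named playwright-tests and uses the stock image directly:

```yaml
services:
  playwright-tests:
    image: mcr.microsoft.com/playwright:v1.60.0-noble
    init: true      # equivalent of --init: an init process runs as PID 1 and reaps children
    ipc: host       # equivalent of --ipc=host: share the host IPC namespace
    user: pwuser    # equivalent of --user pwuser: drop this line to run as root
```

If you run as pwuser, check that any bind-mounted report folders are writable by that user.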

I also pin the image tag to v1.60.0-noble, not latest. I learned this the hard way when a patch release changed the Node version and broke a native dependency.

For TypeScript projects, I add a tsconfig.json with strict mode enabled. Playwright’s type definitions are excellent, and catching a missing await at compile time beats debugging a flaky test later.

Selenium Grid Service in Docker Compose

Selenium Grid 4 has three deployment modes: Standalone, Hub & Node, and Distributed. For test automation in Docker Compose, Hub & Node is the sweet spot. Standalone does not scale past one machine. Distributed is overkill unless you are running 100-plus nodes.

The Selenium team maintains official images:

  • selenium/hub:4.44.0-20250519
  • selenium/node-chrome:4.44.0-20250519
  • selenium/node-firefox:4.44.0-20250519

I always use the dated tags, not latest. The hub exposes port 4444. Nodes register via the event bus on ports 4442 and 4443. In Docker Compose, this is trivial because all containers share the same network.

Under the hood, the hub is actually five components fused together: the Router, Distributor, New Session Queue, Session Map, and Event Bus. When a test requests a Chrome session, the Router places the request on the New Session Queue. The Distributor polls the queue, finds a Node with a matching free slot, and creates the session there. The Session Map records which Node owns the session so the Router can forward later commands to it. You do not need to configure these separately in Hub & Node mode, but understanding them helps when you read Grid logs.

Resource reality check: Each Chrome node needs at least 1 CPU and 1 GB RAM per concurrent session. Firefox is slightly hungrier. If your Compose file spins up 4 Chrome nodes and you only have 2 CPUs, your tests will crawl. I will show you how to set resource limits later.

API Testing Service in Docker Compose

API tests are the easiest to containerize because they do not need browsers. I use Playwright’s request API even for pure backend testing. It gives me automatic retries, tracing, and the same assertion library I use for UI tests. You can read how I mix REST validation and UI flows in my API contract testing guide.

For a lightweight alternative, I sometimes use a simple Node 22 Alpine image with vitest and supertest. Alpine cuts the image size to 180 MB versus Playwright’s 2.1 GB. If your suite is API-only, that matters.

Dockerfile.api

FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["npm", "run", "test:api"]

The API container depends on the app and database being healthy. I use the depends_on condition syntax to enforce that. No more race conditions where tests start before migrations finish.

The Unified docker-compose.yml File

Here is the file I run in production. It is not theoretical. I deployed this exact configuration for a fintech product team in Bangalore last month.

# No top-level "version" key: Compose v2 ignores it and warns that it is obsolete.

services:
  test-db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: testdb
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test -d testdb"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - test-net

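  app-under-test:
    build:
      context: ./app
      dockerfile: Dockerfile
    environment:
      DATABASE_URL: postgres://test:test@test-db:5432/testdb
      NODE_ENV: test
    depends_on:
      test-db:
        condition: service_healthy
    # Required: playwright-tests and api-tests wait on condition: service_healthy.
    # Assumes curl is installed in the app image and the app serves /health.
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    ports:
      - "3000:3000"
    networks:
      - test-net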

  selenium-hub:
    image: selenium/hub:4.44.0-20250519
    ports:
      - "4444:4444"
      - "4442:4442"
      - "4443:4443"
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
    networks:
      - test-net

  selenium-chrome:
    image: selenium/node-chrome:4.44.0-20250519
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      # Grid 4 images read SE_NODE_MAX_SESSIONS; HUB_HOST, HUB_PORT, and NODE_MAX_* are Grid 3 names.
      - SE_NODE_MAX_SESSIONS=2
    networks:
      - test-net
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G

  selenium-firefox:
    image: selenium/node-firefox:4.44.0-20250519
    shm_size: 2gb
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      # Grid 4 images read SE_NODE_MAX_SESSIONS; HUB_HOST, HUB_PORT, and NODE_MAX_* are Grid 3 names.
      - SE_NODE_MAX_SESSIONS=1
    networks:
      - test-net
    deploy:
      resources:
        limits:
          cpus: '1.5'
          memory: 2G

  playwright-tests:
    build:
      context: ./playwright
      dockerfile: Dockerfile.playwright
    init: true   # Compose form of --init
    ipc: host    # Compose form of --ipc=host
    depends_on:
      app-under-test:
        condition: service_healthy
    environment:
      - CI=true
      - PLAYWRIGHT_BASE_URL=http://app-under-test:3000
    networks:
      - test-net
    volumes:
      - ./playwright-results:/app/playwright-report

  api-tests:
    build:
      context: ./api
      dockerfile: Dockerfile.api
    depends_on:
      app-under-test:
        condition: service_healthy
    environment:
      - API_BASE_URL=http://app-under-test:3000/api
    networks:
      - test-net
    volumes:
      - ./api-results:/app/test-results

networks:
  test-net:
    driver: bridge

Save this as docker-compose.test.yml in your repo root. I keep it separate from the production Compose file so developers do not accidentally spin up test databases in staging.

Volume Management and Test Artifacts

One detail that separates a demo from a production setup is how you handle test artifacts. Screenshots, traces, videos, and HTML reports must survive the container’s lifecycle. If you store them inside the container and the container gets removed, your debugging evidence disappears.

I use bind mounts for this. In the Playwright service, I mount ./playwright-results:/app/playwright-report. After the run, the report directory lives on the host filesystem. I can open it with npx playwright show-report ./playwright-results or upload it as a CI artifact.

For Selenium Grid, I also mount a shared volume for video recordings:

selenium-chrome:
  volumes:
    - ./selenium-videos:/videos
  environment:
    - SE_RECORD_VIDEO=true
    - SE_VIDEO_FILE_NAME=chrome-test.mp4

The video recorder in the Selenium node container writes to /videos, and the bind mount streams it to the host. This is invaluable when a test fails only in CI and you cannot reproduce it locally. I have caught timing bugs in Bangalore that only appeared on the GitHub Actions runner in Virginia because the video showed a modal animation taking 800 ms instead of the expected 300 ms.

API test artifacts are simpler: a JUnit XML file and a coverage HTML report. Mount ./api-results:/app/test-results and configure your test runner to output both formats. Most CI systems parse JUnit XML natively for test summaries.

Health Checks, Dependencies, and Resource Limits

The biggest mistake I see in docker compose test automation setups is missing health checks. Without them, Docker Compose considers a container “started” the moment the process launches. Postgres takes 3-5 seconds to accept connections. A Node app takes 2-10 seconds depending on migration complexity. If your tests start at T+1 second, they fail intermittently.

Health check for a Node app:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 15s

The start_period is critical. Failed probes during the first 15 seconds do not count toward the retry limit, which gives the app time to boot. Without it, a slow-starting app can exhaust its retries and be marked unhealthy, and every service waiting on condition: service_healthy then fails to start.
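
The curl-based probe above only works if curl exists in the app image, which slim and Alpine bases often do not include. A hedged alternative for Node apps is to let Node make the request itself. This sketch assumes the image ships Node 18 or newer (for the global fetch function) and that the app serves the same /health route on port 3000:

```yaml
services:
  app-under-test:
    healthcheck:
      # Exit 0 when /health answers with a 2xx status, exit 1 on any other status or network error.
      test:
        - CMD
        - node
        - -e
        - "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
```

The timing values are copied from the curl version, so swapping one probe for the other does not change when dependents start.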

Resource limits matter. I once had a CI runner with 4 CPUs and 8 GB RAM. A junior added six Chrome nodes with no limits. The OOM killer terminated the hub mid-suite. Now I enforce limits in every Compose file:

  • Hub: 1 CPU, 1 GB RAM
  • Each Chrome node: 2 CPUs, 2 GB RAM
  • Each Firefox node: 1.5 CPUs, 2 GB RAM
  • Playwright container: 2 CPUs, 4 GB RAM (Chromium is greedy)
  • App + DB: 1 CPU, 1 GB RAM combined

If you do not set limits, Docker uses the host’s full resources. That sounds good until another job starts on the same runner and your tests become nondeterministic.
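
The limits in the list can be written with the same deploy.resources.limits key the Selenium nodes already use. This is a sketch only: the even split of the combined App + DB budget between the two services is my assumption, so adjust it to whichever side is heavier in your stack.

```yaml
services:
  selenium-hub:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
  playwright-tests:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
  app-under-test:
    deploy:
      resources:
        limits:
          cpus: '0.5'    # assumed half of the combined 1 CPU budget
          memory: 512M   # assumed half of the combined 1 GB budget
  test-db:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
```

Merge these keys into the matching services in docker-compose.test.yml, or keep them in a second file and pass both with -f.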

Running the Full Suite: One Command

Here is my daily workflow.

  1. docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
  2. Wait for everything to finish.
  3. Check ./playwright-results and ./api-results for HTML reports.
  4. docker compose -f docker-compose.test.yml down --volumes to clean up.

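The --abort-on-container-exit flag stops the whole stack as soon as any container exits. Without it, the Selenium hub and database keep running even after tests finish, which hangs your CI job indefinitely. Because the flag reacts to the first container that exits, the faster of playwright-tests and api-tests ends the run for the other. If both suites must run to completion, start the long-lived services first and launch each suite with docker compose run.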

I also alias this in my Makefile:

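# Always tear the stack down, then return the status of the test run.
test-e2e:
	docker compose -f docker-compose.test.yml up --build --abort-on-container-exit; \
	status=$$?; \
	docker compose -f docker-compose.test.yml down --volumes; \
	exit $$status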

One command. Consistent results. No local Chrome installation required.

CI/CD Integration: GitHub Actions Example

Docker Compose test automation shines in CI because the same file runs locally and in the cloud. I covered the full GitHub Actions pipeline in my Playwright CI/CD guide, but here is the Docker Compose-specific snippet.

name: Docker Compose Test Run
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker
        uses: docker/setup-buildx-action@v3
      - name: Run full suite
        run: |
          docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
      - name: Upload Playwright report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: ./playwright-results/
      - name: Upload API report
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: api-report
          path: ./api-results/
      - name: Cleanup
        if: always()
        run: docker compose -f docker-compose.test.yml down --volumes

Key tips:

  • Always use if: always() for artifact uploads and cleanup. You want reports even on failure.
  • Standard GitHub-hosted Linux runners have 2 CPUs and 7 GB RAM for private repositories (4 CPUs and 16 GB for public ones). Scale your nodes down accordingly. I run 1 Chrome node and 1 Firefox node in CI, not 2.
  • Enable Docker layer caching with cache-from and cache-to in your buildx setup. It cuts build time from 4 minutes to 45 seconds.
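
Because --abort-on-container-exit stops every container as soon as any one exits, a single up command cannot guarantee that both test services finish. One possible variation of the "Run full suite" step is sketched below. It brings up the long-lived services with --wait and then runs each suite with docker compose run, which returns the suite's exit code to the step. It assumes app-under-test has a health check like the one in the health checks section, since both test services wait on it.

```yaml
      # Replaces the single "Run full suite" step in the workflow above.
      - name: Build images
        run: docker compose -f docker-compose.test.yml build
      - name: Start app, database, and Grid
        run: >
          docker compose -f docker-compose.test.yml up --detach --wait
          test-db app-under-test selenium-hub selenium-chrome selenium-firefox
      - name: Run Playwright suite
        run: docker compose -f docker-compose.test.yml run --rm playwright-tests
      - name: Run API suite
        if: always()   # still run the API suite when the Playwright suite fails
        run: docker compose -f docker-compose.test.yml run --rm api-tests
```

The artifact upload and cleanup steps stay as they are. The bind mounts declared on each service also apply to docker compose run, so the reports land in the same host folders.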

Common Failures and How I Fix Them

After running this setup on eight projects, here is my troubleshooting cheat sheet.

Chromium crashes with “out of memory” in Playwright container.

Add --ipc=host to the Docker run or set shm_size: 2gb in Compose. Chromium’s renderer needs shared memory for compositing. The default 64 MB Docker shm is too small.

Selenium nodes never register to the hub.

Check that the hub and the nodes are attached to the same Compose network and that the event bus environment variables match exactly. Nodes reach ports 4442 and 4443 on the hub over test-net, so publishing those ports in the hub's ports list only matters for nodes that run outside the Compose network. A typo in SE_EVENT_BUS_PUBLISH_PORT silently breaks registration.

API tests fail with “connection refused.”

This is almost always a missing health check or a depends_on without the condition: service_healthy syntax. Remember, the default depends_on only waits for the container to start, not for the app to accept traffic.

Tests pass locally but fail in CI.

Check resource limits. Locally you might have 16 CPUs. A GitHub runner has 2. Drop workers in Playwright from 4 to 2 in CI using an environment variable override.
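
One way to wire up that override is Compose variable substitution on the test command. PW_WORKERS is a hypothetical variable name picked for this sketch. Playwright itself does not read it; Compose substitutes the value into the --workers flag when it parses the file.

```yaml
services:
  playwright-tests:
    # Falls back to 4 workers when PW_WORKERS is unset or empty in the shell that runs docker compose.
    command: ["npx", "playwright", "test", "--workers=${PW_WORKERS:-4}"]
```

In CI, set PW_WORKERS=2 in the job environment. Locally, leave it unset to keep the default of 4.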

Stale browser sessions in Selenium Grid.

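Set a session timeout in the node environment: - SE_NODE_SESSION_TIMEOUT=300 (seconds). The node ends sessions that receive no commands for that long. If the real problem is new session requests timing out while slow-starting nodes are still registering, raise SE_SESSION_REQUEST_TIMEOUT on the hub instead; it controls how long a request may wait in the queue.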

India Context: What Product Teams in Bangalore Are Actually Running

I mentor QA engineers across Bangalore, Hyderabad, and Pune. The ones who cracked Docker Compose first are the same ones who jumped from service companies to product startups.

A mid-level SDET in a Series B fintech here earns ₹18-28 LPA. The ones who can own the entire Docker-based test infrastructure — not just write test cases — command ₹32-45 LPA. Why? Because product teams do not want a tester who files bugs. They want an engineer who ships the testing pipeline.

Here is what I see in the wild:

  • Early-stage startups: Playwright in Docker Compose only. One container. No Grid. Simple and fast.
  • Series A/B product companies: Playwright + API tests in Compose. Maybe a small Selenium Grid for legacy suites.
  • Enterprise (banks, telecom): Full Selenium Grid Hub & Node with 10-plus Chrome nodes. Playwright for new features. API tests for contract validation. All orchestrated through Compose in on-prem GitLab runners.

If you are a manual tester in TCS or Infosys reading this, learn Docker Compose before you learn a new testing framework. It is the skill that bridges the gap between “QA” and “SDET.”

Key Takeaways

  • A single docker compose test automation file eliminates environment drift across Playwright, Selenium Grid, and API suites.
  • Pin image tags. v1.60.0-noble and 4.44.0-20250519 are your friends. latest is a silent killer.
  • Always add health checks and depends_on conditions. Without them, you get flaky race conditions.
  • Resource limits are not optional in CI. Know your runner’s CPU and RAM ceiling.
  • Product teams in India pay a premium for engineers who own the testing infrastructure end to end. Docker Compose is the entry ticket.

FAQ

Q: Can I run this on Apple Silicon (M1/M2/M3)?
Yes. Use the --platform linux/amd64 flag or build native ARM images. Playwright’s Noble image has multi-arch support. Selenium images have ARM variants starting from 4.40, but I still test them before production use.
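
In a Compose file, the equivalent of that flag is the service-level platform key. A minimal sketch, shown on the Chrome node:

```yaml
services:
  selenium-chrome:
    image: selenium/node-chrome:4.44.0-20250519
    platform: linux/amd64   # run the amd64 image under emulation on Apple Silicon
```

Emulated containers run noticeably slower than native ones, so prefer a native ARM image where one exists.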

Q: Do I need Docker Desktop, or can I use Rancher Desktop?
Rancher Desktop works fine. I use it on two machines. The Compose plugin behaves identically. Just ensure the docker CLI is aliased correctly.

Q: How do I debug a failing Playwright test inside Docker?
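Mount a volume for the trace and report folders. Run the suite with --trace on to get a trace zip for each test. PWDEBUG=1 is a different tool: it opens the Playwright Inspector, which needs a headed browser and is awkward inside a container. Download the zip and drop it into trace.playwright.dev. I also covered visual regression debugging in my Playwright visual regression guide.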
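
A sketch of what that can look like in the Compose file. It assumes Playwright's default output directory (test-results under the working directory /app), and ./playwright-traces is a hypothetical host folder name:

```yaml
services:
  playwright-tests:
    # --trace on records a trace for every test; the zips land in the output directory.
    command: ["npx", "playwright", "test", "--trace", "on"]
    volumes:
      - ./playwright-results:/app/playwright-report
      - ./playwright-traces:/app/test-results
```

After the run, each test's folder under ./playwright-traces contains a trace.zip that you can open with npx playwright show-trace or upload to trace.playwright.dev.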

Q: Can I use Docker Swarm or Kubernetes instead of Compose?
For local dev and CI, Compose is simpler. Swarm adds orchestration complexity you do not need until you have 50-plus nodes. Kubernetes is overkill for a test suite unless you are running tests as a cron job across multiple clusters.

Q: What about self-healing locators and AI agents?
If you want to combine Docker-based test execution with AI-powered selector healing, check out my Playwright AI agent tutorial. The agent runs inside the same Playwright container. No extra infrastructure needed.

Q: Should I use Alpine for the Playwright container to save space?
No. Playwright’s official images use Ubuntu because Firefox and WebKit require glibc. Alpine uses musl, and Playwright does not support it for those browsers. Stick with v1.60.0-noble or v1.60.0-jammy.
