
Build MCP Servers for QA: A Complete Guide to Model Context Protocol Test Automation Tools

In the last eight months, I have built four custom MCP servers for my QA teams at Tekion and for clients at The Testing Academy. Two of them handle test data generation, one orchestrates CI pipeline queries, and the fourth exposes our internal BrowsingBee agent API to Claude Desktop. None of them took more than a weekend to prototype. That is the real story of MCP in 2026: it is not just about using the Playwright MCP server anymore. It is about building your own.

In this guide, I show you how to build MCP servers for QA from scratch. You will learn the protocol internals, write a working server in TypeScript, connect it to Claude Desktop, and see three production-ready patterns I use weekly. If you have already read my introduction to MCP for browser testing, this article is the sequel. We go from consumer to creator.

What Is MCP Really? The Protocol for QA Engineers

MCP is an open standard that lets AI assistants call external tools through structured JSON-RPC messages. Anthropic open-sourced it in November 2024. As of May 2026, the modelcontextprotocol/servers repository on GitHub has 86,459 stars and 10,866 forks, and the official TypeScript SDK has 12,569 stars and 1,884 forks. These numbers matter because they point to mature tooling, an active community, and a protocol that is not going away.

But here is what the README does not tell you: MCP is a contract, not a product. When you build an MCP server, you are writing a small API that exposes capabilities to any LLM that speaks the protocol. The LLM does not care if your server generates test data, queries Jenkins, or controls a browser. It sees a list of available tools, their input schemas, and their descriptions. Then it decides which tool to call and with what arguments.

For QA engineers, this is a paradigm shift. Instead of writing tests in a siloed framework, you are building infrastructure that AI agents can compose. A single MCP server can serve a manual tester using Claude Desktop, an SDET using VS Code Copilot, and a CI bot using a headless LangChain agent. One server, infinite clients.

The Three Primitives

MCP defines three primitives that every server can expose:

  • Tools: Functions the LLM can call to perform actions. Example: generate_test_data(schema).
  • Resources: Read-only data the LLM can fetch for context. Example: test-run-report://{run_id}.
  • Prompts: Pre-defined templates that guide the LLM’s behavior. Example: “Write a Playwright test for this API contract.”

Most QA use cases start with Tools. That is where the action is. Resources are useful for giving the LLM context about your test suite without stuffing the entire HTML into the prompt. Prompts are underrated; I use them to standardize how my team asks the LLM to review pull requests.
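
To make the three primitives concrete, here is a minimal sketch of the metadata a server advertises for each one, written as plain TypeScript with no SDK dependency. The field names (name, description, inputSchema, uriTemplate, arguments) follow the MCP specification as I understand it. The prompt name review_pull_request and the matchRunReport helper are hypothetical and exist only for illustration.

```typescript
// Plain-TypeScript sketch of the three primitive descriptors.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

interface ResourceTemplateDescriptor {
  uriTemplate: string; // URI pattern the client can fill in
  name: string;
  mimeType?: string;
}

interface PromptDescriptor {
  name: string;
  description: string;
  arguments?: { name: string; required?: boolean }[];
}

// Tool: an action the LLM can invoke.
const tool: ToolDescriptor = {
  name: "generate_test_data",
  description: "Generate realistic test data for QA automation",
  inputSchema: {
    type: "object",
    properties: { schema: { type: "string" } },
    required: ["schema"],
  },
};

// Resource: read-only context, addressed by URI.
const resource: ResourceTemplateDescriptor = {
  uriTemplate: "test-run-report://{run_id}",
  name: "Test run report",
  mimeType: "application/json",
};

// Prompt: a reusable template (the name is hypothetical).
const prompt: PromptDescriptor = {
  name: "review_pull_request",
  description: "Standard review checklist for test code",
  arguments: [{ name: "diff", required: true }],
};

// Hypothetical helper: pull run_id out of a concrete resource URI.
function matchRunReport(uri: string): string | null {
  const match = /^test-run-report:\/\/([A-Za-z0-9_-]+)$/.exec(uri);
  return match?.[1] ?? null;
}

console.log(tool.name, resource.uriTemplate, prompt.name);
console.log(matchRunReport("test-run-report://run-1042"));
```

The point of the sketch is that all three primitives are just described data. The LLM only ever sees these descriptors, which is why the wording of name and description matters so much.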

Why Build Custom MCP Servers Instead of Using Off-the-Shelf Tools

The Playwright MCP server is excellent. So are the GitHub, Slack, and PostgreSQL servers in the official repository. But every QA team has internal systems that will never get an official MCP integration: proprietary test runners, custom dashboards, internal Jira clones, legacy SOAP APIs, and on-premise CI servers. Building a custom MCP server is how you bridge that gap.

The ROI Data from My Teams

At Tekion, I built an MCP server that exposes our internal test orchestrator. Before MCP, querying the status of a smoke test run meant opening three tabs: Jenkins, our custom dashboard, and Slack. Now I ask Claude, “Did the payment gateway smoke suite pass on staging?” and the MCP server queries all three systems and returns a consolidated answer. The average time to check CI status dropped from 4 minutes to 12 seconds.

For The Testing Academy, I built a test data generation server that connects to our student database. When a student asks for realistic Indian address data for a form-filling test, the server generates Aadhaar-like numbers, valid pin codes, and realistic names. It took 3 hours to build and has served 2,400+ requests in the last 90 days.

When to Build vs When to Use Existing

Scenario                                                    Action
You need browser automation                                 Use the official Playwright MCP server
You need to query GitHub issues                             Use the official GitHub MCP server
You need to generate test data from internal schemas        Build a custom server
You need to query a proprietary CI dashboard                Build a custom server
You need to validate API contracts against internal specs   Build a custom server
You need to orchestrate multi-tool agent workflows          Build a custom orchestrator server

MCP Architecture: Tools, Resources, and Prompts

Before we write code, you need to understand the lifecycle of an MCP interaction. It is simpler than it looks.

The Connection Lifecycle

  1. Initialization: The client (Claude Desktop) starts the server as a subprocess. They exchange protocol version and capability flags.
  2. Tool Discovery: The client asks the server, “What tools do you have?” The server returns a JSON schema for each tool, including name, description, and parameter types.
  3. Execution: The LLM decides it needs a tool. The client sends a JSON-RPC request. The server executes the function and returns the result.
  4. Teardown: When the client closes, the server process exits cleanly.
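
The four steps above map onto a handful of JSON-RPC 2.0 messages. The following is a rough sketch of what the client sends, assuming the method names from the MCP specification (initialize, notifications/initialized, tools/list, tools/call). The client name, version strings, and the request helper are illustrative, not taken from any real client.

```typescript
// Sketch of the client-side messages behind the lifecycle steps.
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: unknown };
type JsonRpcNotification = { jsonrpc: "2.0"; method: string; params?: unknown };

let nextId = 1;

// Hypothetical helper: build a request with an auto-incrementing id.
function request(method: string, params?: unknown): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

const lifecycle: (JsonRpcRequest | JsonRpcNotification)[] = [
  // 1. Initialization: exchange protocol version and capabilities.
  request("initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" },
  }),
  // The client confirms it is ready. Notifications carry no id.
  { jsonrpc: "2.0", method: "notifications/initialized" },
  // 2. Tool discovery.
  request("tools/list"),
  // 3. Execution: the LLM picked a tool and supplied arguments.
  request("tools/call", {
    name: "generate_test_data",
    arguments: { type: "user", count: 5, locale: "en-IN" },
  }),
  // 4. Teardown has no message over stdio: the client closes the
  //    server's stdin and the process exits.
];

for (const message of lifecycle) {
  console.log(JSON.stringify(message));
}
```

Each request gets a response carrying the same id, which is how the client matches results to calls when several are in flight.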

Transport Options

MCP launched with two transports:

  • Stdio: The server runs as a local subprocess. Input and output happen over stdin/stdout. This is what Claude Desktop uses by default. It is simple and secure because the server never opens a network port.
  • SSE (Server-Sent Events): The server runs as an HTTP service. The client connects over HTTP and receives events as a stream. This is what you use for remote servers or Docker deployments. Later revisions of the specification supersede this transport with Streamable HTTP, which can still stream responses over SSE, so check which one your SDK version and client support.

For internal QA tools, I almost always start with stdio. If I need to share the server across a team, I switch to SSE behind an internal VPN or authentication proxy.
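
It helps to see why stdio is so simple. On this transport each JSON-RPC message travels as one line of JSON terminated by a newline. Here is a minimal sketch of the framing logic, assuming that newline-delimited format. The LineFramer class is a hypothetical illustration, not the SDK's implementation.

```typescript
// Sketch of newline-delimited JSON framing for a stdio-style transport.
class LineFramer {
  private buffer = "";

  // Feed a raw chunk; returns every complete JSON message it finished.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line, or "" if the chunk ended on "\n".
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}

const framer = new LineFramer();

// A message can arrive split across chunks, so nothing is emitted yet.
const first = framer.push('{"jsonrpc":"2.0","id":1,"method":"tools/li');

// The rest of message 1 plus a complete message 2.
const second = framer.push('st"}\n{"jsonrpc":"2.0","id":2,"method":"ping"}\n');

// Diagnostics go to stderr so stdout stays reserved for protocol messages.
console.error(`parsed ${first.length} message(s), then ${second.length}`);
```

This is also why a stdio server should log with console.error rather than console.log: anything written to stdout that is not a protocol message can corrupt the stream.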

Building Your First QA MCP Server in TypeScript

Let us build a minimal MCP server that generates realistic test data. This is the foundation for every other pattern in this article.

Step 1: Scaffold the Project

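mkdir qa-mcp-server && cd qa-mcp-server
npm init -y
# The server uses ES module imports and top-level await
npm pkg set type=module
npm install @modelcontextprotocol/sdk zod
npm install -D typescript @types/node
# Emit compiled files to dist/, which the Claude Desktop config in Step 3 points at
npx tsc --init --module nodenext --target es2022 --outDir dist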

Step 2: Write the Server

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { z } from "zod";

const GenerateTestDataSchema = z.object({
  type: z.enum(["user", "address", "payment"]),
  count: z.number().int().min(1).max(100).default(1),
  locale: z.enum(["en-US", "en-IN"]).default("en-US"),
});

const server = new Server(
  { name: "qa-test-data-server", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "generate_test_data",
        description: "Generate realistic test data for QA automation",
        inputSchema: {
          type: "object",
          properties: {
            type: {
              type: "string",
              enum: ["user", "address", "payment"],
              description: "Type of test data to generate",
            },
            count: {
              type: "integer",
              minimum: 1,
              maximum: 100,
              description: "Number of records to generate",
            },
            locale: {
              type: "string",
              enum: ["en-US", "en-IN"],
              description: "Locale for realistic regional data",
            },
          },
          required: ["type"],
        },
      },
    ],
  };
});

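server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "generate_test_data") {
    // Validate and apply defaults before touching any argument.
    const args = GenerateTestDataSchema.parse(request.params.arguments);
    const data: Record<string, string>[] = [];
    for (let i = 0; i < args.count; i++) {
      if (args.type === "user") {
        data.push({
          name: args.locale === "en-IN" ? `Rahul Sharma ${i}` : `John Doe ${i}`,
          email: `test${i}@example.com`,
          // Pad the random part so the number always has the full digit count.
          phone: args.locale === "en-IN" ? `+91 9${String(Math.floor(Math.random() * 1000000000)).padStart(9, "0")}` : `+1-555-${String(Math.floor(Math.random() * 10000)).padStart(4, "0")}`,
        });
      } else if (args.type === "address") {
        data.push({
          street: `${Math.floor(Math.random() * 999) + 1} Main St`,
          city: args.locale === "en-IN" ? "Bangalore" : "New York",
          pincode: args.locale === "en-IN" ? `560${String(Math.floor(Math.random() * 100)).padStart(3, "0")}` : `${Math.floor(Math.random() * 90000) + 10000}`,
        });
      } else {
        // "payment": the schema accepts it, so return a placeholder record
        // instead of an empty list. Replace with your own payment fixtures.
        data.push({
          cardNumber: "4111111111111111", // widely used Visa test number, not a real card
          expiry: "12/30",
          currency: args.locale === "en-IN" ? "INR" : "USD",
        });
      }
    }
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});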

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("QA Test Data MCP server running on stdio");

Step 3: Configure Claude Desktop

{
  "mcpServers": {
    "qa-test-data": {
      "command": "node",
      "args": ["/path/to/qa-mcp-server/dist/index.js"]
    }
  }
}

Compile the server with npx tsc, restart Claude Desktop, type “Generate 5 Indian user records for my test suite,” and watch it work. That is the entire setup: about 80 lines of TypeScript.

Pattern 1: Test Data Generation Server

The test data server is the gateway drug of MCP for QA. Every test needs data, and every team has custom rules about what valid data looks like.

What I Built for BrowsingBee

At BrowsingBee, we test against hundreds of e-commerce sites. Each site has different product schemas, shipping rules, and payment flows. I built an MCP server that connects to our internal “site profile” database. When I ask the LLM to “generate a checkout flow for Amazon India,” the MCP server looks up the profile, generates the correct product IDs, selects the valid payment methods for India, and produces a complete test script scaffold.

Extending the Basic Server

Here is how I extended the basic template to serve seeded data from PostgreSQL while keeping the Zod validation in front of the query:

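// node-postgres; install with: npm install pg && npm install -D @types/pg
import pg from "pg";

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "generate_test_data") {
    const args = GenerateTestDataSchema.parse(request.params.arguments);
    // Connect to internal PostgreSQL for realistic seeded data
    const client = new pg.Client({ connectionString: process.env.TEST_DB_URL });
    await client.connect();
    try {
      // Parameterized query: LLM-supplied values never reach the SQL string
      const rows = await client.query(
        "SELECT * FROM seed_data WHERE type = $1 AND locale = $2 LIMIT $3",
        [args.type, args.locale, args.count]
      );
      return { content: [{ type: "text", text: JSON.stringify(rows.rows, null, 2) }] };
    } finally {
      // Close the connection even when the query throws
      await client.end();
    }
  }
  throw new Error(`Unknown tool: ${request.params.name}`);
});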

The key insight: MCP servers are just thin wrappers around your existing infrastructure. You do not rewrite your data layer. You expose it.

Pattern 2: CI Pipeline Status Server

This is the server I use most often. It answers the question every QA engineer asks twenty times a day: “Is the build green?”

The Problem with Manual CI Checks

At Tekion, we run 14 parallel CI pipelines across 6 environments. Checking the status of a specific feature branch means navigating Jenkins, GitHub Actions, and an internal deployment tracker. The cognitive load is real. I measured it: engineers spent an average of 4.2 minutes per check, and senior SDETs were interrupted with status requests 8-12 times per day.

The MCP Solution

I built a CI status MCP server that aggregates three APIs:

  1. GitHub Actions API for PR checks
  2. Jenkins REST API for deployment jobs
  3. Internal deployment tracker for environment status

The core tool definition:

{
  name: "get_pipeline_status",
  description: "Get the CI status for a branch across all systems",
  inputSchema: {
    type: "object",
    properties: {
      branch: { type: "string", description: "Git branch name" },
      includeDeployments: { type: "boolean", default: false }
    },
    required: ["branch"]
  }
}

Now I ask Claude: “Is the payment-refactor branch green, and is it deployed to staging?” The server queries all three systems and returns a unified report. Total time: under 2 seconds.
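
The aggregation behind get_pipeline_status can be sketched roughly as follows. This is not the production code: the three fetchStatus functions are hypothetical stubs standing in for the GitHub Actions, Jenkins, and deployment tracker clients, and treating Jenkins as a deployment source gated by includeDeployments is my assumption about how the flag is meant to work.

```typescript
type State = "passed" | "failed" | "running" | "unknown";
type SystemStatus = { system: string; state: State; error?: string };
type StatusSource = {
  system: string;
  deployment: boolean; // only queried when includeDeployments is true
  fetchStatus: (branch: string) => Promise<SystemStatus>;
};

// Hypothetical stubs. Real versions would call the respective HTTP APIs.
const sources: StatusSource[] = [
  {
    system: "github-actions",
    deployment: false,
    fetchStatus: async () => ({ system: "github-actions", state: "passed" }),
  },
  {
    system: "jenkins",
    deployment: true,
    fetchStatus: async () => ({ system: "jenkins", state: "passed" }),
  },
  {
    system: "deploy-tracker",
    deployment: true,
    // Simulates an outage in one backend.
    fetchStatus: async () => {
      throw new Error("tracker timeout");
    },
  },
];

async function getPipelineStatus(branch: string, includeDeployments = false) {
  const active = sources.filter((s) => includeDeployments || !s.deployment);
  // Query every system in parallel. A failing source degrades to "unknown"
  // instead of failing the whole report.
  const results = await Promise.all(
    active.map(async (s): Promise<SystemStatus> => {
      try {
        return await s.fetchStatus(branch);
      } catch (err) {
        return { system: s.system, state: "unknown", error: String(err) };
      }
    })
  );
  return { branch, green: results.every((r) => r.state === "passed"), results };
}

getPipelineStatus("payment-refactor", true).then((report) =>
  console.log(JSON.stringify(report, null, 2))
);
```

Running the sources in parallel keeps latency close to that of the slowest backend, and catching per source means one flaky system produces a partial answer the LLM can still reason about.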

Adding Alerts

I added a second tool called notify_on_failure that posts to Slack when a critical pipeline fails. The LLM can decide to call it based on the status results. This turns Claude from a passive assistant into an active monitoring partner.

Pattern 3: Internal Agent API Server

This is the most advanced pattern. I exposed the BrowsingBee agent API as an MCP server so that Claude can orchestrate our internal AI testing agents directly.

Architecture

Our internal agents run on a Kubernetes cluster. They expose a gRPC API. I wrote a small translation layer that converts MCP tool calls into gRPC requests. The server exposes three tools:

  • run_smoke_test(url): Triggers a headless Playwright smoke test against a URL.
  • get_visual_diff(baseline, target): Compares two screenshots using our internal visual regression engine.
  • analyze_logs(run_id): Fetches and summarizes test logs using an LLM summarizer.
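
A translation layer like this is mostly a routing table. Below is a minimal sketch under stated assumptions: the AgentBackend interface is a hypothetical stand-in for a generated gRPC client, its method and field names are invented for illustration, and the fake backend returns canned data so the example runs on its own.

```typescript
// Hypothetical interface standing in for the generated gRPC client.
interface AgentBackend {
  runSmokeTest(req: { url: string }): Promise<{ runId: string }>;
  getVisualDiff(req: { baseline: string; target: string }): Promise<{ diffRatio: number }>;
  analyzeLogs(req: { runId: string }): Promise<{ summary: string }>;
}

type ToolArgs = Record<string, unknown>;
type Route = (args: ToolArgs) => Promise<unknown>;

// Reject missing or non-string arguments before they reach the backend.
function requireString(args: ToolArgs, key: string): string {
  const value = args[key];
  if (typeof value !== "string" || value === "") {
    throw new Error(`Missing string argument: ${key}`);
  }
  return value;
}

function buildDispatcher(backend: AgentBackend) {
  // One entry per MCP tool: map snake_case tool arguments to backend requests.
  const routes = new Map<string, Route>([
    ["run_smoke_test", (a) => backend.runSmokeTest({ url: requireString(a, "url") })],
    [
      "get_visual_diff",
      (a) =>
        backend.getVisualDiff({
          baseline: requireString(a, "baseline"),
          target: requireString(a, "target"),
        }),
    ],
    ["analyze_logs", (a) => backend.analyzeLogs({ runId: requireString(a, "run_id") })],
  ]);

  return async (name: string, args: ToolArgs) => {
    const route = routes.get(name);
    if (!route) throw new Error(`Unknown tool: ${name}`);
    const result = await route(args);
    // Wrap the backend reply in the MCP text-content shape.
    return { content: [{ type: "text" as const, text: JSON.stringify(result) }] };
  };
}

// Fake backend with canned replies, for demonstration only.
const fakeBackend: AgentBackend = {
  runSmokeTest: async () => ({ runId: "run-1" }),
  getVisualDiff: async () => ({ diffRatio: 0.02 }),
  analyzeLogs: async ({ runId }) => ({ summary: `2 console errors in ${runId}` }),
};

const dispatch = buildDispatcher(fakeBackend);
dispatch("run_smoke_test", { url: "https://staging.example.com" }).then((r) =>
  console.log(r.content[0]?.text)
);
```

Keeping the routes in one table makes it easy to review exactly which backend operations an LLM can reach.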

Real Workflow

Here is a real workflow I ran last week:

  1. I typed: “Run a smoke test on the new staging URL and tell me if anything looks broken.”
  2. Claude called run_smoke_test and got back a run ID.
  3. It then called analyze_logs with that run ID.
  4. The logs showed two console errors. Claude suggested they were related to a missing CSP header.
  5. I fixed the header. Total debugging time: 11 minutes. Without MCP, I would have opened three different tools and spent 45 minutes.

This pattern is where MCP stops being a convenience and starts being a force multiplier. You are not just querying data. You are orchestrating systems.

Securing and Deploying MCP Servers in Enterprise Environments

With the stdio transport, MCP servers run as local subprocesses with the same permissions as the user who launched the client. This is a feature and a risk.

Security Checklist

  • Never hardcode secrets: Use environment variables or a secret manager. Your server code can read process.env like any other Node.js app.
  • Scope database connections: Give your MCP server a read-only database user. It does not need DROP TABLE privileges.
  • Validate inputs: Use Zod or JSON Schema validation on every tool parameter. LLMs hallucinate arguments more often than you think.
  • Log aggressively: Every tool call should log the caller, the arguments, and the result. If an LLM deletes something, you need an audit trail.
  • Network isolation: If you use SSE transport, bind to localhost or put the server behind an internal VPN. Do not expose MCP servers to the public internet.
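
The logging item is the one teams skip most often, so here is one way it could look. This is a sketch, not a prescribed pattern: withAudit and redact are hypothetical helpers, the list of sensitive key names is an assumption, and caller identity is left out because how you obtain it depends on your transport and authentication setup.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<unknown>;

// Assumed list of argument names that must never reach the log.
const SENSITIVE_KEYS = ["password", "token", "apiKey", "secret"];

function redact(args: ToolArgs): ToolArgs {
  const safe: ToolArgs = {};
  for (const key of Object.keys(args)) {
    safe[key] = SENSITIVE_KEYS.indexOf(key) >= 0 ? "[REDACTED]" : args[key];
  }
  return safe;
}

// Wrap a tool handler so every call leaves one structured log line.
function withAudit(
  tool: string,
  handler: ToolHandler,
  // Default sink is stderr, which keeps stdout free on the stdio transport.
  log: (line: string) => void = (line) => console.error(line)
): ToolHandler {
  return async (args) => {
    const startedAt = Date.now();
    try {
      const result = await handler(args);
      log(JSON.stringify({ tool, args: redact(args), ok: true, result, ms: Date.now() - startedAt }));
      return result;
    } catch (err) {
      log(JSON.stringify({ tool, args: redact(args), ok: false, error: String(err), ms: Date.now() - startedAt }));
      throw err; // let the caller decide how to report the failure
    }
  };
}

const audited = withAudit("generate_test_data", async (args) => ({ generated: args.count }));
audited({ count: 2, token: "abc123" }).then((result) =>
  console.error("result:", JSON.stringify(result))
);
```

If results can be large or contain personal data, log a size or a hash instead of the full result.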

Deployment Patterns

For local development, stdio is perfect. For team sharing, I package the server as a Docker image and run it on an internal VM. For CI integration, I run the server as a sidecar container in Kubernetes and have the main test pod connect over localhost SSE. This keeps the protocol standardized while the infrastructure scales.

India Context: Why Product Teams Are Building Custom MCP Tools

In Bangalore and Hyderabad, the adoption of custom MCP servers is following a familiar pattern. Product companies move first. Service companies follow two years later.

A senior QA lead at a fintech unicorn told me last month that their team of six SDETs now maintains three internal MCP servers: one for test data, one for environment provisioning, and one for compliance report generation. Their sprint velocity improved by 35% because manual testers can now self-serve environment checks through Claude instead of pinging the SDET channel on Slack.

The salary data supports this trend. In my 2026 India salary survey, engineers who list “AI agent tooling and MCP” on their resume are quoting ₹28-38 LPA at product companies. That is a 15-20% premium over standard Playwright automation roles. The reason is simple: there are not enough engineers who understand both test automation and protocol design.

If you are a manual tester in India looking to make the jump, here is my recommendation. Learn Playwright first. Then build one MCP server. It does not matter what it does. What matters is that you understand the protocol, the schema design, and the security implications. That combination is rare, and hiring managers are paying for it.

Common Mistakes When Building MCP Servers for Testing

I have reviewed MCP servers from six different teams in the last quarter. Here are the mistakes I see repeatedly.

  1. Over-exposing tools: A team exposed 47 tools in one server. The LLM context window could not hold the schema descriptions, and tool selection accuracy dropped to 30%. Keep it under 10 tools per server. Split into multiple servers if needed.
  2. Vague descriptions: The LLM picks tools based on their descriptions. “Do stuff with data” is useless. “Generate realistic Indian address records for form-filling tests” is precise.
  3. No input validation: An LLM once passed a negative number to a count parameter because the server did not enforce a minimum. Zod schemas prevent this.
  4. Blocking the event loop: Tool handlers should be async. A long synchronous call, such as a blocking database query, stalls every other JSON-RPC message on the transport and can make the client time out.
  5. Ignoring errors: When a tool fails, return a clear error message in the MCP format. Do not just crash the server. The LLM can often recover if it understands what went wrong.
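
Mistakes 3 and 5 can be handled together. The sketch below validates a count argument by hand and converts any failure into a tool result flagged with isError, which is the field the MCP specification uses for tool execution errors. The helper names parseCount and safeToolCall are hypothetical, and in a real server a Zod schema would replace the manual check.

```typescript
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Enforce the same bounds the tool's input schema advertises.
function parseCount(raw: unknown): number {
  const count = raw === undefined ? 1 : raw; // default when omitted
  if (typeof count !== "number" || !Number.isInteger(count) || count < 1 || count > 100) {
    throw new Error(`count must be an integer between 1 and 100, got ${JSON.stringify(raw)}`);
  }
  return count;
}

// Run a tool body and turn any exception into an error result
// the LLM can read, instead of crashing the server.
async function safeToolCall(run: () => Promise<string> | string): Promise<ToolResult> {
  try {
    return { content: [{ type: "text", text: await run() }] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: `Tool failed: ${message}` }], isError: true };
  }
}

// An LLM passes a negative count: the call fails cleanly with an explanation.
safeToolCall(() => `generated ${parseCount(-3)} records`).then((result) =>
  console.log(JSON.stringify(result))
);
```

Because the error text says what the valid range is, the model usually retries with a corrected argument rather than giving up.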

Key Takeaways

  • MCP is an open protocol, not a product. Building custom servers lets you connect LLMs to your internal QA infrastructure.
  • Start with a test data generation server. It is the simplest pattern and delivers immediate value.
  • The CI status server saves the most senior-engineer time by eliminating status-check interruptions.
  • The agent API server is the advanced pattern. It turns Claude into an orchestrator for your existing automation stack.
  • Always validate inputs, use read-only database users, and log every tool call.
  • Teams in India that build custom MCP tools are seeing 35% sprint velocity gains and 15-20% salary premiums.
  • Keep descriptions precise and tool counts low. The LLM’s tool selection accuracy depends on it.

FAQ

Do I need to know TypeScript to build an MCP server?

No. The SDK has Python and Kotlin implementations too. I use TypeScript because Playwright is TypeScript-first, and my team shares types between the test suite and the MCP server. But Python is equally valid.

Can MCP servers call other MCP servers?

Yes. A server composes others by acting as an MCP client to them. I have a “QA Orchestrator” server that calls the test data server, the CI status server, and the Playwright browser server in sequence. This is how you build multi-step agent workflows without writing complex orchestration code.

How do I debug an MCP server?

Use the MCP Inspector tool: npx @modelcontextprotocol/inspector node dist/index.js. It gives you a web UI to list tools, send requests, and inspect responses. It is the Postman of MCP.

Is MCP secure for banking and healthtech?

Yes, if you deploy it correctly. Run servers locally or on internal networks. Use read-only connections. Audit every tool call. For extreme sensitivity, run a local LLM via Ollama behind an MCP-capable client, so that neither prompts nor tool results leave your network.

Will building MCP servers replace SDETs?

No more than JIRA replaced project managers. MCP is a tool that amplifies SDETs. The engineers who build and maintain these servers are more valuable, not less, because they control the interface between AI and infrastructure.
