
The 90-Day Roadmap: From Manual Tester to AI Engineer in 2026

I have trained over 15,000 QA engineers through The Testing Academy, and the question I hear most often is not about Selenium or Playwright. It is this: “How do I become an AI engineer?” In 2026, the gap between manual testing and AI-augmented QA has become a career chasm. The engineers who cross it are earning ₹25-60 LPA in India. The ones who do not are watching their work get automated by the very tools they refused to learn.

This article is a brutally practical manual tester to AI engineer roadmap. It is not a list of courses to buy. It is a day-by-day plan for the next 90 days, built from the actual skills I see hiring managers demand in interviews. No motivational fluff. Just the stack, the projects, and the milestones that get you hired.

Why 2026 Is the Inflection Point

Three forces converged in 2025-2026 to make this transition not just possible, but urgent.

First, LLMs became reliable enough for production testing. GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 can now read test plans, generate edge cases, and triage failures with 85%+ accuracy. In 2024, this was a demo. In 2026, it is a job requirement.

Second, agent frameworks matured. LangChain, LangGraph, and CrewAI went from experimental to enterprise-grade. Companies are not just using AI to assist testers. They are building AI agents that autonomously run smoke tests, analyze logs, and file bugs. Someone has to build, maintain, and evaluate those agents.

Third, the talent gap is extreme. I have interviewed over 200 candidates in the last 18 months. Less than 10% can explain what a vector database is. Less than 5% have built anything with LangChain. The market is starving for QA engineers who understand both testing domain knowledge and AI engineering. That combination is rare, and rarity commands a premium.

If you are a manual tester with 2-5 years of experience, you are at the perfect intersection. You know how software breaks. You understand user journeys. You just need to add the AI engineering layer on top. This roadmap shows you exactly how.

The AI Engineer Skill Stack for QA

Before I break down the 90 days, here is the stack you will have at the end. Do not try to learn everything at once. The daily plan below sequences these intentionally.

Core Languages

  • Python 3.11+: The lingua franca of AI. You do not need to be a Python expert, but you need to be comfortable with dataclasses, type hints, and asyncio.
  • TypeScript: For Playwright and frontend agent tools. If you already know JavaScript, TypeScript is a weekend upgrade.

AI Frameworks

  • LangChain / LangGraph: For building agent pipelines with memory, tools, and multi-step reasoning.
  • OpenAI / Anthropic APIs: For LLM calls, embeddings, and fine-tuning basics.
  • Ollama: For local model development without API costs. Essential for iterating fast.

Testing-Specific AI Tools

  • DeepEval: The leading framework for LLM evaluation in testing contexts.
  • PromptFoo: For red-teaming prompts and measuring regression in LLM behavior.
  • Playwright: Still the browser automation backbone. AI agents need a browser driver.

Infrastructure

  • Vector databases (Astra DB, Chroma, Pinecone): For RAG-based test case retrieval and similarity search.
  • Docker: For containerizing agents and reproducible test environments.
  • GitHub Actions: For CI/CD of agent pipelines.

This looks like a lot. It is. But you are not learning it from scratch — you are adding to a foundation you already have. A manual tester who understands equivalence partitioning and boundary value analysis has a massive head start over a generic software engineer learning AI.

Days 1-30: Foundation — Python, APIs, and LLMs

The first month is about fluency. You are not building agents yet. You are making AI tools feel as natural as JIRA.

Week 1: Python for Testers

If your Python is rusty, start here. Do not take a generic Python course. Take one focused on testing and data processing. You need:

  1. List comprehensions and dictionary manipulation
  2. Writing functions with type hints
  3. Working with JSON and CSV
  4. Basic asyncio (for parallel API calls)
  5. Unit testing with pytest

Project: Write a script that reads a Jira CSV export, groups bugs by component, and prints the top 3 components by bug count. This is boring but forces you to manipulate real test data in Python.
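One way to approach this project, as a minimal sketch using only the standard library. The column name "Component" and the sample rows are assumptions for illustration; check the header of your own export, because Jira column names vary by configuration.

```python
import csv
import io
from collections import Counter


def top_components(csv_text: str, column: str = "Component", n: int = 3) -> list[tuple[str, int]]:
    """Return the n components with the most bugs, most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with a missing or blank component are counted under "Unassigned".
    counts = Counter(
        (row.get(column) or "").strip() or "Unassigned"
        for row in reader
    )
    return counts.most_common(n)


# Hypothetical sample export; a real Jira export has many more columns.
SAMPLE = """Issue key,Summary,Component
QA-1,Login button unresponsive,Auth
QA-2,Token expires early,Auth
QA-3,Cart total wrong,Checkout
QA-4,Coupon not applied,Checkout
QA-5,Password reset email missing,Auth
QA-6,Search returns stale results,Search
QA-7,Avatar upload fails,Profile
"""

if __name__ == "__main__":
    for component, count in top_components(SAMPLE):
        print(f"{component}: {count}")
```

To run it against a real file, read the file into a string first (for example with pathlib's Path.read_text) and pass that string to top_components.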

Week 2: LLM APIs

Sign up for OpenAI and Anthropic accounts. Set a $20 budget. Learn:

  1. Chat completions API with system prompts
  2. Temperature and max_tokens tuning
  3. Structured output (JSON mode / function calling)
  4. Token counting and cost estimation

Project: Build a “test case generator.” Input: a user story description. Output: 5 test cases in JSON format with title, steps, and expected result. Use GPT-4o-mini to keep costs under $0.10 per run. Validate the output with Pydantic.
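The validation half of this project can be sketched without any API call. The example below deliberately swaps Pydantic for standard-library dataclasses so it runs with no dependencies; in your own project a Pydantic model would replace the manual checks. The field name expected_result and the function name parse_test_cases are assumptions, not part of any library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedCase:
    title: str
    steps: list[str]
    expected_result: str


def parse_test_cases(raw: str, expected_count: int = 5) -> list[GeneratedCase]:
    """Parse the model's JSON reply and raise ValueError on any schema mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list) or len(data) != expected_count:
        raise ValueError(f"expected a JSON list of {expected_count} test cases")

    cases = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        title = item.get("title")
        steps = item.get("steps")
        expected = item.get("expected_result")
        if not isinstance(title, str) or not title.strip():
            raise ValueError(f"item {index}: missing title")
        if not isinstance(steps, list) or not steps or not all(isinstance(s, str) for s in steps):
            raise ValueError(f"item {index}: steps must be a non-empty list of strings")
        if not isinstance(expected, str) or not expected.strip():
            raise ValueError(f"item {index}: missing expected_result")
        cases.append(GeneratedCase(title, steps, expected))
    return cases
```

Treat a ValueError as a signal to retry the LLM call or log the raw reply, rather than letting malformed output flow into your test management tool.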

Week 3: Embeddings and Vector Search

This is where most testers get lost. Do not skip it. You need to understand:

  1. What an embedding is (a vector of numbers that represents the meaning of a piece of text)
  2. Cosine similarity and why it matters for test case retrieval
  3. How to store and query embeddings in Chroma or Astra DB

Project: Create a “smart test case finder.” Upload 50 existing test cases to Chroma. When a developer submits a bug, embed the bug description and retrieve the top 3 most similar test cases. This is a primitive RAG system, and it is the core of modern AI testing.
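Before wiring up Chroma, it helps to see what the retrieval step does underneath. This is a minimal sketch in plain Python: the three-dimensional vectors and test case IDs are made up for illustration (real embeddings have hundreds or thousands of dimensions and come from an embedding model), and a vector database performs the same ranking with indexing on top.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank every stored vector against the query and keep the k best."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical): the first axis loosely means "login-related".
CORPUS = {
    "TC-login-lockout": [0.9, 0.1, 0.0],
    "TC-login-sso": [0.8, 0.2, 0.1],
    "TC-cart-total": [0.1, 0.9, 0.2],
    "TC-search-filter": [0.0, 0.2, 0.9],
    "TC-password-reset": [0.7, 0.0, 0.3],
}

if __name__ == "__main__":
    bug_vector = [1.0, 0.1, 0.0]  # stand-in for an embedded bug description
    for case_id, score in top_k(bug_vector, CORPUS):
        print(f"{case_id}: {score:.3f}")
```

This brute-force scan is fine for 50 test cases. The reason to move to a vector database is scale and persistence, not a different similarity idea.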

Week 4: Ollama for Local Development

Running everything against OpenAI gets expensive and slow. Ollama lets you run models locally. Install Ollama, then pull Llama 3.1 8B and Mistral 7B. Learn:

  1. Pulling and running models
  2. Creating custom Modelfiles with system prompts
  3. Measuring latency vs quality tradeoffs

Project: Re-run your test case generator using Llama 3.1 8B locally. Compare the output quality and cost to GPT-4o-mini. Document the results. Hiring managers love candidates who understand local vs cloud model tradeoffs.

Days 31-60: Building — Agents, RAG, and Evaluation

Now you build. This is the hardest and most rewarding phase.

Week 5: LangChain Basics

LangChain abstracts LLM calls into chains. Learn:

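  1. Composing chains with LCEL (the pipe syntax). LLMChain and SimpleSequentialChain are legacy classes that you will still meet in older tutorials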
  2. Prompt templates with variables
  3. Output parsers (Pydantic, JSON)
  4. Memory types (buffer, summary, vector)

Project: Build a “bug classifier chain.” Input: a bug description. Chain 1: summarize the bug. Chain 2: classify as UI, API, or Performance. Chain 3: assign severity (P0-P3) based on keywords. End-to-end latency should be under 3 seconds.
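The shape of this three-step chain can be sketched in plain Python before you bring in LangChain. This is not LangChain's API: the two LLM steps are passed in as ordinary callables (hypothetical stand-ins for your summarize and classify prompts), and only the keyword-based severity step is implemented. The keyword lists are assumptions you should tune against your own bug history.

```python
from typing import Callable

# Checked in order, so the most severe match wins. Keywords are illustrative.
SEVERITY_KEYWORDS = [
    ("P0", ("data loss", "outage", "crash", "security")),
    ("P1", ("cannot", "fails", "error", "blocked")),
    ("P2", ("slow", "incorrect", "intermittent")),
]
CATEGORIES = {"UI", "API", "Performance"}


def assign_severity(text: str) -> str:
    """Step 3: map keywords in the bug text to a severity, defaulting to P3."""
    lowered = text.lower()
    for severity, keywords in SEVERITY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return severity
    return "P3"


def classify_bug(
    description: str,
    summarize: Callable[[str], str],
    categorize: Callable[[str], str],
) -> dict[str, str]:
    """Run summarize -> categorize -> severity and return one combined record."""
    summary = summarize(description)
    category = categorize(summary)
    if category not in CATEGORIES:
        raise ValueError(f"unexpected category from model: {category!r}")
    return {
        "summary": summary,
        "category": category,
        "severity": assign_severity(description),
    }
```

Keeping the LLM steps injectable means you can unit test the pipeline with simple lambdas, then swap in real model calls and measure the 3-second latency budget separately.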

Week 6: LangGraph for Multi-Agent Workflows

LangGraph lets you build state machines where agents call tools and loop until done. This is where AI testing gets serious. Learn:

  1. StateGraph and conditional edges
  2. Tool calling (search, API, code execution)
  3. Human-in-the-loop breakpoints

Project: Build a “test planner agent.” The agent receives a feature spec, searches your existing test case database (via vector search), identifies gaps, and generates missing test cases. If it is unsure, it asks you for clarification. This is a real tool I use at Tekion.

Week 7: RAG for Test Case Generation

Retrieval-Augmented Generation is the biggest unlock for QA. Instead of asking an LLM to hallucinate test cases, you retrieve relevant existing cases and ask it to adapt them. Learn:

  1. Chunking strategies for test artifacts
  2. Hybrid search (keyword + vector)
  3. Re-ranking with cross-encoders

Project: Build a “regression minimizer.” Given a pull request diff, retrieve the top 10 most relevant historical test cases. Use an LLM to select the 3 highest-risk ones. Output a JSON list of test tags to run. This is predictive test selection, and it saves hours of execution time.
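For the hybrid search step, you need some way to merge a keyword ranking with a vector ranking. One common technique is reciprocal rank fusion (RRF), sketched below. It is one option among several, not the only way to do hybrid search; the constant k=60 is a conventional default and the test case IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one.

    Each appearance of an ID contributes 1 / (k + rank), with rank starting
    at 1. IDs that rank well in more than one list float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break exact ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["TC-12", "TC-7", "TC-3"]  # e.g. from a keyword index
    vector_hits = ["TC-7", "TC-9", "TC-12"]   # e.g. from embedding search
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

RRF only needs ranks, not raw scores, which is convenient because keyword scores and cosine similarities are not on the same scale.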

Week 8: Evaluation with DeepEval and PromptFoo

AI engineers who cannot evaluate their systems are dangerous. Learn:

  1. DeepEval metrics: G-Eval, Faithfulness, Answer Relevancy
  2. PromptFoo for prompt regression testing
  3. Creating a golden dataset of 50 labeled examples

Project: Evaluate your test case generator. Build a dataset of 20 real user stories with “gold standard” test cases written by a senior QA. Run your generator against all 20. Measure precision (the fraction of generated cases that are valid) and recall (the fraction of gold cases that your generator covered). Your target: 80% precision, 70% recall. If you hit this, you have a portfolio piece that beats most candidates.
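The two numbers in this project reduce to simple ratios once a reviewer has labeled the results. A minimal sketch, assuming you record one boolean per generated case (valid or not) and one boolean per gold case (covered or not); how you decide "valid" and "covered" is the hard part and is not shown here.

```python
def precision_recall(
    generated_is_valid: list[bool],
    gold_is_covered: list[bool],
) -> tuple[float, float]:
    """Precision = valid generated / all generated; recall = covered gold / all gold."""
    if not generated_is_valid or not gold_is_covered:
        raise ValueError("both label lists must be non-empty")
    precision = sum(generated_is_valid) / len(generated_is_valid)
    recall = sum(gold_is_covered) / len(gold_is_covered)
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels for one user story: 8 of 10 generated cases are
    # valid, and 7 of 10 gold cases were covered by something generated.
    generated = [True] * 8 + [False] * 2
    gold = [True] * 7 + [False] * 3
    precision, recall = precision_recall(generated, gold)
    print(f"precision={precision:.0%} recall={recall:.0%}")
```

Compute these per user story and also pooled across all stories. A generator that looks fine on average can still fail badly on one category of story.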

Days 61-90: Production — CI/CD, Monitoring, and Portfolio

The final month is about making your projects look like real engineering work.

Week 9: Docker and GitHub Actions

Containerize your agent. Write a Dockerfile. Set up a GitHub Actions workflow that:

  1. Runs linting (ruff, mypy)
  2. Executes your evaluation suite
  3. Fails if precision drops below 75%
  4. Builds and pushes a Docker image
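The precision gate in step 3 can be a small script that your workflow runs after the evaluation suite. This is a sketch under assumptions: it expects the evaluation step to have written a JSON file containing a "precision" field, and both that file format and the script name are hypothetical. A non-zero exit code is what makes the CI step fail.

```python
import json
import sys


def precision_gate(metrics: dict, min_precision: float = 0.75) -> int:
    """Return a process exit code: 0 = pass, 1 = precision too low, 2 = bad input."""
    precision = metrics.get("precision")
    if not isinstance(precision, (int, float)):
        print("metrics have no numeric 'precision' field", file=sys.stderr)
        return 2
    if precision < min_precision:
        print(f"FAIL: precision {precision:.2f} < {min_precision:.2f}", file=sys.stderr)
        return 1
    print(f"OK: precision {precision:.2f} >= {min_precision:.2f}")
    return 0


# Runs only when a metrics path is given, e.g. `python gate.py eval_metrics.json`
# (both file names are hypothetical).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(precision_gate(json.load(fh)))
```

Keeping the threshold as a parameter lets you raise it over time as the agent improves, without editing the workflow file.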

Project: Your test planner agent should now be deployable via docker run with an environment variable for the OpenAI API key.

Week 10: Playwright + AI Agent Integration

This is the bridge between your QA background and AI engineering. Learn:

  1. Running Playwright from Python (with the pytest-playwright plugin)
  2. Using an LLM to generate Playwright selectors from natural language
  3. Self-healing selectors with embedding-based similarity

Project: Build a “self-healing smoke test.” A Playwright script that, when a selector fails, asks an LLM to suggest an alternative based on the page HTML. This is exactly what tools like BrowsingBee and AgentQA do at production scale. I wrote about the architecture in my guide on AI agents for QA.
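The control flow of a self-healing click is small enough to sketch without a browser. In the example below, the browser and the LLM are both injected as callables, so the names try_click, get_html, and suggest_selector are hypothetical hooks rather than Playwright or LLM library functions. This is not how any particular commercial tool works internally; it only illustrates the retry loop.

```python
from typing import Callable


class SelectorFailed(Exception):
    """Raised when the original selector and every suggestion fail."""


def click_with_healing(
    selector: str,
    try_click: Callable[[str], bool],
    get_html: Callable[[], str],
    suggest_selector: Callable[[str, str], str],
    max_suggestions: int = 2,
) -> str:
    """Click using `selector`; on failure ask for alternatives.

    Returns the selector that finally worked so the caller can log it
    and decide whether to update the test.
    """
    current = selector
    suggestions_used = 0
    while True:
        if try_click(current):
            return current
        if suggestions_used == max_suggestions:
            raise SelectorFailed(f"gave up after {max_suggestions} suggestions for {selector!r}")
        # Hand the failed selector and the current page HTML to the model.
        current = suggest_selector(current, get_html())
        suggestions_used += 1
```

With Playwright's sync API, try_click would typically wrap a locator click with a short timeout and return False when Playwright raises its timeout error, and get_html would return the page content. Always log healed selectors: a silent heal can hide a real regression.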

Week 11: Monitoring and Observability

Production AI needs monitoring. Set up:

  1. LangSmith or Langfuse for tracing agent runs
  2. A simple Grafana dashboard showing daily test cases generated, precision, and cost
  3. Alerting if precision drops or API costs spike

Project: A public dashboard (or screenshot) showing your agent’s performance over time. This is instant credibility in interviews.

Week 12: Portfolio and Interview Prep

Package everything. You need:

  1. A GitHub repo with clean README, Dockerfile, and GitHub Actions
  2. A 5-minute demo video (Loom) walking through each project
  3. A one-page summary: “AI Testing Agent — 82% precision, 71% recall, $0.08 per run”
  4. Three LinkedIn posts documenting your journey (hiring managers stalk social media)

Practice explaining your RAG pipeline. The most common interview question I ask: “Why did you choose cosine similarity over dot product?” If you cannot answer, you did not build it deeply enough. Another favorite: “How would you reduce the cost of your agent by 50% without dropping precision below 70%?” Candidates who have optimized for cost always stand out.
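A tiny numeric example makes the cosine versus dot product answer concrete. The vectors below are made up: the point is only that dot product rewards vector length, while cosine similarity compares direction. If your embedding model returns unit-length vectors, the two produce the same ranking and the choice stops mattering.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 1.0]
candidates = {
    "same_direction": [1.0, 1.0],       # points the same way as the query
    "long_but_off_topic": [10.0, 0.0],  # large magnitude, different direction
}

# Dot product picks the long vector (10 vs 2); cosine picks the aligned one.
best_by_dot = max(candidates, key=lambda name: dot(query, candidates[name]))
best_by_cosine = max(candidates, key=lambda name: cosine(query, candidates[name]))

if __name__ == "__main__":
    print("dot product prefers:", best_by_dot)
    print("cosine prefers:", best_by_cosine)
```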

What Hiring Managers Actually Want

I hire SDETs and AI engineers. Here is what separates candidates who get offers from those who do not.

They want proof, not promises. A certificate from an AI course means nothing. A GitHub repo with an evaluated agent means everything. Show me the code, the metrics, and the cost analysis.

They want domain expertise. A generic AI engineer who does not understand test design will build garbage. Show me you know boundary value analysis AND embeddings. That combination is rare.

They want pragmatism. I ask candidates: “Would you use GPT-4o for this or a local model?” A reflexive “GPT-4o” is always the wrong answer. The right answer is: “It depends on latency, cost, and data privacy requirements.” Show me you have thought about tradeoffs.

They want communication. AI testing is new. Stakeholders are scared. Can you explain RAG to a product manager without saying “vector database”? Can you write a one-page proposal for why the team should adopt agent-based smoke tests? These soft skills are hard filters.

Salary Data: India vs Global

Let me be specific because this is what drives decisions.

Manual QA in India (2026): ₹4-8 LPA for 1-3 years. ₹8-15 LPA for 4-7 years. Growth is flat. Many manual testing roles at services companies now involve repetitive execution of pre-written scripts with little exploratory work. These are the roles most at risk from AI automation.

SDET with automation: ₹15-30 LPA. Playwright and CI/CD are the differentiators. Engineers who can write clean page object models and parallelize execution across shards are in demand. I broke down the full range in my SDET salary India 2026 analysis.

AI Engineer for QA: ₹25-45 LPA at startups. ₹40-60 LPA at product unicorns. Global remote roles pay $80k-$150k USD. The premium is not just for coding skill. It is for the ability to design evaluation systems, manage model costs, and explain AI behavior to non-technical stakeholders.

The jump from manual QA to AI engineer is not ₹5 LPA to ₹60 LPA in 90 days. That is unrealistic. But the jump from ₹12 LPA manual QA to ₹20 LPA AI-augmented SDET is absolutely achievable in 90 days. From there, ₹30+ is a 12-month sprint if you keep building.

At TCS and Infosys, “AI engineer” titles do not exist yet. But internal upskilling programs are starting. The first movers inside these companies will be the ones who get the new roles when they are announced. At product companies, the roles exist today and are unfilled. I see LinkedIn job postings for “GenAI SDET” and “LLM Testing Engineer” every week now, mostly from Bangalore and Hyderabad-based startups.

Global remote work adds another dimension. A tester in Tier 2 India who masters AI evaluation and builds a public portfolio can compete for US and EU remote contracts. The time zone difference is manageable for async work like evaluation dataset creation and prompt engineering. I know three engineers from Indore and Kochi who landed remote US contracts at $40/hour within six months of starting this exact roadmap.

Common Traps and How to Avoid Them

I have watched hundreds of testers try this transition. Most fail for predictable reasons.

Trap 1: Tutorial Hell

You watch 50 hours of LangChain tutorials and build zero projects. Break this by building on day one. Your first project will be ugly. That is fine. The learning is in the debugging.

Trap 2: Ignoring the Testing Domain

You become a generic AI engineer and lose your QA edge. Do not do this. Your competitive advantage is that you know how software breaks. Every project you build should solve a testing problem.

Trap 3: Chasing Every New Tool

CrewAI, AutoGen, Dify, n8n AI—the tool list is endless. Pick LangChain + Playwright + one vector DB. Master them. Depth beats breadth in interviews. I see candidates who have built Hello World in six frameworks and nothing production-ready in any. Hiring managers can smell this immediately.

Trap 4: No Public Proof

You build everything locally and never share it. Push to GitHub. Write about it. The job market is a marketplace of attention. If nobody can find your work, you do not exist to hiring managers. I found my last two hires through their LinkedIn posts, not their resumes.

Trap 5: Underestimating Evaluation

You build an agent that “works” but you never measure it. This is the difference between a demo and a product. Use DeepEval. Create golden datasets. Know your precision and recall numbers cold.

Key Takeaways

  • The manual tester to AI engineer transition is urgent because reliable LLMs, mature agent frameworks, and talent scarcity converged in 2025-2026.
  • The 90-day roadmap splits into Foundation (Python + LLMs), Building (LangChain + RAG + Evaluation), and Production (CI/CD + Portfolio).
  • Your competitive advantage is domain expertise in testing combined with AI engineering skills. Do not abandon your QA foundation.
  • Hiring managers want proof: GitHub repos, evaluation metrics, and demo videos. Certificates are irrelevant.
  • In India, AI engineers for QA earn ₹25-45 LPA at startups and ₹40-60 LPA at unicorns. Services firms are lagging but internal programs are starting.
  • Avoid tutorial hell, tool chasing, and lack of evaluation. Build one project deeply and measure it rigorously.

FAQ

Do I need a computer science degree to become an AI engineer?

No. I have hired AI engineers without degrees. What matters is demonstrated skill: GitHub projects, evaluation metrics, and clear communication. A degree helps for visa-sponsored roles, but for Indian startups and product companies, the portfolio is king.

How much does the 90-day plan cost?

Budget ₹10,000-15,000. This covers OpenAI API usage (₹3,000), a cloud VPS for hosting (₹2,000), and courses (₹5,000-10,000). You can reduce costs by using Ollama for local development and free tiers of Chroma and Astra DB.

What if I do not know Python?

Add two weeks to the plan. Python for testers is not hard. You need to be functional, not fluent. Focus on data structures, APIs, and pytest. Skip Django, Flask, and web development entirely.

Will AI replace manual testers completely?

Not completely, but it will replace the ones who refuse to adapt. Manual testing of AI-generated code, human-in-the-loop validation, and exploratory testing with AI assistance are growing fields. The job is changing, not disappearing. The testers who learn AI become the ones who manage the agents.

What is the single most important project for my portfolio?

A RAG-based test case generator with published precision/recall metrics. It demonstrates Python, vector search, LLM prompting, and evaluation — the four pillars of AI testing. Everything else is a bonus.

How do I balance this with a full-time job?

Most manual testers in India work 9-10 hour shifts. You will not have 4 hours a day. You have 90 minutes. That is enough. Wake up early. Use weekends for deep project work. The key is consistency, not intensity. Missing one day is fine. Missing five days in a row breaks momentum.

Should I quit my job to focus on this full time?

No. The financial stress will hurt your learning. Keep your job, build projects on weekends, and start interviewing after day 60. If you get an offer at ₹20+ LPA, then consider the transition. Until then, your current job pays the bills and funds your API experiments.
