Vibe Coding Broke the SDLC. Vibe Testing Is How QA Puts It Back Together.
A developer ships a login feature in 20 minutes using Cursor and Claude. The AI writes the code, generates the tests, and every test passes. Two weeks later, the feature is in production — and users can log in with literally any password.
This is not a hypothetical scenario. This is the new normal. According to the Tricentis 2025 Software Quality Report, 46.6% of respondents in Singapore admitted to releasing code without fully testing it. Even more alarming: 87.2% now leave release decisions to GenAI tools. The machines are writing the code, testing the code, and deciding whether the code ships.
Welcome to the age of vibe coding — and the urgent, emerging discipline of vibe testing.
When Andrej Karpathy coined the term “vibe coding” in early 2025, he described it as surrendering to the AI: accept all suggestions, don’t read the diffs, just see if it works. As SitePoint reported in March 2026, “Vibe coding in 2026 looks nothing like the casual experiment Karpathy described.” It has evolved from a fun experiment into a production development methodology — complete with official Google Cloud deployment guides and enterprise adoption at scale.
But here is the problem: our testing strategies have not kept up. Traditional test automation assumes a human developer understood the code they wrote. That assumption is now broken. This article lays out the five failure modes unique to vibe-coded applications, introduces a practical vibe testing checklist, and explains why exploratory testing is more critical than ever when the developer did not fully understand the code they shipped.
1. What Is Vibe Coding — And Why It Broke the SDLC
Vibe coding is the practice of using AI assistants (Cursor, GitHub Copilot, Claude Code, Windsurf) to generate entire features, modules, or applications through natural language prompts rather than manual coding. The developer describes what they want; the AI produces the implementation.
The Evolution: From “Accept All” to Specification Engineering
In its original form, vibe coding meant accepting every AI suggestion without reviewing diffs. But the practice has matured significantly. Today’s vibe coding involves specification engineering — writing detailed prompts that serve as pseudo-specifications, iterating on AI outputs, and using AI-assisted code review. Google Cloud now maintains an official “What is Vibe Coding” guide that includes deployment workflows and best practices for production use.
This maturation is exactly what makes it dangerous. When vibe coding was obviously experimental, teams treated it with appropriate caution. Now that it looks professional — complete with enterprise tooling, CI/CD integration, and official cloud provider documentation — organizations are shipping vibe-coded features with the same confidence they’d apply to traditionally developed code. But the underlying risk profile is fundamentally different.
Where the SDLC Breaks
The traditional SDLC assumes a chain of understanding: requirements → design → implementation → testing → deployment. At each stage, a human understands what was built and why. Vibe coding severs this chain at the implementation stage. The developer may understand the intent but not the implementation. This creates a gap that conventional testing strategies are not designed to catch.
If you have worked with AI-assisted frameworks, you have likely encountered this firsthand. In my experience building a real automation framework from scratch using vibe coding, I documented how AI tools can accelerate delivery while simultaneously introducing subtle risks that only surface under real-world conditions.
2. What Is Vibe Testing?
Vibe testing is a QA methodology specifically designed for validating software built with AI-assisted development tools. The term was coined by LambdaTest and represents a fundamental shift in how we think about test strategy.
Traditional Testing vs. Vibe Testing
Traditional testing asks: “Does the code do what the specification says?”
Vibe testing asks: “Does the code do what the specification says, AND did the AI actually implement what the developer intended, AND do the AI-generated tests actually validate the right behavior?”
This triple-verification is necessary because vibe-coded applications introduce three layers of potential misunderstanding:
- Prompt-to-intent gap: The developer’s prompt may not fully capture their intent
- Intent-to-implementation gap: The AI may interpret the prompt differently than intended
- Implementation-to-test gap: AI-generated tests may validate the implementation rather than the intent
Vibe testing addresses all three gaps through a combination of exploratory testing, specification validation, mutation testing, and behavioral contract verification.
3. The 5 Failure Modes of Vibe-Coded Applications
After analyzing dozens of vibe-coded projects and reviewing industry reports from Tricentis, Zoonou, and CDOTrends, I have identified five distinct failure modes that are unique to — or dramatically amplified by — AI-assisted development.
Failure Mode #1: Requirement Drift
What it is: The AI subtly reinterprets requirements during implementation, producing code that satisfies the literal prompt but misses the business intent.
Example: A developer prompts: “Build a user registration form with email validation.” The AI implements client-side regex validation for email format — but does not implement server-side validation, does not check for duplicate emails, and does not verify the email domain actually exists. The “requirement” was technically met. The business need was not.
Why traditional tests miss it: If the AI also generates the tests, it will test what it built — client-side regex validation — and every test will pass. The missing server-side validation never gets tested because the AI never built it.
Vibe testing approach: Specification-first review. Before looking at any code or tests, compare the original business requirement against the AI’s interpretation. Create a requirement traceability matrix that maps each business need to its implementation AND its test coverage.
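To make that traceability concrete, the missing server-side checks can be pinned down with tests that call the backend directly, bypassing whatever the browser validates. Here is a minimal sketch, assuming a hypothetical registerUser handler and response shape (both invented for illustration):

```javascript
// Sketch: the server-side registration checks the spec requires.
// `registerUser` and its response shape are hypothetical.
const existingEmails = new Set(['taken@example.com']);

function registerUser({ email }) {
  // Server-side format check; must not rely on the client's regex.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    return { ok: false, error: 'invalid-email' };
  }
  // Duplicate check; only the server can enforce this.
  if (existingEmails.has(email.toLowerCase())) {
    return { ok: false, error: 'duplicate-email' };
  }
  existingEmails.add(email.toLowerCase());
  return { ok: true };
}

// Spec-derived assertions: these fail against an implementation that only
// validates in the browser, because they exercise the server path directly.
console.assert(registerUser({ email: 'not-an-email' }).error === 'invalid-email');
console.assert(registerUser({ email: 'taken@example.com' }).error === 'duplicate-email');
console.assert(registerUser({ email: 'new@example.com' }).ok === true);
```

Run against the vibe-coded feature from the example, the first two assertions fail immediately if the AI shipped only client-side regex validation, which is exactly the gap the traceability matrix should surface.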
Failure Mode #2: Hallucinated Logic
What it is: The AI generates code that contains plausible-looking but fundamentally incorrect business logic — logic that was never specified and does not match any real-world requirement.
Example: An AI is asked to implement a pricing calculator for a SaaS product. It generates a discount logic that applies a 15% discount for annual subscriptions. The actual business rule is 20% for annual, 10% for semi-annual. The AI “hallucinated” a discount structure that looks reasonable but is completely wrong.
Why traditional tests miss it: The AI-generated test asserts that the discount is 15% — and it passes. The test validates the hallucinated behavior perfectly. This is what the industry calls “tests that validate incorrect behavior.”
Vibe testing approach: Business logic audits with domain experts. Every calculation, every conditional branch, every business rule in vibe-coded software needs explicit sign-off from someone who understands the domain — not just the AI’s interpretation of it.
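One lightweight way to capture that sign-off is to encode the agreed business rule directly as a test oracle. A sketch, using a hypothetical subscriptionDiscount function and the rates from the example above:

```javascript
// Sketch: encode the *business* rule as the oracle, not the AI's output.
// `subscriptionDiscount` is hypothetical; rates come from the domain expert.
function subscriptionDiscount(plan) {
  const rates = { annual: 0.20, 'semi-annual': 0.10, monthly: 0.0 };
  if (!(plan in rates)) throw new Error(`unknown plan: ${plan}`);
  return rates[plan];
}

// These assertions come from the domain expert's sign-off, so the AI's
// hallucinated 15% annual discount would fail here immediately.
console.assert(subscriptionDiscount('annual') === 0.20);
console.assert(subscriptionDiscount('semi-annual') === 0.10);
```

The point is the direction of authority: the expected values are transcribed from the business rule, never from the generated code, so a hallucinated discount structure cannot validate itself.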
Failure Mode #3: The Test-That-Tests-Nothing
What it is: AI-generated tests that achieve high coverage metrics but do not actually validate meaningful behavior. They pass, they look professional, and they are functionally useless.
Example: Consider this AI-generated test for a password validation function:
```javascript
// AI-generated test — looks correct, tests nothing meaningful
describe('Password Validation', () => {
  test('should validate password', () => {
    const result = validatePassword('Test@1234');
    expect(result).toBeDefined(); // Only checks it returns something
  });

  test('should reject weak password', () => {
    const result = validatePassword('123');
    expect(result).not.toBeNull(); // Checks it returns non-null, not that it REJECTS
  });

  test('should handle edge cases', () => {
    const result = validatePassword('');
    expect(typeof result).toBe('object'); // Type check, not behavior check
  });
});
```
Every test passes. Coverage reports show 100% for the validatePassword function. But none of these tests actually verify that valid passwords are accepted and invalid ones are rejected. They test that the function returns something — not that it returns the right thing.
Here is what real tests should look like:
```javascript
// Human-reviewed test — validates actual behavior
describe('Password Validation', () => {
  test('should accept password meeting all criteria', () => {
    const result = validatePassword('Str0ng@Pass!');
    expect(result.isValid).toBe(true);
    expect(result.errors).toHaveLength(0);
  });

  test('should reject password without uppercase', () => {
    const result = validatePassword('weak@1234');
    expect(result.isValid).toBe(false);
    expect(result.errors).toContain('Must contain at least one uppercase letter');
  });

  test('should reject empty password with specific error', () => {
    const result = validatePassword('');
    expect(result.isValid).toBe(false);
    expect(result.errors).toContain('Password cannot be empty');
  });
});
```
Vibe testing approach: Mutation testing. Use tools like Stryker (JavaScript), mutmut (Python), or PITest (Java) to mutate the source code and verify that tests actually fail when behavior changes. If you can change the password validation logic and all tests still pass, your tests are testing nothing.
Failure Mode #4: Dependency Confusion
What it is: The AI introduces dependencies, packages, or API calls that are outdated, deprecated, non-existent, or subtly wrong versions — and the developer does not catch it because they did not write the import statements.
Example: An AI generates a Node.js authentication module that imports jsonwebtoken@8.5.1 — a version with a known vulnerability (CVE-2022-23529). The AI learned from training data that included this version, and the developer accepted the suggestion without checking. The tests pass because the vulnerable version still functions correctly — it is a security flaw, not a functional one.
Why traditional tests miss it: Functional tests verify behavior, not supply chain integrity. The JWT tokens work perfectly. They are just generated by a library with a critical security vulnerability.
Vibe testing approach: Automated dependency auditing as a first-class test gate. Run npm audit, pip-audit, or safety check as part of every CI pipeline. But go further: manually review every dependency the AI introduced. Check if the package actually exists on the registry (AI can hallucinate package names). Verify versions match current stable releases.
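The registry check can be scripted. A real implementation would query the npm registry for each dependency name and treat a missing package as a red flag; in this sketch the lookup is stubbed with a local set so the code stays self-contained, and both package names are invented for illustration:

```javascript
// Sketch: flag AI-introduced packages that do not exist on the registry.
// A real version would look each name up on the npm registry and treat
// "not found" as a possible hallucination; here the lookup is stubbed.
const knownPackages = new Set(['jsonwebtoken', 'express']); // stub registry

function findHallucinatedDeps(dependencies) {
  return Object.keys(dependencies).filter((name) => !knownPackages.has(name));
}

const aiGeneratedDeps = {
  jsonwebtoken: '^9.0.0',
  'node-auth-helper-utils': '^1.2.0', // plausible-looking, but made up
};

console.assert(findHallucinatedDeps(aiGeneratedDeps).length === 1);
console.assert(findHallucinatedDeps(aiGeneratedDeps)[0] === 'node-auth-helper-utils');
```

Wiring a script like this into CI alongside the audit tools gives you a gate for hallucinated names, which the audit tools alone will not catch because they only inspect packages that actually resolve.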
Failure Mode #5: Specification-Implementation Gap
What it is: The gap between what the developer thinks the AI built and what the AI actually built. This is the most insidious failure mode because it persists through code review, testing, and even initial production deployment.
Example: A developer prompts: “Implement rate limiting — max 100 requests per minute per user.” The AI implements rate limiting per IP address, not per authenticated user. For most testing scenarios (single user, single IP), this behaves identically. The gap only surfaces when multiple users share an IP (corporate NAT, VPN) or a single user accesses from multiple IPs.
Why traditional tests miss it: Unit tests run in isolation with a single test client. Integration tests typically use a single IP. The specification-implementation gap only manifests under specific production conditions.
Vibe testing approach: Contract testing and behavioral specification verification. Write tests that explicitly encode the specification — not the implementation. Use property-based testing to explore edge cases the AI might not have considered. And critically: have the developer explain, in writing, what they believe the AI built — then verify that understanding against the actual code.
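A sketch of what a specification-encoding test looks like for the rate-limiting example, using a hypothetical RateLimiter keyed by user ID (names and shape are illustrative, not a real library API):

```javascript
// Sketch: a test that encodes the *specification* (limit per authenticated
// user), not the implementation. `RateLimiter` is hypothetical.
class RateLimiter {
  constructor(limit) { this.limit = limit; this.counts = new Map(); }
  allow(userId) { // keyed by user, NOT by request IP
    const n = (this.counts.get(userId) || 0) + 1;
    this.counts.set(userId, n);
    return n <= this.limit;
  }
}

// Two users behind the same corporate IP: a per-IP implementation would
// lock out user B once user A exhausts the shared budget.
const limiter = new RateLimiter(100);
for (let i = 0; i < 100; i++) limiter.allow('user-a');
console.assert(limiter.allow('user-a') === false); // A is over the limit
console.assert(limiter.allow('user-b') === true);  // B must be unaffected
```

The second assertion is the one that matters: it encodes the "per user" clause of the specification, and it fails against a per-IP implementation even though every single-user test passes.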
4. The Vibe Testing Checklist: A Practical Framework
Based on these five failure modes, here is a practical checklist for QA teams working with vibe-coded applications. This is not theoretical — it is a working framework you can implement in your next sprint.
| Phase | Check | Failure Mode Addressed | Tool/Method |
|---|---|---|---|
| Pre-Code Review | Compare AI prompt against business requirements document | Requirement Drift | Manual review + requirement traceability matrix |
| Pre-Code Review | Verify developer can explain what the AI built (not just what they asked for) | Specification-Implementation Gap | Developer walkthrough session |
| Code Review | Audit every dependency the AI introduced | Dependency Confusion | npm audit, pip-audit, Snyk, Dependabot |
| Code Review | Flag all business logic for domain expert review | Hallucinated Logic | Manual review with domain SME |
| Code Review | Check for hallucinated packages (packages that do not exist on registries) | Dependency Confusion | Registry verification scripts |
| Test Review | Run mutation testing on all AI-generated tests | Test-That-Tests-Nothing | Stryker, mutmut, PITest |
| Test Review | Verify test assertions check behavior, not just existence | Test-That-Tests-Nothing | Assertion audit (manual) |
| Test Review | Ensure negative test cases were not generated by the same AI that wrote the code | Hallucinated Logic | Independent test generation |
| Exploratory Testing | Conduct session-based exploratory testing on all vibe-coded features | All five failure modes | SBET with charter focused on AI-generated code |
| Exploratory Testing | Test edge cases the AI likely did not consider (cultural, locale, accessibility) | Requirement Drift, Hallucinated Logic | Heuristic-based exploration |
| Pre-Deployment | Verify feature behavior matches original business intent (not just tests passing) | Specification-Implementation Gap | UAT with stakeholders |
| Pre-Deployment | Run security scan on all AI-introduced code paths | Dependency Confusion | SAST/DAST tools |
5. Why Exploratory Testing Is Non-Negotiable for Vibe-Coded Apps
Here is a truth that the AI-testing-tools industry does not want to hear: exploratory testing by skilled humans is more important now than it was before AI-assisted development.
The reason is simple. AI-generated tests are inherently biased toward the AI’s understanding of the code. When the same model (or same class of model) writes both the implementation and the tests, you get a closed loop of confirmation bias. The tests confirm what the AI built. They do not challenge it.
This is precisely the scenario I described in my analysis of how QA teams should evaluate AI agents before production. An AI agent can pass every demo scenario and still fail catastrophically in production because no one tested the scenarios the AI did not anticipate.
Session-Based Exploratory Testing for Vibe Code
When conducting exploratory testing on vibe-coded features, structure your sessions around these charters:
- “What did the AI assume?” — Explore the assumptions embedded in the implementation. Test boundary conditions, null states, error handling, and recovery paths.
- “What did the developer not ask for?” — Test for missing functionality. If the prompt was “build a shopping cart,” did the AI handle currency conversion? Tax calculation? Empty cart states? Session timeout?
- “What happens when the user does not follow the happy path?” — AI models are trained on examples that overwhelmingly represent happy paths. Deliberately break the workflow. Use unexpected inputs. Navigate out of sequence.
- “What about non-functional requirements?” — Test performance under load, accessibility compliance, security boundaries, and data handling. These are the areas AI is most likely to overlook unless explicitly prompted.
6. Building Your Vibe Testing Strategy: A Step-by-Step Approach
Implementing vibe testing does not require throwing out your existing test strategy. It is an augmentation layer that addresses the specific risks of AI-assisted development.
Step 1: Classify Code by Origin
Tag every feature, module, or component with its development method: fully human-written, AI-assisted, or fully AI-generated. This classification drives your test strategy — fully AI-generated code gets the full vibe testing treatment; AI-assisted code gets targeted checks for the specific failure modes most relevant to the level of AI involvement.
Step 2: Implement Independent Test Generation
Never let the same AI that wrote the code also write all the tests. Use a different model, a different tool, or — ideally — a human tester to write the critical test cases. Cross-model verification catches hallucination patterns that single-model testing misses.
This is where tools like Playwright CLI combined with OpenCode become valuable — they give QA teams an independent channel for generating and running tests that are not contaminated by the development AI’s assumptions.
Step 3: Add Mutation Testing to Your CI Pipeline
Mutation testing is the single most effective technique for catching “tests that test nothing.” Here is a minimal setup for a JavaScript project:
```javascript
// stryker.conf.js — Mutation testing configuration
/** @type {import('@stryker-mutator/api/core').PartialStrykerOptions} */
const config = {
  packageManager: 'npm',
  reporters: ['html', 'clear-text', 'progress'],
  testRunner: 'jest',
  jest: {
    projectType: 'custom',
    configFile: 'jest.config.js'
  },
  coverageAnalysis: 'perTest',
  thresholds: {
    high: 80,
    low: 60,
    break: 50 // Fail the build if mutation score drops below 50%
  },
  mutate: [
    'src/**/*.js',
    '!src/**/*.test.js',
    '!src/**/*.spec.js'
  ]
};

module.exports = config;
```
When mutation testing reveals that you can change `if (user.role === 'admin')` to `if (user.role !== 'admin')` and all tests still pass, you have found a test-that-tests-nothing. Fix it before it ships.
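That role-flip check can be hand-rolled in a few lines to show exactly what Stryker automates at scale. The functions below are illustrative only, not a substitute for a real mutation tool:

```javascript
// Sketch of what mutation testing automates: run the same suite against
// the original function and a mutant. A healthy suite passes on the
// original and fails on the mutant ("kills" it).
const original = (user) => user.role === 'admin';
const mutant   = (user) => user.role !== 'admin'; // the flipped condition

// A weak, existence-only check (the kind AI often generates):
const weakSuite = (fn) => fn({ role: 'admin' }) !== undefined;

// A behavioral check that actually pins down the logic:
const strongSuite = (fn) =>
  fn({ role: 'admin' }) === true && fn({ role: 'viewer' }) === false;

console.assert(weakSuite(original) && weakSuite(mutant));      // mutant SURVIVES
console.assert(strongSuite(original) && !strongSuite(mutant)); // mutant KILLED
```

A surviving mutant under the weak suite is the signal to look for: the behavior changed and no test noticed, which is precisely the test-that-tests-nothing failure mode.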
Step 4: Establish Behavioral Contracts
For critical business logic in vibe-coded applications, write behavioral contracts — plain-language specifications of expected behavior that serve as the source of truth for testing:
```text
# Behavioral Contract: User Authentication
# This contract is the source of truth, NOT the AI-generated code

CONTRACT: Login
  GIVEN a registered user with valid credentials
  WHEN they submit email and password
  THEN they receive a JWT token valid for 24 hours
  AND the token contains their user ID and role
  AND a refresh token is stored in an httpOnly cookie

CONTRACT: Login - Invalid Credentials
  GIVEN any user
  WHEN they submit an incorrect password 5 times within 15 minutes
  THEN the account is locked for 30 minutes
  AND an email notification is sent to the account owner
  AND subsequent login attempts return 429 (not 401)

CONTRACT: Login - Rate Limiting
  GIVEN any client
  WHEN they exceed 100 requests per minute
  THEN rate limiting is applied PER AUTHENTICATED USER (not per IP)
  AND rate limit headers are included in the response
```
These contracts become your test oracles. When the AI generates code and tests, you verify both against the contract — not against each other.
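As an illustration, the "Invalid Credentials" contract translates into an executable oracle along these lines. AccountLock is hypothetical, the thresholds are transcribed from the contract, and the 15-minute window is elided to keep the sketch short:

```javascript
// Sketch: the lockout contract, translated into an executable oracle.
// `AccountLock` is hypothetical; the 15-minute sliding window is omitted.
class AccountLock {
  constructor() { this.failures = new Map(); }
  recordFailure(email) {
    const n = (this.failures.get(email) || 0) + 1;
    this.failures.set(email, n);
    // Contract: 5 incorrect attempts -> account locked -> HTTP 429
    return n >= 5 ? { locked: true, status: 429 }
                  : { locked: false, status: 401 };
  }
}

const lock = new AccountLock();
let last;
for (let i = 0; i < 5; i++) last = lock.recordFailure('user@example.com');
console.assert(last.locked === true && last.status === 429);
console.assert(lock.recordFailure('other@example.com').status === 401);
```

Whether the AI generated the implementation, the tests, or both, each side is checked against this oracle independently, so they can never simply confirm each other.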
7. Common Pitfalls When Testing Vibe-Coded Applications
Even teams that recognize the need for vibe testing fall into predictable traps. Here are the most common ones I have observed.
Pitfall 1: Trusting Coverage Metrics
AI-generated tests are exceptional at achieving high code coverage. They can hit 90%+ coverage while validating almost nothing. Coverage tells you which lines executed during testing — it says absolutely nothing about whether the right behavior was verified. In vibe-coded projects, coverage metrics are actively misleading.
Pitfall 2: Using the Same AI for Code and Tests
If Cursor writes your feature code and you then ask Cursor to write the tests, you have created a confirmation bias loop. The AI will generate tests that pass for the code it wrote — even if that code is wrong. Always introduce an independent verification step.
Pitfall 3: Skipping Manual Review Because “Tests Pass”
The green checkmark is seductive. When all 247 tests pass and coverage is at 94%, it takes real discipline to still conduct a manual code review and exploratory testing session. But this is exactly where vibe-coded bugs hide — behind a wall of passing tests that test the wrong thing.
Pitfall 4: Treating AI-Generated Code as Senior-Developer Code
AI code often looks like it was written by a senior developer — clean formatting, consistent naming, well-structured. This aesthetic quality creates false confidence. In reality, AI-generated code should be reviewed with the same rigor you would apply to code from a new junior developer who has not learned your domain yet.
Pitfall 5: Not Tracking Which Code Is AI-Generated
If you cannot identify which parts of your codebase were vibe-coded, you cannot apply appropriate testing rigor. Implement code origin tagging — through commit messages, PR labels, or code comments — so your QA strategy can be risk-proportionate.
8. The Future of QA in the Vibe Coding Era
The QA profession is not becoming obsolete because of AI. It is becoming more critical. As I have explored in my work on building automation frameworks with vibe coding, AI tools can dramatically accelerate test creation — but they cannot replace the critical thinking, domain knowledge, and adversarial mindset that defines skilled QA work.
The QA engineers who thrive in this era will be the ones who:
- Understand AI limitations — knowing where AI-generated code is likely to fail
- Master exploratory testing — the one testing discipline AI cannot replicate
- Think in specifications — writing behavioral contracts that serve as test oracles
- Use AI as a tool, not an oracle — leveraging AI for test generation while maintaining independent verification
- Champion mutation testing — proving that tests actually catch bugs, not just execute code
Frequently Asked Questions
What is the difference between vibe coding and traditional AI-assisted coding?
Traditional AI-assisted coding uses AI for code completion and suggestions while the developer maintains full understanding of the codebase. Vibe coding goes further — the developer describes intent through natural language prompts and accepts AI-generated implementations without necessarily reading or understanding every line. The key difference is the understanding gap: in vibe coding, the developer may not fully comprehend the code they ship. This gap is what makes vibe testing necessary.
Can AI-generated tests be trusted at all?
AI-generated tests are excellent starting points — they provide structural coverage, handle obvious test cases efficiently, and can dramatically accelerate test suite creation. However, they should never be the only line of defense. AI-generated tests are most dangerous when they create a false sense of security through high coverage numbers. Always supplement with mutation testing to verify test effectiveness, independent human-written tests for critical paths, and exploratory testing to catch what automated tests miss.
How does vibe testing fit into existing CI/CD pipelines?
Vibe testing integrates into existing pipelines as an additional layer, not a replacement. Add mutation testing as a CI gate (fail builds below a mutation score threshold). Include dependency auditing for AI-introduced packages. Tag PRs that contain AI-generated code for enhanced review. Schedule regular exploratory testing sessions for features built with vibe coding. The goal is to make vibe testing an automated, repeatable part of your quality process — not an ad hoc activity.
Is vibe testing only relevant for teams using AI coding tools?
While vibe testing was born from the challenges of AI-assisted development, many of its principles — specification-first testing, mutation testing, behavioral contracts, independent test verification — are good testing practices regardless. However, the urgency and necessity scale dramatically when AI is generating code. If your team uses Cursor, Copilot, Claude Code, or any AI coding assistant, vibe testing should be on your radar immediately.
What tools do I need to start vibe testing today?
You can start with tools you likely already have. For mutation testing: Stryker (JavaScript/TypeScript), mutmut (Python), or PITest (Java). For dependency auditing: npm audit, pip-audit, or Snyk. For exploratory testing: any session-based test management tool or even a simple spreadsheet with time-boxed charters. For behavioral contracts: plain text files in your repository. The most important “tool” is the mindset shift — recognizing that AI-generated code requires a different testing approach than human-written code.
Conclusion: QA Is the Last Line of Defense
Vibe coding is not going away. It is going to accelerate. Google Cloud is publishing deployment guides for it. Enterprise teams are building production systems with it. The Tricentis data shows nearly half of teams are already shipping under-tested code, and that number will only grow as AI makes development faster.
The SDLC did not break because vibe coding is bad. It broke because our testing strategies assumed a human understood the code at every stage. That assumption is gone. Vibe testing is how we build it back — not by rejecting AI-assisted development, but by adapting our quality practices to account for the new failure modes it introduces.
Start with the five failure modes. Implement the checklist. Add mutation testing to your pipeline. Make exploratory testing non-negotiable. And above all, remember: when the developer did not fully understand the code they shipped, QA is the last line of defense between a passing test suite and a production incident.
The tests might pass. Your job is to ask whether they should.
