AI Agents Write Tests. Who Fixes Them?

Karol Kasprzak

at Boldare - Product Design and Development Company

Weekly AI Bites is a series that gives you a direct look into our day-to-day AI work. Every post shares insights, experiments, and experiences straight from our team’s meetings and Slack, highlighting what models we’re testing, which challenges we’re tackling, and what’s really working in real products. If you want to know what’s buzzing in AI, check Boldare’s channels every Monday for the latest bite.

As a Senior Software Engineer, I’ve been thinking a lot about how AI agents are changing the way we write, and break, tests. It’s not a new topic, but it looks very different now that agents are doing the coding.

A quote from xUnit Test Patterns – a book I read a few years back – got me thinking. When xUnit libraries first emerged, they made interaction-based testing significantly cheaper, largely by simplifying mock generation. A good thing at the time.

But in the age of AI agents, that same convenience has a darker side.

Agents are great at generating mock-heavy tests. They optimize locally for a green test run, and they naturally copy whatever patterns already exist in the codebase. The problem is that mocks tie tests to implementation details: which methods get called, with what arguments, in what order. Not to what the system actually does.
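
To make that coupling concrete, here is a minimal sketch of an interaction-style test in Python with the standard library's unittest.mock. The SignupService class, its repo and mailer collaborators, and their method names are all hypothetical, invented purely for illustration.

```python
from unittest.mock import Mock, call


class SignupService:
    """Hypothetical service: stores a user, then sends a welcome email."""

    def __init__(self, repo, mailer):
        self.repo = repo
        self.mailer = mailer

    def register(self, email):
        user = {"email": email.strip().lower()}
        self.repo.save(user)
        self.mailer.send(user["email"], "Welcome!")
        return user


def test_register_interaction_style():
    # One parent mock, so the relative order of calls is recorded too.
    deps = Mock()
    service = SignupService(repo=deps.repo, mailer=deps.mailer)

    service.register("  Ada@Example.com ")

    # Every assertion names a method, its arguments, or the call order.
    deps.repo.save.assert_called_once_with({"email": "ada@example.com"})
    deps.mailer.send.assert_called_once_with("ada@example.com", "Welcome!")
    assert deps.mock_calls == [
        call.repo.save({"email": "ada@example.com"}),
        call.mailer.send("ada@example.com", "Welcome!"),
    ]


test_register_interaction_style()
```

Nothing in this test checks that a registered user can later be found or that an email actually arrives. It only pins down how register() talks to its collaborators.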

So if a codebase is already mock-heavy, agents will quietly make it more so – reinforcing an architecture that gets harder to change over time.

The paradox: mock-heavy tests are the easiest for AI to generate, but the hardest for humans to maintain. The agent moves on. You’re left fixing tests that broke because someone renamed a method.
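
A rough illustration of that failure mode, assuming a hypothetical SignupService whose repository method was renamed from save() to store() during a refactor. The behavior of register() is unchanged, yet the older mock-based test fails because it still asserts on the old method name.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical service after a refactor: repo.save() became repo.store()."""

    def __init__(self, repo):
        self.repo = repo

    def register(self, email):
        user = {"email": email.lower()}
        self.repo.store(user)  # was: self.repo.save(user)
        return user


def old_interaction_test():
    repo = Mock()
    SignupService(repo).register("Ada@Example.com")
    # Still asserts on the pre-refactor method name.
    repo.save.assert_called_once_with({"email": "ada@example.com"})


def run_old_test():
    try:
        old_interaction_test()
    except AssertionError:
        return "failed"
    return "passed"


print(run_old_test())  # prints "failed", although register() behaves as before
```

The failure says nothing about whether the system regressed. It only reports that the test's picture of the implementation is out of date.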

For years, mocks lowered the cost of writing tests. But when generating code is nearly free, writing tests is no longer the bottleneck; surviving refactors is.

I wonder if we’ll see a gradual shift from “did object A call object B correctly?” toward “does the system still behave as expected?” Not implementation verification. Contract verification.
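
One way such a behavior-focused test might look, sketched in Python. It swaps the mock for a small in-memory fake and asserts on the observable outcome. InMemoryUserRepo, SignupService, and their methods are hypothetical names chosen for this example, not an established API.

```python
class InMemoryUserRepo:
    """Hypothetical fake: a tiny but real implementation of the repository."""

    def __init__(self):
        self._users = {}

    def store(self, user):
        self._users[user["email"]] = user

    def find(self, email):
        return self._users.get(email)


class SignupService:
    """Hypothetical service under test."""

    def __init__(self, repo):
        self.repo = repo

    def register(self, email):
        user = {"email": email.strip().lower()}
        self.repo.store(user)
        return user


def test_registered_user_can_be_found():
    repo = InMemoryUserRepo()
    SignupService(repo).register("  Ada@Example.com ")

    # Asserts on what the system does, not on which methods were called.
    assert repo.find("ada@example.com") == {"email": "ada@example.com"}
    assert repo.find("nobody@example.com") is None


test_registered_user_can_be_found()
```

If store() is later renamed in both the service and the fake, this test needs no changes, because it only relies on the contract that a registered user can be found afterwards.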
