Claude Code in Production - Case Study

Magdalena Chmiel

at Boldare - Product Design and Development Company

Claude Code in Production: AI-Augmented Delivery on a Mission-Critical Platform | Case Study

Shipping features on a regulated, event-driven energy platform leaves little room for skipped tests or undocumented decisions.

When Boldare embedded Claude Code as a permanent, team-wide practice in Q4 2025, the results came faster than expected: test coverage climbed 10 percentage points in a single quarter, sprint velocity jumped by up to 31%, and AI now touches an estimated 75–85% of all new code and tests. This is a detailed account of how the transition happened, which tools drove it, and what the data actually shows.

Metric	Value
Test coverage after AI	~95%
Velocity increase (6-dev team)	+31%
New tests authored with AI	~85%
Code with AI contribution	~75%

The Client

The client operates one of only three gas capacity trading platforms in Europe – a large digital marketplace for buying and selling gas transmission capacity across European networks. The platform is backed by around 20 institutional shareholders and operates in a regulated environment where reliability and test coverage are not optional.

On the technical side, the client is a mature, Agile engineering organisation with high standards for delivery quality. Boldare joined as an external development team working in full Scrum alongside the client’s in-house developers.

The Problem

Three friction points repeated sprint after sprint:

Area	Symptom	Consequence
Testing	Writing tests was time-consuming relative to their sprint value	Deferred work, coverage gaps
Documentation	ADRs and architectural decisions went unrecorded under delivery pressure	Growing documentation debt
Onboarding	Large, unfamiliar codebase slowed new developers down	Long ramp-up time

The question was how to embed AI so that results scaled to the whole team – not just to one or two individual enthusiasts.

Key Objectives

Increase delivery velocity without expanding headcount
Raise and sustain test coverage across a complex codebase
Reduce the documentation debt that accumulates in fast-moving sprints
Make AI a shared team practice, not a tool used by one or two individuals
Establish repeatable standards and prompt practices that could evolve over time

How the Rollout Happened

Phase	What happened
Q4 2025 – kickoff	Two developers completed a formal Spec-Driven Development course (10xDevs / Brave Courses)
Q4 2025 – onboarding	One internal session for the full team (6 devs, PO, SM) covering AI workflow and Claude Code
Q4 2025 – stabilisation	AI usage became consistent and habitual; practices stabilised and were documented
January 2026	One developer moved teams; two new developers joined. The 6-person team continues the same AI practices

The tech stack required zero changes: React, Java/Spring, EventStore, PostgreSQL, Redis, DocumentDB, AWS, GitLab CI/CD. AI tooling integrated into existing workflows without new infrastructure.

Tool Split by Layer

Layer	Tool	Share of dev time	Primary use cases
Frontend	Cursor	~50%	Component-driven iteration
Frontend	Claude	~50%	Logic, analysis, refactoring
Backend	Claude Code	~90%	Code generation, ADRs, event-service reasoning
Backend	GitHub Copilot	~10%	Supplementary autocomplete

On the backend, Claude Code functioned not only as a code generator but as a contextual reasoning layer – analysing event-driven service behaviour, answering questions about business logic embedded in the codebase, and producing Architecture Decision Records (ADRs) as a natural by-product of the development process rather than a separate task.

Results in Detail

Velocity

Sprint velocity rose by up to 31% for the 6-developer team.

Test coverage

Coverage rose from ~85% to ~95% across the codebase. Approximately 85% of all new tests written since Q4 2025 were authored with AI assistance.
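A rise from ~85% to ~95% can be read in more than one way, and a few lines of arithmetic make the difference explicit. The sketch below uses only the approximate coverage figures quoted in this case study; the function and variable names are illustrative.

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, expressed as a percentage."""
    return (after - before) / before * 100


coverage_before = 85.0  # approximate coverage before the rollout (%)
coverage_after = 95.0   # approximate coverage one quarter later (%)

points = percentage_point_change(coverage_before, coverage_after)
relative = relative_change(coverage_before, coverage_after)

# The uncovered share of the codebase shrinks from ~15% to ~5%.
uncovered_before = 100.0 - coverage_before
uncovered_after = 100.0 - coverage_after
uncovered_change = relative_change(uncovered_before, uncovered_after)

print(f"Coverage change: {points:.0f} percentage points")
print(f"Relative coverage change: {relative:.1f}%")
print(f"Change in uncovered share: {uncovered_change:.1f}%")
```

Read this way, the 10-point gain means the uncovered share of the codebase fell by about two-thirds, which is arguably the more telling number on a platform where gaps in coverage carry regulatory risk.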

AI contribution by task type

Task	AI involvement
Code generation, refactoring, debugging	~75%
Test authoring	~85%
Documentation (ADRs, technical docs)	~70%
Code review (pre-review pass)	~50%

The ADR Problem – and How It Got Solved

In fast-moving teams, Architecture Decision Records are almost always deferred. Writing them is time-consuming relative to their immediate sprint value, so they slip off the backlog under delivery pressure.

Using Claude Code to analyse codebase context and generate structured ADRs changed that dynamic entirely: documentation became a by-product of delivery, not a task competing with it. ADRs and technical documentation are now a consistent output of the backend development process.
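The case study does not say which ADR template the team uses. As a minimal sketch, assuming the widely used Title / Status / Context / Decision / Consequences layout (with a date line added here for convenience), a structured ADR can be modelled as a small record and rendered to plain text. The `Adr` class, its fields, and the example content are all hypothetical and only illustrate what "structured" might look like.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Adr:
    """Hypothetical record for one Architecture Decision Record."""
    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: list[str] = field(default_factory=list)
    decided_on: date = field(default_factory=date.today)

    def render(self) -> str:
        """Render the record as plain text, one section per field."""
        lines = [
            f"ADR-{self.number:04d}: {self.title}",
            f"Date: {self.decided_on.isoformat()}",
            f"Status: {self.status}",
            "",
            "Context",
            self.context,
            "",
            "Decision",
            self.decision,
            "",
            "Consequences",
        ]
        lines.extend(f"- {item}" for item in self.consequences)
        return "\n".join(lines)


# Invented example content, not taken from the client's codebase.
example = Adr(
    number=12,
    title="Use idempotency keys for event handlers",
    status="Accepted",
    context="Event handlers may receive the same event more than once after a retry.",
    decision="Each handler stores the IDs of processed events and skips duplicates.",
    consequences=[
        "Handlers need a small persistence table for processed IDs.",
        "Replays become safe to run.",
    ],
    decided_on=date(2025, 11, 3),  # illustrative date only
)

print(example.render())
```

A fixed shape like this is one reason generation works well: the assistant fills in known sections from codebase context, and a reviewer only has to check the content rather than the format.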

Methodology: Three Principles

Tool specialisation by layer

Frontend and backend developers use different primary tools because the nature of their work differs. Cursor and Claude suit the component-driven, iterative rhythm of frontend development. Claude Code suits backend work precisely because it reasons about codebases as a whole – essential for an event-driven architecture where understanding service dependencies is often the real bottleneck.

AI as a force-multiplier on high-volume tasks

Test authoring is the clearest example: the cognitive overhead per test is low, but the volume is high. AI removes that friction almost entirely. This is why test coverage improved substantially – not just the raw test count. The same logic applies to documentation: low cognitive overhead per decision record, high benefit over time.
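To illustrate what "low overhead per test, high volume" can look like in an event-driven codebase, here is a minimal sketch of a table-driven test for an event-sourced calculation. The domain (`CapacityBooked`, `CapacityReleased`, `available_capacity`) is invented for illustration and is not taken from the client's platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityBooked:
    """Hypothetical event: some capacity units were reserved."""
    units: int


@dataclass(frozen=True)
class CapacityReleased:
    """Hypothetical event: previously reserved units were freed."""
    units: int


def available_capacity(total: int, events: list) -> int:
    """Replay events in order and return the capacity still available."""
    available = total
    for event in events:
        if isinstance(event, CapacityBooked):
            if event.units > available:
                raise ValueError("cannot book more than the available capacity")
            available -= event.units
        elif isinstance(event, CapacityReleased):
            # Releases are capped so availability never exceeds the total.
            available = min(total, available + event.units)
        else:
            raise TypeError(f"unknown event type: {type(event).__name__}")
    return available


# Each row: (description, total capacity, event history, expected result).
CASES = [
    ("no events", 100, [], 100),
    ("single booking", 100, [CapacityBooked(30)], 70),
    ("booking then release", 100, [CapacityBooked(30), CapacityReleased(10)], 80),
    ("release never exceeds total", 100, [CapacityReleased(50)], 100),
    ("book everything", 100, [CapacityBooked(100)], 0),
]


def run_cases() -> int:
    """Check every row and return the number of cases that passed."""
    passed = 0
    for description, total, events, expected in CASES:
        actual = available_capacity(total, events)
        assert actual == expected, f"{description}: expected {expected}, got {actual}"
        passed += 1
    return passed


print(f"{run_cases()} cases passed")
```

Each new scenario is one more row in `CASES`, which is repetitive work that is easy to delegate and quick to review. On a Java/Spring backend the same table would more plausibly feed a parameterised JUnit test; Python is used here only to keep the sketch short.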

Iterative adoption, not big-bang rollout

Formal training seeded initial fluency; knowledge spread from there. The team did not mandate a specific way of working with AI tools – instead, practices emerged from experience and were documented as they stabilised. This approach keeps the learning curve manageable and ensures adoption reflects what actually works in that specific codebase.

What This Tells Us

Embedding AI tools into a production team is primarily a process decision, not a technology one. The tooling is straightforward to introduce. What takes longer is reaching the point where every developer on the team uses it consistently – in ways that compound rather than cancel each other out across the codebase.

One quarter in, the evidence from this engagement points to a clear pattern: AI tooling is most valuable not where the task is hardest, but where the volume is highest and the cognitive overhead is low enough that developers tend to skip or defer the work. Tests and documentation are the obvious examples – and both improved significantly.

Identifying client details have been kept at a general level in accordance with NDA obligations.
All metrics reflect Boldare team members’ tooling. Additional team members use Junie and Codex; their usage patterns were not tracked for this case study.