Claude Code in Production: AI-Augmented Delivery on a Mission-Critical Platform | Case Study

Shipping features on a regulated, event-driven energy platform leaves little room for skipped tests or undocumented decisions.

When Boldare embedded Claude Code as a permanent, team-wide practice in Q4 2025, the results came faster than expected: test coverage climbed 10 percentage points in a single quarter, sprint velocity jumped by up to 31%, and AI now touches an estimated 75–85% of all new code and tests. This is a detailed account of how the transition happened, which tools drove it, and what the data actually shows.

  • ~95%: test coverage after AI
  • +31%: velocity increase (6-dev team)
  • ~85%: new tests authored with AI
  • ~75%: code with AI contribution

The Client

The client operates one of only three gas capacity trading platforms in Europe – a large digital marketplace for buying and selling gas transmission capacity across European networks. The platform is backed by around 20 institutional shareholders and operates in a regulated environment where reliability and test coverage are not optional.

On the technical side, the client is a mature, Agile engineering organisation with high standards for delivery quality. Boldare joined as an external development team working in full Scrum alongside the client’s in-house developers.

The Problem

Three friction points repeated sprint after sprint:

Area | Symptom | Consequence
Testing | Writing tests was time-consuming relative to their sprint value | Deferred work, coverage gaps
Documentation | ADRs and architectural decisions went unrecorded under delivery pressure | Growing documentation debt
Onboarding | Large, unfamiliar codebase slowed new developers down | Long ramp-up time

The question was how to embed AI so that results scaled to the whole team – not just to one or two individual enthusiasts.

Key Objectives

  • Increase delivery velocity without expanding headcount
  • Raise and sustain test coverage across a complex codebase
  • Reduce the documentation debt that accumulates in fast-moving sprints
  • Make AI a shared team practice, not a tool used by one or two individuals
  • Establish repeatable standards and prompt practices that could evolve over time

How the Rollout Happened

Phase | What happened
Q4 2025 – kickoff | Two developers completed a formal Spec-Driven Development course (10xDevs / Brave Courses)
Q4 2025 – onboarding | One internal session for the full team (6 devs, PO, SM) covering AI workflow and Claude Code
Q4 2025 – stabilisation | AI usage became consistent and habitual; practices stabilised and were documented
January 2026 | One developer moved teams; two new developers joined. The 6-person team continues the same AI practices

The tech stack required zero changes: React, Java/Spring, EventStore, PostgreSQL, Redis, DocumentDB, AWS, GitLab CI/CD. AI tooling integrated into existing workflows without new infrastructure.

Tool Split by Layer

Layer | Tool | Share of dev time | Primary use cases
Frontend | Cursor | ~50% | Component-driven iteration
Frontend | Claude | ~50% | Logic, analysis, refactoring
Backend | Claude Code | ~90% | Code generation, ADRs, event-service reasoning
Backend | GitHub Copilot | ~10% | Supplementary autocomplete

On the backend, Claude Code functioned not only as a code generator but as a contextual reasoning layer – analysing event-driven service behaviour, answering questions about business logic embedded in the codebase, and producing Architecture Decision Records (ADRs) as a natural by-product of the development process rather than a separate task.

Results in Detail

Velocity

Sprint velocity (story points per sprint):

  • Baseline: 13 SP
  • Same team + AI: 15 SP (+15%)
  • 6-dev team + AI: 17 SP (+31%)
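
The percentages in the velocity figures are relative changes against the 13 SP baseline. As a minimal sketch of that arithmetic (the function name `percent_change` and the dictionary layout are illustrative choices, not something taken from the team's tooling):

```python
def percent_change(baseline: float, value: float) -> float:
    """Relative change from baseline to value, expressed in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Story points per sprint as reported in the case study.
BASELINE_SP = 13
velocities = {"Same team + AI": 15, "6-dev team + AI": 17}

for label, sp in velocities.items():
    # Rounded to whole percent, matching how the figures are quoted.
    print(f"{label}: {sp} SP ({percent_change(BASELINE_SP, sp):+.0f}%)")
```

Running it reproduces the quoted +15% and +31%. Both figures are measured against the same baseline, so the second one reflects the change in team composition as well as the AI practices.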

Test coverage

Coverage rose from ~85% to ~95% across the codebase. Approximately 85% of all new tests written since Q4 2025 were authored with AI assistance.
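
A rise from ~85% to ~95% is a gain of 10 percentage points, which is not the same thing as a 10% relative increase. The sketch below works through the difference using the rounded figures above; the function name `coverage_delta` and the returned keys are hypothetical, and because the inputs are approximate the outputs are too.

```python
def coverage_delta(before_pct: float, after_pct: float) -> dict[str, float]:
    """Compare two coverage percentages in three different ways."""
    if not 0 < before_pct < 100:
        raise ValueError("before_pct must be strictly between 0 and 100")
    points = after_pct - before_pct
    uncovered_before = 100.0 - before_pct
    uncovered_after = 100.0 - after_pct
    return {
        # Absolute difference, in percentage points.
        "percentage_points": points,
        # Growth of the covered share relative to where it started.
        "relative_change_pct": points / before_pct * 100.0,
        # Shrinkage of the uncovered share relative to where it started.
        "uncovered_reduction_pct": (uncovered_before - uncovered_after)
        / uncovered_before * 100.0,
    }


delta = coverage_delta(85.0, 95.0)
for name, value in delta.items():
    print(f"{name}: {value:.1f}")
```

On these rounded inputs the covered share grows by roughly 12% in relative terms, while the uncovered share shrinks by about two thirds (from 15% to 5% of the codebase).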

AI contribution by task type

Task | With AI | Without AI
Code generation, refactoring, debugging | ~75% | ~25%
Test authoring | ~85% | ~15%
Documentation (ADRs, technical docs) | ~70% | ~30%
Code review (pre-review pass) | ~50% | ~50%

The ADR Problem – and How It Got Solved

In fast-moving teams, Architecture Decision Records are almost always deferred. Writing them is time-consuming relative to their immediate sprint value, so they slip off the backlog under delivery pressure.

Using Claude Code to analyse codebase context and generate structured ADRs changed that dynamic entirely: documentation became a by-product of delivery, not a task competing with it. ADRs and technical documentation are now a consistent output of the backend development process.
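
The case study does not say which ADR template the team uses. As an illustration of what a "structured" ADR can look like, the sketch below assumes the widely used Context / Decision / Consequences layout; the `ADR` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ADR:
    """One Architecture Decision Record (assumed layout, not the team's own)."""

    number: int
    title: str
    status: str  # e.g. "Proposed", "Accepted", "Superseded"
    context: str
    decision: str
    consequences: list[str] = field(default_factory=list)
    decided_on: date = field(default_factory=date.today)

    def render(self) -> str:
        """Render the record as plain text with fixed section headings."""
        lines = [
            f"ADR-{self.number:04d}: {self.title}",
            f"Status: {self.status}",
            f"Date: {self.decided_on.isoformat()}",
            "",
            "Context",
            self.context,
            "",
            "Decision",
            self.decision,
            "",
            "Consequences",
        ]
        lines += [f"  • {item}" for item in self.consequences]
        return "\n".join(lines)


# Hypothetical placeholder content, only to show the rendered shape.
example = ADR(
    number=1,
    title="Example decision title",
    status="Accepted",
    context="Why a decision was needed.",
    decision="What was decided.",
    consequences=["First trade-off.", "Second trade-off."],
    decided_on=date(2026, 1, 15),
)
print(example.render())
```

A fixed shape like this gives an assistant a concrete target to fill in from the code change and its surrounding context, which is one plausible reason ADRs can become a by-product of delivery rather than a separate writing task.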

Methodology: Three Principles

Tool specialisation by layer

Frontend and backend developers use different primary tools because the nature of their work differs. Cursor and Claude suit the component-driven, iterative rhythm of frontend development. Claude Code suits backend work precisely because it reasons about codebases as a whole – essential for an event-driven architecture where understanding service dependencies is often the real bottleneck.

AI as a force-multiplier on high-volume tasks

Test authoring is the clearest example: the cognitive overhead per test is low, but the volume is high. AI removes that friction almost entirely. This is why test coverage improved substantially – not just the raw test count. The same logic applies to documentation: low cognitive overhead per decision record, high benefit over time.

Iterative adoption, not big-bang rollout

Formal training seeded initial fluency; knowledge spread from there. The team did not mandate a specific way of working with AI tools – instead, practices emerged from experience and were documented as they stabilised. This approach keeps the learning curve manageable and ensures adoption reflects what actually works in that specific codebase.

What This Tells Us

Embedding AI tools into a production team is primarily a process decision, not a technology one. The tooling is straightforward to introduce. What takes longer is reaching the point where every developer on the team uses it consistently – in ways that compound rather than cancel each other out across the codebase.

One quarter in, the evidence from this engagement points to a clear pattern: AI tooling is most valuable not where the task is hardest, but where the volume is highest and the cognitive overhead is low enough that developers tend to skip or defer the work. Tests and documentation are the obvious examples – and both improved significantly.

Identifying client details have been kept at a general level in accordance with NDA obligations.
All metrics reflect Boldare team members’ tooling. Additional team members use Junie and Codex; their usage patterns were not tracked for this case study.