Claude Code in Production: AI-Augmented Delivery on a Mission-Critical Platform | Case Study
Shipping features on a regulated, event-driven energy platform leaves little room for skipped tests or undocumented decisions.
When Boldare embedded Claude Code as a permanent, team-wide practice in Q4 2025, the results came faster than expected: test coverage climbed 10 percentage points in a single quarter, sprint velocity jumped by up to 31%, and AI now touches an estimated 75–85% of all new code and tests. This is a detailed account of how the transition happened, which tools drove it, and what the data actually shows.

The Client
The client operates one of only three gas capacity trading platforms in Europe – a large digital marketplace for buying and selling gas transmission capacity across European networks. The platform is backed by around 20 institutional shareholders and operates in a regulated environment where reliability and test coverage are not optional.
On the technical side, the client is a mature, Agile engineering organisation with high standards for delivery quality. Boldare joined as an external development team working in full Scrum alongside the client’s in-house developers.
The Problem
Three friction points repeated sprint after sprint:
| Area | Symptom | Consequence |
|---|---|---|
| Testing | Writing tests was time-consuming relative to their sprint value | Deferred work, coverage gaps |
| Documentation | ADRs and architectural decisions went unrecorded under delivery pressure | Growing documentation debt |
| Onboarding | Large, unfamiliar codebase slowed new developers down | Long ramp-up time |
The question was how to embed AI so that results scaled to the whole team – not just to one or two individual enthusiasts.
Key Objectives
- Increase delivery velocity without expanding headcount
- Raise and sustain test coverage across a complex codebase
- Reduce the documentation debt that accumulates in fast-moving sprints
- Make AI a shared team practice, not a tool used by one or two individuals
- Establish repeatable standards and prompt practices that could evolve over time
How the Rollout Happened
| Phase | What happened |
|---|---|
| Q4 2025 – kickoff | Two developers completed a formal Spec-Driven Development course (10xDevs / Brave Courses) |
| Q4 2025 – onboarding | One internal session for the full team (6 developers, Product Owner, Scrum Master) covering AI workflow and Claude Code |
| Q4 2025 – stabilisation | AI usage became consistent and habitual; practices stabilised and were documented |
| January 2026 | One developer moved teams; two new developers joined. The 6-person team continues the same AI practices |
The tech stack required zero changes: React, Java/Spring, EventStore, PostgreSQL, Redis, DocumentDB, AWS, GitLab CI/CD. AI tooling integrated into existing workflows without new infrastructure.
Tool Split by Layer
| Layer | Tool | Share of dev time | Primary use cases |
|---|---|---|---|
| Frontend | Cursor | ~50% | Component-driven iteration |
| Frontend | Claude | ~50% | Logic, analysis, refactoring |
| Backend | Claude Code | ~90% | Code generation, ADRs, event-service reasoning |
| Backend | GitHub Copilot | ~10% | Supplementary autocomplete |
On the backend, Claude Code functioned not only as a code generator but as a contextual reasoning layer – analysing event-driven service behaviour, answering questions about business logic embedded in the codebase, and producing Architecture Decision Records (ADRs) as a natural by-product of the development process rather than a separate task.
Results in Detail
Velocity
Sprint velocity rose by up to 31% after Claude Code became a team-wide practice in Q4 2025.
Test coverage
Coverage rose from ~85% to ~95% across the codebase. Approximately 85% of all new tests written since Q4 2025 were authored with AI assistance.
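To put the coverage figures in perspective, the short sketch below works through the arithmetic. It treats the approximate values quoted above (~85% and ~95%) as exact, which is an assumption made purely for illustration; the function name `coverage_change` is hypothetical and not part of any tooling described in this case study.

```python
def coverage_change(before: float, after: float) -> dict[str, float]:
    """Summarise a coverage change given two percentages on a 0-100 scale."""
    uncovered_before = 100.0 - before
    uncovered_after = 100.0 - after
    return {
        # Absolute difference, in percentage points.
        "percentage_points": after - before,
        # Growth of the covered share relative to where it started.
        "relative_increase_pct": (after - before) / before * 100.0,
        # Shrinkage of the uncovered share relative to where it started.
        "uncovered_reduction_pct": (uncovered_before - uncovered_after)
        / uncovered_before
        * 100.0,
    }


# Assumed inputs: the approximate figures from the case study, taken as exact.
summary = coverage_change(85.0, 95.0)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumed inputs, a 10 percentage point gain corresponds to roughly a two-thirds reduction in uncovered code (from 15% to 5%), which is arguably the more telling number for a codebase that was already well tested.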
AI contribution by task type
| Task | AI involvement |
|---|---|
| Code generation, refactoring, debugging | ~75% |
| Test authoring | ~85% |
| Documentation (ADRs, technical docs) | ~70% |
| Code review (pre-review pass) | ~50% |
The ADR Problem – and How It Got Solved
In fast-moving teams, Architecture Decision Records are almost always deferred. Writing them is time-consuming relative to their immediate sprint value, so they slip off the backlog under delivery pressure.
Using Claude Code to analyse codebase context and generate structured ADRs changed that dynamic entirely: documentation became a by-product of delivery, not a task competing with it. ADRs and technical documentation are now a consistent output of the backend development process.
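The case study does not publish the team's ADR template, so the following is only a minimal sketch of what a "structured ADR" could look like as data. It assumes the widely used Title / Status / Context / Decision / Consequences layout; the `Adr` class, its fields, and the placeholder content are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Adr:
    """A hypothetical structured Architecture Decision Record."""

    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record as a small Markdown document."""
        lines = [
            f"# ADR-{self.number:04d}: {self.title}",
            "",
            f"Status: {self.status}",
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Consequences",
        ]
        lines += [f"- {item}" for item in self.consequences]
        return "\n".join(lines)


# Placeholder content only; it does not describe the client's platform.
example = Adr(
    number=1,
    title="Placeholder decision title",
    status="Accepted",
    context="Placeholder: the forces and constraints behind the decision.",
    decision="Placeholder: the option that was chosen.",
    consequences=["Placeholder consequence A", "Placeholder consequence B"],
)
print(example.to_markdown())
```

A fixed structure like this makes it easier to ask an AI assistant for a draft with the same sections every time, leaving the developer to review and correct the content rather than write it from scratch.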
Methodology: Three Principles
Tool specialisation by layer
Frontend and backend developers use different primary tools because the nature of their work differs. Cursor and Claude suit the component-driven, iterative rhythm of frontend development. Claude Code suits backend work precisely because it reasons about codebases as a whole – essential for an event-driven architecture where understanding service dependencies is often the real bottleneck.
AI as a force-multiplier on high-volume tasks
Test authoring is the clearest example: the cognitive overhead per test is low, but the volume is high. AI removes that friction almost entirely. This is why test coverage improved substantially – not just the raw test count. The same logic applies to documentation: low cognitive overhead per decision record, high benefit over time.
Iterative adoption, not big-bang rollout
Formal training seeded initial fluency; knowledge spread from there. The team did not mandate a specific way of working with AI tools – instead, practices emerged from experience and were documented as they stabilised. This approach keeps the learning curve manageable and ensures adoption reflects what actually works in that specific codebase.
What This Tells Us
Embedding AI tools into a production team is primarily a process decision, not a technology one. The tooling is straightforward to introduce. What takes longer is reaching the point where every developer on the team uses it consistently – in ways that compound rather than cancel each other out across the codebase.
One quarter in, the evidence from this engagement points to a clear pattern: AI tooling is most valuable not where the task is hardest, but where the volume is highest and the cognitive overhead is low enough that developers tend to skip or defer the work. Tests and documentation are the obvious examples – and both improved significantly.
Identifying client details have been kept at a general level in accordance with NDA obligations.
All metrics reflect Boldare team members’ tooling. Additional team members use Junie and Codex; their usage patterns were not tracked for this case study.