The impact of AI on our code, measured.
We compared the team before and after adopting Claude — deliveries, bugs and code volume, normalized per day, across three adoption phases.
From zero AI to a whole team on Claude
Adoption happened across three phases of different lengths. That is why everything is measured per day — the axis below is proportional to each period's days.
P1: baseline, the team working without any AI tooling.
P2: a single developer using Claude to build the internal tool.
P3: all 5 developers working with Claude in their workflow.
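The per-day normalization can be sketched as follows. This is a minimal illustration, not the team's actual pipeline: it assumes the phase lengths of 30, 19 and 49 days given in the methodology notes map to the three phases in order, and the function name `per_day` and the raw counts are hypothetical placeholders.

```python
# Phase lengths in days. The 30/19/49 split comes from the report's notes;
# mapping them to P1/P2/P3 in that order is an assumption.
PHASE_DAYS = {"P1": 30, "P2": 19, "P3": 49}


def per_day(total: float, phase: str) -> float:
    """Normalize a raw phase total by the number of days in that phase."""
    return total / PHASE_DAYS[phase]


# Hypothetical raw counts, for illustration only (not the team's data).
example_totals = {"P1": 60, "P2": 19, "P3": 147}

rates = {phase: per_day(total, phase) for phase, total in example_totals.items()}
for phase, rate in rates.items():
    print(f"{phase}: {rate:.2f} per day")
```

Dividing by phase length is what makes a 19-day phase comparable with a 49-day one; raw totals would favor the longest phase regardless of pace.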
Bugs per delivered task (the Bug:Task ratio) is the most robust signal of effective work, because it combines productivity and quality in a single number. The team that used to create ~1 bug per delivered task now creates 0.3, while delivering far more in the process.
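As a rough sketch of the arithmetic behind this ratio, the snippet below computes bugs per delivered task and the relative change between two ratios. The function names are hypothetical; the only figures taken from the report are the approximate ratios of 1 and 0.3.

```python
def bug_task_ratio(bugs_created: int, tasks_delivered: int) -> float:
    """Bugs created per delivered task over the same period."""
    if tasks_delivered == 0:
        raise ValueError("ratio is undefined when no tasks were delivered")
    return bugs_created / tasks_delivered


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (-0.70 means a 70% drop)."""
    return (after - before) / before


# Ratios quoted in the text: roughly 1 bug per task before, 0.3 after.
change = relative_change(before=1.0, after=0.3)
print(f"{change:+.0%}")  # prints -70%
```

Because the ratio divides one count by the other, it only moves when bugs grow faster or slower than deliveries, which is why it captures quality and throughput together.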
Six numbers, one direction
Per-day comparison across the three phases. P2 is the build phase: it is shown for context and does not represent feature delivery.
// note: in P2 (build) the largest code volume of all phases was written — without opening Jira tasks. Hence few tasks/day (1.10) and the inflated Bug:Task ratio (1.67): P2's volume is infrastructure, not feature delivery.
Fewer bugs — and less severe ones
The drop in volume comes with an even larger drop in critical bugs: the team not only introduces fewer defects, it introduces less dangerous ones.
More delivery, much more code
With fewer bugs, the team also produced more: daily deliveries rose by about a third (+36%) and code volume nearly doubled, with PRs 2.4× larger.
The gain is distributed across the team
Jira tasks moved to Done per day, per developer. The jump shows up in almost everyone, so it is not down to a single person.
// * Dev 2 was present for 34 of the 49 days of P3 (left on 05/18).
// ** Dev 3's focus in P2 was infrastructure: code without associated tasks.
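Partial presence matters for a per-developer rate, because the choice of denominator changes the result. The sketch below contrasts dividing by the full phase length with dividing by days actually present. The report does not say which denominator the chart uses, and the task count here is a hypothetical placeholder; only the 34 and 49 day figures come from the note above.

```python
def done_per_day(done_tasks: int, days_present: int) -> float:
    """Tasks moved to Done per day, using the days the developer was present."""
    if days_present <= 0:
        raise ValueError("days_present must be positive")
    return done_tasks / days_present


P3_DAYS = 49          # full length of P3, from the report
DEV2_DAYS_IN_P3 = 34  # Dev 2's presence in P3, from the report

hypothetical_done = 34  # illustrative count only, not the team's data

over_full_phase = done_per_day(hypothetical_done, P3_DAYS)
over_days_present = done_per_day(hypothetical_done, DEV2_DAYS_IN_P3)

print(f"divided by full phase:   {over_full_phase:.2f} per day")
print(f"divided by days present: {over_days_present:.2f} per day")
```

Using days present avoids understating the output of someone who left partway through the phase.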
The team spends its energy cleaning up the present
Of the 42 bugs closed in P3, most were introduced during P3 itself: a sign of active flow, not accumulated debt.
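One way to check a claim like this is to compare each closed bug's creation date against the phase window. The following is a sketch under stated assumptions: the function name, the phase boundaries and the bug dates are all hypothetical, since the report does not list calendar dates for the phases.

```python
from datetime import date


def share_created_in_phase(created_dates, phase_start, phase_end):
    """Fraction of closed bugs whose creation date falls inside the phase."""
    if not created_dates:
        raise ValueError("no closed bugs to classify")
    in_phase = sum(phase_start <= d <= phase_end for d in created_dates)
    return in_phase / len(created_dates)


# Hypothetical 49-day window and creation dates, for illustration only.
phase_start = date(2000, 1, 1)
phase_end = date(2000, 2, 18)
created = [
    date(1999, 12, 20),  # created before the phase: inherited debt
    date(2000, 1, 5),
    date(2000, 1, 20),
    date(2000, 2, 10),
]

share = share_created_in_phase(created, phase_start, phase_end)
print(f"{share:.0%} of closed bugs were created inside the phase")
```

A share above 50% corresponds to "most" in the sentence above; a low share would instead point to the team working through a backlog.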
AI does not replace the team. It multiplies a team that knows the product.
The biggest gain did not come from writing more code, but from a team with product context and history driving the tool. AI amplifies that knowledge — domain, legacy, prior decisions. Without that context, no tool delivers −70% bugs per task with +36% deliveries. It is the team that knows what it is building that makes AI effective.
- Phases of different lengths (30d / 19d / 49d) — all metrics normalized per day.
- Analysis limited to the internal engineering team, over the AI repository built by the team itself — no external consulting.
- P2 was an infrastructure-building phase, not feature delivery.
- The growth coincides with the natural maturing of the team and the project, so part of the gain may not be attributable to AI alone.
- Dev 2 left the project on 05/18 — partial presence in P3 (34 of 49 days).