Inside 452 AI-built pull requests

What it actually looks like when most of an app is implemented by AI agents and shipped through a guarded workflow: real timings and diff sizes from a production system, anonymised.

It’s easy to put “10× faster” on a marketing page. It’s harder to show the receipts. So here are ours.

We build and operate a production B2B web app where the large majority of code is implemented by AI agents and shipped through the exact workflow we’d set up for a client. Below is what that has produced: measured, not estimated. To keep a client’s product and data private, everything here is aggregate timings and diff sizes only: no feature names, no branch titles, no screenshots.

By the numbers

Metric	Value
Merged pull requests	452 (Aug 2025 – May 2026)
Median time, PR open → merge	~1.8 hours
Merged within an hour of opening	~45%
Merged the same day	~79%
Median change size	+92 / −23 lines
Files in the median PR	5
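
For readers who want to reproduce figures like these on their own repository, here is a minimal sketch of how they could be derived from exported pull request records. It is not the pipeline behind the table above. The record shape (`MergedPR`), the field names, and the function `summarize` are hypothetical, and "same day" is assumed to mean the same calendar date, which the table itself does not define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class MergedPR:
    # Hypothetical record shape; field names are illustrative only.
    opened_at: datetime
    merged_at: datetime
    additions: int
    deletions: int
    files_changed: int


def summarize(prs: list[MergedPR]) -> dict[str, float]:
    """Aggregate open -> merge timings and diff sizes for merged PRs."""
    if not prs:
        raise ValueError("need at least one merged PR")

    durations = [pr.merged_at - pr.opened_at for pr in prs]
    within_hour = sum(d <= timedelta(hours=1) for d in durations)
    # Assumption: "same day" means opened and merged on the same calendar date.
    same_day = sum(pr.opened_at.date() == pr.merged_at.date() for pr in prs)

    return {
        "merged_prs": len(prs),
        "median_hours_to_merge": median(d.total_seconds() for d in durations) / 3600,
        "share_within_hour": within_hour / len(prs),
        "share_same_day": same_day / len(prs),
        # Medians are taken per column, so they need not come from the same PR.
        "median_additions": median(pr.additions for pr in prs),
        "median_deletions": median(pr.deletions for pr in prs),
        "median_files": median(pr.files_changed for pr in prs),
    }
```

Because each median is computed independently, the "+92 / −23" style of reporting describes typical additions and typical deletions separately, not one specific pull request.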

The flow behind every one of those PRs

The shape is always the same, and it’s the workflow we sell:

A request lands where the team already works: a message in chat, or a ticket.
An agent picks it up and implements it on its own branch, in its own pull request, with full repo context (the conventions, the patterns, the examples).
Automated review runs on the PR: lint, type-check, the test suite, and an AI review that knows the repo’s rules. Failures loop back to the agent to fix; they are never bypassed.
A human approves the green PR. It merges and deploys.
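
The control flow of those four steps can be sketched in a few lines. This is an illustration of the shape, not the production orchestration: the function `run_review_loop`, its callback parameters, the returned status strings, and the `max_rounds` cap are all assumptions introduced for the example.

```python
from typing import Callable

# A check takes a branch name and returns True when it is green.
Check = Callable[[str], bool]


def run_review_loop(
    branch: str,
    checks: dict[str, Check],
    agent_fix: Callable[[str, list[str]], None],
    human_approves: Callable[[str], bool],
    max_rounds: int = 5,  # assumption: a cap so a stuck PR gets escalated
) -> str:
    """Drive one PR through automated checks, agent fixes, and human approval."""
    for _ in range(max_rounds):
        failed = [name for name, check in checks.items() if not check(branch)]
        if failed:
            # Failures go back to the agent; there is no bypass path here.
            agent_fix(branch, failed)
            continue
        # Only a fully green PR ever reaches a human.
        return "merged" if human_approves(branch) else "changes_requested"
    return "escalated"
```

The design choice worth noticing is that the function has no "force" or "skip checks" parameter: the only route to "merged" runs through every check passing and then an explicit approval.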

No step copies context from one tool to another by hand. That’s the whole point: the hand-offs are automated, so a person spends their time on judgement and approval, not on shepherding.

What these numbers prove

The review-and-merge loop is fast, and it stays fast under volume. A median of ~1.8 hours from opening a PR to merging it (with nearly half merged inside the hour) means changes don’t rot in a queue waiting for a free reviewer. That’s the bottleneck most teams actually have, and it’s the one the workflow removes.

Changes ship in small, reviewable batches. The median PR touches 5 files and changes roughly 115 lines (+92 / −23). Small PRs are easier to review, safer to merge, and quicker to roll back: textbook engineering hygiene. The workflow makes that the path of least resistance: one request, one focused PR, instead of a sprawling branch nobody wants to read.

What they do not prove

This is where most “AI productivity” claims quietly overreach, so we’ll be explicit.

Open → merge is not “time to build a feature.” It measures the loop after a change is ready: review, fixes, approval, merge. It’s a genuine, useful number (it’s the part teams lose days to), but it is not a claim that a feature gets designed and built in 1.8 hours.

Big work still takes real time. The distribution has a long tail: the largest single PR in this set changed ~2,400 lines across 35 files and took the better part of a day to get green and merged. The median is fast because the work is deliberately broken into small pieces, not because hard problems became easy.
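
A small worked example shows why a median can stay low while a long tail exists. The list below is made up for illustration; only the ~2,400-line outlier and the ~115-line typical size echo figures from this article, and the helper `describe` is hypothetical.

```python
from statistics import mean, median


def describe(line_counts: list[int]) -> dict[str, float]:
    """Return the median, mean, and largest value of a list of PR sizes."""
    return {
        "median": median(line_counts),
        "mean": mean(line_counts),
        "largest": max(line_counts),
    }


# Illustrative sizes (lines changed per PR), not real data from the set.
illustrative_sizes = [40, 80, 115, 150, 2400]

with_tail = describe(illustrative_sizes)
without_tail = describe(illustrative_sizes[:-1])
```

In this toy list the single large PR drags the mean far above the typical size, while the median barely moves when it is removed. That is why a median is a fair summary of the usual case and says nothing about how long the biggest pieces of work take.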

We keep the way we derive every figure (measured vs. estimated) in the open. See how we measure the hours saved.

Why we run it on ourselves first

Every guardrail in this system (single-writer state, “every check green, no bypassing”, human-approved merges) exists because we ran it at volume and hit the edges ourselves. 452 PRs is 452 chances for an automation to corrupt a board, merge something red, or lose work. None of that has happened, because the safety rules came first. That’s what we hand a client on day one, instead of the version that learns those lessons on their codebase.

This is the Advanced AI tier from our workflow comparison: not a chatbot helping you type, and not a coding agent you babysit step by step, but the hand-offs themselves automated and guarded. If you’d like to see where the hours are hiding in your own delivery, book a strategy call.