From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests

2026 · cs.SE · arXiv 2604.03196

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Autonomous coding agents are generating code at an unprecedented scale, with OpenAI Codex alone creating over 400,000 pull requests (PRs) in two months. As agentic PR volumes increase, code review agents (CRAs) have become routine gatekeepers in development workflows. Industry reports claim that CRAs can manage 80% of PRs in open source repositories without human involvement. As a result, understanding the effectiveness of CRA reviews is crucial for maintaining developmental workflows and preventing wasted effort on abandoned pull requests. However, empirical evidence on how CRA feedback quality affects PR outcomes remains limited. The goal of this paper is to help researchers and practitioners understand when and how CRAs influence PR merge success by empirically analyzing reviewer composition and the signal quality of CRA-generated comments. From AIDev's 19,450 PRs, we analyze 3,109 unique PRs in the commented review state, comparing human-only versus CRA-only reviews. We examine 98 closed CRA-only PRs to assess whether low signal-to-noise ratios contribute to abandonment. CRA-only PRs achieve a 45.20% merge rate, 23.17 percentage points lower than human-only PRs (68.37%), with significantly higher abandonment. Our signal-to-noise analysis reveals that 60.2% of closed CRA-only PRs fall into the 0-30% signal range, and 12 of 13 CRAs exhibit average signal ratios below 60%, indicating substantial noise in automated review feedback. These findings suggest that CRAs without human oversight often generate low-signal feedback associated with higher abandonment. For practitioners, our results indicate that CRAs should augment rather than replace human reviewers and that human involvement remains critical for effective and actionable code review.
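The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and treating a signal ratio as the fraction of review comments judged to carry signal is an assumption, since the abstract does not state how the paper computes it.

```python
from typing import Sequence


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Difference between two rates given in percent, in percentage points."""
    return round(rate_a - rate_b, 2)


def signal_ratio(signal_comments: int, total_comments: int) -> float:
    """Assumed definition: share of comments that carry signal (0.0 to 1.0)."""
    if total_comments == 0:
        return 0.0
    return signal_comments / total_comments


def share_in_band(ratios: Sequence[float], low: float, high: float) -> float:
    """Percentage of ratios that fall inside the closed band [low, high]."""
    if not ratios:
        return 0.0
    hits = sum(1 for r in ratios if low <= r <= high)
    return 100.0 * hits / len(ratios)


if __name__ == "__main__":
    # Merge rates reported in the abstract: human-only vs CRA-only.
    print(percentage_point_gap(68.37, 45.20))

    # Made-up per-PR counts (signal comments, total comments), not paper data.
    toy_prs = [(1, 10), (3, 10), (5, 10), (9, 10)]
    toy_ratios = [signal_ratio(s, t) for s, t in toy_prs]
    print(share_in_band(toy_ratios, 0.0, 0.30))
```

The first call reproduces the 23.17 percentage-point gap between the 68.37% and 45.20% merge rates. If the 60.2% figure is taken over the 98 closed CRA-only PRs examined, it corresponds to roughly 59 PRs in the 0-30% signal range.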

representative citing papers

Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption

cs.SE · 2026-06-24 · unverdicted · novelty 7.0

In a staggered difference-in-differences (DiD) analysis of 11k GitHub projects, AI coding agent adoption causes no change in human contributor count but reduces contributor density and newcomer share by 3.7 percentage points while increasing review depth by 5.3%.

All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code

cs.SE · 2026-06-16 · unverdicted · novelty 6.0

An empirical study of 86,156 test patches from five AI agents finds that 80.2% lack strong oracle signals, and that strong oracles are linked to higher merge rates (odds ratio 1.28) after regression controls.

