Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions

2025 · cs.SE · arXiv 2508.18771

5 Pith papers cite this work. Polarity classification is still indexing.


abstract

AI-based code review tools automatically review and comment on pull requests to improve code quality. Despite their growing presence, little is known about their actual impact. We present a large-scale empirical study of 16 popular AI-based code review actions for GitHub workflows, analyzing more than 22,000 review comments in 178 repositories. We investigate (1) how these tools are adopted and configured, (2) whether their comments lead to code changes, and (3) which factors influence their effectiveness. We develop a two-stage LLM-assisted framework to determine whether review comments are addressed, and use interpretable machine learning to identify influencing factors. Our findings show that, while adoption is growing, effectiveness varies widely. Comments that are concise, contain code snippets, and are manually triggered, particularly those from hunk-level review tools, are more likely to result in code changes. These results highlight the importance of careful tool design and suggest directions for improving AI-based code review systems.

representative citing papers

Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

cs.CR · 2026-05-07 · unverdicted · novelty 8.0

Heimdallr detects LLM-induced security risks in GitHub CI workflows by normalizing them into an LLM-Workflow Property Graph and combining triggerability analysis with LLM-assisted dataflow summarization, achieving over 0.91 F1 on threat detection in evaluation.

Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study

cs.SE · 2026-05-21 · unverdicted · novelty 6.0

Analysis of 9,799 human-reviewed agentic PRs shows only 35.7% of rejections reflect clear agent failures, with 31.2% due to workflow constraints and 33.1% lacking clear rationale, plus notable interaction differences across agents.

On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories

cs.SE · 2026-04-27 · unverdicted · novelty 6.0

Reviewer bots' higher comment volume on AI agent PRs is associated with slower resolutions and poorer average feedback quality, while feedback quality itself has no association with PR outcomes.

From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests

cs.SE · 2026-04-03 · conditional · novelty 5.0

Code review agents achieve a 45.20% merge rate on PRs versus 68.37% for humans, with 60.2% of agent-only closed PRs showing 0-30% signal quality.

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

cs.SE · 2026-04-16 · unverdicted · novelty 3.0

RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.

citing papers explorer

Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows
cs.CR · 2026-05-07 · unverdicted · none · ref 40 · internal anchor

Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study
cs.SE · 2026-05-21 · unverdicted · none · ref 17 · internal anchor

On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories
cs.SE · 2026-04-27 · unverdicted · none · ref 17 · internal anchor

From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests
cs.SE · 2026-04-03 · conditional · none · ref 7 · internal anchor

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation
cs.SE · 2026-04-16 · unverdicted · none · ref 38 · internal anchor
