arxiv: 2508.06879 · v5 · submitted 2025-08-09 · 💻 cs.SE

Quo Vadis, Code Review? Exploring the Future of Code Review

Pith reviewed 2026-05-19 00:12 UTC · model grok-4.3

classification 💻 cs.SE
keywords code review · software engineering · developer survey · AI automation · large language models · future expectations · socio-technical challenges · accountability in development

The pith

Developers expect code review to remain essential over the next five years even as automation increases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on a survey of 100 professional developers from five companies about how they see code review changing in the coming five years. The developers anticipate that reviews will stay important, with time spent on them staying the same or going up, and that more kinds of development artifacts will be reviewed. Many mention AI and large language models playing bigger roles in both writing and reviewing code. The study points to new tensions around whether developers will understand automated reviews, who is accountable for them, and how much to trust them. These findings matter because they give an early look at the practical and social challenges that AI tools may bring to everyday software work.

Core claim

Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. Many participants referenced AI and large language models, describing increasing automation in both code authoring and reviewing, including scenarios where automated systems handle both roles. The analysis identifies emerging tensions concerning understanding, accountability, and trust in automation-mediated code review.

What carries the argument

Cross-sectional survey of developer expectations combined with thematic analysis of open responses to identify anticipated changes and socio-technical tensions in future code review practices.

If this is right

  • Code review practices will likely adapt to include more automated assistance without reducing overall effort.
  • Teams may need to review a wider variety of artifacts including those generated by AI.
  • New processes will be required to maintain accountability when AI contributes to code or reviews.
  • Trust in automated review tools will become a key factor in their effective use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Organizations should start preparing guidelines for when and how to use AI in code review to address potential trust gaps.
  • Longitudinal studies could track whether these predicted tensions actually materialize as AI tools mature.
  • Code review could serve as a practical case study for managing human-AI collaboration in other software engineering activities.

Load-bearing premise

That the current self-reported expectations of developers will accurately predict how code review practices actually evolve over the next five years.

What would settle it

A replication survey conducted in five years showing that actual time spent on code review has decreased substantially due to automation or that developers no longer view it as essential.

Original abstract

Context: Code review has long been a core practice in collaborative software engineering. As automation becomes increasingly embedded in development workflows, the role and functioning of code review are subject to change. Objective: This study explores how professional developers anticipate the evolution of code review and identifies emerging tensions reflected in these expectations. Method: We conducted a cross-sectional survey with 100 developers across five software-driven companies. The survey captured estimates of current review time and reviewed artifacts, as well as anticipated changes over a five-year horizon. Open-ended questions invited reflections on the future of code review. Quantitative responses were analyzed descriptively, and open-ended responses were independently coded by multiple researchers using thematic analysis to identify recurring patterns in participant responses. Results: Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. In open-ended responses, many participants explicitly referenced AI and large language models (LLMs), describing increasing automation in both code authoring and reviewing, including scenarios in which automated systems operate in both roles. Conclusion: Our analysis suggests emerging tensions concerning understanding, accountability, and trust in automation-mediated code review. These tensions provide early empirical signals of socio-technical challenges and position code review as a concrete setting for examining the implications of LLM integration in collaborative software engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports findings from a cross-sectional survey of 100 professional developers across five software-driven companies. It uses descriptive quantitative analysis of estimates for current and future review time and artifacts, combined with thematic analysis of open-ended responses, to argue that code review is expected to remain essential with stable or increased time investment and broader artifact coverage over five years, while surfacing emerging tensions around understanding, accountability, and trust in automation- and LLM-mediated review.

Significance. If the thematic patterns prove robust, the work supplies early empirical signals on socio-technical frictions in LLM-augmented code review, a timely topic for collaborative software engineering. The survey design directly elicits practitioner expectations rather than deriving them from models, providing falsifiable baseline data for longitudinal follow-up studies.

major comments (2)
  1. [Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.
  2. [Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.
minor comments (1)
  1. [Abstract] The abstract and results refer to 'five software-driven companies' without indicating company size, domain, or maturity; adding this context would help readers interpret the scope of the expectations reported.
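
The inter-rater reliability statistic that major comment 1 asks for could, for example, be Cohen's kappa. The following is a minimal sketch, assuming exactly two coders who each assign a single theme to every open-ended response. The function name `cohens_kappa` and the label lists are hypothetical and invented for illustration; they are not data or code from the paper, and the authors may have used a coding scheme (for instance, multiple themes per response) that this statistic does not cover.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one label per response."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of responses")
    n = len(coder_a)
    # Observed agreement: share of responses given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of both coders' marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical theme assignments, invented for illustration only.
coder_a = ["trust", "trust", "accountability", "understanding", "trust", "accountability"]
coder_b = ["trust", "accountability", "accountability", "understanding", "trust", "accountability"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported alongside a description of how disagreements were resolved.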

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, indicating the revisions planned for the next version.

Point-by-point responses
  1. Referee: [Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.

    Authors: We appreciate the referee's point on reporting standards for thematic analysis. Two researchers independently coded the open-ended responses using an inductive approach and resolved differences through discussion to reach consensus on the themes. We did not compute an inter-rater reliability statistic because the analysis prioritized identifying broad, recurring patterns over fine-grained code agreement. Member checking was not feasible given the anonymous survey format. We will revise the Methods section to describe the independent coding and consensus process in more detail, thereby strengthening the grounding of the reported tensions in participant responses rather than coder imposition. revision: yes

  2. Referee: [Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.

    Authors: We agree that greater transparency on sampling would improve the manuscript. The survey was distributed voluntarily through internal company channels at the five organizations, but we did not maintain centralized records of the total number of invitations issued and therefore cannot report a precise response rate. Exclusion criteria were limited to confirming relevant professional experience with code review. To respect anonymity, only high-level role and experience information was collected. We will expand the Sample Description to document these aspects and add a Limitations section that discusses implications for selection bias and generalizability. The quantitative findings will be framed explicitly as descriptive results from this sample. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical survey with findings derived from external participant responses

full rationale

This is a cross-sectional survey study collecting quantitative estimates and open-ended reflections from 100 developers. Results on expected changes in review time, artifacts, and emerging tensions are obtained via descriptive statistics and thematic analysis of participant responses. No mathematical derivations, equations, fitted parameters, or first-principles claims exist that could reduce to the paper's own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked; the analysis remains self-contained against the collected data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This empirical survey study rests on standard assumptions about the validity of self-reported data and qualitative coding rather than mathematical axioms or new postulated entities.

axioms (2)
  • domain assumption: Self-reported expectations from a cross-sectional survey reliably indicate future behavior and concerns.
    The study treats participant answers about five-year changes as meaningful signals without external validation of predictive accuracy.
  • domain assumption: Independent thematic coding by multiple researchers produces unbiased and complete identification of recurring patterns.
    The method description assumes that the coding process captures the key tensions without significant interpretive variance.

pith-pipeline@v0.9.0 · 5797 in / 1435 out tokens · 50709 ms · 2026-05-19T00:12:54.444798+00:00 · methodology
