Quo Vadis, Code Review? Exploring the Future of Code Review
Pith reviewed 2026-05-19 00:12 UTC · model grok-4.3
The pith
Developers expect code review to remain essential over the next five years even as automation increases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. Many participants referenced AI and large language models, describing increasing automation in both code authoring and reviewing, including scenarios where automated systems handle both roles. The analysis identifies emerging tensions concerning understanding, accountability, and trust in automation-mediated code review.
What carries the argument
A cross-sectional survey of developer expectations, combined with thematic analysis of open-ended responses, identifies anticipated changes and socio-technical tensions in future code review practices.
If this is right
- Code review practices will likely adapt to include more automated assistance without reducing overall effort.
- Teams may need to review a wider variety of artifacts including those generated by AI.
- New processes will be required to maintain accountability when AI contributes to code or reviews.
- Trust in automated review tools will become a key factor in their effective use.
Where Pith is reading between the lines
- Organizations should start preparing guidelines for when and how to use AI in code review to address potential trust gaps.
- Longitudinal studies could track whether these predicted tensions actually materialize as AI tools mature.
- Code review could serve as a practical case study for managing human-AI collaboration in other software engineering activities.
Load-bearing premise
That the current self-reported expectations of developers will accurately predict how code review practices actually evolve over the next five years.
What would settle it
A replication survey conducted in five years showing that actual time spent on code review has decreased substantially due to automation or that developers no longer view it as essential.
Original abstract
Context: Code review has long been a core practice in collaborative software engineering. As automation becomes increasingly embedded in development workflows, the role and functioning of code review are subject to change.
Objective: This study explores how professional developers anticipate the evolution of code review and identifies emerging tensions reflected in these expectations.
Method: We conducted a cross-sectional survey with 100 developers across five software-driven companies. The survey captured estimates of current review time and reviewed artifacts, as well as anticipated changes over a five-year horizon. Open-ended questions invited reflections on the future of code review. Quantitative responses were analyzed descriptively, and open-ended responses were independently coded by multiple researchers using thematic analysis to identify recurring patterns in participant responses.
Results: Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. In open-ended responses, many participants explicitly referenced AI and large language models (LLMs), describing increasing automation in both code authoring and reviewing, including scenarios in which automated systems operate in both roles.
Conclusion: Our analysis suggests emerging tensions concerning understanding, accountability, and trust in automation-mediated code review. These tensions provide early empirical signals of socio-technical challenges and position code review as a concrete setting for examining the implications of LLM integration in collaborative software engineering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports findings from a cross-sectional survey of 100 professional developers across five software-driven companies. It uses descriptive quantitative analysis of estimates for current and future review time and artifacts, combined with thematic analysis of open-ended responses, to argue that code review is expected to remain essential with stable or increased time investment and broader artifact coverage over five years, while surfacing emerging tensions around understanding, accountability, and trust in automation- and LLM-mediated review.
Significance. If the thematic patterns prove robust, the work supplies early empirical signals on socio-technical frictions in LLM-augmented code review, a timely topic for collaborative software engineering. The survey design directly elicits practitioner expectations rather than deriving them from models, providing falsifiable baseline data for longitudinal follow-up studies.
major comments (2)
- [Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.
- [Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.
minor comments (1)
- [Abstract] The abstract and results refer to 'five software-driven companies' without indicating company size, domain, or maturity; adding this context would help readers interpret the scope of the expectations reported.
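For readers unfamiliar with the inter-rater reliability statistic requested in the first major comment: with two coders, one common choice is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only. The function name `cohens_kappa` and the theme labels are hypothetical, the data are invented for the example, and nothing here is taken from the paper's actual coding.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each label, multiply the two coders' marginal
    # proportions, then sum over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six open-ended responses (NOT data from the paper).
coder_a = ["trust", "trust", "accountability", "understanding", "trust", "accountability"]
coder_b = ["trust", "accountability", "accountability", "understanding", "trust", "trust"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # 4 of 6 items agree; chance agreement is 14/36
```

In this invented example the coders agree on four of six items, yet kappa is noticeably lower than the raw agreement of 0.67 because part of that agreement would be expected by chance. Kappa assumes each response receives exactly one label; responses coded with several themes would need a different measure.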
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, indicating the revisions planned for the next version.
- Referee: [Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.
  Authors: We appreciate the referee's point on reporting standards for thematic analysis. Two researchers independently coded the open-ended responses using an inductive approach and resolved differences through discussion to reach consensus on the themes. We did not compute an inter-rater reliability statistic because the analysis prioritized identifying broad, recurring patterns over fine-grained code agreement. Member checking was not feasible given the anonymous survey format. We will revise the Methods section to describe the independent coding and consensus process in more detail, thereby strengthening the grounding of the reported tensions in participant responses rather than coder imposition.
  Revision: yes
- Referee: [Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.
  Authors: We agree that greater transparency on sampling would improve the manuscript. The survey was distributed voluntarily through internal company channels at the five organizations, but we did not maintain centralized records of the total number of invitations issued and therefore cannot report a precise response rate. Exclusion criteria were limited to confirming relevant professional experience with code review. To respect anonymity, only high-level role and experience information was collected. We will expand the Sample Description to document these aspects and add a Limitations section that discusses implications for selection bias and generalizability. The quantitative findings will be framed explicitly as descriptive results from this sample.
  Revision: partial
Circularity Check
No circularity: purely empirical survey with findings derived from external participant responses
This is a cross-sectional survey study collecting quantitative estimates and open-ended reflections from 100 developers. Results on expected changes in review time, artifacts, and emerging tensions are obtained via descriptive statistics and thematic analysis of participant responses. No mathematical derivations, equations, fitted parameters, or first-principles claims exist that could reduce to the paper's own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked; the analysis remains self-contained against the collected data.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Self-reported expectations from a cross-sectional survey reliably indicate future behavior and concerns.
- domain assumption: Independent thematic coding by multiple researchers produces unbiased and complete identification of recurring patterns.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We conducted a cross-sectional survey with 100 developers... open-ended responses were independently coded by multiple researchers using thematic analysis"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Alberto Bacchelli and Christian Bird. “Expectations, outcomes, and challenges of modern code review”. In: 2013 35th International Conference on Software Engineering (ICSE). 2013, pp. 712–. DOI: 10.1109/ICSE.2013.6606617
- [2] Deepika Badampudi, Michael Unterkalmsteiner, and Ricardo Britto. “Modern Code Reviews - A Survey of Literature and Practice”. In: ACM Transactions on Software Engineering and Methodology (Feb. 2023). DOI: 10.1145/3585004
- [3] Andreas Bauer, Riccardo Coppola, Emil Alégroth, and Tony Gorschek. “Code review guidelines for GUI-based testing artifacts”. In: Information and Software Technology 163 (Nov. 2023), p. 107299. DOI: 10.1016/j.infsof.2023.107299
- [4] Amiangshu Bosu, Jeffrey C. Carver, Christian Bird, Jonathan Orbeck, and Christopher Chockley. “Process Aspects and Social Dynamics of Contemporary Code Review: Insights from Open Source Development and Industrial Practice at Microsoft”. In: IEEE Transactions on Software Engineering 43 (1 2017), pp. 56–75. DOI: 10.1109/TSE.2016.2576451
- [5] Victoria Clarke and Virginia Braun. “Thematic analysis”. In: The Journal of Positive Psychology 12 (3 May 2017), pp. 297–298. DOI: 10.1080/17439760.2016.1262613
- [6] Nicole Davila, Jorge Melegati, and Igor Wiese. “Tales From the Trenches: Expectations and Challenges From Practice for Code Review in the Generative AI Era”. In: IEEE Software 41 (6 Nov. 2024), pp. 38–45. DOI: 10.1109/MS.2024.3428439
- [7] Michael Dorner, Maximilian Capraro, Oliver Treidler, Tom-Eric Kunz, Darja Šmite, Ehsan Zabardast, Daniel Mendez, and Krzysztof Wnuk. “Taxing Collaborative Software Engineering”. In: IEEE Software (2024), pp. 1–8. DOI: 10.1109/MS.2023.3346646
- [8] Michael Dorner, Daniel Mendez, Krzysztof Wnuk, Ehsan Zabardast, and Jacek Czerwonka. “The Upper Bound of Information Diffusion in Code Review”. In: Empirical Software Engineering (June 2023)
- [9] M. E. Fagan. “Design and code inspections to reduce errors in program development”. In: IBM Systems Journal 15 (3 1976), pp. 182–211. DOI: 10.1147/sj.153.0182
- [10] Jaakko Sauvola, Sasu Tarkoma, Mika Klemettinen, Jukka Riekki, and David Doermann. “Future of software development with generative AI”. In: Automated Software Engineering 31.1 (2024). DOI: 10.1007/s10515-024-00426-z
- [11] Daniel Stenberg. The I in LLM Stands For Intelligence. 2024. URL: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/ (visited on 05/29/2025)
- [12] Rosalia Tufano, Alberto Martin-Lopez, Ahmad Tayeb, Ozren Dabic, Sonia Haiduc, and Gabriele Bavota. “Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?” In: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). Los Alamitos, CA, USA: IEEE Computer Society, May 2025, pp. 597–597. DOI: 10.1109/ICSE55347.2025.00060
- [13] Laurie Williams. “Integrating pair programming into a software development process”. In: Proceedings 14th Conference on Software Engineering Education and Training. ’In search of a software engineering profession’ (Cat. No.PR01059). IEEE Comput. Soc, pp. 27–36. DOI: 10.1109/CSEE.2001.913816
- [14] Ehsan Zabardast, Javier Gonzalez-Huerta, and Binish Tanveer. “Ownership vs Contribution: Investigating the Alignment Between Ownership and Contribution”. In: 2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C). IEEE, Mar. 2022, pp. 30–. DOI: 10.1109/ICSA-C54293.2022.00013
2025 IEEE Software