Quo Vadis, Code Review? Exploring the Future of Code Review

Andreas Bauer; Daniel Mendez; Darja Šmite; Ehsan Zabardast; Lukas Thode; Michael Dorner; Michael Kormann; Ricardo Britto; Stephan Lukasczyk

arxiv: 2508.06879 · v5 · submitted 2025-08-09 · 💻 cs.SE

Pith reviewed 2026-05-19 00:12 UTC · model grok-4.3

classification 💻 cs.SE

keywords: code review, software engineering, developer survey, AI automation, large language models, future expectations, socio-technical challenges, accountability in development

The pith

Developers expect code review to remain essential over the next five years even as automation increases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reports on a survey of 100 professional developers from five companies about how they see code review changing in the coming five years. The developers anticipate that reviews will stay important, with time spent on them staying the same or going up, and that more kinds of development artifacts will be reviewed. Many mention AI and large language models playing bigger roles in both writing and reviewing code. The study points to new tensions around whether developers will understand automated reviews, who is accountable for them, and how much to trust them. These findings matter because they give an early look at the practical and social challenges that AI tools may bring to everyday software work.

Core claim

Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. Many participants referenced AI and large language models, describing increasing automation in both code authoring and reviewing, including scenarios where automated systems handle both roles. The analysis identifies emerging tensions concerning understanding, accountability, and trust in automation-mediated code review.

What carries the argument

Cross-sectional survey of developer expectations combined with thematic analysis of open responses to identify anticipated changes and socio-technical tensions in future code review practices.

If this is right

Code review practices will likely adapt to include more automated assistance without reducing overall effort.
Teams may need to review a wider variety of artifacts including those generated by AI.
New processes will be required to maintain accountability when AI contributes to code or reviews.
Trust in automated review tools will become a key factor in their effective use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Organizations should start preparing guidelines for when and how to use AI in code review to address potential trust gaps.
Longitudinal studies could track whether these predicted tensions actually materialize as AI tools mature.
Code review could serve as a practical case study for managing human-AI collaboration in other software engineering activities.

Load-bearing premise

That the current self-reported expectations of developers will accurately predict how code review practices actually evolve over the next five years.

What would settle it

A replication survey conducted in five years showing that actual time spent on code review has decreased substantially due to automation or that developers no longer view it as essential.

Original abstract

Context: Code review has long been a core practice in collaborative software engineering. As automation becomes increasingly embedded in development workflows, the role and functioning of code review are subject to change.

Objective: This study explores how professional developers anticipate the evolution of code review and identifies emerging tensions reflected in these expectations.

Method: We conducted a cross-sectional survey with 100 developers across five software-driven companies. The survey captured estimates of current review time and reviewed artifacts, as well as anticipated changes over a five-year horizon. Open-ended questions invited reflections on the future of code review. Quantitative responses were analyzed descriptively, and open-ended responses were independently coded by multiple researchers using thematic analysis to identify recurring patterns in participant responses.

Results: Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. In open-ended responses, many participants explicitly referenced AI and large language models (LLMs), describing increasing automation in both code authoring and reviewing, including scenarios in which automated systems operate in both roles.

Conclusion: Our analysis suggests emerging tensions concerning understanding, accountability, and trust in automation-mediated code review. These tensions provide early empirical signals of socio-technical challenges and position code review as a concrete setting for examining the implications of LLM integration in collaborative software engineering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports findings from a cross-sectional survey of 100 professional developers across five software-driven companies. It uses descriptive quantitative analysis of estimates for current and future review time and artifacts, combined with thematic analysis of open-ended responses, to argue that code review is expected to remain essential with stable or increased time investment and broader artifact coverage over five years, while surfacing emerging tensions around understanding, accountability, and trust in automation- and LLM-mediated review.

Significance. If the thematic patterns prove robust, the work supplies early empirical signals on socio-technical frictions in LLM-augmented code review, a timely topic for collaborative software engineering. The survey design directly elicits practitioner expectations rather than deriving them from models, providing falsifiable baseline data for longitudinal follow-up studies.

major comments (2)

[Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.
[Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.
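
To make the first major comment concrete: one common inter-rater reliability statistic for two coders assigning a single label per response is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only. The function name `cohens_kappa`, the theme labels, and the six coded responses are hypothetical and are not taken from the paper, which reports no such statistic.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: for each label, the product of the two coders'
    # marginal proportions, summed over all labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is
        # undefined here, so this sketch reports full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six open-ended responses (not data from the paper).
coder_a = ["trust", "trust", "accountability", "understanding", "trust", "accountability"]
coder_b = ["trust", "accountability", "accountability", "understanding", "trust", "trust"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the coders agree on four of six items (about 0.67 raw agreement), while chance agreement is 14/36, so kappa comes out at 5/11, roughly 0.45. Thematic analysis often allows several codes per response, in which case a different statistic would be needed; the sketch only shows the kind of number the referee is asking for.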

minor comments (1)

[Abstract] The abstract and results refer to 'five software-driven companies' without indicating company size, domain, or maturity; adding this context would help readers interpret the scope of the expectations reported.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below, indicating the revisions planned for the next version.

Referee: [Methods] Methods section: the thematic analysis is described as responses being 'independently coded by multiple researchers using thematic analysis,' yet no inter-rater reliability statistic, disagreement-resolution protocol, or validation procedure (e.g., member checking) is reported. This directly undermines the load-bearing claim that the data reveal recurring tensions in understanding, accountability, and trust, as the patterns may reflect coder framing of AI topics rather than participant consensus.

Authors: We appreciate the referee's point on reporting standards for thematic analysis. Two researchers independently coded the open-ended responses using an inductive approach and resolved differences through discussion to reach consensus on the themes. We did not compute an inter-rater reliability statistic because the analysis prioritized identifying broad, recurring patterns over fine-grained code agreement. Member checking was not feasible given the anonymous survey format. We will revise the Methods section to describe the independent coding and consensus process in more detail, thereby strengthening the grounding of the reported tensions in participant responses rather than coder imposition.

Revision: yes

Referee: [Results] Results / Sample description: the manuscript provides no response rate, exclusion criteria, or detailed demographics for the 100 respondents from five companies. Without these, the descriptive findings on anticipated stable/increased review time and broader artifacts cannot be assessed for selection bias or generalizability.

Authors: We agree that greater transparency on sampling would improve the manuscript. The survey was distributed voluntarily through internal company channels at the five organizations, but we did not maintain centralized records of the total number of invitations issued and therefore cannot report a precise response rate. Exclusion criteria were limited to confirming relevant professional experience with code review. To respect anonymity, only high-level role and experience information was collected. We will expand the Sample Description to document these aspects and add a Limitations section that discusses implications for selection bias and generalizability. The quantitative findings will be framed explicitly as descriptive results from this sample.

Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical survey with findings derived from external participant responses

This is a cross-sectional survey study collecting quantitative estimates and open-ended reflections from 100 developers. Results on expected changes in review time, artifacts, and emerging tensions are obtained via descriptive statistics and thematic analysis of participant responses. No mathematical derivations, equations, fitted parameters, or first-principles claims exist that could reduce to the paper's own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked; the analysis remains self-contained against the collected data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This empirical survey study rests on standard assumptions about the validity of self-reported data and qualitative coding rather than mathematical axioms or new postulated entities.

axioms (2)

domain assumption: Self-reported expectations from a cross-sectional survey reliably indicate future behavior and concerns.
The study treats participant answers about five-year changes as meaningful signals without external validation of predictive accuracy.

domain assumption: Independent thematic coding by multiple researchers produces unbiased and complete identification of recurring patterns.
The method description assumes that the coding process captures the key tensions without significant interpretive variance.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: “We conducted a cross-sectional survey with 100 developers... open-ended responses were independently coded by multiple researchers using thematic analysis”

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] Alberto Bacchelli and Christian Bird. “Expectations, outcomes, and challenges of modern code review”. In: 2013 35th International Conference on Software Engineering (ICSE). 2013, pp. 712–. DOI: 10.1109/ICSE.2013.6606617

[2] Deepika Badampudi, Michael Unterkalmsteiner, and Ricardo Britto. “Modern Code Reviews - A Survey of Literature and Practice”. In: ACM Transactions on Software Engineering and Methodology (Feb. 2023). DOI: 10.1145/3585004

[3] Andreas Bauer, Riccardo Coppola, Emil Alégroth, and Tony Gorschek. “Code review guidelines for GUI-based testing artifacts”. In: Information and Software Technology 163 (Nov. 2023), p. 107299. DOI: 10.1016/j.infsof.2023.107299

[4] Amiangshu Bosu, Jeffrey C. Carver, Christian Bird, Jonathan Orbeck, and Christopher Chockley. “Process Aspects and Social Dynamics of Contemporary Code Review: Insights from Open Source Development and Industrial Practice at Microsoft”. In: IEEE Transactions on Software Engineering 43.1 (2017), pp. 56–75. DOI: 10.1109/TSE.2016.2576451

[5] Victoria Clarke and Virginia Braun. “Thematic analysis”. In: The Journal of Positive Psychology 12.3 (May 2017), pp. 297–298. DOI: 10.1080/17439760.2016.1262613

[6] Nicole Davila, Jorge Melegati, and Igor Wiese. “Tales From the Trenches: Expectations and Challenges From Practice for Code Review in the Generative AI Era”. In: IEEE Software 41.6 (Nov. 2024), pp. 38–45. DOI: 10.1109/MS.2024.3428439

[7] Michael Dorner, Maximilian Capraro, Oliver Treidler, Tom-Eric Kunz, Darja Šmite, Ehsan Zabardast, Daniel Mendez, and Krzysztof Wnuk. “Taxing Collaborative Software Engineering”. In: IEEE Software (2024), pp. 1–8. DOI: 10.1109/MS.2023.3346646

[8] Michael Dorner, Daniel Mendez, Krzysztof Wnuk, Ehsan Zabardast, and Jacek Czerwonka. “The Upper Bound of Information Diffusion in Code Review”. In: Empirical Software Engineering (June 2023)

[9] M. E. Fagan. “Design and code inspections to reduce errors in program development”. In: IBM Systems Journal 15.3 (1976), pp. 182–211. DOI: 10.1147/sj.153.0182

[10] Jaakko Sauvola, Sasu Tarkoma, Mika Klemettinen, Jukka Riekki, and David Doermann. “Future of software development with generative AI”. In: Automated Software Engineering 31.1 (2024). DOI: 10.1007/s10515-024-00426-z

[11] Daniel Stenberg. The I in LLM Stands For Intelligence. 2024. URL: https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/ (visited on 05/29/2025)

[12] Rosalia Tufano, Alberto Martin-Lopez, Ahmad Tayeb, Ozren Dabic, Sonia Haiduc, and Gabriele Bavota. “Deep Learning-based Code Reviews: A Paradigm Shift or a Double-Edged Sword?” In: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). Los Alamitos, CA, USA: IEEE Computer Society, May 2025, pp. 597–597. DOI: 10.1109/ICSE55347.2025.00060

[13] Laurie Williams. “Integrating pair programming into a software development process”. In: Proceedings 14th Conference on Software Engineering Education and Training. 'In search of a software engineering profession' (Cat. No.PR01059). IEEE Comput. Soc, pp. 27–36. DOI: 10.1109/CSEE.2001.913816

[14] Ehsan Zabardast, Javier Gonzalez-Huerta, and Binish Tanveer. “Ownership vs Contribution: Investigating the Alignment Between Ownership and Contribution”. In: 2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C). IEEE, Mar. 2022, pp. 30–. DOI: 10.1109/ICSA-C54293.2022.00013
