Adoption and Impact of Command-Line AI Coding Agents: A Study of Microsoft's Early 2026 Rollout of Claude Code and GitHub Copilot CLI
Pith reviewed 2026-07-03 19:12 UTC · model grok-4.3
The pith
Microsoft engineers who adopted command-line AI coding agents merged 24% more pull requests than similar non-adopters, with the gain holding over four months.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In a study of tens of thousands of Microsoft engineers during the early 2026 rollout of Claude Code and GitHub Copilot CLI, first use diffused primarily through social networks, retention correlated with prior coding activity, and adopters merged roughly 24% more pull requests than they otherwise would have, with the effect persisting over four months when using merged pull requests as the output proxy.
What carries the argument
Comparison of merged pull request counts between adopters and non-adopters after matching or regression controls to isolate the contribution of tool use.
Load-bearing premise
Differences in merged pull request counts between adopters and non-adopters can be attributed to tool use rather than unobserved differences in engineer behavior or project characteristics.
What would settle it
A before-and-after comparison of the same engineers or a randomized rollout that shows no difference in merged pull request volume would indicate the reported lift is not caused by the agents.
Figures
read the original abstract
Organizations rolling out agentic command line tools like Anthropic's Claude Code and GitHub's Copilot CLI need to know who will try them, who will keep using them, and whether the tools produce enough output to justify their cost. At organizational scale, token spend can run into millions of dollars annually, so misreading adoption, retention, or impact can make a rollout expensive without changing engineering velocity. Studying tens of thousands of engineers at Microsoft over its early-2026 rollout, we find that first use spread primarily through social networks, retention was associated more with engineers' coding activity than with demographics, and adopters merged roughly 24% more pull requests than they would have otherwise. We use merged pull requests as our proxy for output -- acknowledging that a merged PR is not the same as the value it delivers -- and the lift persists across our four-month window. These results suggest that CLI coding agents are neither uniformly adopted nor mere novelty effects and that organizations should treat visible peer use as central to rollout strategy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the early-2026 rollout of command-line AI coding agents (Claude Code and GitHub Copilot CLI) at Microsoft using data on tens of thousands of engineers. It reports that adoption spreads primarily via social networks, retention correlates more with coding activity than demographics, and adopters merged roughly 24% more pull requests than they would have otherwise, with the effect persisting over a four-month window. Merged PR count is used as an output proxy while acknowledging its limitations.
Significance. If the 24% causal lift in merged PRs holds after proper identification, the results would inform organizational strategies for scaling agentic coding tools by highlighting peer-driven adoption and the role of baseline coding activity in retention. The large internal sample and explicit proxy caveat are strengths for an empirical software engineering study.
major comments (2)
- [Abstract] Abstract: the headline causal claim that adopters merged 24% more PRs 'than they would have otherwise' is presented without any description of the sample construction, matching procedure, regression specification, or robustness checks. This identification strategy is load-bearing for the central impact result and cannot be evaluated from the provided text.
- [Abstract and Results] The manuscript notes that adoption spread via social networks and retention tied to coding activity, yet supplies no detail on how these or other observables (e.g., pre-adoption trends, project characteristics) enter the matching or regression controls used to isolate the treatment effect.
minor comments (1)
- [Abstract] The abstract could more explicitly quantify the sample size and time window in the opening sentence for immediate context.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for greater transparency in the abstract regarding our identification strategy. We agree that the central causal claim requires sufficient detail for evaluation and will revise the abstract and results sections accordingly. Point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] Abstract: the headline causal claim that adopters merged 24% more PRs 'than they would have otherwise' is presented without any description of the sample construction, matching procedure, regression specification, or robustness checks. This identification strategy is load-bearing for the central impact result and cannot be evaluated from the provided text.
Authors: We agree the abstract omits these details. The full manuscript uses propensity-score matching on pre-adoption merged PRs, coding activity, tenure, team size, and project characteristics, followed by a difference-in-differences regression with engineer and time fixed effects plus robustness checks (alternative calipers, placebo tests on non-adopters). We will revise the abstract to concisely summarize the sample (tens of thousands of engineers), matching procedure, regression specification, and key robustness results. revision: yes
-
Referee: [Abstract and Results] The manuscript notes that adoption spread via social networks and retention tied to coding activity, yet supplies no detail on how these or other observables (e.g., pre-adoption trends, project characteristics) enter the matching or regression controls used to isolate the treatment effect.
Authors: Social-network diffusion and activity-based retention are analyzed descriptively via network graphs and logistic regressions on usage frequency. For the impact estimates, pre-adoption trends, project characteristics, and the listed observables are used both as matching covariates and as controls in the regression. We will add explicit language in the abstract and results clarifying their role in the identification strategy. revision: yes
Circularity Check
No circularity: observational empirical study with no derivation chain
full rationale
The paper is a purely observational empirical analysis of adoption and impact using merged PR counts as a proxy. The 24% lift is presented as an estimated difference between adopters and non-adopters after controls, not as a quantity derived from or identical to any fitted parameter or self-citation. No equations, ansatzes, uniqueness theorems, or self-referential predictions appear in the provided text. The identification strategy (matching or regression) is an external methodological choice whose validity is debatable on causal grounds but does not constitute circularity by construction. The result is therefore self-contained as a data-driven estimate rather than a definitional identity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Merged pull requests constitute a reasonable proxy for engineering output and impact
Reference graph
Works this paper leans on
-
[1]
The Fast and Spurious: Developer Productivity with GenAI
Sadia Afroz, Zixuan Feng, Tyler Menezes, Katie Kimura, Bianca Trinkenreich, Igor Stein- macher, and Anita Sarma. The fast and spurious: Developer productivity with genai, 2026. URLhttps://arxiv.org/abs/2510.24265
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[2]
AI IDEs or autonomous agents? measuring the impact of coding agents on software development
Shyam Agarwal, Hao He, and Bogdan Vasilescu. AI IDEs or autonomous agents? measuring the impact of coding agents on software development. InProceedings of the 23rd International Conference on Mining Software Repositories (MSR), Mining Challenge Track, 2026
2026
-
[3]
Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks.Proceedings of the National Academy of Sciences, 106(51):21544–21549, 2009
Sinan Aral, Lev Muchnik, and Arun Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks.Proceedings of the National Academy of Sciences, 106(51):21544–21549, 2009
2009
-
[4]
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity, July 2025
Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. Measuring the impact of early- 2025 ai on experienced open-source developer productivity.arXiv preprint arXiv:2507.09089, 2025. 19
-
[5]
Understanding information systems continuance: An expectation- confirmation model.MIS quarterly, 25(3):351–370, 2001
Anol Bhattacherjee. Understanding information systems continuance: An expectation- confirmation model.MIS quarterly, 25(3):351–370, 2001
2001
-
[6]
Developers’ experience with generative ai–first insights from an empirical mixed-methods field study
Charlotte Brandebusemeyer, Tobias Schimmer, and Bert Arnrich. Developers’ experience with generative ai–first insights from an empirical mixed-methods field study. InProceedings of the International Conference on Software Engineering (ICSE), Software Engineering in Practice Track (SEIP), 2026
2026
-
[7]
Inferring causal impact using Bayesian structural time-series models.Annals of Applied Statistics, 9: 247–274, 2015
Kay H Brodersen, Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L Scott. Inferring causal impact using Bayesian structural time-series models.Annals of Applied Statistics, 9: 247–274, 2015
2015
-
[8]
Dear diary: A randomized controlled trial of generative ai coding tools in the workplace
Jenna Butler, Jina Suh, Sankeerti Haniyur, and Constance Hadley. Dear diary: A randomized controlled trial of generative ai coding tools in the workplace. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE- SEIP), pages 319–329, 2025. doi: 10.1109/ICSE-SEIP66354.2025.00034
-
[9]
The Productivity Effects of Generative AI: Evidence from a Field Experiment with GitHub Copilot.An MIT Exploration of Generative AI, mar 27 2024
Kevin Zheyuan Cui, Mert Demirer, Sonia Jaffe, Leon Musolff, Sida Peng, and Tobias Salz. The Productivity Effects of Generative AI: Evidence from a Field Experiment with GitHub Copilot.An MIT Exploration of Generative AI, mar 27 2024. https://mit- genai.pubpub.org/pub/v5iixksv
2024
-
[10]
Zheyuan Cui, Mert Demirer, Sonia Jaffe, Leon Musolff, Sida Peng, and Tobias Salz. The effects of generative AI on high-skilled work: Evidence from three field experiments with software developers.Management Science, 2025. doi: 10.2139/ssrn.4945566. Forthcoming
-
[11]
Who is using ai to code? global diffusion and impact of generative ai.Science, page eadz9311, 2026
Simone Daniotti, Johannes Wachs, Xiangnan Feng, and Frank Neffke. Who is using ai to code? global diffusion and impact of generative ai.Science, page eadz9311, 2026
2026
-
[12]
Perceived usefulness, perceived ease of use, and user acceptance of information technology.MIS quarterly, 13(3):319–340, 1989
Fred D Davis. Perceived usefulness, perceived ease of use, and user acceptance of information technology.MIS quarterly, 13(3):319–340, 1989
1989
-
[13]
Writing code vs
Mert Demirer, Leon Musolff, and Liyuan Yang. Writing code vs. shipping code: Productiv- ity effects across generations of ai coding tools. Working Paper 35275, National Bureau of Economic Research, May 2026. URLhttp://www.nber.org/papers/w35275
2026
-
[14]
A Meta employee created a dashboard so coworkers can compete to be the company’s no
Fortune. A Meta employee created a dashboard so coworkers can compete to be the company’s no. 1 AI token user – and Zuckerberg doesn’t even rank in the top 250.https://fortune. com/2026/04/09/meta-killed-employee-ai-token-dashboard/, April 2026. Accessed 2026-05-20
2026
-
[15]
Vincent Gurgul, Robin Gubela, and Stefan Lessmann. The state of generative ai in software de- velopment: Insights from literature and a developer survey.arXiv preprint arXiv:2603.16975, 2026
-
[16]
Speed at the cost of quality: How Cursor AI increases short-term velocity and long-term complexity in open-source projects
Hao He, Courtney Miller, Shyam Agarwal, Christian K¨ astner, and Bogdan Vasilescu. Speed at the cost of quality: How Cursor AI increases short-term velocity and long-term complexity in open-source projects. InProceedings of the 23rd International Conference on Mining Software Repositories (MSR), 2026
2026
-
[17]
GitHub Copilot and developer produc- tivity: An observational dose-response analysis, 2026
Alex Heilman, Alex Kyllo, and Emerson Murphy-Hill. GitHub Copilot and developer produc- tivity: An observational dose-response analysis, 2026. 20
2026
-
[18]
The heterogeneous productivity effects of generative ai
David Kreitmeir and Paul A Raschky. The heterogeneous productivity effects of generative ai. arXiv preprint arXiv:2403.01964, 2024
-
[19]
Why ai agents still need you: Findings from developer-agent collaborations in the wild
Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, and Emerson Murphy-Hill. Why ai agents still need you: Findings from developer-agent collaborations in the wild. In 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), page 432–444. IEEE Press, 2025. doi: 10.1109/ASE63991.2025.00043. URLhttps://doi.or g/10.1109/...
-
[20]
Anand Kumar, Vishal Khare, Deepak Sharma, Satyam Kumar, Vijay Saini, Anshul Yadav, Sachendra Jain, Ankit Rana, Pratham Verma, Vaibhav Meena, et al. Intuition to evidence: Measuring ai’s true impact on developer productivity.arXiv preprint arXiv:2509.19708, 2025
-
[21]
Amr Mohamed, Maram Assi, and Mariam Guizani. The impact of llm-assistants on software developer productivity: A systematic review and mapping study.ACM Trans. Softw. Eng. Methodol., April 2026. ISSN 1049-331X. doi: 10.1145/3809494. URLhttps://doi.org/10.1 145/3809494. Just Accepted
-
[22]
Reading between the lines: Modeling user behavior and costs in ai-assisted programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, and Eric Horvitz. Reading between the lines: Modeling user behavior and costs in ai-assisted programming. InProceedings of the 2024 CHI conference on human factors in computing systems, pages 1–16, 2024
2024
-
[23]
Peer interaction effectively, yet infrequently, enables programmers to discover new tools
Emerson Murphy-Hill and Gail C Murphy. Peer interaction effectively, yet infrequently, enables programmers to discover new tools. InProceedings of the ACM 2011 conference on Computer supported cooperative work, pages 405–414, 2011
2011
-
[24]
How do users discover new tools in software development and beyond?Computer Supported Cooperative Work (CSCW), 24(5):389–422, 2015
Emerson Murphy-Hill, Da Young Lee, Gail C Murphy, and Joanna McGrenere. How do users discover new tools in software development and beyond?Computer Supported Cooperative Work (CSCW), 24(5):389–422, 2015
2015
-
[25]
A survey of generative AI adoption and perceived productivity among scientists who program
Gabrielle O’Brien, Alexis Parker, Nasir Eisty, and Jeffrey Carver. More code, less vali- dation: Risk factors for over-reliance on ai coding tools among scientists.arXiv preprint arXiv:2512.19644, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[26]
AI Tooling for Software Engineers in 2026.https://newsletter.pragmatic engineer.com/p/ai-tooling-2026, 2026
Gergely Orosz. AI Tooling for Software Engineers in 2026.https://newsletter.pragmatic engineer.com/p/ai-tooling-2026, 2026. Accessed 2026-05-20
2026
-
[27]
How much does AI impact development speed? an enterprise-based randomized controlled trial
Elise Paradis, Kate Grey, Quinn Madison, Daye Nam, Andrew Macvean, Vahid Meimand, Nan Zhang, Ben Ferrari-Church, and Satish Chandra. How much does AI impact development speed? an enterprise-based randomized controlled trial. In2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pages 618–629. ...
2025
-
[28]
The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. The impact of AI on devel- oper productivity: Evidence from GitHub Copilot.arXiv preprint arXiv:2302.06590, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[29]
Coding Beyond Your Training: Claude Code and the Technological Frontier of Software Developers
Alexander Quispe. Coding beyond your training: Claude code and the technological frontier of software developers.arXiv preprint arXiv:2605.25438, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[30]
Adop- tion of ai tools in software development: a systematic literature review.Science of Computer Programming, 254:103521, 2026
Dar´ ıo Reyes-Reina, Jenny Marcela Sanch´ ez-Torres, and Iv´ an Mauricio Rueda-C´ aceres. Adop- tion of ai tools in software development: a systematic literature review.Science of Computer Programming, 254:103521, 2026. 21
2026
-
[31]
Agentic Much? Adoption of Coding Agents on GitHub
Romain Robbes, Th´ eo Matricon, Thomas Degueule, Andre Hora, and Stefano Zacchiroli. Agen- tic much? adoption of coding agents on github.arXiv preprint arXiv:2601.18341, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[32]
Rogers.Diffusion of Innovations, 5th Edition
Everett M. Rogers.Diffusion of Innovations, 5th Edition. Simon and Schuster, 2003. ISBN 9780743258234
2003
-
[33]
The effects of github copilot on computing students’ program- ming effectiveness, efficiency, and processes in brownfield coding tasks
Md Istiak Hossain Shihab, Christopher Hundhausen, Ahsun Tariq, Summit Haque, Yunhan Qiao, and Brian Wise Mulanda. The effects of github copilot on computing students’ program- ming effectiveness, efficiency, and processes in brownfield coding tasks. InProceedings of the 2025 ACM Conference on International Computing Education Research V. 1, pages 407–420, 2025
2025
-
[34]
Fangchen Song, Ashish Agarwal, and Wen Wen. The impact of generative ai on collab- orative open-source software development: Evidence from github copilot.arXiv preprint arXiv:2410.02091, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[35]
Stack Overflow. Developers remain willing but reluctant to use AI: The 2025 developer survey results are here.https://stackoverflow.blog/2025/12/29/developers-remain-wil ling-but-reluctant-to-use-ai-the-2025-developer-survey-results-are-here/, December 2025. Accessed 2026-05-20
2025
-
[36]
Developer productivity with and without github copilot: A longitudinal mixed-methods case study
Viktoria Stray, Elias Goldmann Brandtzæg, Viggo Wivestad, Astri Barbala, and Nils Brede Moe. Developer productivity with and without github copilot: A longitudinal mixed-methods case study. InProceedings of the 59th Hawaii International Conference on System Sciences, 2026
2026
-
[37]
Rafael Tomaz, Paloma Guenes, Allysson Allex Ara˜A¯ ejo, Maria Teresa Baldassarre, and Marcos Kalinowski. Impacts of generative ai on agile teams’ productivity: A multi-case longitudinal study.arXiv preprint arXiv:2602.13766, 2026
-
[38]
Expectation vs
Priyan Vaithilingam, Tianyi Zhang, and Elena L Glassman. Expectation vs. experience: Eval- uating the usability of code generation tools powered by large language models. InCHI conference on human factors in computing systems extended abstracts, pages 1–7, 2022
2022
-
[39]
A theoretical extension of the technology acceptance model: Four longitudinal field studies.Management science, 46(2):186–204, 2000
Viswanath Venkatesh and Fred D Davis. A theoretical extension of the technology acceptance model: Four longitudinal field studies.Management science, 46(2):186–204, 2000
2000
-
[40]
User acceptance of information technology: Toward a unified view1.MIS quarterly, 27(3):425–478, 2003
Viswanath Venkatesh, Michael G Morris, Gordon B Davis, and Fred D Davis. User acceptance of information technology: Toward a unified view1.MIS quarterly, 27(3):425–478, 2003
2003
-
[41]
Significant productivity gains through programming with large language models.Proceedings of the ACM on Human-Computer Interaction, 8(EICS):1–29, 2024
Thomas Weber, Maximilian Brandmaier, Albrecht Schmidt, and Sven Mayer. Significant productivity gains through programming with large language models.Proceedings of the ACM on Human-Computer Interaction, 8(EICS):1–29, 2024
2024
-
[42]
Social influences on secure develop- ment tool adoption: why security tools spread
Shundan Xiao, Jim Witschey, and Emerson Murphy-Hill. Social influences on secure develop- ment tool adoption: why security tools spread. InProceedings of the 17th ACM conference on Computer supported cooperative work & social computing, pages 1095–1106, 2014
2014
-
[43]
Claude code scientists: Measuring ai adoption and productivity among scien- tists.Available at SSRN 6803624, 2026
Charles Yang. Claude code scientists: Measuring ai adoption and productivity among scien- tists.Available at SSRN 6803624, 2026. 22
2026
-
[44]
The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot
Doron Yeverechyahu, Raveesh Mayya, and Gal Oestreicher-Singer. The impact of large language models on open-source innovation: Evidence from github copilot.arXiv preprint arXiv:2409.08379, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[45]
Productivity assessment of neural code completion
Albert Ziegler, Eirini Kalliamvakou, X Alice Li, Andrew Rice, Devon Rifkin, Shawn Simis- ter, Ganesh Sittampalam, and Edward Aftandilian. Productivity assessment of neural code completion. InProceedings of the 6th ACM SIGPLAN international symposium on machine programming, pages 21–29, 2022. 23
2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.