Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption
Pith reviewed 2026-06-26 01:18 UTC · model grok-4.3
The pith
AI coding agent adoption leaves the absolute number of human contributors unchanged while reducing their relative density and newcomer share in open-source projects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adoption of AI coding agents produces no statistically significant change in the absolute count of human contributors, yet it lowers human contributor density, reduces the relative share of newcomers by 3.7 percentage points, and raises review depth by 5.3 percent. The effects appear immediately after adoption and persist, varying with project size, language, and maturity. The overall pattern is described as augmentation with dilution rather than displacement.
What carries the argument
Staggered difference-in-differences design with the Sun and Abraham estimator applied to the timing of AI coding agent adoption across repositories.
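To make the design concrete, here is a minimal sketch of the cohort-by-event-time logic behind an interaction-weighted estimate. It is not the authors' code. It assumes a balanced panel, no covariates, never-adopting repositories as the only controls, and the period just before adoption as the reference period. The function names and the toy panel are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cohort_att(panel, cohort, rel_period):
    """DiD of means for one adoption cohort at one event time.

    panel maps unit -> (adoption_period or None, [y_0, y_1, ...]).
    The reference period is cohort - 1; controls are never-adopters.
    """
    base, target = cohort - 1, cohort + rel_period
    treated = [y[target] - y[base] for g, y in panel.values() if g == cohort]
    control = [y[target] - y[base] for g, y in panel.values() if g is None]
    return mean(treated) - mean(control)

def interaction_weighted_att(panel, rel_period):
    """Average cohort estimates at rel_period, weighted by cohort size."""
    n_periods = len(next(iter(panel.values()))[1])
    sizes = defaultdict(int)
    for g, _ in panel.values():
        # Keep only cohorts whose reference and target periods are observed.
        if g is not None and g >= 1 and 0 <= g + rel_period < n_periods:
            sizes[g] += 1
    total = sum(sizes.values())
    return sum(n / total * cohort_att(panel, g, rel_period)
               for g, n in sizes.items())

# Hypothetical toy panel: common trend of +1 per period,
# effect 0.5 for the cohort adopting in period 1, 1.0 for period 2.
panel = {
    "never_a": (None, [1, 2, 3, 4]),
    "never_b": (None, [2, 3, 4, 5]),
    "early": (1, [1, 2.5, 3.5, 4.5]),
    "late": (2, [0, 1, 3, 4]),
}

print(interaction_weighted_att(panel, 0))   # effect in the adoption period
print(interaction_weighted_att(panel, -2))  # pre-adoption placebo
```

Negative values of `rel_period` give the pre-adoption coefficients that an event-study plot would display. In this toy panel they are zero by construction.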
If this is right
- Absolute human contributor counts remain stable after AI adoption.
- Human contributor density declines as AI-generated contributions accumulate.
- The relative participation share of newcomers falls immediately and stays lower.
- Review depth increases as human effort shifts from code production to evaluation.
- The size of these changes differs across project size, programming language, and maturity levels.
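The first three implications are arithmetically compatible. The toy calculation below uses invented numbers and assumes that density is measured as human contributors per pull request and that newcomer share is the newcomer fraction of all pull requests. Both definitions are assumptions made for illustration, since this summary does not state the paper's exact operationalization.

```python
def human_density(human_contributors, total_prs):
    """Assumed definition: human contributors per pull request."""
    return human_contributors / total_prs

def newcomer_share(newcomer_prs, total_prs):
    """Assumed definition: fraction of pull requests authored by newcomers."""
    return newcomer_prs / total_prs

def summarize(month):
    """Compute count, density, and newcomer share for one repository-month."""
    total_prs = month["human_prs"] + month["agent_prs"]
    return {
        "humans": month["humans"],
        "density": human_density(month["humans"], total_prs),
        "newcomer_share": newcomer_share(month["newcomer_prs"], total_prs),
    }

# Hypothetical month before adoption: every pull request is human-authored.
before = {"humans": 20, "human_prs": 100, "agent_prs": 0, "newcomer_prs": 10}
# Hypothetical month after adoption: same humans, plus agent-authored PRs.
after = {"humans": 20, "human_prs": 100, "agent_prs": 25, "newcomer_prs": 10}

print(summarize(before))
print(summarize(after))
```

The human headcount is identical in both months, yet both ratios fall because only the denominator grows.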
Where Pith is reading between the lines
- Projects may need deliberate mechanisms to preserve newcomer entry points if the observed dilution continues.
- Increased review burden could raise the value of experienced human reviewers and change contribution norms.
- Longer-term ecosystem health may depend on whether diluted human participation still supplies enough novel ideas and maintenance effort.
- The pattern suggests AI tools redistribute rather than eliminate human roles, which could be tested by tracking contributor retention rates over longer windows.
Load-bearing premise
The timing of AI agent adoption across repositories is unrelated to any factors that would also drive changes in contributor numbers, density, or newcomer shares.
What would settle it
A dataset showing that repositories adopting AI agents already exhibited different pre-adoption trends in contributor density or newcomer share compared with non-adopters would undermine the causal interpretation.
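One simple way to look for such a difference is to compare average pre-adoption slopes between eventual adopters and never-adopters. The sketch below assumes a balanced panel indexed from period 0 and uses hypothetical names and invented density series. It is a descriptive check, not a formal test.

```python
from statistics import mean

def pre_slope(series, last_pre):
    """Average per-period change from period 0 to period last_pre."""
    return (series[last_pre] - series[0]) / last_pre

def pretrend_gap(adopters, non_adopters):
    """Mean difference in pre-adoption slope, adopters minus non-adopters.

    adopters: list of (adoption_period, series); non_adopters: list of series.
    Adopters with fewer than two pre-adoption periods are skipped.
    """
    gaps = []
    for adoption_period, series in adopters:
        last_pre = adoption_period - 1
        if last_pre < 1:
            continue
        control = mean([pre_slope(c, last_pre) for c in non_adopters])
        gaps.append(pre_slope(series, last_pre) - control)
    return mean(gaps)

# Hypothetical case 1: adopter density was already falling before adoption.
diverging = pretrend_gap(
    adopters=[(3, [0.9, 0.8, 0.7, 0.6])],
    non_adopters=[[0.9, 0.9, 0.9, 0.9], [0.8, 0.8, 0.8, 0.8]],
)
# Hypothetical case 2: different levels but the same pre-adoption slope.
parallel = pretrend_gap(
    adopters=[(3, [0.5, 0.4, 0.3, 0.2])],
    non_adopters=[[0.9, 0.8, 0.7, 0.6]],
)
print(diverging, parallel)
```

A gap near zero is consistent with parallel pre-trends. A clearly non-zero gap is the kind of evidence that would weaken the causal reading.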
Original abstract
AI coding agents are penetrating open-source software development at an unprecedented pace, yet existing research predominantly treats human contributors as a static backdrop rather than as the subject of inquiry. This paper presents the first large-scale empirical study that takes the human contributor ecosystem as its dependent variable, examining how the number, composition, and behavior of human participants change following AI coding agent adoption in open-source projects. Using a staggered difference-in-differences design on a dataset of 11,097 GitHub repositories spanning January 2023 to May 2026, we provide causal evidence via the Sun and Abraham estimator. Our results show that AI agent adoption does not significantly change the absolute number of human contributors (ATT = 0.014, p = 0.224), but significantly reduces human contributor density (ATT = -0.019, p = 0.002), indicating that the relative share of human participation declines as AI-generated pull requests accumulate. The relative participation share of newcomers declines significantly by 3.7 percentage points (ATT = -0.037, p < 0.001), with the effect emerging immediately after adoption and remaining stable throughout the observation window. Review depth increases significantly by 5.3% (ATT = +0.0168, p < 0.001), indicating that AI agents shift burden from the code production stage to the review stage. Moderator analysis reveals that these effects vary systematically with project size, programming language, and project maturity. Together, these findings present a pattern of augmentation with dilution: AI agents are not displacing human contributors, but are systematically reshaping the participation structure of open-source ecosystems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper conducts the first large-scale causal study of AI coding agent adoption's impact on human contributors in open-source GitHub repositories. Using a staggered DiD design with the Sun and Abraham estimator on 11,097 repositories from January 2023 to May 2026, it finds no significant effect on the absolute number of human contributors (ATT = 0.014, p = 0.224) but a reduction in contributor density (ATT = -0.019, p = 0.002), a 3.7 percentage point drop in newcomer share (ATT = -0.037, p < 0.001), and a 5.3% increase in review depth (ATT = +0.0168, p < 0.001), concluding that AI leads to 'augmentation with dilution' of human participation, with heterogeneity by project size, language, and maturity.
Significance. If the causal identification holds, this provides novel evidence on how AI agents reshape rather than displace human participation structures in OSS ecosystems, with implications for project sustainability, governance, and the division of labor between code production and review. The large sample size and application of the Sun and Abraham estimator tailored to staggered adoption are strengths that could advance empirical software engineering research on tool adoption effects.
major comments (2)
- [Empirical Strategy] Empirical Strategy section: The central causal claims rest on the Sun and Abraham estimator recovering unbiased ATT effects under parallel trends and no anticipation, yet the manuscript reports no pre-trend coefficients, event-study plots, or robustness checks against time-varying confounders or anticipation. This directly affects the credibility of the key estimates on contributor density, newcomer share, and review depth.
- [Data and Sample] Data and Sample section: AI coding agent adoption timing is assumed exogenous conditional on fixed effects, but no tests, discussions, or sensitivity analyses address potential endogeneity (e.g., adoption driven by declining human participation or project maturity), which could bias the reported effects on the outcome variables.
minor comments (1)
- [Abstract] The abstract introduces the interpretive phrase 'augmentation with dilution' without a concise definition; a brief operationalization in the introduction would improve clarity for readers.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment below, agreeing where revisions are warranted to strengthen the causal claims and identification discussion. We propose targeted additions to the Empirical Strategy and Data sections.
- Referee: [Empirical Strategy] Empirical Strategy section: The central causal claims rest on the Sun and Abraham estimator recovering unbiased ATT effects under parallel trends and no anticipation, yet the manuscript reports no pre-trend coefficients, event-study plots, or robustness checks against time-varying confounders or anticipation. This directly affects the credibility of the key estimates on contributor density, newcomer share, and review depth.
  Authors: We agree that explicit validation of the identifying assumptions is necessary. In the revised manuscript we will add event-study plots and pre-treatment coefficients using the Sun and Abraham estimator to document parallel trends, along with robustness checks that shift adoption dates to test for anticipation. These will be placed in a new subsection of the Empirical Strategy section and referenced in the Results.
  revision: yes
- Referee: [Data and Sample] Data and Sample section: AI coding agent adoption timing is assumed exogenous conditional on fixed effects, but no tests, discussions, or sensitivity analyses address potential endogeneity (e.g., adoption driven by declining human participation or project maturity), which could bias the reported effects on the outcome variables.
  Authors: We acknowledge the need for greater transparency on this point. The revised version will include an expanded discussion of the exogeneity assumption conditional on fixed effects and add sensitivity analyses examining whether pre-adoption trends in contributor outcomes predict adoption timing. We will also report results from alternative specifications that control for project maturity proxies. Full resolution of all endogeneity channels may be limited by available covariates, which we will note.
  revision: partial
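The sensitivity analysis described in the second response could start from a plain correlation between each adopter's early slope and its adoption period. The sketch below is hypothetical, is not taken from the paper, and uses invented series. It assumes a balanced panel and measures every slope over the same early window, so that only periods before adoption are used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def early_slope(series, window):
    """Average per-period change over periods 0..window."""
    return (series[window] - series[0]) / window

def slope_timing_correlation(adopters, window=2):
    """Correlate early outcome slopes with adoption periods.

    adopters: list of (adoption_period, series). Units adopting inside
    the window are dropped so the slope is purely pre-adoption.
    """
    usable = [(g, y) for g, y in adopters if g > window]
    slopes = [early_slope(y, window) for _, y in usable]
    timing = [g for g, _ in usable]
    return pearson(slopes, timing)

# Hypothetical contributor counts: steeper early declines adopt sooner.
adopters = [
    (3, [10, 8, 6, 6, 6, 6]),
    (4, [10, 9, 8, 8, 8, 8]),
    (5, [10, 10, 10, 10, 10, 10]),
]
r = slope_timing_correlation(adopters)
print(r)
```

A correlation far from zero would suggest that adoption timing responds to prior contributor trends, which is the endogeneity channel the referee raises.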
Circularity Check
No circularity: the results are data-driven ATT estimates obtained from external GitHub repositories with the standard Sun and Abraham estimator.
The paper reports causal estimates (ATT on contributor count, density, newcomer share, review depth) obtained by applying the Sun and Abraham (2021) staggered DiD estimator to an external dataset of 11,097 GitHub repositories. No equations, parameters, or predictions are defined in terms of the target outcomes; the estimator is an off-the-shelf method whose validity rests on external identifying assumptions (parallel trends, conditional exogeneity of adoption timing) rather than any self-referential construction. No self-citations are load-bearing, no fitted inputs are relabeled as predictions, and no ansatz or uniqueness claim reduces the results to the inputs by definition. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The parallel trends assumption holds for the staggered adoption of AI coding agents across the selected repositories.
Reference graph
Works this paper leans on
- [1] R. Robbes, T. Matricon, T. Degueule, A. Hora, and S. Zacchiroli, “Agentic much? adoption of coding agents on github,” arXiv preprint arXiv:2601.18341, 2026
- [2] I. Steinmacher, M. A. G. Silva, M. A. Gerosa, and D. F. Redmiles, “A systematic literature review on the barriers faced by newcomers to open source software projects,” Information and Software Technology, vol. 59, pp. 67–85, 2015
- [3] C. Miller, D. G. Widder, C. Kästner, and B. Vasilescu, “Why do people give up flossing? a study of contributor disengagement in open source,” in IFIP International Conference on Open Source Systems. Springer, 2019, pp. 116–129
- [4] B. Vasilescu, D. Posnett, B. Ray, M. G. van den Brand, A. Serebrenik, P. Devanbu, and V. Filkov, “Gender and tenure diversity in github teams,” in Proceedings of the 33rd annual ACM conference on human factors in computing systems, 2015, pp. 3789–3798
- [5] D. Ogenrwot and J. Businge, “How ai coding agents modify code: A large-scale study of github pull requests,” arXiv preprint arXiv:2601.17581, 2026
- [6] M. Rahman and E. Shihab, “Will it survive? deciphering the fate of ai-generated code in open source,” arXiv preprint arXiv:2601.16809, 2026
- [7] Y. Liu, R. Widyasari, Y. Zhao, I. C. Irsan, J. Chen, and D. Lo, “Debt behind the ai boom: A large-scale empirical study of ai-generated code in the wild,” arXiv preprint arXiv:2603.28592, 2026
- [8] H. Gao, P. Banyongrakkul, H. Guan, M. Zahedi, and C. Treude, “On autopilot? an empirical study of human-ai teaming and review practices in open source,” arXiv preprint arXiv:2601.13754, 2026
- [9] S. Agarwal, H. He, and B. Vasilescu, “AI IDEs or autonomous agents? Measuring the impact of coding agents on software development,” in Proceedings of the 23rd International Conference on Mining Software Repositories (MSR), 2026
- [10] J. Becker, N. Rush, T. Cunningham, D. Rein, and K. Mahamud, “We are changing our developer productivity experiment design,” https://metr.org/blog/2026-02-24-uplift-update/, 02 2026
- [11] G. Gousios, A. Zaidman, M.-A. Storey, and A. Van Deursen, “Work practices and challenges in pull-based development: The integrator’s perspective,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 1. IEEE, 2015, pp. 358–368
- [12] B. Callaway and P. H. Sant’Anna, “Difference-in-differences with multiple time periods,” Journal of econometrics, vol. 225, no. 2, pp. 200–230, 2021
- [13] C. A. Furia, R. Torkar, and R. Feldt, “Towards causal analysis of empirical software engineering data: The impact of programming languages on coding competitions,” ACM Transactions on Software Engineering and Methodology, vol. 33, no. 1, pp. 1–35, 2023
- [14] A. Author, “Replication package for ‘Augmentation with Dilution: A Large-Scale Empirical Study of Human Contributor Ecosystems After AI Coding Agent Adoption’,” 2026. [Online]. Available: https://osf.io/phzk7/overview?view_only=73979e746c5f4a0aa59317b5457204ff
- [15] R. Ehsani, S. Pathak, S. Rawal, A. A. Mujahid, M. M. Imran, and P. Chatterjee, “Where do ai coding agents fail? an empirical study of failed agentic pull requests in github,” arXiv preprint arXiv:2601.15195, 2026
- [16] M. Watanabe, H. Li, Y. Kashiwa, B. Reid, H. Iida, and A. E. Hassan, “On the use of agentic coding: An empirical study of pull requests on github,” ACM Transactions on Software Engineering and Methodology, 2025
- [17] M. Monperrus, “The end of code review: Coding agents supersede human inspection,” arXiv preprint arXiv:2606.13175, 2026
- [18] M. Valiev, B. Vasilescu, and J. Herbsleb, “Ecosystem-level determinants of sustained activity in open-source projects: A case study of the pypi ecosystem,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018, pp. 644–655
- [19] H. Li, H. Zhang, and A. E. Hassan, “The rise of AI teammates in software engineering (SE) 3.0: How autonomous coding agents are reshaping software engineering,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2507.15003
- [20] H. He, C. Miller, S. Agarwal, C. Kästner, and B. Vasilescu, “Speed at the cost of quality: How Cursor AI increases short-term velocity and long-term complexity in open-source projects,” in Proceedings of the 23rd International Conference on Mining Software Repositories (MSR), 2026
- [21] P. R. Rosenbaum and D. B. Rubin, “The central role of the propensity score in observational studies for causal effects,” Biometrika, vol. 70, no. 1, pp. 41–55, 1983
- [22] P. C. Austin, “An introduction to propensity score methods for reducing the effects of confounding in observational studies,” Multivariate behavioral research, vol. 46, no. 3, pp. 399–424, 2011
- [23] J. M. Wooldridge, “Two-way fixed effects, the two-way mundlak regression, and difference-in-differences estimators,” Empirical Economics, vol. 69, no. 5, pp. 2545–2587, 2025
- [24] L. Sun and S. Abraham, “Estimating dynamic treatment effects in event studies with heterogeneous treatment effects,” Journal of econometrics, vol. 225, no. 2, pp. 175–199, 2021
- [25] D. S. D. Minh, H. T. Kiet, N. L. P. Quy, P. P. Hoa, T. C. Nguyen, N. D. H. Duong, and T. B. Tran, “Early-stage prediction of review effort in ai-generated pull requests,” arXiv preprint arXiv:2601.00753, 2026
- [26] J. Tsay, L. Dabbish, and J. Herbsleb, “Influence of social and technical factors for evaluating contribution in github,” in Proceedings of the 36th international conference on Software engineering, 2014, pp. 356–366
- [27] K. Chowdhury, D. Banik, K. Ferdous, and S. I. Shamim, “From industry claims to empirical reality: An empirical study of code review agents in pull requests,” arXiv preprint arXiv:2604.03196, 2026
- [28] G. Gousios, M. Pinzger, and A. v. Deursen, “An exploratory study of the pull-based software development model,” in Proceedings of the 36th international conference on software engineering, 2014, pp. 345–355