Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments
Pith reviewed 2026-06-27 21:34 UTC · model grok-4.3
The pith
Developers primarily use LLMs for code implementation in GitHub projects, with later commits showing refactoring and bug fixes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement.
What carries the argument
An open-coded taxonomy of AI-assisted activities, applied to comments by two LLM classifiers whose predictions are aggregated with the Dawid-Skene method, plus analysis of subsequent commit messages and temporal trends.
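To make the aggregation step concrete, here is a minimal sketch of Dawid-Skene expectation-maximization in plain Python. It is not the authors' implementation: the function name `dawid_skene`, the smoothing constant, the iteration count, and the toy labels are all assumptions chosen for illustration. Class indices 0, 1, 2 stand in for three of the taxonomy's categories (implementation, enhancement, debugging).

```python
def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate noisy labels with Dawid-Skene EM.

    labels: one list per item, holding one class index per annotator.
    Returns one posterior distribution over classes per item.
    """
    n_items = len(labels)
    n_annot = len(labels[0])

    # Initialise posteriors from smoothed per-item vote shares.
    post = []
    for votes in labels:
        counts = [smoothing] * n_classes
        for v in votes:
            counts[v] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator,
        # where conf[a][k][v] estimates P(annotator a says v | true class k).
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = [[[smoothing] * n_classes for _ in range(n_classes)]
                for _ in range(n_annot)]
        for p, votes in zip(post, labels):
            for a, v in enumerate(votes):
                for k in range(n_classes):
                    conf[a][k][v] += p[k]
        for a in range(n_annot):
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [x / row_sum for x in conf[a][k]]

        # E-step: posterior over the true class of each item.
        new_post = []
        for votes in labels:
            weights = []
            for k in range(n_classes):
                w = prior[k]
                for a, v in enumerate(votes):
                    w *= conf[a][k][v]
                weights.append(w)
            z = sum(weights)
            new_post.append([w / z for w in weights])
        post = new_post
    return post


# Hypothetical toy data: two classifiers labelling six comments.
toy_labels = [[0, 0], [0, 0], [1, 1], [1, 1], [2, 2], [0, 1]]
posteriors = dawid_skene(toy_labels, n_classes=3)
aggregated = [max(range(3), key=lambda k: p[k]) for p in posteriors]
```

With only two annotators, as in the paper's setup, disagreements such as the last toy item are resolved by the estimated class priors and confusion matrices rather than by any majority, so the result on those items is sensitive to initialisation and smoothing.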
If this is right
- AI serves as an aid embedded in human-driven workflows rather than a replacement.
- Developers continue to invest effort in refining AI-generated code after its introduction.
- AI usage in projects evolves from task-specific generation to broader knowledge support.
- Evidence of sustained oversight suggests AI tools complement rather than automate development entirely.
Where Pith is reading between the lines
- Design of future AI coding tools could emphasize support for editing and conceptual queries.
- Similar patterns may appear in other domains where AI generates initial artifacts that require human adaptation.
- The findings point to a collaborative model that could be tested in controlled studies of developer-AI interaction.
Load-bearing premise
The 35,361 comments that explicitly reference AI are classified accurately by the LLM models and aggregation method, and they represent typical AI usage in GitHub projects without major selection or classification biases.
What would settle it
A substantially different distribution of usage categories, or an absence of human refinement, in a broader sample of GitHub projects whose comments do not explicitly mention AI would falsify the central patterns.
Original abstract
Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes 35,361 GitHub code comments explicitly referencing AI tools (e.g., ChatGPT, Copilot). It derives a taxonomy of AI-assisted activities via open-coding of 500 items, annotates the full set with two LLM classifiers aggregated by Dawid-Skene, examines 12,996 subsequent commit messages, and tracks temporal trends (Dec 2022–Mar 2026). Claims include primary LLM use for code implementation then enhancement/debugging/documentation/testing; frequent post-introduction refactoring, feature extension, and bug fixing; and a shift over time from direct generation toward conceptual support and enhancement.
Significance. If the sampling and annotation hold, the work supplies concrete empirical evidence on how AI tools are actually integrated into live repositories rather than evaluated in isolation. The combination of taxonomy development, multi-classifier aggregation, follow-up commit analysis, and temporal tracking is a strength; it directly supports falsifiable claims about activity distributions and evolution.
major comments (2)
- [Methods] Methods (sampling and data collection): All headline distributions (implementation > enhancement > debugging etc.) and the temporal shift toward conceptual support are derived exclusively from the filtered set of 35,361 comments containing explicit AI mentions. The manuscript does not address whether the probability of adding an explicit reference correlates with activity type; if it does, the reported ordering and evolution trends could be artifacts of the selection filter rather than underlying developer behavior. This selection step is load-bearing for the central claims.
- [Methods] Annotation and validation (open-coding and Dawid-Skene): The abstract and methods description provide limited quantitative detail on inter-annotator agreement for the initial 500-item open-coding, held-out validation accuracy of the LLM classifiers, or potential biases in how comments were selected for annotation. Without these metrics, the reliability of the taxonomy and the aggregated labels that underpin all reported percentages remains difficult to assess.
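The selection concern in the first major comment can be made concrete with a small arithmetic sketch. All numbers below are invented for illustration and are not taken from the paper; `observed_shares` is a hypothetical helper. It shows how the ordering of categories in a filtered sample can differ from the true ordering when the probability of writing an explicit AI reference varies by activity.

```python
def observed_shares(true_shares, mention_prob):
    """Category distribution seen after keeping only explicitly labelled comments.

    true_shares: true share of each activity among all AI-assisted code.
    mention_prob: probability that a developer adds an explicit AI reference.
    """
    kept = {k: true_shares[k] * mention_prob[k] for k in true_shares}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


# Hypothetical values: debugging is the most common activity overall,
# but implementation is the one most likely to be labelled in a comment.
true_shares = {"implementation": 0.30, "debugging": 0.40, "documentation": 0.30}
mention_prob = {"implementation": 0.20, "debugging": 0.05, "documentation": 0.10}

observed = observed_shares(true_shares, mention_prob)
```

Under these assumed values the filtered sample reports roughly 0.545 implementation, 0.182 debugging, and 0.273 documentation, so implementation appears to dominate even though debugging is the largest true category.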
minor comments (1)
- [Abstract] The abstract states the temporal window as "December 2022 to March 2026"; confirm the end date is not a typo and ensure the corresponding figure or table reports the exact monthly counts used for the trend analysis.
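The monthly counts requested above could be tabulated along the following lines. This is a sketch under stated assumptions: `monthly_counts` is a hypothetical helper, and the sample dates are made up. It keeps months with zero comments so that gaps in the window stay visible.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates, start, end):
    """Count items per calendar month from start to end inclusive.

    Months with no items are reported with a count of 0; dates outside
    the window are ignored.
    """
    counts = Counter((d.year, d.month) for d in dates)
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        result.append((f"{year:04d}-{month:02d}", counts.get((year, month), 0)))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return result


# Hypothetical comment dates, for illustration only.
sample_dates = [date(2022, 12, 5), date(2022, 12, 20), date(2024, 7, 1)]
table = monthly_counts(sample_dates, date(2022, 12, 1), date(2026, 3, 31))
```

For the window stated in the abstract, December 2022 to March 2026 inclusive, the table has 40 rows.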
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and indicate where we will revise the manuscript.
Point-by-point responses
- Referee: [Methods] Methods (sampling and data collection): All headline distributions (implementation > enhancement > debugging etc.) and the temporal shift toward conceptual support are derived exclusively from the filtered set of 35,361 comments containing explicit AI mentions. The manuscript does not address whether the probability of adding an explicit reference correlates with activity type; if it does, the reported ordering and evolution trends could be artifacts of the selection filter rather than underlying developer behavior. This selection step is load-bearing for the central claims.
  Authors: We acknowledge the validity of this concern. The study is scoped to comments containing explicit AI references, which may introduce selection bias if referencing likelihood varies by activity. This limits claims to documented cases rather than all AI usage. We will add a dedicated Limitations subsection discussing this potential bias, its implications for the observed distributions and trends, and the scope of generalizability. No new data are needed for this addition.
  Revision: yes
- Referee: [Methods] Annotation and validation (open-coding and Dawid-Skene): The abstract and methods description provide limited quantitative detail on inter-annotator agreement for the initial 500-item open-coding, held-out validation accuracy of the LLM classifiers, or potential biases in how comments were selected for annotation. Without these metrics, the reliability of the taxonomy and the aggregated labels that underpin all reported percentages remains difficult to assess.
  Authors: We agree that additional quantitative details would improve transparency. The open-coding involved multiple authors and the classifiers were validated on held-out data, but these specifics were not fully reported. We will revise the Methods section to include inter-annotator agreement (e.g., Cohen's kappa), classifier validation accuracy, and discussion of annotation sample selection criteria or biases.
  Revision: yes
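For readers unfamiliar with the agreement statistic the authors mention, Cohen's kappa for two annotators can be computed as sketched below. The function name and the toy labels are assumptions for illustration; the paper's actual annotation data are not reproduced here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if each rater labelled independently
    # according to their own marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators on six comments.
a = ["impl", "impl", "debug", "doc", "impl", "debug"]
b = ["impl", "debug", "debug", "doc", "impl", "impl"]
kappa = cohens_kappa(a, b)
```

In this toy case the raters agree on 4 of 6 items (observed agreement 0.667), chance agreement is 14/36 (0.389), and kappa is 5/11, about 0.455.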
Circularity Check
No circularity detected in empirical analysis
full rationale
The paper is a purely empirical study that derives its taxonomy from open-coding a sample of 500 comments, applies LLM classifiers and Dawid-Skene aggregation to the remaining data, and reports frequency counts and temporal trends directly from the resulting annotations. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation chain. All reported distributions and evolution patterns are computed outputs from the filtered corpus rather than reductions of those outputs back to the inputs by construction. The analysis is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected GitHub comments mentioning AI tools are representative of AI-assisted development activities.