Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

Abdullah Al Mujahid; Mia Mohammad Imran; Preetha Chatterjee

arxiv: 2606.06843 · v1 · pith:5VVCRWL3new · submitted 2026-06-05 · 💻 cs.SE

Pith reviewed 2026-06-27 21:34 UTC · model grok-4.3

keywords: AI usage in software development · GitHub code comments · LLM tools · empirical analysis · code evolution · temporal trends in AI adoption · developer workflows

The pith

Developers primarily use LLMs for code implementation in GitHub projects, with later commits showing refactoring and bug fixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies 35,361 GitHub code comments that mention AI tools to understand real-world usage patterns. It derives a taxonomy of activities and finds code implementation as the main purpose, followed by enhancement and debugging. Analysis of follow-up commits reveals consistent human involvement in adapting the code, while trends over time show a shift from generation to conceptual support.

Core claim

Developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement.

What carries the argument

An open-coded taxonomy of AI-assisted activities, applied to the full comment set by two LLM classifiers whose labels are aggregated with the Dawid-Skene method, plus an analysis of subsequent commit messages and temporal trends.
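The aggregation step can be illustrated with a minimal Dawid-Skene expectation-maximization sketch. This is not the authors' implementation, which this page does not reproduce. It assumes hard labels encoded as class indices, that every annotator labels every item, and a small additive smoothing constant (`smoothing`, an assumption made here) to keep confusion-matrix rows non-zero. The toy labels are invented for illustration.

```python
from collections import Counter


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate hard labels with Dawid-Skene EM.

    labels: list of items; each item is a list with one class index per annotator.
    Returns one posterior distribution over classes per item.
    """
    n_items = len(labels)
    n_annot = len(labels[0])

    # Initialise posteriors from per-item vote fractions (soft majority vote).
    post = []
    for votes in labels:
        counts = Counter(votes)
        post.append([counts.get(k, 0) / n_annot for k in range(n_classes)])

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator.
        prior = [sum(p[k] for p in post) / n_items for k in range(n_classes)]
        conf = [[[smoothing] * n_classes for _ in range(n_classes)]
                for _ in range(n_annot)]
        for p, votes in zip(post, labels):
            for a, v in enumerate(votes):
                for k in range(n_classes):
                    conf[a][k][v] += p[k]
        for a in range(n_annot):
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [c / row_sum for c in conf[a][k]]

        # E-step: posterior over the true class of each item.
        new_post = []
        for votes in labels:
            scores = []
            for k in range(n_classes):
                s = prior[k]
                for a, v in enumerate(votes):
                    s *= conf[a][k][v]
                scores.append(s)
            total = sum(scores)
            new_post.append([s / total for s in scores])
        post = new_post
    return post


# Hypothetical labels from two classifiers over three categories (0, 1, 2).
toy_labels = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 1], [2, 2], [0, 0]]
posteriors = dawid_skene(toy_labels, n_classes=3)
aggregated = [max(range(3), key=lambda k: p[k]) for p in posteriors]
```

With only two annotators the model is weakly identified, so items on which the classifiers disagree are resolved mostly by the estimated class priors and confusion rates; that is one reason held-out validation of the aggregated labels matters.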

If this is right

AI serves as an aid embedded in human-driven workflows rather than a replacement.
Developers continue to invest effort in refining AI-generated code after its introduction.
AI usage in projects evolves from task-specific generation to broader knowledge support.
Evidence of sustained oversight suggests AI tools complement rather than automate development entirely.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Design of future AI coding tools could emphasize support for editing and conceptual queries.
Similar patterns may appear in other domains where AI generates initial artifacts that require human adaptation.
The findings point to a collaborative model that could be tested in controlled studies of developer-AI interaction.

Load-bearing premise

The 35,361 explicitly referencing comments, classified accurately by the LLM models and aggregation method, represent typical AI usage in GitHub projects without major selection or classification biases.

What would settle it

Finding a substantially different distribution of usage categories or lack of human refinement in a broader sample of GitHub projects that do not explicitly mention AI in comments would falsify the central patterns.

Original abstract

Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but prior studies often evaluate LLM outputs in isolation rather than examining how developers adapt them in real projects. We analyze 35,361 GitHub code comments that explicitly reference AI use and their associated code blocks. We first open-code 500 unique comments and code blocks to derive a taxonomy of AI-assisted development activities, then annotate the full dataset using two LLM-based classifiers and aggregate predictions with Dawid-Skene expectation-maximization. We also analyze 12,996 subsequent commit messages to study how AI-assisted code evolves after introduction, and examine temporal trends from December 2022 to March 2026. Our results show that developers primarily use LLMs for code implementation, followed by code enhancement, debugging, documentation, and testing. Subsequent commits frequently involve refactoring and cleanup, feature integration and extension, and bug fixing, indicating sustained human oversight in adapting AI-assisted code. Over time, AI-referencing comments shift from direct code generation toward knowledge and conceptual support and code enhancement. These findings suggest that AI tools are becoming embedded not only as code-generation aids, but also as collaborative support mechanisms whose outputs are refined, extended, and corrected by developers over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper maps usage patterns from 35k explicit AI-referencing GitHub comments and tracks later commits, but the sampling filter is a real limit on what the distributions can tell us.

This paper pulls 35,361 GitHub code comments that name AI tools, open-codes 500 of them to build a taxonomy of activities, then scales the labeling with two LLMs and Dawid-Skene aggregation. It also looks at 12,996 follow-up commit messages and checks how the comment types shift from December 2022 onward.

The concrete results are the main value: implementation is the biggest category, followed by enhancement, debugging, documentation, and testing. Later commits often involve refactoring, feature work, or bug fixes, and the comments move over time toward conceptual help rather than pure generation. That gives a grounded picture of what developers actually write down when they use these tools.

The approach is standard and replicable for this kind of software engineering study. Starting with manual open coding and then using aggregated LLM labels is a practical way to handle the volume.

The sampling choice is the clear soft spot. All the numbers and trends come only from comments that explicitly mention AI. Any AI-assisted edit that left no such trace is invisible, and if the chance of adding a mention varies by activity type, the reported ordering and the temporal shift could be artifacts of that filter. The abstract and methods do not quantify or correct for that upstream selection.
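The selection concern can be made concrete with a small arithmetic sketch. All numbers below are hypothetical and do not come from the paper; `true_shares` and `mention_rates` are names invented here. The point is only that when the probability of writing an explicit AI mention differs by activity, the shares observed in the filtered corpus can reorder the underlying shares.

```python
def observed_shares(true_shares, mention_rates):
    """Share of each activity among comments that explicitly mention AI.

    true_shares: fraction of all AI-assisted edits per activity.
    mention_rates: probability that an edit of that activity carries an
    explicit AI mention and therefore enters the filtered corpus.
    """
    visible = {k: true_shares[k] * mention_rates[k] for k in true_shares}
    total = sum(visible.values())
    return {k: v / total for k, v in visible.items()}


# Hypothetical values chosen only to illustrate the effect.
true_shares = {"implementation": 0.30, "debugging": 0.40, "documentation": 0.30}
mention_rates = {"implementation": 0.20, "debugging": 0.05, "documentation": 0.10}

obs = observed_shares(true_shares, mention_rates)
top_true = max(true_shares, key=true_shares.get)
top_observed = max(obs, key=obs.get)
```

In this invented scenario debugging is the most common activity overall, yet implementation dominates the filtered sample because it is mentioned four times as often.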

This is useful for researchers who study real-world AI adoption in development or who build tools that need to fit how people actually work. The data and methods are solid enough to warrant peer review so the bias discussion and annotation validation can be checked in detail.

Referee Report

2 major / 1 minor

Summary. The paper analyzes 35,361 GitHub code comments explicitly referencing AI tools (e.g., ChatGPT, Copilot). It derives a taxonomy of AI-assisted activities via open-coding of 500 items, annotates the full set with two LLM classifiers aggregated by Dawid-Skene, examines 12,996 subsequent commit messages, and tracks temporal trends (Dec 2022–Mar 2026). Claims include primary LLM use for code implementation then enhancement/debugging/documentation/testing; frequent post-introduction refactoring, feature extension, and bug fixing; and a shift over time from direct generation toward conceptual support and enhancement.

Significance. If the sampling and annotation hold, the work supplies concrete empirical evidence on how AI tools are actually integrated into live repositories rather than evaluated in isolation. The combination of taxonomy development, multi-classifier aggregation, follow-up commit analysis, and temporal tracking is a strength; it directly supports falsifiable claims about activity distributions and evolution.

major comments (2)

[Methods] Methods (sampling and data collection): All headline distributions (implementation > enhancement > debugging etc.) and the temporal shift toward conceptual support are derived exclusively from the filtered set of 35,361 comments containing explicit AI mentions. The manuscript does not address whether the probability of adding an explicit reference correlates with activity type; if it does, the reported ordering and evolution trends could be artifacts of the selection filter rather than underlying developer behavior. This selection step is load-bearing for the central claims.
[Methods] Annotation and validation (open-coding and Dawid-Skene): The abstract and methods description provide limited quantitative detail on inter-annotator agreement for the initial 500-item open-coding, held-out validation accuracy of the LLM classifiers, or potential biases in how comments were selected for annotation. Without these metrics, the reliability of the taxonomy and the aggregated labels that underpin all reported percentages remains difficult to assess.

minor comments (1)

[Abstract] The abstract states the temporal window as "December 2022 to March 2026"; confirm the end date is not a typo and ensure the corresponding figure or table reports the exact monthly counts used for the trend analysis.
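A sketch of the per-month tabulation the minor comment asks for, under the assumption that each classified comment carries a date and a category label. The record layout, function names, and toy records are hypothetical; the paper's actual schema is not shown on this page.

```python
from collections import Counter, defaultdict
from datetime import date


def months_in_window(start, end):
    """Number of calendar months covered, counting both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def monthly_shares(records):
    """records: iterable of (date, category) pairs.

    Returns {(year, month): {category: share of that month's comments}}.
    """
    counts = defaultdict(Counter)
    for day, category in records:
        counts[(day.year, day.month)][category] += 1
    shares = {}
    for month, c in sorted(counts.items()):
        total = sum(c.values())
        shares[month] = {cat: n / total for cat, n in c.items()}
    return shares


# The window stated in the abstract, December 2022 to March 2026.
window_months = months_in_window(date(2022, 12, 1), date(2026, 3, 31))

# Invented records for illustration only.
toy_records = [
    (date(2022, 12, 5), "implementation"),
    (date(2022, 12, 20), "implementation"),
    (date(2022, 12, 21), "debugging"),
    (date(2026, 3, 2), "conceptual support"),
    (date(2026, 3, 9), "implementation"),
]
toy_shares = monthly_shares(toy_records)
```

Reporting shares alongside raw monthly counts matters here, because a share computed from a handful of comments in a sparse month is much noisier than one computed from hundreds.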

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate where we will revise the manuscript.

Referee: [Methods] Methods (sampling and data collection): All headline distributions (implementation > enhancement > debugging etc.) and the temporal shift toward conceptual support are derived exclusively from the filtered set of 35,361 comments containing explicit AI mentions. The manuscript does not address whether the probability of adding an explicit reference correlates with activity type; if it does, the reported ordering and evolution trends could be artifacts of the selection filter rather than underlying developer behavior. This selection step is load-bearing for the central claims.

Authors: We acknowledge the validity of this concern. The study is scoped to comments containing explicit AI references, which may introduce selection bias if referencing likelihood varies by activity. This limits claims to documented cases rather than all AI usage. We will add a dedicated Limitations subsection discussing this potential bias, its implications for the observed distributions and trends, and the scope of generalizability. No new data are needed for this addition. Revision: yes.

Referee: [Methods] Annotation and validation (open-coding and Dawid-Skene): The abstract and methods description provide limited quantitative detail on inter-annotator agreement for the initial 500-item open-coding, held-out validation accuracy of the LLM classifiers, or potential biases in how comments were selected for annotation. Without these metrics, the reliability of the taxonomy and the aggregated labels that underpin all reported percentages remains difficult to assess.

Authors: We agree that additional quantitative details would improve transparency. The open-coding involved multiple authors and the classifiers were validated on held-out data, but these specifics were not fully reported. We will revise the Methods section to include inter-annotator agreement (e.g., Cohen's kappa), classifier validation accuracy, and discussion of annotation sample selection criteria or biases. Revision: yes.
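For readers unfamiliar with the agreement statistic named in the response, Cohen's kappa for two annotators can be computed as below. This is a generic sketch, not the authors' analysis code, and the two label sequences are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten comments by two coders.
coder_a = ["impl", "impl", "debug", "doc", "impl",
           "debug", "doc", "impl", "debug", "impl"]
coder_b = ["impl", "impl", "debug", "doc", "debug",
           "debug", "impl", "impl", "debug", "impl"]

kappa = cohens_kappa(coder_a, coder_b)
```

In this toy case the coders agree on 8 of 10 items (0.80) against a chance agreement of 0.39, so kappa is 0.41 / 0.61, roughly 0.67. Raw agreement alone would overstate reliability whenever one category, such as implementation, dominates.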

Circularity Check

0 steps flagged

No circularity detected in empirical analysis

The paper is a purely empirical study that derives its taxonomy from open-coding a sample of 500 comments, applies LLM classifiers and Dawid-Skene aggregation to the remaining data, and reports frequency counts and temporal trends directly from the resulting annotations. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation chain. All reported distributions and evolution patterns are computed outputs from the filtered corpus rather than reductions of those outputs back to the inputs by construction. The analysis is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of GitHub comments as proxies for AI usage and the accuracy of the automated annotation pipeline.

axioms (1)

domain assumption: The selected GitHub comments mentioning AI tools are representative of AI-assisted development activities.
This underpins the analysis of the 35,361 comments as evidence.

pith-pipeline@v0.9.1-grok · 5765 in / 1319 out tokens · 26151 ms · 2026-06-27T21:34:16.741212+00:00 · methodology


[14] [14]

2024 , month =

Introducing GPT-OSS , author =. 2024 , month =

2024

[15] [15]

2024 , month =

Mistral AI Models Overview , author =. 2024 , month =

2024

[16] [16]

Pattern Recognition and Machine Learning , author =

[17] [17]

Machine Learning: A Probabilistic Perspective , author =

[18] [18]

Bayesian Data Analysis , author =

[19] [19]

ACM Transactions on Information Systems (TOIS) , volume =

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval , author =. ACM Transactions on Information Systems (TOIS) , volume =. 2004 , publisher =

2004

[20] [20]

An Empirical Study of Smoothing Techniques for Language Modeling , author =

[21] [21]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks , author =. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP) and the 9th International Joint Conference on Natural Language Processing (IJCNLP) , pages =. 2019 , organization =. doi:10.18653/v1/D19-1410 , url =

work page doi:10.18653/v1/d19-1410 2019

[22] [22]

Journal of Systems and Software , volume=

A decade of code comment quality assessment: A systematic literature review , author=. Journal of Systems and Software , volume=. 2023 , publisher=

2023

[23] [23]

Journal of experimental social psychology , volume=

Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median , author=. Journal of experimental social psychology , volume=. 2013 , publisher=

2013

[24] [24]

Journal of the American Statistical association , volume=

Alternatives to the median absolute deviation , author=. Journal of the American Statistical association , volume=. 1993 , publisher=

1993

[25] [25]

Bioinformatics , volume=

Understanding sequencing data as compositions: an outlook and review , author=. Bioinformatics , volume=. 2018 , publisher=

2018

[26] [26]

ACM computing surveys (CSUR) , volume=

Anomaly detection: A survey , author=. ACM computing surveys (CSUR) , volume=. 2009 , publisher=

2009

[27] [27]

2015 , publisher=

Time series analysis: forecasting and control , author=. 2015 , publisher=

2015

[28] [28]

Journal of quality technology , volume=

The exponentially weighted moving average , author=. Journal of quality technology , volume=. 1986 , publisher=

1986

[29] [29]

1977 , publisher=

Exploratory data analysis , author=. 1977 , publisher=

1977

[30] [30]

2009 IEEE 31st international conference on software engineering , pages=

Predicting faults using the complexity of code changes , author=. 2009 IEEE 31st international conference on software engineering , pages=. 2009 , organization=

2009

[31] [31]

Information , volume=

Semantic clustering of functional requirements using agglomerative hierarchical clustering , author=. Information , volume=. 2018 , publisher=

2018

[32] [32]

Journal of Open Source Software , volume =

hdbscan: Hierarchical density based clustering , author =. Journal of Open Source Software , volume =. 2017 , publisher =. doi:10.21105/joss.00205 , url =

work page doi:10.21105/joss.00205 2017

[33] [33]

2024 , month =

Octoverse 2024: AI leads Python to top language , author =. 2024 , month =

2024

[34] [34]

2023 , month =

The State of Open Source and AI in 2023 , author =. 2023 , month =

2023

[35] [35]

2024 , url =

Technology | 2024 Stack Overflow Developer Survey , author =. 2024 , url =

2024

[36] [36]

2024 , url =

Tree-sitter: An Incremental Parsing System for Programming Tools , author =. 2024 , url =

2024

[37] [37]

Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

Unveiling ChatGPT’s Usage in Open Source Projects: A Mining-based Study , author=. Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

[38] [38]

Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) , year=

Where Are Large Language Models for Code Generation on GitHub? , author=. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) , year=

[39] [39]

Empirical Software Engineering , year=

An Empirical Study on Developers’ Shared Conversations with ChatGPT in GitHub Pull Requests and Issues , author=. Empirical Software Engineering , year=

[40] [40]

Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

Chatting with AI: Deciphering Developer Conversations with ChatGPT , author=. Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

[41] [41]

Proceedings of the 21st International Conference on Mining Software Repositories (MSR) – Mining Challenge Track , year=

On the Taxonomy of Developers’ Discussion Topics with ChatGPT , author=. Proceedings of the 21st International Conference on Mining Software Repositories (MSR) – Mining Challenge Track , year=

[42] [42]

Proceedings of the 21st International Conference on Mining Software Repositories , pages=

Can chatgpt support developers? an empirical evaluation of large language models for code generation , author=. Proceedings of the 21st International Conference on Mining Software Repositories , pages=

[43] [43]

Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

ChatGPT in Action: Analyzing its Use in Software Development , author=. Proceedings of the 21st International Conference on Mining Software Repositories (MSR) , year=

[44] [44]

Handbook on Teaching Empirical Software Engineering , pages=

Teaching Mining Software Repositories , author=. Handbook on Teaching Empirical Software Engineering , pages=. 2024 , publisher=

2024

[45] [45]

2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) , pages=

Developer-intent driven code comment generation , author=. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) , pages=. 2023 , organization=

2023

[46] [46]

Journal of software maintenance and evolution: Research and practice , volume=

A survey and taxonomy of approaches for mining software repositories in the context of software evolution , author=. Journal of software maintenance and evolution: Research and practice , volume=. 2007 , publisher=

2007

[47] [47]

2013 21st international conference on program comprehension (icpc) , pages=

Quality analysis of source code comments , author=. 2013 21st international conference on program comprehension (icpc) , pages=. 2013 , organization=

2013

[48] [48]

Proceedings of the 2008 international working conference on Mining software repositories , pages=

What do large commits tell us? a taxonomical study of large commits , author=. Proceedings of the 2008 international working conference on Mining software repositories , pages=

2008

[49] [49]

IEEE Transactions on Software Engineering , year=

Self-admitted GenAI usage in open-source software , author=. IEEE Transactions on Software Engineering , year=

[50] [50]

ACM Transactions on Software Engineering and Methodology , volume=

Large language models for software engineering: A systematic literature review , author=. ACM Transactions on Software Engineering and Methodology , volume=. 2024 , publisher=

2024

[51] [51]

Proceedings of the ACM on Programming Languages , volume=

Grounded copilot: How programmers interact with code-generating models , author=. Proceedings of the ACM on Programming Languages , volume=. 2023 , publisher=

2023

[52] [52]

Proceedings of the 21st International Conference on Mining Software Repositories , pages=

How to refactor this code? an exploratory study on developer-chatgpt refactoring conversations , author=. Proceedings of the 21st International Conference on Mining Software Repositories , pages=

[53] [53]

Proceedings of the 1st ACM international conference on AI-powered software , pages=

A comparative analysis of large language models for code documentation generation , author=. Proceedings of the 1st ACM international conference on AI-powered software , pages=

[54] [54]

Proceedings of the ACM on Software Engineering , volume=

AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation , author=. Proceedings of the ACM on Software Engineering , volume=. 2024 , publisher=

2024

[55] [55]

Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems , pages=

Reading between the lines: Modeling user behavior and costs in AI-assisted programming , author=. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems , pages=

2024

[56] [56]

Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

Evaluating large language models in class-level code generation , author=. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering , pages=

[57] [57]

arXiv preprint arXiv:2302.06590 , year=

The impact of ai on developer productivity: Evidence from github copilot , author=. arXiv preprint arXiv:2302.06590 , year=

Pith/arXiv arXiv

[58] [58]

Proceedings of the 21st International Conference on Mining Software Repositories , pages=

How do software developers use chatgpt? an exploratory study on github pull requests , author=. Proceedings of the 21st International Conference on Mining Software Repositories , pages=

[59] [59]

Proceedings of the 21st International Conference on Mining Software Repositories , pages=

Analyzing developer use of chatgpt generated code in open source github projects , author=. Proceedings of the 21st International Conference on Mining Software Repositories , pages=

[60] [60]

Proceedings of the 46th IEEE/ACM International Conference on Software Engineering , pages=

Exploring the potential of chatgpt in automated code refinement: An empirical study , author=. Proceedings of the 46th IEEE/ACM International Conference on Software Engineering , pages=

[61] [61]

2026 , url =

Replication Package , author =. 2026 , url =

2026

[62] [62]

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC) , pages=

A large-scale empirical study on code-comment inconsistencies , author=. 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC) , pages=. 2019 , organization=

2019

[63] [63]

Proceedings of the 44th international conference on software engineering , pages=

Practitioners' expectations on automated code comment generation , author=. Proceedings of the 44th international conference on software engineering , pages=

[64] [64]

IEEE Transactions on Software Engineering , year=

Automated commit message generation with large language models: An empirical study and beyond , author=. IEEE Transactions on Software Engineering , year=

[65] [65]

2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=

Towards detecting prompt knowledge gaps for improved llm-guided issue resolution , author=. 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) , pages=. 2025 , organization=

2025

[66] [66]

Proceedings of the 21st International Conference on Predictive Models and Data Analytics in Software Engineering , pages =

Katzy, Jonathan and Huang, Yongcheng and Panchu, Gopal-Raj and Ziemlewski, Maksym and Loizides, Paris and Vermeulen, Sander and van Deursen, Arie and Izadi, Maliheh , title =. Proceedings of the 21st International Conference on Predictive Models and Data Analytics in Software Engineering , pages =. 2025 , isbn =

2025

[67] [67]

2026 , issn =

Do comments and expertise still matter? An experiment on programmers’ adoption of AI-generated JavaScript code , journal =. 2026 , issn =

2026

[68] [68]

The art and science of analyzing software data , pages=

Code comment analysis for improving software quality , author=. The art and science of analyzing software data , pages=. 2015 , publisher=

2015

[69] [69]

Sample Size Calculator , year =

[70] [70]

What Is the Right Sample Size for Research? , year =

[71] [71]

International Conference on Learning Representations , volume=

Swe-bench: Can language models resolve real-world github issues? , author=. International Conference on Learning Representations , volume=

[72] [72]

Proceedings of the XXI Brazilian Symposium on Software Quality , pages=

Characterizing commits in open-source software , author=. Proceedings of the XXI Brazilian Symposium on Software Quality , pages=

[73] [73]

Proceedings of the 44th International Conference on Software Engineering , pages=

What makes a good commit message? , author=. Proceedings of the 44th International Conference on Software Engineering , pages=

[74] [74]

Grootendorst, Maarten , title =. n.d. , howpublished =

[75] [75]

arXiv preprint arXiv:2203.05794 , year=

BERTopic: Neural topic modeling with a class-based TF-IDF procedure , author=. arXiv preprint arXiv:2203.05794 , year=

Pith/arXiv arXiv

[76] [76]

Frontiers in sociology , volume=

A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts , author=. Frontiers in sociology , volume=. 2022 , publisher=

2022

[77] [77]

2014 IEEE international conference on software maintenance and evolution , pages=

Clustering commits for understanding the intents of implementation , author=. 2014 IEEE international conference on software maintenance and evolution , pages=. 2014 , organization=

2014

[78] [78]

2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) , pages=

Large language models for software engineering: Survey and open problems , author=. 2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE) , pages=. 2023 , organization=

2023

[79] [79]

Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement , pages=

Multi-language software development in the llm era: Insights from practitioners’ conversations with chatgpt , author=. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement , pages=

[80] [80]

IEEE Software , volume=

Application of large language models to software engineering tasks: Opportunities, risks, and implications , author=. IEEE Software , volume=. 2023 , publisher=

2023