PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes

Daniel Ogenrwot; John Businge

arxiv: 2505.07700 · v3 · submitted 2025-05-12 · 💻 cs.SE

Pith reviewed 2026-05-22 16:08 UTC · model grok-4.3

classification 💻 cs.SE

keywords: pull requests, ChatGPT, AI-generated code, patch integration, software collaboration, GitHub, code review

The pith

Developers rarely adopt ChatGPT-generated code fully in pull requests, instead using it as a starting point that shapes adaptation and review discussions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines real-world use of ChatGPT in GitHub pull requests by studying hundreds of cases where developers openly noted the tool's involvement. It tracks how often the AI code gets incorporated and identifies the ways developers modify or draw from it during collaboration. The analysis shows that complete acceptance is uncommon and that the tool's role often extends to providing ideas and strategies rather than ready-made solutions. These patterns matter because they illustrate how generative AI changes the process of negotiating and refining code in team settings.

Core claim

The study of 338 pull requests with self-admitted ChatGPT usage, covering 645 AI-generated snippets and 3,486 developer patches, finds a median integration rate of 25 percent. Qualitative examination of 89 cases with integrated patches identifies recurring patterns of structural integration, selective extraction, and iterative refinement. Developers treat AI output as a starting point rather than a final implementation. Even without direct adoption, ChatGPT affects workflows through conceptual guidance, documentation, and debugging. Integration decisions depend on contextual fit, integration effort, maintainer trust, and established review norms rather than serving as direct measures of code correctness.
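To make the headline figure concrete, here is a minimal sketch of how a per-pull-request integration rate and its median could be computed. The definition used (integrated snippets divided by all snippets in a pull request, counting partial reuse as integration) is an assumption for illustration; the summary above does not spell out the paper's exact formula. The function names and the toy data are hypothetical.

```python
from statistics import median

# Assumption: both full application and partial reuse count as "integrated".
INTEGRATED_LABELS = {"applied", "partially reused"}


def integration_rate(labels: list[str]) -> float:
    """Share of one pull request's snippets that carry an integrated label."""
    if not labels:
        raise ValueError("a pull request needs at least one snippet")
    integrated = sum(label in INTEGRATED_LABELS for label in labels)
    return integrated / len(labels)


def median_integration_rate(prs: dict[str, list[str]]) -> float:
    """Median of the per-pull-request integration rates."""
    return median(integration_rate(labels) for labels in prs.values())


# Made-up labels for three pull requests, not data from the paper.
toy_prs = {
    "pr-1": ["applied", "not integrated", "not integrated", "not integrated"],
    "pr-2": ["not integrated"],
    "pr-3": ["partially reused", "applied"],
}
print(median_integration_rate(toy_prs))  # prints 0.25
```

Using the median rather than the mean keeps a few pull requests with very high or very low rates from dominating the summary figure.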

What carries the argument

PatchTrack, an automated classifier that determines whether AI-generated patches were applied, partially reused, or not integrated into pull requests.
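The three-way labelling can be illustrated with a simple line-overlap heuristic. This is a sketch under stated assumptions, not PatchTrack's actual algorithm, which this page does not describe: the `classify` function, the whitespace normalization, and the `full` and `partial` thresholds are all hypothetical choices.

```python
from enum import Enum


class Integration(Enum):
    APPLIED = "applied"
    PARTIALLY_REUSED = "partially reused"
    NOT_INTEGRATED = "not integrated"


def normalize(code: str) -> set[str]:
    """Return the non-blank lines of `code` with whitespace collapsed."""
    return {" ".join(line.split()) for line in code.splitlines() if line.strip()}


def classify(snippet: str, patch: str,
             full: float = 0.9, partial: float = 0.2) -> Integration:
    """Label a snippet by the share of its lines that reappear in the patch.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    snippet_lines = normalize(snippet)
    if not snippet_lines:
        return Integration.NOT_INTEGRATED
    overlap = len(snippet_lines & normalize(patch)) / len(snippet_lines)
    if overlap >= full:
        return Integration.APPLIED
    if overlap >= partial:
        return Integration.PARTIALLY_REUSED
    return Integration.NOT_INTEGRATED
```

Exact line matching misses renamed variables and reordered statements, so a heuristic like this would tend to under-count partial reuse; that is one reason validation against manual labels matters for any such classifier.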

If this is right

Full adoption of ChatGPT-generated code is uncommon in pull request workflows.
Developers typically treat AI output as a starting point rather than a final implementation.
ChatGPT influences workflows through conceptual guidance, documentation, and debugging strategies even when code is not directly adopted.
Integration decisions reflect contextual fit, integration effort, maintainer trust, and established pull request review norms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

AI coding tools could be designed to better support partial reuse and adaptation of suggestions rather than aiming for complete replacements.
Similar analyses of other large language models might show whether the observed integration patterns hold beyond ChatGPT.
The work implies that AI assistance may gradually alter established norms in code review and collaboration.

Load-bearing premise

The dataset of 338 pull requests containing self-admitted ChatGPT usage accurately represents typical AI-assisted development without significant selection or reporting bias.

What would settle it

Measuring integration rates in a larger set of pull requests from projects known to use AI tools but identified without requiring self-admission and finding substantially different rates would challenge the representativeness of the observed patterns.
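One way such a comparison could be run is a permutation test on the difference in median integration rates between a self-admitted sample and a sample identified by other means. The sketch below is an illustration only: the test choice, the function name, and its parameters are assumptions, and the paper does not report this analysis.

```python
import random
from statistics import median


def permutation_test_median(sample_a: list[float], sample_b: list[float],
                            n_resamples: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in medians."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(median(sample_a) - median(sample_b))
    pooled = list(sample_a) + list(sample_b)
    split = len(sample_a)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group membership at random and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:split]) - median(pooled[split:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value would indicate that the two groups' median rates differ by more than random relabelling explains, which is the kind of result that would call the representativeness of the self-admitted sample into question.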

Original abstract

The rapid adoption of large language models (LLMs) like ChatGPT has introduced new dynamics in software development, particularly within pull request workflows. While prior research has examined the quality of AI-generated code, less is known about how developers evaluate, adapt, and integrate these suggestions in real-world collaboration. We analyze 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, comprising 645 AI-generated snippets and 3,486 developer-authored patches. To support this analysis at scale, we use PatchTrack, an automated classifier that identifies whether AI-generated patches were applied, partially reused, or not integrated. Our findings reveal that full adoption of ChatGPT-generated code is uncommon: the median integration rate is 25%. Qualitative analysis of 89 pull requests with integrated patches reveals recurring patterns of structural integration, selective extraction, and iterative refinement, indicating that developers typically treat AI output as a starting point rather than a final implementation. Even when code is not directly adopted, ChatGPT influences workflows through conceptual guidance, documentation, and debugging strategies. Integration decisions reflect contextual fit, integration effort, maintainer trust, and established pull request review norms rather than serving as direct indicators of code correctness. Overall, this study provides empirical insight into AI-mediated decision-making in collaborative software development, showing that the influence of generative AI extends beyond patch generation to how developers reason about, adapt, and negotiate code during review within pull request workflows. These findings inform the design of AI-assisted tools and support more transparent and effective use of LLMs in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper tracks self-admitted ChatGPT use in 338 real PRs and finds low direct integration plus broader workflow effects, but the sample choice undercuts how far the patterns can be generalized.

The main takeaway is that developers rarely drop ChatGPT code straight into a pull request. The median integration rate sits at 25 percent across the 338 cases, and even when code gets in, it usually arrives after selective extraction or iterative tweaks. The work also notes that ChatGPT shapes thinking around documentation, debugging, and review strategy even when the actual patch is not adopted. That split between direct adoption and indirect influence is the clearest new observation here.

PatchTrack, the classifier they use to label integration at scale, lets them handle the volume without reading every diff by hand, and the qualitative pass on 89 integrated cases surfaces recurring patterns that line up with how reviews actually work on GitHub. Credit for using observable repository data instead of lab tasks or self-report surveys. The numbers and examples give a grounded picture of current practice.

The soft spot is the data source itself. Self-admitted usage selects for developers willing to disclose the tool, which likely correlates with different trust levels, project norms, or transparency habits than the larger set of hidden uses. The abstract does not show any check against that selection effect or any comparison to non-admitted PRs, so the 25 percent figure and the claims about reasoning during review rest on a potentially skewed base. Classifier validation details and handling of confounders such as PR size are also thin in the summary.

This is useful for software-engineering researchers who study LLM adoption in open-source workflows or who design review-support tools. A reader looking for empirical patterns rather than theory would get concrete material to build on. The paper has enough real data and a novel classifier to merit serious referee time, even if the bias discussion needs strengthening. I would send it out for review with a request for explicit limitations on the sample and more method validation.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, comprising 645 AI-generated snippets and 3,486 developer-authored patches. It introduces PatchTrack, an automated classifier to categorize whether AI patches were fully applied, partially reused, or not integrated. Findings show a median integration rate of 25%, with qualitative review of 89 integrated cases revealing patterns of structural integration, selective extraction, and iterative refinement. The study concludes that developers treat AI output as a starting point, that ChatGPT influences workflows via conceptual guidance even without direct adoption, and that integration decisions depend on contextual fit, effort, trust, and review norms rather than code correctness alone. Overall, the paper claims that generative AI shapes not only patch generation but also reasoning, adaptation, and negotiation during PR review.

Significance. If the core empirical patterns hold after addressing methodological gaps, this study offers meaningful insight into LLM use in real collaborative software development. The scale of the self-admitted dataset and the mixed quantitative-qualitative approach provide concrete observations on integration rates and adaptation behaviors that go beyond synthetic benchmarks. Strengths include the focus on actual PR workflows and the identification of recurring developer strategies; these can usefully inform tool design and guidelines for transparent LLM adoption. The work is a solid empirical contribution to the growing literature on AI-assisted development.

major comments (2)

[Abstract and data collection] The central claim that the influence of generative AI extends to how developers reason about, adapt, and negotiate code in PR workflows rests on the 338 self-admitted PRs (and the 89 qualitatively reviewed) being representative of typical AI-assisted development. Self-admission selects for developers willing to disclose usage, which may correlate with higher transparency, different trust levels, or project norms that favor integration; this selection effect is not mitigated or quantified in the described collection approach and directly affects the generalizability of the 25% median adoption figure and the qualitative themes.
[PatchTrack classifier and qualitative analysis] The manuscript provides insufficient detail on validation of the automated classifier (e.g., precision, recall, or agreement with manual labels), inter-rater reliability for the coding of the 89 cases, and any controls for confounding variables such as PR size, complexity, or repository-specific review norms. These elements are load-bearing for the reliability of the reported integration patterns and the distinction between full, partial, and non-integration.
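The validation figures the referee asks for can be sketched in a few lines. The example below computes per-class precision and recall of classifier output against manual labels, and Cohen's kappa as one common inter-rater agreement statistic. Kappa is my choice for illustration; the report does not name a specific statistic, and the function names and label strings are hypothetical.

```python
from collections import Counter


def precision_recall(gold: list[str], predicted: list[str],
                     label: str) -> tuple[float, float]:
    """Precision and recall for one class, given manual (gold) labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must label the same non-empty set of items")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labelled at random with their own
    # marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Reporting precision and recall per class, rather than a single accuracy figure, shows whether the classifier confuses partial reuse with non-integration, which is the distinction the integration rate depends on.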

minor comments (2)

[Abstract] The abstract would benefit from an explicit sentence on the limitations of relying on self-admitted usage to help readers calibrate expectations about generalizability.
[Results] Notation for the three integration categories (full, partial, none) should be defined consistently in the text and any tables or figures that report the 25% median rate.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback, which identifies key areas for improving methodological transparency and acknowledging limitations. We have revised the manuscript to address both major comments by expanding the limitations discussion and adding validation details.

Referee: [Abstract and data collection] The central claim that the influence of generative AI extends to how developers reason about, adapt, and negotiate code in PR workflows rests on the 338 self-admitted PRs (and the 89 qualitatively reviewed) being representative of typical AI-assisted development. Self-admission selects for developers willing to disclose usage, which may correlate with higher transparency, different trust levels, or project norms that favor integration; this selection effect is not mitigated or quantified in the described collection approach and directly affects the generalizability of the 25% median adoption figure and the qualitative themes.

Authors: We acknowledge that reliance on self-admitted ChatGPT usage introduces a selection bias, as developers who publicly disclose AI assistance may differ systematically in transparency, trust levels, or project norms from those who do not. This is an inherent challenge when studying emerging practices without platform-level logging of AI tool use. In the revised manuscript, we have added an expanded Limitations section that explicitly discusses this selection effect, its potential influence on the observed 25% median integration rate and qualitative themes, and the resulting bounds on generalizability. We have clarified that findings are presented as observations from disclosed cases rather than claims of representativeness across all AI-assisted development, and we suggest directions for future work using complementary identification methods.

Revision: yes

Referee: [PatchTrack classifier and qualitative analysis] The manuscript provides insufficient detail on validation of the automated classifier (e.g., precision, recall, or agreement with manual labels), inter-rater reliability for the coding of the 89 cases, and any controls for confounding variables such as PR size, complexity, or repository-specific review norms. These elements are load-bearing for the reliability of the reported integration patterns and the distinction between full, partial, and non-integration.

Authors: We agree that additional methodological detail is required to support the reliability of PatchTrack and the qualitative findings. In the revised manuscript, we have inserted a dedicated validation subsection for the PatchTrack classifier that reports agreement metrics with manual labels on a held-out set. For the qualitative coding of the 89 integrated cases, we now include inter-rater reliability statistics. We have also added explicit discussion of how we considered potential confounders such as PR size, complexity, and repository norms, including stratification where data permitted and sensitivity checks in the thematic analysis. These revisions directly address the load-bearing elements raised.

Revision: yes

standing simulated objections not resolved

Fully quantifying the magnitude of selection bias from self-admission would require a separate comparative study of undisclosed AI usage, which exceeds the scope of this observational analysis.

Circularity Check

0 steps flagged

No circularity: purely observational empirical study with no derivations or self-referential reductions

This paper conducts an empirical analysis of 338 GitHub pull requests containing self-admitted ChatGPT usage, using data collection, an automated classifier (PatchTrack) for integration patterns, and qualitative coding on a subset of 89 PRs. All claims rest on observed frequencies, median integration rates, and recurring patterns identified in the collected data rather than any mathematical derivation, fitted-parameter prediction, or self-citation chain that reduces the central findings to the inputs by construction. The study is self-contained against external benchmarks of GitHub data and qualitative methods, with no load-bearing steps that equate outputs to inputs via definition or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the representativeness of self-admitted usage data and the reliability of the newly introduced PatchTrack classifier for categorizing patch integration.

axioms (1)

Domain assumption: Self-admitted ChatGPT usage in pull request descriptions serves as a reliable indicator of actual AI assistance without substantial false positives or under-reporting bias.
The study selects PRs based on explicit mentions; this assumption underpins the entire dataset construction and generalizability of findings.

invented entities (1)

PatchTrack automated classifier (no independent evidence)
purpose: To scale identification of whether AI-generated patches were fully applied, partially reused, or not integrated across hundreds of PRs.
New tool created for the study; no external validation or independent evidence of accuracy is mentioned in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear


We analyze 338 pull requests ... use PatchTrack, an automated classifier that identifies whether AI-generated patches were applied, partially reused, or not integrated. ... median integration rate is 25%.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


Qualitative analysis ... recurring patterns of structural integration, selective extraction, and iterative refinement

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
cs.SE 2026-04 accept novelty 7.0

AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.
How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
cs.SE 2026-01 unverdicted novelty 7.0

AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.

Reference graph

Works this paper leans on

116 extracted references · 116 canonical work pages · cited by 2 Pith papers · 4 internal anchors

[1]

Automated Software Engineering27(4), 459–489 (2020) https: //doi.org/10.1007/s10515-020-00280-2

Menzies, T., Pecheur, C.: Software engineering with ai/ml: State of the art and future prospects. Automated Software Engineering27(4), 459–489 (2020) https: //doi.org/10.1007/s10515-020-00280-2

work page doi:10.1007/s10515-020-00280-2 2020
[2]

ACM Trans

Russo, D.: Navigating the complexity of generative ai adoption in software engi- neering. ACM Trans. Softw. Eng. Methodol. (2024) https://doi.org/10.1145/ 3652154 . Just Accepted

work page 2024
[3]

arXiv preprint arXiv:2403.02583 (2024) https://doi.org/10.48550/arXiv.2403.02583

Huang, Y., Chen, Y., Chen, X., Chen, J., Peng, R., Tang, Z., Huang, J., Xu, F., Zheng, Z.: Generative software engineering. arXiv preprint arXiv:2403.02583 (2024) https://doi.org/10.48550/arXiv.2403.02583 . Submitted on 5 Mar 2024, last revised 3 Apr 2024 (this version, v2)

work page doi:10.48550/arxiv.2403.02583 2024
[4]

IEEE Software 40(4), 30–38 (2023) https://doi.org/10.1109/MS.2023.3265877

Ebert, C., Louridas, P.: Generative ai for software practitioners. IEEE Software 40(4), 30–38 (2023) https://doi.org/10.1109/MS.2023.3265877

work page doi:10.1109/ms.2023.3265877 2023
[5]

Automated Software Engineering 31(26) (2024) https://doi.org/10.1007/s10515-024-00330-1 43

Sauvola, J., Tarkoma, S., Klemettinen, M., Riekki, J., Doermann, D.: Future of software development with generative ai. Automated Software Engineering 31(26) (2024) https://doi.org/10.1007/s10515-024-00330-1 43

work page doi:10.1007/s10515-024-00330-1 2024
[6]

European Journal of Technic (2023) https://doi.org/10.36222/ejt.1330631

Ozpolat, Z., Yildirim, Karabatak, M.: Artificial intelligence-based tools in software development processes: Application of chatgpt. European Journal of Technic (2023) https://doi.org/10.36222/ejt.1330631

work page doi:10.36222/ejt.1330631 2023
[7]

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

Peng, S., Kalliamvakou, E., Cihon, P., Demirer, M.: The impact of ai on developer productivity: Evidence from github copilot. arXiv preprint arXiv:2302.06590 (2023) https://doi.org/10.48550/arXiv.2302.06590 . Submit- ted on 13 Feb 2023

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2302.06590 2023
[8]

In: Proceedings of the 54th ACM Technical Symposium on Computer Sci- ence Education, p

Wermelinger, M.: Using github copilot to solve simple programming problems. In: Proceedings of the 54th ACM Technical Symposium on Computer Sci- ence Education, p. 7. ACM, Toronto, Canada (2023). https://doi.org/10.1145/ 3545945.3569830

work page arXiv 2023
[9]

In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

Jin, K., Wang, C.-Y., Pham, H.V., Hemmati, H.: Can chatgpt support devel- opers? an empirical evaluation of large language models for code generation. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 167–171. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3643991.3...

work page doi:10.1145/3643991.3645074 2024
[10]

In: Proceedings of the 21st International Conference on Mining Software Repositories

Grewal, B., Lu, W., Nadi, S., Bezemer, C.-P.: Analyzing developer use of chat- gpt generated code in open source github projects. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 157–

work page
[11]

In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

Association for Computing Machinery, New York, NY, USA (2024). https: //doi.org/10.1145/3643991.3645072 .https://doi.org/10.1145/3643991.3645072

work page doi:10.1145/3643991.3645072 2024
[12]

In: Proceedings of the 21st International Conference on Mining Software Repositories

Siddiq, M.L., Roney, L., Zhang, J., Santos, J.C.D.S.: Quality assessment of chat- gpt generated code and their use by developers. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 152–

work page
[13]

In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

Association for Computing Machinery, New York, NY, USA (2024). https: //doi.org/10.1145/3643991.3645071 .https://doi.org/10.1145/3643991.3645071

work page doi:10.1145/3643991.3645071 2024
[14]

In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering

Rigby, P.C., Bird, C.: Convergent contemporary software peer review prac- tices. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ESEC/FSE 2013, pp. 202–212. Association for Computing Machin- ery, New York, NY, USA (2013). https://doi.org/10.1145/2491411.2491444 . https://doi.org/10.1145/2491411.2491444

work page doi:10.1145/2491411.2491444 2013
[15]

Murphy-Hill, and Robert W

Bacchelli, A., Bird, C.: Expectations, outcomes, and challenges of modern code review. In: 2013 35th International Conference on Software Engineering (ICSE), pp. 712–721 (2013). https://doi.org/10.1109/ICSE.2013.6606617

work page doi:10.1109/icse.2013.6606617 2013
[16]

IEEE Transactions on Software Engineering43(2), 185–204 (2017) https://doi.org/10.1109/TSE.2016.2584053 44

Storey, M.-A., Zagalsky, A., Filho, F.F., Singer, L., German, D.M.: How social and communication channels shape and challenge a participatory culture in soft- ware development. IEEE Transactions on Software Engineering43(2), 185–204 (2017) https://doi.org/10.1109/TSE.2016.2584053 44

work page doi:10.1109/tse.2016.2584053 2017
[17]

In: Proceedings of the 36th International Conference on Software Engineering

Gousios, G., Pinzger, M., Deursen, A.v.: An exploratory study of the pull- based software development model. In: Proceedings of the 36th International Conference on Software Engineering. ICSE 2014, pp. 345–355. Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/ 2568225.2568260 .https://doi.org/10.1145/2568225.2568260

work page doi:10.1145/2568225.2568260 2014
[18]

In: Proceedings of the 36th International Conference on Software Engineering

Tsay, J., Dabbish, L., Herbsleb, J.: Influence of social and technical factors for evaluating contribution in github. In: Proceedings of the 36th International Conference on Software Engineering. ICSE 2014, pp. 356–366. Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/ 2568225.2568315 .https://doi.org/10.1145/2568225.2568315

work page doi:10.1145/2568225.2568315 2014
[19]

In: Proceedings of the 38th International Conference on Software Engineering

Gousios, G., Storey, M.-A., Bacchelli, A.: Work practices and challenges in pull-based development: the contributor’s perspective. In: Proceedings of the 38th International Conference on Software Engineering. ICSE ’16, pp. 285–

work page
[20]

https: //doi.org/10.1145/2884781.2884826 .https://doi.org/10.1145/2884781.2884826

Association for Computing Machinery, New York, NY, USA (2016). https: //doi.org/10.1145/2884781.2884826 .https://doi.org/10.1145/2884781.2884826

work page doi:10.1145/2884781.2884826 2016
[21]

In: Proceedings of the 30th Annual ACM Symposium on Applied Computing

Soares, D.M., Lima J´ unior, M.L., Murta, L., Plastino, A.: Acceptance factors of pull requests in open-source projects. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. SAC ’15, pp. 1541–1546. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/ 2695664.2695856 .https://doi.org/10.1145/2695664.2695856

work page doi:10.1145/2695664.2695856 2015
[22]

In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engi- neering

Zhu, J., Zhou, M., Mockus, A.: Effectiveness of code contribution: from patch-based to pull-request-based tools. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engi- neering. FSE 2016, pp. 871–882. Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2950290.2950364 . https...

work page doi:10.1145/2950290.2950364 2016
[23]

Xiao, T., Hata, H., Treude, C., Matsumoto, K.: Generative ai for pull request descriptions: Adoption, impact, and developer interventions. Proc. ACM Softw. Eng.1(FSE) (2024) https://doi.org/10.1145/3643773

work page doi:10.1145/3643773 2024
[24]

ACM Press/Addison- Wesley, Reading, MA (1990)

Rich, C., Waters, R.C.: The Programmer’s Apprentice. ACM Press/Addison- Wesley, Reading, MA (1990)

work page 1990
[25]

Empirical Software Engineering24(4), 2140–2170 (2019) https://doi.org/10.1007/s10664-019-09696-8

Zhao, G., Costa, D.A., Zou, Y.: Improving the pull requests review process using learning-to-rank algorithms. Empirical Software Engineering24(4), 2140–2170 (2019) https://doi.org/10.1007/s10664-019-09696-8

work page doi:10.1007/s10664-019-09696-8 2019
[26]

In: Proceedings of the International Conference on Software and System Processes

Azeem, M.I., Panichella, S., Di Sorbo, A., Serebrenik, A., Wang, Q.: Action- based recommendation in pull-request development. In: Proceedings of the International Conference on Software and System Processes. ICSSP ’20, pp. 115–

work page
[27]

https: //doi.org/10.1145/3379177.3388904 .https://doi.org/10.1145/3379177.3388904 45

Association for Computing Machinery, New York, NY, USA (2020). https: //doi.org/10.1145/3379177.3388904 .https://doi.org/10.1145/3379177.3388904 45

work page doi:10.1145/3379177.3388904 2020
[28]

In: Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

Dey, T., Mockus, A.: Effect of technical and social factors on pull request quality for the npm ecosystem. In: Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ESEM ’20. Association for Computing Machin- ery, New York, NY, USA (2020). https://doi.org/10.1145/3382494.3410685 . https://doi....

work page doi:10.1145/3382494.3410685 2020
[29]

arXiv preprint arXiv:2402.15943 (2024)

Hassan, A.E., Lin, D., Rajbahadur, G.K., Gallaba, K., Cogo, F.R., Chen, B., Zhang, H., Thangarajah, K., Oliva, G.A., Lin, J., Abdullah, W.M., Jiang, Z.M.: Rethinking software engineering in the foundation model era: A curated cata- logue of challenges in the development of trustworthy fmware. arXiv preprint arXiv:2402.15943 (2024)

work page arXiv 2024
[30]

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Journal of the American Medical Informatics Association, 037 (2024)

Luo, L., Ning, J., Zhao, Y., Wang, Z., Ding, Z., Chen, P., Fu, W., Han, Q., Xu, G., Qiu, Y., et al.: Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks. Journal of the American Medical Informatics Association, 037 (2024)

work page 2024
[32]

In: Gurevych, I., Miyao, Y

Howard, J., Ruder, S.: Universal language model fine-tuning for text classifica- tion. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-1031 ...

work page doi:10.18653/v1/p18-1031 2018
[33]

In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023

Jiang, N., Liu, K., Lutellier, T., Tan, L.: Impact of code language mod- els on automated program repair. In: Proceedings of the 45th Inter- national Conference on Software Engineering. ICSE ’23, pp. 1430–1442. IEEE Press, ??? (2023). https://doi.org/10.1109/ICSE48619.2023.00125 . https://doi.org/10.1109/ICSE48619.2023.00125

work page doi:10.1109/icse48619.2023.00125 2023
[34]

Guo, Q., Cao, J., Xie, X., Liu, S., Li, X., Chen, B., Peng, X.: Exploring the potential of ChatGPT in automated code refinement: An empirical study. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.3623306

[35]

Deng, Y., Xia, C.S., Yang, C., Zhang, S.D., Yang, S., Zhang, L.: Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.3623343

[36]

Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., Wang, H.: Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620 (2023)

[37]

Ju, J., Yu, L., Li, X., Yang, L., Zuo, C.: LLaMA-Reviewer: Advancing code review automation with large language models through parameter-efficient fine-tuning. In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp. 647–658. IEEE (2023)

[38]

Siddiq, M.L., Santos, J., Tanvir, R.H., Ulfat, N., Rifat, F.A., Lopes, V.C.: Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418 (2023)

[39]

Tufano, R., Mastropaolo, A., Pepe, F., Dabić, O., Di Penta, M., Bavota, G.: Unveiling ChatGPT’s usage in open source projects: A mining-based study. arXiv preprint arXiv:2402.13456 (2024). Paper accepted for publication at the 21st International Conference on Mining Software Repositories (MSR ’24)

[40]

Tufano, R., Dabić, O., Mastropaolo, A., Ciniselli, M., Bavota, G.: Code review automation: Strengths and weaknesses of the state of the art. IEEE Trans. Softw. Eng. 50(2), 338–353 (2024) https://doi.org/10.1109/TSE.2023.3348172

[41]

Tanzil, M.H., Khan, J.Y., Uddin, G.: ChatGPT incorrectness detection in software reviews. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24. Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.3639194

[42]

Nashid, N., Ding, D., Gallaba, K., Hassan, A.E., Mesbah, A.: Characterizing multi-hunk patches: Divergence, proximity, and LLM repair challenges. arXiv preprint arXiv:2506.04418 (2025)

[43]

Xiao, T., Treude, C., Hata, H., Matsumoto, K.: DevGPT: Studying developer-ChatGPT conversations. Zenodo (2023) https://doi.org/10.5281/zenodo.8304091

[44]

Li, S., Cheng, Y., Chen, J., Xuan, J., He, S., Shang, W.: Assessing the performance of AI-generated code: A case study on GitHub Copilot. In: 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp. 216–227 (2024). https://doi.org/10.1109/ISSRE62328.2024.00030

[45]

Ogenrwot, D., Businge, J.: Replication Package for PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes. https://doi.org/10.5281/zenodo.14978624

[46]

GitHub: Online Appendix. GitHub. https://www.gnu.org/software/diffutils/manual/html_node/Hunks.html

[47]

GitHub pull request (2023). https://chatgpt.com/share/8cb16814-2855-4fbd-87e5-bde8ba349728

[48]

GitHub pull request (2023). https://github.com/faker-js/faker/pull/2405

[49]

Ehsani, R., Pathak, S., Chatterjee, P.: Towards detecting prompt knowledge gaps for improved LLM-guided issue resolution. In: Proceedings of the 22nd International Conference on Mining Software Repositories (MSR 2025). ACM, Ottawa, Canada (2025). To appear

[50]

GitHub: GitHub REST API Documentation. GitHub. https://docs.github.com/en/rest?apiVersion=2022-11-28

[51]

OpenAI: Terms of Use. OpenAI. https://openai.com/policies/terms-of-use

[52]

Gale, N.K., Heath, G., Cameron, E., Rashid, S., Redwood, S.: Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Medical Research Methodology 13(1), 117 (2013) https://doi.org/10.1186/1471-2288-13-117

[53]

Weeraddana, N.R., Xu, X., Alfadel, M., et al.: An empirical comparison of ethnic and gender diversity of DevOps and non-DevOps contributions to open-source projects. Empirical Software Engineering 28, 150 (2023) https://doi.org/10.1007/s10664-023-10394-9 . Accepted: 11 September 2023

[54]

Wang, L., Zheng, Z., Wu, X., Sang, B., Zhang, J., Tao, X.: Fork entropy: Assessing the diversity of open source software projects’ forks. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 204–216 (2023). https://doi.org/10.1109/ASE56229.2023.00168

[55]

Businge, J., Decan, A., Zerouali, A., Mens, T., Demeyer, S., De Roover, C.: Variant forks – motivations and impediments. In: Proceedings of the 29th Edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, pp. 867–877. IEEE Computer Society (2022). https://doi.org/10.1109/SANER53432.2022.00105

[56]

GitHub: The State of Open Source: Octoverse 2024 (2024). https://github.blog/news-insights/octoverse/octoverse-2024/

[57]

Jang, J., Agrawal, A., Brumley, D.: ReDeBug: Finding unpatched code clones in entire OS distributions. In: 2012 IEEE Symposium on Security and Privacy, pp. 48–62 (2012). https://doi.org/10.1109/SP.2012.13

[58]

GitHub pull request (2024). https://github.com/pokt-network/poktroll/pull/185

[59]

GitHub pull request (2024). https://github.com/Mudlet/Mudlet/pull/7123

[60]

GitHub pull request (2024). https://github.com/nylas/nylas-python/pull/279

[61]

GitHub pull request (2024). https://github.com/alshedivat/al-folio/pull/2059

[62]

GitHub pull request (2023). https://github.com/laravel-json-api/core/pull/12

[63]

GitHub pull request (2023). https://github.com/ory/elements/pull/171

[64]

GitHub pull request (2023). https://github.com/darklang/dark/pull/5063

[65]

GitHub pull request (2023). https://github.com/sveltejs/learn.svelte.dev/pull/522

[66]

GitHub pull request (2023). https://github.com/darklang/dark/pull/5058

[67]

GitHub pull request (2023). https://github.com/Bananapus/nana-core/pull/37

[68]

Moumoula, M., Kabore, A., Klein, J., Bissyandé, T.: Cross-lingual code clone detection: When LLMs fall short against embedding-based classifier. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

[69]

Ouédraogo, W., Kabore, K., Tian, H., Song, Y., Koyuncu, A., Klein, J., Lo, D., Bissyandé, T.: LLMs and prompting for unit test generation: A large-scale evaluation. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

[70]

Chen, D., Liu, Y., Zhou, M., Zhao, Y., Wang, S., Wang, X., Chen, X., Bissyandé, T., Klein, J.: LLM for mobile: An initial roadmap. ACM Transactions on Software Engineering and Methodology 33(5), 1–44 (2024)

[71]

GitHub pull request (2023). https://github.com/codecrafters-io/frontend/pull/1061

[72]

GitHub pull request (2023). https://github.com/faker-js/faker/pull/2230

[73]

GitHub pull request (2023). https://github.com/digitalbitbox/bitbox-wallet-app/pull/2415

[74]

GitHub pull request (2023). https://github.com/darklang/dark/pull/5068

[75]

GitHub pull request (2024). https://github.com/gemini-hlsw/scheduler/pull/428

[76]

GitHub pull request (2024). https://github.com/theosanderson/taxonium/pull/534

[77]

GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2214

[78]

GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2212

[79]

GitHub pull request (2024). https://github.com/labdao/plex/pull/468

[80]

GitHub pull request (2024). https://github.com/plausible/analytics/pull/3792


[31] [31]

Journal of the American Medical Informatics Association, 037 (2024)

Luo, L., Ning, J., Zhao, Y., Wang, Z., Ding, Z., Chen, P., Fu, W., Han, Q., Xu, G., Qiu, Y., et al.: Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks. Journal of the American Medical Informatics Association, 037 (2024)

work page 2024

[32] [32]

In: Gurevych, I., Miyao, Y

Howard, J., Ruder, S.: Universal language model fine-tuning for text classifica- tion. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-1031 ...

work page doi:10.18653/v1/p18-1031 2018

[33] [33]

In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023

Jiang, N., Liu, K., Lutellier, T., Tan, L.: Impact of code language mod- els on automated program repair. In: Proceedings of the 45th Inter- national Conference on Software Engineering. ICSE ’23, pp. 1430–1442. IEEE Press, ??? (2023). https://doi.org/10.1109/ICSE48619.2023.00125 . https://doi.org/10.1109/ICSE48619.2023.00125

work page doi:10.1109/icse48619.2023.00125 2023

[34] [34]

In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

Guo, Q., Cao, J., Xie, X., Liu, S., Li, X., Chen, B., Peng, X.: Explor- ing the potential of chatgpt in automated code refinement: An empirical study. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.36...

work page doi:10.1145/3597503.3623306 2024

[35] [35]

Dataflow analysis-inspired deep learning for efficient vulnerability detection

Deng, Y., Xia, C.S., Yang, C., Zhang, S.D., Yang, S., Zhang, L.: Large lan- guage models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machinery, New York, NY, USA (2024). https://d...

work page doi:10.1145/3597503.3623343 2024

[36] [36]

arXiv preprint arXiv:2308.10620 (2023)

Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., Wang, H.: Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620 (2023)

work page arXiv 2023

[37] [37]

In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp

Ju, J., Yu, L., Li, X., Yang, L., Zuo, C.: Llama-reviewer: Advancing code review automation with large language models through parameter-efficient fine- tuning. In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp. 647–658. IEEE, ??? (2023)

work page 2023

[38] [38]

arXiv preprint arXiv:2305.00418 (2023)

Siddiq, M.L., Santos, J., Tanvir, R.H., Ulfat, N., Rifat, F.A., Lopes, V.C.: Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418 (2023)

work page arXiv 2023

[39] [39]

arXiv preprint arXiv:2402.13456 (2024)

Tufano, R., Mastropaolo, A., Pepe, F., Dabi´ c, O., Di Penta, M., Bavota, G.: Unveiling chatgpt’s usage in open source projects: A mining-based study. arXiv preprint arXiv:2402.13456 (2024). Paper accepted for publication at 21st International Conference on Mining Software Repositories (MASR’24)

work page arXiv 2024

[40] [40]

IEEE Trans

Tufano, R., Dabi´ c, O., Mastropaolo, A., Ciniselli, M., Bavota, G.: Code review automation: Strengths and weaknesses of the state of the art. IEEE Trans. Softw. Eng.50(2), 338–353 (2024) https://doi.org/10.1109/TSE.2023.3348172

work page doi:10.1109/tse.2023.3348172 2024

[41] [41]

In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

Tanzil, M.H., Khan, J.Y., Uddin, G.: Chatgpt incorrectness detection in software reviews. In: Proceedings of the IEEE/ACM 46th International Confer- ence on Software Engineering. ICSE ’24. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.3639194 . https://doi.org/10.1145/3597503.3639194

work page doi:10.1145/3597503.3639194 2024

[42] [42]

arXiv preprint arXiv:2506.04418 (2025)

Nashid, N., Ding, D., Gallaba, K., Hassan, A.E., Mesbah, A.: Characterizing multi-hunk patches: Divergence, proximity, and llm repair challenges. arXiv preprint arXiv:2506.04418 (2025)

work page arXiv 2025

[43] [43]

Zenodo (2023) https://doi.org/10.5281/zenodo.8304091

Xiao, T., Treude, C., Hata, H., Matsumoto, K.: Devgpt: Studying developer- chatgpt conversations. Zenodo (2023) https://doi.org/10.5281/zenodo.8304091

work page doi:10.5281/zenodo.8304091 2023

[44] [44]

In: 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp

Li, S., Cheng, Y., Chen, J., Xuan, J., He, S., Shang, W.: Assessing the per- formance of ai-generated code: A case study on github copilot. In: 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp. 216–227 (2024). https://doi.org/10.1109/ISSRE62328.2024.00030

work page doi:10.1109/issre62328.2024.00030 2024

[45] [45]

https://doi

Ogenrwot, D., Businge, J.: Replication Package for PatchTrack: A Comprehen- sive Analysis of ChatGPT’s Influence on Pull Request Outcomes. https://doi. org/10.5281/zenodo.14978624 . https://doi.org/10.5281/zenodo.14978624

work page doi:10.5281/zenodo.14978624

[46] [46]

GitHub: Online Appendix. GitHub. https://www.gnu.org/software/diffutils/ 47 manual/html node/Hunks.html

work page

[47] [47]

https://chatgpt.com/share/ 8cb16814-2855-4fbd-87e5-bde8ba349728

GitHub pull request (2023). https://chatgpt.com/share/ 8cb16814-2855-4fbd-87e5-bde8ba349728

work page 2023

[48] [48]

https://github.com/faker-js/faker/pull/2405

GitHub pull request (2023). https://github.com/faker-js/faker/pull/2405

work page 2023

[49] [49]

In: Proceedings of the 22nd Interna- tional Conference on Mining Software Repositories (MSR 2025)

Ehsani, R., Pathak, S., Chatterjee, P.: Towards detecting prompt knowledge gaps for improved llm-guided issue resolution. In: Proceedings of the 22nd Interna- tional Conference on Mining Software Repositories (MSR 2025). ACM, Ottawa, Canada (2025). To appear

work page 2025

[50] [50]

GitHub: GitHub REST API Documentation. GitHub. https://docs.github.com/ en/rest?apiVersion=2022-11-28

work page 2022

[51] [51]

OpenAI: Terms of Use. OpenAI. https://openai.com/policies/terms-of-use

work page

[52] [52]

BMC Medical Research Methodology13(1), 117 (2013) https://doi

Gale, N.K., Heath, G., Cameron, E., Rashid, S., Redwood, S.: Using the frame- work method for the analysis of qualitative data in multi-disciplinary health research. BMC Medical Research Methodology13(1), 117 (2013) https://doi. org/10.1186/1471-2288-13-117

work page doi:10.1186/1471-2288-13-117 2013

[53] [53]

Empirical Software Engineering28, 150 (2023) https://doi.org/ 10.1007/s10664-023-10394-9

Weeraddana, N.R., Xu, X., Alfadel, M.,et al.: An empirical comparison of ethnic and gender diversity of devops and non-devops contributions to open- source projects. Empirical Software Engineering28, 150 (2023) https://doi.org/ 10.1007/s10664-023-10394-9 . Accepted: 11 September 2023

work page doi:10.1007/s10664-023-10394-9 2023

[54] [54]

In: Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 34, pp 27,865–27,876,https://proceedings

Wang, L., Zheng, Z., Wu, X., Sang, B., Zhang, J., Tao, X.: Fork entropy: Assessing the diversity of open source software projects’ forks. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 204–216 (2023). https://doi.org/10.1109/ASE56229.2023.00168

work page doi:10.1109/ase56229.2023.00168 2023

[55] [55]

In: Proceedings of the 29th Edition of the IEEE International Conference on Software Analysis, Evolu- tion and Reengineering, pp

Businge, J., Decan, A., Zerouali, A., Mens, T., Demeyer, S., De Roover, C.: Variant forks – motivations and impediments. In: Proceedings of the 29th Edition of the IEEE International Conference on Software Analysis, Evolu- tion and Reengineering, pp. 867–877. IEEE Computer Society, ??? (2022). https://doi.org/10.1109/SANER53432.2022.00105

work page doi:10.1109/saner53432.2022.00105 2022

[56] [56]

https://github.blog/ news-insights/octoverse/octoverse-2024/

GitHub: The State of Open Source: Octoverse 2024 (2024). https://github.blog/ news-insights/octoverse/octoverse-2024/

work page 2024

[57] [57]

ReDeBug: Finding unpatched code clones in entire OS distributions,

Jang, J., Agrawal, A., Brumley, D.: Redebug: Finding unpatched code clones in entire os distributions. In: 2012 IEEE Symposium on Security and Privacy, pp. 48–62 (2012). https://doi.org/10.1109/SP.2012.13

work page doi:10.1109/sp.2012.13 2012

[58] [58]

https://github.com/pokt-network/poktroll/pull/ 185

GitHub pull request (2024). https://github.com/pokt-network/poktroll/pull/ 185. 48

work page 2024

[59] [59]

https://github.com/Mudlet/Mudlet/pull/7123

GitHub pull request (2024). https://github.com/Mudlet/Mudlet/pull/7123

work page 2024

[60] [60]

https://github.com/nylas/nylas-python/pull/279

GitHub pull request (2024). https://github.com/nylas/nylas-python/pull/279

work page 2024

[61] [61]

https://github.com/alshedivat/al-folio/pull/2059

GitHub pull request (2024). https://github.com/alshedivat/al-folio/pull/2059

work page 2024

[62] [62]

https://github.com/laravel-json-api/core/pull/12

GitHub pull request (2023). https://github.com/laravel-json-api/core/pull/12

work page 2023

[63] [63]

https://github.com/ory/elements/pull/171

GitHub pull request (2023). https://github.com/ory/elements/pull/171

work page 2023

[64] [64]

https://github.com/darklang/dark/pull/5063

GitHub pull request (2023). https://github.com/darklang/dark/pull/5063

work page 2023

[65] GitHub pull request (2023). https://github.com/sveltejs/learn.svelte.dev/pull/522

[66] GitHub pull request (2023). https://github.com/darklang/dark/pull/5058

[67] GitHub pull request (2023). https://github.com/Bananapus/nana-core/pull/37

[68] Moumoula, M., Kabore, A., Klein, J., Bissyandé, T.: Cross-lingual code clone detection: When LLMs fall short against embedding-based classifier. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

[69] Ouédraogo, W., Kabore, K., Tian, H., Song, Y., Koyuncu, A., Klein, J., Lo, D., Bissyandé, T.: LLMs and prompting for unit test generation: A large-scale evaluation. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

[70] Chen, D., Liu, Y., Zhou, M., Zhao, Y., Wang, S., Wang, X., Chen, X., Bissyandé, T., Klein, J.: LLM for mobile: An initial roadmap. ACM Transactions on Software Engineering and Methodology 33(5), 1–44 (2024)

[71] GitHub pull request (2023). https://github.com/codecrafters-io/frontend/pull/1061

[72] GitHub pull request (2023). https://github.com/faker-js/faker/pull/2230

[73] GitHub pull request (2023). https://github.com/digitalbitbox/bitbox-wallet-app/pull/2415

[74] GitHub pull request (2023). https://github.com/darklang/dark/pull/5068

[75] GitHub pull request (2024). https://github.com/gemini-hlsw/scheduler/pull/428

[76] GitHub pull request (2024). https://github.com/theosanderson/taxonium/pull/534

[77] GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2214

[78] GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2212

[79] GitHub pull request (2024). https://github.com/labdao/plex/pull/468

[80] GitHub pull request (2024). https://github.com/plausible/analytics/pull/3792