
arxiv: 2505.07700 · v3 · submitted 2025-05-12 · 💻 cs.SE

PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes

Pith reviewed 2026-05-22 16:08 UTC · model grok-4.3

classification 💻 cs.SE
keywords pull requests · ChatGPT · AI-generated code · patch integration · software collaboration · GitHub · code review

The pith

Developers rarely adopt ChatGPT-generated code fully in pull requests, instead using it as a starting point that shapes adaptation and review discussions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines real-world use of ChatGPT in GitHub pull requests by studying hundreds of cases where developers openly noted the tool's involvement. It tracks how often the AI code gets incorporated and identifies the ways developers modify or draw from it during collaboration. The analysis shows that complete acceptance is uncommon and that the tool's role often extends to providing ideas and strategies rather than ready-made solutions. These patterns matter because they illustrate how generative AI changes the process of negotiating and refining code in team settings.

Core claim

The study of 338 pull requests with self-admitted ChatGPT usage, covering 645 AI-generated snippets and 3,486 developer patches, finds a median integration rate of 25 percent. Qualitative examination of 89 cases with integrated patches identifies recurring patterns of structural integration, selective extraction, and iterative refinement. Developers treat AI output as a starting point rather than a final implementation. Even without direct adoption, ChatGPT affects workflows through conceptual guidance, documentation, and debugging. Integration decisions depend on contextual fit, integration effort, maintainer trust, and established review norms rather than serving as direct measures of code correctness.
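The summary does not spell out how the per-pull-request integration rate is computed. One plausible reading, assumed here, is the share of a pull request's AI-generated snippets that end up applied or partially reused, with the median taken across pull requests. The function name, the label strings, and the toy data below are all hypothetical; this is a sketch of the arithmetic, not the paper's procedure.

```python
from statistics import median

# Labels treated as "integrated" in this sketch (an assumption, not the paper's definition).
INTEGRATED_LABELS = {"applied", "partial"}


def integration_rate(labels):
    """Return the share of a PR's AI snippets labelled applied or partially reused."""
    if not labels:
        raise ValueError("a pull request needs at least one AI-generated snippet")
    integrated = sum(1 for label in labels if label in INTEGRATED_LABELS)
    return integrated / len(labels)


# Invented toy data: PR id -> one label per AI-generated snippet.
toy_prs = {
    "pr-1": ["applied", "none", "none", "none"],
    "pr-2": ["none", "none"],
    "pr-3": ["partial", "applied"],
    "pr-4": ["none", "partial", "none", "none"],
    "pr-5": ["none"],
}

rates = {pr: integration_rate(labels) for pr, labels in toy_prs.items()}
median_rate = median(rates.values())
print(f"median integration rate: {median_rate:.0%}")
```

A median is less sensitive than a mean to the few pull requests where every snippet is adopted, which may be why the paper reports it; that motivation is a guess rather than something the summary states.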

What carries the argument

PatchTrack, an automated classifier that determines whether AI-generated patches were applied, partially reused, or not integrated into pull requests.
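To make the three-way labelling concrete, here is a minimal sketch. It is not PatchTrack's actual method, which this summary does not describe. It swaps in a simple line-overlap heuristic built on Python's standard `difflib` module; the function name `classify_patch`, the label strings, and both thresholds are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def _normalize(code):
    """Split code into stripped, non-empty lines so indentation changes are ignored."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def classify_patch(ai_snippet, pr_added_code, full_threshold=0.9, partial_threshold=0.3):
    """Label an AI snippet by how many of its lines reappear, in order, in the PR's added code.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    ai_lines = _normalize(ai_snippet)
    pr_lines = _normalize(pr_added_code)
    if not ai_lines:
        return "not_integrated"

    # Compare line sequences; autojunk is disabled so repeated lines are not discarded.
    matcher = SequenceMatcher(None, ai_lines, pr_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    coverage = matched / len(ai_lines)

    if coverage >= full_threshold:
        return "applied"
    if coverage >= partial_threshold:
        return "partially_reused"
    return "not_integrated"


ai = "def add(a, b):\n    return a + b\n"
pr = "import os\n\ndef add(a, b):\n    return a + b\n"
print(classify_patch(ai, pr))
```

Exact line matching of this kind would miss renamed variables or reordered statements, so a real classifier would likely need token-level or structural comparison.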

If this is right

  • Full adoption of ChatGPT-generated code is uncommon in pull request workflows.
  • Developers typically treat AI output as a starting point rather than a final implementation.
  • ChatGPT influences workflows through conceptual guidance, documentation, and debugging strategies even when code is not directly adopted.
  • Integration decisions reflect contextual fit, integration effort, maintainer trust, and established pull request review norms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI coding tools could be designed to better support partial reuse and adaptation of suggestions rather than aiming for complete replacements.
  • Similar analyses of other large language models might show whether the observed integration patterns hold beyond ChatGPT.
  • The work implies that AI assistance may gradually alter established norms in code review and collaboration.

Load-bearing premise

The dataset of 338 pull requests containing self-admitted ChatGPT usage accurately represents typical AI-assisted development without significant selection or reporting bias.

What would settle it

Measuring integration rates in a larger set of pull requests from projects known to use AI tools but identified without requiring self-admission and finding substantially different rates would challenge the representativeness of the observed patterns.

Original abstract

The rapid adoption of large language models (LLMs) like ChatGPT has introduced new dynamics in software development, particularly within pull request workflows. While prior research has examined the quality of AI-generated code, less is known about how developers evaluate, adapt, and integrate these suggestions in real-world collaboration. We analyze 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, comprising 645 AI-generated snippets and 3,486 developer-authored patches. To support this analysis at scale, we use PatchTrack, an automated classifier that identifies whether AI-generated patches were applied, partially reused, or not integrated. Our findings reveal that full adoption of ChatGPT-generated code is uncommon: the median integration rate is 25%. Qualitative analysis of 89 pull requests with integrated patches reveals recurring patterns of structural integration, selective extraction, and iterative refinement, indicating that developers typically treat AI output as a starting point rather than a final implementation. Even when code is not directly adopted, ChatGPT influences workflows through conceptual guidance, documentation, and debugging strategies. Integration decisions reflect contextual fit, integration effort, maintainer trust, and established pull request review norms rather than serving as direct indicators of code correctness. Overall, this study provides empirical insight into AI-mediated decision-making in collaborative software development, showing that the influence of generative AI extends beyond patch generation to how developers reason about, adapt, and negotiate code during review within pull request workflows. These findings inform the design of AI-assisted tools and support more transparent and effective use of LLMs in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, comprising 645 AI-generated snippets and 3,486 developer-authored patches. It introduces PatchTrack, an automated classifier to categorize whether AI patches were fully applied, partially reused, or not integrated. Findings show a median integration rate of 25%, with qualitative review of 89 integrated cases revealing patterns of structural integration, selective extraction, and iterative refinement. The study concludes that developers treat AI output as a starting point, that ChatGPT influences workflows via conceptual guidance even without direct adoption, and that integration decisions depend on contextual fit, effort, trust, and review norms rather than code correctness alone. Overall, the paper claims that generative AI shapes not only patch generation but also reasoning, adaptation, and negotiation during PR review.

Significance. If the core empirical patterns hold after addressing methodological gaps, this study offers meaningful insight into LLM use in real collaborative software development. The scale of the self-admitted dataset and the mixed quantitative-qualitative approach provide concrete observations on integration rates and adaptation behaviors that go beyond synthetic benchmarks. Strengths include the focus on actual PR workflows and the identification of recurring developer strategies; these can usefully inform tool design and guidelines for transparent LLM adoption. The work is a solid empirical contribution to the growing literature on AI-assisted development.

major comments (2)
  1. [Abstract and data collection] Abstract and data collection description: The central claim that the influence of generative AI extends to how developers reason about, adapt, and negotiate code in PR workflows rests on the 338 self-admitted PRs (and the 89 qualitatively reviewed) being representative of typical AI-assisted development. Self-admission selects for developers willing to disclose usage, which may correlate with higher transparency, different trust levels, or project norms that favor integration; this selection effect is not mitigated or quantified in the described collection approach and directly affects the generalizability of the 25% median adoption figure and the qualitative themes.
  2. [PatchTrack classifier and qualitative analysis] PatchTrack classifier and qualitative analysis sections: The manuscript provides insufficient detail on validation of the automated classifier (e.g., precision, recall, or agreement with manual labels), inter-rater reliability for the coding of the 89 cases, and any controls for confounding variables such as PR size, complexity, or repository-specific review norms. These elements are load-bearing for the reliability of the reported integration patterns and the distinction between full, partial, and non-integration.
minor comments (2)
  1. [Abstract] The abstract would benefit from an explicit sentence on the limitations of relying on self-admitted usage to help readers calibrate expectations about generalizability.
  2. [Results] Notation for the three integration categories (full, partial, none) should be defined consistently in the text and any tables or figures that report the 25% median rate.
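The validation metrics requested in the second major comment can be illustrated with a short sketch. It computes precision and recall for one label against manual labels, and Cohen's kappa as one common inter-rater agreement statistic. The report does not say which statistic the authors use, so the choice of kappa is an assumption, and the function names and label data are invented for illustration.

```python
from collections import Counter


def precision_recall(gold, predicted, positive):
    """Precision and recall for one label, treating all other labels as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy labels: manual annotation versus classifier output.
manual = ["applied", "applied", "partial", "none", "none", "none"]
automatic = ["applied", "partial", "partial", "none", "none", "applied"]

precision, recall = precision_recall(manual, automatic, positive="applied")
kappa = cohen_kappa(manual, automatic)
print(f"precision={precision:.2f} recall={recall:.2f} kappa={kappa:.2f}")
```

The same `cohen_kappa` function applies to two human coders labelling the 89 qualitatively reviewed cases, which is the inter-rater reliability the referee asks about.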

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback, which identifies key areas for improving methodological transparency and acknowledging limitations. We have revised the manuscript to address both major comments by expanding the limitations discussion and adding validation details.

Point-by-point responses
  1. Referee: [Abstract and data collection] Abstract and data collection description: The central claim that the influence of generative AI extends to how developers reason about, adapt, and negotiate code in PR workflows rests on the 338 self-admitted PRs (and the 89 qualitatively reviewed) being representative of typical AI-assisted development. Self-admission selects for developers willing to disclose usage, which may correlate with higher transparency, different trust levels, or project norms that favor integration; this selection effect is not mitigated or quantified in the described collection approach and directly affects the generalizability of the 25% median adoption figure and the qualitative themes.

    Authors: We acknowledge that reliance on self-admitted ChatGPT usage introduces a selection bias, as developers who publicly disclose AI assistance may differ systematically in transparency, trust levels, or project norms from those who do not. This is an inherent challenge when studying emerging practices without platform-level logging of AI tool use. In the revised manuscript, we have added an expanded Limitations section that explicitly discusses this selection effect, its potential influence on the observed 25% median integration rate and qualitative themes, and the resulting bounds on generalizability. We have clarified that findings are presented as observations from disclosed cases rather than claims of representativeness across all AI-assisted development, and we suggest directions for future work using complementary identification methods. revision: yes

  2. Referee: [PatchTrack classifier and qualitative analysis] PatchTrack classifier and qualitative analysis sections: The manuscript provides insufficient detail on validation of the automated classifier (e.g., precision, recall, or agreement with manual labels), inter-rater reliability for the coding of the 89 cases, and any controls for confounding variables such as PR size, complexity, or repository-specific review norms. These elements are load-bearing for the reliability of the reported integration patterns and the distinction between full, partial, and non-integration.

    Authors: We agree that additional methodological detail is required to support the reliability of PatchTrack and the qualitative findings. In the revised manuscript, we have inserted a dedicated validation subsection for the PatchTrack classifier that reports agreement metrics with manual labels on a held-out set. For the qualitative coding of the 89 integrated cases, we now include inter-rater reliability statistics. We have also added explicit discussion of how we considered potential confounders such as PR size, complexity, and repository norms, including stratification where data permitted and sensitivity checks in the thematic analysis. These revisions directly address the load-bearing elements raised. revision: yes

standing simulated objections not resolved
  • Fully quantifying the magnitude of selection bias from self-admission would require a separate comparative study of undisclosed AI usage, which exceeds the scope of this observational analysis.

Circularity Check

0 steps flagged

No circularity: purely observational empirical study with no derivations or self-referential reductions

full rationale

This paper conducts an empirical analysis of 338 GitHub pull requests containing self-admitted ChatGPT usage, using data collection, an automated classifier (PatchTrack) for integration patterns, and qualitative coding on a subset of 89 PRs. All claims rest on observed frequencies, median integration rates, and recurring patterns identified in the collected data rather than any mathematical derivation, fitted-parameter prediction, or self-citation chain that reduces the central findings to the inputs by construction. The study is self-contained against external benchmarks of GitHub data and qualitative methods, with no load-bearing steps that equate outputs to inputs via definition or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the representativeness of self-admitted usage data and the reliability of the newly introduced PatchTrack classifier for categorizing patch integration.

axioms (1)
  • Domain assumption: Self-admitted ChatGPT usage in pull request descriptions serves as a reliable indicator of actual AI assistance without substantial false positives or under-reporting bias.
    The study selects PRs based on explicit mentions; this assumption underpins the entire dataset construction and generalizability of findings.
invented entities (1)
  • PatchTrack automated classifier (no independent evidence)
    Purpose: To scale identification of whether AI-generated patches were fully applied, partially reused, or not integrated across hundreds of PRs.
    New tool created for the study; no external validation or independent evidence of accuracy is mentioned in the abstract.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub

    cs.SE 2026-04 accept novelty 7.0

    AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.

  2. How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

    cs.SE 2026-01 unverdicted novelty 7.0

    AI coding agents produce pull requests with substantially more commits and slightly higher description-to-diff similarity than human developers, based on analysis of 29,095 merged PRs.

Reference graph

Works this paper leans on

116 extracted references · 116 canonical work pages · cited by 2 Pith papers · 4 internal anchors

  1. [1]

    Automated Software Engineering27(4), 459–489 (2020) https: //doi.org/10.1007/s10515-020-00280-2

    Menzies, T., Pecheur, C.: Software engineering with ai/ml: State of the art and future prospects. Automated Software Engineering27(4), 459–489 (2020) https: //doi.org/10.1007/s10515-020-00280-2

  2. [2]

    ACM Trans

    Russo, D.: Navigating the complexity of generative ai adoption in software engi- neering. ACM Trans. Softw. Eng. Methodol. (2024) https://doi.org/10.1145/ 3652154 . Just Accepted

  3. [3]

    arXiv preprint arXiv:2403.02583 (2024) https://doi.org/10.48550/arXiv.2403.02583

    Huang, Y., Chen, Y., Chen, X., Chen, J., Peng, R., Tang, Z., Huang, J., Xu, F., Zheng, Z.: Generative software engineering. arXiv preprint arXiv:2403.02583 (2024) https://doi.org/10.48550/arXiv.2403.02583 . Submitted on 5 Mar 2024, last revised 3 Apr 2024 (this version, v2)

  4. [4]

    IEEE Software 40(4), 30–38 (2023) https://doi.org/10.1109/MS.2023.3265877

    Ebert, C., Louridas, P.: Generative ai for software practitioners. IEEE Software 40(4), 30–38 (2023) https://doi.org/10.1109/MS.2023.3265877

  5. [5]

    Automated Software Engineering 31(26) (2024) https://doi.org/10.1007/s10515-024-00330-1 43

    Sauvola, J., Tarkoma, S., Klemettinen, M., Riekki, J., Doermann, D.: Future of software development with generative ai. Automated Software Engineering 31(26) (2024) https://doi.org/10.1007/s10515-024-00330-1 43

  6. [6]

    European Journal of Technic (2023) https://doi.org/10.36222/ejt.1330631

    Ozpolat, Z., Yildirim, Karabatak, M.: Artificial intelligence-based tools in software development processes: Application of chatgpt. European Journal of Technic (2023) https://doi.org/10.36222/ejt.1330631

  7. [7]

    The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

    Peng, S., Kalliamvakou, E., Cihon, P., Demirer, M.: The impact of ai on developer productivity: Evidence from github copilot. arXiv preprint arXiv:2302.06590 (2023) https://doi.org/10.48550/arXiv.2302.06590 . Submit- ted on 13 Feb 2023

  8. [8]

    In: Proceedings of the 54th ACM Technical Symposium on Computer Sci- ence Education, p

    Wermelinger, M.: Using github copilot to solve simple programming problems. In: Proceedings of the 54th ACM Technical Symposium on Computer Sci- ence Education, p. 7. ACM, Toronto, Canada (2023). https://doi.org/10.1145/ 3545945.3569830

  9. [9]

    In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

    Jin, K., Wang, C.-Y., Pham, H.V., Hemmati, H.: Can chatgpt support devel- opers? an empirical evaluation of large language models for code generation. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 167–171. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3643991.3...

  10. [10]

    In: Proceedings of the 21st International Conference on Mining Software Repositories

    Grewal, B., Lu, W., Nadi, S., Bezemer, C.-P.: Analyzing developer use of chat- gpt generated code in open source github projects. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 157–

  11. [11]

    In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

    Association for Computing Machinery, New York, NY, USA (2024). https: //doi.org/10.1145/3643991.3645072 .https://doi.org/10.1145/3643991.3645072

  12. [12]

    In: Proceedings of the 21st International Conference on Mining Software Repositories

    Siddiq, M.L., Roney, L., Zhang, J., Santos, J.C.D.S.: Quality assessment of chat- gpt generated code and their use by developers. In: Proceedings of the 21st International Conference on Mining Software Repositories. MSR ’24, pp. 152–

  13. [13]

    In: Proceedings of the 21st Inter- national Conference on Mining Software Repositories

    Association for Computing Machinery, New York, NY, USA (2024). https: //doi.org/10.1145/3643991.3645071 .https://doi.org/10.1145/3643991.3645071

  14. [14]

    In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering

    Rigby, P.C., Bird, C.: Convergent contemporary software peer review prac- tices. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. ESEC/FSE 2013, pp. 202–212. Association for Computing Machin- ery, New York, NY, USA (2013). https://doi.org/10.1145/2491411.2491444 . https://doi.org/10.1145/2491411.2491444

  15. [15]

    Murphy-Hill, and Robert W

    Bacchelli, A., Bird, C.: Expectations, outcomes, and challenges of modern code review. In: 2013 35th International Conference on Software Engineering (ICSE), pp. 712–721 (2013). https://doi.org/10.1109/ICSE.2013.6606617

  16. [16]

    IEEE Transactions on Software Engineering43(2), 185–204 (2017) https://doi.org/10.1109/TSE.2016.2584053 44

    Storey, M.-A., Zagalsky, A., Filho, F.F., Singer, L., German, D.M.: How social and communication channels shape and challenge a participatory culture in soft- ware development. IEEE Transactions on Software Engineering43(2), 185–204 (2017) https://doi.org/10.1109/TSE.2016.2584053 44

  17. [17]

    In: Proceedings of the 36th International Conference on Software Engineering

    Gousios, G., Pinzger, M., Deursen, A.v.: An exploratory study of the pull- based software development model. In: Proceedings of the 36th International Conference on Software Engineering. ICSE 2014, pp. 345–355. Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/ 2568225.2568260 .https://doi.org/10.1145/2568225.2568260

  18. [18]

    In: Proceedings of the 36th International Conference on Software Engineering

    Tsay, J., Dabbish, L., Herbsleb, J.: Influence of social and technical factors for evaluating contribution in github. In: Proceedings of the 36th International Conference on Software Engineering. ICSE 2014, pp. 356–366. Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/ 2568225.2568315 .https://doi.org/10.1145/2568225.2568315

  19. [19]

    In: Proceedings of the 38th International Conference on Software Engineering

    Gousios, G., Storey, M.-A., Bacchelli, A.: Work practices and challenges in pull-based development: the contributor’s perspective. In: Proceedings of the 38th International Conference on Software Engineering. ICSE ’16, pp. 285–

  20. [20]

    https: //doi.org/10.1145/2884781.2884826 .https://doi.org/10.1145/2884781.2884826

    Association for Computing Machinery, New York, NY, USA (2016). https: //doi.org/10.1145/2884781.2884826 .https://doi.org/10.1145/2884781.2884826

  21. [21]

    In: Proceedings of the 30th Annual ACM Symposium on Applied Computing

    Soares, D.M., Lima J´ unior, M.L., Murta, L., Plastino, A.: Acceptance factors of pull requests in open-source projects. In: Proceedings of the 30th Annual ACM Symposium on Applied Computing. SAC ’15, pp. 1541–1546. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/ 2695664.2695856 .https://doi.org/10.1145/2695664.2695856

  22. [22]

    In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engi- neering

    Zhu, J., Zhou, M., Mockus, A.: Effectiveness of code contribution: from patch-based to pull-request-based tools. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engi- neering. FSE 2016, pp. 871–882. Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2950290.2950364 . https...

  23. [23]

    Xiao, T., Hata, H., Treude, C., Matsumoto, K.: Generative ai for pull request descriptions: Adoption, impact, and developer interventions. Proc. ACM Softw. Eng.1(FSE) (2024) https://doi.org/10.1145/3643773

  24. [24]

    ACM Press/Addison- Wesley, Reading, MA (1990)

    Rich, C., Waters, R.C.: The Programmer’s Apprentice. ACM Press/Addison- Wesley, Reading, MA (1990)

  25. [25]

    Empirical Software Engineering24(4), 2140–2170 (2019) https://doi.org/10.1007/s10664-019-09696-8

    Zhao, G., Costa, D.A., Zou, Y.: Improving the pull requests review process using learning-to-rank algorithms. Empirical Software Engineering24(4), 2140–2170 (2019) https://doi.org/10.1007/s10664-019-09696-8

  26. [26]

    In: Proceedings of the International Conference on Software and System Processes

    Azeem, M.I., Panichella, S., Di Sorbo, A., Serebrenik, A., Wang, Q.: Action- based recommendation in pull-request development. In: Proceedings of the International Conference on Software and System Processes. ICSSP ’20, pp. 115–

  27. [27]

    https: //doi.org/10.1145/3379177.3388904 .https://doi.org/10.1145/3379177.3388904 45

    Association for Computing Machinery, New York, NY, USA (2020). https: //doi.org/10.1145/3379177.3388904 .https://doi.org/10.1145/3379177.3388904 45

  28. [28]

    In: Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

    Dey, T., Mockus, A.: Effect of technical and social factors on pull request quality for the npm ecosystem. In: Proceedings of the 14th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). ESEM ’20. Association for Computing Machin- ery, New York, NY, USA (2020). https://doi.org/10.1145/3382494.3410685 . https://doi....

  29. [29]

    arXiv preprint arXiv:2402.15943 (2024)

    Hassan, A.E., Lin, D., Rajbahadur, G.K., Gallaba, K., Cogo, F.R., Chen, B., Zhang, H., Thangarajah, K., Oliva, G.A., Lin, J., Abdullah, W.M., Jiang, Z.M.: Rethinking software engineering in the foundation model era: A curated cata- logue of challenges in the development of trustworthy fmware. arXiv preprint arXiv:2402.15943 (2024)

  30. [30]

    A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023)

  31. [31]

    Journal of the American Medical Informatics Association, 037 (2024)

    Luo, L., Ning, J., Zhao, Y., Wang, Z., Ding, Z., Chen, P., Fu, W., Han, Q., Xu, G., Qiu, Y., et al.: Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks. Journal of the American Medical Informatics Association, 037 (2024)

  32. [32]

    In: Gurevych, I., Miyao, Y

    Howard, J., Ruder, S.: Universal language model fine-tuning for text classifica- tion. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-1031 ...

  33. [33]

    In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023

    Jiang, N., Liu, K., Lutellier, T., Tan, L.: Impact of code language mod- els on automated program repair. In: Proceedings of the 45th Inter- national Conference on Software Engineering. ICSE ’23, pp. 1430–1442. IEEE Press, ??? (2023). https://doi.org/10.1109/ICSE48619.2023.00125 . https://doi.org/10.1109/ICSE48619.2023.00125

  34. [34]

    In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

    Guo, Q., Cao, J., Xie, X., Liu, S., Li, X., Chen, B., Peng, X.: Explor- ing the potential of chatgpt in automated code refinement: An empirical study. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.36...

  35. [35]

    Dataflow analysis-inspired deep learning for efficient vulnerability detection

    Deng, Y., Xia, C.S., Yang, C., Zhang, S.D., Yang, S., Zhang, L.: Large lan- guage models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ICSE ’24, pp. 1–13. Association for Computing Machinery, New York, NY, USA (2024). https://d...

  36. [36]

    arXiv preprint arXiv:2308.10620 (2023)

    Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., Wang, H.: Large language models for software engineering: A systematic literature review. arXiv preprint arXiv:2308.10620 (2023)

  37. [37]

    In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp

    Ju, J., Yu, L., Li, X., Yang, L., Zuo, C.: Llama-reviewer: Advancing code review automation with large language models through parameter-efficient fine- tuning. In: 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp. 647–658. IEEE, ??? (2023)

  38. [38]

    arXiv preprint arXiv:2305.00418 (2023)

    Siddiq, M.L., Santos, J., Tanvir, R.H., Ulfat, N., Rifat, F.A., Lopes, V.C.: Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418 (2023)

  39. [39]

    arXiv preprint arXiv:2402.13456 (2024)

    Tufano, R., Mastropaolo, A., Pepe, F., Dabi´ c, O., Di Penta, M., Bavota, G.: Unveiling chatgpt’s usage in open source projects: A mining-based study. arXiv preprint arXiv:2402.13456 (2024). Paper accepted for publication at 21st International Conference on Mining Software Repositories (MASR’24)

  40. [40]

    IEEE Trans

    Tufano, R., Dabi´ c, O., Mastropaolo, A., Ciniselli, M., Bavota, G.: Code review automation: Strengths and weaknesses of the state of the art. IEEE Trans. Softw. Eng.50(2), 338–353 (2024) https://doi.org/10.1109/TSE.2023.3348172

  41. [41]

    In: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering

    Tanzil, M.H., Khan, J.Y., Uddin, G.: Chatgpt incorrectness detection in software reviews. In: Proceedings of the IEEE/ACM 46th International Confer- ence on Software Engineering. ICSE ’24. Association for Computing Machin- ery, New York, NY, USA (2024). https://doi.org/10.1145/3597503.3639194 . https://doi.org/10.1145/3597503.3639194

  42. [42]

    arXiv preprint arXiv:2506.04418 (2025)

    Nashid, N., Ding, D., Gallaba, K., Hassan, A.E., Mesbah, A.: Characterizing multi-hunk patches: Divergence, proximity, and llm repair challenges. arXiv preprint arXiv:2506.04418 (2025)

  43. [43]

    Zenodo (2023) https://doi.org/10.5281/zenodo.8304091

    Xiao, T., Treude, C., Hata, H., Matsumoto, K.: Devgpt: Studying developer- chatgpt conversations. Zenodo (2023) https://doi.org/10.5281/zenodo.8304091

  44. [44]

    In: 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp

    Li, S., Cheng, Y., Chen, J., Xuan, J., He, S., Shang, W.: Assessing the per- formance of ai-generated code: A case study on github copilot. In: 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp. 216–227 (2024). https://doi.org/10.1109/ISSRE62328.2024.00030

  45. [45]

    https://doi

    Ogenrwot, D., Businge, J.: Replication Package for PatchTrack: A Comprehen- sive Analysis of ChatGPT’s Influence on Pull Request Outcomes. https://doi. org/10.5281/zenodo.14978624 . https://doi.org/10.5281/zenodo.14978624

  46. [46]

    GitHub: Online Appendix. GitHub. https://www.gnu.org/software/diffutils/ 47 manual/html node/Hunks.html

  47. [47]

    https://chatgpt.com/share/ 8cb16814-2855-4fbd-87e5-bde8ba349728

    GitHub pull request (2023). https://chatgpt.com/share/ 8cb16814-2855-4fbd-87e5-bde8ba349728

  48. [48]

    GitHub pull request (2023). https://github.com/faker-js/faker/pull/2405

  49. [49]

    Ehsani, R., Pathak, S., Chatterjee, P.: Towards detecting prompt knowledge gaps for improved LLM-guided issue resolution. In: Proceedings of the 22nd International Conference on Mining Software Repositories (MSR 2025). ACM, Ottawa, Canada (2025). To appear

  50. [50]

    GitHub: GitHub REST API Documentation. GitHub. https://docs.github.com/en/rest?apiVersion=2022-11-28

  51. [51]

    OpenAI: Terms of Use. OpenAI. https://openai.com/policies/terms-of-use

  52. [52]

    Gale, N.K., Heath, G., Cameron, E., Rashid, S., Redwood, S.: Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Medical Research Methodology 13(1), 117 (2013). https://doi.org/10.1186/1471-2288-13-117

  53. [53]

    Weeraddana, N.R., Xu, X., Alfadel, M., et al.: An empirical comparison of ethnic and gender diversity of DevOps and non-DevOps contributions to open-source projects. Empirical Software Engineering 28, 150 (2023). https://doi.org/10.1007/s10664-023-10394-9. Accepted: 11 September 2023

  54. [54]

    Wang, L., Zheng, Z., Wu, X., Sang, B., Zhang, J., Tao, X.: Fork entropy: Assessing the diversity of open source software projects’ forks. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 204–216 (2023). https://doi.org/10.1109/ASE56229.2023.00168

  55. [55]

    Businge, J., Decan, A., Zerouali, A., Mens, T., Demeyer, S., De Roover, C.: Variant forks – motivations and impediments. In: Proceedings of the 29th Edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, pp. 867–877. IEEE Computer Society (2022). https://doi.org/10.1109/SANER53432.2022.00105

  56. [56]

    GitHub: The State of Open Source: Octoverse 2024 (2024). https://github.blog/news-insights/octoverse/octoverse-2024/

  57. [57]

    Jang, J., Agrawal, A., Brumley, D.: ReDeBug: Finding unpatched code clones in entire OS distributions. In: 2012 IEEE Symposium on Security and Privacy, pp. 48–62 (2012). https://doi.org/10.1109/SP.2012.13

  58. [58]

    GitHub pull request (2024). https://github.com/pokt-network/poktroll/pull/185

  59. [59]

    GitHub pull request (2024). https://github.com/Mudlet/Mudlet/pull/7123

  60. [60]

    GitHub pull request (2024). https://github.com/nylas/nylas-python/pull/279

  61. [61]

    GitHub pull request (2024). https://github.com/alshedivat/al-folio/pull/2059

  62. [62]

    GitHub pull request (2023). https://github.com/laravel-json-api/core/pull/12

  63. [63]

    GitHub pull request (2023). https://github.com/ory/elements/pull/171

  64. [64]

    GitHub pull request (2023). https://github.com/darklang/dark/pull/5063

  65. [65]

    GitHub pull request (2023). https://github.com/sveltejs/learn.svelte.dev/pull/522

  66. [66]

    GitHub pull request (2023). https://github.com/darklang/dark/pull/5058

  67. [67]

    GitHub pull request (2023). https://github.com/Bananapus/nana-core/pull/37

  68. [68]

    Moumoula, M., Kabore, A., Klein, J., Bissyandé, T.: Cross-lingual code clone detection: When LLMs fall short against embedding-based classifier. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

  69. [69]

    Ouédraogo, W., Kabore, K., Tian, H., Song, Y., Koyuncu, A., Klein, J., Lo, D., Bissyandé, T.: LLMs and prompting for unit test generation: A large-scale evaluation. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE) (2024)

  70. [70]

    Chen, D., Liu, Y., Zhou, M., Zhao, Y., Wang, S., Wang, X., Chen, X., Bissyandé, T., Klein, J.: LLM for mobile: An initial roadmap. ACM Transactions on Software Engineering and Methodology 33(5), 1–44 (2024)

  71. [71]

    GitHub pull request (2023). https://github.com/codecrafters-io/frontend/pull/1061

  72. [72]

    GitHub pull request (2023). https://github.com/faker-js/faker/pull/2230

  73. [73]

    GitHub pull request (2023). https://github.com/digitalbitbox/bitbox-wallet-app/pull/2415

  74. [74]

    GitHub pull request (2023). https://github.com/darklang/dark/pull/5068

  75. [75]

    GitHub pull request (2024). https://github.com/gemini-hlsw/scheduler/pull/428

  76. [76]

    GitHub pull request (2024). https://github.com/theosanderson/taxonium/pull/534

  77. [77]

    GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2214

  78. [78]

    GitHub pull request (2024). https://github.com/open-learning-exchange/myplanet/pull/2212

  79. [79]

    GitHub pull request (2024). https://github.com/labdao/plex/pull/468

  80. [80]

    GitHub pull request (2024). https://github.com/plausible/analytics/pull/3792
