How Do Developers Maintain and Evolve Their Agents' Instructions? An Empirical Study

Alfonso Cannavale; Andrea De Lucia; Fabio Palomba; Gemma Catolino; Gianmario Voria; Yutaro Kashiwa

arxiv: 2606.25257 · v1 · pith:H4RKABVBnew · submitted 2026-06-24 · 💻 cs.SE

Pith reviewed 2026-06-25 21:37 UTC · model grok-4.3

classification 💻 cs.SE

keywords Agent Context Files · autonomous coding agents · software maintenance · empirical mining study · code quality metrics · temporal patterns · ACF evolution · mining software repositories

The pith

Developers evolve agent instructions through changes classifiable by maintenance theory and tied to code quality

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper outlines plans for an empirical mining study on how developers maintain and evolve Agent Context Files, the structured instructions that direct autonomous coding agents in software tasks. It intends to build a taxonomy of ACF changes drawn from software maintenance theory, test statistical associations between change categories and code quality metrics, and map when these changes occur across the development lifecycle. The work targets practical challenges in governing AI-assisted coding by reconstructing evolution histories from public repository commits. A sympathetic reader would care because effective management of these files could improve traceability and control as agents handle more of the engineering workload.

Core claim

The authors plan to classify changes to Agent Context Files using a taxonomy grounded in software maintenance theory, statistically analyze associations between change types and code quality outcomes, and examine temporal patterns of these changes across the agent-driven development lifecycle.

What carries the argument

Taxonomy of ACF changes grounded in software maintenance theory, applied through commit-level reconstruction of file histories and statistical linking to code quality metrics.
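One way to picture the taxonomy is as a small labelled record per ACF-changing commit. The sketch below assumes Swanson's three classic maintenance categories (corrective, adaptive, perfective) as starting labels. The paper does not state its final categories, and the class names, field names, and the `AGENTS.md` path are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class MaintenanceCategory(Enum):
    # Swanson's classic categories; the study's final taxonomy may differ.
    CORRECTIVE = "corrective"
    ADAPTIVE = "adaptive"
    PERFECTIVE = "perfective"


@dataclass(frozen=True)
class AcfChange:
    # One manually labelled change to an Agent Context File (hypothetical schema).
    commit_sha: str
    acf_path: str
    category: MaintenanceCategory


def category_counts(changes):
    """Count how many labelled ACF changes fall into each category."""
    return Counter(change.category for change in changes)


# Made-up records, only to show the shape of the data.
changes = [
    AcfChange("a1", "AGENTS.md", MaintenanceCategory.CORRECTIVE),
    AcfChange("b2", "AGENTS.md", MaintenanceCategory.PERFECTIVE),
    AcfChange("c3", "AGENTS.md", MaintenanceCategory.PERFECTIVE),
]
counts = category_counts(changes)
```

Keeping the label as an enum rather than free text makes later grouping and statistical comparison across categories straightforward.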

If this is right

ACF changes will be grouped into distinct maintenance categories with measurable differences.
Statistical analysis will identify which change types associate with specific code quality outcomes.
Temporal mapping will reveal when ACF updates typically occur in the project lifecycle.
Findings will support recommendations for designing ACFs to better govern autonomous agents.
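The temporal mapping in the third point could be approximated by normalizing each ACF-changing commit to its position in the project's lifetime and binning it into phases. This is a minimal sketch under that assumption; the three-phase split and the function names are illustrative and not taken from the paper.

```python
from datetime import datetime


def lifecycle_position(commit_time, first_commit, last_commit):
    """Return the commit's position in the project lifetime, from 0.0 to 1.0."""
    span = (last_commit - first_commit).total_seconds()
    if span <= 0:
        return 0.0
    return (commit_time - first_commit).total_seconds() / span


def lifecycle_phase(position, n_phases=3):
    """Map a 0.0-1.0 position to a phase index in [0, n_phases - 1]."""
    return min(int(position * n_phases), n_phases - 1)


# Made-up dates: an ACF update one month into a one-year project history.
first = datetime(2025, 1, 1)
last = datetime(2025, 12, 31)
acf_update = datetime(2025, 2, 1)
pos = lifecycle_position(acf_update, first, last)
phase = lifecycle_phase(pos)
```

Normalizing by project lifetime lets repositories of very different ages be compared on the same scale, at the cost of treating an ongoing project's latest commit as its end.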

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If consistent patterns appear, maintaining ACFs may emerge as a distinct developer skill alongside traditional code maintenance.
The taxonomy approach could extend to instruction files for AI agents in domains beyond software coding.
Automated detection of needed ACF updates based on code changes becomes a testable next step.

Load-bearing premise

Enough public repositories contain both identifiable Agent Context Files and agent-generated commits to support large-scale reconstruction, qualitative classification, and statistical analysis.

What would settle it

Discovery of too few qualifying repositories or failure to find statistically significant differences in code quality metrics across the maintenance categories would prevent the planned associations from being established.
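A significance test alone says little about how large a difference between categories is, so an effect size would normally accompany it. The sketch below computes Cliff's delta, a nonparametric effect size, for a code quality metric observed after two kinds of ACF change. The choice of Cliff's delta is an assumption, since the abstract does not name its statistics, and the sample values are made up.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical complexity changes following two categories of ACF change.
corrective = [1, 2, 3]
perfective = [2, 3, 4]
delta = cliffs_delta(corrective, perfective)
```

Values near 0 indicate heavily overlapping groups, while values near -1 or 1 indicate that one group almost always exceeds the other.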

Original abstract

Context. Autonomous coding agents are increasingly used in software development, shifting parts of the engineering process to AI assistance. While this automation brings clear benefits, it introduces challenges in governance, traceability, and control over agent behavior. Agent Context Files (ACFs) have emerged as a practical mechanism to guide agents through structured instructions, yet little is known about how these artifacts are maintained and how their evolution relates to code development. Objective. This paper plans to investigate the evolution of ACFs and their role in agent-driven development. Specifically, we (1) classify ACF changes through a taxonomy grounded in software maintenance theory, (2) analyze how different types of changes are associated with code quality outcomes, and (3) examine their temporal patterns across the development lifecycle. Method. We conduct a large-scale mining study combining repositories with ACFs and agent-generated commits. We reconstruct ACF evolution at the commit level, classify changes using a qualitative approach, and analyze their association with code quality metrics. Statistical analyses and hypotheses are used to evaluate differences across maintenance categories, to inform future design of ACFs for governing autonomous coding agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a research proposal outlining plans for a mining study on Agent Context Files rather than a completed empirical paper with data or findings.

The core point is that this document lays out a plan to mine public repositories for how Agent Context Files evolve in autonomous coding agent projects, classify the changes against an existing maintenance taxonomy, and test links to code quality metrics. No actual repositories, classifications, or statistical results are included.

The proposal does a clean job stating three focused objectives and grounding the taxonomy in prior software maintenance work. The method sketch—commit-level reconstruction followed by qualitative coding and then quantitative association tests—follows standard empirical software engineering patterns without unnecessary complication.
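The commit-level reconstruction step can be illustrated with plain `git log`. The sketch below assumes a locally cloned repository and a single ACF path. The helper names are hypothetical and the paper does not describe its tooling; only the `git log` options used here (`--follow`, `--format`) are standard.

```python
import subprocess

# Tab-separated fields: commit hash, author date (strict ISO 8601), subject.
LOG_FORMAT = "%H%x09%aI%x09%s"


def parse_log(output):
    """Parse `git log` output in LOG_FORMAT into (sha, date, subject) tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, date, subject = line.split("\t", 2)
        rows.append((sha, date, subject))
    return rows


def acf_history(repo_dir, acf_path):
    """List commits touching acf_path, oldest first, following renames."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--follow",
         f"--format={LOG_FORMAT}", "--", acf_path],
        capture_output=True, text=True, check=True,
    )
    # git prints newest first; reverse here to get chronological order.
    return parse_log(result.stdout)[::-1]


# Parsing a captured sample line (made-up values) without touching a repo.
sample = "abc123\t2025-01-02T03:04:05+00:00\tUpdate agent rules\n"
rows = parse_log(sample)
```

A call such as `acf_history(".", "AGENTS.md")` would return the ordered history for that file, where the path is only an example of what an ACF might be called.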

The main limitation is that the whole effort hinges on locating a usable volume of public repos that contain both ACFs and commits clearly attributable to agents. The text gives no preliminary counts, search strategy details, or evidence that such data exists at scale, so the planned associations and temporal analyses remain untested assumptions. Attribution noise between agent and human commits is also left unaddressed.

This is relevant to researchers studying AI-assisted development practices and governance. Readers looking for new empirical categories or validated associations will not find them here. The thinking is direct and the objectives are realistic in principle, but the absence of any executed analysis means there is nothing yet to review for soundness or reproducibility.

I would not recommend sending this for full peer review in its current form. It fits better as a workshop proposal or registered report outline once data collection begins.

Referee Report

2 major / 1 minor

Summary. The manuscript outlines a planned empirical mining study on the maintenance and evolution of Agent Context Files (ACFs) that guide autonomous coding agents. It proposes three objectives: (1) classifying ACF changes via a taxonomy grounded in software maintenance theory, (2) analyzing associations between change types and code quality outcomes, and (3) examining temporal patterns across the development lifecycle. The method sketch involves identifying public repositories containing ACFs and agent-generated commits, reconstructing evolution at the commit level, performing qualitative classification, and applying statistical analyses and hypotheses.

Significance. The topic is timely as AI coding agents become more prevalent and ACFs emerge as a governance mechanism. If the planned study were executed with adequate data and yielded reproducible findings, it could inform best practices for ACF design and traceability in agent-driven development. However, the current manuscript contains no data, results, completed analysis, or validation of the core assumptions, so its significance cannot be assessed.

major comments (2)

[Abstract] Abstract and Method: The manuscript presents only a forward-looking research plan and method sketch. No repositories are mined, no ACF changes are classified, and no statistical associations with code quality metrics are computed or reported. Consequently, none of the three stated objectives can be evaluated or supported.
[Method] Method: All three objectives rest on the unverified premise that a sufficient number of public repositories exist containing both ACFs and commits reliably attributable to autonomous agents (via metadata or message patterns). No evidence, pilot data, or feasibility assessment is provided to substantiate this data-availability assumption, which is load-bearing for the entire study design.
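The attribution premise in the second comment can be made concrete with a simple heuristic over commit metadata and message patterns. The markers below are illustrative assumptions, not a validated list and not the authors' method. Any real study would need to measure false positives and false negatives against manually checked commits.

```python
import re

# Illustrative markers only; a real study would need a validated list.
AGENT_MARKERS = (
    re.compile(r"^co-authored-by:.*\b(bot|agent)\b", re.IGNORECASE | re.MULTILINE),
    re.compile(r"generated (with|by) .*\b(ai|agent)\b", re.IGNORECASE),
)


def looks_agent_generated(author_name, message, markers=AGENT_MARKERS):
    """Heuristically flag a commit as agent-generated from its metadata."""
    # Bot accounts on some hosting platforms carry a "[bot]" suffix.
    if author_name.lower().endswith("[bot]"):
        return True
    return any(pattern.search(message) for pattern in markers)


# Made-up commits, only to exercise the heuristic.
by_author = looks_agent_generated("example-agent[bot]", "Fix typo")
by_trailer = looks_agent_generated(
    "Jane", "Fix bug\n\nCo-authored-by: Example Agent <agent@example.com>"
)
human = looks_agent_generated("Jane", "Refactor parser")
```

A heuristic like this will also flag dependency-update bots and miss agents that leave no trace in the commit, which is exactly the attribution noise a feasibility assessment would need to quantify.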

minor comments (1)

The abstract and method description alternate between forward-looking phrasing ('plans to') and the present tense ('we conduct', 'we reconstruct'), which may mislead readers expecting a completed empirical paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the review of our manuscript, which describes a planned empirical mining study on the maintenance and evolution of Agent Context Files (ACFs). We acknowledge that the submission is a research plan rather than a completed study and respond to the major comments below.

Referee: [Abstract] Abstract and Method: The manuscript presents only a forward-looking research plan and method sketch. No repositories are mined, no ACF changes are classified, and no statistical associations with code quality metrics are computed or reported. Consequently, none of the three stated objectives can be evaluated or supported.

Authors: We agree that the manuscript is a forward-looking research plan and contains no executed analyses, mined data, classifications, or statistical results. This is by design: the abstract and method section explicitly frame the work as a proposal to investigate the three objectives using the described approach. The contribution is the study design, including the taxonomy grounded in maintenance theory, rather than empirical outcomes. We do not claim to have completed or validated the objectives.

revision: no

Referee: [Method] Method: All three objectives rest on the unverified premise that a sufficient number of public repositories exist containing both ACFs and commits reliably attributable to autonomous agents (via metadata or message patterns). No evidence, pilot data, or feasibility assessment is provided to substantiate this data-availability assumption, which is load-bearing for the entire study design.

Authors: We agree that the manuscript provides no pilot data, evidence, or feasibility assessment regarding the availability of suitable public repositories. This is a substantive limitation for evaluating the study's practicality. As the work is a planned study that has not yet been executed, we cannot supply such data in the current submission. We are willing to revise the method section to include a brief discussion of candidate data sources based on preliminary repository searches, though this would not constitute a full pilot study.

revision: partial

Circularity Check

0 steps flagged

No circularity in empirical study proposal

The paper is a forward-looking empirical study proposal with three objectives: taxonomy classification of ACF changes, statistical association with code quality metrics, and temporal pattern analysis. It contains no equations, fitted parameters, predictions, or derivations that could reduce to inputs by construction. No self-citations are invoked as load-bearing premises, and the method relies on external repository mining and qualitative analysis independent of the paper's claims. This is the expected outcome for a non-derivational mining study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the domain assumption that software maintenance theory supplies a useful taxonomy for ACF changes and that mining repositories with ACFs and agent commits is feasible; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Software maintenance theory provides a suitable taxonomy for classifying changes to Agent Context Files. Invoked directly in objective (1) of the study plan.

pith-pipeline@v0.9.1-grok · 5742 in / 1255 out tokens · 20853 ms · 2026-06-25T21:37:52.421232+00:00 · methodology

Reference graph

Works this paper leans on

29 extracted references · 2 linked inside Pith

[1]

Agentic software engineering: Foundational pillars and a research roadmap,

A. E. Hassan, H. Li, D. Lin, B. Adams, T.-H. Chen, Y. Kashiwa, and D. Qiu, “Agentic software engineering: Foundational pillars and a research roadmap,” 2025. [Online]. Available: https://arxiv.org/abs/2509.06216

Pith/arXiv arXiv 2025
[2]

The rise of ai teammates in software engineering (se) 3.0: How autonomous coding agents are reshaping software engineering,

H. Li, H. Zhang, and A. E. Hassan, “The rise of ai teammates in software engineering (se) 3.0: How autonomous coding agents are reshaping software engineering,” 2025. [Online]. Available: https://arxiv.org/abs/2507.15003

Pith/arXiv arXiv 2025
[3]

Large language models for software engineering: A systematic literature review,

X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, and H. Wang, “Large language models for software engineering: A systematic literature review,” ACM Transactions on Software Engineering and Methodology, vol. 33, no. 8, pp. 1–79, 2024

2024
[4]

Agentic ai for software: thoughts from software engineering community,

A. Roychoudhury, “Agentic ai for software: thoughts from software engineering community,” 2025. [Online]. Available: https://arxiv.org/abs/2508.17343

arXiv 2025
[5]

The current challenges of software engineering in the era of large language models,

C. Gao, X. Hu, S. Gao, X. Xia, and Z. Jin, “The current challenges of software engineering in the era of large language models,” ACM Transactions on Software Engineering and Methodology, vol. 34, pp. 1–30, 2024

2024
[6]

Requirements development and formalization for reliable code generation: A multi-agent vision,

X. Lu, W. Sun, Y. Zhang, M. Hu, C. Tian, Z. Jin, and Y. Liu, “Requirements development and formalization for reliable code generation: A multi-agent vision,” Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 3932–3937, 2025

2025
[7]

Echoes of ai: Investigating the downstream effects of ai assistants on software maintainability,

M. Borg, D. Hewett, N. Hagatulah, N. Couderc, E. Söderberg, D. Graham, U. Kini, and D. Farley, “Echoes of ai: Investigating the downstream effects of ai assistants on software maintainability,” 2026. [Online]. Available: https://arxiv.org/abs/2507.00788

arXiv 2026
[8]

On the use of agentic coding manifests: An empirical study of claude code,

W. Chatlatanagulchai, K. Thonglek, B. Reid, Y. Kashiwa, P. Leelaprute, A. Rungsawang, B. Manaskasemsak, and H. Iida, “On the use of agentic coding manifests: An empirical study of claude code,” in Proceedings of the 27th International Conference on Product-Focused Software Process Improvement (PROFES). Springer, 2025, pp. 543–551

2025
[9]

Agent readmes: An empirical study of context files for agentic coding,

W. Chatlatanagulchai, H. Li, Y. Kashiwa, B. Reid, K. Thonglek, P. Leelaprute, A. Rungsawang, B. Manaskasemsak, B. Adams, A. E. Hassan, and H. Iida, “Agent readmes: An empirical study of context files for agentic coding,” 2025. [Online]. Available: https://arxiv.org/abs/2511.12884

arXiv 2025
[10]

Context engineering for ai agents in open-source software,

S. Mohsenimofidi, M. Galster, C. Treude, and S. Baltes, “Context engineering for ai agents in open-source software,” 2026. [Online]. Available: https://arxiv.org/abs/2510.21413

arXiv 2026
[11]

On the impacts of contexts on repository-level code generation,

N. Le Hai, D. M. Nguyen, and N. D. Bui, “On the impacts of contexts on repository-level code generation,” in Findings of the Association for Computational Linguistics: NAACL 2025, 2025, pp. 1496–1524

2025
[12]

Evaluating software development agents: Patch patterns, code quality, and issue complexity in real-world github scenarios,

Z. Chen and L. Jiang, “Evaluating software development agents: Patch patterns, code quality, and issue complexity in real-world github scenarios,” in 2025 IEEE international conference on software analysis, evolution and reengineering (SANER). IEEE, 2025, pp. 657–668

2025
[13]

Agents in software engineering: Survey, landscape, and vision,

Y. Wang, W. Zhong, Y. Huang, E. Shi, M. Yang, J. Chen, H. Li, Y. Ma, Q. Wang, and Z. Zheng, “Agents in software engineering: Survey, landscape, and vision,” Automated Software Engineering, vol. 32, no. 2, p. 70, 2025

2025
[14]

Repairagent: An autonomous, llm-based agent for program repair,

I. Bouzenia, P. Devanbu, and M. Pradel, “Repairagent: An autonomous, llm-based agent for program repair,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2025, pp. 2188–2200

2025
[15]

Towards autonomous testing agents via conversational large language models,

R. Feldt, S. Kang, J. Yoon, and S. Yoo, “Towards autonomous testing agents via conversational large language models,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2023, pp. 1688–1693

2023
[16]

Agentic refactoring: An empirical study of ai coding agents,

K. Horikawa, H. Li, Y. Kashiwa, B. Adams, H. Iida, and A. E. Hassan, “Agentic refactoring: An empirical study of ai coding agents,” 2025. [Online]. Available: https://arxiv.org/abs/2511.04824

arXiv 2025
[17]

Are llms correctly integrated into software systems?

Y. Shao, Y. Huang, J. Shen, L. Ma, T. Su, and C. Wan, “Are llms correctly integrated into software systems?” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2025, pp. 1178–1190

2025
[18]

Guide to the software engineering body of knowledge,

H. Washizaki, “Guide to the software engineering body of knowledge,” IEEE Computer Society, 2024

2024
[19]

What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes,

A. Trautsch, J. Erbel, S. Herbold, and J. Grabowski, “What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes,” Empirical Software Engineering, vol. 28, no. 2, p. 30, 2023

2023
[20]

Repository-level prompt generation for large language models of code,

D. Shrivastava, H. Larochelle, and D. Tarlow, “Repository-level prompt generation for large language models of code,” in International Conference on Machine Learning. PMLR, 2023, pp. 31693–31715

2023
[21]

Can llms generate higher quality code than humans? an empirical study,

M. T. Jamil, S. Abid, and S. Shamail, “Can llms generate higher quality code than humans? an empirical study,” in 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). IEEE, 2025, pp. 478–489

2025
[22]

Assessing the quality and security of ai-generated code: A quantitative analysis,

A. Sabra, O. Schmitt, and J. Tyler, “Assessing the quality and security of ai-generated code: A quantitative analysis,” 2025. [Online]. Available: https://arxiv.org/abs/2508.14727

arXiv 2025
[23]

The dimensions of maintenance,

E. B. Swanson, “The dimensions of maintenance,” in Proceedings of the 2nd international conference on Software engineering, 1976, pp. 492–497

1976
[24]

Corrective commit probability: a measure of the effort invested in bug fixing,

I. Amit and D. G. Feitelson, “Corrective commit probability: a measure of the effort invested in bug fixing,” Software Quality Journal, vol. 29, no. 4, pp. 817–861, 2021

2021
[25]

A complexity measure,

T. J. McCabe, “A complexity measure,” IEEE Transactions on software Engineering, no. 4, pp. 308–320, 1976

1976
[26]

Toward methodological guidelines for process theories and taxonomies in software engineering,

P. Ralph, “Toward methodological guidelines for process theories and taxonomies in software engineering,” IEEE Transactions on Software Engineering, vol. 45, no. 7, pp. 712–735, 2018

2018
[27]

Taxonomies in software engineering: A systematic mapping study and a revised taxonomy development method,

M. Usman, R. Britto, J. Börstler, and E. Mendes, “Taxonomies in software engineering: A systematic mapping study and a revised taxonomy development method,” Information and Software Technology, vol. 85, pp. 43–59, 2017

2017
[28]

Robust statistical methods for empirical software engineering,

B. Kitchenham, L. Madeyski, D. Budgen, J. Keung, P. Brereton, S. Charters, S. Gibbs, and A. Pohthong, “Robust statistical methods for empirical software engineering,” Empirical Software Engineering, vol. 22, no. 2, pp. 579–630, 2017

2017
[29]

Significance tests and goodness of fit in the analysis of covariance structures

P. M. Bentler and D. G. Bonett, “Significance tests and goodness of fit in the analysis of covariance structures.” Psychological bulletin, vol. 88, no. 3, p. 588, 1980

1980
