arxiv: 2512.24635 · v2 · submitted 2025-12-31 · 💻 cs.SE · cs.AI

DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information

Zhili Huang, Ling Xu, Chao Liu, Weifeng Sun, Xu Zhang, Yan Lei, Meng Yan, Hongyu Zhang

Pith reviewed 2026-05-16 19:21 UTC · model grok-4.3

classification: cs.SE, cs.AI

keywords: automated program repair, large language models, dynamic analysis, runtime feedback, iterative repair, Defects4J, patch generation

The pith

DynaFix iteratively injects runtime variable states and control flows into LLM prompts to repair more bugs than static or single-shot methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DynaFix as an automated repair technique that repeatedly runs the buggy program and its patched variants to extract detailed execution traces. These traces, including variable values, executed paths, and call stacks, are turned into structured prompts that steer a language model toward new candidate patches. Failed patches trigger fresh executions whose updated information feeds the next round, creating a feedback loop that refines suggestions over time. The approach is evaluated on the Defects4J benchmarks, where it produces correct fixes for 186 single-function bugs. A reader would care because such dynamic guidance could shrink the manual debugging burden that currently dominates software maintenance.
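The kind of runtime information described above can be illustrated with a small tracer. This is a minimal Python sketch, not the paper's instrumentation: DynaFix is evaluated on Java programs from Defects4J, and this page does not describe how it collects traces. The names `capture_trace`, `render_prompt_section`, and `buggy_average` are hypothetical, and call stacks are omitted for brevity. The sketch records each executed line with the local variable values at that point, then renders the result as a prompt section.

```python
import sys
from typing import Any, Callable


def capture_trace(func: Callable[..., Any], *args: Any) -> dict:
    """Run func(*args); record executed lines, local variables, and the outcome."""
    code = func.__code__
    first_line = code.co_firstlineno
    steps: list = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames that belong to other functions
        if event == "line":
            steps.append({
                # line offset relative to the `def` line of the traced function
                "line": frame.f_lineno - first_line,
                # variable values just before this line executes
                "locals": {k: repr(v) for k, v in frame.f_locals.items()},
            })
        return tracer

    outcome = {"result": None, "exception": None}
    previous = sys.gettrace()  # restore any tracer that was already installed
    sys.settrace(tracer)
    try:
        outcome["result"] = repr(func(*args))
    except Exception as exc:
        outcome["exception"] = f"{type(exc).__name__}: {exc}"
    finally:
        sys.settrace(previous)
    return {"steps": steps, **outcome}


def render_prompt_section(trace: dict) -> str:
    """Turn a captured trace into a structured text block for a repair prompt."""
    lines = ["### Execution trace (failing input)"]
    for i, step in enumerate(trace["steps"], start=1):
        state = ", ".join(f"{k}={v}" for k, v in step["locals"].items())
        lines.append(f"{i}. line +{step['line']}: {state}")
    lines.append(f"### Outcome: {trace['exception'] or trace['result']}")
    return "\n".join(lines)


def buggy_average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # hypothetical bug: off-by-one divisor


if __name__ == "__main__":
    print(render_prompt_section(capture_trace(buggy_average, [4])))
```

A pass/fail signal would only report that this input fails. The trace additionally shows that `total` holds the expected sum when the `return` line is reached, which points at the divisor as the faulty part.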

Core claim

DynaFix repairs 186 single-function bugs on Defects4J v1.2 and v2.0, a 10 percent gain over prior state-of-the-art baselines, and succeeds on 38 bugs that earlier techniques left unfixed. It reaches correct patches in at most 35 attempts while shrinking the explored patch space by 70 percent relative to existing methods. The method works by capturing execution-level dynamic information such as variable states, control-flow paths, and call stacks in each round, converting them into structured prompts that guide large language models to generate and iteratively improve candidate patches.

What carries the argument

The iterative execution loop that re-runs the program after each failed patch attempt, extracts fresh variable states and control-flow information, and injects it as structured prompts for the next LLM generation round.
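The loop described above can be sketched as follows. This is a hedged illustration under stated assumptions, not DynaFix's implementation: `iterative_repair`, `propose_patch`, `run_tests`, and the toy data are all hypothetical, the language model is replaced by a scripted stand-in, and the choice to refine the most recent failed patch (instead of returning to the original code) is an assumption. Only the 35-attempt bound comes from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Validation:
    passed: bool
    trace: str  # rendered runtime information from executing the candidate


@dataclass
class RepairResult:
    patch: Optional[str]
    attempts: int
    history: list = field(default_factory=list)


def iterative_repair(
    buggy_code: str,
    propose_patch: Callable[[str], str],     # hypothetical stand-in for an LLM call
    run_tests: Callable[[str], Validation],  # hypothetical: executes code, collects a trace
    max_attempts: int = 35,
) -> RepairResult:
    current = buggy_code
    validation = run_tests(current)  # first execution of the unpatched program
    history = []
    for attempt in range(1, max_attempts + 1):
        prompt = (
            f"Code:\n{current}\n\n"
            f"Runtime information:\n{validation.trace}\n\n"
            "Return a fixed version."
        )
        candidate = propose_patch(prompt)
        history.append(candidate)
        validation = run_tests(candidate)  # re-execute to obtain fresh feedback
        if validation.passed:
            return RepairResult(candidate, attempt, history)
        current = candidate  # assumption: the next round refines the failed patch
    return RepairResult(None, max_attempts, history)


# --- toy usage with a scripted "model" -------------------------------------
BUGGY = "def average(xs):\n    return sum(xs) / (len(xs) - 1)\n"


def toy_run_tests(code: str) -> Validation:
    namespace: dict = {}
    try:
        exec(code, namespace)
        got = namespace["average"]([2, 4])
        return Validation(got == 3, f"average([2, 4]) returned {got!r}, expected 3")
    except Exception as exc:
        return Validation(False, f"average([2, 4]) raised {type(exc).__name__}: {exc}")


scripted = iter([
    "def average(xs):\n    return sum(xs) / (len(xs) + 1)\n",  # still wrong
    "def average(xs):\n    return sum(xs) / len(xs)\n",        # correct
])
result = iterative_repair(BUGGY, lambda prompt: next(scripted), toy_run_tests)
```

In this toy run the first candidate fails, its execution feedback is placed in the second prompt, and the second candidate passes, so `result.attempts` is 2.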

If this is right

Bugs that static analysis or one-shot prompting miss become reachable once execution traces accumulate across rounds.
The number of model queries required to reach a valid patch drops because each attempt narrows the remaining possibilities.
Complex control-flow bugs can be addressed by letting the model observe how its changes alter actual execution paths.
Repair pipelines can remain effective without enlarging the underlying language model, because guidance quality improves through iteration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

An IDE plugin could run lightweight executions in the background and surface runtime-guided edit suggestions while a developer is editing.
The same loop might help generate or repair test cases by feeding execution differences back into the model.
Collecting traces across multiple modules could let the method scale beyond single functions if call-stack information is extended accordingly.
The overhead of repeated executions raises a practical limit on how many iterations remain acceptable in time-sensitive repair settings.

Load-bearing premise

That language models will translate structured runtime traces into correct code changes more reliably than they do with static code views or simple pass-or-fail signals.

What would settle it

Apply DynaFix unchanged to a fresh benchmark containing a comparable number of single-function bugs and measure whether the 10 percent repair-rate gain and 70 percent search-space reduction still appear.

Figures

Figures reproduced from arXiv: 2512.24635 by Chao Liu, Hongyu Zhang, Ling Xu, Meng Yan, Weifeng Sun, Xu Zhang, Yan Lei, Zhili Huang.

**Figure 2.** Overview of DynaFix. Step 3 (Section 3.3): The candidate patch generated by the LLM is applied to the source code and validated by running test cases. This automated validation ensures that patch correctness is assessed automatically. Step 4 (Section 3.4): We apply the LPR strategy to iteratively refine patches based on validation results. When a patch fails validation, two cases are distinguished: (…

**Figure 3.** Structure of the hierarchical prompt template. A fixed input–output example is included to enforce …

**Figure 4.** Uniquely repaired bugs on Defects4J.

**Figure 5.** Unique Bug Repairs by DynaFix, Execution-Level Information, Exception Information, and Pure LLM.

**Figure 6.** Impact of Search Depth and Breadth in the LPR Strategy.

**Figure 7.** Comparison of Maximum Patch Attempts per Bug Across APR Approaches.

Original abstract

Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limitations. Most rely solely on static analysis, ignoring runtime behaviors. Some attempt to incorporate dynamic signals, but these are often restricted to training or fine-tuning, or injected only once into the repair prompt, without iterative use. This fails to fully capture program execution. Current iterative repair frameworks typically rely on coarse-grained feedback, such as pass/fail results or exception types, and do not leverage fine-grained execution-level information effectively. As a result, models struggle to simulate human stepwise debugging, limiting their effectiveness in multi-step reasoning and complex bug repair. To address these challenges, we propose DynaFix, an execution-level dynamic information-driven APR method that iteratively leverages runtime information to refine the repair process. In each repair round, DynaFix captures execution-level dynamic information such as variable states, control-flow paths, and call stacks, transforming them into structured prompts to guide LLMs in generating candidate patches. If a patch fails validation, DynaFix re-executes the modified program to collect new execution information for the next attempt. This iterative loop incrementally improves patches based on updated feedback, similar to the stepwise debugging practices of human developers. We evaluate DynaFix on the Defects4J v1.2 and v2.0 benchmarks. DynaFix repairs 186 single-function bugs, a 10% improvement over state-of-the-art baselines, including 38 bugs previously unrepaired. It achieves correct patches within at most 35 attempts, reducing the patch search space by 70% compared with existing methods, thereby demonstrating both effectiveness and efficiency in repairing complex bugs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DynaFix gets a 10% lift on Defects4J single-function bugs by looping fine-grained runtime traces into LLM prompts, but the evaluation does not isolate whether the dynamic details or the iteration itself drives the result.

DynaFix improves on prior LLM-based APR by running an iterative loop that captures variable states, control-flow paths, and call stacks after each failed patch attempt, then feeds that structured data back into the next prompt. It reports 186 fixed bugs on Defects4J v1.2 and v2.0, including 38 that earlier methods missed, with correct patches found in at most 35 attempts and a 70% smaller search space than the baselines it compares against. The method is presented as closer to how developers actually debug than one-shot static prompts or coarse pass/fail feedback loops.

That concrete empirical result on a standard benchmark is the clearest contribution. The paper does a reasonable job stating the motivation and showing the headline numbers without obvious over-claiming in the abstract. The approach is simple enough that others could re-implement the core loop if the prompt templates and execution instrumentation are described clearly in the full text.

The main soft spot is the missing ablation that would keep the same number of iterations and attempt budget but swap the detailed runtime information for only pass/fail or exception-type signals. Without that control, it is hard to know how much of the 10% gain comes from the fine-grained execution data versus simply giving the model more rounds of feedback. The abstract also does not mention statistical tests or exact baseline re-implementations, so those details will need checking. The citation pattern follows the usual recent LLM-APR papers and does not look padded.

This work is aimed at researchers building or evaluating LLM repair tools who already know the Defects4J setup. It is not a foundational shift but gives a practical incremental technique worth testing. I would send it for peer review so the experimental controls and prompt details can be examined properly.

Referee Report

2 major / 2 minor

Summary. The paper introduces DynaFix, an iterative APR technique that captures fine-grained runtime information (variable states, control-flow paths, call stacks) at each step, converts it into structured LLM prompts, and re-executes the program on patch failure to obtain updated dynamic feedback for the next round. Evaluated on Defects4J v1.2 and v2.0, it reports repairing 186 single-function bugs (10% above SOTA baselines, including 38 previously unrepaired), with correct patches found in at most 35 attempts and a 70% reduction in patch search space.

Significance. If the results hold after addressing evaluation gaps, the work would demonstrate a concrete advance in LLM-based APR by showing that iterative, execution-level dynamic signals can outperform both purely static approaches and coarse pass/fail iterative baselines, yielding both higher repair rates on complex bugs and improved efficiency.

major comments (2)

[Evaluation] Evaluation section: the central claim attributes the 10% improvement (186 fixes, 38 new) and 70% search-space reduction to the use of structured dynamic execution information, yet no ablation is reported that preserves the exact iterative loop, attempt budget, and prompt format while replacing variable states/control-flow/call-stack details with only pass/fail or exception-type feedback. Without this control, the observed gains cannot be confidently isolated from iteration volume or prompt engineering.
[Section 4] Section 4 (results): the reported numbers lack details on exact baseline implementations, prompt templates for both DynaFix and comparators, and any statistical significance tests or variance measures across runs. This weakens support for the 10% improvement and 38-new-bug claims on standard Defects4J benchmarks.
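The control requested in the first major comment can be pictured as a set of experiment arms that differ only in how much feedback is rendered. The sketch below is one possible setup under stated assumptions: `FeedbackMode`, `ExecutionRecord`, `ArmConfig`, and the prompt template are hypothetical names, and nesting the three feedback levels is a design choice made here, not something the paper specifies. The attempt budget and prompt scaffolding are held constant across arms.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeedbackMode(Enum):
    PASS_FAIL = "pass_fail"              # coarse: test outcome only
    EXCEPTION_TYPE = "exception_type"    # coarse: outcome plus exception type
    EXECUTION_LEVEL = "execution_level"  # fine-grained: states, path, stack


@dataclass(frozen=True)
class ExecutionRecord:
    passed: bool
    exception_type: Optional[str]
    variable_states: dict
    control_flow: list
    call_stack: list


@dataclass(frozen=True)
class ArmConfig:
    mode: FeedbackMode
    max_attempts: int = 35  # identical budget in every arm
    prompt_template: str = (
        "Code:\n{code}\n\nFeedback:\n{feedback}\n\nReturn a fixed version."
    )


def render_feedback(record: ExecutionRecord, mode: FeedbackMode) -> str:
    """Render the same execution record at the level of detail the arm allows."""
    parts = [f"tests: {'pass' if record.passed else 'fail'}"]
    if mode is not FeedbackMode.PASS_FAIL:
        parts.append(f"exception: {record.exception_type or 'none'}")
    if mode is FeedbackMode.EXECUTION_LEVEL:
        states = ", ".join(f"{k}={v}" for k, v in record.variable_states.items())
        parts.append(f"variables: {states}")
        parts.append("path: " + " -> ".join(str(n) for n in record.control_flow))
        parts.append("stack: " + " > ".join(record.call_stack))
    return "\n".join(parts)


def build_prompt(config: ArmConfig, code: str, record: ExecutionRecord) -> str:
    feedback = render_feedback(record, config.mode)
    return config.prompt_template.format(code=code, feedback=feedback)


# One arm per feedback level; everything except `mode` is shared.
ARMS = [ArmConfig(mode) for mode in FeedbackMode]
```

Because the arms share the loop, budget, and template, any difference in repair counts between them can be attributed to the feedback content and not to iteration volume.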

minor comments (2)

[Abstract] Abstract and §3: the phrase 'at most 35 attempts' should be clarified with the precise stopping criterion and whether the bound is fixed or adaptive across benchmarks.
[Section 4.1] Figure captions and §4.1: ensure all tables reporting repair counts explicitly state the number of runs, random seeds, and how 'previously unrepaired' is defined relative to the cited baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the evaluation and reproducibility.

Referee: [Evaluation] Evaluation section: the central claim attributes the 10% improvement (186 fixes, 38 new) and 70% search-space reduction to the use of structured dynamic execution information, yet no ablation is reported that preserves the exact iterative loop, attempt budget, and prompt format while replacing variable states/control-flow/call-stack details with only pass/fail or exception-type feedback. Without this control, the observed gains cannot be confidently isolated from iteration volume or prompt engineering.

Authors: We agree that an ablation isolating the fine-grained dynamic signals is required. In the revised manuscript we will add a controlled ablation that retains the identical iterative loop, 35-attempt budget, and prompt scaffolding while substituting only pass/fail or exception-type feedback for the variable-state, control-flow, and call-stack details. This will quantify the incremental benefit of the execution-level information. revision: yes
Referee: [Section 4] Section 4 (results): the reported numbers lack details on exact baseline implementations, prompt templates for both DynaFix and comparators, and any statistical significance tests or variance measures across runs. This weakens support for the 10% improvement and 38-new-bug claims on standard Defects4J benchmarks.

Authors: We acknowledge the need for greater transparency. The revised Section 4 will include: (1) precise descriptions of how each baseline was implemented or re-executed, (2) the complete prompt templates used by DynaFix and all comparators, and (3) variance statistics across runs together with significance tests (e.g., McNemar's test) to support the reported 10% improvement and the 38 newly repaired bugs. revision: yes
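McNemar's test, mentioned in the response above, compares two tools on the same set of bugs by looking only at the bugs where their outcomes differ. The following is a minimal sketch of the exact (binomial) two-sided form. The function names and the bug identifiers are hypothetical, and the counts are made up for illustration; they are not results from the paper.

```python
from math import comb


def discordant_counts(fixed_a: set, fixed_b: set) -> tuple:
    """Count bugs fixed by exactly one of the two tools."""
    return len(fixed_a - fixed_b), len(fixed_b - fixed_a)


def mcnemar_exact(only_a: int, only_b: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant bug is equally likely to fall
    on either side, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative data only: hypothetical bug identifiers for two tools.
tool_a = {f"bug-{i}" for i in range(1, 21)}   # fixes bug-1 .. bug-20
tool_b = {f"bug-{i}" for i in range(9, 23)}   # fixes bug-9 .. bug-22
only_a, only_b = discordant_counts(tool_a, tool_b)
p_value = mcnemar_exact(only_a, only_b)
```

With these made-up sets, tool A alone fixes 8 bugs and tool B alone fixes 2, which gives a p-value of 0.109375. Bugs that both tools fix, or that neither fixes, do not enter the statistic, so the test needs per-bug outcomes for every comparator and not just total repair counts.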

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmark evaluation

The paper presents DynaFix as an iterative APR method that injects runtime execution details into LLM prompts and evaluates it on the external Defects4J v1.2/v2.0 benchmarks. It reports concrete repair counts (186 bugs) and comparisons to prior SOTA baselines without any equations, fitted parameters, self-referential predictions, or load-bearing self-citations that reduce the central result to its own inputs. The derivation chain consists of method description followed by independent empirical measurement; no step equates a claimed outcome to a definition or fit performed inside the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach assumes LLMs can reliably interpret structured runtime traces and that Defects4J single-function bugs are representative of real repair tasks; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: Large language models can effectively translate structured runtime traces into improved patch proposals. Invoked in the description of prompt construction and iterative refinement.
domain assumption: Defects4J benchmarks provide a valid measure of repair effectiveness for complex bugs. Used to claim 10% improvement and 38 newly repaired bugs.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models
cs.SE 2026-05 unverdicted novelty 7.0

SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.
