arxiv: 2407.17240 · v2 · submitted 2024-07-24 · 💻 cs.SE

Ranking Plausible Patches by Historic Feature Frequencies

Pith reviewed 2026-05-23 22:39 UTC · model grok-4.3

classification 💻 cs.SE
keywords automated program repair · patch ranking · historic fixes · feature similarity · plausible patches · Defects4J · APR tools

The pith

PrevaRank ranks patches by how closely their features match those of past correct human fixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PrevaRank as a post-processing step that any automated program repair tool can use to reorder its plausible patches. It measures each patch's feature profile against frequencies observed in thousands of programmer-written fixes from 81 Java projects. The goal is to surface correct fixes earlier in the list so developers examine fewer incorrect but test-passing candidates. Experiments on 168 Defects4J bugs and eight repair tools showed the method lifted correct fixes into the top three positions 27 percent more often than the original rankings. The approach relies on lightweight heuristics rather than deep semantic analysis.

Core claim

PrevaRank ranks plausible patches produced by any APR technique according to their feature similarity with historic programmer-written fixes for similar bugs. After training on the fix history of 81 open-source Java projects, it was used to rank patches produced by 8 Java APR tools on 168 Defects4J bugs. PrevaRank consistently improved the ranking of correct fixes, for example ranking a correct fix within the top-3 positions in 27% more cases than the original tools did.

What carries the argument

Feature similarity heuristics that compare a patch's syntactic and semantic characteristics to frequency counts drawn from historic correct fixes.
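
To make the idea concrete, here is a minimal sketch of frequency-based re-ranking. It is not the paper's scoring function: the feature names, the `build_feature_model`, `score_patch`, and `rerank` helpers, and the choice of averaging per-feature frequencies are all assumptions made for illustration.

```python
from collections import Counter


def build_feature_model(historic_fixes):
    """Map each feature to the fraction of historic fixes that contain it."""
    counts = Counter()
    for features in historic_fixes:
        counts.update(set(features))  # count each feature once per fix
    total = len(historic_fixes)
    return {feature: n / total for feature, n in counts.items()}


def score_patch(features, model):
    """Average historic frequency of a patch's features (0.0 if it has none)."""
    features = set(features)
    if not features:
        return 0.0
    return sum(model.get(f, 0.0) for f in features) / len(features)


def rerank(patches, model):
    """Sort by descending score; sorted() is stable, so ties keep tool order."""
    return sorted(patches, key=lambda p: -score_patch(p["features"], model))


# Hypothetical toy data, not taken from the paper.
historic_fixes = [
    ["add_null_check", "modify_condition"],
    ["add_null_check"],
    ["change_method_call"],
    ["modify_condition", "add_null_check"],
]
model = build_feature_model(historic_fixes)

patches = [
    {"id": "p1", "features": ["delete_statement"]},
    {"id": "p2", "features": ["change_method_call"]},
    {"id": "p3", "features": ["add_null_check"]},
]
ranked_ids = [p["id"] for p in rerank(patches, model)]
print(ranked_ids)
```

Because the model is built once from the fix history, re-ranking a patch list only needs dictionary lookups, which is consistent with the negligible overhead the paper reports.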

If this is right

  • Any APR tool that already produces a list of plausible patches can adopt PrevaRank without changing its core repair logic.
  • The ranking improvement holds across multiple distinct repair techniques and bug types.
  • Overhead remains negligible once the historic feature model has been built.
  • Correct fixes move into the top three positions 27 percent more often than under the tools' native ranking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same frequency-based signal could be tested on languages other than Java if comparable fix histories exist.
  • Developers might need to inspect substantially fewer patches before locating a working repair.
  • If feature similarity proves stable, it could serve as a lightweight correctness filter before expensive validation steps.

Load-bearing premise

Feature overlap with past correct fixes reliably signals that a new patch is also correct.

What would settle it

Running PrevaRank on a fresh collection of bugs and repair tools and finding that the fraction of correct fixes appearing in the top three positions stays the same or drops.
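
The check described above comes down to comparing top-3 hit counts before and after re-ranking. The sketch below shows one way to compute that comparison, assuming the 27% figure is a relative increase in the number of cases with a correct fix in the top three. The bug identifiers, correctness labels, and helper names are hypothetical.

```python
def top_k_hits(rankings, k=3):
    """Count bugs whose ranked patch list has a correct patch in the first k slots."""
    return sum(1 for labels in rankings.values() if any(labels[:k]))


def relative_improvement(baseline_hits, new_hits):
    """Relative change in hit count, e.g. 0.27 means 27% more cases."""
    if baseline_hits == 0:
        raise ValueError("baseline has no hits; relative change is undefined")
    return (new_hits - baseline_hits) / baseline_hits


# Hypothetical labels: each list is one bug's patches in rank order,
# True marking a correct patch. Not data from the paper.
original = {
    "bug-1": [False, False, False, True],
    "bug-2": [True, False],
    "bug-3": [False, False, False, False, True],
    "bug-4": [False, True, False],
}
reranked = {
    "bug-1": [False, True, False, False],
    "bug-2": [True, False],
    "bug-3": [False, False, False, True, False],
    "bug-4": [True, False, False],
}

baseline = top_k_hits(original)
improved = top_k_hits(reranked)
gain = relative_improvement(baseline, improved)
print(baseline, improved, gain)
```

A result at or below zero on a fresh set of bugs and tools would count against the premise.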

Original abstract

Automated program repair (APR) techniques have achieved conspicuous progress, and are now capable of producing genuinely correct fixes in scenarios that were well beyond their capabilities only a few years ago. Nevertheless, even when an APR technique can find a correct fix for a bug, it still runs the risk of ranking the fix lower than other patches that are plausible (they pass all available tests) but incorrect. This can seriously hurt the technique's practical effectiveness, as the user will have to peruse a larger number of patches before finding the correct one. This paper presents PrevaRank, a technique that ranks plausible patches produced by any APR technique according to their feature similarity with historic programmer-written fixes for similar bugs. PrevaRank implements simple heuristics, which help make it scalable and applicable to any APR tool that produces plausible patches. In our experimental evaluation, after training PrevaRank on the fix history of 81 open-source Java projects, we used it to rank patches produced by 8 Java APR tools on 168 Defects4J bugs. PrevaRank consistently improved the ranking of correct fixes: for example, it ranked a correct fix within the top-3 positions in 27% more cases than the original tools did. Other experimental results indicate that PrevaRank works robustly with a variety of APR tools and bugs, with negligible overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces PrevaRank, a post-processing technique that re-ranks plausible patches generated by any APR tool according to the frequency of code features observed in a corpus of historic programmer-written fixes. After training on fix histories from 81 open-source Java projects, the method is evaluated on 168 Defects4J bugs using patches from eight existing APR tools; the central empirical claim is that PrevaRank improves the ranking of correct patches (e.g., placing a correct fix in the top-3 positions in 27% more cases than the original tools).

Significance. If the reported ranking gains hold under scrutiny, the work offers a lightweight, tool-agnostic way to improve the practical utility of APR without modifying the underlying repair engines. The use of simple, scalable heuristics and the explicit separation of training and evaluation corpora are positive design choices that support reproducibility and broad applicability.

major comments (3)
  1. [§4.3, Table 5] §4.3 and Table 5: the reported 27% relative improvement in top-3 ranking is presented without per-tool or per-bug statistical significance tests or confidence intervals; it is therefore unclear whether the aggregate figure is driven by a small number of outliers or holds consistently across the eight APR tools.
  2. [§3.2] §3.2: the feature set is defined by simple syntactic heuristics (e.g., AST node types, identifier patterns); the paper does not report an ablation that isolates which features drive the ranking gains, making it difficult to assess whether the improvement is robust or merely an artifact of the particular Defects4J distribution.
  3. [§4.1] §4.1: the training corpus of 81 projects is stated to be disjoint from Defects4J, yet no explicit check (e.g., project-name or commit-hash overlap) is described; any undetected overlap would introduce a data-leakage risk that directly affects the validity of the cross-project evaluation.
minor comments (3)
  1. [§3.1] The notation for feature vectors and similarity scores in §3.1 is introduced without a compact mathematical definition; adding a single equation would improve readability.
  2. [Figure 3] Figure 3 caption does not state the exact number of patches per tool that were re-ranked; this detail is only recoverable from the text.
  3. [Related Work] The paper cites prior ranking work but does not compare runtime overhead against learning-based rankers such as those using neural embeddings; a brief paragraph would strengthen the positioning.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive recommendation of minor revision and the constructive comments. We respond to each major comment below.

Point-by-point responses
  1. Referee: [§4.3, Table 5] §4.3 and Table 5: the reported 27% relative improvement in top-3 ranking is presented without per-tool or per-bug statistical significance tests or confidence intervals; it is therefore unclear whether the aggregate figure is driven by a small number of outliers or holds consistently across the eight APR tools.

    Authors: We agree that statistical tests would strengthen the claims. In the revised version we will add per-tool Wilcoxon signed-rank tests together with bootstrap confidence intervals on the top-3 ranking improvements. revision: yes

  2. Referee: [§3.2] §3.2: the feature set is defined by simple syntactic heuristics (e.g., AST node types, identifier patterns); the paper does not report an ablation that isolates which features drive the ranking gains, making it difficult to assess whether the improvement is robust or merely an artifact of the particular Defects4J distribution.

    Authors: The features were selected from patterns established in prior fix-pattern literature; the paper's focus is the ranking method rather than feature discovery. We will add a short discussion in §3.2 of the relative frequency of each feature category in the training corpus, but a full ablation study remains outside the current scope. revision: partial

  3. Referee: [§4.1] §4.1: the training corpus of 81 projects is stated to be disjoint from Defects4J, yet no explicit check (e.g., project-name or commit-hash overlap) is described; any undetected overlap would introduce a data-leakage risk that directly affects the validity of the cross-project evaluation.

    Authors: The 81 projects were chosen by name to exclude any Defects4J subjects and the training commits come from repositories outside Defects4J. We will explicitly describe this verification (project-name matching and repository exclusion) in the revised §4.1. revision: yes
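
The first response promises bootstrap confidence intervals on the top-3 improvements. As a rough illustration of what such an interval involves, the sketch below computes a percentile bootstrap interval for the mean paired difference in per-bug top-3 hits. The function name, the resampling settings, and the 0/1 indicators are assumptions, not the authors' procedure, and the Wilcoxon signed-rank test they mention is not shown.

```python
import random


def bootstrap_ci(before, after, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (after - before)."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty, equally long paired samples")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


# Hypothetical per-bug indicators: 1 if a correct fix is in the top 3.
before = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
after = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

point, lo, hi = bootstrap_ci(before, after)
print(point, lo, hi)
```

An interval whose lower bound stays above zero for each tool would address the referee's concern that a few outliers drive the aggregate figure.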

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper presents an empirical ranking technique (PrevaRank) trained on fix histories from 81 distinct open-source Java projects and evaluated on 168 Defects4J bugs across 8 APR tools. The central claim is an observed improvement in top-k placement of correct patches, derived from separate training and test data with no equations, self-definitional features, fitted-input predictions, or load-bearing self-citations that reduce the result to its inputs by construction. The use of simple heuristics for feature similarity is presented as an explicit design decision for broad applicability rather than a derived necessity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are mentioned in the abstract.
