Harnessing Evolution for Multi-Hunk Program Repair
Pith reviewed 2026-05-25 19:06 UTC · model grok-4.3
The pith
Identifying evolutionary siblings lets a repair tool apply similar patches across multiple locations to fix more multi-hunk bugs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bugs that require substantially similar patches at multiple locations can be repaired by first identifying evolutionary siblings through combined use of test-suite spectrum, code similarity analysis, and revision history, then applying the same patch to all siblings at once.
What carries the argument
Evolutionary siblings: groups of similar code locations in similar contexts that are expected to undergo similar changes, discovered via test-suite spectrum, code similarity, and revision history to enable simultaneous repair.
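One way to picture how three signals might be combined is a conjunctive filter over candidate locations. The sketch below is a minimal illustration, not Hercules' actual analysis: the `Location` fields, the threshold values, the use of the Ochiai formula for the spectrum signal, and the treatment of revision history as a hard requirement are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Location:
    """A candidate repair location with three signals attached (hypothetical fields)."""
    name: str
    failed_cover: int   # failing tests that execute this location
    passed_cover: int   # passing tests that execute this location
    similarity: float   # code similarity to the seed location, in [0, 1]
    co_changed: bool    # edited together with the seed in some past revision


def ochiai(failed_cover, passed_cover, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def select_siblings(candidates, total_failed, min_susp=0.5, min_sim=0.8):
    """Keep locations that pass all three filters (thresholds are illustrative)."""
    return [
        c.name
        for c in candidates
        if ochiai(c.failed_cover, c.passed_cover, total_failed) >= min_susp
        and c.similarity >= min_sim
        and c.co_changed
    ]


candidates = [
    Location("Foo.java:120", failed_cover=2, passed_cover=0, similarity=1.0, co_changed=True),
    Location("Foo.java:188", failed_cover=2, passed_cover=1, similarity=0.9, co_changed=True),
    Location("Bar.java:42", failed_cover=0, passed_cover=5, similarity=0.95, co_changed=True),
    Location("Baz.java:7", failed_cover=2, passed_cover=0, similarity=0.3, co_changed=False),
]
siblings = select_siblings(candidates, total_failed=2)
print(siblings)
```

Under these assumptions only the two `Foo.java` locations survive: `Bar.java:42` is never executed by a failing test, and `Baz.java:7` is neither similar enough nor co-changed. A real combination could just as well weight the signals rather than require all three.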
If this is right
- Single-hunk repair methods can be extended to an important class of multi-hunk bugs by grouping locations that share expected changes.
- Hercules correctly fixes 49 Defects4J bugs, which the paper reports as the most for any individual APR technique to date.
- Fifteen multi-hunk bugs are correctly repaired, a category that defeats most existing methods.
- Thirteen bugs receive their first correct fix from this approach.
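The simultaneous-repair step behind these consequences can be sketched as one edit template applied at every sibling location, accepted only if it applies everywhere. This is a simplified text-level illustration, assuming a regex rewrite as the patch representation; the function name `apply_to_siblings` and the sample source lines are hypothetical, and the paper's tool works on program structure rather than on raw text.

```python
import re


def apply_to_siblings(lines, sibling_lines, pattern, replacement):
    """Apply one regex rewrite at every sibling line (1-indexed).

    Raises ValueError if the rewrite does not match at some site, so a
    partial multi-hunk patch is never returned.
    """
    patched = list(lines)  # work on a copy; the input is left untouched
    for n in sibling_lines:
        new_line, count = re.subn(pattern, replacement, patched[n - 1], count=1)
        if count == 0:
            raise ValueError(f"patch does not apply at line {n}")
        patched[n - 1] = new_line
    return patched


source = [
    "if (lower > value) return lower;",
    "log(value);",
    "if (upper > limit) return upper;",
]
# Same edit shape at both sites; the capture groups carry the
# site-specific identifiers through the rewrite.
patched = apply_to_siblings(source, [1, 3], r"\((\w+) > (\w+)\)", r"(\1 >= \2)")
print("\n".join(patched))
```

The all-or-nothing behaviour reflects the idea that siblings are repaired together; whether a candidate multi-hunk patch is then kept would still depend on the test suite.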
Where Pith is reading between the lines
- Many multi-hunk bugs may turn out to be instances of repeated similar edits that history and similarity can surface automatically.
- The same sibling detection could be added to existing repair pipelines as a preprocessing step to increase their reach.
- Projects with sparse revision history might still benefit if spectrum and similarity alone prove sufficient for sibling identification.
- The method could be tested on other benchmarks to check whether the evolutionary-sibling pattern appears outside Defects4J.
Load-bearing premise
The combined analysis of test-suite spectrum, code similarity, and revision history selects only locations that genuinely need the same patch, and applying that patch at every selected location introduces no new errors.
What would settle it
A Defects4J bug for which the identified evolutionary siblings require different patches at different locations, so that simultaneous repair produces an incorrect result or leaves the bug unfixed.
Original abstract
Despite significant advances in automatic program repair (APR) techniques over the past decade, practical deployment remains an elusive goal. One of the important challenges in this regard is the general inability of current APR techniques to produce patches that require edits in multiple locations, i.e., multi-hunk patches. In this work, we present a novel APR technique that generalizes single-hunk repair techniques to include an important class of multi-hunk bugs, namely bugs that may require applying a substantially similar patch at a number of locations. We term such sets of repair locations as evolutionary siblings - similar looking code, instantiated in similar contexts, that are expected to undergo similar changes. At the heart of our proposed method is an analysis to accurately identify a set of evolutionary siblings, for a given bug. This analysis leverages three distinct sources of information, namely the test-suite spectrum, a novel code similarity analysis, and the revision history of the project. The discovered siblings are then simultaneously repaired in a similar fashion. We instantiate this technique in a tool called Hercules and demonstrate that it is able to correctly fix 49 bugs in the Defects4J dataset, the highest of any individual APR technique to date. This includes 15 multi-hunk bugs and overall 13 bugs which have not been fixed by any other technique so far.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Hercules, an APR technique that identifies sets of 'evolutionary siblings' (code locations expected to require substantially similar patches) by combining test-suite spectrum information, a novel code similarity analysis, and project revision history. These siblings are then repaired simultaneously using a generalized single-hunk repair approach. On the Defects4J benchmark the tool is reported to produce correct patches for 49 bugs—the highest count for any individual technique—including 15 multi-hunk bugs and 13 bugs not previously fixed by any other technique.
Significance. If the empirical results hold under rigorous validation, the work is significant because it directly targets the multi-hunk repair problem that has limited practical deployment of APR. The combination of three orthogonal signals for sibling detection is a concrete methodological contribution, and the headline counts on a public benchmark (49 total, 15 multi-hunk) would represent a measurable advance over prior single-technique results.
major comments (3)
- §5 (Evaluation) and the abstract: the headline claim of 49 correct fixes (15 multi-hunk) is presented without an ablation that isolates the contribution of the sibling-identification component. No precision/recall figures or manual audit of the 15 multi-hunk cases are supplied, so it remains possible that the multi-hunk successes are produced by the underlying single-hunk engine on nearby locations rather than by the evolutionary-sibling mechanism.
- §4.2–4.3 (Sibling identification): the three-signal analysis (spectrum + similarity + history) is described, yet no independent validation of the resulting sibling sets is reported. The central assumption that the identified locations “genuinely require substantially similar patches” is therefore supported only by end-to-end repair counts.
- §5.1 (Patch validation): the procedure used to classify a patch as correct (test-suite adequacy, false-positive controls, statistical significance) is not detailed. Without this information the reported counts cannot be fully reproduced or compared with prior work.
minor comments (2)
- §4: Notation for the similarity threshold (listed as a free parameter) should be introduced once and used consistently in the equations of §4.
- Table 2 (or equivalent results table) would benefit from an additional column showing, for each multi-hunk bug, which of the three signals contributed to the sibling set.
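The precision/recall figures requested in the first major comment could be computed per bug along the following lines. This is a sketch under a stated assumption: Defects4J ships no ground-truth sibling sets, so the hunk locations of the developer patch are used here as a proxy reference. The function name and the sample locations are hypothetical.

```python
def sibling_precision_recall(predicted, reference):
    """Precision and recall of a predicted sibling set against a reference set.

    `reference` is assumed to be the set of locations edited by the
    developer patch, used as a proxy for the true siblings.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


predicted = {"Foo.java:120", "Foo.java:188", "Bar.java:42"}
reference = {"Foo.java:120", "Foo.java:188", "Foo.java:240", "Foo.java:301"}
precision, recall = sibling_precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case two of three predicted locations are in the reference (precision 2/3) and two of four reference locations are found (recall 1/2). A low recall with a still-correct fix would be one sign that the underlying single-hunk engine, rather than sibling detection, carried the repair.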
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The comments highlight important areas for clarification and strengthening. We respond to each major comment below, indicating planned revisions to the manuscript.
- Referee: §5 (Evaluation) and the abstract: the headline claim of 49 correct fixes (15 multi-hunk) is presented without an ablation that isolates the contribution of the sibling-identification component. No precision/recall figures or manual audit of the 15 multi-hunk cases are supplied, so it remains possible that the multi-hunk successes are produced by the underlying single-hunk engine on nearby locations rather than by the evolutionary-sibling mechanism.
  Authors: We agree that an ablation isolating the sibling-identification component would strengthen the claims. The 15 multi-hunk fixes rely on coordinated application across locations identified as evolutionary siblings; the underlying single-hunk engine alone cannot produce such patches without this mechanism. We will revise the evaluation section to include a manual audit of these 15 cases, describing for each how sibling detection enabled the multi-location repair, and add a comparison against independent single-hunk repairs on the same locations. Where feasible we will also report precision/recall for the sibling sets.
  Revision: yes
- Referee: §4.2–4.3 (Sibling identification): the three-signal analysis (spectrum + similarity + history) is described, yet no independent validation of the resulting sibling sets is reported. The central assumption that the identified locations “genuinely require substantially similar patches” is therefore supported only by end-to-end repair counts.
  Authors: The primary validation of sibling sets occurs through successful end-to-end repair, as the technique's objective is improved repair capability rather than standalone sibling detection. Ground-truth sibling sets are not available in Defects4J, limiting independent validation. We will add concrete examples of identified sibling sets (with their patches) to §4 to illustrate the assumption in practice and discuss the three-signal combination's role in the revised manuscript.
  Revision: partial
- Referee: §5.1 (Patch validation): the procedure used to classify a patch as correct (test-suite adequacy, false-positive controls, statistical significance) is not detailed. Without this information the reported counts cannot be fully reproduced or compared with prior work.
  Authors: We agree that the patch validation procedure must be described in sufficient detail for reproducibility. We will expand §5.1 to explicitly detail the criteria for classifying patches as correct, including test-suite adequacy checks, any manual inspection steps, false-positive controls, and how our methodology aligns with or differs from prior APR work for fair comparison.
  Revision: yes
Circularity Check
No circularity; empirical evaluation on external benchmark with no self-referential derivation
The paper presents an empirical APR technique (Hercules) whose central claim is the count of bugs fixed on the public Defects4J benchmark (49 total, 15 multi-hunk). No equations, fitted parameters, or derivation chain appear in the abstract or described method. The sibling-identification step is presented as a heuristic combining three signals, but the result is reported only as end-to-end repair success rather than a mathematical reduction to inputs. No self-citation is invoked as a uniqueness theorem or load-bearing premise. This is a standard empirical result on an external benchmark and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- similarity threshold
axioms (1)
- domain assumption: Bugs that require substantially similar patches at multiple locations can be detected from test coverage, static similarity, and revision history.
invented entities (1)
- evolutionary siblings (no independent evidence)
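To make the role of the similarity threshold concrete, the sketch below compares the syntactic shape of two snippets and accepts them as sibling candidates only above a cut-off. It is an assumption-laden stand-in: it parses Python with the standard `ast` module (Hercules targets Java), it matches flattened node-type sequences with `difflib.SequenceMatcher` instead of computing a tree edit distance, and the value `THRESHOLD = 0.9` is invented for the example.

```python
import ast
from difflib import SequenceMatcher

THRESHOLD = 0.9  # hypothetical value for the free parameter


def node_types(code):
    """Flatten a snippet's AST into a list of node-type names (identifiers ignored)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(code))]


def structural_similarity(a, b):
    """Similarity in [0, 1] between the node-type sequences of two snippets."""
    return SequenceMatcher(None, node_types(a), node_types(b)).ratio()


seed = "if lower > value:\n    result = lower"
clone = "if upper > limit:\n    result = upper"
other = "for item in items:\n    total += item.price"

for name, snippet in [("clone", clone), ("other", other)]:
    score = structural_similarity(seed, snippet)
    print(name, score >= THRESHOLD)
```

Because identifiers are ignored, the renamed clone has exactly the same shape as the seed and passes the threshold, while the unrelated loop does not. Moving the threshold trades missed siblings against spurious ones, which is why the ledger flags it as a free parameter.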
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean (reality_from_one_distinction)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Hercules identifies evolutionary siblings using test-suite spectrum, reaching-definition analysis, AST tree-distance, and revision-history edit operations to enable simultaneous multi-hunk repair.
- IndisputableMonolith/Cost/FunctionalEquation.lean (washburn_uniqueness_aczel)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Repair schema instantiation and candidate-patch ranking via machine-learned model on Defects4J.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.