QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
Pith reviewed 2026-05-10 18:37 UTC · model grok-4.3
The pith
Language models learn precise code repairs by training on self-generated bugs with rewards that penalize unnecessary edits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PRepair mitigates over-editing in LLM-based program repair by training on diverse self-generated bugs and optimizing with an edit-aware reward that encourages only the necessary modifications, thereby maximizing reuse of correct code and improving overall repair accuracy.
What carries the argument
Edit-Aware Group Relative Policy Optimization (EA-GRPO), which augments standard policy optimization with a reward signal based on the extent and correctness of code edits to favor minimal yet complete fixes.
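As a rough illustration of the idea, the Python sketch below pairs a hypothetical edit-aware reward with the within-group reward normalization that GRPO-style methods use. The reward shape, the `penalty` weight, and all function names are assumptions made for illustration; the paper's actual reward function is not given in this review.

```python
import difflib
from statistics import mean, pstdev


def edit_ratio(buggy: str, candidate: str) -> float:
    """Line-level fraction of the program that changed, in [0, 1]."""
    matcher = difflib.SequenceMatcher(None, buggy.splitlines(), candidate.splitlines())
    return 1.0 - matcher.ratio()


def edit_aware_reward(buggy: str, candidate: str, passes_tests: bool,
                      penalty: float = 0.5) -> float:
    """Hypothetical reward: 0 for an incorrect repair, otherwise 1 minus an edit penalty."""
    if not passes_tests:
        return 0.0
    return 1.0 - penalty * edit_ratio(buggy, candidate)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group to zero mean and unit std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Toy group of three sampled repairs for the same buggy program.
buggy = "def add(a, b):\n    return a - b\n"
minimal = "def add(a, b):\n    return a + b\n"                       # correct, one line changed
rewrite = "def add(x, y):\n    total = x + y\n    return total\n"    # correct, fully rewritten
unfixed = buggy                                                       # no edits, still wrong

rewards = [
    edit_aware_reward(buggy, minimal, passes_tests=True),
    edit_aware_reward(buggy, rewrite, passes_tests=True),
    edit_aware_reward(buggy, unfixed, passes_tests=False),
]
advantages = group_relative_advantages(rewards)
```

Under these assumptions the minimal fix receives the highest advantage, the full rewrite a lower one, and the unfixed program the lowest, which is the ordering an edit-aware reward is meant to induce.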
Load-bearing premise
Self-generated buggy examples combined with an edit-aware reward will reliably steer the model toward minimal correct repairs without missing bugs or needing human repair labels.
What would settle it
On a held-out test set, models trained with PRepair either make more edits than necessary or fail to correct the injected bugs, at rates comparable to or worse than baseline fine-tuning.
Original abstract
Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only buggy parts. Building on this insight, we propose PRepair, a framework that mitigates over-editing and improves repair accuracy. PRepair has two components: Self-Breaking, which generates diverse buggy programs via controlled bug injection and min-max sampling, and Self-Repairing, which trains models with Edit-Aware Group Relative Policy Optimization (EA-GRPO) using an edit-aware reward to encourage minimal yet correct edits. Experiments show that PRepair improves repair precision by up to 31.4% under $\mathrm{fix}_1@1$, a metric that jointly considers repair correctness and extent, and significantly increases decoding throughput when combined with speculative editing, demonstrating its potential for precise and practical code repair.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs for program repair suffer from over-editing, which the authors quantify and address by defining a 'precise repair' task that maximizes reuse of correct code. They introduce PRepair, consisting of Self-Breaking (controlled bug injection with min-max sampling to generate diverse buggy programs) and Self-Repairing (training via Edit-Aware Group Relative Policy Optimization (EA-GRPO) with an edit-aware reward). Experiments report up to 31.4% gains in the fix₁@1 metric (which jointly scores correctness and edit extent) plus improved decoding throughput via speculative editing.
Significance. If the reported precision gains and the reliability of the edit-aware reward hold under rigorous controls, the work could meaningfully advance practical LLM-based repair by reducing unnecessary changes to correct code. The fix₁@1 metric and the synthetic bug-generation approach are potentially useful contributions for evaluating and training minimal-edit repair models.
major comments (2)
- [§4, Table 2] §4 (Experiments), Table 2 and the fix₁@1 definition: the abstract and results claim a 31.4% improvement, but the manuscript must explicitly state the exact formula for fix₁@1, the full list of baselines (including whether they use the same base model and decoding settings), dataset sizes, and statistical significance tests; without these, the central empirical claim cannot be verified as load-bearing.
- [§3.2] §3.2 (EA-GRPO): the edit-aware reward is described as trading off minimality against correctness, but the manuscript should provide the precise reward function (including any coefficients) and an ablation showing that removing the edit term collapses performance; otherwise the claim that EA-GRPO reliably prevents over-editing rests on an untested assumption.
minor comments (3)
- [§1] The abstract and §1 should cite prior work on over-editing in code repair (e.g., recent studies on LLM repair precision) to better situate the contribution.
- [Figure 1] Figure 1 (overview) and the Self-Breaking description would benefit from a small example showing a concrete bug injection and the resulting min-max sample pair.
- [§3.1] The paper should release the synthetic bug-generation code and the exact prompts used for Self-Breaking to support reproducibility.
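To make the request for a concrete example tangible, here is a minimal sketch of one kind of controlled bug injection: flipping a single comparison operator with Python's `ast` module (requires Python 3.9+ for `ast.unparse`). The mutation table and the `inject_comparison_bug` helper are hypothetical and are not taken from the paper; the Self-Breaking procedure and its min-max sampling step are not specified in this review and are not reproduced here.

```python
import ast

# Hypothetical mutation table: each comparison operator maps to a plausible wrong one.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}


class FlipOneComparison(ast.NodeTransformer):
    """Flip one swappable comparison operator (the first reached during traversal)."""

    def __init__(self) -> None:
        self.done = False

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)  # handle nested comparisons first
        if not self.done:
            for i, op in enumerate(node.ops):
                swap = SWAPS.get(type(op))
                if swap is not None:
                    node.ops[i] = swap()
                    self.done = True
                    break
        return node


def inject_comparison_bug(source: str) -> str:
    """Return a copy of `source` with a single comparison operator flipped."""
    tree = FlipOneComparison().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


correct = "def is_adult(age):\n    return age >= 18\n"
buggy = inject_comparison_bug(correct)
print(buggy)
```

Because the buggy and correct programs differ in exactly one operator, the minimal repair is known by construction, which is what lets edit extent be scored without human repair labels. Note that `ast.unparse` normalizes formatting, so a real pipeline would need to account for whitespace and comment changes.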
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the verifiability of our empirical results. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.
- Referee: [§4, Table 2] §4 (Experiments), Table 2 and the fix₁@1 definition: the abstract and results claim a 31.4% improvement, but the manuscript must explicitly state the exact formula for fix₁@1, the full list of baselines (including whether they use the same base model and decoding settings), dataset sizes, and statistical significance tests; without these, the central empirical claim cannot be verified as load-bearing.
  Authors: We agree that these details are necessary for full verification. In the revised manuscript we will: (1) state the exact formula for fix₁@1 in §4, (2) provide the complete list of baselines together with their base models and decoding settings, (3) report the precise dataset sizes used in each experiment, and (4) add statistical significance tests (bootstrap confidence intervals and paired t-tests) supporting the reported gains.
  Revision: yes
- Referee: [§3.2] §3.2 (EA-GRPO): the edit-aware reward is described as trading off minimality against correctness, but the manuscript should provide the precise reward function (including any coefficients) and an ablation showing that removing the edit term collapses performance; otherwise the claim that EA-GRPO reliably prevents over-editing rests on an untested assumption.
  Authors: We will insert the precise mathematical definition of the edit-aware reward (including all weighting coefficients) into §3.2. We will also add an ablation experiment that removes the edit term from the reward and reports the resulting drop in fix₁@1, thereby confirming the contribution of the edit-aware component.
  Revision: yes
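For the significance tests mentioned in the first response, a paired bootstrap over per-problem scores is one common option. The sketch below is generic and is not the authors' evaluation code; the function name `paired_bootstrap_ci`, its defaults, and the toy scores are illustrative assumptions.

```python
import random


def paired_bootstrap_ci(baseline, treated, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-problem score difference.

    `baseline` and `treated` hold scores for the same problems in the same order.
    Returns (observed_mean_difference, ci_low, ci_high).
    """
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("need two equal-length, non-empty score lists")
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample problems with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    ci_low = means[int((alpha / 2) * n_resamples)]
    ci_high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, ci_low, ci_high


# Toy data (made up): 100 problems, treated model solves 50 more than baseline.
baseline_scores = [0] * 50 + [1] * 50
treated_scores = [1] * 100
observed, ci_low, ci_high = paired_bootstrap_ci(baseline_scores, treated_scores)
```

Resampling whole problems rather than individual scores preserves the pairing between systems, so problem difficulty cancels out of the comparison. An interval that excludes zero suggests the gain is unlikely to be a sampling artifact.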
Circularity Check
No significant circularity; empirical training/evaluation framework
The paper presents an empirical framework (Self-Breaking for synthetic bug generation + EA-GRPO training with edit-aware reward) evaluated on repair precision metrics such as fix₁@1. No derivation chain, mathematical model, or uniqueness theorem is claimed; the central result is an observed performance gain from the proposed training procedure. All load-bearing elements are externally falsifiable via standard ML benchmarks and do not reduce to self-defined fitted quantities or self-citation chains. This matches the expected non-circular outcome for a purely empirical methods paper.
Axiom & Free-Parameter Ledger
free parameters (1)
- edit-aware reward coefficients
axioms (1)
- domain assumption: Controlled synthetic bug injection produces training signals representative of real-world bugs