pith. machine review for the scientific record.

arxiv: 2604.05753 · v1 · submitted 2026-04-07 · 💻 cs.SE

Recognition: no theorem link

An End-to-End Approach for Fixing Concurrency Bugs via SHB-Based Context Extractor

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:17 UTC · model grok-4.3

classification 💻 cs.SE
keywords concurrency bugs · automated program repair · static happens-before · LLM agents · context extraction · end-to-end repair · bug fixing

The pith

ConFixAgent repairs concurrency bugs end-to-end by extracting contexts from Static Happens-Before graphs for LLM use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an agent called ConFixAgent that fixes different kinds of concurrency bugs in software using large language models. It works without needing any bug reports, test cases, or other prior information that most other repair tools assume are available. The key step is building a Static Happens-Before graph from the source code to locate the parts of the program most relevant to the bug. This extracted context then guides the language model to generate accurate repair patches. Tests on several benchmark sets show that this method works on more bug types and produces better fixes than current state-of-the-art approaches.
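The core mechanism — ordering events by program order plus synchronization order — can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the event encoding and edge rules below are assumptions (the paper's static construction also handles fork/join, volatiles, and other primitives).

```python
from collections import defaultdict

# Hypothetical event tuples: (thread, operation, target). The shapes and
# names are invented for illustration, not the paper's data structures.
events = [
    ("T1", "lock", "m"), ("T1", "write", "x"), ("T1", "unlock", "m"),
    ("T2", "lock", "m"), ("T2", "read", "x"), ("T2", "unlock", "m"),
]

def build_shb(events):
    """Sketch of a static happens-before graph: program-order edges within
    each thread, plus release->acquire edges on the same lock."""
    edges = defaultdict(set)
    last_in_thread = {}     # thread id -> index of its latest event
    last_release = {}       # lock name -> index of its latest unlock
    for i, (tid, op, tgt) in enumerate(events):
        if tid in last_in_thread:                   # program order
            edges[last_in_thread[tid]].add(i)
        last_in_thread[tid] = i
        if op == "lock" and tgt in last_release:    # synchronization order
            edges[last_release[tgt]].add(i)
        if op == "unlock":
            last_release[tgt] = i
    return edges

shb = build_shb(events)
# T1's unlock (event 2) happens-before T2's lock (event 3).
assert 3 in shb[2]
```

Paths in this graph then stand in for orderings that must hold in every interleaving; accesses with no path between them are the candidates the extractor cares about.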

Core claim

ConFixAgent is an LLM-driven agent that fixes diverse concurrency bugs end-to-end, using Static Happens-Before Graphs to identify bug-relevant code sections and eliminating the need for any prior bug-related information; experiments confirm it outperforms existing tools.

What carries the argument

Static Happens-Before Graphs, which identify bug-relevant sections of code to provide context for LLM-generated repairs.
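The extraction step can be sketched under stated assumptions: given an SHB graph, two conflicting accesses with no happens-before path in either direction are unordered, and the methods containing them become the context handed to the LLM. The graph, event ids, and method names below are invented for illustration.

```python
from collections import deque

# Illustrative SHB graph over event ids; method_of maps each event to the
# (hypothetical) method containing it.
shb = {0: {1}, 1: {2}, 3: {4}}          # no path between events 2 and 3
method_of = {0: "init", 1: "update", 2: "update", 3: "reader", 4: "reader"}

def reachable(graph, src):
    """All events ordered after src by the happens-before relation (BFS)."""
    seen, work = set(), deque([src])
    while work:
        n = work.popleft()
        for m in graph.get(n, ()):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen

def bug_relevant_methods(graph, e1, e2):
    """If neither access happens-before the other, the pair is unordered:
    return the methods containing the pair as repair context."""
    if e2 in reachable(graph, e1) or e1 in reachable(graph, e2):
        return set()                    # ordered: not a race candidate
    return {method_of[e1], method_of[e2]}

assert bug_relevant_methods(shb, 2, 3) == {"update", "reader"}
```

The paper's Algorithm 1 additionally keeps methods needed for deadlock detection and semantics; this sketch shows only the unordered-pair filter.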

If this is right

  • Concurrency bug repair becomes possible without access to bug reports or test oracles.
  • The accuracy of LLM repair suggestions increases when provided with SHB-extracted contexts.
  • More types of concurrency bugs can be automatically addressed than with previous tools.
  • End-to-end automation reduces the manual effort required for fixing issues in multi-threaded programs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers working on concurrent systems could integrate such agents into their workflows to speed up debugging.
  • Similar context extraction techniques based on happens-before relations might extend to detecting other non-deterministic issues.
  • Further improvements could come from combining this with dynamic analysis for even better context selection.

Load-bearing premise

Static Happens-Before graphs can reliably surface the bug-relevant code sections in real-world programs without any prior bug reports or test oracles.

What would settle it

Running ConFixAgent on a program where the Static Happens-Before graph fails to include critical synchronization code, resulting in incorrect or no repair suggestions.
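The probe can be made concrete. A minimal sketch, with all event ids, graphs, and method names invented: if the static analysis fails to model a synchronization event (say, an unresolved wait/notify), the extracted context silently loses the method that actually establishes the ordering, and the patch would be generated without it.

```python
# All names below are invented for illustration, not the paper's benchmarks.
method_of = {0: "producer", 1: "producer", 2: "signal", 3: "consumer"}

# Ground-truth ordering: a notify in "signal" (event 2) orders the
# producer's write (event 1) before the consumer's read (event 3).
true_shb = {0: {1}, 1: {2}, 2: {3}}
# Static approximation in which the unresolved notify event was dropped.
static_shb = {0: {1}}

def context(graph, accesses):
    """Toy stand-in for a context extractor: collect the methods containing
    the conflicting accesses and any event on an edge touching them."""
    relevant = set(accesses)
    for src, dsts in graph.items():
        for dst in dsts:
            if src in accesses or dst in accesses:
                relevant.update((src, dst))
    return {method_of[e] for e in relevant}

# The faithful graph surfaces the synchronizing method...
assert "signal" in context(true_shb, {1, 3})
# ...while the lossy static graph drops it from the repair context.
assert "signal" not in context(static_shb, {1, 3})
```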

Figures

Figures reproduced from arXiv: 2604.05753 by Hongliang Liang, Keyang Xiao, Qiuping Yi, Zhuang Li, Zongcheng Ji.

Figure 1. A Java code snippet demonstrating an atomicity violation case.
Figure 2. Overview of ConFixAgent: the input is a program containing bugs, and the output is the repaired program. The iteration stops when no concurrency bugs can be detected or when the maximum number of iterations is reached.
Figure 3. A code snippet from the Java benchmark nested_monitor in SIR [53]. (Algorithm 1, "Identifying methods that cannot be removed" — construct the SHB graph, detect deadlocks, keep semantic methods, and mark each method containing a bug-relevant event — overlaps this figure at source.)
Figure 4. Prompt for Bug Localization and Fixing.
Figure 5. The output results of the LLM for the example in Figure 3.
Figure 6. Number of successfully repaired (where success is defined as at …
Figure 7. The specific content of each part of the prompt for bug localization.
read the original abstract

With the rise of multi-core processors and distributed systems, concurrent programming has become essential yet challenging, primarily due to the non-deterministic nature of thread execution. Manually addressing concurrency bugs is time-consuming and error-prone. Automated Program Repair techniques provide a promising solution. However, developing an end-to-end concurrency bug repair tool is particularly challenging. Most existing tools rely on the assumption that bug-related information is readily available or that concurrency bug contexts are ideally extracted, which is often impractical in real-world scenarios. This paper introduces ConFixAgent, an LLM-driven agent capable of fixing various types of concurrency bugs in an end-to-end manner, eliminating the need for any prior bug-related information. Specifically, we propose a novel context extraction approach designed for concurrency bug repair, utilizing Static Happens-Before Graphs to identify bug-relevant sections. We implemented ConFixAgent and evaluated it across multiple benchmark sets. Our extensive experiments demonstrate that ConFixAgent significantly outperforms state-of-the-art tools in addressing diverse types of concurrency bugs, with its context extraction method markedly enhancing the accuracy of LLM-generated repair solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents ConFixAgent, an LLM-driven agent for end-to-end concurrency bug repair that uses a novel Static Happens-Before (SHB) graph-based context extractor to identify relevant code sections without any prior bug reports, test oracles, or bug-related information. It evaluates the approach on multiple benchmark sets and claims that ConFixAgent significantly outperforms state-of-the-art tools across diverse concurrency bug types, with the SHB context method markedly improving the accuracy of LLM-generated repairs.

Significance. If the empirical claims hold, the work would advance automated program repair for concurrency bugs, a challenging area due to non-determinism in multi-threaded execution. The end-to-end design without strong assumptions about available bug information is a strength, as is the proposal of SHB graphs for context extraction in an LLM agent setting. The paper receives credit for attempting a fully automated pipeline and for conducting experiments on diverse bug types.

major comments (2)
  1. [Abstract and context extraction description] The central performance claim (abstract) that the SHB-based context extractor 'markedly enhanc[es] the accuracy of LLM-generated repair solutions' is load-bearing and depends on the assumption that SHB graphs reliably surface the minimal bug-relevant statements. However, static SHB construction (via lock-order or memory-access analysis) produces over-approximations that can omit statements whose relevance depends on runtime values or rare interleavings, as highlighted by the stress-test concern. The manuscript should include a concrete analysis or ablation (e.g., in the evaluation section) quantifying the fraction of benchmark bugs where the extracted context excludes the defect site or critical synchronization, or demonstrate that this does not undermine the reported gains.
  2. [Abstract and §5 (Evaluation)] The abstract asserts 'extensive experiments demonstrate that ConFixAgent significantly outperforms state-of-the-art tools' and 'markedly enhancing the accuracy,' but provides no quantitative results, baselines, error bars, bug selection criteria, or verification method. If these details appear in §5 or the evaluation tables, they must be cross-checked against the SHB failure modes to ensure the outperformance can be attributed to the context method rather than other factors in the agent.
minor comments (2)
  1. [Abstract] The abstract could more precisely state the specific metrics (e.g., repair success rate, number of bugs fixed) and the exact benchmarks used, to allow readers to immediately assess the scale of the claimed improvements.
  2. Notation for SHB graphs and the context extraction algorithm should be introduced with a small illustrative example early in the paper to clarify how static over-approximation is handled in practice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important considerations for the reliability of our SHB-based context extractor and the clarity of our empirical claims. We address each major comment below and outline revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and context extraction description] The central performance claim (abstract) that the SHB-based context extractor 'markedly enhanc[es] the accuracy of LLM-generated repair solutions' is load-bearing and depends on the assumption that SHB graphs reliably surface the minimal bug-relevant statements. However, static SHB construction (via lock-order or memory-access analysis) produces over-approximations that can omit statements whose relevance depends on runtime values or rare interleavings, as highlighted by the stress-test concern. The manuscript should include a concrete analysis or ablation (e.g., in the evaluation section) quantifying the fraction of benchmark bugs where the extracted context excludes the defect site or critical synchronization, or demonstrate that this does not undermine the reported gains.

    Authors: We acknowledge that static SHB construction inherently involves over-approximation and cannot capture all runtime-value-dependent or rare interleaving cases. In the revised manuscript, we will add a dedicated analysis in Section 5 that examines the benchmark bugs and reports the fraction of cases where the extracted SHB context includes the defect site and critical synchronization operations. We will also include an ablation study comparing ConFixAgent with and without the SHB extractor to isolate its contribution and demonstrate that the reported gains hold even when accounting for static-analysis limitations. revision: yes

  2. Referee: [Abstract and §5 (Evaluation)] The abstract asserts 'extensive experiments demonstrate that ConFixAgent significantly outperforms state-of-the-art tools' and 'markedly enhancing the accuracy,' but provides no quantitative results, baselines, error bars, bug selection criteria, or verification method. If these details appear in §5 or the evaluation tables, they must be cross-checked against the SHB failure modes to ensure the outperformance can be attributed to the context method rather than other factors in the agent.

    Authors: The abstract is intentionally concise; all quantitative results, baseline comparisons (including the specific SOTA tools), performance metrics with error bars where applicable, bug selection criteria, and verification procedures are detailed in Section 5 and its tables. In the revision we will add explicit forward references from the abstract and introduction to these sections. We will also expand the evaluation discussion to explicitly cross-check outperformance against potential SHB failure modes, including attribution analysis showing the isolated effect of the context extractor versus other agent components. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical tool evaluation with external benchmarks

full rationale

The paper introduces ConFixAgent as an LLM agent for end-to-end concurrency bug repair using a proposed SHB-based context extractor. Its central claims rest on experimental results comparing repair accuracy against external state-of-the-art tools across multiple benchmarks, without any mathematical derivation chain, fitted parameters renamed as predictions, or load-bearing self-citations. The evaluation is presented as direct empirical measurement of the implemented system, making the performance claims independent of the method's internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unstated premise that LLMs can produce correct patches when given SHB-derived context and that the graph construction itself is sound for bug localization. No free parameters or invented physical entities are posited; the single "invented entity" is the ConFixAgent system itself.

axioms (2)
  • domain assumption Static Happens-Before graphs can be constructed accurately from source code and will highlight sections relevant to concurrency bugs.
    Invoked in the description of the novel context extraction approach.
  • domain assumption Large language models can generate valid concurrency bug fixes when supplied with focused code context.
    Underlying the use of LLM-generated repair solutions.
invented entities (1)
  • ConFixAgent no independent evidence
    purpose: End-to-end LLM agent for concurrency bug repair
    The system name and architecture introduced in the paper; no independent evidence provided beyond the claimed experiments.

pith-pipeline@v0.9.0 · 5495 in / 1403 out tokens · 32543 ms · 2026-05-10T19:17:41.248372+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

121 extracted references · 26 canonical work pages · 9 internal anchors

  1. [1] Lu S, Park S, Seo E, et al. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems. 2008: 329-339.
  2. [2] Yao S, Yu D, Zhao J, et al. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 2024, 36.
  3. [3] Park S, Vuduc R, Harrold M J. A unified approach for localizing non-deadlock concurrency bugs. 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation. IEEE, 2012: 51-60.
  4. [4] Shi Q, Huang J, Chen Z, et al. Verifying synchronization for atomicity violation fixing. IEEE Transactions on Software Engineering, 2015, 42(3): 280-296.
  5. [5] Blackshear S, Gorogiannis N, O'Hearn P W, et al. RacerD: compositional static race detection. Proceedings of the ACM on Programming Languages, 2018, 2(OOPSLA): 1-28.
  6. [6] Lin H, Wang Z, Liu S, et al. PFix: fixing concurrency bugs based on memory access patterns. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 2018: 589-600.
  7. [7] Barr E T, Harman M, McMinn P, Shahbaz M, Yoo S. The Oracle Problem in Software Testing: A Survey. IEEE Transactions on Software Engineering, 2015, 41(5): 507-525.
  8. [8] Sun C-A, Dai H, Geng N, Liu H, Chen T Y, Wu P, Cai Y, Wang J. An Interleaving Guided Metamorphic Testing Approach for Concurrent Programs. ACM Transactions on Software Engineering and Methodology, 2024, 33(1): Article 8, 21 pages.
  9. [9] Zhan S, Huang J. ECHO: instantaneous in situ race detection in the IDE. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, 2016: 775-786.
  10. [10] Park S, Vuduc R, Harrold M J. A Unified Approach for Localizing Non-deadlock Concurrency Bugs. 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation. 2012: 51-60. doi: 10.1109/ICST.2012.85.
  11. [11] Lamport L. Time, clocks, and the ordering of events in a distributed system. Concurrency: the Works of Leslie Lamport. 2019: 179-196.
  12. [12] Vaziri M, Tip F, Dolby J. Associating synchronization constraints with data in an object-oriented language. ACM SIGPLAN Notices, 2006. doi: 10.1145/1111320.1111067.
  13. [13] Huang J. Stateless model checking concurrent programs with maximal causality reduction. SIGPLAN Notices, 2015, 50(6): 165-174.
  14. [14] Ryan G, Cetin B, Lim Y, Jana S. Accurate Data Race Prediction in the Linux Kernel through Sparse Fourier Learning. Proceedings of the ACM on Programming Languages, 2024, 8(OOPSLA1): Article 123, 23 pages.
  15. [15] Shi Z, Mathur U, Pavlogiannis A. Optimistic Prediction of Synchronization-Reversal Data Races. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). ACM, 2024: Article 134, 1-13.
  16. [16] Visser W, Păsăreanu C S, Khurshid S. Test input generation with Java PathFinder. SIGSOFT Software Engineering Notes, 2004, 29(4): 97-107.
  17. [17] Costea A, Tiwari A, Chianasta S, Kishore R, Roychoudhury A, Sergey I. Hippodrome: Data Race Repair Using Static Analysis Summaries. ACM Transactions on Software Engineering and Methodology, 2023, 32(2): Article 41, 33 pages.
  18. [18] Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, Luo X, Lo D, Grundy J C, Wang H. Large Language Models for Software Engineering: A Systematic Literature Review. arXiv preprint arXiv:2308.10620, 2023.
  19. [19] Cai Y, Cao L, Zhao J. Adaptively generating high quality fixes for atomicity violations. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, 2017: 303-314.
  20. [20] Huang J, Zhang C. Execution privatization for scheduler-oblivious concurrent programs. SIGPLAN Notices, 2012, 47(10): 737-752.
  21. [21] Jin G, Zhang W, Deng D, et al. Automated Concurrency-Bug Fixing. USENIX Conference on Operating Systems Design and Implementation. 2013. DOI: http://dx.doi.org/
  22. [23] Liu P, Tripp O, Zhang C. Grail: Context-aware fixing of concurrency bugs. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. 2014: 318-329.
  23. [24] Liu P, Zhang C. Axis: Automatically fixing atomicity violations through solving control constraints. 2012 34th International Conference on Software Engineering (ICSE). IEEE, 2012: 299-309.
  24. [25] Khoshnood S, Kusano M, Wang C. ConcBugAssist: constraint solving for diagnosis and repair of concurrency bugs. Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). ACM, 2015: 165-176.
  25. [26] Cai Y, Cao L. Fixing deadlocks via lock pre-acquisitions. Proceedings of the 38th International Conference on Software Engineering (ICSE 2016). 2016: 1109-1120.
  26. [27] Wang Y, Lafortune S, Kelly T, Kudlur M, Mahlke S. The theory of deadlock avoidance via discrete control. ACM SIGPLAN Notices, 2009, 44(1): 252-263.
  27. [28] Musuvathi M, Qadeer S. Iterative context bounding for systematic testing of multithreaded programs. SIGPLAN Notices, 2007, 42(6): 446-455.
  28. [29] Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L. How do fixes become bugs? Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE '11). ACM, 2011: 26-36.
  29. [30] Krena B, Letko Z, Tzoref R, Ur S, Vojnar T. Healing data races on-the-fly. Proceedings of the 2007 ACM Workshop on Parallel and Distributed Systems: Testing and Debugging (PADTAD '07). ACM, 2007: 54-64.
  30. [31] Anthropic. Claude. 2023. https://www.anthropic.com/claude
  31. [32] Ji W, Bo L, Yuan Y, et al. TDFix: A lightweight tool for fixing deadlocks based on templates. Science of Computer Programming, 2024, 233: 103073.
  32. [33] Madaan A, Tandon N, Gupta P, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 2024, 36.
  33. [34] Gou Z, Shao Z, Gong Y, et al. CRITIC: Large language models can self-correct with tool-interactive critiquing. arXiv preprint arXiv:2305.11738, 2023.
  34. [35] Kasikci B, Cui W, Ge X, et al. Lazy diagnosis of in-production concurrency bugs. Proceedings of the 26th Symposium on Operating Systems Principles. 2017: 582-598.
  35. [36] Deligiannis P, Lal A, Mehrotra N, et al. Fixing Rust compilation errors using LLMs. arXiv preprint arXiv:2308.05177, 2023.
  36. [37] Prenner J A, Robbes R. Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs. arXiv preprint arXiv:2111.03922, 2021.
  37. [38] Chen M, Tworek J, Jun H, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
  38. [39] Austin J, Odena A, Nye M, et al. Program Synthesis with Large Language Models. 2021.
  39. [40] Fried D, Aghajanyan A, Lin J, et al. InCoder: A generative model for code infilling and synthesis. arXiv preprint arXiv:2204.05999, 2022.
  40. [41] Revisiting Automated Program Repair via Zero-shot Learning. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022). ACM, 2022: 959-971.
  41. [42] Madaan A, Tandon N, Gupta P, et al. Self-Refine: Iterative Refinement with Self-Feedback. arXiv preprint arXiv:2303.17651, 2023.
  42. [43] Alrashedy K, Aljasser A. Can LLMs Patch Security Issues? arXiv preprint arXiv:2312.00024, 2023.
  43. [45] Flanagan C, Freund S N. FastTrack: efficient and precise dynamic race detection. ACM SIGPLAN Notices, 2009, 44(6): 121-133.
  44. [46] Cui W, Ge X, Kasikci B, et al. REPT: reverse debugging of failures in deployed software. Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI '18). USENIX Association, 2018: 17-32.
  45. [47] Goues C L, Pradel M, Roychoudhury A. Automated program repair. Communications of the ACM, 2019, 62(12): 56-65.
  46. [48] The Pecan Benchmarks. 2011. https://www.cse.ust.hk/prism/pecan/#Experiment
  47. [49] JaConTeBe Object Biography. 2016. http://sir.csc.ncsu.edu/portal/bios/concurrency.php
  48. [50] The PFix Benchmarks. 2018. https://github.com/PFixConcurrency/FixExamples
  49. [51] Liu B, Huang J. D4: fast concurrency debugging with parallel differential analysis. SIGPLAN Notices, 2018, 53(4): 359-373.
  50. [52] Li Y, Liu B, Huang J. Sword: A scalable whole program race detector for Java. 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 2019: 75-78.
  51. [53] Do H, Elbaum S, Rothermel G. Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact. Empirical Software Engineering, 2005, 10(4): 405-435.
  52. [54] OpenAI. GPT-3.5 Turbo Documentation. https://platform.openai.com/docs/models/gpt-3-5-turbo
  53. [55] OpenAI. GPT-4 Turbo Documentation. 2024. https://help.openai.com/en/articles/8555510-gpt-4-turbo-in-the-openai-api
  54. [56] Xia C S, Wei Y, Zhang L. Practical Program Repair in the Era of Large Pre-trained Language Models. arXiv:2210.14179 [cs.SE], 2022.
  55. [58] Prenner J A, Babii H, Robbes R. Can OpenAI's Codex fix bugs? An evaluation on QuixBugs. Proceedings of the Third International Workshop on Automated Program Repair (APR '22). ACM, 2022: 69-75.
  56. [59] Xia C S, Wei Y, Zhang L. Automated Program Repair in the Era of Large Pre-Trained Language Models. Proceedings of the 45th International Conference on Software Engineering (ICSE '23). IEEE Press, 2023: 1482-1494.
  57. [60] Sun Y, Wu D, Xue Y, et al. LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning. arXiv preprint arXiv:2401.16185, 2024.
  58. [61] Terragni V, Pezzè M. Effectiveness and challenges in generating concurrent tests for thread-safe classes. Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE '18). ACM, 2018: 64-75.
  59. [62] Machado N, Lucia B, Rodrigues L. Production-guided concurrency debugging. SIGPLAN Notices, 2016, 51(8): Article 29, 12 pages.
  60. [63] Cai Y, Cao L. Fixing deadlocks via lock pre-acquisitions. Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, 2016: 1109-1120.
  61. [64] Wang Y, Kelly T, Kudlur M, Lafortune S, Mahlke S. Gadara: dynamic deadlock avoidance for multithreaded programs. Proceedings of OSDI, 2008: 281-294.
  62. [65] Erickson J, Musuvathi M, Burckhardt S, et al. Effective Data-Race Detection for the Kernel. 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 10). 2010.
  63. [66] Narayanasamy S, Wang Z, Tigani J, et al. Automatically classifying benign and harmful data races using replay analysis. Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2007: 22-31.
  64. [67] Kasikci B, Cui W, Ge X, Niu B. Lazy Diagnosis of In-Production Concurrency Bugs. Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). ACM, 2017: 582-598.
  65. [68] T. J. Watson Libraries for Analysis (WALA). http://wala.sourceforge.net/
  66. [69] Ahmed T, Devanbu P. Better patching using LLM prompting, via Self-Consistency. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2023: 1742-1746.
  67. [70] Huang K, Meng X, Zhang J, et al. An empirical study on fine-tuning large language models of code for automated program repair. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2023: 1162-1174.
  68. [71] Tufano M, Watson C, Bavota G, et al. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Transactions on Software Engineering and Methodology (TOSEM), 2019, 28(4): 1-29.
  69. [72] Chen Z, Kommrusch S, Tufano M, et al. SequenceR: Sequence-to-sequence learning for end-to-end program repair. IEEE Transactions on Software Engineering, 2019, 47(9): 1943-1959.
  70. [73] Jiang N, Liu K, Lutellier T, et al. Impact of code language models on automated program repair. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023: 1430-1442.
  71. [74] Li Y, Wang S, Nguyen T N. DEAR: A novel deep learning-based approach for automated program repair. Proceedings of the 44th International Conference on Software Engineering. 2022: 511-523.
  72. [75] Fu M, Tantithamthavorn C, Le T, et al. VulRepair: a T5-based automated software vulnerability repair. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022: 935-947.
  73. [76] Berabi B, He J, Raychev V, et al. TFix: Learning to fix coding errors with a text-to-text transformer. International Conference on Machine Learning. PMLR, 2021: 780-791.
  74. [77] Kolak S D, Martins R, Le Goues C, et al. Patch generation with language models: Feasibility and scaling behavior. Deep Learning for Code Workshop. 2022.
  75. [78] Moon S, Chae H, Song Y, et al. Coffee: Boost your code LLMs by fixing bugs with feedback. arXiv preprint arXiv:2311.07215, 2023.
  76. [79] Nashid N, Sintaha M, Mesbah A. Retrieval-based prompt selection for code-related few-shot learning. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023: 2450-2462.
  77. [80] Xia C S, Zhang L. Less training, more repairing please: revisiting automated program repair via zero-shot learning. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022: 959-971.
  78. [81] Li Y, Wang S, Nguyen T N. DLFix: Context-based code transformation learning for automated program repair. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 2020: 602-614.
  79. [82] Jin M, Shahriar S, Tufano M, et al. InferFix: End-to-end program repair with LLMs. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2023: 1646-1656.
  80. [83] Li Y, Wang S, Nguyen T N, et al. Improving bug detection via context-based code representation learning and attention-based neural networks. Proceedings of the ACM on Programming Languages, 2019, 3(OOPSLA): 1-30.

Showing first 80 references.