pith. machine review for the scientific record.

arxiv: 2605.03117 · v1 · submitted 2026-05-04 · 💻 cs.SE · cs.AI

Recognition: 2 theorem links

ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 17:48 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords agentic program repair · fault localization · data-flow slicing · repository graphs · SWE-bench · LLM agents · program repair

The pith

A multi-granularity graph with intra-procedural data-flow edges lets LLM agents localize bugs more precisely and generate more successful fixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Repository-level bug fixing requires agents to track both file-level structure and how values move inside functions, yet prior systems stop at structural links and leave agents without direct access to definition-use flows. ARISE constructs statement-level nodes connected by those flows and surfaces them through a tool API so an agent can request a variable slice in one step. On 300 real GitHub issues the added precision raises function-level recall by 17 points and line-level recall by 15 points, which produces a 4.7-point gain in patches that pass all tests. A reader would care because many bugs are data-flow errors that structural or text-only methods routinely miss. Ablation checks isolate the data-flow component as the driver and show that large code models can consume the structured slices directly.

Core claim

ARISE augments an LLM-based agent with a multi-granularity program graph that extends structural relationships to statement-level nodes connected by intra-procedural definition-use edges. The graph is exposed through a three-tier tool API that treats data-flow slicing as a first-class queryable primitive, letting the model trace in a single call which statements define or consume any variable of interest. Evaluated on SWE-bench Lite using Qwen2.5-Coder-32B-Instruct, ARISE improves Function Recall@1 by 17 points and Line Recall@1 by 15 points over the unmodified SWE-agent baseline. These localization gains raise repair success to 22 percent Pass@1 (66 out of 300 issues), a 4.7-point increase.

What carries the argument

Multi-granularity program graph with intra-procedural definition-use edges, exposed as a queryable primitive through a three-tier tool API for data-flow slicing.
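The paper's actual tool interface is not reproduced here. As a rough, hypothetical sketch of what a one-call variable slice over statement-level definition-use information involves, the following uses Python's `ast` module; the function names (`def_use_edges`, `slice_variable`) and the flow-insensitive simplifications are this page's assumptions, not ARISE's implementation.

```python
import ast
from collections import defaultdict

def def_use_edges(src):
    """Build a crude intra-procedural def-use map for a code fragment.

    Maps each variable name to the line numbers that assign it (Store
    context) and the line numbers that read it (Load context). Note:
    an augmented assignment like `total += x` carries Store context on
    its target, so its read side is folded into the definition here.
    """
    defs, uses = defaultdict(set), defaultdict(set)
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs[node.id].add(node.lineno)
            else:
                uses[node.id].add(node.lineno)
    return defs, uses

def slice_variable(src, var):
    """One-call 'slice' query: every line that defines or consumes `var`."""
    defs, uses = def_use_edges(src)
    return sorted(defs[var] | uses[var])

SRC = """\
total = 0
for x in data:
    total += x
mean = total / len(data)
"""
print(slice_variable(SRC, "total"))  # -> [1, 3, 4]
```

A real slicer would compute reaching definitions along control-flow paths and respect scoping; this sketch only groups assignment and use sites by name, which is enough to show why a single structured query beats grepping for a variable.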

If this is right

  • Localization gains from the graph translate directly into higher rates of valid patches.
  • Large code models can use the structured slice output without an extra natural-language summarization step.
  • The graph builder and slicing API function as a drop-in addition for other agent frameworks.
  • Controlled ablations show the performance lift comes from the data-flow edges rather than the tool interface alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same slicing primitive could be tested on tasks beyond repair, such as vulnerability detection or test generation.
  • Porting the graph builder to additional languages would allow direct comparison of data-flow benefits across codebases.
  • Combining the intra-procedural slices with inter-procedural call edges might further improve localization on bugs that cross function boundaries.
  • If the graph construction misses flows in unusually complex procedures, the observed gains would shrink on those specific cases.

Load-bearing premise

The automatically constructed graph accurately records the true definition-use relationships inside each procedure without adding false links or omitting critical flows.
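One concrete way this premise can fail is mutation through an alias: the value flows at runtime, but no assignment to the original name appears in the AST. The example below is this page's illustration, not a case drawn from the paper.

```python
import ast

SRC = '''\
def scale(data):
    config = {"factor": 1}
    c = config            # alias: no new assignment to the name `config`
    c["factor"] = 10      # mutates the object `config` refers to
    return [x * config["factor"] for x in data]
'''

# The flow is real at runtime:
namespace = {}
exec(SRC, namespace)
print(namespace["scale"]([1, 2, 3]))   # -> [10, 20, 30]

# ...but a purely name-based scan sees only one definition of `config`,
# so a static slice on `config` would omit the mutating statement.
stores = [n.lineno for n in ast.walk(ast.parse(SRC))
          if isinstance(n, ast.Name) and n.id == "config"
          and isinstance(n.ctx, ast.Store)]
print(stores)                          # -> [2]
```

If flows like this are common in a repository, the agent's slices silently lose exactly the statements that matter, which is why the premise is load-bearing.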

What would settle it

Disable the data-flow edges and slicing tools while keeping the rest of the agent and tool schema unchanged, then re-run on the same 300 SWE-bench Lite issues. If recall and Pass@1 fall back to baseline levels, the contribution of the graph is confirmed; unchanged scores would falsify it.
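The proposed test is a paired comparison over the same issue set, and the headline numbers are point deltas over 300 issues. A minimal scoring sketch, with issue counts that are hypothetical except insofar as they are chosen to reproduce the paper's reported 17.0 / 15.0 / 4.7-point gaps (66/300 Pass@1 for the full system):

```python
def compare_runs(full, ablated, n=300):
    """Point-delta scoring for the ablation described above.

    `full` and `ablated` map a metric name to the count of issues on
    which the run succeeded; deltas are reported in percentage points.
    """
    return {m: round(100 * (full[m] - ablated[m]) / n, 1) for m in full}

# Illustrative counts only (full-run counts match the paper's deltas):
full    = {"func_recall@1": 150, "line_recall@1": 120, "pass@1": 66}
ablated = {"func_recall@1": 99,  "line_recall@1": 75,  "pass@1": 52}
print(compare_runs(full, ablated))
# -> {'func_recall@1': 17.0, 'line_recall@1': 15.0, 'pass@1': 4.7}
```

Deltas near zero under this scoring, with only the data-flow component removed, would be the falsifying outcome.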

Figures

Figures reproduced from arXiv: 2605.03117 by Fatemeh Fard, Shahd Seddik.

Figure 1
Figure 1: ARISE pipeline overview. Phase 1: Given a repository snapshot, ARISE first constructs a multi-granularity program graph that combines structural relationships with intra-procedural data-flow edges (definition-use chains at statement level). Phase 2: We augment the agentic toolset with three tiers of tools. Tier 1 provides structural navigation. Tier 2 adds data-flow slicing, allowing the agent to trace how… view at source ↗
read the original abstract

Repository-level fault localization (FL) and automated program repair (APR) require an agent to identify the relevant code units across files, follow call and data dependencies, and generate a valid patch. Existing graph-based systems provide structural representations of repositories (files, classes, functions and their relationships) but do not model how variable values flow within procedures, leaving agents without the semantic precision needed for function- and line-level localization. We present ARISE (Agentic Repository-level Issue Solving Engine), which augments an LLM-based agent with a multi-granularity program graph that extends structural relationships down to statement-level nodes connected by intra-procedural definition-use edges. ARISE exposes this graph through a three-tier tool API, which brings data-flow slicing as a first-class, queryable agent primitive that allows the model to trace, in a single call, which statements define or consume a variable of interest. We evaluate on SWE-bench Lite (300 real GitHub issues, 11 Python repositories) using Qwen2.5-Coder-32B-Instruct as the backbone. Compared to the unmodified SWE-agent baseline, ARISE improves Function Recall@1 by 17.0 points and Line Recall@1 by 15.0 points. These localization gains translate directly into repair success, with ARISE achieving 22.0% Pass@1 (66/300), a 4.7 percentage-point improvement over SWE-agent. Controlled ablations confirm that the improvement is driven by the data-flow graph rather than the tool schema, and that large code models consume structured slice output directly without requiring a natural-language summarization layer. The graph builder and slicing API are designed as a framework-agnostic, drop-in toolset for future APR research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript presents ARISE, a system that augments LLM-based agents for repository-level fault localization and automated program repair with a multi-granularity program graph. This graph extends structural relationships (files, classes, functions) with statement-level nodes connected by intra-procedural definition-use edges. The graph is exposed via a three-tier tool API that makes data-flow slicing a first-class primitive, allowing agents to trace definitions and uses in a single query. Evaluated on SWE-bench Lite (300 issues across 11 Python repositories) with Qwen2.5-Coder-32B-Instruct, ARISE reports a 17-point gain in Function Recall@1 and 15-point gain in Line Recall@1 over the unmodified SWE-agent baseline; these localization improvements yield a 4.7-point increase in Pass@1 repair success (22.0%, 66/300). Controlled ablations attribute the gains to the data-flow component rather than the tool schema, and show that the LLM backbone consumes structured slice outputs directly without a natural-language summarization layer. The graph builder and slicing API are presented as a framework-agnostic drop-in toolset.

Significance. If the reported gains and ablation results hold under scrutiny, the work makes a practical contribution to agentic APR by demonstrating that explicit intra-procedural data-flow modeling can measurably improve both localization precision and end-to-end repair rates on a standard benchmark. The release of the graph-construction and slicing infrastructure as a reusable toolset is a concrete strength that lowers the barrier for follow-on research. The observation that large code models can directly interpret structured slice output is also useful for system design.

major comments (1)
  1. [Graph construction and slicing sections (around §3–4)] The central claim that the automatically constructed multi-granularity graph 'accurately captures all relevant intra-procedural definition-use relationships without introducing false dependencies or missing critical flows' (weakest assumption) is load-bearing for the attribution of the 17-point Recall@1 and 4.7-point Pass@1 gains to the data-flow component. The manuscript should provide either (a) a manual audit of a random sample of generated slices against ground-truth def-use on a subset of the benchmark or (b) quantitative metrics on false-positive/negative edges, as the current ablation evidence alone does not rule out that the observed improvement stems from incidental properties of the slice representation rather than semantic fidelity.
minor comments (3)
  1. [Tool API description] A concrete example (with code snippet, graph fragment, and sample slice output) would help readers understand how the three-tier API is invoked and how the LLM consumes the structured result.
  2. [Abstract and §6] The abstract states the system is 'framework-agnostic,' yet all experiments are restricted to Python repositories; a brief discussion of the engineering effort required to port the graph builder to another language would clarify the scope of this claim.
  3. [Evaluation figures/tables] Table or figure captions should explicitly state the exact number of issues (300) and the backbone model (Qwen2.5-Coder-32B-Instruct) so that results can be interpreted without cross-referencing the text.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: The central claim that the automatically constructed multi-granularity graph 'accurately captures all relevant intra-procedural definition-use relationships without introducing false dependencies or missing critical flows' (weakest assumption) is load-bearing for the attribution of the 17-point Recall@1 and 4.7-point Pass@1 gains to the data-flow component. The manuscript should provide either (a) a manual audit of a random sample of generated slices against ground-truth def-use on a subset of the benchmark or (b) quantitative metrics on false-positive/negative edges, as the current ablation evidence alone does not rule out that the observed improvement stems from incidental properties of the slice representation rather than semantic fidelity.

    Authors: We appreciate the referee's identification of this load-bearing assumption. The ablations in §5.3 hold the tool schema and structural graph fixed while removing only the intra-procedural def-use edges, producing consistent drops in both localization and repair metrics; this design makes it unlikely that gains arise solely from incidental formatting of the slice output. Nevertheless, we agree that direct quantification of edge-level fidelity would strengthen attribution. A full ground-truth audit across the 300 issues is not feasible within the revision window, as it would require exhaustive manual annotation of def-use relations. In the revised manuscript we have therefore (i) added a dedicated paragraph in §4.2 describing the static-analysis rules used for edge construction (reaching definitions via AST-based use-def chains) and (ii) inserted a Limitations subsection (§6) that explicitly states the assumption, notes that the analysis follows standard sound techniques for Python, and provides one fully worked example of a generated slice with manual verification. We view this as a partial but substantive response to the comment. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an empirical system (ARISE) that augments an LLM agent with a multi-granularity repository graph including intra-procedural def-use edges, exposed via a three-tier tool API. Central claims consist of measured gains in Function/Line Recall@1 and Pass@1 on the external SWE-bench Lite benchmark (300 issues), plus ablations attributing gains to the data-flow component rather than tool schema. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text; the argument rests on direct experimental comparison against a public baseline (SWE-agent) with controlled conditions. This is self-contained empirical evidence with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that the constructed graph faithfully encodes data dependencies and that LLMs can directly exploit the resulting slices. No numeric free parameters are mentioned. The graph representation itself is the primary invented artifact.

axioms (1)
  • domain assumption The automatically extracted intra-procedural definition-use edges are sufficiently accurate and complete for the downstream agent to improve localization and repair.
    Invoked when claiming that the data-flow graph (rather than tool schema) drives the observed gains.
invented entities (1)
  • ARISE multi-granularity program graph with statement-level def-use edges · no independent evidence
    purpose: To expose data-flow slicing as a first-class, queryable primitive for LLM agents
    The graph and its three-tier API constitute the core technical contribution; no external falsifiable prediction (e.g., a specific new bug type) is supplied beyond the benchmark results.

pith-pipeline@v0.9.0 · 5624 in / 1567 out tokens · 67053 ms · 2026-05-08T17:48:03.445001+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 21 canonical work pages · 4 internal anchors

  1. [1] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In ICLR. https://miltos.allamanis.com/publicationfiles/allamanis2018learning/allamanis2018learning.pdf
  2. [2] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning Distributed Representations of Code. Proceedings of the ACM on Programming Languages 3, POPL (2019), 40:1–40:29. doi:10.1145/3290353
  3. [3] Amazon Web Services. 2024. Reimagining Software Development with the Amazon Q Developer Agent. AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/reimagining-software-development-with-the-amazon-q-developer-agent/
  4. [4] Ramakrishna Bairi et al. 2024. CodePlan: Repository-level Coding using LLMs and Planning. ACM Transactions on Software Engineering and Methodology (TOSEM) (2024). doi:10.1145/3643757
  5. [5] Sebastian Baltes, Oliver Moseler, Fabian Beck, and Stephan Diehl. 2017. Navigate, Understand, Communicate: How Developers Locate Performance Bugs. In ICPC. 260–270. doi:10.1109/ICPC.2017.21
  6. [6] Andreas Bexell, Emma Söderberg, Christofer Rydenfält, and Sigrid Eldh. 2024. How Do Developers Approach Their First Bug in an Unfamiliar Code Base? An Exploratory Study of Large Program Comprehension. In PPIG. https://ppig.org/files/2024-PPIG-35th-bexell.pdf
  7. [7] Zhaoling Chen, Xiangru Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, and Xingyao Wang. 2025. LocAgent: Graph-Guided LLM Agents for Code Localization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vienna, Aus...
  8. [8] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The Program Dependence Graph and Its Use in Optimization. ACM Transactions on Programming Languages and Systems 9, 3 (1987), 319–349. doi:10.1145/24039.24041
  9. [9] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, et al. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. In ICLR. https://openreview.net/forum?id=jLoC4ez43PZ
  10. [10] Susan Horwitz, Thomas Reps, and David Binkley. 1990. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 1 (1990), 26–60.
  11. [11] Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, and Omer Tripp. 2024. A deep dive into large language models for automated bug localization and repair. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1471–1493.
  12. [12] Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, and Junyang Lin. 2024. Qwen2.5-Coder Technical Report. arXiv preprint arXiv:2409.12186 (2024).
  13. [13] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. SWE-bench: Can language models resolve real-world GitHub issues? arXiv preprint arXiv:2310.06770 (2023).
  14. [14] James A. Jones and Mary Jean Harrold. 2005. Empirical evaluation of the Tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 273–282.
  15. [15] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP '23). ACM. doi:10.1145/3600006.3613165
  16. [16] Jia Li et al. 2025. LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding. In ACL. https://aclanthology.org/2025.acl-long.1324.pdf
  17. [17] Chunyan Liu, Yan Lei, Huan Xie, Jinping Wang, Yue Yu, and David Lo. 2026. Survey on learning-based dynamic fault localization: From traditional machine learning to large language models. Comput. Surveys 58, 9 (2026), 1–39.
  18. [18] Jia Liu et al. 2024. RepoQA: Evaluating Long Context Code Understanding. In ICLR (Workshop/Poster). https://openreview.net/pdf?id=hK9YSrFuGf
  19. [19] Tianyang Liu, Canwen Xu, and Julian McAuley. 2024. RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems. In ICLR. https://proceedings.iclr.cc/paper_files/paper/2024/file/d191ba4c8923ed8fd8935b7c98658b5f-Paper-Conference.pdf
  20. [20] Wenjun Liu, Yihui Sun, Jiefeng Wei, Yiheng Li, Yiran Chen, Hai Zhao, Shuai Wang, Shizhe Fu, Ge Sun, and Kai Zhang. 2024. GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-Fine Retrieval Based on Code Context Graph. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM, 570–582. doi...
  21. [21] Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, and Wenmeng Zhou. 2025. CodexGraph: Bridging large language models and code repositories via code graph databases. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 142–160.
  23. [23] Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2025. Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration. In Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE Companion '25). ACM, New York, NY, USA. doi:10.1145/3696630.3728549
  24. [24] Fangwen Mu, Junjie Wang, Lin Shi, Song Wang, Shoubin Li, and Qing Wang. 2025. EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair. arXiv preprint arXiv:2506.10484 (2025).
  25. [25] Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, and Dong Yu. 2025. RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph. In Proceedings of the International Conference on Learning Representations (ICLR). https://proceedings.iclr.cc/paper_files/paper/2025/file/4a4a3c197deac0424...
  26. [26] Mike Papadakis and Yves Le Traon. 2015. Metallaxis-FL: mutation-based fault localization. Software Testing, Verification and Reliability 25, 5-7 (2015), 605–628.
  27. [27] Shani Pearce, Abhay Singh, Luke Hales, Emma Finlayson, and Brett A. Becker. 2024. Needles in a Haystack: Student Struggles with Working on Large Code Bases. In SIGCSE. doi:10.1145/3702652.3744218
  28. [28] Rachel Potvin and Josh Levenberg. 2016. Why Google Stores Billions of Lines of Code in a Single Repository. Commun. ACM 59, 7 (2016), 78–87. doi:10.1145/2854146
  29. [29] Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, and Xiaoguang Mao. 2024. AgentFL: Scaling LLM-based fault localization to project-level context. arXiv preprint arXiv:2403.16362 (2024).
  30. [30] Samuel Rando et al. 2025. Evaluating Coding LLMs at 1M Context Windows: LongCodeBench. OpenReview preprint. https://openreview.net/pdf?id=GFPoM8Ylp8
  31. [31] Melika Sepidband, Hamed Taherkhani, Hung Viet Pham, and Hadi Hemmati. 2026. RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models. arXiv e-prints (2026), arXiv–2601.
  32. [32] Akihiro Takahashi, Yoshiki Higo, and Shinji Kusumoto. 2021. An Extensive Study on Smell-Aware Bug Localization. Journal of Systems and Software 177 (2021), 110957. doi:10.1016/j.jss.2021.110957
  33. [33] Tianyi Tang, Tianyi Xu, Sumon Karmakar, and Toby Jia-Jun Li. 2023. An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code. In PLATEAU@SPLASH. https://toby.li/files/plateau23-tang-copilot.pdf
  34. [34] Frank Tip. 1995. A survey of program slicing techniques. Journal of Programming Languages 3, 3 (1995), 121–189.
  35. [35] Mark Weiser. 1981. Program slicing. IEEE Transactions on Software Engineering 4 (1981), 352–357.
  36. [36] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, Franz Wotawa, and Dongcheng Li. 2023. Software fault localization: An overview of research, techniques, and tools. Handbook of Software Fault Localization: Foundations and Advances (2023), 1–117.
  37. [37] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2025. Demystifying LLM-Based Software Engineering Agents. Proceedings of the ACM on Software Engineering 2, FSE (2025), 801–824. doi:10.1145/3715754
  38. [38] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and Discovering Vulnerabilities with Code Property Graphs. In IEEE Symposium on Security and Privacy. 590–604. doi:10.1109/SP.2014.44
  39. [39] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...
  40. [40] Boyang Yang, Jiadong Ren, Shunfu Jin, Yang Liu, Feng Liu, Bach Le, and Haoye Tian. 2025. Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs. arXiv preprint arXiv:2503.21710 (2025).
  41. [41] John Yang, Carlos E. Jiménez, Alexander Wettig, Kilian Lieret, Shunyu Yao, et al. 2024. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. In Advances in Neural Information Processing Systems (NeurIPS). https://papers.nips.cc/paper_files/paper/2024/file/5a7c947568c1b1328ccc5230172e1e7c-Paper-Conference.pdf
  42. [42] Seohyun Youm, Hojun Yeon, Eunjong Kim, Eunjong Lee, Eunjong Park, et al. 2018. Bench4BL: Reproducibility Study on the Performance of IR-based Bug Localization. In Proceedings of ISSTA. doi:10.1145/3213846.3213856
  43. [43] Zhongming Yu, Hejia Zhang, Yujie Zhao, Hanxian Huang, Matrix Yao, Ke Ding, and Jishen Zhao. 2025. OrcaLoca: An LLM Agent Framework for Software Issue Localization. In Proceedings of the 42nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 267). PMLR, 73416–73436. https://proceedings.mlr.press/v267/yu25x.html
  44. [44] Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. AutoCodeRover: Autonomous Program Improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '24). ACM, New York, NY, USA, 1592–1604. doi:10.1145/3650212.3680384
  45. [45] Yaqin Zhou, Shangqing Liu, Jing Kai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. In NeurIPS. 10197–10207. https://papers.neurips.cc/paper/2019/file/49265d2447bc3bbfe9e76306ce40a31f-Paper.pdf