ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair
Pith reviewed 2026-05-08 17:48 UTC · model grok-4.3
The pith
A multi-granularity graph with intra-procedural data-flow edges lets LLM agents localize bugs more precisely and generate more successful fixes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ARISE augments an LLM-based agent with a multi-granularity program graph that extends structural relationships to statement-level nodes connected by intra-procedural definition-use edges. The graph is exposed through a three-tier tool API that treats data-flow slicing as a first-class queryable primitive, letting the model trace in a single call which statements define or consume any variable of interest. Evaluated on SWE-bench Lite using Qwen2.5-Coder-32B-Instruct, ARISE improves Function Recall@1 by 17 points and Line Recall@1 by 15 points over the unmodified SWE-agent baseline. These localization gains raise repair success to 22 percent Pass@1 (66 out of 300 issues), a 4.7-point increase.
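The statement-level definition-use edges at the heart of this claim can be illustrated with a minimal sketch. The toy builder below handles only straight-line Python functions and is not ARISE's graph builder, which would need full reaching-definitions analysis over a control-flow graph; it only shows what a def-use edge is.

```python
import ast

def defuse_edges(func_src: str):
    """Collect (def_line, use_line, var) triples for one function.

    Straight-line approximation: each variable read is linked to the line
    of its most recent assignment in source order. A production builder
    would compute reaching definitions over the control-flow graph.
    """
    func = ast.parse(func_src).body[0]
    last_def = {}                              # var -> line of latest definition
    for arg in func.args.args:                 # parameters defined at the def line
        last_def[arg.arg] = func.lineno
    edges = []
    for stmt in func.body:                     # top-level statements only
        for node in ast.walk(stmt):            # first record the reads...
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((last_def[node.id], node.lineno, node.id))
        for node in ast.walk(stmt):            # ...then the writes
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges

src = """def f(n):
    total = 0
    total = total + n
    return total
"""
print(defuse_edges(src))   # [(2, 3, 'total'), (1, 3, 'n'), (3, 4, 'total')]
```

A slice for `total` is then just the set of lines connected to it by these edges, which is the kind of answer the paper's API returns in a single call.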
What carries the argument
Multi-granularity program graph with intra-procedural definition-use edges, exposed as a queryable primitive through a three-tier tool API for data-flow slicing.
If this is right
- Localization gains from the graph translate directly into higher rates of valid patches.
- Large code models can use the structured slice output without an extra natural-language summarization step.
- The graph builder and slicing API function as a drop-in addition for other agent frameworks.
- Controlled ablations show the performance lift comes from the data-flow edges rather than the tool interface alone.
Where Pith is reading between the lines
- The same slicing primitive could be tested on tasks beyond repair, such as vulnerability detection or test generation.
- Porting the graph builder to additional languages would allow direct comparison of data-flow benefits across codebases.
- Combining the intra-procedural slices with inter-procedural call edges might further improve localization on bugs that cross function boundaries.
- If the graph construction misses flows in unusually complex procedures, the observed gains would shrink on those specific cases.
Load-bearing premise
The automatically constructed graph accurately records the true definition-use relationships inside each procedure without adding false links or omitting critical flows.
What would settle it
Disable the data-flow edges and slicing tools while keeping the rest of the agent and tool schema unchanged, then re-run on the same 300 SWE-bench Lite issues. If recall and Pass@1 fall back to baseline levels, the contribution of the graph is confirmed; unchanged scores would falsify it.
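Such a re-run would be scored with the benchmark's standard metrics. The sketch below shows how Recall@1 and Pass@1 would be computed; only the 66/300 figure comes from the paper, and the ablated-run count is purely illustrative.

```python
def recall_at_1(top1_preds, gold_sets):
    """Fraction of issues whose top-ranked location is in the gold set."""
    hits = sum(1 for p, g in zip(top1_preds, gold_sets) if p in g)
    return hits / len(gold_sets)

def pass_at_1(resolved):
    """Fraction of issues whose first generated patch resolves the issue."""
    return sum(resolved) / len(resolved)

# Localization: two toy issues, one localized correctly at rank 1.
preds = ["utils.parse_config", "core.run"]
gold = [{"utils.parse_config"}, {"core.main"}]
print(recall_at_1(preds, gold))            # 0.5

# Repair: the full system's 66/300 matches the paper's reported Pass@1;
# the ablated run's 52/300 is an invented placeholder.
full_run = [True] * 66 + [False] * 234
ablated_run = [True] * 52 + [False] * 248
print(pass_at_1(full_run), round(pass_at_1(ablated_run), 3))
```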
Original abstract
Repository-level fault localization (FL) and automated program repair (APR) require an agent to identify the relevant code units across files, follow call and data dependencies, and generate a valid patch. Existing graph-based systems provide structural representations of repositories (files, classes, functions and their relationships) but do not model how variable values flow within procedures, leaving agents without the semantic precision needed for function- and line-level localization. We present ARISE (Agentic Repository-level Issue Solving Engine), which augments an LLM-based agent with a multi-granularity program graph that extends structural relationships down to statement-level nodes connected by intra-procedural definition-use edges. ARISE exposes this graph through a three-tier tool API, which brings data-flow slicing as a first-class, queryable agent primitive that allows the model to trace, in a single call, which statements define or consume a variable of interest. We evaluate on SWE-bench Lite (300 real GitHub issues, 11 Python repositories) using Qwen2.5-Coder-32B-Instruct as the backbone. Compared to the unmodified SWE-agent baseline, ARISE improves Function Recall@1 by 17.0 points and Line Recall@1 by 15.0 points. These localization gains translate directly into repair success, with ARISE achieving 22.0% Pass@1 (66/300), a 4.7 percentage-point improvement over SWE-agent. Controlled ablations confirm that the improvement is driven by the data-flow graph rather than the tool schema, and that large code models consume structured slice output directly without requiring a natural-language summarization layer. The graph builder and slicing API are designed as a framework-agnostic, drop-in toolset for future APR research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ARISE, a system that augments LLM-based agents for repository-level fault localization and automated program repair with a multi-granularity program graph. This graph extends structural relationships (files, classes, functions) with statement-level nodes connected by intra-procedural definition-use edges. The graph is exposed via a three-tier tool API that makes data-flow slicing a first-class primitive, allowing agents to trace definitions and uses in a single query. Evaluated on SWE-bench Lite (300 issues across 11 Python repositories) with Qwen2.5-Coder-32B-Instruct, ARISE reports a 17-point gain in Function Recall@1 and a 15-point gain in Line Recall@1 over the unmodified SWE-agent baseline; these localization improvements yield a 4.7-point increase in Pass@1 repair success (22.0%, 66/300). Controlled ablations attribute the gains to the data-flow component rather than the tool schema, and show that the LLM backbone consumes structured slice outputs directly without a natural-language summarization layer. The graph builder and slicing API are presented as a framework-agnostic drop-in toolset.
Significance. If the reported gains and ablation results hold under scrutiny, the work makes a practical contribution to agentic APR by demonstrating that explicit intra-procedural data-flow modeling can measurably improve both localization precision and end-to-end repair rates on a standard benchmark. The release of the graph-construction and slicing infrastructure as a reusable toolset is a concrete strength that lowers the barrier for follow-on research. The observation that large code models can directly interpret structured slice output is also useful for system design.
major comments (1)
- [Graph construction and slicing sections (around §3–4)] The central claim that the automatically constructed multi-granularity graph 'accurately captures all relevant intra-procedural definition-use relationships without introducing false dependencies or missing critical flows' (weakest assumption) is load-bearing for the attribution of the 17-point Recall@1 and 4.7-point Pass@1 gains to the data-flow component. The manuscript should provide either (a) a manual audit of a random sample of generated slices against ground-truth def-use on a subset of the benchmark or (b) quantitative metrics on false-positive/negative edges, as the current ablation evidence alone does not rule out that the observed improvement stems from incidental properties of the slice representation rather than semantic fidelity.
minor comments (3)
- [Tool API description] A concrete example (with code snippet, graph fragment, and sample slice output) would help readers understand how the three-tier API is invoked and how the LLM consumes the structured result.
- [Abstract and §6] The abstract states the system is 'framework-agnostic,' yet all experiments are restricted to Python repositories; a brief discussion of the engineering effort required to port the graph builder to another language would clarify the scope of this claim.
- [Evaluation figures/tables] Table or figure captions should explicitly state the exact number of issues (300) and the backbone model (Qwen2.5-Coder-32B-Instruct) so that results can be interpreted without cross-referencing the text.
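To make the first minor comment concrete, here is the kind of worked example the review asks for: a mock slicing query and the structured result an agent would consume. All tool names, file paths, and output fields are hypothetical, since the paper's actual three-tier schema is not reproduced in this review.

```python
# Hypothetical illustration only: the names `slice_variable`, the path
# "auth/session.py", and the defs/uses fields are invented for this sketch.
import json

def slice_variable(graph, path, function, var):
    """Mock of the slicing primitive: one call returns the statements
    that define or consume `var` inside `function`."""
    return graph.get((path, function, var), {"defs": [], "uses": []})

# Toy graph fragment with precomputed intra-procedural def-use data.
graph = {
    ("auth/session.py", "refresh_token", "expiry"): {
        "defs": [{"line": 41, "code": "expiry = now + ttl"}],
        "uses": [{"line": 47, "code": "if expiry < now:"}],
    }
}

result = slice_variable(graph, "auth/session.py", "refresh_token", "expiry")
print(json.dumps(result, indent=2))   # structured output the agent reads directly
```

The point of the example is that the model receives machine-structured defs/uses rather than a prose summary, matching the paper's claim that no natural-language summarization layer is needed.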
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.
Point-by-point responses
-
Referee: The central claim that the automatically constructed multi-granularity graph 'accurately captures all relevant intra-procedural definition-use relationships without introducing false dependencies or missing critical flows' (weakest assumption) is load-bearing for the attribution of the 17-point Recall@1 and 4.7-point Pass@1 gains to the data-flow component. The manuscript should provide either (a) a manual audit of a random sample of generated slices against ground-truth def-use on a subset of the benchmark or (b) quantitative metrics on false-positive/negative edges, as the current ablation evidence alone does not rule out that the observed improvement stems from incidental properties of the slice representation rather than semantic fidelity.
Authors: We appreciate the referee's identification of this load-bearing assumption. The ablations in §5.3 hold the tool schema and structural graph fixed while removing only the intra-procedural def-use edges, producing consistent drops in both localization and repair metrics; this design makes it unlikely that gains arise solely from incidental formatting of the slice output. Nevertheless, we agree that direct quantification of edge-level fidelity would strengthen attribution. A full ground-truth audit across the 300 issues is not feasible within the revision window, as it would require exhaustive manual annotation of def-use relations. In the revised manuscript we have therefore (i) added a dedicated paragraph in §4.2 describing the static-analysis rules used for edge construction (reaching definitions via AST-based use-def chains) and (ii) inserted a Limitations subsection (§6) that explicitly states the assumption, notes that the analysis follows standard sound techniques for Python, and provides one fully worked example of a generated slice with manual verification. We view this as a partial but substantive response to the comment.
revision: partial
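The reaching-definitions analysis the rebuttal cites is a standard dataflow technique. A minimal fixpoint sketch over a hand-built control-flow graph follows; the node ids, definition labels, and two-branch example are illustrative, not taken from ARISE.

```python
def reaching_definitions(succ, gen, kill):
    """Iterate the standard dataflow equations to a fixpoint.

    succ maps node -> successor set; gen/kill map node -> definition-id sets.
    Returns IN[n]: the definitions that reach the entry of each node.
    """
    nodes = list(succ)
    IN = {n: set() for n in nodes}
    OUT = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = set()
            for p in nodes:
                if n in succ[p]:               # p is a predecessor of n
                    new_in |= OUT[p]
            new_out = gen[n] | (new_in - kill[n])
            if new_in != IN[n] or new_out != OUT[n]:
                IN[n], OUT[n] = new_in, new_out
                changed = True
    return IN

# x is assigned on both branches (nodes 1 and 2) and read at the join (node 3),
# so both definitions must reach node 3 for the def-use edges to be complete.
succ = {0: {1, 2}, 1: {3}, 2: {3}, 3: set()}
gen = {0: set(), 1: {"x@1"}, 2: {"x@2"}, 3: set()}
kill = {0: set(), 1: {"x@2"}, 2: {"x@1"}, 3: set()}
print(sorted(reaching_definitions(succ, gen, kill)[3]))   # ['x@1', 'x@2']
```

Branch joins like this are exactly where a last-definition shortcut would drop an edge, which is why the referee's request for edge-level fidelity metrics is pertinent.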
Circularity Check
No significant circularity identified
full rationale
The paper describes an empirical system (ARISE) that augments an LLM agent with a multi-granularity repository graph including intra-procedural def-use edges, exposed via a three-tier tool API. Central claims consist of measured gains in Function/Line Recall@1 and Pass@1 on the external SWE-bench Lite benchmark (300 issues), plus ablations attributing gains to the data-flow component rather than tool schema. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text; the argument rests on direct experimental comparison against a public baseline (SWE-agent) with controlled conditions. This is self-contained empirical evidence with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The automatically extracted intra-procedural definition-use edges are sufficiently accurate and complete for the downstream agent to improve localization and repair.
invented entities (1)
- ARISE multi-granularity program graph with statement-level def-use edges (no independent evidence)
Reference graph
Works this paper leans on
- [1] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In ICLR. https://miltos.allamanis.com/publicationfiles/allamanis2018learning/allamanis2018learning.pdf
- [2] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. 2019. code2vec: Learning Distributed Representations of Code. Proceedings of the ACM on Programming Languages 3, POPL (2019), 40:1–40:29. doi:10.1145/3290353
- [3] Amazon Web Services. 2024. Reimagining Software Development with the Amazon Q Developer Agent. AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/reimagining-software-development-with-the-amazon-q-developer-agent/
- [4] Ramakrishna Bairi et al. 2024. CodePlan: Repository-level Coding using LLMs and Planning. ACM Transactions on Software Engineering and Methodology (TOSEM) (2024). doi:10.1145/3643757
- [5] Sebastian Baltes, Oliver Moseler, Fabian Beck, and Stephan Diehl. 2017. Navigate, Understand, Communicate: How Developers Locate Performance Bugs. In ICPC. 260–270. doi:10.1109/ICPC.2017.21
- [6] Andreas Bexell, Emma Söderberg, Christofer Rydenfält, and Sigrid Eldh. 2024. How Do Developers Approach Their First Bug in an Unfamiliar Code Base? An Exploratory Study of Large Program Comprehension. In PPIG. https://ppig.org/files/2024-PPIG-35th-bexell.pdf
- [7] Zhaoling Chen, Xiangru Tang, Gangda Deng, Fang Wu, Jialong Wu, Zhiwei Jiang, Viktor Prasanna, Arman Cohan, and Xingyao Wang. 2025. LocAgent: Graph-Guided LLM Agents for Code Localization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vienna, Aus...
- [8] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The Program Dependence Graph and Its Use in Optimization. ACM Transactions on Programming Languages and Systems 9, 3 (1987), 319–349. doi:10.1145/24039.24041
- [9] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, et al. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. In ICLR. https://openreview.net/forum?id=jLoC4ez43PZ
- [10] Susan Horwitz, Thomas Reps, and David Binkley. 1990. Interprocedural Slicing Using Dependence Graphs. ACM Transactions on Programming Languages and Systems (TOPLAS) 12, 1 (1990), 26–60.
- [11] Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, and Omer Tripp. 2024. A Deep Dive into Large Language Models for Automated Bug Localization and Repair. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1471–1493.
- [12] Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, and Junyang Lin. 2024. Qwen2.5-Coder Technical Report. arXiv preprint arXiv:2409.12186 (2024).
- [13] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv preprint arXiv:2310.06770 (2023).
- [14] James A. Jones and Mary Jean Harrold. 2005. Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 273–282.
- [15] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP ’23). ACM. doi:10.1145/3600006.3613165
- [16] Jia Li et al. 2025. LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding. In ACL. https://aclanthology.org/2025.acl-long.1324.pdf
- [17] Chunyan Liu, Yan Lei, Huan Xie, Jinping Wang, Yue Yu, and David Lo. 2026. Survey on Learning-Based Dynamic Fault Localization: From Traditional Machine Learning to Large Language Models. Comput. Surveys 58, 9 (2026), 1–39.
- [18] Jia Liu et al. 2024. RepoQA: Evaluating Long Context Code Understanding. In ICLR (Workshop/Poster). https://openreview.net/pdf?id=hK9YSrFuGf
- [19] Tianyang Liu, Canwen Xu, and Julian McAuley. 2024. RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems. In ICLR. https://proceedings.iclr.cc/paper_files/paper/2024/file/d191ba4c8923ed8fd8935b7c98658b5f-Paper-Conference.pdf
- [20] Wenjun Liu, Yihui Sun, Jiefeng Wei, Yiheng Li, Yiran Chen, Hai Zhao, Shuai Wang, Shizhe Fu, Ge Sun, and Kai Zhang. 2024. GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-Fine Retrieval Based on Code Context Graph. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM, 570–582. doi...
- [21] Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, and Wenmeng Zhou. 2025. CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 142–160.
- [23] Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2025. Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration. In Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE Companion ’25). ACM, New York, NY, USA. doi:10.1145/3696630.3728549
- [24] Fangwen Mu, Junjie Wang, Lin Shi, Song Wang, Shoubin Li, and Qing Wang. 2025. EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair. arXiv preprint arXiv:2506.10484 (2025).
- [25] Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, and Dong Yu. 2025. RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph. In Proceedings of the International Conference on Learning Representations (ICLR). https://proceedings.iclr.cc/paper_files/paper/2025/file/4a4a3c197deac0424...
- [26] Mike Papadakis and Yves Le Traon. 2015. Metallaxis-FL: Mutation-Based Fault Localization. Software Testing, Verification and Reliability 25, 5–7 (2015), 605–628.
- [27] Shani Pearce, Abhay Singh, Luke Hales, Emma Finlayson, and Brett A. Becker. 2024. Needles in a Haystack: Student Struggles with Working on Large Code Bases. In SIGCSE. doi:10.1145/3702652.3744218
- [28] Rachel Potvin and Josh Levenberg. 2016. Why Google Stores Billions of Lines of Code in a Single Repository. Commun. ACM 59, 7 (2016), 78–87. doi:10.1145/2854146
- [29]
- [30] Samuel Rando et al. 2025. Evaluating Coding LLMs at 1M Context Windows: LongCodeBench. OpenReview preprint. https://openreview.net/pdf?id=GFPoM8Ylp8
- [31] Melika Sepidband, Hamed Taherkhani, Hung Viet Pham, and Hadi Hemmati. 2026. RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models. arXiv e-prints (2026), arXiv–2601.
- [32] Akihiro Takahashi, Yoshiki Higo, and Shinji Kusumoto. 2021. An Extensive Study on Smell-Aware Bug Localization. Journal of Systems and Software 177 (2021), 110957. doi:10.1016/j.jss.2021.110957
- [33] Tianyi Tang, Tianyi Xu, Sumon Karmakar, and Toby Jia-Jun Li. 2023. An Empirical Study of Developer Behaviors for Validating and Repairing AI-Generated Code. In PLATEAU@SPLASH. https://toby.li/files/plateau23-tang-copilot.pdf
- [34] Frank Tip. 1995. A Survey of Program Slicing Techniques. Journal of Programming Languages 3, 3 (1995), 121–189.
- [35] Mark Weiser. 1981. Program Slicing. IEEE Transactions on Software Engineering 4 (1981), 352–357.
- [36] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, Franz Wotawa, and Dongcheng Li. 2023. Software Fault Localization: An Overview of Research, Techniques, and Tools. Handbook of Software Fault Localization: Foundations and Advances (2023), 1–117.
- [37] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2025. Demystifying LLM-Based Software Engineering Agents. Proceedings of the ACM on Software Engineering 2, FSE (2025), 801–824. doi:10.1145/3715754
- [38] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and Discovering Vulnerabilities with Code Property Graphs. In IEEE Symposium on Security and Privacy. 590–604. doi:10.1109/SP.2014.44
- [39] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...
- [40] Boyang Yang, Jiadong Ren, Shunfu Jin, Yang Liu, Feng Liu, Bach Le, and Haoye Tian. 2025. Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs. arXiv preprint arXiv:2503.21710 (2025).
- [41] John Yang, Carlos E. Jiménez, Alexander Wettig, Kilian Lieret, Shunyu Yao, et al. 2024. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. In Advances in Neural Information Processing Systems (NeurIPS). https://papers.nips.cc/paper_files/paper/2024/file/5a7c947568c1b1328ccc5230172e1e7c-Paper-Conference.pdf
- [42] Seohyun Youm, Hojun Yeon, Eunjong Kim, Eunjong Lee, Eunjong Park, et al. 2018. Bench4BL: Reproducibility Study on the Performance of IR-based Bug Localization. In Proceedings of ISSTA. doi:10.1145/3213846.3213856
- [43] Zhongming Yu, Hejia Zhang, Yujie Zhao, Hanxian Huang, Matrix Yao, Ke Ding, and Jishen Zhao. 2025. OrcaLoca: An LLM Agent Framework for Software Issue Localization. In Proceedings of the 42nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 267). PMLR, 73416–73436. https://proceedings.mlr.press/v267/yu25x.html
- [44] Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury. 2024. AutoCodeRover: Autonomous Program Improvement. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’24). ACM, New York, NY, USA, 1592–1604. doi:10.1145/3650212.3680384
- [45] Yaqin Zhou, Shangqing Liu, Jing Kai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. In NeurIPS. 10197–10207. https://papers.neurips.cc/paper/2019/file/49265d2447bc3bbfe9e76306ce40a31f-Paper.pdf