MAS-SZZ: Multi-Agentic SZZ Algorithm for Vulnerability-Inducing Commit Identification

Fu Xiao; Jing Yang; Jinxuan Xu; Le Yu; Linlin Zhu; Sicong Cao; Xingwei Lin

arXiv: 2604.24398 · v1 · submitted 2026-04-27 · 💻 cs.CR · cs.SE


Sicong Cao, Jinxuan Xu, Le Yu, Jing Yang, Xingwei Lin, Linlin Zhu, Fu Xiao

Pith reviewed 2026-05-08 02:54 UTC · model grok-4.3

classification 💻 cs.CR cs.SE

keywords: vulnerability-inducing commit · SZZ algorithm · multi-agent system · CVE description · patch hunk localization · root cause summarization · code history tracing · software security


The pith

MAS-SZZ uses multi-agent collaboration to summarize root causes and localize vulnerable statements for more accurate backtracking to vulnerability-inducing commits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to solve the problem of reliably identifying the earliest commit that introduced a software vulnerability, a task that underpins vulnerability detection, affected version analysis, and other security work. Prior SZZ variants, including V-SZZ and LLM4SZZ, often fail because they select poor anchors in the fixing patch and cannot trace backward effectively. MAS-SZZ instead deploys cooperating agents that first distill the vulnerability root cause from the CVE description and fixing commit, then apply structured step-forward prompting to mark vulnerability-related statements inside each patch hunk. These statements become the anchors from which the system traces the repository history to locate the introducing commit. Experiments across datasets and languages report F1-score improvements of up to 65.22 percent over the strongest baseline.

Core claim

MAS-SZZ identifies vulnerability-inducing commits by having agents collaborate on two steps: they summarize the root cause from the given CVE description and fixing commit, then use structured step-forward prompting to localize vulnerability-related statements from the change intent of each patch hunk; those statements serve as anchors for autonomous backward tracing through the repository history to the commit that first introduced the vulnerability.
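The flow described in the core claim (summarize the root cause, localize statements per hunk, then trace each anchor back) can be sketched as plain orchestration code. This is a minimal sketch under stated assumptions, not the authors' implementation: `summarize`, `localize`, and `trace_back` are hypothetical callables standing in for the agents, and merging per-anchor results by majority vote is an assumption the summary does not state.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (file path, line number in the parent of the fixing commit)
Anchor = Tuple[str, int]


@dataclass
class FixingCommit:
    sha: str
    hunks: List[str]  # raw diff text of each patch hunk


def mas_szz_sketch(
    cve_description: str,
    fix: FixingCommit,
    summarize: Callable[[str, FixingCommit], str],
    localize: Callable[[str, str], List[Anchor]],
    trace_back: Callable[[Anchor, str], str],
) -> List[Tuple[str, int]]:
    """Hypothetical orchestration of the three stages; not the paper's code.

    Returns candidate inducing commits paired with the number of anchors
    that traced back to each, most frequent first.
    """
    # Stage 1: a single root-cause summary, shared by every hunk.
    root_cause = summarize(cve_description, fix)

    # Stage 2: localize vulnerability-related statements hunk by hunk.
    anchors: List[Anchor] = []
    for hunk in fix.hunks:
        anchors.extend(localize(hunk, root_cause))

    # Stage 3: trace each anchor backward from the fixing commit and
    # aggregate by majority vote (an assumption made for this sketch).
    votes = Counter(trace_back(anchor, fix.sha) for anchor in anchors)
    return votes.most_common()
```

Keeping the agents behind plain callables makes it easy to substitute a stub for any one stage, which is also how an ablation of a single stage could be wired up.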

What carries the argument

Multi-agent system that summarizes the root cause and applies structured step-forward prompting to localize vulnerability-related statements from patch hunks, which then act as anchors for historical backtracking.
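The summary does not show what a structured step-forward prompt looks like. One plausible reading, offered only as an illustration, is a fixed chain of prompts in which each step is conditioned on the previous answer, moving from the hunk's change intent to the vulnerability-related statements. The step wording in `STEPS` and the `ask` callback (any function that sends a prompt to a model and returns its reply) are hypothetical.

```python
from typing import Callable, List

# Hypothetical step templates; the paper's actual prompts are not shown here.
STEPS = [
    "Step 1. Describe the change intent of this patch hunk:\n{hunk}",
    "Step 2. Given the root cause '{root_cause}' and your answer to Step 1:\n"
    "{previous}\nExplain whether and how this hunk addresses the root cause.",
    "Step 3. Based on Step 2:\n{previous}\n"
    "List the pre-fix statements in the hunk that are vulnerability-related.",
]


def step_forward(hunk: str, root_cause: str, ask: Callable[[str], str]) -> List[str]:
    """Run the steps in order, feeding each answer into the next prompt.

    Returns every intermediate answer; the last one would hold the
    localized statements in this illustrative chain.
    """
    answers: List[str] = []
    previous = ""
    for template in STEPS:
        # Unused keyword arguments are ignored by str.format.
        prompt = template.format(hunk=hunk, root_cause=root_cause, previous=previous)
        previous = ask(prompt)
        answers.append(previous)
    return answers
```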

If this is right

Supplies a stronger foundation for downstream security tasks such as vulnerability detection and affected-version analysis.
Delivers F1-score gains of up to 65.22 percent over the best-performing prior SZZ algorithm across multiple datasets and languages.
Addresses the specific failures of incorrect anchor selection and inadequate backtracking that limited V-SZZ and LLM4SZZ.
Enables autonomous tracing from patch-derived anchors without manual intervention.
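The reported gains are F1-based, but the summary does not say how precision and recall are aggregated across CVEs or whether "up to 65.22%" is a relative or an absolute difference. A sketch under two explicit assumptions (micro-averaging over per-CVE sets of inducing commits, and a relative-gain reading) looks like this.

```python
from typing import Dict, Set, Tuple


def micro_prf(predicted: Dict[str, Set[str]],
              truth: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all CVEs in `truth`.

    Predictions for CVEs absent from `truth` are ignored in this sketch.
    """
    tp = fp = fn = 0
    for cve, gold in truth.items():
        pred = predicted.get(cve, set())  # a missing prediction counts as empty
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def relative_gain(new: float, base: float) -> float:
    """Relative improvement, e.g. 0.25 means 25% better than `base`."""
    return (new - base) / base
```

For example, two made-up CVEs with gold commits `{c1}` and `{c7}` and predictions `{c1, c3}` and `{}` give one true positive, one false positive, and one false negative, so precision, recall, and F1 are all 0.5.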

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same agent-collaboration pattern could be tested on identifying the origins of non-vulnerability bugs or other code-change types.
Performance may vary on vulnerabilities lacking detailed CVE descriptions, since the method depends on those descriptions for root-cause summarization.
Replacing the current agents with larger or fine-tuned models might increase localization accuracy but would require separate validation.
The approach could be combined with static-analysis tools to cross-check the localized statements before backtracking begins.

Load-bearing premise

The multi-agent system can reliably summarize the root cause from the CVE and fixing commit and correctly localize vulnerability-related statements from patch hunks without introducing errors that invalidate the subsequent backtracking.

What would settle it

A hand-labeled sample of CVEs in which the agents' summarized root cause or localized statements are shown to be incorrect, or a new dataset on which MAS-SZZ fails to produce F1 gains over the best prior SZZ method.
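A hand-labeled sample also allows failures to be attributed to a stage: when backtracking misses the inducing commit and the localized anchors were labeled wrong, the error originated upstream of backtracking. A minimal sketch, assuming each case carries two hypothetical human labels:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabeledCase:
    cve: str
    anchors_correct: bool  # hypothetical hand label: are the localized statements right?
    vic_correct: bool      # did backtracking reach the true inducing commit?


def attribute_errors(cases: List[LabeledCase]) -> Dict[str, int]:
    """Bucket each case by which stage, if any, went wrong.

    A correct final answer is counted as correct regardless of the anchors.
    """
    buckets: Counter = Counter()
    for case in cases:
        if case.vic_correct:
            buckets["correct"] += 1
        elif not case.anchors_correct:
            buckets["localization error"] += 1
        else:
            buckets["backtracking error"] += 1
    return dict(buckets)
```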

Figures

Figures reproduced from arXiv: 2604.24398 by Fu Xiao, Jing Yang, Jinxuan Xu, Le Yu, Linlin Zhu, Sicong Cao, Xingwei Lin.

**Figure 1.** Overview of MAS-SZZ.

**Figure 2.** A running example detailing the full workflow.


Accurate vulnerability-inducing commit identification serves as a foundation for a series of software security tasks, such as vulnerability detection and affected version analysis. A straightforward solution is the SZZ algorithm, which traces back through the code history to identify the earliest commit that modified the vulnerable code. Unfortunately, neither the customized V-SZZ nor the state-of-the-art LLM4SZZ performs satisfactorily, owing to incorrect anchor selection and inadequate backtracking capability, which leaves both far from reliable for practical use. To overcome these challenges, we propose a multi-agentic SZZ algorithm, named MAS-SZZ, that facilitates the identification of vulnerability-inducing commits through collaboration among agents. Specifically, given a CVE description and its corresponding fixing commit, MAS-SZZ summarizes the root cause of the vulnerability and employs a structured step-forward prompting strategy to localize vulnerability-related statements based on the change intent of each patch hunk. These vulnerable statements serve as anchors from which MAS-SZZ autonomously traces backward through the repository's history to find the commit that first introduced the vulnerability. Extensive experiments show that MAS-SZZ outperforms the state-of-the-art baselines across datasets and programming languages, achieving F1-score gains of up to 65.22% over the best-performing SZZ algorithm.
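To make the anchor-selection problem concrete, the sketch below implements a toy version of the classic SZZ step described in the abstract: blame every line the fix deletes in the fix's parent revision. It is an illustration only; it assumes a linear history and matches lines by exact content rather than calling `git blame`, and the C snippets are invented.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical linear history: (commit sha, file lines after that commit).
History = List[Tuple[str, List[str]]]


def blame(history: History, parent: int, line: str) -> Optional[str]:
    """Content-matching stand-in for blame: the oldest commit in the
    unbroken run of revisions ending at `parent` that contains `line`,
    i.e. the commit that last introduced this exact text."""
    origin = None
    for sha, lines in reversed(history[: parent + 1]):
        if line not in lines:
            break
        origin = sha
    return origin


def szz(history: History, deleted_lines: List[str]) -> Set[str]:
    """Basic SZZ: blame every line the fix deletes, in the fix's parent."""
    parent = len(history) - 1  # the fix is applied on top of the last commit
    found = (blame(history, parent, line) for line in deleted_lines)
    return {sha for sha in found if sha is not None}
```

In the toy history used below, the fix removes the out-of-bounds write `buf[n] = 0;` (introduced in `c1`) and also rewrites the unrelated declaration `int n = len;` (last touched in `c3`). Blaming every deleted line flags both `c1` and `c3`; blaming only the vulnerability-related anchor leaves `c1`, which is the effect a better anchor-selection step is meant to have.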

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

MAS-SZZ adds multi-agent root-cause summarization and step-forward localization to SZZ, claiming large F1 gains, but the abstract leaves the evaluation details too thin to judge if the gains hold up.


The core idea is straightforward: feed a CVE and its fixing commit to a set of agents that first write a root-cause summary, then use structured prompting to mark the vulnerable statements inside each patch hunk, and finally backtrack from those statements through git history. This is meant to fix the bad-anchor problem that hurts V-SZZ and LLM4SZZ. The approach is new in its explicit multi-agent division of labor and the step-forward tactic for statement selection, and it is scoped exactly to the practical task of vulnerability-inducing commit identification. If the reported F1 lifts of up to 65% are real and reproducible, the method would be directly useful to people who maintain vulnerability databases or do affected-version analysis. The abstract is clear on the intended workflow and on the baselines it beats, which is better than many security papers that stay vague.

The main soft spot is that the performance numbers are presented without any visible information on dataset construction, baseline re-implementations, statistical tests, or error analysis. That makes it impossible to tell whether the gains come from the agent architecture or from differences in how the comparison was run. Reliance on LLM summarization and localization also introduces the risk that an incorrect root-cause summary or a mis-marked statement will send the backtracking down the wrong path; the abstract does not discuss how often that happens or how it is mitigated.

The work is aimed at researchers and tool builders in software security who already use or extend SZZ variants. It is worth sending to peer review because the problem is well-defined, the proposed fix is concrete, and the claimed improvement is large enough to matter if it survives scrutiny. A referee can check the experimental controls and ask for ablations on the individual agent steps. I would not cite it yet, but I would read the full methods and results if they become available.

Referee Report

2 major / 1 minor

Summary. The paper proposes MAS-SZZ, a multi-agentic SZZ algorithm for identifying vulnerability-inducing commits. Given a CVE description and fixing commit, agents summarize the root cause and apply structured step-forward prompting to localize vulnerability-related statements within patch hunks; these statements serve as anchors for autonomous backward tracing through repository history. The central claim is that MAS-SZZ outperforms prior SZZ variants (including V-SZZ and LLM4SZZ) across datasets and languages, with F1-score gains reaching 65.22%.

Significance. If the empirical results are robust, the work offers a meaningful improvement to a foundational technique in software security. Accurate vulnerability-inducing commit identification supports downstream tasks such as vulnerability detection and affected-version analysis. The multi-agent pipeline directly targets documented failure modes of anchor selection and backtracking in existing SZZ implementations, and the reported gains suggest practical impact if the localization step proves reliable.

major comments (2)

[Abstract] Abstract: the performance claim of up to 65.22% F1 improvement is presented without any description of experimental design, baseline re-implementations, dataset construction criteria, statistical tests, or controls for selection effects. This absence prevents verification of the central empirical result.
[Methods] Methods (localization and summarization pipeline): the approach rests on the premise that the multi-agent system produces accurate root-cause summaries and correctly identifies vulnerability-related statements in patch hunks. No ablation, error analysis, or human validation of localization accuracy is referenced, leaving open the possibility that downstream backtracking errors arise from this step.
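On the statistical-testing point, one common control when two tools are scored on the same CVEs is McNemar's exact test, which looks only at the cases where exactly one tool is correct. The sketch is offered as an example of such a test, not as what the paper reports; framing correctness as a per-CVE yes/no outcome is an assumption.

```python
from math import comb


def mcnemar_exact(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    only_a_correct: CVEs that tool A gets right and tool B gets wrong.
    only_b_correct: CVEs that tool B gets right and tool A gets wrong.
    """
    n = only_a_correct + only_b_correct
    k = min(only_a_correct, only_b_correct)
    # Under the null hypothesis the discordant outcomes are Binomial(n, 0.5);
    # sum the smaller tail and double it for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With 10 discordant cases all favoring one tool, the two-sided p-value is 2/1024 (about 0.002), while a 5/5 split gives 1.0.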

minor comments (1)

[Abstract] Abstract: the phrase 'structured step-forward prompting strategy' is introduced without a brief illustrative example or pseudocode, reducing immediate clarity for readers unfamiliar with the technique.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.


Referee: [Abstract] Abstract: the performance claim of up to 65.22% F1 improvement is presented without any description of experimental design, baseline re-implementations, dataset construction criteria, statistical tests, or controls for selection effects. This absence prevents verification of the central empirical result.

Authors: We agree that the abstract, constrained by length, omits key experimental details. The full manuscript (Sections 4 and 5) describes the datasets (CVE-linked fixing commits across languages), baseline re-implementations (V-SZZ and LLM4SZZ), evaluation using precision/recall/F1, and dataset construction from public vulnerability repositories. To address the concern, we will revise the abstract to concisely note the evaluation setup, datasets, and metrics, while retaining the high-level claim. We will also add a brief reference to statistical significance testing in the experiments section for completeness. revision: yes
Referee: [Methods] Methods (localization and summarization pipeline): the approach rests on the premise that the multi-agent system produces accurate root-cause summaries and correctly identifies vulnerability-related statements in patch hunks. No ablation, error analysis, or human validation of localization accuracy is referenced, leaving open the possibility that downstream backtracking errors arise from this step.

Authors: We acknowledge the importance of validating the localization and summarization components. The manuscript details the multi-agent pipeline and structured prompting in Section 3, but does not include dedicated ablation studies, error analysis, or human validation specifically for root-cause summary accuracy and statement localization. In the revised version, we will add an ablation study isolating the localization step, an error analysis of failure cases, and a human evaluation on a subset of samples to quantify the accuracy of the anchors used for backtracking. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified


The paper proposes MAS-SZZ as a novel multi-agent pipeline that takes CVE descriptions and fixing commits as inputs, summarizes root causes, localizes vulnerable statements via structured step-forward prompting on patch hunks, and then performs autonomous history backtracking to identify inducing commits. Evaluation consists of direct comparisons against external baselines (V-SZZ, LLM4SZZ) on separate datasets across languages, reporting F1 gains. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided text; the central claims rest on the independent algorithmic construction and external empirical results rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the unverified assumption that LLM agents can accurately perform root-cause summarization and statement localization; no explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5532 in / 1094 out tokens · 100773 ms · 2026-05-08T02:54:27.929586+00:00 · methodology


Reference graph

Works this paper leans on

24 references

[1] Lingfeng Bao, Xin Xia, Ahmed E. Hassan, and Xiaohu Yang. 2022. V-SZZ: Automatic Identification of Version Ranges Affected by CVE Vulnerabilities. In Proceedings of the 44th IEEE/ACM International Conference on Software Engineering (ICSE). 2352–2364.

[2] Xingchu Chen, Chengwei Liu, Jialun Cao, Yang Xiao, Xinyue Cai, Yeting Li, Jingyi Shi, Tianqi Sun, Haiming Chen, and Wei Huo. 2025. Vulnerability-Affected Versions Identification: How Far Are We? In Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2970–2982.

[3] Daniel Alencar da Costa, Shane McIntosh, Weiyi Shang, Uirá Kulesza, Roberta Coelho, and Ahmed E. Hassan. 2017. A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes. IEEE Trans. Software Eng. 43, 7 (2017), 641–657.

[4] Steven Davies, Marc Roper, and Murray Wood. 2014. Comparing text-based and dependence-based approaches for determining the origins of bugs. J. Softw. Evol. Process 26, 1 (2014), 107–139.

[5] Kim Herzig, Sascha Just, and Andreas Zeller. 2016. The Impact of Tangled Code Changes on Defect Prediction Models. Empir. Softw. Eng. 21, 2 (2016), 303–336.

[6] Torge Hinrichs, Emanuele Iannone, Tamás Aladics, Péter Hegedűs, Andrea De Lucia, Fabio Palomba, and Riccardo Scandariato. 2026. Back to the Roots: Assessing Mining Techniques for Java Vulnerability-Contributing Commits. ACM Trans. Softw. Eng. Methodol. (2026).

[7] Sunghun Kim, Thomas Zimmermann, Kai Pan, and E. James Whitehead Jr. 2006. Automatic Identification of Bug-Introducing Changes. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE). 81–90.

[8] Yi Li, Aashish Yadavally, Jiaxing Zhang, Shaohua Wang, and Tien N. Nguyen. 2023. Commit-Level, Neural Vulnerability Detection and Assessment. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). 1024–1036.

[9] Yunbo Lyu, Hong Jin Kang, Ratnadira Widyasari, Julia Lawall, and David Lo. 2024. Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel. IEEE Trans. Software Eng. 50, 9 (2024), 2219–2239.

[10] Viet Hung Nguyen, Stanislav Dashevskyi, and Fabio Massacci. 2016. An automatic method for assessing the versions affected by a vulnerability. Empirical Software Engineering 21, 6 (2016), 2268–2297.

[11] Christophe Rezk, Yasutaka Kamei, and Shane McIntosh. 2022. The Ghost Commit Problem When Identifying Fix-Inducing Changes: An Empirical Study of Apache Projects. IEEE Trans. Software Eng. 48, 9 (2022), 3297–3309.

[12] Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, and Rocco Oliveto. 2021. Evaluating SZZ Implementations Through a Developer-informed Oracle. In Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE). 436–447.

[13] Jacek Sliwerski, Thomas Zimmermann, and Andreas Zeller. 2005. When do changes induce fixes? ACM SIGSOFT Softw. Eng. Notes 30, 4 (2005), 1–5.

[14] Shiyu Sun, Yunlong Xing, Xinda Wang, Shu Wang, Qi Li, and Kun Sun. 2025. DISPATCH: Unraveling Security Patches from Entangled Code Changes. In Proceedings of the 34th USENIX Security Symposium (Security). 4521–4540.

[15] Xiaobing Sun, Mingxuan Zhou, Sicong Cao, Xiaoxue Wu, Lili Bo, Di Wu, Bin Li, and Yang Xiang. 2025. HgtJIT: Just-in-Time Vulnerability Detection Based on Heterogeneous Graph Transformer. IEEE Trans. Dependable Secur. Comput. 22, 6 (2025), 6522–6538.

[16] Lingxiao Tang, Jiakun Liu, Zhongxin Liu, Xiaohu Yang, and Lingfeng Bao. 2025. LLM4SZZ: Enhancing SZZ Algorithm with Context-Enhanced Assessment on Large Language Models. Proc. ACM Softw. Eng. 2, ISSTA (2025), 343–365.

[17] Wei Tao, Yucheng Zhou, Yanlin Wang, Wenqiang Zhang, Hongyu Zhang, and Yu Cheng. 2024. MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS). 51963–51993.

[18] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems (NeurIPS). 24824–24837.

[19] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS). 50528–50652.

[20] Songtao Yang, Yubo He, Kaixiang Chen, Zheyu Ma, Xiapu Luo, Yong Xie, Jianjun Chen, and Chao Zhang. 2023. 1dFuzz: Reproduce 1-Day Vulnerabilities with Directed Differential Fuzzing. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). 867–879.

[21] Qunhong Zeng, Yuxia Zhang, Zhiqing Qiu, and Hui Liu. 2025. A First Look at Conventional Commits Classification. In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE). 2277–2289.

[22] Jian Zhang, Chong Wang, Anran Li, Weisong Sun, Cen Zhang, Wei Ma, and Yang Liu. 2026. Evaluating Large Language Models for Line-Level Vulnerability Localization. IEEE Trans. Software Eng. 52, 3 (2026), 770–785.

[23] Xin Zhou, Sicong Cao, Xiaobing Sun, and David Lo. 2025. Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead. ACM Trans. Softw. Eng. Methodol. 34, 5 (2025), 145:1–145:31.

[24] Kangchen Zhu, Zhiliang Tian, Shangwen Wang, Mingyue Leng, and Xiaoguang Mao. 2026. Atomizer: An LLM-based Collaborative Multi-Agent Framework for Intent-Driven Commit Untangling. In Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE).
