arXiv preprint arXiv:2406.11147
6 Pith papers cite this work.
citing papers explorer
- FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
  FuzzingBrain V2, a multi-agent LLM system with a novel Suspicious Point abstraction and dual-layer fuzzing, reports 90% detection on a C/C++ benchmark and 29 confirmed zero-day vulnerabilities in real open-source projects.
- Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection
  ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.
- AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection
  AnyPoC introduces a multi-agent system for generating and validating PoC tests from LLM bug reports, producing 1.3x more valid PoCs, rejecting 9.8x more false positives, and discovering 122 new bugs across 12 major projects.
- QuiLL: An LLM-Based Vulnerability Assessment Framework for the Wild
  QuiLL is a new evaluation pipeline that uses optimized LLM prompts, dynamic in-context learning from an NVD vector store, and a novel accuracy-plus-reasoning metric to benchmark vulnerability detection in real code.
- Evaluating Retrieval-Augmented Generation for Explainable Malware Analysis
  RAG frequently degrades LLM malware explanations when structured VirusTotal input is already available by introducing irrelevant context and narrative noise.
- Large Language Model-Based Agents for Software Engineering: A Survey
  A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.