LLM-assisted static analysis for detecting security vulnerabilities

2024 · arXiv 2405.17238

Canonical reference. Cited by 18 Pith papers; 100% of classified citations cite this work as background.

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

Generating Complex Code Analyzers from Natural Language Questions

cs.SE · 2026-05-10 · unverdicted · novelty 7.0

Merlin generates CodeQL queries from natural language questions via RAG-based iteration and a self-test technique using assistive queries, achieving 3.8x higher task accuracy and 31% less completion time in user studies while finding additional software issues.

Longitudinal Analyses of SAST Tools: A CodeQL Case Study

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

CodeQL detected 171 CVEs total, with 83 caught by a prior version before the fix; detections were often actionable within the vulnerable file but not stable across tool versions.

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

cs.CR · 2026-04-21 · unverdicted · novelty 7.0

Refute-or-Promote applies adversarial multi-agent review with kill gates and empirical verification to filter LLM defect candidates, killing 79-83% before disclosure and yielding 4 CVEs plus multiple accepted fixes across libraries, C++ standard, and compilers.

CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings

cs.SE · 2025-09-15 · conditional · novelty 7.0

CodeCureAgent achieves 96.8% plausible fixes and 86.3% correct fixes for 1,000 SonarQube warnings across 106 Java projects using an agentic LLM framework.

FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

cs.CR · 2026-05-20 · unverdicted · novelty 6.0

FuzzingBrain V2, a multi-agent LLM system with a novel Suspicious Point abstraction and dual-layer fuzzing, reports 90% detection on a C/C++ benchmark and 29 confirmed zero-day vulnerabilities in real open-source projects.

Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection

cs.SE · 2026-05-18 · conditional · novelty 6.0

ReasonVul deploys three LLM agents with independent analysis and structured debate to achieve 40% PairAcc and 72.52% F1 on PrimeVul, outperforming baselines by 81% in PairAcc.

Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

cs.SE · 2026-05-14 · unverdicted · novelty 6.0

Veritas detects memory corruption vulnerabilities in stripped binaries by combining static value-flow slicing, dual-view LLM reasoning, and multi-agent runtime validation, reporting 90% recall, zero false positives on 623 exhaustive cases, and discovery of a real Apple CVE.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

cs.SE · 2026-04-13 · conditional · novelty 6.0

AnyPoC introduces a multi-agent system for generating and validating PoC tests from LLM bug reports, producing 1.3x more valid PoCs, rejecting 9.8x more false positives, and discovering 122 new bugs across 12 major projects.

Do Fine-Tuned LLMs Understand Vulnerabilities? An Investigation into the Semantic Trap

cs.CR · 2026-01-30 · unverdicted · novelty 6.0

Fine-tuned decoder-only LLMs fall into a Semantic Trap on vulnerability detection, achieving high scores on unpaired normal code but failing on paired vulnerable-patched code, semantic perturbations, and gap analysis, while reasoning supervision reduces symptoms at the cost of recall.

Learning Project-wise Subsequent Code Edits via Interleaving Neural-based Induction and Tool-based Deduction

cs.SE · 2026-04-14 · unverdicted · novelty 5.0

TRACE improves project-wise subsequent code editing by interleaving neural-based induction for semantic edits and tool-based deduction for syntactic edits.

VulWeaver: Weaving Broken Semantics for Grounded Vulnerability Detection

cs.SE · 2026-04-12 · unverdicted · novelty 5.0

VulWeaver improves Java vulnerability detection to 0.75 F1 by enhancing dependency graphs with LLM semantic fixes, extracting full context from slices plus implicit usage info, and applying type-specific meta-prompting with majority voting.

Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach

cs.CR · 2026-05-06 · unverdicted · novelty 4.0

Large language models consistently underestimate cybersecurity risks compared to human experts in CIS Controls-based assessments, indicating they should serve as complementary rather than standalone tools.

CyberAId: AI-Driven Cybersecurity for Financial Service Providers

cs.AI · 2026-05-03 · unverdicted · novelty 4.0

CyberAId is a proposed on-premise multi-agent system that coordinates LLM subagents with classical security tools to improve threat response and regulatory alignment in financial services.

Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches

cs.SE · 2026-04-29 · unverdicted · novelty 4.0

Systematic survey of 55 studies on security testing identifies structural-adaptive fragmentation between program representations and adaptive mechanisms, proposing a unified research agenda.

A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards

cs.SE · 2025-05-19 · unverdicted · novelty 3.0

Survey mapping LLM applications in software quality assurance to established standards including ISO/IEC 12207, ISO 25010, CMMI, and TMM, with case studies, challenges, and future directions.

Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis

cs.SE · 2026-03-28

citing papers explorer

Showing 18 of 18 citing papers.

Do Coding Agents Understand Least-Privilege Authorization? cs.CR · 2026-05-14 · unverdicted · none · ref 35
Generating Complex Code Analyzers from Natural Language Questions cs.SE · 2026-05-10 · unverdicted · none · ref 26
Longitudinal Analyses of SAST Tools: A CodeQL Case Study cs.CR · 2026-05-08 · unverdicted · none · ref 41
Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery cs.CR · 2026-04-21 · unverdicted · none · ref 17
CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings cs.SE · 2025-09-15 · conditional · none · ref 34
FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction cs.CR · 2026-05-20 · unverdicted · none · ref 5
Three Heads Are Better Than One: A Multi-perspective Reasoning Framework for Enhanced Vulnerability Detection cs.SE · 2026-05-18 · conditional · none · ref 27
Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries cs.SE · 2026-05-14 · unverdicted · none · ref 28
Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis cs.CR · 2026-05-01 · unverdicted · none · ref 21
AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection cs.SE · 2026-04-13 · conditional · none · ref 32
Do Fine-Tuned LLMs Understand Vulnerabilities? An Investigation into the Semantic Trap cs.CR · 2026-01-30 · unverdicted · none · ref 25
Learning Project-wise Subsequent Code Edits via Interleaving Neural-based Induction and Tool-based Deduction cs.SE · 2026-04-14 · unverdicted · none · ref 69
VulWeaver: Weaving Broken Semantics for Grounded Vulnerability Detection cs.SE · 2026-04-12 · unverdicted · none · ref 24
Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach cs.CR · 2026-05-06 · unverdicted · none · ref 6
CyberAId: AI-Driven Cybersecurity for Financial Service Providers cs.AI · 2026-05-03 · unverdicted · none · ref 6
Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches cs.SE · 2026-04-29 · unverdicted · none · ref 22
A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards cs.SE · 2025-05-19 · unverdicted · none · ref 160
Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis cs.SE · 2026-03-28 · unreviewed · ref 40
