LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
5 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
  LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
- AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification
  AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.
- PRAXIS: Integrating Program Analysis with Observability for Root-Cause Analysis
  PRAXIS combines LLM-driven structured traversal of service dependency graphs and hammock-block program dependence graphs to improve root-cause analysis accuracy by up to 6.3x while cutting token consumption by 5.3x on 30 real-world cloud incidents.
- LLM Code Smells: A Taxonomy and Detection Approach
  Introduces a taxonomy of nine LLM code smells, a static detection tool, and reports 73.5% prevalence with 91.3% precision and 71.8% recall across 692 projects.
- Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis
  Agent Mentor analyzes semantic trajectories in agent logs to identify undesired behaviors and derives corrective prompt instructions, yielding measurable accuracy gains on benchmark tasks across three agent setups.