SecureVibeBench supplies 105 realistic multi-file C/C++ tasks from 41 OSS-Fuzz projects with known vulnerability introduction points and evaluates five code agents, finding the best achieves only 23.8% fully correct and secure solutions.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2025 (1)
Verdicts: ACCEPT (1)
Representative citing papers
SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios