Dual-mode benchmarks reveal that frontier LLMs produce many false positives and achieve low vulnerability coverage in cybersecurity tasks, while domain-specialized models reach over 50% per-family detection and 0.904 precision, indicating that methodology and specialization matter more than scale.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks