On the Surprising Efficacy of LLMs for Penetration-Testing, July 2025
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments
CritBench evaluates five LLMs on 81 tasks in IEC 61850 environments, showing reliable performance on static analysis and single-tool reconnaissance but degradation on dynamic live-system tasks that require sequential reasoning, with domain-specific tools improving results.