CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.
In: 2024 IEEE International Conference on Cyber Security and Resilience (CSR)
6 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (6)
verdicts: UNVERDICTED (6)
roles: background (2)
polarities: background (2)
citing papers explorer
- CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge
  CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.
- Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
  Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
- Hierarchical Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text
  H-TechniqueRAG improves F1 by 3.8% and cuts latency 62% over flat TechniqueRAG by retrieving tactics first then techniques within them on three CTI datasets.
- Towards Automated Pentesting with Large Language Models
  RedShell fine-tunes LLMs on enhanced malicious PowerShell data to produce syntactically valid offensive code for pentesting, reporting over 90% validity, strong semantic match to references, and better edit-distance similarity than prior methods plus functional execution success.
- XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles
  XGBoost with SHAP and statistical distribution analysis on UAVIDS-2025 identifies density support intersection as the cause of false predictions for Wormhole and Blackhole attacks in UAV intrusion detection.
- RedShell: A Generative AI-Based Approach to Ethical Hacking
  RedShell fine-tunes LLMs on a custom dataset of public code samples to generate syntactically valid PowerShell scripts with semantic similarity to references, reporting under 10% parse errors and over 50%/40% mean similarity on Edit Distance and METEOR.