ATLAS: AI-Assisted Threat-to-Assertion Learning for System-on-Chip Security Verification
Pith reviewed 2026-05-15 18:19 UTC · model grok-4.3
The pith
ATLAS uses large language models to turn standardized threat models into formal security assertions for System-on-Chip designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ATLAS automates the conversion of vulnerability knowledge into verifiable security properties by using LLMs to perform asset identification, weakness mapping, and assertion generation for System-on-Chip designs, evaluated by detecting 39 out of 48 CWEs and correctly producing properties for 33 bugs across HACK@DAC benchmarks.
What carries the argument
The ATLAS framework, which performs asset-centric analysis combined with standardized threat model templates and multi-source SoC context to transform vulnerability reasoning into formal proof scripts.
If this is right
- Security verification steps can be derived automatically from existing public weakness databases rather than written from scratch.
- Formal tools such as JasperGold can receive ready-made scripts generated directly from threat descriptions.
- Design teams can incorporate property generation into earlier stages of the SoC flow without needing separate security specialists for every weakness.
- Knowledge bases like CWE become directly usable inputs for machine-assisted formal methods.
Where Pith is reading between the lines
- The same mapping process could be tested on industrial SoC designs that include proprietary blocks not present in public benchmarks.
- Generated assertions could be cross-checked by independent formal tools or symbolic execution to catch LLM hallucinations before verification runs.
- Extending the asset-identification step to include power or side-channel models might surface additional classes of weaknesses.
Load-bearing premise
The generated assertions from the language model are accurate and complete enough to represent the actual security requirements of the mapped weaknesses.
What would settle it
Running ATLAS on a new SoC design that contains a documented CWE vulnerability and checking whether the generated properties miss that vulnerability or produce an incorrect assertion.
Figures
read the original abstract
This work presents ATLAS, an LLM-driven framework that bridges standardized threat modeling and property-based formal verification for System-on-Chip (SoC) security. Starting from vulnerability knowledge bases such as Common Weakness Enumeration (CWE), ATLAS identifies SoC-specific assets, maps relevant weaknesses, and generates assertion-based security properties and JasperGold scripts for verification. By combining asset-centric analysis with standardized threat model templates and multi-source SoC context, ATLAS automates the transformation from vulnerability reasoning to formal proof. Evaluated on three HACK@DAC benchmarks, ATLAS detected 39/48 CWEs and generated correct properties for 33 of those bugs, advancing automated, knowledge-driven SoC security verification toward a secure-by-design paradigm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ATLAS, an LLM-driven framework that maps Common Weakness Enumerations (CWEs) to SoC-specific assets using standardized threat models and multi-source context, then generates SystemVerilog assertions and JasperGold scripts for formal security verification. On three HACK@DAC benchmarks, ATLAS detects 39/48 CWEs and produces correct properties for 33 of the detected weaknesses, with the goal of advancing automated, knowledge-driven SoC security verification.
Significance. If the correctness claims hold under rigorous validation, the work offers a practical bridge between threat-modeling knowledge bases and property-based formal verification, which could reduce manual effort in hardware security assurance and support secure-by-design flows for complex SoCs.
major comments (2)
- [Abstract] Abstract and Evaluation: The central performance claim (correct properties generated for 33 of 39 detected CWEs) is presented without any description of the validation procedure used to establish correctness. It is unclear whether assertions were confirmed by successful JasperGold proofs on injected faults, expert manual review against ground truth, or only syntactic checks. This protocol is load-bearing for interpreting the 33/39 metric as an objective result rather than an LLM self-assessment.
- [Evaluation] Evaluation: No baseline comparisons, ablation studies, or error analysis are supplied for either the CWE detection rate or the property-generation accuracy. Without these, the advancement over prior automated SoC verification techniques cannot be quantified.
minor comments (2)
- [Abstract] The abstract refers to 'three HACK@DAC benchmarks' but does not name them or summarize their design characteristics, making it hard to assess generalizability.
- [Framework] Notation for asset identification and threat-template mapping could be clarified with a small diagram or pseudocode example in the framework description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested details on validation and evaluation.
read point-by-point responses
-
Referee: [Abstract] Abstract and Evaluation: The central performance claim (correct properties generated for 33 of 39 detected CWEs) is presented without any description of the validation procedure used to establish correctness. It is unclear whether assertions were confirmed by successful JasperGold proofs on injected faults, expert manual review against ground truth, or only syntactic checks. This protocol is load-bearing for interpreting the 33/39 metric as an objective result rather than an LLM self-assessment.
Authors: We agree that the validation procedure must be explicitly described. The original manuscript omitted these details. In the revised version, we will expand the Evaluation section (and update the abstract) to state that correctness of the 33 properties was established via expert manual review: two hardware security researchers independently compared each generated SystemVerilog assertion against ground-truth properties manually derived from the CWE descriptions and the HACK@DAC SoC specifications. In addition, for a subset of cases, we ran JasperGold proofs on designs with injected faults to confirm the assertions were sound. Syntactic checks were used only as a preliminary filter and did not contribute to the 33/39 count. This protocol will be documented with inter-rater agreement statistics. revision: yes
-
Referee: [Evaluation] Evaluation: No baseline comparisons, ablation studies, or error analysis are supplied for either the CWE detection rate or the property-generation accuracy. Without these, the advancement over prior automated SoC verification techniques cannot be quantified.
Authors: We acknowledge that the current evaluation lacks explicit baselines, ablations, and error analysis, limiting direct quantification of improvement. In the revision we will add: (1) a baseline comparison using direct LLM prompting (without asset mapping or threat-model templates) on the same three HACK@DAC benchmarks; (2) an ablation study measuring the contribution of each ATLAS component (threat templates, multi-source context, asset identification) to both CWE detection and property accuracy; and (3) a categorized error analysis of the 9 undetected CWEs and 6 incorrect properties, identifying causes such as CWE ambiguity or LLM limitations on complex SoC interactions. These additions will be placed in a new subsection of the Evaluation. revision: yes
Circularity Check
No circularity; empirical framework with external benchmark evaluation
full rationale
The paper presents ATLAS as a procedural LLM-driven pipeline that maps CWEs to SoC assets and generates SystemVerilog assertions, with performance measured by detection rates (39/48 CWEs) and correctness counts (33/39) on three external HACK@DAC benchmarks. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the derivation chain. Claims rest on observable benchmark outcomes rather than internal reductions or renamed inputs, making the work self-contained against external evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Large language models can accurately map standardized weaknesses to SoC-specific assets and generate correct formal assertions
Reference graph
Works this paper leans on
- [1]
-
[2]
Hasan Al Shaikh, Mohammad Bin Monjil, Kimia Zamiri Azar, Farimah Farahmandi, Mark Tehranipoor, and Fahim Rahman. 2023. Quardtropy: Detecting and quantifying unauthorized information leakage in hardware designs using g-entropy. In2023 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). IEEE, 1–6
work page 2023
-
[3]
Hasan Al-Shaikh, Arash Vafaei, Mridha Md Mashahedur Rahman, Kimia Zamiri Azar, Fahim Rahman, Farimah Farahmandi, and Mark Tehranipoor. 2023. Sharpen: Soc security verification by hardware penetration test. InProceedings of the 28th Asia and South Pacific Design Automation Conference. 579–584
work page 2023
-
[5]
Avinash Ayalasomayajula, Rui Guo, Jingbo Zhou, Sujan Kumar Saha, and Farimah Farahmandi. 2024. Lasp: Llm assisted security property generation for soc verification. InProceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD. 1–7
work page 2024
-
[6]
Kimia Zamiri Azar, Muhammad Monir Hossain, Arash Vafaei, Hasan Al Shaikh, Nu- run N Mondol, Fahim Rahman, Mark Tehranipoor, and Farimah Farahmandi. 2022. Fuzz, penetration, and ai testing for soc security verification: Challenges and solutions. Cryptology ePrint Archive(2022)
work page 2022
-
[7]
S Barnum. 2008. Common attack pattern enumeration and classification (CAPEC) schema.Department of Homeland Security(2008)
work page 2008
-
[8]
2025.JasperGold Formal Verification Platform
Cadence Design Systems, Inc. 2025.JasperGold Formal Verification Platform. Cadence Design Systems, San Jose, California, USA. Commercial formal verification tool suite for property checking, equivalence, and static verification
work page 2025
-
[9]
Paul E. Calzada, Zahin Ibnat, Mark Tehranipoor, and Farimah Farahmandi. 2025. VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation. https://arxiv.org/abs/2507.13369
-
[10]
Common Attack Pattern Enumerations and Classifications. [n. d.]. https://capec.mitre.org/
-
[11]
Common Vulnerabilities and Exposures. [n. d.]. https://www.cve.org/
-
[12]
CWE-1245: Improper Finite State Machines (FSMs) in Hardware Logic. [n. d.]. https://cwe.mitre.org/data/definitions/1245.html
-
[13]
CWE VIEW: Hardware Design. [n. d.]. https://cwe.mitre.org/data/definitions/1194.html
-
[14]
Fabio De Rosa, Nicolò Maunero, Paolo Prinetto, Federico Talentino, and Martina Trussoni. 2022. Threma: Ontology-based automated threat modeling for ict infrastruc- tures.IEEE Access10 (2022), 116514–116526
work page 2022
-
[15]
Isra Elsharef, Zhen Zeng, and Zhongshu Gu. 2024. Facilitating threat modeling by leveraging large language models. InWorkshop on AI Systems with Confidential Computing
work page 2024
-
[16]
Baleegh Ahmad et. al. 2022. Don’t cweat it: Toward cwe analysis techniques in early stages of hardware design. InProceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design. 1–9
work page 2022
-
[17]
Muhammad Monir Hossain et. al. 2023. Socfuzzer: Soc vulnerability detection us- ing cost function enabled fuzz testing. In2023 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6
work page 2023
-
[18]
Jeyavijayan Rajendran et.al. 2016. Formal security verification of third party intellec- tual property cores for information leakage. In2016 29th International conference on VLSI design and 2016 15th international conference on embedded systems (VLSID). IEEE, 547–552
work page 2016
-
[19]
Kaiyan Chang et.al. 2024. Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation. InProceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. 1–9
work page 2024
-
[20]
Farimah Farahmandi, M Sazadur Rahman, Sree Ranjani Rajendran, and Mark Tehra- nipoor. 2023. CAD for High-Level Synthesis. InCAD for Hardware Security. Springer International Publishing Cham
work page 2023
-
[21]
Nusrat Farzana, Fahim Rahman, Mark Tehranipoor, and Farimah Farahmandi. 2019. Soc security verification using property checking. In2019 IEEE International Test Conference (ITC). IEEE, 1–10
work page 2019
-
[22]
Vasudev Gohil, Rahul Kande, Chen Chen, Ahmad-Reza Sadeghi, and Jeyavijayan Rajendran. 2024. MABFuzz: Multi-armed bandit algorithms for fuzzing processors. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–6
work page 2024
-
[23]
HACK-EVENT. 2019. HACK@DAC 2018: SoC Security Benchmark. https://github. com/HACK-EVENT/hackatdac18 Accessed: 2025-11-14
work page 2019
-
[24]
HACK-EVENT. 2019. HACK@DAC 2019: SoC Security Benchmark. https://github. com/HACK-EVENT/hackatdac19 Accessed: 2025-11-14
work page 2019
-
[25]
HACK-EVENT. 2021. HACK@DAC 2021: OpenTitan SoC Benchmark. https://github. com/HACK-EVENT/hackatdac21 Accessed: 2025-11-14
work page 2021
-
[26]
Hardware security challenge contest. [n. d.]. https://hackthesilicon.com/hackdac24/
-
[27]
Intel’s stock drops 30% overnight - company sheds $39 billion in market cap. [n. d.]. https://www.tomshardware.com/pc-components/cpus/intels-stock-drops-30- overnight-company-sheds-dollar39-billion-in-market-cap
-
[28]
Roger G Johnston. 2010. Being vulnerable to the threat of confusing threats with vulnerabilities.The Journal of Physical Security4, 2 (2010), 30–34
work page 2010
- [29]
-
[30]
Rasheed Kibria, Farimah Farahmandi, and Mark Tehranipoor. 2024. A Survey on SoC Security Verification Methods at the Pre-silicon Stage.Cryptology ePrint Archive (2024)
work page 2024
-
[31]
Rasheed Kibria, M Sazadur Rahman, Farimah Farahmandi, and Mark Tehranipoor
-
[32]
In2022 IEEE International Test Conference (ITC)
Rtl-fsmx: Fast and accurate finite state machine extraction at the rtl for security applications. In2022 IEEE International Test Conference (ITC). IEEE, 165–174
-
[33]
Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, et al. 2020. Spectre attacks: Exploiting speculative execution.Commun. ACM63, 7 (2020), 93–101
work page 2020
-
[34]
lowRISC C.I.C. 2025. OpenTitan: Open source silicon root of trust (RoT) project. https://opentitan.org/. Accessed: 2025-11-12
work page 2025
-
[35]
Yangdi Lyu, Alif Ahmed, and Prabhat Mishra. 2019. Automated activation of multiple targets in RTL models using concolic testing. In2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 354–359
work page 2019
-
[36]
Yangdi Lyu and Prabhat Mishra. 2020. Scalable concolic testing of RTL models.IEEE Trans. Comput.70, 7 (2020), 979–991
work page 2020
-
[37]
Russ McRee. 2014. Microsoft threat modeling tool 2014: identify & mitigate.ISSA Journal39 (2014), 42
work page 2014
-
[38]
2024.Hardware Security Failure Scenarios: Potential Weaknesses in Hardware Design
Peter Mell and Irena Bojanova. 2024.Hardware Security Failure Scenarios: Potential Weaknesses in Hardware Design. Technical Report. National Institute of Standards and Technology
work page 2024
-
[39]
Xingyu Meng, Amisha Srivastava, Ayush Arunachalam, Avik Ray, Pedro Henrique Silva, Rafail Psiakis, Yiorgos Makris, and Kanad Basu. 2024. Nspg: Natural language processing-based security property generator for hardware security assurance. In Proceedings of the 61st ACM/IEEE Design Automation Conference. 1–6
work page 2024
-
[40]
Microsoft Unveils Emergency Windows Update That Disables Intel’s Spectre Fix. [n. d.]. https://fortune.com/2018/01/29/microsoft-windows-intel-spectre-fix/
work page 2018
-
[41]
The MITRE EMB3D Threat Model. 2024. https://emb3d.mitre.org/
work page 2024
-
[42]
Intel Secure Development: Threat Modeling. [n. d.]. https://www.intel.com/content/www/us/en/security/security-practices/secure- development-practices/threat-modeling.html
-
[43]
Researchers Point to an AMD Backdoor—And Face Their Own Backlash. [n. d.]. https://www.wired.com/story/amd-backdoor-cts-labs-backlash/
- [44]
-
[45]
Dipayan Saha, Sujan Kumar Saha, Mark Tehranipoor, and Farimah Farahmandi. 2024. Empowering Hardware Security with LLM: The Development of a Vulnerable Hard- ware Database
work page 2024
- [46]
-
[47]
Dipayan Saha, Shams Tarek, Hasan Al Shaikh, Khan Thamid Hasan, Pavan Sai Nal- luri, Md Ajoad Hasan, Nashmin Alam, Jingbo Zhou, Sujan Kumar Saha, and Mark Tehranipoor. 2025. SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models.arXiv preprint arXiv:2506.20415(2025)
-
[48]
MITRE Software and Hardware Design CWEs. [n. d.]. https://cwe.mitre.org/index.html
-
[49]
Pramod Subramanyan and Divya Arora. 2014. Formal verification of taint-propagation security properties in a commercial SoC design. In2014 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–2
work page 2014
-
[50]
Ishraq Tashdid, Tasnuva Farheen, and Sazadur Rahman. 2025. SAFE-SiP: Secure Authentication Framework for System-in-Package Using Multi-party Computation. In Proceedings of the Great Lakes Symposium on VLSI 2025 (GLSVLSI ’25). Association for Computing Machinery, New York, NY, USA, 391–396. doi:10.1145/3716368.3735248
- [51]
-
[52]
Ishraq Tashdid, Valentina Terry, Jordan Merkel, Tasnuva Farheen, and Sazadur Rahman
-
[53]
In2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD)
BeyondPPA: Human-Inspired Reinforcement Learning for Post-Route Reliability- Aware Macro Placement. In2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD). 1–12. doi:10.1109/MLCAD65511.2025.11189164
-
[54]
Kimia Tasnia, Alexander Garcia, Tasnuva Farheen, and Sazadur Rahman. 2025. VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs. In2025 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE
work page 2025
-
[55]
Kimia Tasnia and Sazadur Rahman. 2025.OPL4GPT: An Application Space Exploration of Optimal Programming Language for Hardware Design by LLM. Association for Computing Machinery, New York, NY, USA, 981–987. https://doi.org/10.1145/3658617. 3697621
-
[56]
This ’Demonically Clever’ Backdoor Hides In a Tiny Slice of a Computer Chip. [n. d.]. https://www.wired.com/2016/06/demonically-clever-backdoor-hides-inside- computer-chip/
work page 2016
-
[57]
CVSS Severity Distribution Over Time. [n. d.]. https://nvd.nist.gov/general/visualizations/vulnerability-visualizations/cvss- severity-distribution-over-time
-
[58]
Timothy Trippel, Kang G Shin, Alex Chernyakhovsky, Garret Kelly, Dominic Rizzo, and Matthew Hicks. 2022. Fuzzing hardware like software. In31st USENIX Security Symposium (USENIX Security 22). 3237–3254
work page 2022
-
[59]
Clifford Wolf and Johann Glaser. 2013. Yosys – A Free Verilog Synthesis Suite. In Proceedings of the 21st Austrian Workshop on Microelectronics (Austrochip). Linz, Austria, 47–52
work page 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.