Can Agents Secure Hardware? Evaluating Agentic LLM-Driven Obfuscation for IP Protection
Pith reviewed 2026-05-10 14:36 UTC · model grok-4.3
The pith
An LLM agent framework produces correct obfuscated hardware netlists that corrupt outputs under wrong keys but remain vulnerable to SAT attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An agentic large language model framework decomposes hardware netlist obfuscation into five stages: retrieval-grounded planning, structured lock-plan generation, deterministic compilation, functional verification, and SAT-based evaluation. On ISCAS-85 benchmarks it generates locked netlists that match the original circuit behavior under the correct key and show measurable output corruption under incorrect keys, yet SAT attacks remain effective at key recovery.
What carries the argument
Agentic LLM-driven multi-stage obfuscation pipeline with retrieval-grounded planning and deterministic post-processing for netlist locking and verification.
If this is right
- Hardware IP obfuscation can be automated for benchmark designs using LLM agents instead of manual methods.
- Locked netlists from the framework maintain functional correctness and exhibit key-dependent behavior.
- SAT-based attacks continue to be an effective method for breaking the security of these obfuscated circuits.
- The inclusion of attack evaluation in the pipeline allows direct assessment of security during generation.
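The key-dependent behavior described in the list above can be made concrete with a small measurement. The following is a minimal Python sketch, not the paper's code: the toy circuit, the `locked` function, and the choice of "fraction of differing output bits over all input patterns" as the corruption metric are assumptions made for illustration.

```python
from itertools import product

def original(a, b, c):
    """Hypothetical 3-input, 2-output circuit (not an ISCAS-85 design)."""
    n1 = a & b
    return (n1, n1 ^ c)

def locked(a, b, c, k0, k1):
    """The same circuit with two key gates inserted (illustrative only)."""
    n1 = (a & b) ^ k0          # XOR key gate: transparent when k0 == 0
    out1 = (n1 ^ c) ^ k1 ^ 1   # XNOR key gate: transparent when k1 == 1
    return (n1, out1)

CORRECT_KEY = (0, 1)

def corruption_rate(key):
    """Fraction of output bits that differ from the original, over all inputs."""
    differing = total = 0
    for a, b, c in product((0, 1), repeat=3):
        ref = original(a, b, c)
        got = locked(a, b, c, *key)
        differing += sum(r != g for r, g in zip(ref, got))
        total += len(ref)
    return differing / total

# Corruption rate for every possible 2-bit key.
rates = {key: corruption_rate(key) for key in product((0, 1), repeat=2)}
```

In this toy case the correct key gives a rate of 0.0 and every wrong key gives a nonzero rate. Exhaustive enumeration is only feasible here because the circuit has three inputs; larger benchmarks would typically need sampled input patterns instead.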
Where Pith is reading between the lines
- Integrating specialized SAT-resistant obfuscation techniques into the lock-plan stage could enhance resistance in future iterations.
- This staged agentic method could be adapted for related tasks like hardware watermarking or anti-tamper measures.
- Applying the framework to larger, real-world designs would test whether the current balance of correctness and vulnerability scales.
Load-bearing premise
Decomposing the obfuscation task into LLM-driven stages with retrieval-grounded planning and deterministic compilation will produce obfuscation that is both functionally correct and security-relevant. The reported results support the correctness half; resistance to SAT-based key recovery is not achieved.
What would settle it
A concrete falsifier would be a locked netlist generated by the framework on which a SAT attack fails to recover the key within a stated time and memory budget, contrary to the effectiveness observed on the benchmarks.
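To illustrate what "key recovery" means in the falsifier above, here is a toy oracle-guided attack loop in Python. It is a sketch under loud assumptions: the circuit is a hypothetical 3-input example rather than an ISCAS-85 design, and brute-force enumeration stands in for the SAT solver that a real attack would use to find distinguishing input patterns.

```python
from itertools import product

def oracle(a, b, c):
    """Black-box activated chip: returns the original function's outputs."""
    n1 = a & b
    return (n1, n1 ^ c)

def locked(inputs, key):
    """Hypothetical locked netlist; the attacker can simulate it for any key."""
    a, b, c = inputs
    k0, k1 = key
    n1 = (a & b) ^ k0
    return (n1, (n1 ^ c) ^ k1 ^ 1)

def find_dip(candidates):
    """Return an input on which two surviving keys disagree, or None.

    A real SAT attack asks a solver for such an input; enumeration is
    used here only because the toy circuit is tiny.
    """
    for x in product((0, 1), repeat=3):
        if len({locked(x, k) for k in candidates}) > 1:
            return x
    return None

def recover_keys():
    """Prune the key space with oracle queries until no keys disagree."""
    candidates = list(product((0, 1), repeat=2))
    queries = 0
    while True:
        dip = find_dip(candidates)
        if dip is None:
            return candidates, queries
        expected = oracle(*dip)
        candidates = [k for k in candidates if locked(dip, k) == expected]
        queries += 1

keys, queries = recover_keys()
```

On this example a single oracle query is enough to isolate the key. A netlist would count as evidence against the observed vulnerability only if the corresponding loop, run with a real solver, exhausted its budget before the candidate set converged.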
Original abstract
The globalization of integrated circuit (IC) design and manufacturing has increased the exposure of hardware intellectual property (IP) to untrusted stages of the supply chain, raising concerns about reverse engineering, piracy, tampering, and overbuilding. Hardware netlist obfuscation is a promising countermeasure, but automating the generation of functionally correct and security-relevant obfuscated circuits remains challenging, particularly for benchmark-scale designs. This paper presents an agentic, large language model (LLM)-driven framework for automated hardware netlist obfuscation. The proposed framework combines retrieval-grounded planning, structured lock-plan generation, deterministic netlist compilation, functional verification, and SAT-based security evaluation. Rather than a single prompt-to-output generation step, the framework decomposes the task into specialized stages for circuit analysis, synthesis, verification, and attack evaluation. We evaluate the framework on ISCAS-85 benchmarks using functional equivalence checking and SAT-based attacks. Results show that the framework generates correct locked netlists while introducing measurable output corruption under incorrect keys, while SAT attacks remain effective. These findings highlight both the potential and current limitations of agentic LLM-driven obfuscation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an agentic LLM-driven framework for automated hardware netlist obfuscation that decomposes the task into retrieval-grounded planning, structured lock-plan generation, deterministic netlist compilation, functional verification via equivalence checking, and SAT-based security evaluation. Evaluated on ISCAS-85 benchmarks, the framework is claimed to produce functionally correct locked netlists that introduce measurable output corruption under incorrect keys, while SAT attacks remain effective at key recovery. The work positions this as evidence of both the potential and current limitations of LLM agents for hardware IP protection.
Significance. If the empirical outcomes hold, the work is significant for exploring multi-stage LLM agents in a complex hardware security domain. It contributes by showing that staged, retrieval-grounded planning combined with deterministic compilation can yield correct obfuscated designs on standard benchmarks, while honestly documenting that this does not yet achieve resistance to established SAT attacks. The use of ISCAS-85 benchmarks and standard SAT methods provides a reproducible baseline for future agentic approaches in IP protection.
major comments (2)
- [Evaluation/Results] Evaluation/Results section: The central claims that the framework 'generates correct locked netlists' and 'introduces measurable output corruption under incorrect keys' while 'SAT attacks remain effective' are stated without accompanying quantitative metrics (e.g., corruption rates per benchmark, SAT attack success rates or runtimes, or equivalence-checking pass rates). This absence makes it impossible to assess the magnitude or consistency of the reported outcomes.
- [Framework/Methodology] Framework description: The decomposition into specialized stages (retrieval-grounded planning + deterministic compilation) is presented as key to correctness, but the manuscript does not detail how retrieval is implemented, what constitutes a 'lock-plan', or the exact interface between LLM-generated plans and the deterministic compiler, leaving the reproducibility of the functional-correctness result unclear.
minor comments (2)
- [Abstract] Abstract: The phrase 'measurable output corruption' is vague; replacing it with a brief quantitative summary (e.g., 'X% average output corruption on Y benchmarks') would strengthen the abstract without lengthening it.
- [Introduction] Notation and terminology: The manuscript uses 'agentic' and 'LLM-driven' interchangeably in places; consistent terminology and a short definition of the agent architecture would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the significance of our work. We address each major comment below and will revise the manuscript to improve clarity, detail, and reproducibility as requested.
- Referee: [Evaluation/Results] Evaluation/Results section: The central claims that the framework 'generates correct locked netlists' and 'introduces measurable output corruption under incorrect keys' while 'SAT attacks remain effective' are stated without accompanying quantitative metrics (e.g., corruption rates per benchmark, SAT attack success rates or runtimes, or equivalence-checking pass rates). This absence makes it impossible to assess the magnitude or consistency of the reported outcomes.
  Authors: We agree that the current presentation of results is too high-level and that quantitative metrics are needed to properly evaluate the claims. In the revised manuscript we will expand the Evaluation/Results section with a new table that reports, for each ISCAS-85 benchmark: (i) output corruption rate under incorrect keys (fraction of differing output bits), (ii) SAT attack success rate and runtime, and (iii) equivalence-checking pass rate (which is 100% for all generated netlists). These additions will allow readers to assess both the magnitude and consistency of the observed outcomes. Revision: yes.
- Referee: [Framework/Methodology] Framework description: The decomposition into specialized stages (retrieval-grounded planning + deterministic compilation) is presented as key to correctness, but the manuscript does not detail how retrieval is implemented, what constitutes a 'lock-plan', or the exact interface between LLM-generated plans and the deterministic compiler, leaving the reproducibility of the functional-correctness result unclear.
  Authors: We acknowledge that additional implementation details are required for reproducibility. In the revised Framework section we will explicitly describe: (1) the retrieval mechanism, which employs a vector database populated with hardware-security literature and ISCAS-85 netlist examples; (2) the structure of a 'lock-plan' as a machine-readable JSON specification that enumerates locking locations, key-bit assignments, and chosen obfuscation primitives; and (3) the deterministic interface, in which the LLM output is parsed by a Python-based compiler that applies the plan to the input netlist using standard graph-manipulation libraries, with no further LLM involvement. These clarifications will make the functional-correctness results fully reproducible. Revision: yes.
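The lock-plan interface described in the rebuttal above can be sketched as follows. This is a hedged illustration only: the JSON schema (`locks`, `net`, `primitive`, `key_input`), the dict-based netlist format, and the functions `compile_plan`, `evaluate`, and `equivalent` are all hypothetical names chosen here, a plain dictionary stands in for the graph-manipulation library, and exhaustive simulation stands in for a formal equivalence checker.

```python
import json
from itertools import product

# Hypothetical netlist format: net name -> (gate type, fan-in nets).
NETLIST = {
    "n1": ("AND", ["a", "b"]),
    "out0": ("BUF", ["n1"]),
    "out1": ("XOR", ["n1", "c"]),
}
INPUTS, OUTPUTS = ["a", "b", "c"], ["out0", "out1"]

# Hypothetical lock plan, standing in for the LLM-produced JSON artifact.
LOCK_PLAN = json.loads("""
{"locks": [
  {"net": "n1",   "primitive": "XOR",  "key_input": "k0"},
  {"net": "out1", "primitive": "XNOR", "key_input": "k1"}
]}
""")

GATES = {
    "AND": lambda x, y: x & y,
    "XOR": lambda x, y: x ^ y,
    "XNOR": lambda x, y: 1 ^ x ^ y,
    "BUF": lambda x: x,
}

def compile_plan(netlist, plan):
    """Deterministically insert one key gate per plan entry (no LLM involved)."""
    locked, correct_key = dict(netlist), {}
    for lock in plan["locks"]:
        net, prim, key = lock["net"], lock["primitive"], lock["key_input"]
        pre = net + "_pre"
        if net not in locked or pre in locked or prim not in ("XOR", "XNOR"):
            raise ValueError(f"invalid lock entry: {lock}")
        locked[pre] = locked[net]          # original driver moves to <net>_pre
        locked[net] = (prim, [pre, key])   # key gate now drives the net
        correct_key[key] = 0 if prim == "XOR" else 1
    return locked, correct_key

def evaluate(netlist, assignment, net):
    """Recursively evaluate a net given primary-input and key values."""
    if net in assignment:
        return assignment[net]
    op, fanin = netlist[net]
    return GATES[op](*(evaluate(netlist, assignment, n) for n in fanin))

def equivalent(netlist, locked, key):
    """Exhaustive check that the locked netlist matches the original under key."""
    for bits in product((0, 1), repeat=len(INPUTS)):
        stim = dict(zip(INPUTS, bits))
        for out in OUTPUTS:
            if evaluate(netlist, stim, out) != evaluate(locked, {**stim, **key}, out):
                return False
    return True

locked_netlist, correct_key = compile_plan(NETLIST, LOCK_PLAN)
```

Keeping the plan declarative means a malformed or unsafe entry is rejected by `compile_plan` before any netlist is emitted, which is one plausible reason a deterministic compiler helps with functional correctness. Calling `equivalent(NETLIST, locked_netlist, correct_key)` plays the role of the verification stage.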
Circularity Check
No significant circularity detected
The paper presents an empirical evaluation of an agentic LLM-driven framework for hardware netlist obfuscation. It decomposes the task into retrieval-grounded planning, structured lock-plan generation, deterministic compilation, functional verification via equivalence checking, and SAT-based attack evaluation on standard ISCAS-85 benchmarks. No mathematical derivations, fitted parameters, or self-citations are load-bearing; results are direct measurements of functional correctness and attack outcomes that highlight both potential and limitations without reducing to self-defined inputs or prior author work by construction. The evaluation relies on external benchmarks and established attack methods, rendering the chain self-contained.
Forward citations
Cited by 2 Pith papers
- LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges. A survey of LLM applications in secure hardware design covering EDA synthesis, vulnerability analysis, countermeasures, and educational uses.
- LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges. LLMs enable RTL code generation and vulnerability analysis in hardware design but introduce data contamination and adversarial risks that require red-teaming and dynamic benchmarking.