TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion

Akashdeep Saha; Johann Knechtel; Minghao Shao; Muhammad Shafique; Ozgur Sinanoglu; Ramesh Karri; Saideep Sreekumar; Weihua Xiao; Zeng Wang

arxiv: 2601.17178 · v2 · pith:P6JDLBXRnew · submitted 2026-01-23 · 💻 cs.CR · cs.AI · cs.AR



keywords: detectors, trojangym, benchmarks, detector, gnn-based, blind, designs, diverse


Hardware Trojans (HTs) remain a critical threat because learning-based detectors often overfit to narrow trigger/payload patterns and small, stylized benchmarks. We introduce TrojanGYM, an agentic, LLM-driven framework that automatically curates HT insertions to expose detector blind spots while preserving design correctness. Given high-level HT specifications, a suite of cooperating LLM agents (instantiated with GPT-4, LLaMA-3.3-70B, and Gemini-2.5-Pro) proposes and refines RTL modifications that realize diverse triggers and payloads without impacting normal functionality. TrojanGYM implements a feedback-driven benchmark generation loop co-designed with HT detectors: constraint-aware syntactic checking and GNN-based HT detectors provide feedback that iteratively refines HT specifications and insertion strategies to better surface detector blind spots. We further propose Robust-GNN4TJ, a new implementation of GNN4TJ with improved graph extraction, training robustness, and prediction reliability, especially on LLM-generated HT designs. On the most challenging TrojanGYM-generated benchmarks, Robust-GNN4TJ raises the HT detection rate to 60%, compared with 0% for a prior GNN-based detector. We instantiate TrojanGYM on SRAM, AES-128, and UART designs at the RTL level, and show that it systematically produces diverse, functionally correct HTs that reach evasion rates of up to 83.33% against modern GNN-based detectors, revealing robustness gaps that are not apparent when these detectors are evaluated solely on existing TrustHub-style benchmarks. After peer review, we will release all code and artifacts.
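The feedback loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (insertion_loop, evasion_rate), the callback signatures, and the toy stand-ins for the LLM agent, the syntax checker, and the detector are all hypothetical. Functional-correctness checking, which the abstract says is preserved, is omitted here for brevity. The sample counts in the usage note are made up for illustration and are not taken from the paper.

```python
from typing import Callable, List, Optional


def insertion_loop(
    spec: str,
    propose: Callable[[str], str],         # hypothetical: spec -> candidate RTL
    syntax_ok: Callable[[str], bool],      # hypothetical: syntactic check on RTL
    detector_flags: Callable[[str], bool], # hypothetical: True if detector flags the RTL
    refine: Callable[[str, str], str],     # hypothetical: (spec, feedback) -> new spec
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes the syntax check and is not
    flagged by the detector, or None if no such candidate is found."""
    for _ in range(max_rounds):
        rtl = propose(spec)
        if not syntax_ok(rtl):
            # Feed the syntax failure back into the specification.
            spec = refine(spec, "syntax")
            continue
        if not detector_flags(rtl):
            return rtl
        # Detected: refine the specification and try again.
        spec = refine(spec, "detected")
    return None


def evasion_rate(flagged: List[bool]) -> float:
    """Percentage of Trojan-inserted designs that the detector did not flag."""
    if not flagged:
        raise ValueError("need at least one design")
    missed = sum(1 for f in flagged if not f)
    return 100.0 * missed / len(flagged)


# Toy stand-ins (assumptions, only to make the sketch runnable).
def toy_propose(spec: str) -> str:
    return f"module ht; // trigger: {spec}\nendmodule"


def toy_syntax_ok(rtl: str) -> bool:
    return rtl.startswith("module") and rtl.rstrip().endswith("endmodule")


def toy_detector(rtl: str) -> bool:
    # Pretend the detector has only learned counter-based triggers.
    return "counter" in rtl


def toy_refine(spec: str, feedback: str) -> str:
    if feedback == "detected":
        return spec.replace("counter", "rare-state")
    return spec


result = insertion_loop(
    "counter trigger", toy_propose, toy_syntax_ok, toy_detector, toy_refine
)
```

In this toy run the first proposal is flagged, the specification is refined, and the second proposal evades the detector. As a separate arithmetic illustration, evasion_rate applied to six hypothetical designs of which one is flagged gives 5/6, or about 83.33%.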



Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
cs.CR 2026-05 unverdicted novelty 3.0

A survey of LLM applications in secure hardware design covering EDA synthesis, vulnerability analysis, countermeasures, and educational uses.

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
cs.CR 2026-05 unverdicted novelty 3.0

A review synthesizing opportunities and challenges of using LLMs for secure hardware design, EDA synthesis, and related security issues.

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
cs.CR 2026-05 accept novelty 2.0

LLMs enable RTL code generation and vulnerability analysis in hardware design but introduce data contamination and adversarial risks that require red-teaming and dynamic benchmarking.

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges
cs.CR 2026-05 unverdicted novelty 1.0

A survey of LLM applications in electronic design automation and hardware security, covering opportunities, vulnerabilities, and countermeasures.