arxiv: 2601.18696 · v4 · pith:KZCSZS3Onew · submitted 2026-01-26 · 💻 cs.LG

Explainability Methods for Hardware Trojan Detection: A Systematic Comparison

Pith reviewed 2026-05-21 14:12 UTC · model grok-4.3

classification 💻 cs.LG
keywords explainability · hardware trojan detection · machine learning · feature attribution · circuit analysis · Trust-Hub · gate-level security · LIME SHAP

The pith

Existing explainability methods from general domains do not provide the actionable insights hardware engineers need for Trojan detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper systematically compares three categories of explainability methods for detecting hardware Trojans in integrated circuits. It evaluates domain-aware property analysis using 31 circuit-specific features, case-based reasoning with k-nearest neighbors, and feature attribution techniques like LIME, SHAP, and gradients. The goal is to identify which approaches deliver the circuit-level context necessary to filter false positives and negatives in security analysis. A reader would care because undetected Trojans lead to unpatchable compromises in hardware, requiring expensive recalls. The comparison on the Trust-Hub benchmark highlights differences in how well each method supports hardware security applications.
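
To make the case-based category concrete, here is a minimal sketch of a k-nearest-neighbor "explanation by precedent". It is not the paper's implementation: the gate names, the three feature columns, and the labels are hypothetical toy values, and plain Euclidean distance is an assumption.

```python
import math
from collections import Counter

# Hypothetical labelled gates: (name, [fan_in, ff_distance, pi_distance], label).
# These values are invented for illustration and are not taken from Trust-Hub.
CASES = [
    ("g1", [2.0, 1.0, 3.0], "normal"),
    ("g2", [2.0, 2.0, 2.0], "normal"),
    ("g3", [3.0, 1.0, 4.0], "normal"),
    ("t1", [8.0, 5.0, 6.0], "trojan"),
    ("t2", [7.0, 6.0, 5.0], "trojan"),
]


def nearest_cases(query, cases, k=3):
    """Return the k labelled cases closest to `query` as (distance, name, label)."""
    scored = [(math.dist(query, feats), name, label) for name, feats, label in cases]
    scored.sort()
    return scored[:k]


def explain_by_precedent(query, cases, k=3):
    """Predict by majority vote and return the precedents that justify the vote."""
    neighbours = nearest_cases(query, cases, k)
    votes = Counter(label for _, _, label in neighbours)
    predicted = votes.most_common(1)[0][0]
    agreement = votes[predicted] / len(neighbours)
    return predicted, agreement, neighbours


label, agreement, precedents = explain_by_precedent([7.5, 5.0, 6.0], CASES, k=3)
for dist, name, case_label in precedents:
    print(f"{name}: label={case_label}, distance={dist:.2f}")
print(f"prediction={label}, neighbour agreement={agreement:.2f}")
```

The explanation an engineer sees here is the list of precedent gates rather than a score per feature, which is the distinction the comparison turns on.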

Core claim

The central claim is that the relative effectiveness of three explainability approaches in providing actionable insights for hardware Trojan detection can only be determined by comparing them directly on the Trust-Hub benchmark dataset. The three approaches are: domain-aware property-based analysis of 31 circuit-specific features derived from gate fanin patterns, flip-flop distances, and primary I/O connectivity; model-agnostic case-based reasoning using k-nearest neighbors; and model-agnostic feature attribution methods such as LIME, SHAP, and gradient.

What carries the argument

The systematic comparison of domain-aware property analysis, case-based reasoning, and feature attribution techniques applied to gate-level circuit features for hardware Trojan detection.
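
For readers unfamiliar with the feature attribution category, the sketch below computes exact Shapley values by enumerating every feature coalition, which is the quantity that SHAP-style methods estimate. It does not use the SHAP library, and the scoring function `toy_score`, the feature order, and the reference baseline are all hypothetical stand-ins rather than anything from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_score(features):
    """Hypothetical suspicion score: one additive term and one interaction term."""
    fan_in, ff_distance, pi_distance = features
    return 0.5 * fan_in + 0.2 * ff_distance * pi_distance


x = [8.0, 5.0, 6.0]         # gate being explained (invented values)
baseline = [2.0, 1.0, 3.0]  # reference gate assumed to be benign
phi = shapley_values(toy_score, x, baseline)
for name, contribution in zip(["fan_in", "ff_distance", "pi_distance"], phi):
    print(f"{name}: {contribution:+.2f}")
```

The output is a list of signed importance scores that sum to the difference between the two model outputs. Nothing in it refers to circuit structure, which illustrates the "generic importance scores without circuit-level context" that the abstract attributes to this category.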

Load-bearing premise

The 31 circuit-specific features derived from gate fanin patterns, flip-flop distances, and primary I/O connectivity, when used with the Trust-Hub benchmark, will enable a meaningful comparison that reveals differences in actionable insights for hardware security applications.
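
As an illustration of what features of this kind look like, the following sketch derives three structural properties from a toy netlist. The netlist, the node names, and the choice of exactly these three properties are assumptions made for the example. The paper's actual 31 features are not specified on this page.

```python
from collections import deque

# Hypothetical gate-level netlist: each node maps to the nodes that drive it.
NETLIST = {
    "a": [], "b": [], "c": [],        # primary inputs have no drivers
    "ff1": ["n1"],                    # flip-flop driven by n1
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n2", "ff1", "a", "b"],
}
PRIMARY_INPUTS = {"a", "b", "c"}
FLIP_FLOPS = {"ff1"}


def distance_to(node, targets):
    """Fewest hops from `node` back through its drivers to any node in `targets`.

    Returns None when no target is reachable on the input side.
    """
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, hops = queue.popleft()
        if current in targets:
            return hops
        for driver in NETLIST[current]:
            if driver not in seen:
                seen.add(driver)
                queue.append((driver, hops + 1))
    return None


def gate_features(node):
    """Collect the three illustrative structural features for one gate."""
    return {
        "fan_in": len(NETLIST[node]),
        "ff_distance": distance_to(node, FLIP_FLOPS),
        "pi_distance": distance_to(node, PRIMARY_INPUTS),
    }


for gate in ("n1", "n2", "n3"):
    print(gate, gate_features(gate))
```

Because each value maps back to a named gate and a path in the netlist, an explanation phrased in these terms can be checked against the design directly.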

What would settle it

The claim would be falsified if all three explainability categories produced equivalent levels of actionable insight for hardware engineers when applied to the same set of circuits from the Trust-Hub benchmark, showing no advantage for domain-aware methods.

Original abstract

Hardware trojans are malicious circuits which compromise the functionality and security of an integrated circuit (IC). These circuits are manufactured directly into the silicon and cannot be fixed by security patches like software. The solution would require a costly product recall by replacing the IC and hence, early detection in the design process is essential. Hardware detection at best provides statistically based solutions with many false positives and false negatives. These detection methods require more thorough explainable analysis to filter out false indicators. Existing explainability methods developed for general domains like image classification may not provide the actionable insights that hardware engineers need. A question remains: How do domain-aware property analysis, model-agnostic case-based reasoning, and model-agnostic feature attribution techniques compare for hardware security applications? This work compares three categories of explainability for gate-level hardware trojan detection on the Trust-Hub benchmark dataset: (1) domain-aware property-based analysis of 31 circuit-specific features derived from gate fanin patterns, flip-flop distances, and primary Input/Output (I/O) connectivity; (2) model-agnostic case-based reasoning using k-nearest neighbors for precedent-based explanations; and (3) model-agnostic feature attribution methods (Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), gradient) that provide generic importance scores without circuit-level context.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper conducts a systematic comparison of explainability methods for gate-level hardware Trojan detection on the Trust-Hub benchmark. It evaluates three categories: (1) domain-aware property-based analysis using 31 circuit-specific features derived from gate fanin patterns, flip-flop distances, and primary I/O connectivity; (2) model-agnostic case-based reasoning via k-nearest neighbors; and (3) model-agnostic feature attribution techniques including LIME, SHAP, and gradient-based methods. The central question addressed is whether general-domain explainability approaches deliver actionable insights for hardware engineers to filter false positives, or whether domain-specific methods are required.

Significance. If the comparison successfully quantifies differences in actionability (e.g., via metrics for false-positive filtering or Trojan localization), the work could guide development of hardware-security-tailored XAI tools and reduce reliance on statistically noisy detection methods. The direct empirical comparison on an external, standard benchmark (Trust-Hub) with no free parameters or circular definitions is a methodological strength that supports reproducibility and falsifiability of the claims.

major comments (1)
  1. [Abstract and methodology description of domain-aware analysis] The central claim that domain-aware analysis of the 31 features will reveal superior actionable insights for hardware engineers rests on the untested assumption that these features (gate fanin patterns, flip-flop distances, primary I/O connectivity) possess sufficient Trojan-localization power and that Trust-Hub instances exhibit diverse false-positive regimes. No proxy metric for actionability (e.g., engineer-rated usefulness or post-explanation Trojan-gate recall) is indicated, which is load-bearing for the comparative conclusion.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the key quantitative outcomes or metrics used to compare the three explainability categories.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our systematic comparison of explainability methods for hardware Trojan detection. We address the major comment point by point below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and methodology description of domain-aware analysis] The central claim that domain-aware analysis of the 31 features will reveal superior actionable insights for hardware engineers rests on the untested assumption that these features (gate fanin patterns, flip-flop distances, primary I/O connectivity) possess sufficient Trojan-localization power and that Trust-Hub instances exhibit diverse false-positive regimes. No proxy metric for actionability (e.g., engineer-rated usefulness or post-explanation Trojan-gate recall) is indicated, which is load-bearing for the comparative conclusion.

    Authors: We appreciate this observation regarding the need for explicit evaluation of actionability. The 31 features were selected from established hardware security literature precisely because they target structural properties known to correlate with Trojan insertions, such as atypical gate fan-in, distances to sequential elements, and primary I/O connectivity patterns. Trust-Hub serves as the de facto standard benchmark and contains a range of Trojan implementations across different designs, which inherently includes varied false-positive behaviors in detection models. Our comparison quantifies differences in the insights produced by each explainability category on this benchmark. To directly address the concern about untested assumptions, we will add a new subsection in the revised manuscript that introduces a proxy metric for actionability: specifically, the post-explanation recall of Trojan gates when explanations are used to filter or localize suspicious regions. This will provide a quantitative basis for comparing how well each method supports false-positive reduction.

    Revision: partial
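
The proxy metric named in the rebuttal could be computed along the following lines. This is a sketch under stated assumptions: the gate names and the two sets of flagged gates are invented, and the precision function is an addition of this example (to reflect false-positive filtering), not something the authors propose.

```python
def post_explanation_recall(flagged_gates, true_trojan_gates):
    """Fraction of true Trojan gates still flagged after explanation-based filtering."""
    true_set = set(true_trojan_gates)
    if not true_set:
        raise ValueError("need at least one true Trojan gate to compute recall")
    return len(set(flagged_gates) & true_set) / len(true_set)


def post_explanation_precision(flagged_gates, true_trojan_gates):
    """Fraction of flagged gates that are true Trojan gates (assumed companion metric)."""
    flagged = set(flagged_gates)
    if not flagged:
        raise ValueError("need at least one flagged gate to compute precision")
    return len(flagged & set(true_trojan_gates)) / len(flagged)


# Hypothetical ground truth and the gates each method leaves flagged.
TRUE_TROJAN = {"t1", "t2", "t3", "t4"}
FLAGGED = {
    "method_a": {"t1", "t2", "g7"},
    "method_b": {"t1", "t2", "t3", "g1", "g2", "g3"},
}

for method, gates in FLAGGED.items():
    recall = post_explanation_recall(gates, TRUE_TROJAN)
    precision = post_explanation_precision(gates, TRUE_TROJAN)
    print(f"{method}: recall={recall:.2f}, precision={precision:.2f}")
```

Reporting recall together with a false-positive measure guards against a method that scores well simply by flagging most of the circuit.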

Circularity Check

0 steps flagged

Empirical comparison on external benchmark with no derivation chain

Full rationale

The paper conducts a systematic empirical comparison of three categories of explainability methods (domain-aware property analysis of 31 circuit features, k-NN case-based reasoning, and LIME/SHAP/gradient attribution) applied to gate-level hardware Trojan detection on the Trust-Hub benchmark. No equations, predictions, or first-principles derivations are present that could reduce to inputs by construction. The work uses an external public benchmark and standard techniques without self-citation load-bearing, fitted-input predictions, or ansatz smuggling. The central claim rests on observable differences in the comparison results rather than any self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard machine learning explainability tools and a public benchmark dataset without additional postulates.
