arxiv: 2605.22437 · v1 · pith:HQK3JSZ6new · submitted 2026-05-21 · 💻 cs.CR · cs.AI · cs.LG

Characterizing the Fault Response of the Intel Neural Compute Stick 2 Under Single-Pulse Electromagnetic Fault Injection

Pith reviewed 2026-05-22 05:19 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.LG
keywords electromagnetic fault injection · Intel Neural Compute Stick 2 · persistent degradation · neural network inference · OpenVINO runtime · SEU-like faults · edge AI security · convolutional neural networks

The pith

A single electromagnetic pulse can collapse CNN top-1 accuracy on the Intel NCS2 to below 5 percent; the collapse persists until the model is reloaded and is not detected by any inference-API-level mechanism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports a systematic campaign of single-pulse electromagnetic fault injection on the Intel Neural Compute Stick 2 running ResNet-18, ResNet-50, and VGG-11 models under OpenVINO. It identifies four repeatable outcome classes, with the major persistent degradation class appearing in 18-31 percent of trials at characterized locations. In this class, top-1 accuracy falls below five percent and remains low on every subsequent inference until the model is explicitly reloaded. The same degradation can be triggered by pulsing an idle device that already holds the loaded model, showing that load-time checks alone do not prevent the effect.

Core claim

Single pulses produce four reproducible outcome classes interpreted as no-effect, minor SDC, SEU-like persistent corruption, and SEFI-like loss of functionality. The major-degradation class reaches post-collapse top-1 accuracy below five percent, persists across all following inferences until explicit model reload, occurs at 18-31 percent of trials at hotspots, and is inducible on an idle device with the model already loaded, demonstrating that no inference-API-level mechanism detects the regime.
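The four-way classification can be written down as a small decision function. The sketch below is illustrative only: the 5 percent collapse threshold comes from the text above, while the function name, the `tolerance` parameter, and the rule that any other measurable change counts as minor SDC are assumptions, not the paper's procedure.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    NO_EFFECT = "no-effect"
    MINOR_SDC = "minor SDC"
    MAJOR_PERSISTENT = "SEU-like persistent corruption"
    HANG = "SEFI-like loss of functionality"


def classify_trial(
    responsive: bool,
    baseline_acc: float,
    post_acc: float,
    followup_accs: Sequence[float],
    collapse_threshold: float = 0.05,  # "below five percent" in the text
    tolerance: float = 0.0,            # hypothetical measurement tolerance
) -> Outcome:
    """Map one trial's measurements to an outcome class (assumed rules)."""
    # A device that stops answering needs a USB power-cycle: SEFI-like.
    if not responsive:
        return Outcome.HANG
    # Major degradation: accuracy collapses and stays collapsed on
    # every follow-up inference taken before any model reload.
    collapsed = post_acc < collapse_threshold
    persists = bool(followup_accs) and all(
        a < collapse_threshold for a in followup_accs
    )
    if collapsed and persists:
        return Outcome.MAJOR_PERSISTENT
    # No measured accuracy change within the tolerance.
    if abs(post_acc - baseline_acc) <= tolerance:
        return Outcome.NO_EFFECT
    # Assumption: every other measurable change is treated as minor SDC.
    return Outcome.MINOR_SDC
```

Accuracies are fractions in [0, 1]; `followup_accs` holds the accuracies measured on subsequent inferences without reloading the model.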

What carries the argument

Four reproducible outcome classes from single-pulse EMFI interpreted as no-effect, SDC, SEU-like persistent corruption, and SEFI-like hangs.

If this is right

  • Major degradation produces top-1 accuracy below five percent that remains low on every subsequent inference until explicit reload.
  • The regime is undetectable by any inference-API-level mechanism.
  • The same persistent degradation can be induced by pulses delivered to an idle device that already holds the loaded model.
  • Load-time integrity checks alone are therefore insufficient to prevent the effect.
  • Mitigation strategies can be graded by outcome class and implemented at the application level without changes to firmware or the OpenVINO runtime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Application-level output monitoring or consistency checks across consecutive inferences could flag the persistent degradation in deployed systems.
  • Similar single-pulse EMFI behavior may appear in other commercial vision-processing units used in edge safety applications.
  • Periodic model re-verification or checksums during runtime might reduce the window in which undetected degradation can affect decisions.
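One way to picture the monitoring idea in the bullets above is a known-answer ("canary") check that runs between real inferences and reloads the model when accuracy on the canaries collapses. This is a minimal sketch under stated assumptions: `infer` and `reload_model` are hypothetical callables supplied by the application (they are not OpenVINO API names), and the 0.5 accuracy floor is an arbitrary illustrative value, not one taken from the paper.

```python
from typing import Any, Callable, Sequence, Tuple


class CanaryMonitor:
    """Periodically re-run known-answer inputs and reload on collapse."""

    def __init__(
        self,
        infer: Callable[[Any], int],        # hypothetical: input -> top-1 label
        reload_model: Callable[[], None],   # hypothetical: reloads the model
        canaries: Sequence[Tuple[Any, int]],
        min_accuracy: float = 0.5,          # illustrative floor (assumption)
    ) -> None:
        if not canaries:
            raise ValueError("at least one canary input is required")
        self.infer = infer
        self.reload_model = reload_model
        self.canaries = canaries
        self.min_accuracy = min_accuracy
        self.reloads = 0

    def check(self) -> bool:
        """Return True if the device looks healthy, else reload and return False."""
        correct = sum(1 for x, label in self.canaries if self.infer(x) == label)
        accuracy = correct / len(self.canaries)
        if accuracy < self.min_accuracy:
            # The persistent class is reported to clear on explicit model reload.
            self.reload_model()
            self.reloads += 1
            return False
        return True
```

Calling `check()` every N production inferences bounds the window in which a persistent collapse could go unnoticed to roughly N inferences; results produced since the last passing check would still need to be treated as suspect.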

Load-bearing premise

The observed major persistent degradation is produced by the single electromagnetic pulses rather than by unrelated factors such as power supply noise or software timing.

What would settle it

Repeating the spot-test trials at the same characterized hotspots while applying no electromagnetic pulse and checking whether the major persistent degradation class still appears at rates near 18-31 percent.
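The comparison between pulsed and no-pulse runs could be evaluated with a one-sided Fisher exact test on the counts of major-degradation outcomes. The sketch below uses only the Python standard library; the function name and the example counts are hypothetical and are not data from the paper.

```python
from math import comb


def fisher_one_sided(k_pulsed: int, n_pulsed: int,
                     k_control: int, n_control: int) -> float:
    """P-value for observing at least k_pulsed events in the pulsed arm
    if both arms shared the same underlying event rate (hypergeometric tail)."""
    total_events = k_pulsed + k_control
    total_trials = n_pulsed + n_control
    denom = comb(total_trials, total_events)
    upper = min(n_pulsed, total_events)
    # comb(n, k) is 0 when k > n, so infeasible splits contribute nothing.
    tail = sum(
        comb(n_pulsed, i) * comb(n_control, total_events - i)
        for i in range(k_pulsed, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 64 of 256 pulsed trials
    # versus 0 of 256 no-pulse control trials show major degradation.
    p = fisher_one_sided(64, 256, 0, 256)
    print(f"one-sided p-value: {p:.3g}")
```

A very small p-value would indicate that the degradation rate depends on the pulse; a control rate near the pulsed rate would instead point to a confound such as power-supply noise or software timing.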

Figures

Figures reproduced from arXiv: 2605.22437 by Jakub Breier, Štefan Kučerák, Xiaolu Hou.

Figure 1: Experimental setup overview. The host laptop (left) …
Figure 2: Decased Intel Neural Compute Stick 2 placed on the …
Figure 3: Spot-test fault outcome distribution at the right-side hotspot …
Figure 4: Per-trial top-1 accuracy distributions over 256 spot-test …
Figure 5: Output-layer logit vectors (1000 ImageNet classes, ResNet-50) for representative single inferences in each …
Figure 6: Persistent-fault rate R_persist versus device-failure rate R_fail for the six (model, timing) configurations of Table III. Each model occupies a distinct region of the reliability plane, and the during-vs-before-inference pulse timing moves each model along a different vector, indicating that timing sensitivity is itself architecture-dependent.
Figure 7: Spatial sensitivity of the NCS2 to a 1 mm CCW EM pulse during ResNet-50 inference, from approximately 16,000 …
Figure 8: Day-to-day repeatability of the four-class outcome …
Original abstract

Vision processing units and other commercial neural-network inference accelerators are increasingly deployed in safety-relevant edge applications, but their fault response under transient hardware disturbances remains poorly characterized in the open literature. For the Intel Movidius Myriad X, packaged as the Intel Neural Compute Stick 2 (NCS2), only a single feasibility study has been published. We report a systematic single-pulse electromagnetic fault injection (EMFI) campaign on the NCS2 running three ImageNet-trained convolutional neural networks (ResNet-18, ResNet-50, VGG-11) on the OpenVINO runtime. Across 1,536 spot-test trials at characterized hotspots and approximately 16,000 parameter-search trials, single pulses produce four reproducible outcome classes: no measured accuracy change, minor silent data corruption, major persistent degradation that survives across subsequent inferences until model reload, and device hangs requiring USB power-cycling; these outcomes are respectively interpreted as no-effect, SDC with possible SET-like or small persistent-state mechanisms, SEU-like persistent corruption, and SEFI-like loss of functionality. Two findings are central. First, the major-degradation class can be induced at 18-31% of trials at characterized hotspots, with post-collapse top-1 accuracy below five percent and persistence across all subsequent inferences until explicit model reload - a regime that no inference-API-level mechanism detects. Second, this regime is also inducible by pulses delivered to an idle device with the model already loaded, demonstrating that load-time integrity checks alone are insufficient. We discuss mitigation strategies graded by class, focusing on mechanisms implementable at the application level without modification to the device firmware or the OpenVINO runtime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a systematic single-pulse electromagnetic fault injection (EMFI) campaign on the Intel Neural Compute Stick 2 (NCS2) running ResNet-18, ResNet-50, and VGG-11 on the OpenVINO runtime. Across 1,536 spot-test trials at characterized hotspots and approximately 16,000 parameter-search trials, the authors identify four reproducible outcome classes: no measured accuracy change, minor silent data corruption, major persistent degradation (post-collapse top-1 accuracy below 5% that persists across subsequent inferences until explicit model reload), and device hangs requiring USB power-cycling. These are interpreted as no-effect, SDC/SET-like, SEU-like persistent corruption, and SEFI-like loss of functionality. Central claims are that the major-degradation class occurs at 18-31% of trials at hotspots, is undetectable by inference-API mechanisms, and remains inducible on idle devices with the model already loaded (showing load-time checks are insufficient). The authors discuss graded mitigation strategies implementable at the application level.

Significance. If the central experimental findings hold after addressing controls, the work provides a useful open characterization of transient fault responses in a commercial neural inference accelerator, which is relevant for safety-critical edge deployments. The scale of the campaign (over 17,000 total trials) and the identification of a low-accuracy regime that persists until the model is reloaded are strengths that could inform hardware-security practices. The paper correctly notes the absence of prior systematic studies beyond one feasibility paper and supplies reproducible outcome classes that future work can build upon.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Results): The attribution of the major persistent degradation class to an SEU-like mechanism induced by single-pulse EMFI is load-bearing for the headline claims (18-31% induction rate, undetectability by APIs, and insufficiency of load-time checks), yet the manuscript reports no controls that would exclude coincident power-rail transients, USB timing glitches, or OpenVINO runtime state corruption as alternative explanations for the observed persistent accuracy collapse to <5% top-1.
  2. [§3 and §4] §3 (Experimental Methodology) and §4: The outcome classification and the reported 18-31% induction rates at hotspots rest on post-injection accuracy measurements without visible error bars, per-class trial counts, or explicit exclusion criteria for non-EMFI artifacts; this weakens the statistical grounding of the central claim that the major-degradation regime is reliably produced by the fault injection.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'approximately 16,000 parameter-search trials' should be replaced by the exact total and a breakdown by outcome class to allow readers to assess coverage.
  2. [§5] §5 (Discussion): The mitigation strategies are described at a high level; adding a short table or pseudocode examples for the proposed application-level checks would improve clarity without altering the technical contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. We address each major comment in turn below, providing the strongest honest responses we can offer based on the experiments performed. Where the comments identify opportunities to strengthen statistical presentation or experimental controls, we will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Results): The attribution of the major persistent degradation class to an SEU-like mechanism induced by single-pulse EMFI is load-bearing for the headline claims (18-31% induction rate, undetectability by APIs, and insufficiency of load-time checks), yet the manuscript reports no controls that would exclude coincident power-rail transients, USB timing glitches, or OpenVINO runtime state corruption as alternative explanations for the observed persistent accuracy collapse to <5% top-1.

    Authors: We agree that explicit discussion of alternative explanations strengthens the attribution. The persistence of the low-accuracy state across repeated inferences until an explicit model reload is the primary basis for interpreting the outcome as persistent state corruption rather than a transient power or timing artifact; transient glitches would be expected either to resolve on the next inference or to produce immediate hangs, neither of which matches the observed behavior. Experiments performed on idle devices with the model already resident further reduce the likelihood of load-time runtime corruption. Nevertheless, to address the referee’s concern directly we will add a new subsection in §3 that documents the power-rail monitoring, USB timing verification, and hotspot-characterization procedures used to minimize and detect non-EMFI confounds. revision: yes

  2. Referee: [§3 and §4] §3 (Experimental Methodology) and §4: The outcome classification and the reported 18-31% induction rates at hotspots rest on post-injection accuracy measurements without visible error bars, per-class trial counts, or explicit exclusion criteria for non-EMFI artifacts; this weakens the statistical grounding of the central claim that the major-degradation regime is reliably produced by the fault injection.

    Authors: We accept that the current presentation would benefit from greater statistical transparency. In the revised manuscript we will report the exact number of trials falling into each outcome class, include binomial confidence intervals or standard-error bars on the 18-31% hotspot rates, and state the explicit exclusion criteria applied to trials affected by device instability or USB enumeration failures. These additions will appear in §4 with a brief reference in §3. revision: yes
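As an illustration of the binomial confidence intervals mentioned in this response, the following sketch computes a Wilson score interval for an observed rate. The Wilson interval is one common choice and is an assumption here; the rebuttal does not say which interval the authors intend to use. The per-configuration trial count of 256 and the success counts in the example are likewise illustrative, not reported data.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Illustrative counts only: roughly 18% and 31% of an assumed 256 trials.
    for k in (46, 79):
        lo, hi = wilson_interval(k, 256)
        print(f"{k}/256: rate={k / 256:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Reporting intervals like these next to each hotspot rate would show whether the spread between 18 and 31 percent exceeds what sampling noise alone would produce.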

Circularity Check

0 steps flagged

No circularity: purely experimental characterization

full rationale

The paper conducts direct empirical fault-injection trials on the NCS2 hardware, measures post-injection top-1 accuracy and device state across thousands of trials, and classifies observed outcomes into four reproducible classes. No equations, fitted parameters, predictions, or derivations appear; outcome classes are defined by measured accuracy thresholds and persistence behavior rather than by any self-referential construction. All central claims rest on external benchmarks (ImageNet accuracy, USB power-cycle recovery) and are falsifiable by replication. Self-citations, if present, are not load-bearing for any derivation. This is the normal case of an experimental characterization paper whose results do not reduce to their inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard fault-injection assumptions about pulse effects and outcome classification; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Single electromagnetic pulses produce distinguishable and reproducible outcome classes (no-effect, minor SDC, major persistent degradation, device hang) that can be mapped to hardware fault models such as SET/SEU/SEFI.
    Classification and interpretation of results in the abstract.



