Recognition: unknown
Strix: Re-thinking NPU Reliability from a System Perspective
Pith reviewed 2026-05-10 16:18 UTC · model grok-4.3
The pith
Strix re-partitions NPUs to achieve sub-microsecond fault localisation and correction at 1.04 times slowdown
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Strix is a full-stack NPU reliability framework that re-partitions the NPU along the system inference pipeline, identifies dominant failure modes, and attaches targeted safeguards, achieving sub-micro-second fault localisation, error detection, and correction with only 1.04 times slowdown and minimal hardware overhead on an open-source SoC.
What carries the argument
Re-partitioning the NPU along the inference pipeline to expose and protect against dominant failure modes with targeted safeguards
Load-bearing premise
The failure modes identified as dominant after re-partitioning remain the main ones that actually occur across workloads and process nodes.
What would settle it
Running the system on a new workload or process node and observing a previously unseen failure mode that evades the targeted safeguards and produces undetected errors would show the approach does not cover real faults.
Figures
read the original abstract
DNNs and LLMs increasingly rely on hardware accelerators, including in safety-critical domains, while technology scaling and growing model complexity make hardware faults more frequent. Existing system-level mechanisms typically treat the NPU as a monolithic unit, using coarse-grained replication that incurs prohibitive performance and hardware overheads, leaving a gap between reliability requirements and deployable solutions. To bridge this gap, we present Strix, a full-stack NPU reliability framework on an open-source SoC, spanning micro-architecture, ISA, and programming methods. Strix re-partitions the NPU along the system inference pipeline, identifies dominant failure modes, and attaches targeted safeguards, achieving sub-micro-second fault localisation, error detection, and correction with only 1.04$\times$ slowdown and minimal hardware overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Strix, a full-stack NPU reliability framework on an open-source SoC that re-partitions the inference pipeline to identify dominant failure modes and attach targeted safeguards at micro-architecture, ISA, and programming levels. It claims this yields sub-microsecond fault localization, detection, and correction with only 1.04× slowdown and minimal hardware overhead, in contrast to coarse-grained monolithic replication.
Significance. If the measured overheads and coverage hold under the stated assumptions, Strix would meaningfully narrow the gap between reliability requirements and deployable NPU solutions for safety-critical DNN/LLM workloads. The system-level re-partitioning approach, rather than treating the accelerator as a black box, could influence future designs in reliable computing and computer architecture.
major comments (1)
- [Abstract] Abstract: the central quantitative claims (1.04× slowdown, sub-microsecond localization/detection/correction) are presented without any accompanying evaluation methodology, workload characterization, or error-bar data. Because the low-overhead guarantee rests on the re-partitioning step producing an exhaustive and stable set of dominant failure modes, the absence of cross-workload or cross-node coverage metrics for that identification step is load-bearing; unmodeled faults would leave coverage gaps that invalidate the claimed overhead bound.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the potential of Strix to narrow the reliability gap for safety-critical NPU workloads. We address the single major comment below, focusing on substance and indicating where revisions will be made.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central quantitative claims (1.04× slowdown, sub-microsecond localization/detection/correction) are presented without any accompanying evaluation methodology, workload characterization, or error-bar data. Because the low-overhead guarantee rests on the re-partitioning step producing an exhaustive and stable set of dominant failure modes, the absence of cross-workload or cross-node coverage metrics for that identification step is load-bearing; unmodeled faults would leave coverage gaps that invalidate the claimed overhead bound.
Authors: The abstract is deliberately concise, as is conventional, while the full evaluation methodology, workload characterization (including representative DNN/LLM models), error-bar reporting from repeated runs, and cross-workload/cross-node validation of the re-partitioning step appear in Sections 4 and 5. The dominant failure modes were identified via exhaustive pipeline analysis on the open-source SoC and shown to be stable across the evaluated workloads and hardware nodes; unmodeled faults outside this set are acknowledged as a coverage limit in the paper. To address the referee's concern about the load-bearing nature of the claims, we will revise the abstract to include a short clause referencing the evaluation methodology and cross-workload stability results. This change will be incorporated in the next version. revision: yes
Circularity Check
No significant circularity; empirical system design with measured results
full rationale
The paper presents a system design and implementation for NPU reliability, describing re-partitioning of the inference pipeline, identification of dominant failure modes, and attachment of targeted safeguards, with claims supported by measured overheads (1.04× slowdown) on an open-source SoC. No mathematical derivations, equations, fitted parameters presented as predictions, or self-referential logic appear in the abstract or description. The central result is the design and its empirical evaluation rather than a chain that reduces to its own inputs by construction. No load-bearing self-citations, self-definitional steps, or other enumerated circular patterns are present. This is consistent with a typical hardware/systems paper where the output is the artifact itself.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Dominant failure modes in NPUs can be identified after re-partitioning along the inference pipeline and addressed with stage-specific safeguards.
Reference graph
Works this paper leans on
-
[1]
2023.Resilience assessment of machine learning applica- tions under hardware faults
Udit Kumar Agarwal. 2023.Resilience assessment of machine learning applica- tions under hardware faults. Ph.D. Dissertation. University of British Columbia
2023
-
[2]
Udit Kumar Agarwal et al. 2023. Towards reliability assessment of systolic arrays against stuck-at faults. InDSN-S. IEEE, 230–236
2023
-
[3]
Performance characterization of using quantization for DNN inference on edge devices
Hyunho Ahn et al.2023. Performance characterization of using quantization for DNN inference on edge devices. InICFEC. IEEE, 1–6
2023
-
[4]
Haya Al Kassir et al. 2022. A review of the state of the art and future challenges of deep learning-based beamforming.IEEE Access10 (2022), 80869–80882
2022
-
[5]
GPU scheduling on the NVIDIA TX2: Hidden details revealed
Tanya Amert et al.2017. GPU scheduling on the NVIDIA TX2: Hidden details revealed. InRTSS. IEEE, 104–115
2017
-
[6]
Practical fault attack on deep neural networks
Jakub Breier et al.2018. Practical fault attack on deep neural networks. InACM CCS. 2204–2206
2018
-
[7]
Wei Cao et al. 2023. The future transistors.Nature620, 7974 (2023), 501–515
2023
-
[8]
Yu-Hsin Chen et al. 2019. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices.IEEE JESTCS9, 2 (2019), 292–308
2019
-
[9]
Asurveyonmultimodallargelanguagemodelsforautonomous driving
CanCuietal .2024. Asurveyonmultimodallargelanguagemodelsforautonomous driving. InWACV. IEEE, 958–979
2024
-
[10]
Gpt3.int8():8-bitmatrixmultiplicationfortransformers at scale.NeurIPS35 (2022), 30318–30332
TimDettmersetal .2022. Gpt3.int8():8-bitmatrixmultiplicationfortransformers at scale.NeurIPS35 (2022), 30318–30332
2022
-
[11]
Michael Ditty. 2022. Nvidia orin system-on-chip. InHCS. IEEE, 1–17
2022
-
[12]
Fernando Fernandes Dos Santos et al. 2023. Understanding and Improving GPUs’ Reliability Combining Beam Experiments with Fault Simulation. InITC. IEEE, 176–185
2023
-
[13]
Joshua Fromm et al. 2018. Heterogeneous bitwidth binarization in convolutional neural networks.NeurIPS31 (2018)
2018
-
[14]
Daocheng Fu et al. 2024. Drive like a human: Rethinking autonomous driving with large language models. InWACVW. IEEE, 910–919
2024
-
[15]
Onthedependabilityofbidirectionalencoderrepresentations from transformers to soft errors.IEEE TNANO(2025)
ZhenGaoetal .2025. Onthedependabilityofbidirectionalencoderrepresentations from transformers to soft errors.IEEE TNANO(2025)
2025
-
[16]
Gemmini:Enablingsystematicdeep-learningarchitecture evaluation via full-stack integration
HasanGencetal .2021. Gemmini:Enablingsystematicdeep-learningarchitecture evaluation via full-stack integration. InDAC. IEEE, 769–774
2021
-
[17]
TheDarkSideofComputing:SilentDataCorruptions
DimitrisGizopoulos.2025. TheDarkSideofComputing:SilentDataCorruptions. Computer58, 6 (2025), 101–106
2025
-
[18]
Deeplearningaccelerators’configura- tion space exploration effect on performance and resource utilization: A Gemmini case study.Sensors23, 5 (2023), 2380
DennisAgyemanhNanaGookyietal .2023. Deeplearningaccelerators’configura- tion space exploration effect on performance and resource utilization: A Gemmini case study.Sensors23, 5 (2023), 2380
2023
-
[19]
The llama 3 herd of models.arXiv e-prints(2024), arXiv–2407
Aaron Grattafiori et al.2024. The llama 3 herd of models.arXiv e-prints(2024), arXiv–2407
2024
-
[20]
Wilfread Guillemé et al. 2024. HTAG-eNN: Hardening Technique with AND Gates for Embedded Neural Networks. InACM/IEEE DAC. 1–6
2024
-
[21]
Neuralnetworkreliabilityanalysisbasedonfaultinjection
BaixinGuoetal .2023. Neuralnetworkreliabilityanalysisbasedonfaultinjection. InCNML. 366–370
2023
-
[22]
Daya Guo et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence.arXiv preprint arXiv:2401.14196 (2024)
work page internal anchor Pith review arXiv 2024
-
[23]
Jiajun He et al. 2025. Fine-Grained Fault Sensitivity Analysis of Vision Trans- formers Under Soft Errors.Electronics14, 12 (2025), 2418
2025
-
[24]
Kaiming He et al. 2016. Deep residual learning for image recognition. InIEEE CVPR. 770–778
2016
-
[25]
Le-Ha Hoang et al. 2020. Ft-clipact: Resilience analysis of deep neural networks and improving their fault tolerance using clipped activation. InDATE. IEEE, 1241–1246
2020
-
[26]
Denselyconnectedconvolutionalnetworks.InIEEE CVPR
GaoHuangetal .2017. Denselyconnectedconvolutionalnetworks.InIEEE CVPR. 4700–4708
2017
-
[27]
Binyuan Hui et al. 2024. Qwen2. 5-coder technical report.arXiv preprint arXiv:2409.12186(2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[28]
Younis Ibrahim et al. 2020. Analyzing the reliability of convolutional neural networks on gpus: Googlenet as a case study. InICCIT. IEEE, 1–6
2020
-
[29]
Younis Ibrahim et al. 2020. Soft error resilience of deep residual networks for object recognition.IEEE Access8 (2020), 19490–19503
2020
-
[30]
Younis Ibrahim et al. 2020. Soft errors in DNN accelerators: A comprehensive review.Microelectronics Reliability115 (2020), 113969
2020
-
[31]
ISO26262 ISO. 2018. 26262: Road vehicles-Functional safety. (2018)
2018
-
[32]
Haojie Jian et al. 2024. PerFT-N: Low-overhead Permanent Fault-Tolerance Mechanism for Neural Processing Units. InGLSVLSI. 25–31
2024
-
[33]
AlbertQ.Jiangetal .2023. Mistral7B.arXiv preprint arXiv:2310.068253(2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[34]
Jeff Johnson. 2018. Rethinking floating point for deep learning.arXiv preprint arXiv:1811.01721(2018)
work page Pith review arXiv 2018
-
[35]
A domain-specific supercomputer for training deep neural networks.Commun
Norman P Jouppi et al.2020. A domain-specific supercomputer for training deep neural networks.Commun. ACM63, 7 (2020), 67–78
2020
-
[36]
Bert:Pre-trainingofdeepbidirectionaltransform- ers for language understanding
JacobDevlinKentonetal .2019. Bert:Pre-trainingofdeepbidirectionaltransform- ers for language understanding. InnaacL-HLT, Vol. 1. Minneapolis, Minnesota, 2
2019
-
[37]
K Korosec. 2021. Tesla will open controversial FSD Beta software to owners with a good driving record
2021
-
[38]
Alex Krizhevsky et al. 2012. Imagenet classification with deep convolutional neural networks.NeurIPS25 (2012)
2012
-
[39]
Guanpeng Li et al. 2025. Understanding Error Propagation in Deep-Learning Neural Networks Accelerators and Applications.IEEE Des. Test(2025)
2025
-
[40]
High-performance FPGA-based CNN accelerator with block-floating-point arithmetic.IEEE VLSI27, 8 (2019), 1874–1885
Xiaocong Lian et al.2019. High-performance FPGA-based CNN accelerator with block-floating-point arithmetic.IEEE VLSI27, 8 (2019), 1874–1885
2019
-
[41]
Fabiano Libano et al. 2018. Selective hardening for neural networks in FPGAs. IEEE TNS66, 1 (2018), 216–222
2018
-
[42]
Hsin-Chen Lu et al. 2024. Highly Fault-Tolerant Systolic-Array-Based Matrix Multiplication.Electronics13, 9 (2024), 1780
2024
-
[43]
Sparsh Mittal. 2020. A survey on modeling and improving reliability of DNN algorithms and accelerators.JSA104 (2020), 101689
2020
-
[44]
Pramesh Pandey et al. 2019. GreenTPU: Improving timing error resilience of a near-threshold tensor processing unit. InACM/IEEE DAC. 1–6
2019
-
[45]
The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno et al.2016. The LAMBADA dataset: Word prediction requiring a broad discourse context.arXiv preprint arXiv:1606.06031(2016)
work page Pith review arXiv 2016
-
[46]
Sigma:Asparseandirregulargemmacceleratorwithflexible interconnects for dnn training
EricQinetal .2020. Sigma:Asparseandirregulargemmacceleratorwithflexible interconnects for dnn training. InHPCA. IEEE, 58–70
2020
-
[47]
Harsh Rangwani et al. 2022. Cost-sensitive self-training for optimizing non- decomposable metrics.NeurIPS35 (2022), 26994–27007
2022
-
[48]
Brandon Reagen et al. 2018. Ares: A framework for quantifying the resilience of deep neural networks. InACM/IEEE DAC. 1–6
2018
-
[49]
RISC-V Software Source. 2024. riscv-isa-sim: RISC-V ISA Simulator (Spike). https://github.com/riscv-software-src/riscv-isa-sim
2024
-
[50]
Yamato Saikawa and Yoichi Tomioka. 2024. Approximated Triple Modular Redundancy of Convolutional Neural Networks Based on Residual Quantization. InMCSoC. IEEE, 302–309
2024
-
[51]
Errormitigationusingapproximatelogic circuits: A comparison of probabilistic and evolutionary approaches.IEEE TR65, 4 (2016), 1871–1883
AntonioJSanchez-Clementeetal .2016. Errormitigationusingapproximatelogic circuits: A comparison of probabilistic and evolutionary approaches.IEEE TR65, 4 (2016), 1871–1883
2016
-
[52]
Mobilenetv2: Inverted residuals and linear bottlenecks
Mark Sandler et al.2018. Mobilenetv2: Inverted residuals and linear bottlenecks. InIEEE CVPR. 4510–4520
2018
-
[53]
Dongjoo Shin et al. 2018. DNPU: An energy-efficient deep-learning processor with heterogeneous multi-core architecture.IEEE Micro38, 5 (2018), 85–93
2018
-
[54]
Rethinking the inception architecture for computer vision
Christian Szegedy et al.2016. Rethinking the inception architecture for computer vision. InIEEE CVPR. 2818–2826
2016
-
[55]
Mahdi Taheri et al. 2024. Exploration of activation fault reliability in quantized systolic array-based dnn accelerators. InISQED. IEEE, 1–8
2024
-
[56]
Dojo:The microarchitecture of tesla’sexa-scalecomputer
Emil Talpes etal.2022. Dojo:The microarchitecture of tesla’sexa-scalecomputer. InHCS. IEEE, 1–28
2022
-
[57]
Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. InICML. PMLR, 6105–6114
2019
-
[58]
UC Berkeley. [n.d.]. Gemmini — Systolic Array and Transposer.https://github. com/ucb-bar/gemmini#systolic-array-and-transposer
-
[59]
Conference’17, July 2017, Washington, DC, USA Jiapeng Guan et al
JoãoVieiraetal .2023.Gem5-accel:Apre-RTLsimulationtoolchainforaccelerator architecture validation.CAL(2023). Conference’17, July 2017, Washington, DC, USA Jiapeng Guan et al
2023
-
[60]
Jinghe Wei et al. 2020. Analyzing the impact of soft errors in VGG networks implemented on GPUs.Microelectronics Reliability110 (2020), 113648
2020
-
[61]
Wenda Wei et al. 2023. An approximate fault-tolerance design for a convolutional neural network accelerator.IT Professional25, 4 (2023), 85–90
2023
-
[62]
Tong Xie et al. 2025. ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance. InACM/IEEE DAC. 1–6
2025
-
[63]
Leon Yao and John Miller. 2015. Tiny imagenet classification with convolutional neural networks.CS 231N2, 5 (2015), 8
2015
-
[64]
Yiren Zhou et al. 2018. Adaptive quantization for deep neural network. In Proceedings of AAAI, Vol. 32
2018
-
[65]
A Survey on Efficient Inference for Large Language Models
ZixuanZhouetal .2024. Asurveyonefficientinferenceforlargelanguagemodels. arXiv preprint arXiv:2404.14294(2024)
work page internal anchor Pith review arXiv 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.