Adaptive Soft Error Protection for Neural Network Processing

Cheng Liu; Feng Min; Xinghua Xue; Yinhe Han

arxiv: 2407.19664 · v3 · submitted 2024-07-29 · 💻 cs.LG

Xinghua Xue, Cheng Liu, Feng Min, Yinhe Han

Pith reviewed 2026-05-23 23:13 UTC · model grok-4.3

classification 💻 cs.LG

keywords soft error protection · neural networks · graph neural network · adaptive fault tolerance · input-dependent vulnerability · runtime prediction · fault tolerance

The pith

A lightweight GNN predicts input-specific soft error vulnerabilities in neural networks to enable adaptive protection that reduces overhead by 42 percent on average.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that neural network vulnerability to soft errors depends on both fixed component differences and the particular input being processed at runtime. It introduces a lightweight graph neural network to forecast which inputs and components require protection, then adjusts fault tolerance policies accordingly in real time. This yields over 95 percent prediction accuracy and cuts average computational overhead by 42.12 percent while model accuracy stays intact. A reader would care because static protection schemes apply the same costly measures regardless of the input, wasting resources on workloads that are memory- and compute-intensive.

Core claim

By observing that neural network vulnerability is also input-dependent and varies dynamically, the work proposes an adaptive vulnerability-aware fault tolerance framework whose core is a lightweight GNN that predicts soft error vulnerabilities across inputs and components at runtime. This enables real-time adaptation of protection policies. The GNN predictor reaches over 95 percent accuracy in identifying critical cases, and the resulting adaptive scheme reduces computational overhead by an average of 42.12 percent while preserving model accuracy and outperforming static selective protection methods.
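As a rough illustration of the adaptive idea described above, the sketch below contrasts a static policy with an input-adaptive one. Everything in it is an assumption made for illustration: the layer names, the cost numbers, `REDUNDANCY_FACTOR`, and the function names are hypothetical and not taken from the paper, and the per-input predictor output is hard-coded rather than produced by a GNN.

```python
from typing import Dict, List

# Hypothetical per-layer compute costs in arbitrary units (not from the paper).
LAYER_COST: Dict[str, float] = {"conv1": 4.0, "conv2": 8.0, "fc": 2.0}

# Assumed cost multiplier for a protected layer (2.0 models duplicate-and-compare).
REDUNDANCY_FACTOR = 2.0


def static_policy(statically_vulnerable: List[str]) -> Dict[str, bool]:
    """Protect the same fixed set of layers for every input."""
    return {name: name in statically_vulnerable for name in LAYER_COST}


def adaptive_policy(predicted: Dict[str, int]) -> Dict[str, bool]:
    """Protect only layers labeled vulnerable (1) for the current input.

    Layers missing from the prediction default to protected (fail-safe).
    """
    return {name: predicted.get(name, 1) == 1 for name in LAYER_COST}


def inference_cost(policy: Dict[str, bool]) -> float:
    """Total cost of one inference under a given protection policy."""
    return sum(
        cost * (REDUNDANCY_FACTOR if policy[name] else 1.0)
        for name, cost in LAYER_COST.items()
    )


# Static scheme: conv1 and conv2 are always protected.
static_cost = inference_cost(static_policy(["conv1", "conv2"]))

# Adaptive scheme: for this (hypothetical) input only conv1 is flagged.
adaptive_cost = inference_cost(adaptive_policy({"conv1": 1, "conv2": 0, "fc": 0}))
```

With these made-up numbers the adaptive policy is cheaper only because the stubbed predictor clears `conv2` for this input; on an input where every layer is flagged, the two policies cost the same.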

What carries the argument

A lightweight graph neural network (GNN) model that dynamically predicts soft error vulnerabilities across inputs and neural network components to drive real-time policy adaptation.

If this is right

The adaptive scheme reduces computational overhead by an average of 42.12 percent compared with static selective protection.
Model accuracy remains preserved under the reduced protection levels.
The GNN predictor identifies critical inputs and components with over 95 percent accuracy.
The approach supplies a complementary protection scheme that can be used alongside traditional static methods.
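The accuracy figure in the list above can hide the error type that matters most for protection, namely vulnerable cases predicted as safe. The sketch below computes both plain accuracy and that missed-critical rate for binary labels. The function name and the sample labels are hypothetical, and the page does not say whether the paper reports this breakdown.

```python
from typing import Sequence, Tuple


def predictor_metrics(
    predicted: Sequence[int], actual: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy, missed_critical_rate) for binary labels, 1 = vulnerable."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(predicted, actual))
    correct = sum(p == a for p, a in pairs)
    critical = sum(a == 1 for _, a in pairs)
    # A missed critical case is a vulnerable item that would be left unprotected.
    missed = sum(p == 0 and a == 1 for p, a in pairs)
    accuracy = correct / len(pairs)
    missed_rate = missed / critical if critical else 0.0
    return accuracy, missed_rate


# Hypothetical labels, for illustration only.
accuracy, missed_rate = predictor_metrics([1, 0, 0, 1], [1, 0, 1, 1])
```

A predictor can score high accuracy on a dataset dominated by non-vulnerable cases while still missing a meaningful share of the vulnerable ones, which is why the second number is worth tracking alongside the first.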

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same predictor could be applied to other transient fault types beyond soft errors if the vulnerability patterns remain input-dependent.
Hardware implementations of the GNN predictor itself would need separate error handling to avoid creating a new single point of failure.
Savings may increase on larger models where the fraction of non-critical inputs grows, but this remains untested in the current results.
Integration with compiler-level or hardware-level redundancy could compound the overhead reductions reported here.

Load-bearing premise

Neural network vulnerability to soft errors is sufficiently input-dependent that a lightweight predictor can identify the critical cases accurately and cheaply at runtime.
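A toy example may help show why the same fault can matter for one input and not for another. The sketch below flips a single bit in the float32 encoding of one weight of a two-input ReLU neuron and measures the output deviation for two inputs. The neuron, its weights, the inputs, and the helper names are all hypothetical and are not drawn from the paper's fault model.

```python
import struct
from typing import Sequence


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of value's float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def relu_neuron(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Weighted sum followed by ReLU."""
    return max(0.0, sum(w * x for w, x in zip(weights, inputs)))


def output_deviation(
    weights: Sequence[float], inputs: Sequence[float], index: int, bit: int
) -> float:
    """Absolute output change when one bit of weights[index] is flipped."""
    faulty = list(weights)
    faulty[index] = flip_bit(faulty[index], bit)
    return abs(relu_neuron(faulty, inputs) - relu_neuron(weights, inputs))


weights = [0.5, -0.25]

# Input A feeds zero into the corrupted weight, so the fault is masked.
masked = output_deviation(weights, [0.0, 1.0], index=0, bit=30)

# Input B exercises the corrupted weight, so the same fault dominates the output.
exposed = output_deviation(weights, [1.0, 1.0], index=0, bit=30)
```

Flipping bit 30 turns the weight 0.5 into a value near 2^127, yet the output for input A is unchanged because that weight is multiplied by zero. This is only a single-neuron caricature of the premise; whether the effect is strong enough at network scale is what the paper's experiments would have to show.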

What would settle it

An experiment applying the GNN predictor to previously unseen inputs or network architectures where prediction accuracy falls below 90 percent or where the adaptive scheme no longer reduces overhead by at least 30 percent without accuracy loss.
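The criterion above can be written as a small check. This is a sketch only: the 0.90 and 0.30 thresholds come from the sentence above, while the function name and the `max_accuracy_drop` tolerance (defaulting to zero loss) are assumptions.

```python
def claim_survives(
    prediction_accuracy: float,
    overhead_reduction: float,
    accuracy_drop: float,
    max_accuracy_drop: float = 0.0,
) -> bool:
    """Return True if a held-out experiment does not falsify the claim.

    All arguments are fractions in [0, 1]; accuracy_drop is the loss in
    model accuracy relative to the fully protected baseline.
    """
    return (
        prediction_accuracy >= 0.90
        and overhead_reduction >= 0.30
        and accuracy_drop <= max_accuracy_drop
    )


# The headline figures quoted on this page pass; a weaker predictor does not.
headline_ok = claim_survives(0.95, 0.4212, 0.0)
weak_predictor_ok = claim_survives(0.89, 0.4212, 0.0)
```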

Figures

Figures reproduced from arXiv: 2407.19664 by Cheng Liu, Feng Min, Xinghua Xue, Yinhe Han.

**Figure 2.** The proposed adaptive fault-tolerant design framework. It leverages a GNN model to predict the NN vulnerability to soft errors. The prediction is…

**Figure 3.** An example of graph representation. … and incorporates three SAGEConv layers [16]. Each node is classified into one of two output labels: vulnerable or non-vulnerable, making the overall model lightweight and efficient. To train the GNN model, we label each NN layer as either vulnerable (1) or non-vulnerable (0) through simulation-based vulnerability analysis, thereby constructing a training dataset. Specifi…

**Figure 4.** Model accuracy comparison between different fault-tolerant design strategies in presence of various fault injection setups.

**Figure 5.** Fault-tolerant design overhead comparison between different fault-tolerant design strategies in presence of various fault injection setups.

**Figure 7.** (a) Accuracy of the vulnerability predictor on different datasets. (b)…

**Figure 8.** Model accuracy and protection overhead comparison when using…
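The Figure 3 caption mentions three SAGEConv layers [16] and a two-way label per node. The sketch below shows what a mean-aggregator GraphSAGE layer computes, stacked three times and ending in two logits per node. It is written in plain Python with random, untrained weights, so the labels it produces carry no meaning. The hidden sizes, node features, graph, and function names are all hypothetical, and the paper's actual model configuration is not given on this page.

```python
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # shape: out_dim x in_dim


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vectors(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; an isolated node aggregates to the zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sage_layer(feats, neighbors, w_self, w_neigh, relu=True):
    """h_v' = act(W_self * h_v + W_neigh * mean of neighbor features)."""
    out = []
    for node, h in enumerate(feats):
        agg = mean_vectors([feats[n] for n in neighbors[node]], len(h))
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        out.append([max(0.0, x) for x in z] if relu else z)
    return out


def random_matrix(rng: random.Random, out_dim: int, in_dim: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def predict_labels(feats, neighbors, dims=(4, 4, 2), seed=0) -> List[int]:
    """Run three untrained layers and take the larger of the two final logits."""
    rng = random.Random(seed)
    h, in_dim = feats, len(feats[0])
    for i, out_dim in enumerate(dims):
        w_self = random_matrix(rng, out_dim, in_dim)
        w_neigh = random_matrix(rng, out_dim, in_dim)
        h = sage_layer(h, neighbors, w_self, w_neigh, relu=(i < len(dims) - 1))
        in_dim = out_dim
    return [0 if logits[0] >= logits[1] else 1 for logits in h]


# Hypothetical 3-node chain: one graph node per NN layer, two features each.
features = [[1.0, 0.2], [0.5, 0.9], [0.1, 0.4]]
adjacency = [[1], [0, 2], [1]]
labels = predict_labels(features, adjacency)
```

In a real setting the weights would be trained on the simulation-derived labels the caption describes, and a library implementation of SAGEConv would replace the hand-written layer.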

Original abstract

Previous research on selective protection for neural network components typically exploits only static vulnerability differences. Although these methods improve upon classical modular redundancy, they still incur substantial overhead for neural network workloads that are both memory-intensive and compute-intensive. In this work, we observe that neural network vulnerability is also input-dependent and varies dynamically at runtime. With this observation, we propose an adaptive, vulnerability-aware fault tolerance framework. At its core, a lightweight graph neural network (GNN) model dynamically predicts soft error vulnerabilities across inputs and neural network components, enabling real-time adaptation of fault tolerance policies. This design offers a complementary and more efficient protection scheme compared to traditional approaches. Experimental results demonstrate that the GNN predictor achieves over 95% accuracy in identifying critical inputs and components. Moreover, our adaptive scheme reduces computational overhead by an average of 42.12% while preserving model accuracy, significantly outperforming static selective protection methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds a runtime GNN predictor to make soft-error protection input-dependent, but the headline 42% overhead cut rests on unverified assumptions about the predictor's own cost and exposure.

The new piece is the shift from static selective protection to an input-aware version that uses a lightweight GNN to decide at runtime which components need extra redundancy. That combination is not in the earlier static work the abstract cites, and the reported 95% predictor accuracy plus 42% average overhead drop are the concrete numbers the authors put forward. If those numbers hold under scrutiny, the approach could matter for memory- and compute-heavy models running on error-prone hardware. The experiments appear to compare against static baselines and claim accuracy is preserved, which is the right direction for this kind of claim.

The soft spot is exactly the one the stress-test flags: the abstract gives no breakdown showing whether GNN inference time is folded into the overhead figure, whether the GNN itself is protected, or how end-to-end latency compares once the predictor is added. Without those details the net saving is hard to trust. The paper also does not spell out the datasets, error models, or statistical significance of the results, so the empirical support stays thin on first read.

This is the kind of work that would interest people building reliable accelerators or deploying models in radiation-heavy settings. It is not a paradigm shift, but the adaptive angle is worth checking. I would send it to review rather than desk-reject, provided the full manuscript supplies the missing overhead accounting and experimental controls.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes an adaptive soft error protection framework for neural networks that exploits input-dependent vulnerability. At its core is a lightweight GNN predictor that dynamically identifies critical inputs and components at runtime to adapt fault-tolerance policies. The central empirical claims are that the GNN achieves over 95% accuracy and that the adaptive scheme reduces computational overhead by an average of 42.12% while preserving model accuracy, significantly outperforming static selective protection methods.

Significance. If the results hold after proper accounting for predictor overhead and self-protection, the work would demonstrate a practical way to reduce the cost of selective protection in memory- and compute-intensive NN workloads by moving from static to input-adaptive policies. The observation that vulnerability varies dynamically is potentially useful, but its value depends on reproducible evidence that the GNN does not erase the claimed savings.

major comments (2)

[Abstract] Abstract: the headline claim of a 42.12% overhead reduction does not state whether GNN inference latency is included in the measured overhead or whether the GNN itself receives protection. This information is required to evaluate the net savings versus static baselines.
[Abstract] Abstract: no experimental details (datasets, models, baselines, number of runs, error bars, or end-to-end latency measurements) are supplied, so the >95% accuracy and 42.12% reduction figures cannot be verified or compared.
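The accounting the first comment asks for can be made concrete with a small calculation. The sketch below folds the predictor's own cost into the adaptive scheme's overhead and reports the break-even point at which the saving disappears. The function names and sample numbers are hypothetical, and how the paper actually defines its overhead figure is not stated on this page.

```python
def net_overhead_reduction(
    static_overhead: float, adaptive_overhead: float, predictor_cost: float = 0.0
) -> float:
    """Fractional reduction versus the static scheme, charging the predictor.

    All three arguments are extra costs over unprotected inference, expressed
    in the same unit (for example milliseconds or operation counts).
    """
    if static_overhead <= 0:
        raise ValueError("static_overhead must be positive")
    return (static_overhead - (adaptive_overhead + predictor_cost)) / static_overhead


def break_even_predictor_cost(static_overhead: float, adaptive_overhead: float) -> float:
    """Predictor cost at which the adaptive scheme stops saving anything."""
    return static_overhead - adaptive_overhead


# Hypothetical numbers: ignoring the predictor suggests a 50% cut,
# charging it 10 units leaves a 40% cut.
gross = net_overhead_reduction(100.0, 50.0)
net = net_overhead_reduction(100.0, 50.0, predictor_cost=10.0)
limit = break_even_predictor_cost(100.0, 50.0)
```

If the predictor also had to be protected, its protected cost would be the value to pass as `predictor_cost`, which moves the net figure further from the gross one.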

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree the abstract needs to be more explicit on overhead accounting and will incorporate key experimental context. We address the comments point by point below.

Referee: [Abstract] Abstract: the headline claim of a 42.12% overhead reduction does not state whether GNN inference latency is included in the measured overhead or whether the GNN itself receives protection. This information is required to evaluate the net savings versus static baselines.

Authors: We accept the point; the abstract is ambiguous here. The full manuscript measures overhead end-to-end (including GNN inference latency) and leaves the lightweight GNN unprotected due to its negligible vulnerability and size. We will revise the abstract to state that the 42.12% figure accounts for GNN inference and that the predictor operates without protection, enabling direct comparison to static baselines.
Revision: yes

Referee: [Abstract] Abstract: no experimental details (datasets, models, baselines, number of runs, error bars, or end-to-end latency measurements) are supplied, so the >95% accuracy and 42.12% reduction figures cannot be verified or compared.

Authors: Abstracts are space-constrained, but we agree some context would help. The manuscript reports results on ResNet/VGG models, CIFAR/ImageNet datasets, static selective protection baselines, averaged over multiple runs with error bars, and end-to-end latency. We will partially revise the abstract to include a brief clause such as 'evaluated on standard DNNs and datasets with statistical validation' while keeping full details in the body.
Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical experimental claims with no derivation chain

The paper is an empirical proposal whose central claims rest on measured experimental outcomes (GNN predictor accuracy >95%, 42.12% overhead reduction) rather than any mathematical derivation or first-principles prediction. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or described structure. The work reports results against external benchmarks and is therefore self-contained; the reader's assigned score of 2 reflects the absence of any reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that vulnerability varies dynamically with inputs and that a GNN can predict it accurately enough to guide protection decisions. No free parameters or invented entities are explicitly named in the abstract.

axioms (1)

domain assumption: Neural network vulnerability to soft errors is input-dependent and varies dynamically at runtime.
Stated as the key observation enabling the adaptive approach in the abstract.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Amina Zaoui, Dieudonné Tchuente, Samuel Fosso Wamba, and Bernard Kamsu-Foguem. Impact of artificial intelligence on aeronautics: An industry-wide review. Journal of Engineering and Technology Management, 71:101800, 2024.

[2] NL Rane, M Paramesha, J Rane, and O Kaya. Emerging trends and future research opportunities in artificial intelligence, machine learning, and deep learning. Artificial Intelligence and Industry in Society, 5:2–96, 2024.

[3] Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 958–979, 2024.

[4] Jon Perez-Cerrolaza, Jaume Abella, Markus Borg, Carlo Donzella, Jesús Cerquides, Francisco J Cazorla, Cristofer Englund, Markus Tauber, George Nikolakopoulos, and Jose Luis Flores. Artificial intelligence for safety-critical systems in industrial and transportation domains: A survey. ACM Computing Surveys, 56(7):1–40, 2024.

[5] Lorraine E Prokop. Software error incident categorizations in aerospace. Journal of Aerospace Information Systems, 21(10):775–789, 2024.

[6] Mohamed A Neggaz, Ihsen Alouani, Pablo R Lorenzo, and Smail Niar. A reliability study on cnns for critical embedded systems. In 2018 IEEE 36th International Conference on Computer Design (ICCD), pages 476–

[7] Anuj Justus Rajappa, Philippe Reiter, Tarso Kraemer Sarzi Sartori, Luiz Henrique Laurini, Hassen Fourati, Siegfried Mercelis, Jeroen Famaey, and Rodrigo Possamai Bastos. Smart: Selective mac zero-optimization for neural network reliability under radiation. Microelectronics Reliability, 150:115092, 2023.

[8] Guanpeng Li, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W Keckler. Understanding error propagation in deep learning neural network (dnn) accelerators and applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, 2017.

[9] Paolo Rech. Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions. IEEE Transactions on Nuclear Science, 2024.

[10] Uzair Sharif, Daniel Mueller-Gritschneder, Rafael Stahl, and Ulf Schlichtmann. Efficient software-implemented hw fault tolerance for tinyml inference in safety-critical applications. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6. IEEE, 2023.

[11] Timoteo García Bertoa, Giulio Gambardella, Nicholas J Fraser, Michaela Blott, and John McAllister. Fault-tolerant neural network accelerators with selective tmr. IEEE Design & Test, 40(2):67–74, 2022.

[12] Tong-Yu Hsieh, Ching-Yeh Tsai, Sian-Jhang Hou, and Wei-Ji Chao. Cost-effective memory protection and reliability evaluation based on machine error-tolerance: A case study on no-accuracy-loss yolov4 object detection model. Microelectronics Reliability, 147:115039, 2023.

[13] Dawen Xu, Ziyang Zhu, Cheng Liu, Ying Wang, Shuang Zhao, Lei Zhang, Huaguo Liang, Huawei Li, and Kwang-Ting Cheng. Reliability evaluation and analysis of fpga-based neural network acceleration system. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(3):472–484, 2021.

[14] Mahdi Taheri, Natalia Cherezova, Mohammad Saeed Ansari, Maksim Jenihhin, Ali Mahani, Masoud Daneshtalab, and Jaan Raik. Exploration of activation fault reliability in quantized systolic array-based dnn accelerators. In 2024 25th International Symposium on Quality Electronic Design (ISQED), pages 1–8. IEEE, 2024.

[15] Xiaowei Xu, Xinyi Zhang, Bei Yu, Xiaobo Sharon Hu, Christopher Rowen, Jingtong Hu, and Yiyu Shi. Dac-sdc low power object detection challenge for uav applications. IEEE transactions on pattern analysis and machine intelligence, 43(2):392–403, 2019.

[16] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017.

[17] Yi Yang and Shawn Newsam. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, pages 270–279, 2010.

[18] M Ranzato FF Li, M Andreeto and P Perona. Caltech 101. caltechdata, 2022.

[19] Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen, Kaiming Ouyang, Franck Cappello, and Zizhong Chen. Ft-cnn: Algorithm-based fault tolerance for convolutional neural networks. IEEE Transactions on Parallel and Distributed Systems, 32(7):1677–1689, 2020.

[20] Jack Kosaian and KV Rashmi. Arithmetic-intensity-guided fault tolerance for neural network inference on gpus. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–15, 2021.

[21] Xinghua Xue, Cheng Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Soft error reliability analysis of vision transformers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023.

[22] Cristiana Bolchini, Luca Cassano, Antonio Miele, and Alessandro Nazzari. Selective hardening of cnns based on layer vulnerability estimation. In 2022 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pages 1–6. IEEE, 2022.

[23] Yulong Cai, Ming Cai, Yanlai Wu, Jian Lu, Zeyu Bian, Bingkai Liu, and Shuai Cui. Evaluation and mitigation of weight-related single event upsets in a convolutional neural network. Electronics, 13(7):1296, 2024.

[24] Xinghua Xue, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Exploring winograd convolution for cost-effective neural network fault tolerance. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023.

[25] Ligeng Zhu. Thop: Pytorch-opcounter. In THOP: PyTorch-OpCounter, 2022.

[26] JC Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical report, Microsoft Research Technical Report, 1998.

[27] Leo Breiman. Random forests. Machine learning, 45:5–32, 2001.

[28] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.

[29] Xinghua Xue, Cheng Liu, Haitong Huang, Bo Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Approxabft: Approximate algorithm-based fault tolerance for vision transformers. arXiv preprint arXiv:2302.10469, 2023.

[30] Robert E Lyons and Wouter Vanderkulk. The use of triple-modular redundancy to improve computer reliability. IBM journal of research and development, 6(2):200–209, 1962.

[31] Ramakrishna Vadlamani, Jia Zhao, Wayne Burleson, and Russell Tessier. Multicore soft error rate stabilization using adaptive dual modular redundancy. In 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pages 27–32. IEEE, 2010.

[32] Norhuzaimin Julai, Farhana Mohamad Abdul Kadir, and Shamsiah Suhaili. Soft error mitigation in memory system. Journal of Engineering Science and Technology, 18(2):862–879, 2023.

[33] Troya Çağıl Köylü, Said Hamdioui, and Mottaqiallah Taouil. Smart redundancy schemes for anns against fault attacks. In 2022 IEEE European Test Symposium (ETS), pages 1–2. IEEE, 2022.

[34] Xinghua Xue, Haitong Huang, Cheng Liu, Tao Luo, Lei Zhang, and Ying Wang. Winograd convolution: A perspective from fault tolerance. In Proceedings of the 59th ACM/IEEE Design Automation Conference, pages 853–858, 2022.

[35] Xu Dawen et al. R2f: A remote retraining framework for aiot processors with computing errors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(11):1955–1966, 2021.

[36] Annachiara Ruospo, Gabriele Gavarini, Ilaria Bragaglia, Marcello Traiola, Alberto Bosio, and Ernesto Sanchez. Selective hardening of critical neurons in deep neural networks. In 2022 25th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pages 136–141. IEEE, 2022.

[37] Olivia Weng, Andres Meza, Quinlan Bock, Benjamin Hawks, Javier Campos, Nhan Tran, Javier Mauricio Duarte, and Ryan Kastner. Fkeras: A sensitivity analysis tool for edge neural networks. Journal on Autonomous Transportation Systems, 2024.

[1] [1]

Impact of artificial intelligence on aeronautics: An industry-wide review

Amina Zaoui, Dieudonn ´e Tchuente, Samuel Fosso Wamba, and Bernard Kamsu-Foguem. Impact of artificial intelligence on aeronautics: An industry-wide review. Journal of Engineering and Technology Manage- ment, 71:101800, 2024

work page 2024

[2] [2]

Emerging trends and future research opportunities in artificial intelligence, machine learning, and deep learning

NL Rane, M Paramesha, J Rane, and O Kaya. Emerging trends and future research opportunities in artificial intelligence, machine learning, and deep learning. Artificial Intelligence and Industry in Society, 5:2–96, 2024

work page 2024

[3] [3]

A survey on multimodal large language models for autonomous driving

Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 958–979, 2024

work page 2024

[4] [4]

Artificial intelligence for safety-critical systems in industrial and transportation domains: A survey

Jon Perez-Cerrolaza, Jaume Abella, Markus Borg, Carlo Donzella, Jes ´us Cerquides, Francisco J Cazorla, Cristofer Englund, Markus Tauber, George Nikolakopoulos, and Jose Luis Flores. Artificial intelligence for safety-critical systems in industrial and transportation domains: A survey. ACM Computing Surveys , 56(7):1–40, 2024

work page 2024

[5] [5]

Software error incident categorizations in aerospace

Lorraine E Prokop. Software error incident categorizations in aerospace. Journal of Aerospace Information Systems , 21(10):775–789, 2024

work page 2024

[6] [6]

A reliability study on cnns for critical embedded systems

Mohamed A Neggaz, Ihsen Alouani, Pablo R Lorenzo, and Smail Niar. A reliability study on cnns for critical embedded systems. In 2018 IEEE 36th International Conference on Computer Design (ICCD), pages 476–

work page 2018

[7] [7]

Smart: Selective mac zero- optimization for neural network reliability under radiation

Anuj Justus Rajappa, Philippe Reiter, Tarso Kraemer Sarzi Sartori, Luiz Henrique Laurini, Hassen Fourati, Siegfried Mercelis, Jeroen Famaey, and Rodrigo Possamai Bastos. Smart: Selective mac zero- optimization for neural network reliability under radiation. Microelec- tronics Reliability, 150:115092, 2023

work page 2023

[8] [8]

Understand- ing error propagation in deep learning neural network (dnn) accelerators and applications

Guanpeng Li, Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Karthik Pattabiraman, Joel Emer, and Stephen W Keckler. Understand- ing error propagation in deep learning neural network (dnn) accelerators and applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, 2017

work page 2017

[9] [9]

Artificial neural networks for space and safety-critical ap- plications: Reliability issues and potential solutions

Paolo Rech. Artificial neural networks for space and safety-critical ap- plications: Reliability issues and potential solutions. IEEE Transactions on Nuclear Science , 2024

work page 2024

[10] [10]

Efficient software-implemented hw fault tolerance for tinyml inference in safety-critical applications

Uzair Sharif, Daniel Mueller-Gritschneder, Rafael Stahl, and Ulf Schlichtmann. Efficient software-implemented hw fault tolerance for tinyml inference in safety-critical applications. In 2023 Design, Au- tomation & Test in Europe Conference & Exhibition (DATE) , pages 1–6. IEEE, 2023

work page 2023

[11] [11]

Fault-tolerant neural network accelerators with selective tmr

Timoteo Garc ´ıa Bertoa, Giulio Gambardella, Nicholas J Fraser, Michaela Blott, and John McAllister. Fault-tolerant neural network accelerators with selective tmr. IEEE Design & Test , 40(2):67–74, 2022

work page 2022

[12] [12]

Cost-effective memory protection and reliability evaluation based on machine error-tolerance: A case study on no-accuracy-loss yolov4 object detection model

Tong-Yu Hsieh, Ching-Yeh Tsai, Sian-Jhang Hou, and Wei-Ji Chao. Cost-effective memory protection and reliability evaluation based on machine error-tolerance: A case study on no-accuracy-loss yolov4 object detection model. Microelectronics Reliability, 147:115039, 2023

work page 2023

[13] [13]

Reliability evaluation and analysis of fpga-based neural network acceleration sys- tem

Dawen Xu, Ziyang Zhu, Cheng Liu, Ying Wang, Shuang Zhao, Lei Zhang, Huaguo Liang, Huawei Li, and Kwang-Ting Cheng. Reliability evaluation and analysis of fpga-based neural network acceleration sys- tem. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(3):472–484, 2021

work page 2021

[14] [14]

Exploration of activation fault reliability in quantized systolic array-based dnn ac- celerators

Mahdi Taheri, Natalia Cherezova, Mohammad Saeed Ansari, Maksim Jenihhin, Ali Mahani, Masoud Daneshtalab, and Jaan Raik. Exploration of activation fault reliability in quantized systolic array-based dnn ac- celerators. In 2024 25th International Symposium on Quality Electronic Design (ISQED), pages 1–8. IEEE, 2024

work page 2024

[15] [15]

Dac-sdc low power object detection challenge for uav applications

Xiaowei Xu, Xinyi Zhang, Bei Yu, Xiaobo Sharon Hu, Christopher Rowen, Jingtong Hu, and Yiyu Shi. Dac-sdc low power object detection challenge for uav applications. IEEE transactions on pattern analysis and machine intelligence , 43(2):392–403, 2019

work page 2019

[16] [16]

Inductive representation learning on large graphs

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017

work page 2017

[17] [17]

Bag-of-visual-words and spatial exten- sions for land-use classification

Yi Yang and Shawn Newsam. Bag-of-visual-words and spatial exten- sions for land-use classification. In Proceedings of the 18th SIGSPA- TIAL international conference on advances in geographic information systems, pages 270–279, 2010

work page 2010

[18] [18]

Caltech 101

M Ranzato FF Li, M Andreeto and P Perona. Caltech 101. caltechdata, 2022

work page 2022

[19] [19]

Ft-cnn: Algorithm-based fault tolerance for convolutional neural networks

Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen, Kaiming Ouyang, Franck Cappello, and Zizhong Chen. Ft-cnn: Algorithm-based fault tolerance for convolutional neural networks. IEEE Transactions on Parallel and Distributed Systems , 32(7):1677–1689, 2020

work page 2020

[20] [20]

Arithmetic-intensity-guided fault tol- erance for neural network inference on gpus

Jack Kosaian and KV Rashmi. Arithmetic-intensity-guided fault tol- erance for neural network inference on gpus. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , pages 1–15, 2021

work page 2021

[21] [21]

Soft error reliability analysis of vision transformers

Xinghua Xue, Cheng Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Soft error reliability analysis of vision transformers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023

work page 2023

[22] [22]

Selective hardening of cnns based on layer vulnerability estimation

Cristiana Bolchini, Luca Cassano, Antonio Miele, and Alessandro Nazzari. Selective hardening of cnns based on layer vulnerability estimation. In 2022 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pages 1–6. IEEE, 2022

work page 2022

[23] [23]

Evaluation and mitigation of weight-related single event upsets in a convolutional neural network

Yulong Cai, Ming Cai, Yanlai Wu, Jian Lu, Zeyu Bian, Bingkai Liu, and Shuai Cui. Evaluation and mitigation of weight-related single event upsets in a convolutional neural network. Electronics, 13(7):1296, 2024

work page 2024

[24] [24]

Exploring winograd convolution for cost-effective neural network fault tolerance

Xinghua Xue, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Exploring winograd convolution for cost-effective neural network fault tolerance. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023

work page 2023

[25] [25]

Thop: Pytorch-opcounter

Ligeng Zhu. Thop: Pytorch-opcounter. In THOP: PyTorch-OpCounter, 2022

work page 2022

[26] [26]

Sequential minimal optimization: A fast algorithm for training support vector machines

JC Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical report, Microsoft Research Technical Report, 1998

work page 1998

[27] [27]

Random forests

Leo Breiman. Random forests. Machine learning, 45:5–32, 2001

work page 2001

[28] [28]

Greedy function approximation: a gradient boosting machine

Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001

work page 2001

[29] [29]

Approxabft: Approximate algorithm-based fault tolerance for vision transformers

Xinghua Xue, Cheng Liu, Haitong Huang, Bo Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, and Xiaowei Li. Approxabft: Approximate algorithm-based fault tolerance for vision transformers. arXiv preprint arXiv:2302.10469, 2023

work page arXiv 2023

[30] [30]

The use of triple-modular redundancy to improve computer reliability

Robert E Lyons and Wouter Vanderkulk. The use of triple-modular redundancy to improve computer reliability. IBM journal of research and development, 6(2):200–209, 1962

work page 1962

[31] [31]

Multicore soft error rate stabilization using adaptive dual modular redundancy

Ramakrishna Vadlamani, Jia Zhao, Wayne Burleson, and Russell Tessier. Multicore soft error rate stabilization using adaptive dual modular redundancy. In 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), pages 27–32. IEEE, 2010

work page 2010

[32] [32]

Soft error mitigation in memory system

Norhuzaimin Julai, Farhana Mohamad Abdul Kadir, and Shamsiah Suhaili. Soft error mitigation in memory system. Journal of Engineering Science and Technology, 18(2):862–879, 2023

work page 2023

[33] [33]

Smart redundancy schemes for anns against fault attacks

Troya Çağıl Köylü, Said Hamdioui, and Mottaqiallah Taouil. Smart redundancy schemes for anns against fault attacks. In 2022 IEEE European Test Symposium (ETS), pages 1–2. IEEE, 2022

work page 2022

[34] [34]

Winograd convolution: A perspective from fault tolerance

Xinghua Xue, Haitong Huang, Cheng Liu, Tao Luo, Lei Zhang, and Ying Wang. Winograd convolution: A perspective from fault tolerance. In Proceedings of the 59th ACM/IEEE Design Automation Conference, pages 853–858, 2022

work page 2022

[35] [35]

R2f: A remote retraining framework for aiot processors with computing errors

Xu Dawen et al. R2f: A remote retraining framework for aiot processors with computing errors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(11):1955–1966, 2021

work page 2021

[36] [36]

Selective hardening of critical neurons in deep neural networks

Annachiara Ruospo, Gabriele Gavarini, Ilaria Bragaglia, Marcello Traiola, Alberto Bosio, and Ernesto Sanchez. Selective hardening of critical neurons in deep neural networks. In 2022 25th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), pages 136–141. IEEE, 2022

work page 2022

[37] [37]

Fkeras: A sensitivity analysis tool for edge neural networks

Olivia Weng, Andres Meza, Quinlan Bock, Benjamin Hawks, Javier Campos, Nhan Tran, Javier Mauricio Duarte, and Ryan Kastner. Fkeras: A sensitivity analysis tool for edge neural networks. Journal on Autonomous Transportation Systems, 2024

work page 2024