Experimental Analysis of FreeRTOS Dependability through Targeted Fault Injection Campaigns
Pith reviewed 2026-05-21 10:36 UTC · model grok-4.3
The pith
Corruption of pointer and key scheduler variables in FreeRTOS frequently causes crashes while many TCB fields have limited impact on availability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the KRONOS fault injection framework on FreeRTOS, the study shows that corruption of pointer and key scheduler-related variables frequently leads to crashes, whereas many TCB fields have only a limited impact on system availability.
What carries the argument
KRONOS, a software-based non-intrusive post-propagation fault injection framework that injects transient and permanent faults into OS-visible kernel data structures without specialized hardware.
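The difference between the two fault classes named above can be illustrated with a small simulation. This is a minimal sketch, not the KRONOS implementation: it assumes 32-bit words and models a transient fault as a single bit flip that the next write clears, and a permanent fault as a stuck-at bit that survives every later write. `SimulatedMemory` and its methods are hypothetical names introduced only for this illustration.

```python
WORD_BITS = 32                      # assumption: 32-bit target words
WORD_MASK = (1 << WORD_BITS) - 1


class SimulatedMemory:
    """Toy word-addressed memory used only to contrast the two fault models."""

    def __init__(self, size_words: int) -> None:
        self._words = [0] * size_words
        self._stuck = {}            # address -> (mask of stuck bits, forced bit values)

    def _apply_stuck(self, addr: int, value: int) -> int:
        # Force every stuck bit of this word to its stuck value.
        mask, bits = self._stuck.get(addr, (0, 0))
        return (value & ~mask) | bits

    def write(self, addr: int, value: int) -> None:
        self._words[addr] = self._apply_stuck(addr, value & WORD_MASK)

    def read(self, addr: int) -> int:
        return self._words[addr]

    def inject_transient(self, addr: int, bit: int) -> None:
        """Flip one bit once; a later write overwrites the corruption."""
        if not 0 <= bit < WORD_BITS:
            raise ValueError("bit index out of range")
        flipped = self._words[addr] ^ (1 << bit)
        self._words[addr] = self._apply_stuck(addr, flipped)

    def inject_permanent(self, addr: int, bit: int, value: int) -> None:
        """Stuck-at fault: the bit keeps `value` (0 or 1) across all later writes."""
        if not 0 <= bit < WORD_BITS or value not in (0, 1):
            raise ValueError("invalid bit index or stuck value")
        mask, bits = self._stuck.get(addr, (0, 0))
        mask |= 1 << bit
        bits = (bits & ~(1 << bit)) | (value << bit)
        self._stuck[addr] = (mask, bits)
        self._words[addr] = self._apply_stuck(addr, self._words[addr])
```

In this toy model, a transient flip in a word holding a pointer disappears as soon as the kernel rewrites that word, while a stuck-at bit corrupts every value stored there afterwards.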
If this is right
- Protection efforts can focus on scheduler pointers and key variables rather than all TCB fields to improve FreeRTOS radiation tolerance.
- System designers can prioritize hardening or monitoring of high-impact scheduler structures to maintain availability.
- The differential impact between variable types suggests selective fault tolerance mechanisms could be added to the kernel with lower overhead.
- Similar fault patterns may appear in other real-time operating systems that share comparable scheduler and task control data structures.
Where Pith is reading between the lines
- The same targeted injection approach could be applied to other RTOS kernels to compare vulnerability profiles across implementations.
- Results could guide the creation of lightweight runtime checks that detect and recover from corruption in the most critical scheduler variables.
- Extending the campaign to include inter-task communication structures might reveal additional single points of failure not covered in the current scheduler and TCB focus.
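The lightweight runtime checks mentioned in the list above could take many forms. One common pattern, sketched here as an assumption rather than anything proposed in the paper, stores each critical word together with its bitwise complement and verifies the pair on every read. `GuardedWord` and `CorruptionDetected` are hypothetical names, and the sketch covers detection only; recovery would need an extra copy or a checkpoint.

```python
WORD_MASK = 0xFFFFFFFF              # assumption: 32-bit words


class CorruptionDetected(Exception):
    """Raised when a guarded word no longer matches its complement copy."""


class GuardedWord:
    """Keeps a value next to its bitwise complement to detect bit flips."""

    def __init__(self, value: int) -> None:
        self.store(value)

    def store(self, value: int) -> None:
        self.value = value & WORD_MASK
        self.shadow = ~value & WORD_MASK   # complement copy, updated on every store

    def load(self) -> int:
        # A healthy pair XORs to all ones; any single-copy bit flip breaks that.
        if (self.value ^ self.shadow) != WORD_MASK:
            raise CorruptionDetected("guarded word and shadow disagree")
        return self.value
```

The check costs one extra word and one XOR per read, which is why such a guard would plausibly be reserved for the few variables whose corruption the campaign found most harmful.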
Load-bearing premise
The software-based post-propagation fault injection in KRONOS accurately represents the effects of real ionizing radiation on kernel data structures without introducing method-specific artifacts or missing propagation paths.
What would settle it
Running equivalent experiments with hardware-based fault injection or actual radiation exposure and observing substantially different crash frequencies or availability impacts for the same variables would indicate that KRONOS does not faithfully model real radiation effects.
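Whether two campaigns show "substantially different" crash frequencies for the same variable can be checked with a standard two-proportion z-test. The sketch below is one possible way to make that comparison, assuming independent runs and sample sizes large enough for the normal approximation; the function name and any counts passed to it are hypothetical and do not come from the paper.

```python
import math


def two_proportion_z_test(crashes_a: int, runs_a: int,
                          crashes_b: int, runs_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two crash rates."""
    if runs_a <= 0 or runs_b <= 0:
        raise ValueError("each campaign needs at least one run")
    rate_a = crashes_a / runs_a
    rate_b = crashes_b / runs_b
    # Pooled rate under the null hypothesis that both campaigns share one crash rate.
    pooled = (crashes_a + crashes_b) / (runs_a + runs_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / runs_a + 1 / runs_b))
    if std_err == 0:
        return 0.0, 1.0             # both campaigns crashed always or never
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value
```

A small p-value for a given variable would suggest that the software-injected and hardware-induced crash rates genuinely differ for it; a large one only means the data do not show a difference.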
Original abstract
Real-Time Operating Systems (RTOSes) play a crucial role in safety-critical domains, where deterministic and predictable task execution is essential. Yet they are increasingly exposed to ionizing radiation, which can compromise system dependability. To assess FreeRTOS under such conditions, we introduce KRONOS, a software-based, non-intrusive post-propagation Fault Injection (FI) framework that injects transient and permanent faults into Operating System-visible kernel data structures without specialized hardware or debug interfaces. Using KRONOS, we conduct an extensive FI campaign on core FreeRTOS kernel components, including scheduler-related variables and Task Control Blocks (TCBs), characterizing the impact of kernel-level corruptions on functional correctness, timing behavior, and availability. The results show that corruption of pointer and key scheduler-related variables frequently leads to crashes, whereas many TCB fields have only a limited impact on system availability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces KRONOS, a software-based, non-intrusive, post-propagation fault injection framework for FreeRTOS that targets OS-visible kernel data structures (scheduler variables and TCB fields) to emulate transient and permanent faults without specialized hardware or debug interfaces. Through an extensive campaign, it reports that corruption of pointers and key scheduler-related variables frequently causes crashes, while many TCB fields have only a limited impact on system availability.
Significance. If the KRONOS injection model is shown to faithfully represent ionizing-radiation effects on kernel structures, the work supplies concrete, actionable data on FreeRTOS dependability that can inform hardening strategies for safety-critical real-time systems. The non-intrusive, software-only approach is a practical strength that could lower the barrier to similar analyses.
major comments (3)
- §3 (KRONOS framework description): The central claim that post-propagation injection into OS-visible structures accurately captures radiation-induced faults is load-bearing for all headline results, yet the manuscript provides no hardware cross-validation, radiation-beam comparison, or analysis of missed propagation paths that never reach the injected structures. This leaves open the possibility that the observed differential sensitivity (scheduler pointers vs. TCB fields) is an artifact of the injection timing and location model.
- §4 (Results and statistical analysis): The abstract and results sections state that pointer/scheduler corruptions 'frequently lead to crashes' and TCB fields have 'limited impact,' but supply no injection counts, confidence intervals, error bars, or explicit data-exclusion criteria. Without these, it is impossible to judge whether the reported patterns are statistically robust or sensitive to campaign parameters.
- §2 and §5 (Related work and discussion): The paper does not compare KRONOS outcomes against prior hardware-based SEU campaigns on FreeRTOS or similar RTOS kernels, nor does it quantify how the post-propagation restriction might under-sample faults whose effects remain invisible to the OS-visible structures.
minor comments (2)
- [Introduction, §3] Clarify the precise definition and timing of 'post-propagation' injection in the introduction and §3 so readers can immediately understand what class of faults is being excluded.
- [§4] Add a table or figure summarizing the total number of injections per variable class and the resulting crash/availability percentages to make the quantitative claims easier to parse.
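The statistical reporting requested in the comments on §4 could be produced with a few lines of code. The sketch below computes a Wilson score interval for a crash rate, a standard choice for binomial proportions that behaves well near 0 and 1. It is an illustration under stated assumptions: the function name is hypothetical, `z = 1.96` corresponds to a 95% confidence level, and the manuscript does not say which interval method the authors use.

```python
import math


def wilson_interval(crashes: int, injections: int,
                    z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the crash rate crashes / injections."""
    if injections <= 0 or not 0 <= crashes <= injections:
        raise ValueError("need 0 <= crashes <= injections and injections > 0")
    rate = crashes / injections
    z2 = z * z
    denom = 1 + z2 / injections
    center = (rate + z2 / (2 * injections)) / denom
    half_width = (z / denom) * math.sqrt(
        rate * (1 - rate) / injections + z2 / (4 * injections ** 2)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Applied per variable class, the two bounds would supply the error bars and the table columns the referee asks for alongside the raw injection counts.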
Simulated Author's Rebuttal
Thank you for the opportunity to revise our manuscript based on the referee's insightful comments. We have carefully considered each point and provide point-by-point responses below. Where appropriate, we have made revisions to address the concerns raised.
- Referee: §3 (KRONOS framework description): The central claim that post-propagation injection into OS-visible structures accurately captures radiation-induced faults is load-bearing for all headline results, yet the manuscript provides no hardware cross-validation, radiation-beam comparison, or analysis of missed propagation paths that never reach the injected structures. This leaves open the possibility that the observed differential sensitivity (scheduler pointers vs. TCB fields) is an artifact of the injection timing and location model.
  Authors: We recognize the importance of validating the post-propagation injection model against actual hardware faults. While direct radiation-beam experiments are not included in this work due to the software-based nature of KRONOS, we have added a new subsection in §3 discussing the model's assumptions and potential limitations regarding missed propagation paths. Specifically, we argue that faults impacting the OS must propagate to visible kernel structures, and we provide a qualitative analysis of why the differential sensitivity is unlikely to be solely an artifact. We believe this strengthens the manuscript without requiring hardware access.
  revision: partial
- Referee: §4 (Results and statistical analysis): The abstract and results sections state that pointer/scheduler corruptions 'frequently lead to crashes' and TCB fields have 'limited impact,' but supply no injection counts, confidence intervals, error bars, or explicit data-exclusion criteria. Without these, it is impossible to judge whether the reported patterns are statistically robust or sensitive to campaign parameters.
  Authors: We agree that the statistical details are necessary for a complete assessment. In the revised manuscript, we have updated §4 to include the total number of injections performed for each structure, 95% confidence intervals for the reported crash rates, error bars on all figures, and explicit criteria for excluding invalid runs (such as those affected by external interrupts). These additions ensure the robustness of the findings can be properly evaluated.
  revision: yes
- Referee: §2 and §5 (Related work and discussion): The paper does not compare KRONOS outcomes against prior hardware-based SEU campaigns on FreeRTOS or similar RTOS kernels, nor does it quantify how the post-propagation restriction might under-sample faults whose effects remain invisible to the OS-visible structures.
  Authors: We have revised §2 to include a comparison with prior hardware-based single-event upset (SEU) campaigns on FreeRTOS and other RTOSes, noting that our results align with observations of scheduler sensitivity in those studies. In §5, we have added a discussion quantifying the potential under-sampling by estimating the fraction of kernel memory that is OS-visible and arguing that invisible faults would not affect OS behavior directly. This addresses the concern about the scope of the injection model.
  revision: yes
- Direct hardware cross-validation or radiation-beam comparison for the KRONOS injection model
Circularity Check
No circularity: purely experimental fault-injection measurements with no derivations or self-referential predictions
The manuscript describes an experimental campaign that injects faults into FreeRTOS kernel structures via the KRONOS software framework and reports observed outcomes on crashes, timing, and availability. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided abstract or described content. Results are presented as direct empirical measurements rather than outputs computed from prior results or self-citations. The reader's assessment correctly identifies the work as observational; the skeptic's concern addresses methodological validity (whether post-propagation injection matches real radiation) but does not constitute circularity under the defined criteria, which require explicit reduction of a claimed derivation to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Post-propagation software fault injection into OS-visible structures faithfully models radiation-induced transient and permanent faults.
invented entities (1)
- KRONOS framework: no independent evidence