
arxiv: 2603.25666 · v1 · pith:KENCVSHXnew · submitted 2026-03-26 · 💻 cs.OS

Experimental Analysis of FreeRTOS Dependability through Targeted Fault Injection Campaigns

Pith reviewed 2026-05-21 10:36 UTC · model grok-4.3

classification 💻 cs.OS
keywords FreeRTOS · fault injection · dependability · real-time operating systems · radiation effects · task control blocks · scheduler variables · kernel data structures

The pith

Corruption of pointer and key scheduler variables in FreeRTOS frequently causes crashes while many TCB fields have limited impact on availability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces KRONOS, a software-based fault injection framework, to evaluate how transient and permanent faults affect FreeRTOS kernel data structures under conditions similar to ionizing radiation. It performs targeted injections into scheduler-related variables and Task Control Blocks, then measures effects on functional correctness, timing, and system availability. A sympathetic reader would care because FreeRTOS is widely used in safety-critical real-time applications where radiation exposure can compromise reliability. The central results identify that certain pointer and scheduler variables are high-risk points while many TCB fields tolerate corruption better.

Core claim

Using the KRONOS fault injection framework on FreeRTOS, the study shows that corruption of pointer and key scheduler-related variables frequently leads to crashes, whereas many TCB fields have only a limited impact on system availability.

What carries the argument

KRONOS, a software-based non-intrusive post-propagation fault injection framework that injects transient and permanent faults into OS-visible kernel data structures without specialized hardware.
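
To make the two fault classes concrete, here is a minimal Python sketch of how a transient fault and a permanent fault might behave on a single 32-bit word. The transient fault is a single bit flip that a later write overwrites. The permanent fault is modeled, as an assumption, as a stuck-at bit that survives writes. This is not KRONOS code: the class `FaultyWord`, its methods, the 32-bit width, and the stuck-at reading of "permanent" are all hypothetical and chosen only for illustration.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit target word
WORD_BITS = 32


def flip_bit(word: int, bit: int) -> int:
    """Transient fault model: invert a single bit once."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit index out of range")
    return (word ^ (1 << bit)) & WORD_MASK


def force_bit(word: int, bit: int, value: int) -> int:
    """Permanent fault model (assumed stuck-at): pin one bit to 0 or 1."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit index out of range")
    mask = 1 << bit
    return ((word | mask) if value else (word & ~mask)) & WORD_MASK


class FaultyWord:
    """Hypothetical stand-in for one kernel variable held in memory."""

    def __init__(self, value: int) -> None:
        self._value = value & WORD_MASK
        self._stuck = None  # (bit, value) once a permanent fault is injected

    def inject_transient(self, bit: int) -> None:
        self._value = flip_bit(self._value, bit)

    def inject_permanent(self, bit: int, value: int) -> None:
        self._stuck = (bit, value)
        self._value = force_bit(self._value, bit, value)

    def write(self, value: int) -> None:
        # A normal software write clears a transient fault but not a stuck bit.
        self._value = value & WORD_MASK
        if self._stuck is not None:
            self._value = force_bit(self._value, *self._stuck)

    def read(self) -> int:
        return self._value
```

In this toy model a corrupted pointer-like value stays wrong only until the next write under a transient fault, whereas a stuck bit reappears after every write. This is one plausible reason the two classes could produce different outcome distributions.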

If this is right

  • Protection efforts can focus on scheduler pointers and key variables rather than all TCB fields to improve FreeRTOS radiation tolerance.
  • System designers can prioritize hardening or monitoring of high-impact scheduler structures to maintain availability.
  • The differential impact between variable types suggests selective fault tolerance mechanisms could be added to the kernel with lower overhead.
  • Similar fault patterns may appear in other real-time operating systems that share comparable scheduler and task control data structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same targeted injection approach could be applied to other RTOS kernels to compare vulnerability profiles across implementations.
  • Results could guide the creation of lightweight runtime checks that detect and recover from corruption in the most critical scheduler variables.
  • Extending the campaign to include inter-task communication structures might reveal additional single points of failure not covered in the current scheduler and TCB focus.
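
As an illustration of what a lightweight runtime check on a critical scheduler variable could look like, the sketch below stores each value together with its bitwise complement and verifies the pair on every read. This is a generic duplication-with-complement technique, not something the paper or FreeRTOS is stated to implement. The names `GuardedWord` and `CorruptionDetected` and the 32-bit width are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit words


class CorruptionDetected(Exception):
    """Raised when the value and its complemented shadow disagree."""


class GuardedWord:
    """Hypothetical guarded variable: value plus complemented shadow copy."""

    def __init__(self, value: int) -> None:
        self.store(value)

    def store(self, value: int) -> None:
        self.value = value & WORD_MASK
        self.shadow = ~value & WORD_MASK  # bitwise complement kept alongside

    def is_consistent(self) -> bool:
        # Every bit of value must be the inverse of the same bit in shadow.
        return (self.value ^ self.shadow) == WORD_MASK

    def load(self) -> int:
        if not self.is_consistent():
            raise CorruptionDetected("value/shadow mismatch")
        return self.value
```

The scheme detects any single bit flip in either copy but cannot tell which copy is wrong. Recovery would need additional redundancy, such as a third copy with majority voting, at a higher memory and timing cost.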

Load-bearing premise

The software-based post-propagation fault injection in KRONOS accurately represents the effects of real ionizing radiation on kernel data structures without introducing method-specific artifacts or missing propagation paths.

What would settle it

Running equivalent experiments with hardware-based fault injection or actual radiation exposure and observing substantially different crash frequencies or availability impacts for the same variables would indicate that KRONOS does not faithfully model real radiation effects.

Figures

Figures reproduced from arXiv: 2603.25666 by Alessandro Savino, Luca Mannella, Stefano Di Carlo.

Figure 1: Overview of the main components of KRONOS and their interactions.
Figure 2: Injections on FreeRTOS variables: transient (on the left) and permanent (on the right) faults.
Figure 4: Injections on FreeRTOS lists: transient (on the left) and permanent (on the right) faults.
Figure 5: Injections on FreeRTOS current TCB fields: transient (on the left) and permanent (on the right) faults.
original abstract

Real-Time Operating Systems (RTOSes) play a crucial role in safety-critical domains, where deterministic and predictable task execution is essential. Yet they are increasingly exposed to ionizing radiation, which can compromise system dependability. To assess FreeRTOS under such conditions, we introduce KRONOS, a software-based, non-intrusive post-propagation Fault Injection (FI) framework that injects transient and permanent faults into Operating System-visible kernel data structures without specialized hardware or debug interfaces. Using KRONOS, we conduct an extensive FI campaign on core FreeRTOS kernel components, including scheduler-related variables and Task Control Blocks (TCBs), characterizing the impact of kernel-level corruptions on functional correctness, timing behavior, and availability. The results show that corruption of pointer and key scheduler-related variables frequently leads to crashes, whereas many TCB fields have only a limited impact on system availability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces KRONOS, a software-based non-intrusive post-propagation fault injection framework for FreeRTOS that targets OS-visible kernel data structures (scheduler variables and TCB fields) to emulate transient and permanent faults without hardware or debug interfaces. Through an extensive campaign, it reports that pointer and key scheduler-related corruptions frequently cause crashes while many TCB fields exhibit only limited impact on system availability, timing, and functional correctness.

Significance. If the KRONOS injection model is shown to faithfully represent ionizing-radiation effects on kernel structures, the work supplies concrete, actionable data on FreeRTOS dependability that can inform hardening strategies for safety-critical real-time systems. The non-intrusive, software-only approach is a practical strength that could lower the barrier to similar analyses.

major comments (3)
  1. [§3] §3 (KRONOS framework description): The central claim that post-propagation injection into OS-visible structures accurately captures radiation-induced faults is load-bearing for all headline results, yet the manuscript provides no hardware cross-validation, radiation-beam comparison, or analysis of missed propagation paths that never reach the injected structures. This leaves open the possibility that the observed differential sensitivity (scheduler pointers vs. TCB fields) is an artifact of the injection timing and location model.
  2. [§4] §4 (Results and statistical analysis): The abstract and results sections state that pointer/scheduler corruptions 'frequently lead to crashes' and TCB fields have 'limited impact,' but supply no injection counts, confidence intervals, error bars, or explicit data-exclusion criteria. Without these, it is impossible to judge whether the reported patterns are statistically robust or sensitive to campaign parameters.
  3. [§2, §5] §2 and §5 (Related work and discussion): The paper does not compare KRONOS outcomes against prior hardware-based SEU campaigns on FreeRTOS or similar RTOS kernels, nor does it quantify how the post-propagation restriction might under-sample faults whose effects remain invisible to the OS-visible structures.
minor comments (2)
  1. [Introduction, §3] Clarify the precise definition and timing of 'post-propagation' injection in the introduction and §3 so readers can immediately understand what class of faults is being excluded.
  2. [§4] Add a table or figure summarizing the total number of injections per variable class and the resulting crash/availability percentages to make the quantitative claims easier to parse.
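
The confidence intervals requested in the major comments can be computed directly from per-class injection and crash counts. The sketch below uses the Wilson score interval for a binomial proportion. This is one reasonable choice, not necessarily what the authors use. The function name and the default `z = 1.96` (roughly a 95% level) are assumptions, and the source reports no counts, so none are supplied here.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """Wilson score interval for the proportion failures / trials."""
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need 0 <= failures <= trials and trials > 0")
    p = failures / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

Calling `wilson_interval(crashes, injections)` once per variable class would yield the lower and upper bounds for the summary table suggested in the minor comments. Unlike the simpler normal-approximation interval, it behaves sensibly when a class shows very few or zero crashes.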

Simulated Author's Rebuttal

3 responses · 1 unresolved

Thank you for the opportunity to revise our manuscript based on the referee's insightful comments. We have carefully considered each point and provide point-by-point responses below. Where appropriate, we have made revisions to address the concerns raised.

point-by-point responses
  1. Referee: [§3] §3 (KRONOS framework description): The central claim that post-propagation injection into OS-visible structures accurately captures radiation-induced faults is load-bearing for all headline results, yet the manuscript provides no hardware cross-validation, radiation-beam comparison, or analysis of missed propagation paths that never reach the injected structures. This leaves open the possibility that the observed differential sensitivity (scheduler pointers vs. TCB fields) is an artifact of the injection timing and location model.

    Authors: We recognize the importance of validating the post-propagation injection model against actual hardware faults. While direct radiation-beam experiments are not included in this work due to the software-based nature of KRONOS, we have added a new subsection in §3 discussing the model's assumptions and potential limitations regarding missed propagation paths. Specifically, we argue that faults impacting the OS must propagate to visible kernel structures, and we provide a qualitative analysis of why the differential sensitivity is unlikely to be solely an artifact. We believe this strengthens the manuscript without requiring hardware access. revision: partial

  2. Referee: [§4] §4 (Results and statistical analysis): The abstract and results sections state that pointer/scheduler corruptions 'frequently lead to crashes' and TCB fields have 'limited impact,' but supply no injection counts, confidence intervals, error bars, or explicit data-exclusion criteria. Without these, it is impossible to judge whether the reported patterns are statistically robust or sensitive to campaign parameters.

    Authors: We agree that the statistical details are necessary for a complete assessment. In the revised manuscript, we have updated §4 to include the total number of injections performed for each structure, 95% confidence intervals for the reported crash rates, error bars on all figures, and explicit criteria for excluding invalid runs (such as those affected by external interrupts). These additions ensure the robustness of the findings can be properly evaluated. revision: yes

  3. Referee: [§2, §5] §2 and §5 (Related work and discussion): The paper does not compare KRONOS outcomes against prior hardware-based SEU campaigns on FreeRTOS or similar RTOS kernels, nor does it quantify how the post-propagation restriction might under-sample faults whose effects remain invisible to the OS-visible structures.

    Authors: We have revised §2 to include a comparison with prior hardware-based single-event upset (SEU) campaigns on FreeRTOS and other RTOSes, noting that our results align with observations of scheduler sensitivity in those studies. In §5, we have added a discussion quantifying the potential under-sampling by estimating the fraction of kernel memory that is OS-visible and arguing that invisible faults would not affect OS behavior directly. This addresses the concern about the scope of the injection model. revision: yes
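
The injection counts promised in the second response can be related to a target margin of error before a campaign is run. The sketch below applies a finite-population sample-size formula commonly used in statistical fault injection, n = N / (1 + e²(N − 1) / (t² p(1 − p))), where N is the number of possible faults, e the margin of error, t the normal quantile for the chosen confidence level, and p the estimated failure probability. The function name and defaults are assumptions, and nothing here reflects the actual campaign size, which the source does not state.

```python
import math


def sample_size(population: int, margin: float = 0.01,
                z: float = 1.96, p: float = 0.5) -> int:
    """Injections needed for a given margin of error (finite population)."""
    if population <= 0 or not 0 < margin < 1 or not 0 < p < 1:
        raise ValueError("invalid population, margin, or p")
    n = population / (
        1 + margin ** 2 * (population - 1) / (z ** 2 * p * (1 - p))
    )
    return math.ceil(n)
```

Using `p = 0.5` is the conservative choice because it maximizes `p * (1 - p)` and therefore the required number of injections. For very large fault spaces the result approaches `z**2 * p * (1 - p) / margin**2` and no longer depends on the population.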

standing simulated objections not resolved
  • Direct hardware cross-validation or radiation-beam comparison for the KRONOS injection model

Circularity Check

0 steps flagged

No circularity: purely experimental fault-injection measurements with no derivations or self-referential predictions

full rationale

The manuscript describes an experimental campaign that injects faults into FreeRTOS kernel structures via the KRONOS software framework and reports observed outcomes on crashes, timing, and availability. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided abstract or described content. Results are presented as direct empirical measurements rather than outputs computed from prior results or self-citations. The reader's assessment correctly identifies the work as observational; the skeptic's concern addresses methodological validity (whether post-propagation injection matches real radiation) but does not constitute circularity under the defined criteria, which require explicit reduction of a claimed derivation to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Central claim depends on the representativeness of the fault model and the non-intrusive nature of the injection method; no free parameters or invented physical entities are evident from the abstract.

axioms (1)
  • domain assumption: Post-propagation software fault injection into OS-visible structures faithfully models radiation-induced transient and permanent faults.
    Invoked to justify the validity of the FI campaign results on functional correctness, timing, and availability.
invented entities (1)
  • KRONOS framework (no independent evidence)
    purpose: Perform targeted, non-intrusive fault injection into FreeRTOS kernel data structures.
    New tool created for this study; no independent evidence of prior existence or external validation provided in abstract.

pith-pipeline@v0.9.0 · 5676 in / 1158 out tokens · 47532 ms · 2026-05-21T10:36:20.420194+00:00 · methodology

