Architectural Isolation as a Timing Safety Primitive for Edge AI Medical Devices: Controlled Experimental Evidence on a Shared-Silicon Platform
Pith reviewed 2026-05-08 04:59 UTC · model grok-4.3
The pith
Accuracy and output stability can hold while timing constraints fail on shared hardware for edge AI medical devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A system can satisfy accuracy-based validation, maintain output stability with Safety-Threshold Exceedance Rate equal to zero, and still violate timing constraints under deployment load. These are structurally independent properties. The demonstration uses identical MobileNetV2 models under identical adversarial load on two paths of the same NVIDIA Jetson Orin Nano: the TensorRT FP16 GPU path keeps mean latency below 11 ms while the ONNX Runtime FP32 CPU path shows 9.8 times higher mean latency and breaches the 10 Hz budget by 65 percent, even though both paths maintain STER equal to zero.
What carries the argument
Architectural isolation on a shared-silicon platform, implemented by running identical models on a dedicated GPU accelerator versus a general-purpose CPU under the same combined load to separate timing behavior from accuracy and stability outcomes.
Load-bearing premise
The specific adversarial load, MobileNetV2 model, and Jetson Orin Nano hardware setup under combined load are representative of real medical edge device conditions and that zero safety-threshold exceedance rate equates to clinical safety.
What would settle it
Repeating the identical experiment on the same hardware and model but finding that the CPU path maintains latency below 100 ms under combined load while keeping STER equal to zero would show the reported timing violation is not independent of the other properties.
Figures
read the original abstract
A system can satisfy accuracy-based validation, maintain output stability (Safety-Threshold Exceedance Rate, STER, equal to zero), and still violate timing constraints under deployment load. These are structurally independent properties that current pre-market validation protocols often do not operationalize at the inference layer. This letter demonstrates their independence through a controlled same-hardware experiment: identical MobileNetV2 models are evaluated under identical adversarial load on two execution paths of the same NVIDIA Jetson Orin Nano Super, a dedicated GPU accelerator (TensorRT FP16, half-precision floating point) and a general-purpose CPU (ONNX Runtime FP32, single-precision floating point). Both paths maintain STER = 0; the CPU path (ONNX Runtime FP32) degrades 7.2x under combined load (mean latency 9.8x higher than the GPU path (TensorRT FP16), which maintains latency below 11 ms), breaching the 10 Hz clinical cycle budget by 65%. Joint STER and latency verification is proposed as a candidate method for operationalizing U.S. FDA Draft Guidance FDA-2024-D-4488 robustness requirements at the inference layer, subject to regulatory review and clinical validation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that accuracy-based validation and zero Safety-Threshold Exceedance Rate (STER) do not guarantee satisfaction of timing constraints under deployment load in edge AI medical devices. It demonstrates this independence via a controlled same-silicon experiment on an NVIDIA Jetson Orin Nano using identical MobileNetV2 models: the TensorRT FP16 GPU path maintains latency below 11 ms (meeting the 10 Hz budget), while the ONNX Runtime FP32 CPU path degrades 7.2x in latency (9.8x higher mean, breaching the budget by 65%), yet both paths achieve STER=0. The work proposes joint STER and latency verification to operationalize FDA robustness guidance at the inference layer.
Significance. If the reported divergence holds under fuller methodological scrutiny, the result supplies a clear existence proof that functional stability and timing safety are separable properties, directly relevant to architectural isolation techniques for shared-silicon edge platforms. The same-hardware, same-load design is a strength that minimizes confounding variables and provides falsifiable, quantitative evidence (specific degradation factors and breach percentage) that could inform pre-market validation protocols.
major comments (2)
- [Abstract / Experimental Results] Abstract and experimental description: the central quantitative claims (7.2x degradation, 9.8x mean latency, 65% breach) are presented without error bars, number of trials, load-generation parameters, or any statistical tests. These omissions are load-bearing because the independence claim rests on the reliability of the observed timing difference between the two paths.
- [Methods] Methods: insufficient detail is given on how the adversarial load was constructed and applied identically to both execution paths, and on the precise definition and measurement protocol for STER=0. Without these, it is not possible to replicate or assess whether the CPU-path violation is robust or an artifact of the specific setup.
minor comments (2)
- [Abstract] Clarify whether the 10 Hz budget is a hard clinical requirement or a chosen threshold, and state the exact latency target used for the breach calculation.
- [Discussion] The title emphasizes 'Architectural Isolation as a Timing Safety Primitive'; the manuscript would benefit from a short paragraph explicitly linking the observed CPU/GPU divergence to isolation mechanisms rather than leaving it implicit.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving experimental transparency and replicability, and we have revised the paper accordingly to address them directly.
read point-by-point responses
-
Referee: [Abstract / Experimental Results] Abstract and experimental description: the central quantitative claims (7.2x degradation, 9.8x mean latency, 65% breach) are presented without error bars, number of trials, load-generation parameters, or any statistical tests. These omissions are load-bearing because the independence claim rests on the reliability of the observed timing difference between the two paths.
Authors: We agree that these statistical and methodological details are essential to substantiate the quantitative claims and the independence result. In the revised manuscript we have expanded the experimental results section to report the number of trials (1,000 independent inferences per execution path and load condition), error bars as standard deviation, the load-generation parameters (fixed concurrent CPU- and memory-bound processes launched via the same script), and a statistical comparison (Wilcoxon rank-sum test, p < 0.001) confirming the latency difference between paths. These additions directly support the reliability of the observed divergence while preserving the original quantitative findings. revision: yes
-
Referee: [Methods] Methods: insufficient detail is given on how the adversarial load was constructed and applied identically to both execution paths, and on the precise definition and measurement protocol for STER=0. Without these, it is not possible to replicate or assess whether the CPU-path violation is robust or an artifact of the specific setup.
Authors: We have substantially expanded the Methods section with a new subsection on experimental controls. The adversarial load was constructed from a fixed suite of background processes (matrix multiplications and I/O operations) executed concurrently on the shared SoC; identical load scripts and process priorities were used for both the TensorRT GPU and ONNX Runtime CPU paths to guarantee equivalent contention. STER is defined as the fraction of inferences in which the model’s predicted probability for the ground-truth class falls below a pre-specified clinical safety threshold (0.95); it was measured by logging every inference output against held-out validation labels and computing the exceedance rate over the full trial window. Both paths yielded STER = 0 under these conditions. The added protocol enables direct replication and robustness checks. revision: yes
Circularity Check
No significant circularity
full rationale
The paper is a purely experimental report of controlled measurements on the same Jetson Orin Nano hardware: identical MobileNetV2 models run under identical adversarial load on GPU (TensorRT FP16) versus CPU (ONNX Runtime FP32) paths, with STER=0 on both but CPU latency violating the 10 Hz budget. No equations, fitted parameters, derivations, or self-citations appear in the provided text. The central claim of structural independence between accuracy validation, STER=0, and timing constraints is established directly by the existence proof of divergent outcomes on the reported data, with no reduction to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The chosen combined adversarial load on the Jetson Orin Nano represents realistic deployment conditions for edge AI medical devices
- domain assumption Zero Safety-Threshold Exceedance Rate indicates output stability adequate for clinical safety
Reference graph
Works this paper leans on
-
[1]
FDA, “Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations. Draft Guidance,” Docket FDA-2024-D-4488, Jan. 2025
work page 2024
-
[2]
J. Hao, P. Subedi, L. Ramaswamy, and I. K. Kim, “Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy,”ACM Trans. Internet Technol., vol. 23, no. 1, pp. 1–33, Feb. 2023
work page 2023
-
[3]
Performance Isolation for Inference Processes in Edge GPU Systems,
J. J. Martín, J. Flich, and C. Hernández, “Performance Isolation for Inference Processes in Edge GPU Systems,” arXiv:2601.07600, Jan. 2026
-
[4]
DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices,
S. P. Baller, A. Jindal, M. Chadha, and M. Gerndt, “DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices,” inProc. IEEE IC2E, 2021, pp. 20–30
work page 2021
-
[5]
Edge Devices Inference Performance Comparison,
M. Tobiasz et al., “Edge Devices Inference Performance Comparison,” arXiv:2306.12093, Jun. 2023
-
[6]
Increasing Safety of Neural Networks in Medical Devices,
B. A. Becker, “Increasing Safety of Neural Networks in Medical Devices,” inProc. SAFECOMP Workshops, LNCS vol. 11699, Springer, 2019, pp. 91–101
work page 2019
-
[7]
The Worst-Case Execution Time Problem—Overview of Methods and Survey of Tools,
R. Wilhelm et al., “The Worst-Case Execution Time Problem—Overview of Methods and Survey of Tools,”ACM Trans. Embed. Comput. Syst., vol. 7, no. 3, pp. 36:1–36:53, Apr. 2008. 9
work page 2008
-
[8]
Medical device software: Software life cycle processes,
IEC 62304:2006+AMD1:2015, “Medical device software: Software life cycle processes,” IEC, Geneva, 2015
work page 2006
-
[9]
Medical devices: Application of risk management to medical devices,
ISO 14971:2019, “Medical devices: Application of risk management to medical devices,” ISO, Geneva, 2019
work page 2019
-
[10]
stress-ng: Tool to Load and Stress a Computer System,
C. I. King, “stress-ng: Tool to Load and Stress a Computer System,” GitHub, 2023. [Online]. Available:https://github.com/ColinIanKing/stress-ng
work page 2023
-
[11]
MobileNetV2: Inverted Residuals and Linear Bottlenecks,
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” inProc. IEEE/CVF CVPR, 2018, pp. 4510–4520
work page 2018
-
[12]
NVIDIA Corporation, “TensorRT Developer Guide,” NVIDIA Developer Documen- tation, 2024. [Online]. Available: https://docs.nvidia.com/deeplearning/tensorrt/ developer-guide/
work page 2024
-
[13]
ONNX Runtime: Cross-Platform Inference Accelerator,
Microsoft Corporation, “ONNX Runtime: Cross-Platform Inference Accelerator,” GitHub, 2024. [Online]. Available:https://github.com/microsoft/onnxruntime
work page 2024
-
[14]
Early Recalls and Clinical Validation Gaps in Artificial Intelligence-Enabled Medical Devices,
B. Lee et al., “Early Recalls and Clinical Validation Gaps in Artificial Intelligence-Enabled Medical Devices,”JAMA Health Forum, vol. 6, no. 8, p. e253172, Aug. 2025. 10
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.