arxiv: 2604.09631 · v2 · pith:RN52RC5Mnew · submitted 2026-03-19 · 💻 cs.DC · cs.AI

Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection

Pith reviewed 2026-05-21 10:39 UTC · model grok-4.3

classification 💻 cs.DC cs.AI
keywords fault injection · edge computing · YOLO · TensorRT · Jetson Nano · object detection · hardware utilization · autonomous driving

The pith

TensorRT-optimized YOLO models maintain stable GPU occupancy, controlled temperatures, and safe power levels on Jetson Nano under large-scale input fault injections for lane following and object detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper characterizes CPU load, GPU utilization, RAM consumption, power draw, throughput, and thermal behavior of TensorRT-optimized YOLOv10s, YOLOv11s, and YOLO2026n pipelines on NVIDIA Jetson Nano. It applies a fault injection campaign with faults synthesized by large language models and latent diffusion models drawn from JetBot platform data. The study examines both lane-following and object detection tasks under heavily degraded inputs. Results indicate that GPU occupancy stays stable, temperature rise remains controlled, power consumption stays within safe limits, and memory usage follows a consistent release pattern after the initial warm-up phase. Object detection exhibits somewhat more variability in memory and thermal metrics, but overall the pipelines demonstrate resilience to input degradation.
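The counters listed above can be sampled on a Jetson board in several ways. The sketch below assumes the readings come from a `tegrastats`-style text line; the paper summary does not say which tool the authors used, and the `SAMPLE` line, the field names, and the `parse_tegrastats` helper are illustrative assumptions rather than the authors' logging code.

```python
import re

# Illustrative line in the style of Jetson Nano `tegrastats` output.
# This is an assumed sample, not a measurement from the paper.
SAMPLE = (
    "RAM 2448/3964MB (lfb 3x4MB) SWAP 0/1982MB (cached 0MB) "
    "CPU [12%@1479,5%@1479,3%@1479,4%@1479] EMC_FREQ 0% GR3D_FREQ 45% "
    "PLL@33C CPU@36C PMIC@100C GPU@34.5C AO@41C thermal@35C "
    "POM_5V_IN 2532/2532 POM_5V_GPU 124/124 POM_5V_CPU 559/559"
)


def parse_tegrastats(line):
    """Extract a few hardware counters from one tegrastats-style line."""
    out = {}
    m = re.search(r"RAM (\d+)/(\d+)MB", line)
    if m:
        out["ram_used_mb"] = int(m.group(1))
        out["ram_total_mb"] = int(m.group(2))
    m = re.search(r"CPU \[([^\]]+)\]", line)
    if m:
        # Cores reported as "off" carry no percentage and are skipped.
        loads = [int(c.split("%")[0]) for c in m.group(1).split(",") if "%" in c]
        out["cpu_load_pct"] = sum(loads) / len(loads) if loads else None
    m = re.search(r"GR3D_FREQ (\d+)%", line)
    if m:
        out["gpu_util_pct"] = int(m.group(1))
    m = re.search(r"GPU@([\d.]+)C", line)
    if m:
        out["gpu_temp_c"] = float(m.group(1))
    m = re.search(r"POM_5V_IN (\d+)/(\d+)", line)
    if m:
        out["power_in_mw"] = int(m.group(1))
    return out


sample_metrics = parse_tegrastats(SAMPLE)
print(sample_metrics)
```

Logging one parsed record per inference frame would give the per-frame traces that the stability claims rest on; field layouts differ between Jetson modules and software releases, so the regular expressions would need checking against the actual output.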

Core claim

Across both tasks and both models the inference engines keep GPU occupancy stable, temperature rise under control, and power consumption within safe limits, while memory usage settles into a consistent release pattern after the initial warm-up phase. Object detection tends to show somewhat more variability in memory and thermal behavior, yet both tasks point to the same conclusion: the TensorRT pipelines hold up well even when the input data is heavily degraded.

What carries the argument

Decoupled LLM and LDM fault synthesis framework that generates degraded inputs from JetBot data to test TensorRT YOLO inference pipelines on NVIDIA Jetson Nano hardware.

If this is right

  • Hardware utilization remains predictable enough to support reliable long-running operation of edge AI in autonomous vehicles.
  • Memory release patterns after warm-up enable improved resource scheduling for sustained edge inference workloads.
  • Thermal and power stability lowers the risk of overheating or excessive energy use in battery-powered platforms.
  • Robustness across lane following and object detection indicates the optimizations transfer to multiple autonomous driving subtasks.
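To make the "after warm-up" qualifier concrete, here is a minimal sketch of how a metric trace might be summarised once the warm-up samples are discarded. The warm-up length, the baseline reading, and the definition of a delta as "reading minus baseline" are all assumptions for illustration; the summary above does not state how the paper computes its ∆ metrics.

```python
from statistics import mean, pstdev


def steady_state_summary(trace, warmup, baseline):
    """Summarise a metric trace after dropping the first `warmup` samples.

    Deltas are taken against `baseline` (for example an idle reading).
    This delta definition is an assumption, not the paper's stated formula.
    """
    if not 0 <= warmup < len(trace):
        raise ValueError("warmup must leave at least one steady-state sample")
    deltas = [x - baseline for x in trace[warmup:]]
    return {
        "mean_delta": mean(deltas),
        "std_delta": pstdev(deltas),
        "peak_delta": max(deltas),
    }


# Hypothetical RAM readings in MB: a warm-up ramp followed by a plateau.
ram_mb = [1500, 1900, 2300, 2400, 2410, 2400, 2390, 2400]
summary = steady_state_summary(ram_mb, warmup=3, baseline=1500)
print(summary)
```

A small steady-state standard deviation relative to the mean is one way to express "consistent" numerically, which is the kind of figure a scheduler could budget against.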

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hardware stability may appear in other model families if they receive equivalent TensorRT optimization on comparable edge hardware.
  • Direct comparison of the synthesized faults against physical sensor artifacts from vehicle cameras would provide a stronger test of ecological validity.
  • The characterization could inform adaptive runtime systems that adjust inference parameters based on observed hardware stress under varying input quality.

Load-bearing premise

The faults synthesized by the LLM and LDM framework based on JetBot data accurately represent the kinds of input degradation that occur in real autonomous driving environments.

What would settle it

Applying real captured noisy driving images to the same YOLO models on Jetson Nano and checking whether GPU occupancy fluctuates or temperatures exceed the controlled ranges reported under synthesized faults.
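One way to phrase that check numerically is a two-sample Kolmogorov-Smirnov statistic between the GPU-utilization samples recorded under synthesized faults and those recorded under real noisy images. The sketch below is a plain-Python version of that statistic; the choice of test and all sample values are assumptions for illustration, not something the paper reports.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every point of the pooled sample.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical GPU utilisation samples (%), not measurements from the paper.
synthetic = [91, 92, 93, 92, 91, 93]
real_same = [92, 91, 93, 93, 92, 91]
real_shifted = [70, 72, 95, 60, 99, 65]

d_same = ks_statistic(synthetic, real_same)
d_shifted = ks_statistic(synthetic, real_shifted)
print(d_same, d_shifted)
```

A statistic near 0 would indicate that the two conditions produce similar utilization distributions, while a value near 1 would indicate that the hardware behaves differently under real degradation.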

Figures

Figures reproduced from arXiv: 2604.09631 by Achim Rettberg, Faezeh Pasandideh, Mehdi Azarafza.

Figure 1. Two-phase pipeline: offline synthetic fault generation via LLM-LDM; online hardware characterization under fault injection on Jetson Nano.
    … tight thresholds producing up to 30% false positives in fault-free runs, while memory contention consistently emerged as the dominant stressor driving timeout misclassifications across both platforms. However, their work focuses exclusively on timeout-based failure detect…
Figure 3. Normal and synthetically generated faulty image sam…
Figure 5. (b) Power Consumption (∆Power) – YOLO2026v2 (lane following)
Figure 7. (d) Power Consumption (∆Power) – YOLO2026v2 (object detection).
    … operational band, confirming that the TensorRT engine maintains a stable power profile even under extended, high-volume load. The absolute mean total power consumption of 5.19 W reflects efficient hardware utilization throughout the run. For the YOLO2026v2 scenarios, the data distributions follow a tight curve in both lane following and objec…
Figure 10. (c) Memory Allocation (∆RAM) – YOLOv11s (object detection)
Figure 14. (c) Thermal Flux (∆Temp) – YOLOv11s (object detection). (∆Temp) – YOLO2026v2 (lane following)
Figure 16. (a) Core Utilization (GPU%) – YOLOv11s (lane following)
Figure 18. (c) Core Utilization (GPU%) – YOLOv11s (object detection)
Figure 20. (a) Detection Stability (Retention%) – YOLOv11s (object detection)
Figure 22. (a) Detection Stability (Retention%) – YOLOv11s (lane following)
Figure 29. (b) Tail Latency – YOLO2026v2 (object detection)
Figure 31. (b) Tail Latency – YOLO2026v2 (lane following).
    Tables I–IV summarise the mean values of all hardware resource and inference performance metrics across the three TensorRT-optimized models evaluated on the NVIDIA Jetson Nano under fault-injected load, reported separately for the object detection and lane-following tasks. For the object detection task, the comparison across all three models reveals several…
Figure 33. (b) Execution Consistency (∆Jitter) – YOLO2026v2 (object detection)
Figure 35. (d) Execution Consistency (∆Jitter) – YOLO2026v2 (lane following).
Abstract

As deep learning models are deployed on resource-constrained edge platforms in autonomous driving systems, reliable knowledge of hardware behavior under resource degradation becomes an essential requirement. Therefore, we introduce a systematic characterization of CPU load, GPU utilization, RAM consumption, power draw, throughput, and thermal behaviour of TensorRT-optimized YOLOv10s, YOLOv11s and YOLO2026n pipelines running on NVIDIA Jetson Nano under a large-scale fault injection campaign targeting both lane-following and object detection tasks. Faults are synthesized using a decoupled framework that leverages large language models (LLMs) and latent diffusion models (LDMs), based on original data from our JetBot platform data collection. Results show that across both tasks and both models the inference engines keep GPU occupancy stable, temperature rise under control, and power consumption within safe limits, while memory usage settles into a consistent release pattern after the initial warm-up phase. Object detection tends to show somewhat more variability in memory and thermal behavior, yet both tasks point to the same conclusion: the TensorRT pipelines hold up well even when the input data is heavily degraded. These findings offer a hardware-level view of model reliability that sits alongside, rather than against, the broader body of work focused on inference performance at the edge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports an experimental characterization of hardware metrics (CPU load, GPU utilization, RAM, power, throughput, temperature) for TensorRT-optimized YOLOv10s, YOLOv11s, and YOLO2026n models running on NVIDIA Jetson Nano. It uses a decoupled LLM/LDM framework to synthesize faults from JetBot-collected data and evaluates stability under these faults for both lane-following and object detection tasks. The central claim is that GPU occupancy remains stable, temperature rise is controlled, power stays within safe limits, and memory follows a consistent release pattern after warm-up, even with heavily degraded inputs.

Significance. If the results hold, the work supplies concrete hardware-level data on edge inference robustness under input degradation, which is useful for autonomous driving deployments on resource-constrained platforms. The direct measurement approach and focus on TensorRT pipelines add practical value alongside accuracy-focused studies. However, the significance is constrained by the absence of validation that the synthetic faults match real-world degradation statistics.

major comments (2)
  1. [§4.2 (Fault Synthesis Framework) and Results section] The headline stability conclusion (GPU occupancy, temperature, power, and memory behavior) is presented as holding under 'heavily degraded inputs,' yet the manuscript provides no quantitative validation that the LLM/LDM-generated faults reproduce the spatial-frequency content, severity distribution, or temporal correlations of real autonomous-driving degradations (e.g., sensor noise, occlusion, or lighting faults). Without distributional similarity metrics, perceptual distances, or side-by-side comparisons against corpora such as KITTI, nuScenes, or BDD100K, the observed hardware stability under the synthetic regime does not entail stability under authentic conditions.
  2. [Abstract and §5 (Experimental Results)] The abstract and results sections state clear directional outcomes but report no trial counts, statistical tests, error bars, or exact fault severity parameters. This makes it impossible to assess whether the stability claims are supported by rigorous evidence or could be explained by insufficient fault intensity.
minor comments (2)
  1. [Abstract and §3] Notation for the three YOLO variants is inconsistent between the abstract (YOLO2026n) and later sections (the figure captions use YOLO2026v2); standardize model naming.
  2. [Figures 4–7] Figures showing hardware counter traces lack axis labels for time or frame count and do not indicate the number of runs averaged.
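The request for trial counts and error bars in major comment 2 can be illustrated with a short sketch that reports a mean together with an approximate 95% confidence half-width across repeated runs. The values are hypothetical, and the normal approximation (z = 1.96) is an assumption made to keep the example dependency-free; with only a handful of runs a Student-t critical value would be the more careful choice.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    m = mean(values)
    # Standard error of the mean, scaled by the chosen critical value.
    half = z * stdev(values) / sqrt(len(values))
    return m, half


# Hypothetical per-run mean power draw in watts (not data from the paper).
power_w = [5.1, 5.3, 5.2, 5.0, 5.4]
m, half = mean_with_ci(power_w)
print(f"{m:.2f} W +/- {half:.2f} W over {len(power_w)} runs")
```

Reporting the run count next to the interval, as in the final line, addresses both the missing trial counts and the missing error bars in a single statement.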

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments, which help improve the clarity and rigor of our work. We address each major comment below and indicate the revisions we will make.

point-by-point responses
  1. Referee: [§4.2 (Fault Synthesis Framework) and Results section] The headline stability conclusion (GPU occupancy, temperature, power, and memory behavior) is presented as holding under 'heavily degraded inputs,' yet the manuscript provides no quantitative validation that the LLM/LDM-generated faults reproduce the spatial-frequency content, severity distribution, or temporal correlations of real autonomous-driving degradations (e.g., sensor noise, occlusion, or lighting faults). Without distributional similarity metrics, perceptual distances, or side-by-side comparisons against corpora such as KITTI, nuScenes, or BDD100K, the observed hardware stability under the synthetic regime does not entail stability under authentic conditions.

    Authors: We agree that the manuscript does not include quantitative comparisons or similarity metrics between the synthetic faults and real-world degradation statistics from datasets like KITTI or nuScenes. Our fault synthesis framework is designed to generate heavily degraded inputs using LLMs and LDMs based on JetBot data to evaluate hardware stability under extreme conditions. The results demonstrate that the TensorRT pipelines maintain stable GPU occupancy, controlled temperature, and safe power levels even with these degraded inputs. However, we recognize that without explicit validation of distributional similarity, the findings are specific to the synthetic fault model. In the revised manuscript, we will add a dedicated paragraph in the discussion section clarifying the scope of the synthetic faults as a stress-testing mechanism rather than a direct emulation of real-world statistics, and we will suggest future work involving such comparisons. This addresses the concern partially. revision: partial

  2. Referee: [Abstract and §5 (Experimental Results)] The abstract and results sections state clear directional outcomes but report no trial counts, statistical tests, error bars, or exact fault severity parameters. This makes it impossible to assess whether the stability claims are supported by rigorous evidence or could be explained by insufficient fault intensity.

    Authors: We acknowledge the need for more precise reporting of experimental parameters. The current manuscript describes a 'large-scale fault injection campaign' but does not specify the exact number of trials, fault severity levels, or include statistical details such as error bars. In the revised version, we will update the abstract and results section to include the number of fault injections performed per model and task, the specific parameters used in the LLM/LDM synthesis for degradation severity, and add error bars or variance measures to the reported hardware metrics where appropriate. If statistical tests were applied to confirm stability (e.g., low variance across runs), we will report them. This will provide the necessary rigor to support the claims. revision: yes

standing simulated objections not resolved
  • Performing a full quantitative validation of the synthetic faults against real-world datasets would require new experiments and data access not available in the current study.
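Short of a full validation against external datasets, a per-image severity figure could at least document how strong the injected faults are. The sketch below computes peak signal-to-noise ratio (PSNR) between a clean frame and its faulty counterpart, both given as flat lists of 8-bit pixel values. PSNR is a swapped-in, generic measure chosen for illustration; the paper summary does not indicate which severity metric, if any, the authors use, and the pixel values are hypothetical.

```python
from math import inf, log10


def psnr(clean, faulty, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not clean or len(clean) != len(faulty):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((c - f) ** 2 for c, f in zip(clean, faulty)) / len(clean)
    # Identical images have zero error, which corresponds to infinite PSNR.
    return inf if mse == 0 else 10 * log10(peak ** 2 / mse)


# Hypothetical 4-pixel grayscale frames (not data from the paper).
clean_px = [100, 100, 100, 100]
mild_fault = [110, 90, 110, 90]
inverted = [255 - p for p in [0, 255, 0, 255]]

mild_db = psnr(clean_px, mild_fault)
worst_db = psnr([0, 255, 0, 255], inverted)
print(mild_db, worst_db)
```

Lower values mean heavier degradation, so a histogram of PSNR over the injected frames would give readers a rough severity distribution to set beside the hardware traces.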

Circularity Check

0 steps flagged

No circularity: results are direct hardware measurements with no fitted derivations or self-referential definitions

full rationale

The paper reports empirical measurements of CPU load, GPU utilization, RAM, power, throughput, and temperature on Jetson Nano for TensorRT-optimized YOLO models under LLM/LDM-synthesized faults. No equations, parameter fits, or derivations appear in the provided text or abstract; outcomes are not reduced to quantities defined by the authors' own parameters or prior self-citations. The central claims rest on observed hardware counters rather than any tautological construction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the LLM/LDM fault generator produces inputs whose degradation statistics match real-world sensor faults; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Faults generated by the decoupled LLM/LDM framework based on JetBot data are representative of realistic input degradation in autonomous driving.
    This premise is required for the claim that the observed hardware stability generalizes beyond the synthetic faults.

pith-pipeline@v0.9.0 · 5768 in / 1313 out tokens · 53325 ms · 2026-05-21T10:39:02.088821+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 2 internal anchors

  [1] F. Pasandideh, M. Azarafza, A. Ehteshami Bejnordi, S. Henkler, and A. Rettberg, “Decoupled generative fault injection for autonomous robots via LLM-LDM,” in Proceedings of the AI Technology Conference (AITC), 2026, poster presentation.

  [2] M. Pourreza and P. Narasimhan, “When timeouts fail: Revisiting fault detection under resource stress in edge computing,” in Proceedings of the 18th IEEE/ACM International Conference on Utility and Cloud Computing, ser. UCC ’25. New York, NY, USA: Association for Computing Machinery, 2026. [Online]. Available: https://doi.org/10.1145/3773274.3774280

  [3] S. Kim, C. Kim, and S. Kim, “Improving performance of real-time object detection in edge device through concurrent multi-frame processing,” IEEE Access, vol. 13, pp. 1522–1533, 2025.

  [4] H. M. Aljami, N. A. Alrowais, A. M. AlAwajy, S. O. Alhrgan, R. A. Aldwaani, M. S. Alsawadi, N. U. Saqib, S. S. Alam, and R. Alsubaie, “Benchmarking yolov8 variants for object detection efficiency on jetson orin nx for edge computing applications,” Computers, vol. 15, no. 2,

  [5] [Online]. Available: https://www.mdpi.com/2073-431X/15/2/74

  [6] M. Raza, M. Kazmi, H. M. Kidwai, H. R. Khan, S. A. Qazi, K. Arshad, and K. Assaleh, “An edge-deployed real-time adaptive traffic light control system using yolo-based vehicle detection and pce-aware density estimation,” IEEE Access, vol. 13, pp. 153 586–153 613, 2025.

  [7] NVIDIA Developer, “NVIDIA Jetson Nano,” https://developer.nvidia.com/embedded/jetson-nano, accessed: 2 Sep. 2025.

  [8] Logitech, “Logitech C270 HD webcam, 720p video with noise reducing mic,” https://www.logitech.com/en-us/shop/p/c270-hd-webcam, accessed: 2 Sep. 2025.

  [9] NVIDIA, “NVIDIA T4 tensor core GPUs for accelerating AI inference,” https://www.nvidia.com/en-us/data-center/tesla-t4/, accessed: 2 Sep. 2025.

  [10] EdjeElectronics, “Train and deploy YOLO models,” https://github.com/EdjeElectronics/Train-and-Deploy-YOLO-Models, 2023, accessed: 2 Sep. 2025.

  [11] J. Ansel et al., “PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation,” in Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024, pp. 929–947.

  [12] A. Sharma, V. Kumar, and L. Longchamps, “Comparative performance of YOLOv8, YOLOv9, YOLOv10, YOLOv11 and Faster R-CNN models for detection of multiple weed species,” Smart Agricultural Technology, vol. 9, p. 100648, Nov 2024.
    For the lane-following task, the models demonstrate stable hardware resource utilization, with YOLO2026v2 recording a mean power d…

  [13] ONNX Runtime Developers, “ONNX runtime,” https://onnxruntime.ai/, 2021, accessed: 2 Sep. 2025.

  [14] E. Jeong, J. Kim, and S. Ha, “TensorRT-based framework and optimization methodology for deep learning inference on Jetson boards,” ACM Transactions on Embedded Computing Systems, vol. 21, no. 5, pp. 1–26, Jan 2022.

  [15] OpenAI, “GPT-OSS-120B & GPT-OSS-20B model card,” arXiv preprint arXiv:2508.10925, Aug. 2025. [Online]. Available: https://arxiv.org/abs/2508.10925

  [16] Ollama Contributors, “Ollama,” GitHub repository, 2023. [Online]. Available: https://github.com/ollama/ollama

  [17] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, Jun. 2022, pp. 10 684–10 695. [Online]. Available: https://arxiv.org/abs/2112.10752

  [18] M. Azarafza and F. Pasandideh, “Visionfault-350k: A large-scale fault injection dataset for robotic vision systems,” Feb. 2026. [Online]. Available: https://doi.org/10.5281/zenodo.18695332