pith.

arxiv: 2409.16808 · v2 · submitted 2024-09-25 · 💻 cs.CV · cs.AR · cs.DC · cs.LG · cs.SE

A Comprehensive Evaluation of Deep Learning Object Detection Models on Heterogeneous Edge Devices

Pith reviewed 2026-05-23 20:38 UTC · model grok-4.3

classification: 💻 cs.CV · cs.AR · cs.DC · cs.LG · cs.SE
keywords: object detection · edge computing · benchmarking · YOLOv8 · energy consumption · latency · accuracy · scene complexity

The pith

SSD MobileNet V1 delivers the lowest latency and energy use on edge devices but the lowest accuracy, while YOLOv8 Medium achieves the highest accuracy at greater cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks YOLOv8, EfficientDet Lite, and SSD models on Raspberry Pi and Jetson edge devices, measuring energy, latency, and accuracy. It also examines how accuracy changes with the number of objects in the input images. The results show distinct trade-offs: SSD MobileNet V1 is the most efficient but least accurate model, and YOLOv8 Medium is the most accurate but the most costly. Orin Nano stands out for overall balance, and the models' accuracies are closer together on simpler images.

Core claim

SSD MobileNet V1 achieves the lowest latency and energy consumption but the lowest accuracy, whereas YOLOv8 Medium achieves the highest accuracy at higher computational cost. Orin Nano offers the most favorable overall balance across most model families. Models achieve more similar accuracy on simpler images, while the accuracy gap widens as scene complexity increases.
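"Most favorable overall balance" is a multi-objective judgment. One common way to make such a judgment explicit is a Pareto filter over model-device results: keep every combination that no other combination beats on latency, energy, and accuracy at the same time. The summary does not say the paper uses Pareto analysis; this is a minimal sketch with hypothetical field names and toy values, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One model-device measurement (field names are hypothetical)."""
    name: str          # e.g. "model@device"
    latency_ms: float  # lower is better
    energy_j: float    # energy per request, lower is better
    accuracy: float    # e.g. mAP, higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.energy_j <= b.energy_j
                and a.accuracy >= b.accuracy)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.energy_j < b.energy_j
                       or a.accuracy > b.accuracy)
    return no_worse and strictly_better


def pareto_front(results: list[Result]) -> list[Result]:
    """Return the results that no other result dominates, in input order."""
    return [r for r in results
            if not any(dominates(other, r) for other in results if other is not r)]


# Toy values only, chosen so that one entry is dominated.
fast = Result("fast", latency_ms=10.0, energy_j=0.1, accuracy=0.20)
accurate = Result("accurate", latency_ms=80.0, energy_j=1.0, accuracy=0.45)
dominated = Result("dominated", latency_ms=90.0, energy_j=1.2, accuracy=0.40)
print([r.name for r in pareto_front([fast, accurate, dominated])])  # ['fast', 'accurate']
```

Choosing a single point from the front still requires a weighting or a hard constraint, such as a latency budget.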

What carries the argument

Systematic benchmarking of object detection models across heterogeneous edge devices with additional analysis of accuracy versus object count in images.
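The summary does not describe how inference time was measured. A minimal sketch of a per-request timing loop, assuming each request is wrapped in a Python callable (`infer` is a hypothetical stand-in for a detector call): a few untimed warm-up calls keep one-off start-up costs out of the reported latency.

```python
import statistics
import time


def measure_latency_ms(infer, inputs, warmup=5):
    """Time infer(x) once per input, after `warmup` untimed calls.

    `infer` is a hypothetical callable wrapping one detection request.
    Returns a list of per-request latencies in milliseconds.
    """
    for x in inputs[:warmup]:
        infer(x)  # untimed: excludes lazy initialisation and cold caches
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Median and 95th percentile of the measured latencies."""
    # 19 cut points at 5% steps; "inclusive" keeps them within the observed range.
    cuts = statistics.quantiles(latencies, n=20, method="inclusive")
    return {"median_ms": statistics.median(latencies), "p95_ms": cuts[18]}


# Usage with a dummy workload standing in for a real model.
lat = measure_latency_ms(lambda n: sum(range(n)), [10_000] * 30, warmup=3)
print(summarize(lat))
```

Reporting a median and a tail percentile, rather than a single run, makes latency comparisons across devices less sensitive to occasional stalls.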

Load-bearing premise

That counting the number of objects serves as a reliable proxy for scene complexity and that the tested models and devices represent the general space of edge deployments.
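A cheap first check on this premise is to correlate object count with other per-image properties that can be computed from the annotations, for example mean object area as a fraction of the image (a hypothetical choice of confounder). The sketch below uses Spearman's rank correlation, implemented with the standard library only. A value near ±1 would mean object count and object size move together, so their effects could not be separated by binning on count alone.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("a constant sequence has no rank correlation")
    return cov / (vx * vy) ** 0.5


# Toy per-image data: more objects per image, smaller objects on average.
object_counts = [1, 2, 3, 5, 8]
mean_area_fraction = [0.30, 0.20, 0.15, 0.08, 0.05]
print(spearman(object_counts, mean_area_fraction))  # -1.0
```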

What would settle it

Observing no systematic widening of accuracy gaps with increasing object counts across a larger set of images or finding substantially different trade-offs on untested edge hardware would challenge the conclusions.
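The falsification test above can be phrased as a small procedure: group images by object count, evaluate each model's mAP on each group, and check whether the spread between the best and worst model grows from group to group. A sketch under assumptions: the bin edges are illustrative, the per-bin mAP values are assumed to come from an existing evaluation run on each image subset (not shown), and "widening" is a plain monotonicity check with no significance testing.

```python
import bisect


def assign_bins(object_counts, lower_edges):
    """Group image indices by object-count bin.

    lower_edges must be ascending; [1, 4, 8] gives bins [1, 4), [4, 8), [8, inf).
    Images with fewer objects than the first edge are left out.
    """
    bins = {edge: [] for edge in lower_edges}
    for idx, count in enumerate(object_counts):
        pos = bisect.bisect_right(lower_edges, count) - 1
        if pos >= 0:
            bins[lower_edges[pos]].append(idx)
    return bins


def accuracy_gaps(map_by_model):
    """{model: {bin_edge: mAP}} -> {bin_edge: best mAP minus worst mAP}."""
    edges = sorted(next(iter(map_by_model.values())))
    return {
        edge: max(m[edge] for m in map_by_model.values())
        - min(m[edge] for m in map_by_model.values())
        for edge in edges
    }


def gap_widens(gaps):
    """True if the gap never shrinks from one bin to the next."""
    ordered = [gaps[edge] for edge in sorted(gaps)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


print(assign_bins([1, 2, 4, 9, 0], [1, 4, 8]))  # {1: [0, 1], 4: [2], 8: [3]}
```

With more images, a rank correlation between bin index and gap, or a bootstrap over images, would be a sturdier test than strict monotonicity.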

Figures

Figures reproduced from arXiv: 2409.16808 by Adel N. Toosi, Daghash K. Alqahtani, Maria A. Rodriguez, Muhammad Aamir Cheema.

Figure 1: Experimental Software and Hardware Setup.
Figure 2: Base Energy; Energy Consumption per request excluding the base energy.
Figure 3: Inference Time per request for different edge devices.
Figure 4: Accuracy (mAP) for different edge devices.
Figure 5: Energy consumption per request (excluding base energy) versus inference …
Figure 6: Energy consumption per request (excluding base energy) versus accuracy …
Figure 7: Inference time versus accuracy for various object detection models …
Figure 8: Energy consumption per request (excluding base energy) versus inference …
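Several figures report energy per request excluding base energy. The exact measurement setup is not given in this summary; a minimal sketch, assuming a power meter samples device power (in watts) with timestamps during a batch of requests and that idle ("base") power is measured separately. The power above the baseline is integrated with the trapezoidal rule and divided by the number of requests, i.e. E_req = (1/N) ∫ (P(t) − P_base) dt.

```python
def energy_above_base_j(timestamps_s, power_w, base_power_w):
    """Trapezoidal integral of (P(t) - P_base) over the sampled window, in joules."""
    if len(timestamps_s) != len(power_w) or len(power_w) < 2:
        raise ValueError("need at least two matched (time, power) samples")
    total = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        mean_power = (power_w[i] + power_w[i - 1]) / 2
        total += (mean_power - base_power_w) * dt
    return total


def energy_per_request_j(timestamps_s, power_w, base_power_w, n_requests):
    """Energy above the idle baseline, averaged over the requests served in the window."""
    return energy_above_base_j(timestamps_s, power_w, base_power_w) / n_requests


# Toy trace: 5 W for 2 s against a 3 W idle baseline, 4 requests served.
print(energy_per_request_j([0.0, 1.0, 2.0], [5.0, 5.0, 5.0], 3.0, 4))  # 1.0
```

Subtracting the baseline isolates the cost attributable to inference, which is what makes devices with different idle draw comparable.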
Original abstract

Modern applications such as autonomous vehicles, intelligent surveillance, and smart city systems increasingly require object detection on resource-constrained edge devices. Yet, there is still limited understanding of how different object detection models behave across heterogeneous edge devices and under varying scene complexity. In this paper, we benchmark YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet) on Raspberry Pi 3, 4, 5 with/without Coral TPU accelerators, Raspberry Pi 5 with AI HAT+, Jetson Nano, and Jetson Orin Nano. We evaluate energy consumption, inference time, and accuracy, and further examine how accuracy changes with the number of objects in the input image. The results reveal clear trade-offs among accuracy, latency, and energy efficiency across model-device combinations. SSD MobileNet V1 achieves the lowest latency and energy consumption but the lowest accuracy, whereas YOLOv8 Medium achieves the highest accuracy at higher computational cost. TPU-based Raspberry Pi devices improve the efficiency of SSD and EfficientDet Lite while reducing YOLOv8 accuracy. Orin Nano offers the most favorable overall balance across most model families. The object-count-based analysis further shows that models achieve more similar accuracy on simpler images, while the accuracy gap widens as scene complexity increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper benchmarks YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet) models on Raspberry Pi 3/4/5 (with and without Coral TPU), Raspberry Pi 5 with AI HAT+, Jetson Nano, and Jetson Orin Nano. It evaluates energy consumption, inference time, and accuracy, and further examines how accuracy changes with the number of objects in the input images. Key findings: SSD MobileNet V1 has the lowest latency and energy but the lowest accuracy, YOLOv8 Medium has the highest accuracy at higher cost, Orin Nano offers the best overall balance, and accuracy gaps widen as object count (a proxy for scene complexity) increases.

Significance. If the results hold, this provides practical empirical guidance on model-device trade-offs for edge object detection in applications like surveillance and autonomous systems. The broad coverage of multiple model families and heterogeneous devices is a strength for practitioners seeking deployment insights.

major comments (2)
  1. [object-count-based analysis] The object-count-based analysis: the claim that 'the accuracy gap widens as scene complexity increases' rests on binning by raw object count without validation or controls for confounders such as average object area, occlusion rate, background clutter, or lighting. This assumption is load-bearing for the complexity-related conclusion and risks confounding if object count correlates with these factors in the dataset.
  2. [evaluation methodology] Evaluation methodology: the manuscript supplies no details on the dataset(s) used for accuracy, statistical methods, error bars, or controls for input resolution and preprocessing. These omissions leave the accuracy and trade-off claims only partially supported.
minor comments (1)
  1. Clarify the exact model versions, training details, and any hardware-specific optimizations in the methods to improve reproducibility.
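On the missing error bars: if per-request latency or energy readings are repeated, a percentile bootstrap can attach an interval to each mean without distributional assumptions. This is an illustrative choice, not a method the paper reports; the resample count, seed, and toy readings below are arbitrary.

```python
import random
import statistics


def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(samples)
    # Mean of each resample drawn with replacement, sorted for percentile lookup.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


latencies_ms = [41.0, 39.5, 40.2, 43.1, 40.8, 39.9, 42.4, 40.1]  # toy readings
print(bootstrap_ci(latencies_ms))
```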

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [object-count-based analysis] The object-count-based analysis: the claim that 'the accuracy gap widens as scene complexity increases' rests on binning by raw object count without validation or controls for confounders such as average object area, occlusion rate, background clutter, or lighting. This assumption is load-bearing for the complexity-related conclusion and risks confounding if object count correlates with these factors in the dataset.

    Authors: We acknowledge that binning by object count alone does not control for other factors and that this is a limitation of the current analysis. Object count was chosen as a readily available proxy from the annotations. In revision we will add an explicit discussion of potential confounders, qualify the conclusions, and include any feasible correlation analysis between object count and other factors using existing annotations. This will be a partial revision. revision: partial

  2. Referee: [evaluation methodology] Evaluation methodology: the manuscript supplies no details on the dataset(s) used for accuracy, statistical methods, error bars, or controls for input resolution and preprocessing. These omissions leave the accuracy and trade-off claims only partially supported.

    Authors: We agree these details were omitted and will add them in revision. A new subsection will describe the dataset(s), input resolutions, preprocessing pipelines, and any statistical methods used. We will also clarify that measurements were single-run due to device constraints and note this as a limitation. These changes will fully address the comment. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking with no derivations or self-referential claims

full rationale

The paper is a standard empirical evaluation study that measures latency, energy consumption, and accuracy of several object detection models across listed edge devices, then reports observed trends including accuracy versus object count in input images. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear in the text. All claims rest on direct experimental outputs from standard benchmarks (e.g., COCO-style evaluation) rather than any reduction to prior self-citations or constructed definitions. This matches the default case of a self-contained empirical paper with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an empirical benchmarking study, and no free parameters are fitted or introduced. It relies only on standard domain assumptions of computer vision evaluation, such as the validity of mAP or similar accuracy metrics on held-out images.

axioms (1)
  • domain assumption Standard assumptions in machine learning model evaluation hold, such as the validity of accuracy metrics on the test images used.
    The paper relies on conventional evaluation practices for object detection without stating or proving them.
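The accuracy axis rests on mAP, which in turn rests on matching predicted boxes to ground-truth boxes by intersection over union (IoU). COCO-style mAP averages average precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The simplified sketch below shows only the matching step for a single class at a single threshold, assuming boxes are given as (x1, y1, x2, y2) corners; it omits the precision-recall curve and COCO's crowd-region handling.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds, gts, thr=0.5):
    """Greedy matching: highest-scoring prediction first, each ground truth used once.

    preds: list of (score, box); gts: list of boxes; all of the same class.
    Unmatched predictions (including duplicates) count as false positives.
    """
    used = set()
    tp = 0
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in used:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            used.add(best)
            tp += 1
    return tp


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```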

pith-pipeline@v0.9.0 · 5804 in / 1393 out tokens · 35636 ms · 2026-05-23T20:38:47.907449+00:00 · methodology

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] Balasubramaniam, A., Pasricha, S.: Object detection in autonomous vehicles: Status and open challenges. arXiv preprint arXiv:2201.07706 (2022)

[2] Baller, S.P., Jindal, A., Chadha, M., Gerndt, M.: DeepEdgeBench: Benchmarking deep neural networks on edge devices. In: 2021 IEEE International Conference on Cloud Engineering (IC2E). pp. 20–30. IEEE (2021)

[3] Bulut, A., Ozdemir, F., Bostanci, Y.S., Soyturk, M.: Performance evaluation of recent object detection models for traffic safety applications on edge. In: Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision. pp. 1–6 (2023)

[4] Cantero, D., Esnaola-Gonzalez, I., Miguel-Alonso, J., Jauregi, E.: Benchmarking object detection deep learning models in embedded devices. Sensors 22(11), 4205 (2022)

[5] Chen, C.W., Ruan, S.J., Lin, C.H., Hung, C.C.: Performance evaluation of edge computing-based deep learning object detection. In: Proceedings of the 2018 VII International Conference on Network, Communication and Computing. pp. 40–43 (2018)

[6] Coral: USB accelerator datasheet. Tech. rep., Google LLC, https://coral.ai/docs/accelerator/datasheet/ (2019)

[7] Coral: Object detection (May 2024), https://coral.ai/models/object-detection/

[8] Raspberry Pi Foundation: About us (May 2024), https://www.raspberrypi.org/about/

[9] Galliera, R., Suri, N.: Object detection at the edge: Off-the-shelf deep learning capable devices and accelerators. Procedia Computer Science 205, 239–248 (2022)

[10] Kamath, V., Renuka, A.: Performance analysis of the pretrained EfficientDet for real-time object detection on Raspberry Pi. In: 2021 International Conference on Circuits, Controls and Communications (CCUBE). pp. 1–6. IEEE (2021)

[11] Kang, P., Somtham, A.: An evaluation of modern accelerator-based edge devices for object detection applications. Mathematics 10(22), 4299 (2022)

[12] Lema, D.G., Usamentiaga, R., García, D.F.: Quantitative comparison and performance evaluation of deep learning-based object detection models on edge computing devices. Integration 95, 102127 (2024)

[13] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)

[14] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. pp. 21–37. Springer (2016)

[15] Magalhães, S.C., dos Santos, F.N., Machado, P., Moreira, A.P., Dias, J.: Benchmarking edge computing devices for grape bunches and trunks detection using accelerated object detection single shot multibox deep learning models. Engineering Applications of Artificial Intelligence 117, 105604 (2023)

[16] Nvidia: NVIDIA Jetson Orin (May 2024), https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/

[17] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016)

[18] Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10781–10790 (2020)

[19] Ultralytics: Home (2024), https://docs.ultralytics.com/

[20] Voxel51: FiftyOne (May 2024), https://voxel51.com/fiftyone/

[21] Zagitov, A., Chebotareva, E., Toschev, A., Magid, E.: Comparative analysis of neural network models performance on low-power devices for a real-time object detection task. Computer 48(2) (2024)