A Comprehensive Evaluation of Deep Learning Object Detection Models on Heterogeneous Edge Devices
Pith reviewed 2026-05-23 20:38 UTC · model grok-4.3
The pith
SSD MobileNet V1 delivers lowest latency and energy on edge devices but lowest accuracy, while YOLOv8 Medium achieves highest accuracy at greater cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SSD MobileNet V1 achieves the lowest latency and energy consumption but the lowest accuracy, whereas YOLOv8 Medium achieves the highest accuracy at higher computational cost. Orin Nano offers the most favorable overall balance across most model families. Models achieve more similar accuracy on simpler images, while the accuracy gap widens as scene complexity increases.
What carries the argument
Systematic benchmarking of object detection models across heterogeneous edge devices with additional analysis of accuracy versus object count in images.
Load-bearing premise
That object count is a reliable proxy for scene complexity, and that the tested models and devices represent the general space of edge deployments.
What would settle it
Observing no systematic widening of accuracy gaps with increasing object counts across a larger set of images or finding substantially different trade-offs on untested edge hardware would challenge the conclusions.
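A minimal sketch of the check described above, assuming per-image scores are available for each model. The function name `gap_by_object_count`, the bin edges, and the per-image score (for example a per-image F1) are illustrative assumptions, not the paper's method, and the demo records are synthetic.

```python
from collections import defaultdict
from statistics import mean


def gap_by_object_count(records, bin_edges):
    """records: iterable of (object_count, {model_name: per_image_score}).
    bin_edges: ascending inclusive upper bounds; counts above the last edge
    fall into a final overflow bin.
    Returns {bin_index: best_model_mean - worst_model_mean}."""
    per_bin = defaultdict(lambda: defaultdict(list))
    for count, scores in records:
        # Index of the first edge the count does not exceed, else overflow bin.
        idx = next((i for i, edge in enumerate(bin_edges) if count <= edge),
                   len(bin_edges))
        for model, score in scores.items():
            per_bin[idx][model].append(score)
    gaps = {}
    for idx, by_model in sorted(per_bin.items()):
        means = [mean(values) for values in by_model.values()]
        gaps[idx] = max(means) - min(means)
    return gaps


# Synthetic records for illustration only; NOT results from the paper.
records = [
    (1, {"A": 0.9, "B": 0.8}),
    (2, {"A": 0.8, "B": 0.8}),
    (10, {"A": 0.7, "B": 0.3}),
]
gaps = gap_by_object_count(records, bin_edges=[3])
print(gaps)
```

If the gap does not grow from low-count bins to high-count bins on a larger image set, the widening claim would be in doubt.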
Original abstract
Modern applications such as autonomous vehicles, intelligent surveillance, and smart city systems increasingly require object detection on resource-constrained edge devices. Yet, there is still limited understanding of how different object detection models behave across heterogeneous edge devices and under varying scene complexity. In this paper, we benchmark YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet) on Raspberry Pi 3, 4, 5 with/without Coral TPU accelerators, Raspberry Pi 5 with AI HAT+, Jetson Nano, and Jetson Orin Nano. We evaluate energy consumption, inference time, and accuracy, and further examine how accuracy changes with the number of objects in the input image. The results reveal clear trade-offs among accuracy, latency, and energy efficiency across model-device combinations. SSD MobileNet V1 achieves the lowest latency and energy consumption but the lowest accuracy, whereas YOLOv8 Medium achieves the highest accuracy at higher computational cost. TPU-based Raspberry Pi devices improve the efficiency of SSD and EfficientDet Lite while reducing YOLOv8 accuracy. Orin Nano offers the most favorable overall balance across most model families. The object-count-based analysis further shows that models achieve more similar accuracy on simpler images, while the accuracy gap widens as scene complexity increases.
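The trade-offs named in the abstract can be made concrete by filtering model-device combinations down to a Pareto front. The following is a rough sketch under stated assumptions: the `Result` type, the metric units, and the demo numbers are hypothetical placeholders and are not taken from the paper.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str          # model-device combination label (hypothetical)
    latency_ms: float  # lower is better
    energy_mj: float   # lower is better
    accuracy: float    # higher is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.energy_mj <= b.energy_mj
                and a.accuracy >= b.accuracy)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.energy_mj < b.energy_mj
                       or a.accuracy > b.accuracy)
    return no_worse and strictly_better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep only the combinations that no other combination dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Placeholder numbers for illustration only; NOT measurements from the paper.
demo = [
    Result("fast-small", 10.0, 5.0, 0.20),
    Result("slow-accurate", 80.0, 60.0, 0.50),
    Result("dominated", 90.0, 70.0, 0.40),
]
front_names = [r.name for r in pareto_front(demo)]
print(front_names)
```

With these placeholder values the script prints `['fast-small', 'slow-accurate']`: the two extremes both survive because neither beats the other on all three metrics, which mirrors the shape of the SSD MobileNet V1 versus YOLOv8 Medium contrast.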
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper benchmarks YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet) models on Raspberry Pi 3/4/5 (with/without Coral TPU and AI HAT+), Jetson Nano, and Jetson Orin Nano. It evaluates energy consumption, inference time, and accuracy, and further examines accuracy changes with the number of objects in input images. Key findings include SSD MobileNet V1 having the lowest latency/energy but lowest accuracy, YOLOv8 Medium having the highest accuracy at higher cost, Orin Nano offering the best overall balance, and accuracy gaps widening as object count (proxy for scene complexity) increases.
Significance. If the results hold, this provides practical empirical guidance on model-device trade-offs for edge object detection in applications like surveillance and autonomous systems. The broad coverage of multiple model families and heterogeneous devices is a strength for practitioners seeking deployment insights.
major comments (2)
- [object-count-based analysis] The object-count-based analysis: the claim that 'the accuracy gap widens as scene complexity increases' rests on binning by raw object count without validation or controls for confounders such as average object area, occlusion rate, background clutter, or lighting. This assumption is load-bearing for the complexity-related conclusion and risks confounding if object count correlates with these factors in the dataset.
- [evaluation methodology] Evaluation methodology: the manuscript supplies no details on the dataset(s) used for accuracy, statistical methods, error bars, or controls for input resolution and preprocessing. These omissions leave the accuracy and trade-off claims only partially supported.
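One way to probe the confounding concern in the first major comment is to test whether object count correlates with mean object area. The sketch below assumes COCO-style annotation dicts with `image_id` and `area` keys; the helper names are hypothetical, the demo annotations are synthetic, and nothing here reflects the authors' actual analysis.

```python
from collections import defaultdict
from statistics import mean


def per_image_stats(annotations):
    """annotations: list of dicts with 'image_id' and 'area' (COCO-style).
    Returns parallel lists of object counts and mean object areas per image.
    Images with no annotations do not appear in the output."""
    areas = defaultdict(list)
    for ann in annotations:
        areas[ann["image_id"]].append(ann["area"])
    counts = [len(v) for v in areas.values()]
    mean_areas = [mean(v) for v in areas.values()]
    return counts, mean_areas


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Synthetic annotations for illustration only; NOT data from the paper.
annotations = (
    [{"image_id": 1, "area": 100.0}]
    + [{"image_id": 2, "area": 50.0}] * 2
    + [{"image_id": 3, "area": 10.0}] * 3
)
counts, mean_areas = per_image_stats(annotations)
rho = spearman(counts, mean_areas)
print(rho)
```

A strongly negative value on the real dataset would suggest that crowded images also contain smaller objects, so the accuracy gap could partly reflect object size rather than count.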
minor comments (1)
- Clarify the exact model versions, training details, and any hardware-specific optimizations in the methods to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions.
Point-by-point responses
-
Referee: [object-count-based analysis] The object-count-based analysis: the claim that 'the accuracy gap widens as scene complexity increases' rests on binning by raw object count without validation or controls for confounders such as average object area, occlusion rate, background clutter, or lighting. This assumption is load-bearing for the complexity-related conclusion and risks confounding if object count correlates with these factors in the dataset.
Authors: We acknowledge that binning by object count alone does not control for other factors and that this is a limitation of the current analysis. Object count was chosen as a readily available proxy from the annotations. In revision we will add an explicit discussion of potential confounders, qualify the conclusions, and include any feasible correlation analysis between object count and other factors using existing annotations. Revision: partial.
-
Referee: [evaluation methodology] Evaluation methodology: the manuscript supplies no details on the dataset(s) used for accuracy, statistical methods, error bars, or controls for input resolution and preprocessing. These omissions leave the accuracy and trade-off claims only partially supported.
Authors: We agree these details were omitted and will add them in revision. A new subsection will describe the dataset(s), input resolutions, preprocessing pipelines, and any statistical methods used. We will also clarify that measurements were single-run due to device constraints and note this as a limitation. These changes will fully address the comment. Revision: yes.
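The referee's request for error bars could be met with repeated timing runs rather than a single run. The following is a minimal sketch, assuming the inference call can be wrapped in a zero-argument callable; `measure_latency`, `dummy_infer`, and the warm-up and run counts are illustrative choices, not the authors' protocol.

```python
import statistics
import time


def measure_latency(infer, warmup=3, runs=20):
    """Time a zero-argument callable repeatedly.
    Returns (mean_ms, stdev_ms, samples_ms)."""
    for _ in range(warmup):
        infer()  # discard warm-up runs (caches, lazy initialisation)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples), samples


def dummy_infer():
    # Stand-in workload; replace with a real model call on the target device.
    sum(i * i for i in range(10_000))


mean_ms, stdev_ms, samples_ms = measure_latency(dummy_infer)
print(f"{mean_ms:.3f} ms +/- {stdev_ms:.3f} ms over {len(samples_ms)} runs")
```

Reporting the standard deviation alongside the mean shows whether differences between model-device combinations exceed run-to-run noise.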
Circularity Check
No circularity: purely empirical benchmarking with no derivations or self-referential claims
full rationale
The paper is a standard empirical evaluation study that measures latency, energy consumption, and accuracy of several object detection models across listed edge devices, then reports observed trends including accuracy versus object count in input images. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear in the text. All claims rest on direct experimental outputs from standard benchmarks (e.g., COCO-style evaluation) rather than any reduction to prior self-citations or constructed definitions. This matches the default case of a self-contained empirical paper with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions in machine learning model evaluation hold, such as the validity of accuracy metrics on the test images used.