Beyond Benchmarks: Continuous Edge Inference for Fine-Grained Roadside Perception

Aditya Mishra; Haroon Lone

arXiv: 2606.17241 · v1 · pith:Y7JAN2BBnew · submitted 2026-06-15 · 💻 cs.CV · cs.RO · cs.SY · eess.SY

Pith reviewed 2026-06-27 03:55 UTC · model grok-4.3

classification 💻 cs.CV · cs.RO · cs.SY · eess.SY

keywords: edge inference · continuous deployment · temporal stabilization · roadside perception · benchmark evaluation · streaming video · fine-grained classification · embedded AI

The pith

Benchmark-centric evaluation overstates deployed edge inference performance: accuracy degrades by 20-30% (relative) in streaming deployment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard benchmark evaluations of AI models on edge hardware fail to reflect performance during continuous streaming video operation. It documents consistent relative accuracy drops of 20-30% when moving from static images to real-world roadside perception streams, which the authors attribute to factors such as temporal instability and sustained-load effects. The authors introduce Edge-TSR, a system that combines detection, tracking, classification, and a lightweight track-aware temporal stabilization step to reduce these inconsistencies. This approach recovers up to 10.16% classification accuracy over per-frame baselines while sustaining real-time throughput on embedded hardware during extended deployments. A reader would care because many edge AI deployments rely on benchmark numbers that do not predict actual field behavior.

Core claim

Our central finding is that benchmark-centric evaluation systematically overstates deployed edge inference performance. Across three state-of-the-art baselines, we observe consistent 20-30% relative degradation when transitioning from static-image evaluation to real-world streaming deployment. Edge-TSR addresses this gap through temporal inference stabilization, recovering up to 10.16% classification accuracy over per-frame inference baselines while maintaining sustained real-time performance under continuous operation.
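To make the "relative degradation" wording concrete, here is a minimal sketch of the arithmetic. The accuracy values are hypothetical and chosen only for illustration; they are not taken from the paper. A drop from 90% to 67.5% accuracy is 22.5 points in absolute terms but 25% in relative terms, which would fall inside the reported 20-30% band.

```python
def relative_degradation(static_acc: float, streaming_acc: float) -> float:
    """Fraction of the static-benchmark accuracy lost in streaming deployment."""
    if static_acc <= 0:
        raise ValueError("static_acc must be positive")
    return (static_acc - streaming_acc) / static_acc


# Hypothetical accuracies for illustration only (not results from the paper).
static_acc = 0.90      # accuracy under static-image evaluation
streaming_acc = 0.675  # accuracy on the deployed video stream

absolute_drop = static_acc - streaming_acc
relative_drop = relative_degradation(static_acc, streaming_acc)

print(f"absolute drop: {absolute_drop:.3f}")
print(f"relative drop: {relative_drop:.1%}")
```

The distinction matters when reading the headline numbers: the page does not say whether the 10.16% recovery is absolute or relative, so the two figures should not be subtracted from one another without checking the paper.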

What carries the argument

Edge-TSR, a continuous edge inference system that integrates detection, tracking, fine-grained classification, and a lightweight track-aware temporal stabilization mechanism.
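The page does not spell out how the stabilization step works beyond noting that it is track-aware and described by a state machine (Figure 3). As a rough illustration of the general idea only, the sketch below smooths per-frame class labels with a majority vote over a short sliding window per track. The class name `TrackLabelStabilizer`, the window size, and the tie-breaking rule are all assumptions made for this example and should not be read as the authors' mechanism.

```python
from collections import Counter, defaultdict, deque


class TrackLabelStabilizer:
    """Majority vote over the last `window` per-frame labels of each track.

    Generic stand-in for illustration; not the state machine from the paper.
    """

    def __init__(self, window: int = 5):
        self.window = window
        # One bounded label history per track ID.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id: int, label: str) -> str:
        """Record this frame's label and return the stabilized label."""
        votes = self.history[track_id]
        votes.append(label)
        counts = Counter(votes)
        top = max(counts.values())
        # Break ties in favour of the most recent label among the leaders.
        for recent in reversed(votes):
            if counts[recent] == top:
                return recent

    def drop(self, track_id: int) -> None:
        """Forget a track once the tracker reports it as lost."""
        self.history.pop(track_id, None)


# A single misclassified frame ("yield") inside an otherwise stable track.
stabilizer = TrackLabelStabilizer(window=5)
per_frame = ["stop", "stop", "yield", "stop", "stop"]
stable = [stabilizer.update(7, label) for label in per_frame]
print(stable)
```

The per-track state is a handful of labels and the vote is a few dictionary operations per frame, which is at least consistent with the claim that this kind of stabilization can add negligible overhead on top of detection and classification.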

If this is right

Three state-of-the-art baselines each show 20-30% relative accuracy loss when evaluated on streaming video instead of static images.
The track-aware stabilization recovers up to 10.16% classification accuracy while adding negligible computational overhead.
A 55-minute, 26 km vehicular deployment sustains 16.18 FPS within safe thermal limits on a single embedded device.
Joint characterization of inference quality, latency, throughput, and thermal behavior is required for long-duration operation.
Release of an annotated streaming video dataset enables reproducible deployment-centric evaluation.
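For a sense of scale, the deployment figures above can be turned into a few derived quantities. This is a back-of-the-envelope sketch that assumes 16.18 FPS is the mean rate over the whole run; the function name and dictionary keys are made up for this example.

```python
def deployment_summary(duration_min: float, route_km: float, mean_fps: float) -> dict:
    """Derive frame count, average speed, and frames per metre for one drive."""
    duration_s = duration_min * 60.0
    frames = duration_s * mean_fps
    avg_speed_kmh = route_km / (duration_min / 60.0)
    frames_per_metre = frames / (route_km * 1000.0)
    return {
        "frames": frames,
        "avg_speed_kmh": avg_speed_kmh,
        "frames_per_metre": frames_per_metre,
    }


# Figures quoted on this page: 55 minutes, 26 km, 16.18 FPS.
summary = deployment_summary(duration_min=55, route_km=26, mean_fps=16.18)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under that assumption the run covers roughly 53,000 processed frames at an average speed near 28 km/h, or about two frames per metre travelled.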

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same benchmark-to-deployment gap may appear in other continuous perception tasks that rely on fine-grained classification from video streams.
Temporal stabilization techniques could be tested on different embedded platforms to check whether the accuracy recovery holds under varied thermal and workload profiles.
Deployment-centric datasets that include long-duration streams may become necessary complements to existing static-image benchmarks.
If track consistency is the key enabler, similar mechanisms might apply to any edge system that already maintains object tracks across frames.

Load-bearing premise

The observed performance degradation stems primarily from temporal instability, thermal throttling, and workload variability, and the track-aware stabilization generalizes across real-world conditions without new errors or latency costs.

What would settle it

A new streaming evaluation dataset in which the temporal stabilization mechanism produces no accuracy gain over per-frame baselines, or in which degradation remains above 30% despite its use.

Figures

Figures reproduced from arXiv: 2606.17241 by Aditya Mishra, Haroon Lone.

**Figure 2.** Overview of the Edge-TSR continuous edge inference system. Detection is performed every …

**Figure 3.** State machine of the track-aware temporal stabi…

**Figure 4.** Real-world deployment setup. The NVIDIA Jetson …

**Figure 5.** Effect of sparse frame sampling. Moderate sampling …

**Figure 8.** Qualitative results from real-world deployment of …

**Figure 9.** Qualitative results of Edge-TSR on the dense urban traffic scenario. The system correctly detects and classifies multiple …

**Figure 10.** Qualitative results of Edge-TSR on the rain scenario. The system correctly detects and classifies signs under active …

**Figure 11.** Qualitative results of Edge-TSR on the rural/dark scenario, where illumination is provided exclusively by the vehicle’s …

**Figure 12.** Qualitative results of Edge-TSR on the out-of-distribution (OOD) scenario, containing sign categories rendered in …

Original abstract

Continuous AI inference on resource-constrained edge hardware introduces deployment effects that are largely invisible to conventional benchmark evaluation, including temporal instability in streaming video, thermal throttling under sustained load, and workload-dependent performance variability. We present Edge-TSR, a deployment-oriented continuous edge inference system for sustained roadside perception on the NVIDIA Jetson Orin Nano. Edge-TSR integrates detection, tracking, fine-grained classification, and a lightweight track-aware temporal stabilization mechanism that improves streaming inference consistency with negligible computational overhead. Our central finding is that benchmark-centric evaluation systematically overstates deployed edge inference performance. Across three state-of-the-art baselines, we observe consistent 20-30% relative degradation when transitioning from static-image evaluation to real-world streaming deployment. Edge-TSR addresses this gap through temporal inference stabilization, recovering up to 10.16% classification accuracy over per-frame inference baselines while maintaining sustained real-time performance under continuous operation. We evaluate the complete system under diverse real-world deployment conditions, jointly characterizing inference quality, latency, throughput, and thermal behavior during long-duration operation. A 55-minute vehicular deployment over a 26 km route demonstrates sustained operation at 16.18 FPS within safe thermal limits on a single embedded device without cloud offload. Our findings show that deployment-aware evaluation and temporal inference stabilization are necessary components of continuously operating edge AI systems intended for real-world sensing deployments. We release a sample annotated streaming video evaluation dataset and full system implementation to support reproducible deployment-centric evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper documents a real benchmark-to-deployment drop in continuous edge roadside perception and adds a low-overhead stabilization fix, but the attribution to temporal and thermal factors over domain shift is not isolated.

The paper's main observation is that static-image benchmarks overstate performance for continuous edge inference on roadside video, with 20-30% relative degradation across three baselines when moving to real streaming on a Jetson Orin Nano. Edge-TSR adds a track-aware temporal stabilization step that recovers up to 10.16% accuracy over per-frame baselines while keeping sustained real-time operation.

What the work does well is the end-to-end deployment evaluation. They run the full system for 55 minutes over 26 km, report joint metrics on accuracy, latency, throughput, and thermal behavior, and release both a sample annotated streaming video dataset and the implementation. That level of hardware-grounded measurement and reproducibility is still uncommon and directly useful for people building these systems.

The soft spot is the central claim about what drives the degradation. The abstract does not describe an ablation that reapplies the static benchmark protocol to frames extracted from the deployment videos, so the 20-30% figure cannot be cleanly attributed to temporal instability, thermal throttling, or workload variability rather than input distribution differences such as motion blur, lighting, or object statistics. The stress-test note is on target here.

This is aimed at engineers and applied researchers working on reliable edge perception for traffic or infrastructure monitoring. It shows clear thinking about deployment constraints and provides the artifacts needed to check the numbers, so it deserves a serious referee even if the methods need tighter controls on the degradation analysis.

Referee Report

2 major / 2 minor

Summary. The paper presents Edge-TSR, a continuous edge inference system for fine-grained roadside perception on NVIDIA Jetson Orin Nano that combines detection, tracking, classification, and a lightweight track-aware temporal stabilization module. Its central claim is that conventional static-image benchmarks systematically overstate real-world streaming performance, with three SOTA baselines exhibiting 20-30% relative degradation under continuous deployment; Edge-TSR recovers up to 10.16% classification accuracy while sustaining 16.18 FPS over a 55-minute, 26 km vehicular route within thermal limits, and the authors release a sample streaming dataset and implementation.

Significance. If the measured degradation is shown to stem primarily from temporal/thermal effects rather than unisolated domain shift, and if the stabilization generalizes without new error modes, the work would usefully demonstrate the necessity of deployment-aware evaluation and temporal mechanisms for sustained edge perception systems; the released dataset and code would further support reproducible studies in this area.

major comments (2)

[Abstract / Evaluation] Abstract and §4 (presumably the evaluation section): the central claim that benchmark-centric evaluation overstates performance by 20-30% due to temporal instability, thermal throttling, and workload variability lacks isolation from input distribution shift. No ablation is described that re-evaluates the three baselines on frames extracted from the deployment video stream using the identical static-image protocol; without this, the degradation cannot be confidently attributed to the listed deployment effects rather than motion blur, lighting, scale, or roadside-specific distributions.
[Abstract / Results] Abstract and §5 (deployment results): the reported 10.16% recovery and sustained 16.18 FPS over 55 minutes require explicit quantification of whether the track-aware stabilization introduces new errors or latency trade-offs under the same real-world conditions, and whether the mechanism remains effective when input statistics differ from the static benchmarks.
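The ablation requested in the first major comment amounts to adding one intermediate measurement: the static protocol applied to frames extracted from the deployment stream. A minimal sketch of how the three numbers could then be read is below. All accuracy values are hypothetical, the function and key names are invented for this example, and the additive split is a simplifying assumption, since distribution shift and deployment effects may interact.

```python
def decompose_gap(static_acc: float, extracted_acc: float, streaming_acc: float) -> dict:
    """Split the benchmark-to-deployment gap into two additive parts.

    static_acc:    accuracy on the original static-image benchmark
    extracted_acc: accuracy on deployment frames, evaluated with the static protocol
    streaming_acc: accuracy during live streaming inference on the device
    """
    total = static_acc - streaming_acc
    distribution_shift = static_acc - extracted_acc
    deployment_effects = extracted_acc - streaming_acc
    return {
        "total": total,
        "distribution_shift": distribution_shift,
        "deployment_effects": deployment_effects,
        "shift_share": distribution_shift / total if total else 0.0,
    }


# Hypothetical accuracies for illustration only (not results from the paper).
gap = decompose_gap(static_acc=0.90, extracted_acc=0.78, streaming_acc=0.675)
for name, value in gap.items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers, a little over half of the gap would be explained by the input distribution alone, which is the scenario the referee wants ruled in or out before the degradation is credited to temporal, thermal, or workload effects.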

minor comments (2)

[Methods] Clarify the exact definition of 'per-frame inference baselines' versus the track-aware mechanism, including any hyper-parameters in the stabilization logic.
[Evaluation] Provide more detail on the thermal and latency measurement methodology (e.g., sampling rate, sensor placement) to allow replication of the sustained-operation claims.
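As an illustration of the kind of reporting the second minor comment asks for, the sketch below summarizes a list of per-frame latencies with mean, median, and tail values. The sample latencies are hypothetical, the nearest-rank percentile is one of several common definitions, and deriving FPS as the inverse of mean latency assumes frames are processed strictly one after another with no pipelining.

```python
import math


def summarize_latency(latencies_ms: list) -> dict:
    """Mean, nearest-rank p50/p95, and implied FPS for per-frame latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p: int) -> float:
        # Nearest-rank definition: the smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    mean_ms = sum(ordered) / n
    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(50),
        "p95_ms": percentile(95),
        "implied_fps": 1000.0 / mean_ms,  # assumes sequential, non-pipelined processing
    }


# Hypothetical per-frame latencies in milliseconds (not measurements from the paper).
samples = [58, 60, 61, 62, 62, 63, 64, 65, 70, 95]
latency = summarize_latency(samples)
for name, value in latency.items():
    print(f"{name}: {value:.2f}")
```

Reporting the tail alongside the mean is what makes sustained-operation claims checkable: a single slow frame, such as the 95 ms sample here, barely moves the median but shows up clearly at p95.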

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. Below we respond point-by-point to the major comments. We agree that additional ablations and quantifications will strengthen the attribution of performance degradation and the characterization of the stabilization module; we will incorporate these in the revision.

Referee: [Abstract / Evaluation] Abstract and §4 (presumably the evaluation section): the central claim that benchmark-centric evaluation overstates performance by 20-30% due to temporal instability, thermal throttling, and workload variability lacks isolation from input distribution shift. No ablation is described that re-evaluates the three baselines on frames extracted from the deployment video stream using the identical static-image protocol; without this, the degradation cannot be confidently attributed to the listed deployment effects rather than motion blur, lighting, scale, or roadside-specific distributions.

Authors: The referee correctly identifies that the current manuscript does not contain an ablation that reapplies the static-image protocol to frames sampled from the continuous deployment stream. Such an experiment would help separate temporal/thermal/workload effects from distribution shift. We will add this controlled ablation (re-evaluating all three baselines on deployment-stream frames under the original static protocol) to the revised evaluation section and update the abstract accordingly.
Revision: yes
Referee: [Abstract / Results] Abstract and §5 (deployment results): the reported 10.16% recovery and sustained 16.18 FPS over 55 minutes require explicit quantification of whether the track-aware stabilization introduces new errors or latency trade-offs under the same real-world conditions, and whether the mechanism remains effective when input statistics differ from the static benchmarks.

Authors: We agree that the manuscript would benefit from more explicit reporting of any new error modes or latency overhead introduced by the track-aware stabilization under the continuous deployment conditions, as well as a brief discussion of its behavior when input statistics deviate from the static benchmarks. We will add per-track error analysis, latency breakdowns, and a short generalization note to §5 (and the abstract) in the revision.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical deployment measurements with no derivation chain or self-referential reductions.

The paper reports direct hardware measurements of inference degradation (20-30% relative) when moving from static benchmarks to streaming deployment, plus accuracy recovery from the proposed track-aware stabilization mechanism. No equations, fitted parameters presented as predictions, ansatzes, or uniqueness theorems appear in the provided text. Central claims rest on observed FPS, thermal behavior, and accuracy deltas under sustained operation rather than any reduction to inputs by construction. Self-citations, if present, are not load-bearing for the attribution of effects.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the work is an empirical systems evaluation of deployment effects and a mitigation technique.

[1] [1]

Nir Aharon, Roy Orfaig, and Ben-Zion Bobrovsky. 2022. BoT-SORT: Robust associations multi-pedestrian tracking.arXiv preprint arXiv:2206.14651(2022)

work page arXiv 2022

[2] [2]

KS Anoop, KK Chandrathejas, SP Anusha, et al . 2025. Real-Time Two-Stage Detection of Indian Traffic Signboards Using YOLO11 on Jetson Orin Nano. In 2025 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI). IEEE, 1–6

2025

[3] [3]

Riadh Ayachi, Mouna Afif, Yahia Said, and Abdessalem Ben Abdelali. 2022. An edge implementation of a traffic sign detection system for advanced driver assis- tance systems.International Journal of Intelligent Robotics and Applications6, 2 (2022), 207–215

2022

[4] [4]

Théo Benoit-Cattin, Delia Velasco-Montero, and Jorge Fernández-Berni. 2020. Impact of thermal throttling on long-term visual inference in a CPU-based edge device.Electronics9, 12 (2020), 2106

2020

[5] [5]

Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. 2016. Simple online and realtime tracking. In2016 IEEE international conference on image processing (ICIP). Ieee, 3464–3468

2016

[6] [6]

Simone Bianco, Remi Cadene, Luigi Celona, and Paolo Napoletano. 2018. Bench- mark analysis of representative deep neural network architectures.IEEE access6 (2018), 64270–64277

2018

[7] [7]

Mario Bijelic, Tobias Gruber, Fahim Mannan, Florian Kraus, Werner Ritter, Klaus Dietmayer, and Felix Heide. 2020. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 11682–11692

2020

[8] [8]

Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. 2016. An analy- sis of deep neural network models for practical applications.arXiv preprint arXiv:1605.07678(2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[9] [9]

Junzhou Chen, Heqiang Huang, Ronghui Zhang, Nengchao Lyu, Yanyong Guo, Hong-Ning Dai, and Hong Yan. 2025. Yolo-ts: Real-time traffic sign detection with enhanced accuracy using optimized receptive fields and anchor-free fusion. IEEE Transactions on Intelligent Transportation Systems(2025)

2025

[10] [10]

2023.Computer Vision Annotation Tool (CV AT)

CVAT.ai Corporation. 2023.Computer Vision Annotation Tool (CV AT). doi:10.5281/ zenodo.4009388

2023

[11] [11]

Yunhao Du, Zhicheng Zhao, Yang Song, Yanyun Zhao, Fei Su, Tao Gong, and Hongying Meng. 2023. Strongsort: Make deepsort great again.IEEE Transactions on Multimedia25 (2023), 8725–8737

2023

[12] [12]

Christian Ertler, Jerneja Mislej, Tobias Ollmann, Lorenzo Porzi, Gerhard Neuhold, and Yubin Kuang. 2020. The mapillary traffic sign dataset for detection and classification on a global scale. InEuropean conference on computer vision. Springer, 68–84

2020

[13] [13]

Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2010. The pascal visual object classes (voc) challenge.Inter- national journal of computer vision88, 2 (2010), 303–338

2010

[14] [14]

Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2018. Robust physical- world attacks on deep learning visual classification. InProceedings of the IEEE conference on computer vision and pattern recognition. 1625–1634

2018

[15] [15]

Biyi Fang, Xiao Zeng, and Mi Zhang. 2018. Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision. InProceedings of the 24th Annual International Conference on Mobile Computing and Networking. 115–127

2018

[16] [16]

Mingfei Han, Yali Wang, Xiaojun Chang, and Yu Qiao. 2020. Mining inter-video proposal relations for video object detection. InEuropean conference on computer vision. Springer, 431–446

2020

[17] [17]

Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149(2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015

[18] [18]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition. 770–778

2016

[19] [19]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531(2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015

[20] [20]

Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingx- ing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. 2019. Searching for mobilenetv3. InProceedings of the IEEE/CVF international conference on computer vision. 1314–1324

2019

[21] [21]

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2704–2713

2018

[22] [22]

Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha Sen, and Ion Stoica. 2018. Chameleon: scalable adaptation of video analytics. InProceedings of the 2018 conference of the ACM special interest group on data communication. 253–266

2018

[23] [23]

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. 2023. Ultralytics YOLOv8. https: //github.com/ultralytics/ultralytics

2023

[24] [24]

Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, and Shuicheng Yan. 2017. Perceptual generative adversarial networks for small object detection. InProceedings of the IEEE conference on computer vision and pattern recognition. 1222–1230

[25]

Ziyu Lin, Yunfan Wu, Yuhang Ma, Junzhou Chen, Ronghui Zhang, Jiaming Wu, Guodong Yin, and Liang Lin. 2025. YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection via Prior-Guided Enhancement and Multi-Branch Feature Interaction. arXiv preprint arXiv:2503.13883(2025)

[26]

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. 2020. Energy-based out-of-distribution detection. Advances in neural information processing systems 33 (2020), 21464–21475

[27]

Aditya Mishra, Akshay Agarwal, and Haroon Lone. 2026. Learning Under Low Illumination: A Dataset and Algorithm for Traffic Sign Recognition. arXiv:2511.17183 [cs.CV] https://arxiv.org/abs/2511.17183

[28]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. 779–788

[29]

Pierre Sermanet and Yann LeCun. 2011. Traffic sign recognition with multi-scale convolutional networks. In The 2011 international joint conference on neural networks. IEEE, 2809–2813

[30]

Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural networks 32 (2012), 323–332

[31]

Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning. PMLR, 6105–6114

[32]

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision. 4489–4497

[34]

Rishabh Uikey, Haroon R Lone, and Akshay Agarwal. 2024. Indian traffic sign detection and classification through a unified framework. IEEE Transactions on Intelligent Transportation Systems 25, 10 (2024), 14866–14875

[35]

Ishparsh Uprety, Griffen Agnello, and Xinghui Zhao. 2026. Optimizing deep learning based autonomous driving applications on edge devices. Journal on Autonomous Transportation Systems 3, 3 (2026), 1–18

[36]

Daniel Wagner, Gerhard Reitmayr, Alessandro Mulloni, Tom Drummond, and Dieter Schmalstieg. 2009. Real-time detection and tracking for augmented reality on mobile phones. IEEE transactions on visualization and computer graphics 16, 3 (2009), 355–368

[37]

Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In European conference on computer vision. Springer, 20–36

[38]

Nicolai Wojke, Alex Bewley, and Dietrich Paulus. 2017. Simple online and realtime tracking with a deep association metric. In 2017 IEEE international conference on image processing (ICIP). IEEE, 3645–3649

[39]

Daliang Xu, Mengwei Xu, Qipeng Wang, Shangguang Wang, Yun Ma, Kang Huang, Gang Huang, Xin Jin, and Xuanzhe Liu. 2022. Mandheling: Mixed-precision on-device dnn training with dsp offloading. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. 214–227

[40]

Mengwei Xu, Mengze Zhu, Yunxin Liu, Felix Xiaozhu Lin, and Xuanzhe Liu. 2018. Deepcache: Principled cache for mobile deep vision. In Proceedings of the 24th annual international conference on mobile computing and networking. 129–144

[41]

Xiao Zeng, Biyi Fang, Haichen Shen, and Mi Zhang. 2020. Distream: scaling live video analytics with workload-adaptive distributed edge intelligence. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems. 409–421

[42]

Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, and Xinggang Wang. 2022. Bytetrack: Multi-object tracking by associating every detection box. In European conference on computer vision. Springer, 1–21

[43]

Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. 2017. Flow-guided feature aggregation for video object detection. In Proceedings of the IEEE international conference on computer vision. 408–417

9 Appendix

9.1 Backbone Comparison

Table 7 reports the performance of three candidate classification backbones evaluated on the dense urban traffi...
