MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes
Pith reviewed 2026-05-19 15:20 UTC · model grok-4.3
The pith
MR2-ByteTrack enables video object detection with up to 55% energy savings on microcontroller-based vision sensors by alternating resolutions and rescoring detections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MR2-ByteTrack reduces multiply-accumulate (MAC) operations by up to 53% for CNN detectors and 32% for the Transformer detector on the ImageNetVID dataset while reaching mAP scores of up to 49.0 and 48.7, respectively. When run on the GAP9 MCU, it achieves up to 55% energy savings over full-resolution-only processing and supports real-time Transformer-based video object detection for the first time on such hardware.
What carries the argument
The Multi-Resolution Rescored ByteTrack (MR2-ByteTrack) pipeline that switches between full- and low-resolution inference passes and corrects low-resolution errors via ByteTrack association combined with the Rescore algorithm's probability union aggregation of per-frame confidences.
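The rescoring idea can be illustrated with a minimal sketch. It assumes the "probability union" is the union of independent events, 1 - prod(1 - p_i), applied per class along one linked track; the source does not give the exact rule, and the function name `rescore_track` and the data layout are hypothetical.

```python
from collections import defaultdict
from math import prod


def rescore_track(detections):
    """Aggregate the per-frame (label, confidence) pairs of one linked track.

    Assumption (not confirmed by the source): confidences of the same class
    are combined with the probability union of independent events,
    1 - prod(1 - p_i). Returns the winning label and its aggregated score.
    """
    by_class = defaultdict(list)
    for label, conf in detections:
        by_class[label].append(conf)
    scores = {
        label: 1.0 - prod(1.0 - p for p in confs)
        for label, confs in by_class.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative track: two frames vote "dog", one (e.g. low-resolution) frame
# votes "cat". The union for "dog" is 1 - 0.4 * 0.5 = 0.8, which beats 0.7.
track = [("dog", 0.6), ("cat", 0.7), ("dog", 0.5)]
label, score = rescore_track(track)
```

Under this assumed rule, a single confident misclassification is outvoted once two or more moderately confident detections of the correct class are linked to the same track.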
If this is right
- Reduces computational cost measured in multiply-accumulate operations by as much as 53% for CNN models and 32% for Transformer models.
- Achieves up to 55% energy savings on the GAP9 ultra-low-power RISC-V MCU compared to full-resolution processing.
- Enables real-time Transformer-based video object detection on MCU-class embedded vision nodes for the first time.
- Preserves detection accuracy with mAP values up to 49.0 for CNN and 48.7 for Transformer on ImageNetVID.
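The arithmetic behind such savings can be sketched as follows. This is a rough model under stated assumptions, not the paper's accounting: detector MACs are taken to scale with pixel count, the schedule is one full-resolution frame followed by a fixed number of low-resolution frames, and tracking and rescoring overhead is ignored. The function name and the example values are hypothetical.

```python
def relative_macs(low_res_scale, low_per_full):
    """Average MACs per frame relative to always running at full resolution.

    low_res_scale: side-length ratio of the low-resolution input (0 < s <= 1).
    low_per_full:  number of low-resolution frames after each full-resolution one.
    Assumption: detector MACs scale with pixel count, i.e. with s ** 2.
    """
    low_cost = low_res_scale ** 2
    return (1.0 + low_per_full * low_cost) / (1 + low_per_full)


# Illustrative values only (not taken from the paper): half the side length,
# one low-resolution frame after every full-resolution frame.
saving = 1.0 - relative_macs(low_res_scale=0.5, low_per_full=1)
```

With these illustrative values the average cost is (1 + 0.25) / 2 = 0.625 of the full-resolution cost, a 37.5% saving. Attention layers scale differently with token count, so the pixel-count proportionality is only a rough guide for a Transformer detector.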
Where Pith is reading between the lines
- Similar multi-resolution strategies could be tested on other tracking or detection architectures beyond the ones evaluated here.
- The approach might reduce bandwidth needs in distributed vision systems by keeping more processing local.
- Extending the Rescore logic to longer sequences or different confidence aggregation rules could further improve robustness on very low-power hardware.
Load-bearing premise
The Rescore algorithm reliably fixes misclassifications introduced by low-resolution frames using probability union rules without lowering overall detection performance.
What would settle it
Measure mAP on ImageNetVID with the full pipeline but the Rescore step disabled, and check whether accuracy drops below the reported levels or below a full-resolution baseline.
Original abstract
Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints. However, these sensory systems typically rely on ultra-low-power microcontrollers (MCUs) with limited memory and compute, making conventional video object detection methods, which require feature storage or multi-frame buffering, unfeasible. To address this challenge, we introduce Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a Video Object Detection (VOD) method tailored for MCU-based embedded vision nodes. MR2-ByteTrack reduces computational cost by alternating between full- and low-resolution inference, while linking detections across frames via ByteTrack and correcting misclassifications through the Rescore algorithm, which applies probability union rules to aggregate detection confidence scores across frames. We apply our approach to both a CNN-based detector and a Transformer-based model, demonstrating its generality across architectures with fundamentally different spatial processing. Experiments on ImageNetVID demonstrate that MR2-ByteTrack maintains accuracy, achieving mAP scores of up to 49.0 for the CNN-based models and 48.7 for the Transformer, while reducing multiply-accumulate operations by as much as 53% for the CNNs and 32% for the Transformer. When deployed on GAP9, an ultra-low-power RISC-V multicore MCU, our method yields up to 55% energy savings compared to processing only full-resolution images, enabling the first real-time Transformer-based VOD on an MCU-class embedded vision node. Code available at https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack/tree/IEEE_Access
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MR2-ByteTrack, a video object detection method for MCU-based embedded vision nodes. It alternates full- and low-resolution inference on CNN and Transformer detectors, links detections with ByteTrack, and applies a Rescore algorithm using probability union rules to aggregate per-frame confidence scores and correct misclassifications. On ImageNetVID it reports maintained mAP of 49.0 (CNN) and 48.7 (Transformer) with MAC reductions of 53% and 32%, respectively; on GAP9 hardware it claims up to 55% energy savings versus full-resolution processing, enabling the first real-time Transformer VOD on an MCU-class node. Code is released.
Significance. If the accuracy-maintenance claim holds, the work is significant for practical on-device video intelligence under severe memory and power constraints. It demonstrates cross-architecture generality (CNN and Transformer), reports concrete hardware energy measurements on GAP9, and provides reproducible code. These elements directly address the gap between high-accuracy VOD models and ultra-low-power embedded deployment.
major comments (2)
- [Method description of Rescore algorithm] The central claim that mAP is preserved while increasing the fraction of low-resolution frames (thereby achieving the reported 53%/32% MAC and 55% energy reductions) rests on the Rescore step. The manuscript states that probability-union aggregation corrects low-resolution misclassifications, yet provides no ablation, error-tolerance bound, or quantitative analysis of how many high-confidence false positives or missed small/fast objects the union rule can absorb before mAP falls below the full-resolution baseline. This is load-bearing for the energy-savings result.
- [Experiments on ImageNetVID and GAP9] The experimental section reports mAP values and MAC counts but does not specify the exact alternating schedule (e.g., fraction of low-resolution frames per sequence), the precise definition of the probability-union rule, or comparisons against other multi-resolution or frame-skipping baselines. Without these details the optimality and robustness of the 55% energy figure cannot be fully assessed.
minor comments (2)
- [Abstract and Method] The abstract and method sections use “probability union rules” without a short inline formula or pseudocode; adding one would improve clarity for readers unfamiliar with the exact aggregation.
- [Results tables/figures] Table or figure captions should explicitly state the resolution schedule and the number of low-resolution frames used to obtain the reported MAC and energy numbers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential significance of MR2-ByteTrack for energy-constrained embedded vision. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of the Rescore algorithm and experimental details.
- Referee: [Method description of Rescore algorithm] The central claim that mAP is preserved while increasing the fraction of low-resolution frames (thereby achieving the reported 53%/32% MAC and 55% energy reductions) rests on the Rescore step. The manuscript states that probability-union aggregation corrects low-resolution misclassifications, yet provides no ablation, error-tolerance bound, or quantitative analysis of how many high-confidence false positives or missed small/fast objects the union rule can absorb before mAP falls below the full-resolution baseline. This is load-bearing for the energy-savings result.
  Authors: We agree that additional analysis is required to fully support the central claim. In the revised manuscript we will add an ablation study that varies the fraction of low-resolution frames and reports mAP both with and without the Rescore step. We will also include a quantitative error-tolerance analysis, with concrete examples of how the union rule absorbs high-confidence false positives and missed small or fast objects across linked tracks. The probability-union rule will be defined precisely (either the maximum probability across linked detections or 1 - prod_i (1 - p_i), where p_i is the confidence of the i-th linked detection). These additions will directly address the load-bearing nature of the result.
  Revision: yes
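The two candidate definitions named in this response are not interchangeable. As a worked comparison, assuming per-frame confidences p_i in [0, 1] that are treated as independent (an assumption the source does not state), the union rule never scores a track below its best single detection, because every factor (1 - p_i) is at most 1:

```latex
\[
  s_{\max} = \max_i p_i,
  \qquad
  s_{\cup} = 1 - \prod_{i=1}^{n} (1 - p_i),
\]
\[
  1 - s_{\cup} = \prod_{i=1}^{n} (1 - p_i) \;\le\; 1 - \max_i p_i
  \quad\Longrightarrow\quad
  s_{\cup} \ge s_{\max}.
\]
```

For instance, two linked detections with confidences 0.7 and 0.4 give s_max = 0.7 but s_∪ = 1 - 0.3 · 0.6 = 0.82, so the choice of rule changes which tracks cross a fixed confidence threshold.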
- Referee: [Experiments on ImageNetVID and GAP9] The experimental section reports mAP values and MAC counts but does not specify the exact alternating schedule (e.g., fraction of low-resolution frames per sequence), the precise definition of the probability-union rule, or comparisons against other multi-resolution or frame-skipping baselines. Without these details the optimality and robustness of the 55% energy figure cannot be fully assessed.
  Authors: We acknowledge that the current experimental description lacks sufficient detail. In the revision we will explicitly state the alternating schedule (e.g., a full-resolution frame every third frame, together with the resulting fraction of low-resolution frames per sequence), provide the exact mathematical formulation of the probability-union rule, and add direct comparisons against simple frame-skipping and other multi-resolution baselines. These changes will allow readers to assess the optimality and robustness of the reported 55% energy savings on GAP9.
  Revision: yes
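The kind of schedule mentioned in this response can be made concrete with a short sketch. It assumes a fixed periodic schedule with one full-resolution frame at the start of every period; the paper's actual schedule is not specified in the text above, and the function name `resolution_schedule` is hypothetical.

```python
def resolution_schedule(num_frames, period):
    """Return "full" or "low" for each frame index.

    Assumption: a fixed schedule with one full-resolution frame at the
    start of every `period` frames and low-resolution frames in between.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    return ["full" if i % period == 0 else "low" for i in range(num_frames)]


# "Full-resolution every third frame" over a nine-frame clip.
schedule = resolution_schedule(num_frames=9, period=3)
low_fraction = schedule.count("low") / len(schedule)
```

For this example, six of the nine frames run at low resolution, a fraction of 2/3. Stating this fraction next to each reported MAC and energy figure is what the referee asks for.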
Circularity Check
No circularity: empirical evaluation of resolution-alternating VOD with tracking and rescoring
full rationale
The paper introduces MR2-ByteTrack as an algorithmic combination of alternating full/low-resolution inference, ByteTrack linking, and a Rescore step that aggregates scores via probability-union rules. All performance claims (mAP 49.0/48.7, 53%/32% MAC reduction, 55% energy savings on GAP9) are presented as direct experimental outcomes on ImageNetVID, compared against full-resolution baselines. No equations, first-principles derivations, or fitted parameters are shown that reduce to the method's own inputs by construction. No self-citation chains or uniqueness theorems are invoked to justify the core approach. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Detections can be reliably linked across frames using ByteTrack.
- domain assumption: Probability union rules can aggregate confidence scores to correct errors.
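The first assumption rests on ByteTrack's association step, which matches high-confidence detections to existing tracks first and then tries the low-confidence detections against the tracks that remain unmatched. The sketch below is a deliberately simplified illustration: it uses greedy IoU matching instead of an optimal assignment, omits motion prediction and track creation or deletion, and all function names and thresholds are hypothetical rather than taken from the paper's code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, dets, min_iou):
    """Greedily pair tracks and detections by descending IoU.

    track_boxes: dict track_id -> box; dets: list of (box, score).
    Returns a dict track_id -> index into dets.
    """
    pairs = sorted(
        ((iou(tb, d[0]), tid, j)
         for tid, tb in track_boxes.items()
         for j, d in enumerate(dets)),
        key=lambda p: p[0],
        reverse=True,
    )
    matches, used = {}, set()
    for overlap, tid, j in pairs:
        if overlap < min_iou:
            break  # remaining pairs overlap even less
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)
    return matches


def associate(track_boxes, dets, high_thr=0.5, min_iou=0.3):
    """Two-stage association: high-score detections first, then low-score
    detections against the tracks that are still unmatched."""
    high = [d for d in dets if d[1] >= high_thr]
    low = [d for d in dets if d[1] < high_thr]
    first = greedy_match(track_boxes, high, min_iou)
    remaining = {t: b for t, b in track_boxes.items() if t not in first}
    second = greedy_match(remaining, low, min_iou)
    linked = {t: high[j] for t, j in first.items()}
    linked.update({t: low[j] for t, j in second.items()})
    return linked


# Two existing tracks; the second object is only detected with low confidence.
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
dets = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.2)]
linked = associate(tracks, dets)
```

In this example the low-confidence detection is still linked to track 2 in the second stage, which is the property that lets weak low-resolution detections contribute to a track's aggregated score.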
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: MR2-ByteTrack reduces computational cost by alternating between full- and low-resolution inference, while linking detections across frames via ByteTrack and correcting misclassifications through the Rescore algorithm, which applies probability union rules to aggregate detection confidence scores across frames.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
LUCA BOMPANI Ph.D. graduate in Electronic Engineering at the U...