YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models

Chidera G. Oguine; Kanyifeechukwu J. Oguine; Obiozor M. Oguine; Ozioma C. Oguine

arxiv: 2605.24831 · v2 · pith:O3FFMJ2Tnew · submitted 2026-05-24 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-30 12:16 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords YOLO26 · YOLOv8 · object detection · benchmark · NMS-free · Pascal VOC · VisDrone · real-time detection

The pith

YOLO26 achieves higher accuracy with lower compute than YOLOv8 on Pascal VOC, but the advantage shrinks on VisDrone, and GPU latency favors YOLOv8 at every scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper runs a controlled cross-scale comparison of YOLO26 and YOLOv8 on Pascal VOC and VisDrone to test whether the NMS-free design delivers practical gains outside COCO. YOLO26 incorporates one-to-one label assignment, drops Distribution Focal Loss, and uses a spectral-constrained CSP-Muon backbone, which together cut computational footprint and raise mAP_50:95 to 0.635 on Pascal VOC. On VisDrone, where most objects are very small, the accuracy gap narrows to 0.010 mAP_50:95 while YOLOv8 records lower GPU inference times at every scale. The evaluation therefore maps the conditions under which the newer architecture is preferable.

Core claim

YOLO26 with its native end-to-end one-to-one label assignment, removal of Distribution Focal Loss, and spectral-constrained CSP-Muon backbone produces a lower computational footprint and higher accuracy than YOLOv8 on Pascal VOC, reaching 0.635 mAP_50:95 for the largest variant, yet the same design yields only a 0.010 mAP_50:95 advantage on VisDrone while increasing GPU latency across all model sizes.

What carries the argument

The native end-to-end one-to-one label assignment that replaces non-maximum suppression and Distribution Focal Loss, allowing direct prediction-to-ground-truth matching without post-processing.
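The contrast between the two ideas can be sketched in a few lines. This is a minimal illustration, not the Ultralytics implementation: the function names, the `(x1, y1, x2, y2)` box layout, the greedy matcher (a stand-in for an optimal one), and the `score * IoU` matching quality are all assumptions made for the example. `greedy_nms` is the post-processing step a conventional detector runs at inference; `one_to_one_assign` is the kind of training-time matching that lets a model learn to emit a single box per object so that the suppression step can be skipped.

```python
# Illustrative sketch only: boxes are (x1, y1, x2, y2); all names are hypothetical.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr=0.5):
    """Classic post-processing: keep the best box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


def one_to_one_assign(pred_boxes, pred_scores, gt_boxes):
    """Training-time matching: each ground truth gets at most one prediction.

    Greedy stand-in for an optimal matcher; quality = score * IoU is an
    assumed matching metric, not one taken from the paper.
    """
    candidates = sorted(
        ((pred_scores[p] * iou(pred_boxes[p], g), p, gi)
         for p in range(len(pred_boxes))
         for gi, g in enumerate(gt_boxes)),
        reverse=True,
    )
    used_preds, assignment = set(), {}
    for quality, p, gi in candidates:
        if quality > 0 and p not in used_preds and gi not in assignment:
            assignment[gi] = p
            used_preds.add(p)
    return assignment  # maps ground-truth index -> prediction index
```

In the NMS path the duplicate box is removed after the fact; in the one-to-one path the duplicate never receives a positive training target in the first place, which is why no suppression pass is needed at inference.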

If this is right

Architecture selection for real-time detection must consider object density and typical object size in the target domain.
NMS-free designs do not automatically produce lower GPU inference latency even when they reduce overall compute.
The CSP-Muon backbone and one-to-one assignment deliver their clearest benefits on datasets with larger, less crowded objects.
Hardware benchmarks on both CPU and GPU are required to decide deployment suitability rather than relying on model size alone.
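A latency comparison of this kind is only as good as its timing harness. The following is a minimal sketch of one, assuming nothing about the paper's actual measurement code: `measure_latency_ms`, `infer`, and `sync` are hypothetical names, and the warm-up and run counts are arbitrary defaults. On a GPU, a blocking barrier (in PyTorch, `torch.cuda.synchronize` would be the usual choice) has to be passed as `sync`, otherwise the timer mostly captures kernel launch rather than execution.

```python
import statistics
import time


def measure_latency_ms(infer, sample, warmup=10, runs=50, sync=None):
    """Return (median, worst) per-call latency in milliseconds.

    infer: any callable wrapping a model's forward pass (hypothetical).
    sync:  optional callable that blocks until the device is idle.
    """
    for _ in range(warmup):  # discard warm-up iterations (caches, lazy init)
        infer(sample)
    if sync:
        sync()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        if sync:
            sync()  # make sure the work has finished before stopping the clock
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), max(timings)
```

Reporting the median alongside the worst case makes it easier to see whether a gap between two models is stable or driven by a few slow iterations.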

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Dense small-object regimes may need additional mechanisms beyond current NMS-free label assignment to close the performance gap.
Extending the same protocol to additional hardware platforms would clarify whether the GPU latency penalty is universal.
A follow-up study that isolates each component (backbone, loss removal, label assignment) would show which change drives the observed trade-offs.

Load-bearing premise

The two model families were trained and tuned under sufficiently similar conditions that measured differences can be attributed to the architectural changes rather than hidden differences in data or optimization.

What would settle it

Retraining both families from identical starting weights using the exact same training pipeline, augmentations, and hyperparameters, then re-measuring whether the accuracy and latency gaps remain.
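One lightweight way to make such a matched comparison auditable is to record each family's training protocol as data and diff the two records before trusting any result. The sketch below assumes a made-up `TrainProtocol` record; every field name and value is a hypothetical placeholder, not a setting reported by the paper.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class TrainProtocol:
    # All fields and values here are hypothetical placeholders.
    dataset: str
    epochs: int
    image_size: int
    batch_size: int
    optimizer: str
    augmentations: tuple
    seed: int


def protocol_mismatches(a, b, allowed=()):
    """List the fields on which two protocols differ, ignoring `allowed`."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if k not in allowed and da[k] != db[k])


base = TrainProtocol("VOC", 100, 640, 16, "SGD", ("mosaic", "hflip"), 0)
other = replace(base, epochs=300)
print(protocol_mismatches(base, other))  # ['epochs']
```

An empty list is the precondition for attributing a gap to architecture; any non-empty list names a confound that has to be explained or removed.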

Figures

Figures reproduced from arXiv: 2605.24831 by Chidera G. Oguine, Kanyifeechukwu J. Oguine, Obiozor M. Oguine, Ozioma C. Oguine.

**Figure 1.** Architectural model of YOLO26 [22]. From Section 3.2 (Overview of YOLO26): YOLO26 departs from the YOLOv8 paradigm through three architectural ruptures: NMS elimination, DFL removal, and dynamic training supervision [3].

**Figure 2.** Accuracy-computation Pareto frontier (mAP@50:95 vs. …

**Figure 3.** Accuracy-speed trade-off (mAP@50:95 vs. GPU la…

**Figure 6.** YOLOv8-n and YOLO26-n on Pascal VOC, showing …

**Figure 7.** YOLOv8-n and YOLO26-n on VisDrone, showing …

**Figure 8.** YOLOv8-x and YOLO26-x on Pascal VOC, showing …

**Figure 9.** YOLOv8-x and YOLO26-x on VisDrone, showing high …

Original abstract

This paper presents a rigorous empirical evaluation of Ultralytics YOLO26 against the YOLOv8 baseline, offering an independent real-world stress test of NMS-free architectures on non-COCO distributions. Engineered for edge deployment, YOLO26 introduces native end-to-end one-to-one label assignment, the removal of Distribution Focal Loss (DFL), and a spectral-constrained CSP-Muon backbone. We conducted a comprehensive, cross-scale comparative analysis across five model capacities, using the general object detection (Pascal VOC) and dense aerial small-object detection (VisDrone) datasets. Models are evaluated across accuracy (mAP_50 and mAP_50:95), model complexity, and hardware-specific CPU/GPU latency. Our findings revealed that while YOLO26 achieves a lower computational footprint and superior accuracy on Pascal VOC, with YOLO26-x reaching 0.635 mAP_50:95, this advantage narrows in dense aerial environments. On VisDrone, where over 75% of objects are under 2,000 pixels, both architectures struggle significantly, yielding a minimal performance gap (0.214 mAP_50:95 for YOLOv8-x vs. 0.224 mAP_50:95 for YOLO26-x). Crucially, hardware benchmarking demonstrates that YOLOv8 maintains a consistent edge in GPU inference latency across identical scales (e.g., 6.92 ms for YOLOv8-s vs. 8.38 ms for YOLO26-s), showing that NMS-free design does not inherently guarantee superiority in universal deployment. This work maps the operational boundaries of NMS-free frameworks to guide architecture selection based on dataset density, object scale, and hardware constraints.
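The headline gaps are easier to weigh as relative changes. The short calculation below uses only the four figures quoted in the abstract; the helper name `abs_and_rel_delta` is introduced here for illustration.

```python
def abs_and_rel_delta(baseline, candidate):
    """Absolute difference and percentage change of candidate vs. baseline."""
    delta = candidate - baseline
    return round(delta, 3), round(100.0 * delta / baseline, 1)


# Values quoted in the abstract.
print(abs_and_rel_delta(0.214, 0.224))  # VisDrone mAP_50:95, YOLOv8-x -> YOLO26-x
print(abs_and_rel_delta(6.92, 8.38))    # GPU latency in ms, YOLOv8-s -> YOLO26-s
```

On these numbers the VisDrone accuracy gain is 0.010 mAP_50:95, or about 4.7% relative to YOLOv8-x, while the small-variant GPU latency is 1.46 ms higher, or about 21.1% relative to YOLOv8-s. The two comparisons are for different model scales, so they indicate magnitude rather than a direct trade-off at a single scale.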

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Training conditions are not shown to be matched, so the mAP and latency gaps cannot be attributed to YOLO26's architectural changes.

The core problem is that the paper reports YOLO26-x at 0.635 mAP_50:95 on Pascal VOC versus lower numbers for YOLOv8, with the gap shrinking to 0.224 vs 0.214 on VisDrone, but supplies no evidence that the two families were trained or tuned under the same data, augmentations, or schedules. Without that, the differences could stem from the training regime rather than from the architecture.

What the work actually supplies is a side-by-side run of five scales on two public datasets, with mAP_50, mAP_50:95, parameter counts, and measured CPU/GPU latencies. It notes that both models struggle when most objects are under 2,000 pixels and that YOLOv8 keeps a GPU latency edge. Those are concrete numbers a practitioner might use when deciding between these two families for aerial or edge work.
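The small-object statistic is straightforward to reproduce for any dataset once box sizes are available. The sketch below assumes that "under 2,000 pixels" refers to box area (width times height in pixels), which the abstract does not state explicitly; `small_object_fraction` and its input format are hypothetical.

```python
def small_object_fraction(boxes_wh, area_threshold=2000):
    """Fraction of boxes whose pixel area (width * height) is below the threshold."""
    if not boxes_wh:
        return 0.0
    small = sum(1 for w, h in boxes_wh if w * h < area_threshold)
    return small / len(boxes_wh)
```

Running this over the annotation files of a target dataset gives a quick indication of whether it resembles the dense small-object regime described here or the larger-object regime of Pascal VOC.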

The weakness is exactly the one the stress-test flags: the abstract credits end-to-end assignment, DFL removal, and the spectral CSP-Muon backbone for the gains, yet the text gives no protocol table or statement that training was held constant. Minor gaps like the 0.01 mAP difference on VisDrone also lack error bars or significance tests. The rest of the paper is standard benchmark language already common in the YOLO literature.
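A rough version of the missing error bars needs only repeated runs with different seeds. The sketch below is a crude heuristic, not a formal significance test: the function names and the threshold `k` are assumptions, and the inputs would be per-seed mAP values that the paper does not report.

```python
import statistics


def summarize_runs(scores):
    """Mean and sample standard deviation over repeated training runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def gap_exceeds_noise(runs_a, runs_b, k=2.0):
    """Crude check: is the mean gap larger than k times the pooled spread?"""
    mean_a, sd_a = summarize_runs(runs_a)
    mean_b, sd_b = summarize_runs(runs_b)
    pooled = ((sd_a ** 2 + sd_b ** 2) / 2) ** 0.5
    return abs(mean_b - mean_a) > k * pooled
```

If seed-to-seed spread is comparable to 0.01 mAP, a 0.01 gap between families would not survive this check, which is the concern raised in the paragraph above.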

This is for someone who needs quick, dataset-specific numbers on these exact models and is willing to treat the comparison as provisional. It does not reorganize object detection or provide first-principles insight. I would not bring it to reading group and would not cite it.

It does not merit peer review until the training details are added and shown to be matched; the current version rests on an unverified assumption that undercuts the main claims.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to deliver a rigorous empirical benchmark of YOLO26 (featuring native end-to-end one-to-one assignment, DFL removal, and spectral-constrained CSP-Muon backbone) versus YOLOv8 across five scales on Pascal VOC and VisDrone. It reports YOLO26 advantages in computational footprint and accuracy on Pascal VOC (YOLO26-x at 0.635 mAP_50:95), a narrowed gap on VisDrone (0.224 vs. 0.214 mAP_50:95), and consistently lower GPU latency for YOLOv8 (e.g., 6.92 ms vs. 8.38 ms for the small variants), concluding that NMS-free designs do not guarantee universal superiority.

Significance. If training and evaluation conditions are verifiably matched, the work would provide useful guidance on when NMS-free architectures are preferable for general versus dense small-object detection and for CPU/GPU edge constraints, extending beyond standard COCO benchmarks.

major comments (1)

[Abstract] The central attribution of mAP and latency deltas to the listed architectural changes (end-to-end assignment, DFL removal, CSP-Muon backbone) is load-bearing yet unsupported. No training protocol, data splits, augmentation policy, hyperparameter schedule, or optimizer settings are stated for either family, so observed differences cannot be isolated from possible differences in training regime.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting this important methodological gap. We address the comment below and will revise the manuscript accordingly.

Referee: [Abstract] The central attribution of mAP and latency deltas to the listed architectural changes (end-to-end assignment, DFL removal, CSP-Muon backbone) is load-bearing yet unsupported. No training protocol, data splits, augmentation policy, hyperparameter schedule, or optimizer settings are stated for either family, so observed differences cannot be isolated from possible differences in training regime.

Authors: We agree that the absence of training and evaluation protocol details prevents readers from confidently attributing performance differences to the architectural modifications. In the revised manuscript we will insert a dedicated Experimental Setup section that specifies the exact training protocol, data splits, augmentation policy, hyperparameter schedule, and optimizer settings used for both YOLO26 and YOLOv8 families. This addition will make the comparison conditions explicit and allow the attribution claims to be properly evaluated. Revision: yes.

Circularity Check

0 steps flagged

No circularity: pure empirical benchmark with no derivations or self-referential predictions

The paper performs a direct empirical comparison of two model families on public external datasets (Pascal VOC, VisDrone) using standard metrics (mAP, latency). No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or load-bearing self-citations appear in the abstract or described content. The central claims rest on observed performance deltas under the assumption of comparable training regimes, but this is an empirical fairness issue rather than circularity. No equations, ansatzes, or uniqueness theorems are invoked that reduce to the paper's own inputs. This matches the default expectation of a non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical benchmark relying on publicly available datasets and conventional evaluation metrics; no free parameters, domain axioms, or invented entities are introduced.

Reference graph

Works this paper leans on

32 extracted references · 6 canonical work pages · 2 internal anchors

[1] Zahid Ahmed, S Varalakshmi, and Karishma Desai. Real-time object detection in computer vision for quality control in industries. In 2024 International Conference on Advances in Computing Research on Science Engineering and Technology (ACROSET), pages 1–7. IEEE, 2024.

[2] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection, 2020.

[3] Sudip Chakrabarty. Yolo26: An analysis of nms-free end to end framework for real-time object detection. arXiv preprint arXiv:2601.12882, 2026.

[4] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vision, 88(2):303–338, 2010.

[5] Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. A survey of quantization methods for efficient neural network inference. In Low-power computer vision, pages 291–326. Chapman and Hall/CRC, 2022.

[6] Jan Hosang, Rodrigo Benenson, and Bernt Schiele. Learning non-maximum suppression. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4507–4515, 2017.

[7] Kun Hu, Jinzheng Lu, Chaoquan Zheng, Qiang Xiang, and Ling Miao. Mff-yolov8: Small object detection based on multi-scale feature fusion for uav remote sensing images. IET Image Processing, 19(1):e70066, 2025.

[8] Joel Janai, Fatma Güney, Aseem Behl, and Andreas Geiger. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Foundations and Trends in Computer Graphics and Vision, 12(1-3):1–308, 2020.

[9] Glenn Jocher. Ultralytics yolov5, 2020.

[10] Glenn Jocher and Jing Qiu. Ultralytics yolo11, 2024.

[11] Glenn Jocher and Jing Qiu. Ultralytics yolo26, 2026.

[12] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics yolov8, 2023.

[13] Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cesista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks, 2024.

[14] Mengqi Lei, Siqi Li, Yihong Wu, et al. Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv preprint arXiv:2506.17733, 2025.

[15] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.

[16] Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, and Yi Liu. Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer. arXiv preprint arXiv:2407.17140, 2024.

[17] Kanyifeechukwu Jane Oguine, Ozioma Collins Oguine, and Hashim Ibrahim Bisallah. Yolo v3: Visual and real-time object detection model for smart surveillance systems (3s). In 2022 5th International Conference on Information Technology for Education and Development (ITED), pages 1–8. IEEE, 2022.

[18] Joseph Redmon and Ali Farhadi. Yolo9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7263–7271, 2017.

[19] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.

[20] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.

[21] Ranjan Sapkota, Rahul Harsha Cheppally, Ajay Sharda, and Manoj Karkee. Yolo26: key architectural enhancements and performance benchmarking for real-time object detection. arXiv preprint arXiv:2509.25164, 2025.

[22] Bhomik Sharma. YOLOv26: An Object Detector Built for Real-Time Deployment. LearnOpenCV – Learn OpenCV, PyTorch, Keras, Tensorflow with code, & tutorials, 2026.

[23] Sulav Shrestha. Quantizing YOLO v8 models. Medium.

[24] Yawei Tan, Bingxin Xu, Jiangsheng Sun, Cheng Xu, Weiguo Pan, Songyin Dai, and Hongzhe Liu. Dbyolov8: Dual-branch yolov8 network for small object detection on drone image. International Journal of Advanced Computer Science & Applications, 16(1), 2025.

[25] Juan Terven, Diana-Margarita Córdova-Esparza, and Julio-Alejandro Romero-González. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Machine learning and knowledge extraction, 5(4):1680–1716, 2023.

[26] Yunjie Tian, Qixiang Ye, and David Doermann. Yolo12: Attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524, 2025.

[27] Kang Tong and Yiquan Wu. Rethinking PASCAL-VOC and MS-COCO dataset for small object detection. Journal of Visual Communication and Image Representation, 93:103830.

[28] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. Yolov10: Real-time end-to-end object detection. Advances in neural information processing systems, 37:107984–108011, 2024.

[29] Chien-Yao Wang and Hong-Yuan Mark Liao. Yolov9: Learning what you want to learn using programmable gradient information. 2024.

[30] Jiangyong Yu, Changyong Shu, Dawei Yang, Sifan Zhou, Zichen Yu, Xing Hu, and Yan Chen. Q-petr: Quant-aware position embedding transformation for multi-view 3d object detection. arXiv e-prints, pages arXiv–2502, 2025.

[31] Wei Zhan, Chenfan Sun, Maocai Wang, Jinhui She, Yangyang Zhang, Zhiliang Zhang, and Yong Sun. An improved yolov5 real-time detection method for small objects captured by uav. Soft Computing, 26(1):361–373, 2022.

[32] Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin Ling. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11):7380–7399, 2021.
