YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review
Note: This is a preliminary version of the manuscript. The final, peer-reviewed, and substantially revised version has been published in Jurnal RESTI. Readers are encouraged to access and cite the published version: DOI: https://doi.org/10.29207/resti.v10i2.6598

In the field of deep learning-based computer vision, YOLO is revolutionary; among deep learning models, it is also one of the most rapidly evolving. Unfortunately, not every YOLO model is accompanied by a scholarly publication, and some lack a publicly accessible official architecture diagram. This naturally creates challenges, such as making it harder to understand how a model operates in practice. Furthermore, the review articles currently available do not examine the specifics of each model. The objective of this study is to present a comprehensive, in-depth architecture comparison of the four most recent YOLO models, YOLOv8 through YOLO11, enabling readers to quickly grasp not only how each model functions but also how the models differ. To analyze each YOLO version's architecture, we meticulously examined the relevant academic papers and documentation and scrutinized the source code. The analysis reveals that while each YOLO version improves the architecture and feature extraction, certain blocks remain unchanged. The lack of scholarly publications and official diagrams makes it harder to understand each model's functionality and to plan future enhancements. Future developers are encouraged to provide these resources.
Forward citations
Cited by 2 Pith papers
- OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation. OmniHuman is a new large-scale multi-scene dataset with video-, frame-, and individual-level annotations for human-centric video generation, accompanied by the OHBench benchmark that adds metrics aligned with human pe...
- Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection. Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.