YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review
Pith reviewed 2026-05-23 05:05 UTC · model grok-4.3
The pith
YOLOv8 through YOLO11 add feature extraction gains while keeping certain blocks unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By examining the source code and documentation where official papers and diagrams are missing, the analysis shows that each version from YOLOv8 to YOLO11 brings improvements in architecture and feature extraction, while certain blocks remain unchanged. The lack of publications and diagrams creates ongoing challenges for understanding model operation and guiding future enhancements.
What carries the argument
Side-by-side architecture comparison via source code inspection that isolates both the updated components and the persistent blocks across the four models.
If this is right
- Users gain a clearer picture of functional differences between consecutive YOLO releases.
- Persistent blocks become obvious candidates for targeted future modifications.
- The absence of documentation is framed as a barrier that slows community progress on these models.
- Developers are prompted to release papers and diagrams to reduce reliance on reverse engineering.
Where Pith is reading between the lines
- A public table of changed versus unchanged blocks could serve as a starting point for ablation studies on what actually drives accuracy gains.
- The pattern of unchanged blocks may indicate design choices that later versions deliberately preserve for compatibility or stability.
- Similar code-inspection methods could be applied to other rapidly evolving detection families that also skip formal papers.
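A table of changed versus unchanged blocks, as suggested in the first point above, could be built with a few set operations. The following is a minimal sketch, not the paper's method: the function names `block_table` and `format_table` are hypothetical, and the module lists are illustrative assumptions that would need to be checked against each model's actual configuration file.

```python
# Hypothetical input: module names per version, as one might read them from
# each model's config file. These lists are illustrative assumptions, not
# results taken from the paper; verify them against the actual configs.
MODULES_BY_VERSION = {
    "YOLOv8": ["Conv", "C2f", "SPPF", "Concat", "Detect"],
    "YOLO11": ["Conv", "C3k2", "SPPF", "C2PSA", "Concat", "Detect"],
}


def block_table(modules_by_version):
    """Return (versions, rows); each row is (block, presence flags, status)."""
    versions = list(modules_by_version)
    per_version = {v: set(m) for v, m in modules_by_version.items()}
    all_blocks = sorted(set().union(*per_version.values()))
    rows = []
    for block in all_blocks:
        present = [block in per_version[v] for v in versions]
        # A block counts as "unchanged" here only if every version uses it.
        status = "unchanged" if all(present) else "changed"
        rows.append((block, present, status))
    return versions, rows


def format_table(versions, rows):
    """Render the rows as a plain-text table."""
    header = ["block", *versions, "status"]
    lines = ["  ".join(f"{h:<10}" for h in header)]
    for block, present, status in rows:
        cells = [block, *("yes" if p else "-" for p in present), status]
        lines.append("  ".join(f"{c:<10}" for c in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(*block_table(MODULES_BY_VERSION)))
```

A shared module name only shows that the same building block is referenced; it does not prove the implementation behind that name is identical across versions, so the code of each block would still have to be compared before using such a table for ablation planning.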
Load-bearing premise
That source code and documentation alone are enough to correctly map every architectural block without official diagrams or papers for all versions.
What would settle it
An official diagram or paper for YOLOv9, YOLOv10, or YOLO11 that shows a different set of changed or unchanged blocks than the ones identified from the code review.
Original abstract
Note: This is a preliminary version of the manuscript. The final, peer-reviewed, and substantially revised version has been published in Jurnal RESTI. Readers are encouraged to access and cite the published version: DOI: https://doi.org/10.29207/resti.v10i2.6598
In the field of deep learning-based computer vision, YOLO is revolutionary. With respect to deep learning models, YOLO is also the one that is evolving the most rapidly. Unfortunately, not every YOLO model possesses scholarly publications. Moreover, there exists a YOLO model that lacks a publicly accessible official architectural diagram. Naturally, this engenders challenges, such as complicating the understanding of how the model operates in practice. Furthermore, the review articles that are presently available do not investigate the specifics of each model. The objective of this study is to present a comprehensive and in-depth architecture comparison of the four most recent YOLO models, specifically YOLOv8 through YOLO11, thereby enabling readers to quickly grasp not only how each model functions, but also the distinctions between them. To analyze each YOLO version's architecture, we meticulously examined the relevant academic papers, documentation, and scrutinized the source code. The analysis reveals that while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged. The lack of scholarly publications and official diagrams presents challenges for understanding the model's functionality and future enhancement. Future developers are encouraged to provide these resources.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a comparative review of the architectures of YOLOv8 through YOLO11. By inspecting academic papers, documentation, and source code, it concludes that each successive version introduces improvements in architecture and feature extraction while certain blocks remain unchanged across versions. The work highlights challenges stemming from the absence of scholarly publications and official architectural diagrams for some models and encourages future developers to provide such resources.
Significance. If the architectural mappings hold, the review fills a documented gap in the YOLO literature by synthesizing details that are otherwise scattered or unavailable in official sources. This descriptive synthesis can assist practitioners and researchers in quickly understanding model differences and evolution, with the explicit acknowledgment of missing resources adding transparency.
major comments (2)
- [Abstract and conclusion] The central comparative claim (that certain blocks remain unchanged while others improve) is load-bearing yet presented at a high level in the abstract and conclusion without a consolidated table or section that explicitly lists the unchanged blocks, their versions, and the supporting code or documentation references used for identification.
- [Methodology description (implied in abstract)] The methodology relies on source-code inspection for models lacking official diagrams, but no concrete examples of the inspection process, specific file paths, or cross-verification steps are provided, leaving the accuracy of the mappings difficult to assess independently.
minor comments (2)
- [Abstract] The abstract contains awkward phrasing (e.g., 'Naturally, this engenders challenges, such as complicating the understanding') that could be revised for clarity and conciseness.
- [Front matter] The note that this is a preliminary version whose final form appears in Jurnal RESTI should be stated explicitly in the introduction or a footnote so readers know how the current manuscript relates to the published version.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive recommendation. We address each major comment below and will make the suggested revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract and conclusion] The central comparative claim (that certain blocks remain unchanged while others improve) is load-bearing yet presented at a high level in the abstract and conclusion without a consolidated table or section that explicitly lists the unchanged blocks, their versions, and the supporting code or documentation references used for identification.
Authors: We agree that the central claim would benefit from more explicit consolidation. We will add a dedicated table (or subsection) in the revised manuscript that lists the unchanged blocks, the versions in which they appear, and the specific code paths or documentation references used to confirm their invariance. revision: yes
- Referee: [Methodology description (implied in abstract)] The methodology relies on source-code inspection for models lacking official diagrams, but no concrete examples of the inspection process, specific file paths, or cross-verification steps are provided, leaving the accuracy of the mappings difficult to assess independently.
Authors: We acknowledge that greater methodological transparency is warranted. In the revision we will expand the methodology section with concrete examples, including specific file paths within the Ultralytics repository and the cross-verification steps against papers and documentation. revision: yes
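One concrete form the requested inspection step could take is sketched below. It is an assumption about how such a step might look, not the authors' procedure: it counts the module names that appear in a model configuration written in the `[from, repeats, module, args]` row style used by Ultralytics-style YAML files. The snippet in `CONFIG_TEXT` is hand-written for illustration and is not copied from any repository file, and the helper name `module_counts` is hypothetical.

```python
import re
from collections import Counter

# Hand-written illustrative snippet in the "[from, repeats, module, args]"
# row style. It is NOT a copy of any real config file; replace it with the
# text of the file under inspection.
CONFIG_TEXT = """
backbone:
  - [-1, 1, Conv, [64, 3, 2]]
  - [-1, 1, Conv, [128, 3, 2]]
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, SPPF, [1024, 5]]
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]]
  - [[15, 18, 21], 1, Detect, [nc]]
"""

# Matches one layer row and captures its third field (the module name).
# The first field is either a single index or a bracketed list of indices.
ROW = re.compile(
    r"^\s*-\s*\["
    r"(?P<src>\[[^\]]*\]|[^,\[\]]+),\s*"
    r"(?P<repeats>\d+),\s*"
    r"(?P<module>[\w.]+),"
)


def module_counts(text):
    """Count how many layer rows reference each module name."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group("module")] += 1
    return counts


if __name__ == "__main__":
    for name, count in sorted(module_counts(CONFIG_TEXT).items()):
        print(f"{name}: {count}")
```

A regular expression is used instead of a YAML parser only to keep the sketch free of third-party dependencies; for real files, a proper YAML loader would be the more robust choice, and the resulting names would still need to be cross-checked against the papers and documentation.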
Circularity Check
No significant circularity
Full rationale
The manuscript is a purely descriptive comparative review of YOLOv8–YOLO11 architectures. Its method consists of inspecting external artifacts (published papers, documentation, and open-source code) and summarizing observed differences; the central claim that “while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged” is a direct report of those observations, not a derivation, fitted prediction, or self-referential construction. No equations, parameters, uniqueness theorems, or ansatzes appear. The work therefore contains no load-bearing step that reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 6 Pith papers
- CANSURF: An ASV-View Can Dataset and Benchmark for Detection and Tracking of Surface-Level Debris
  Presents the CANSURF dataset for surface-level aluminum can detection from ASV viewpoints and shows that training YOLOv11 on it yields a 12x performance boost over generic datasets along with stable tracking results.
- TSBOW -- Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions
  TSBOW is a large-scale public dataset of traffic CCTV footage in diverse weather conditions with annotations for occluded vehicles to benchmark object detection performance.
- OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation
  OmniHuman is a new large-scale multi-scene dataset with video-, frame-, and individual-level annotations for human-centric video generation, accompanied by the OHBench benchmark that adds metrics aligned with human pe...
- Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection
  Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.
- TSBOW -- Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions
  Introduces the TSBOW dataset and benchmark for occluded vehicle detection in traffic surveillance under diverse and extreme weather conditions.
- CWT-Enhanced Vibration Sensing With Spatial Fault Localization Using YOLO
  CWT spectrograms combined with YOLOv9-11 achieve mAP up to 99.5% for spatial localization of bearing faults on CWRU, PU, and IMS datasets, outperforming time-series and STFT baselines.
Reference graph
Works this paper leans on
- [1] Y. Chen, X. Yuan, R. Wu, J. Wang, Q. Hou, and M.-M. Cheng, “YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection,” 2023, arXiv. doi: 10.48550/ARXIV.2308.05480
- [2] Glenn Jocher, Paula Derrenger, and Muhammad Rizwan Munawar, “Home - Ultralytics YOLO Docs.” Accessed: Jan. 13, 2025. [Online]. Available: https://docs.ultralytics.com/
- [3] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information,” 2024, arXiv. doi: 10.48550/ARXIV.2402.13616
- [4] A. Wang et al., “YOLOv10: Real-Time End-to-End Object Detection,” 2024, arXiv. doi: 10.48550/ARXIV.2405.14458
- [5] R. Khanam and M. Hussain, “YOLOv11: An Overview of the Key Architectural Enhancements,” 2024, arXiv. doi: 10.48550/ARXIV.2410.17725
- [6] R. Sapkota et al., “YOLO11 to Its Genesis: A Decadal and Comprehensive Review of The You Only Look Once (YOLO) Series,” 2024, arXiv. doi: 10.48550/ARXIV.2406.19407
- [7] M. A. R. Alif and M. Hussain, “YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain,” 2024, arXiv. doi: 10.48550/ARXIV.2406.10139
- [8] J. Terven, D.-M. Córdova-Esparza, and J.-A. Romero-González, “A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS,” MAKE, vol. 5, no. 4, pp. 1680–1716, Nov. 2023, doi: 10.3390/make5040083
- [9] C.-Y. Wang and H.-Y. M. Liao, “YOLOv1 to YOLOv10: The Fastest and Most Accurate Real-time Object Detection Systems,” SIP, vol. 13, no. 1, Art. no. 1, 2024, doi: 10.1561/116.20240058
- [10] M. Hussain, “YOLOv1 to v8: Unveiling Each Variant–A Comprehensive Review of YOLO,” IEEE Access, vol. 12, pp. 42816–42833, 2024, doi: 10.1109/ACCESS.2024.3378568
- [11]
- [12] RangeKing, “RangeKing,” GitHub. Accessed: Jul. 31, 2023. [Online]. Available: https://github.com/RangeKing
- [13] Glenn Jocher, “shortcut in backbone and neck · Issue #1200 · ultralytics/ultralytics.” Accessed: Nov. 06, 2024. [Online]. Available: https://github.com/ultralytics/ultralytics/issues/1200#issuecomment-1454873251
- [14] Glenn Jocher, “Understanding SPP and SPPF implementation · Issue #8785 · ultralytics/yolov5,” GitHub. Accessed: Nov. 06, 2024. [Online]. Available: https://github.com/ultralytics/yolov5/issues/8785
- [15] Kin-Yiu, Wong, “yolov9-c.yaml.” Accessed: Jan. 20, 2025. [Online]. Available: https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-c.yaml
- [16]
- [17] Wang Ao, “yolov10l.yaml.” Accessed: Jan. 20, 2025. [Online]. Available: https://github.com/THU-MIG/yolov10/blob/main/ultralytics/cfg/models/v10/yolov10l.yaml
- [18]
- [19] Glenn Jocher and Paula Derrenger, “yolo11.yaml.” Accessed: Jan. 20, 2025. [Online]. Available: https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/models/11/yolo11.yaml
- [20]
- [21]