
arxiv: 2501.13400 · v4 · submitted 2025-01-23 · 💻 cs.CV · cs.AI

YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review

Pith reviewed 2026-05-23 05:05 UTC · model grok-4.3

keywords YOLO · object detection · architecture comparison · YOLOv8 · YOLOv11 · deep learning · computer vision · model evolution

The pith

YOLOv8 through YOLO11 add feature extraction gains while keeping certain blocks unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares the architectures of YOLOv8 to YOLO11 by reviewing available papers, documentation, and source code. It establishes that newer versions introduce improvements in architecture and feature extraction, yet some blocks stay the same across releases. This matters for readers because several recent models lack full scholarly publications or official diagrams, making it harder to understand how they work or to plan changes. The review aims to clarify the distinctions so that developers can grasp the evolution quickly.

Core claim

By examining the source code and documentation where official papers and diagrams are missing, the analysis shows that each version from YOLOv8 to YOLO11 brings improvements in architecture and feature extraction, while certain blocks remain unchanged. The lack of publications and diagrams creates ongoing challenges for understanding model operation and guiding future enhancements.

What carries the argument

Side-by-side architecture comparison via source code inspection that isolates both the updated components and the persistent blocks across the four models.
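As a rough illustration of what isolating updated and persistent blocks can mean in practice, the sketch below compares two block inventories with set operations. The inventories and the helper name `compare_blocks` are assumptions made for this example; they are not taken from the paper's tables and cover only two of the four versions.

```python
# Illustrative block inventories, assumed for this sketch only.
# They are NOT copied from the paper and may omit blocks.
BLOCKS = {
    "YOLOv8": ["Conv", "C2f", "SPPF", "Upsample", "Concat", "Detect"],
    "YOLO11": ["Conv", "C3k2", "SPPF", "C2PSA", "Upsample", "Concat", "Detect"],
}


def compare_blocks(inventories):
    """Split block names into those shared by every version and the rest.

    Returns a sorted list of persistent blocks and, per version, a sorted
    list of blocks that are not shared by all versions.
    """
    sets = [set(names) for names in inventories.values()]
    persistent = set.intersection(*sets)
    not_shared = {
        version: sorted(set(names) - persistent)
        for version, names in inventories.items()
    }
    return sorted(persistent), not_shared


persistent, changed = compare_blocks(BLOCKS)
print("persistent:", persistent)
for version, names in changed.items():
    print(version, "not shared by all versions:", names)
```

With more versions in the dictionary, the same intersection yields the blocks common to all of them, which is one way to read "persistent" here.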

If this is right

  • Users gain a clearer picture of functional differences between consecutive YOLO releases.
  • Persistent blocks become obvious candidates for targeted future modifications.
  • The absence of documentation is framed as a barrier that slows community progress on these models.
  • Developers are prompted to release papers and diagrams to reduce reliance on reverse engineering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A public table of changed versus unchanged blocks could serve as a starting point for ablation studies on what actually drives accuracy gains.
  • The pattern of unchanged blocks may indicate design choices that later versions deliberately preserve for compatibility or stability.
  • Similar code-inspection methods could be applied to other rapidly evolving detection families that also skip formal papers.

Load-bearing premise

That source code and documentation alone are enough to correctly map every architectural block without official diagrams or papers for all versions.

What would settle it

An official diagram or paper for YOLOv9, YOLOv10, or YOLO11 that shows a different set of changed or unchanged blocks than the ones identified from the code review.

Original abstract

Note: This is a preliminary version of the manuscript. The final, peer-reviewed, and substantially revised version has been published in Jurnal RESTI. Readers are encouraged to access and cite the published version: DOI: https://doi.org/10.29207/resti.v10i2.6598

In the field of deep learning-based computer vision, YOLO is revolutionary. With respect to deep learning models, YOLO is also the one that is evolving the most rapidly. Unfortunately, not every YOLO model possesses scholarly publications. Moreover, there exists a YOLO model that lacks a publicly accessible official architectural diagram. Naturally, this engenders challenges, such as complicating the understanding of how the model operates in practice. Furthermore, the review articles that are presently available do not investigate the specifics of each model. The objective of this study is to present a comprehensive and in-depth architecture comparison of the four most recent YOLO models, specifically YOLOv8 through YOLO11, thereby enabling readers to quickly grasp not only how each model functions, but also the distinctions between them. To analyze each YOLO version's architecture, we meticulously examined the relevant academic papers, documentation, and scrutinized the source code. The analysis reveals that while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged. The lack of scholarly publications and official diagrams presents challenges for understanding the model's functionality and future enhancement. Future developers are encouraged to provide these resources.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a comparative review of the architectures of YOLOv8 through YOLO11. By inspecting academic papers, documentation, and source code, it concludes that each successive version introduces improvements in architecture and feature extraction while certain blocks remain unchanged across versions. The work highlights challenges stemming from the absence of scholarly publications and official architectural diagrams for some models and encourages future developers to provide such resources.

Significance. If the architectural mappings hold, the review fills a documented gap in the YOLO literature by synthesizing details that are otherwise scattered or unavailable in official sources. This descriptive synthesis can assist practitioners and researchers in quickly understanding model differences and evolution, with the explicit acknowledgment of missing resources adding transparency.

major comments (2)
  1. [Abstract and conclusion] The central comparative claim (that certain blocks remain unchanged while others improve) is load-bearing yet presented at a high level in the abstract and conclusion without a consolidated table or section that explicitly lists the unchanged blocks, their versions, and the supporting code or documentation references used for identification.
  2. [Methodology description (implied in abstract)] The methodology relies on source-code inspection for models lacking official diagrams, but no concrete examples of the inspection process, specific file paths, or cross-verification steps are provided, leaving the accuracy of the mappings difficult to assess independently.
minor comments (2)
  1. [Abstract] The abstract contains awkward phrasing (e.g., 'Naturally, this engenders challenges, such as complicating the understanding') that could be revised for clarity and conciseness.
  2. [Front matter] The note that this is a preliminary version whose final form appears in Jurnal RESTI should be stated explicitly in the introduction or a footnote so readers know how the current manuscript relates to the published version.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation. We address each major comment below and will make the suggested revisions to strengthen the manuscript.

  1. Referee: [Abstract and conclusion] The central comparative claim (that certain blocks remain unchanged while others improve) is load-bearing yet presented at a high level in the abstract and conclusion without a consolidated table or section that explicitly lists the unchanged blocks, their versions, and the supporting code or documentation references used for identification.

    Authors: We agree that the central claim would benefit from more explicit consolidation. We will add a dedicated table (or subsection) in the revised manuscript that lists the unchanged blocks, the versions in which they appear, and the specific code paths or documentation references used to confirm their invariance. revision: yes

  2. Referee: [Methodology description (implied in abstract)] The methodology relies on source-code inspection for models lacking official diagrams, but no concrete examples of the inspection process, specific file paths, or cross-verification steps are provided, leaving the accuracy of the mappings difficult to assess independently.

    Authors: We acknowledge that greater methodological transparency is warranted. In the revision we will expand the methodology section with concrete examples, including specific file paths within the Ultralytics repository and the cross-verification steps against papers and documentation. revision: yes
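One possible shape for such an inspection step is sketched below: it tallies module names from model-config rows written in the `[from, repeats, module, args]` layout used by Ultralytics-style YAML files. The config fragment, the `ROW` pattern, and the `module_counts` helper are hypothetical and abbreviated for illustration; they are not the authors' procedure and not a complete or official model definition.

```python
import re
from collections import Counter

# Hypothetical, abbreviated fragment in the Ultralytics-style row layout
# [from, repeats, module, args]. Not a complete or official config.
CONFIG_TEXT = """
backbone:
  - [-1, 1, Conv, [64, 3, 2]]
  - [-1, 1, Conv, [128, 3, 2]]
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, SPPF, [1024, 5]]
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]
"""

# Matches one layer row and captures the third field (the module name).
# The first field may be a single index or a bracketed list of indices.
ROW = re.compile(
    r"^\s*-\s*\[(?P<src>\[[^\]]*\]|[^,]+),\s*(?P<repeats>\d+),\s*(?P<module>[\w.]+),"
)


def module_counts(text):
    """Count how many layer rows use each module name."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group("module")] += 1
    return counts


counts = module_counts(CONFIG_TEXT)
for name, n in sorted(counts.items()):
    print(f"{name}: {n} row(s)")
```

A regular expression keeps the sketch dependency-free; a real cross-check would more likely load the files with a YAML parser and compare the resulting inventories against the papers and documentation.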

Circularity Check

0 steps flagged

No significant circularity


The manuscript is a purely descriptive comparative review of YOLOv8–YOLO11 architectures. Its method consists of inspecting external artifacts (published papers, documentation, and open-source code) and summarizing observed differences; the central claim that “while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged” is a direct report of those observations, not a derivation, fitted prediction, or self-referential construction. No equations, parameters, uniqueness theorems, or ansatzes appear. The work therefore contains no load-bearing step that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a comparative review and introduces no new mathematical models, parameters, axioms, or postulated entities.


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CANSURF: An ASV-View Can Dataset and Benchmark for Detection and Tracking of Surface-Level Debris

    cs.CV 2026-05 unverdicted novelty 8.0

    Presents the CANSURF dataset for surface-level aluminum can detection from ASV viewpoints and shows that training YOLOv11 on it yields a 12x performance boost over generic datasets along with stable tracking results.

  2. TSBOW -- Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions

    cs.CV 2026-02 unverdicted novelty 7.0

    TSBOW is a large-scale public dataset of traffic CCTV footage in diverse weather conditions with annotations for occluded vehicles to benchmark object detection performance.

  3. OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    OmniHuman is a new large-scale multi-scene dataset with video-, frame-, and individual-level annotations for human-centric video generation, accompanied by the OHBench benchmark that adds metrics aligned with human pe...

  4. Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection

    cs.CV 2026-04 unverdicted novelty 6.0

    Telescope uses learnable hyperbolic foveation to deliver a 76% relative mAP gain (0.185 to 0.326) for objects beyond 250 meters while keeping overhead low.

  5. TSBOW -- Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions

    cs.CV 2026-02 unverdicted novelty 5.0

    Introduces the TSBOW dataset and benchmark for occluded vehicle detection in traffic surveillance under diverse and extreme weather conditions.

  6. CWT-Enhanced Vibration Sensing With Spatial Fault Localization Using YOLO

    eess.SP 2025-09 unverdicted novelty 4.0

    CWT spectrograms combined with YOLOv9-11 achieve mAP up to 99.5% for spatial localization of bearing faults on CWRU, PU, and IMS datasets, outperforming time-series and STFT baselines.
