YOLO-NAS-Bench: A Surrogate Benchmark with Self-Evolving Predictors for YOLO Architecture Search
Pith reviewed 2026-05-21 11:05 UTC · model grok-4.3
The pith
A self-evolving surrogate predictor trained on sampled YOLO architectures can guide evolutionary search to detectors outperforming official YOLO baselines at comparable latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
YOLO-NAS-Bench defines a search space covering core YOLO modules from versions 8 through 12, samples 1,000 architectures via random, stratified, and Latin Hypercube methods, and trains them on COCO-mini. The Self-Evolving Mechanism iteratively deploys the predictor to locate and evaluate additional informative architectures near the high-performance frontier, growing the training pool to 1,500 while raising ensemble R² from 0.770 to 0.815 and Sparse Kendall Tau from 0.694 to 0.752. Using the refined predictor directly as the fitness function in evolutionary search produces architectures that surpass all official YOLOv8-YOLO12 baselines at comparable latency on COCO-mini.
What carries the argument
The Self-Evolving Mechanism, which progressively aligns the predictor training distribution to the high-performance frontier by using the current model to discover and train new informative architectures each iteration.
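The loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the three-gene architecture encoding, the synthetic `train_and_evaluate` function (standing in for a COCO-mini training run), and the nearest-neighbour `predict` function (standing in for the LightGBM ensemble) are all assumptions made for the example, as are the pool and batch sizes.

```python
import random

def sample_arch(rng):
    # Hypothetical encoding: (width index, depth, operator id).
    return (rng.randrange(16), rng.randrange(1, 9), rng.randrange(4))

def train_and_evaluate(arch):
    # Synthetic stand-in for "train on COCO-mini and report mAP"; not real data.
    width, depth, op = arch
    bonus = 0.01 if op == 2 else 0.0
    return 0.30 + 0.010 * width + 0.015 * depth - 0.001 * width * depth + bonus

def predict(pool, arch, k=3):
    # Stand-in surrogate: mean score of the k nearest architectures in the pool.
    def distance(other):
        return sum(abs(a - b) for a, b in zip(arch, other))
    nearest = sorted(pool, key=distance)[:k]
    return sum(pool[a] for a in nearest) / len(nearest)

def self_evolve(pool, rounds, per_round, n_candidates, rng):
    for _ in range(rounds):
        # 1. Propose candidates and drop those already trained.
        candidates = {sample_arch(rng) for _ in range(n_candidates)}
        unseen = sorted(a for a in candidates if a not in pool)
        # 2. Rank unseen candidates with the current surrogate.
        ranked = sorted(unseen, key=lambda a: predict(pool, a), reverse=True)
        # 3. Actually evaluate the top-ranked ones and grow the pool.
        for arch in ranked[:per_round]:
            pool[arch] = train_and_evaluate(arch)
    return pool

rng = random.Random(0)
pool = {}
while len(pool) < 40:  # initial pool from plain random sampling
    arch = sample_arch(rng)
    pool[arch] = train_and_evaluate(arch)
initial_size = len(pool)
pool = self_evolve(pool, rounds=3, per_round=5, n_candidates=200, rng=rng)
print(initial_size, len(pool))
```

Because every architecture added in step 3 receives a measured score rather than a predicted one, the surrogate is always refit on ground-truth labels; what shifts over the rounds is where in the search space those labels are concentrated.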
If this is right
- The refined predictor supplies a low-cost fitness signal that lets evolutionary search explore the YOLO design space more thoroughly than full training permits.
- The benchmark supplies a standardized way to compare different NAS algorithms for detection without incurring repeated full training costs.
- Architectures surfaced by the predictor demonstrate that ranking consistency in the high-performance regime is sufficient to locate designs better than current hand-designed baselines.
- The iterative alignment process shows that predictor accuracy improves most when training data is deliberately shifted toward top-performing candidates rather than uniform sampling.
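As a rough illustration of the first point, the sketch below runs an evolutionary search in which a predictor call replaces training as the fitness signal. Everything specific here is hypothetical: the gene values, the operator labels, the `predicted_map` and `latency_ms` stand-in functions, and the simple elitist selection scheme. The text reviewed here does not specify the exact evolutionary algorithm the paper uses.

```python
import random

# Hypothetical search space; labels and values are illustrative only.
WIDTHS = [32, 64, 96, 128]
DEPTHS = [1, 2, 3, 4]
OPS = ["op_a", "op_b", "op_c"]

def predicted_map(arch):
    # Stand-in for a trained surrogate predictor (assumption, not the paper's model).
    width, depth, op = arch
    return 0.2 + 0.001 * width + 0.02 * depth + {"op_a": 0.0, "op_b": 0.01, "op_c": 0.02}[op]

def latency_ms(arch):
    # Stand-in latency model (assumption).
    width, depth, op = arch
    return 0.02 * width * depth + {"op_a": 0.5, "op_b": 0.8, "op_c": 1.5}[op]

def fitness(arch, budget_ms):
    # Architectures over the latency budget are never preferred.
    return predicted_map(arch) if latency_ms(arch) <= budget_ms else float("-inf")

def mutate(arch, rng):
    # Resample one randomly chosen gene.
    genes = list(arch)
    i = rng.randrange(3)
    genes[i] = rng.choice([WIDTHS, DEPTHS, OPS][i])
    return tuple(genes)

def evolve(budget_ms, generations, pop_size, rng):
    population = [
        (rng.choice(WIDTHS), rng.choice(DEPTHS), rng.choice(OPS))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the fitter half, refill with mutated copies of survivors.
        ranked = sorted(population, key=lambda a: fitness(a, budget_ms), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda a: fitness(a, budget_ms))

best = evolve(budget_ms=5.0, generations=10, pop_size=20, rng=random.Random(0))
print(best, predicted_map(best), latency_ms(best))
```

Each fitness call costs one predictor inference, so the number of generations is limited by predictor quality rather than by GPU time.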
Where Pith is reading between the lines
- The same self-evolving sampling idea could be tested on other detection families or segmentation tasks where evaluation cost is similarly high.
- If the COCO-mini to full-COCO correlation holds, the approach offers a practical route to architecture search on larger-scale or video detection problems.
- The benchmark could serve as a testbed for hybrid search methods that combine the predictor with gradient-based or reinforcement-learning NAS strategies.
Load-bearing premise
Performance rankings and absolute metrics measured on the reduced COCO-mini dataset are sufficiently correlated with results on the full COCO dataset and real deployment conditions to make the surrogate useful for guiding architecture search.
What would settle it
Fully train the top architectures found by the evolutionary search on the complete COCO dataset and verify whether they still exceed the official YOLOv8-YOLO12 baselines in accuracy at matched latency.
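One way to make this check mechanical is sketched below. The `Result` record, the 5% latency tolerance, and every number in the example are placeholders invented for illustration; none of them are measurements from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    name: str
    latency_ms: float
    map_full_coco: float  # mAP after full-COCO training (placeholder values below)

def beats_matched_baselines(candidate, baselines, rel_tol=0.05):
    """True/False if the candidate does/does not beat every baseline whose
    latency is within rel_tol of its own; None if no baseline is comparable."""
    matched = [
        b for b in baselines
        if abs(b.latency_ms - candidate.latency_ms) <= rel_tol * candidate.latency_ms
    ]
    if not matched:
        return None
    return all(candidate.map_full_coco > b.map_full_coco for b in matched)

# Placeholder data only, not reported results.
baselines = [Result("baseline_s", 2.0, 0.40), Result("baseline_m", 4.0, 0.48)]
found_a = Result("found_a", 2.05, 0.42)
found_b = Result("found_b", 4.10, 0.47)
found_c = Result("found_c", 8.00, 0.55)

verdicts = {c.name: beats_matched_baselines(c, baselines) for c in (found_a, found_b, found_c)}
print(verdicts)
```

Returning `None` when no baseline falls inside the tolerance keeps "no comparable baseline" distinct from "lost to a comparable baseline", which matters when the claim is stated at matched latency.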
Original abstract
Neural Architecture Search (NAS) for object detection is severely bottlenecked by high evaluation cost, as fully training each candidate YOLO architecture on COCO demands days of GPU time. Meanwhile, existing NAS benchmarks largely target image classification, leaving the detection community without a comparable benchmark for NAS evaluation. To address this gap, we introduce YOLO-NAS-Bench, the first surrogate benchmark tailored to YOLO-style detectors. YOLO-NAS-Bench defines a search space spanning channel width, block depth, and operator type across both backbone and neck, covering the core modules of YOLOv8 through YOLO12. We sample 1,000 architectures via random, stratified, and Latin Hypercube strategies, train them on COCO-mini, and build a LightGBM surrogate predictor. To sharpen the predictor in the high-performance regime most relevant to NAS, we propose a Self-Evolving Mechanism that progressively aligns the predictor's training distribution with the high-performance frontier, by using the predictor itself to discover and evaluate informative architectures in each iteration. This method grows the pool to 1,500 architectures and raises the ensemble predictor's R² from 0.770 to 0.815 and Sparse Kendall Tau from 0.694 to 0.752, demonstrating strong predictive accuracy and ranking consistency. Using the final predictor as the fitness function for evolutionary search, we discover architectures that surpass all official YOLOv8-YOLO12 baselines at comparable latency on COCO-mini, confirming the predictor's discriminative power for top-performing detection architectures. The code is available at https://github.com/VDIGPKU/YOLO-NAS-Bench.
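Of the three sampling strategies the abstract names, Latin Hypercube sampling is the least self-explanatory. The sketch below shows the basic idea in plain Python: each dimension is cut into as many equal strata as there are samples, and every stratum is hit exactly once. The three-dimensional search space and its choice lists are hypothetical, and the paper's actual sampler may differ.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with exactly one point
    per stratum along every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per dimension
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def decode(point, choices_per_dim):
    # Map each unit-interval coordinate to one discrete choice.
    return tuple(
        choices[min(int(u * len(choices)), len(choices) - 1)]
        for u, choices in zip(point, choices_per_dim)
    )

# Hypothetical search space: channel width, block depth, operator label.
space = [[32, 64, 96, 128], [1, 2, 3], ["op_a", "op_b", "op_c"]]

points = latin_hypercube(n_samples=8, n_dims=len(space), rng=random.Random(0))
archs = [decode(p, space) for p in points]
for arch in archs:
    print(arch)
```

Compared with independent random draws, this guarantees that no region of any single dimension is left unsampled, which is useful when each sample costs a full training run.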
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces YOLO-NAS-Bench as the first surrogate benchmark for neural architecture search targeting YOLO-style detectors. It defines a search space over channel widths, block depths, and operator types for backbone and neck modules spanning YOLOv8 to YOLO12. 1,000 architectures are sampled and trained on COCO-mini; a LightGBM surrogate is trained on these results. A Self-Evolving Mechanism then uses the predictor to propose and evaluate additional high-performance architectures, expanding the set to 1,500 samples and lifting ensemble R² from 0.770 to 0.815 and Sparse Kendall Tau from 0.694 to 0.752. The final predictor serves as the fitness function in an evolutionary search that yields architectures outperforming official YOLOv8–YOLO12 baselines at comparable latency on COCO-mini.
Significance. If the surrogate rankings prove transferable, the benchmark and self-evolving predictor would meaningfully lower the barrier to NAS for object detection by replacing multi-day full trainings with fast inference. The explicit focus on the high-performance regime via self-evolution is a constructive technical contribution, and public code release aids reproducibility. The central limitation is that all quantitative claims, including the headline result of surpassing official baselines, rest exclusively on COCO-mini without reported transfer or rank-correlation checks to the full COCO dataset.
major comments (2)
- [Abstract] Abstract and experimental results: All performance numbers, ranking metrics, and the claim that evolved architectures surpass official YOLOv8–YOLO12 baselines are obtained exclusively on COCO-mini. No experiment trains the discovered models on the full COCO train set, evaluates mAP on the standard val set, or measures rank correlation between COCO-mini and full-COCO orderings. Because the usefulness of the surrogate for guiding real NAS hinges on this correlation, the absence of such validation is load-bearing for the central claim.
- [Self-Evolving Mechanism] Self-Evolving Mechanism (described in abstract and §3): The loop uses the current predictor to select new candidate architectures, evaluates them on COCO-mini, and folds the results back into the training pool. While presented as active learning, the manuscript does not report a held-out validation set, an analysis of selection bias, or an ablation showing that the observed R²/Kendall-Tau gains are not partly artifacts of the predictor reinforcing its own preferences. This circularity risk directly affects the reliability of the final predictor used for evolutionary search.
minor comments (3)
- The term 'Sparse Kendall Tau' is used without definition or citation; a short explanation or reference to the exact variant employed would improve clarity.
- [Experimental setup] LightGBM hyperparameters and the precise number of architectures added per self-evolving iteration are treated as free parameters but are not tabulated or subjected to sensitivity analysis; including these values would help readers reproduce the reported metric improvements.
- Figure captions and axis labels should explicitly state that all latency and accuracy numbers are measured on COCO-mini rather than full COCO to avoid misinterpretation.
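On the first minor comment: the manuscript text available here does not define Sparse Kendall Tau. In earlier surrogate NAS benchmark work the term has been used for a Kendall Tau computed after rounding predictions to 0.1 percentage points, so that rank swaps between near-identical predictions are ignored. The sketch below implements that variant as an assumption; whether the paper uses exactly this definition is not confirmed by the source. The example scores are made up.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b, which corrects for ties, computed over all pairs."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied in both lists: ignored
            elif dx == 0:
                ties_x += 1       # tied only in x
            elif dy == 0:
                ties_y += 1       # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

def sparse_kendall_tau(y_true, y_pred, precision=0.1):
    # Assumed variant: bin predictions to `precision` before ranking.
    binned = [round(p / precision) for p in y_pred]
    return kendall_tau_b(y_true, binned)

# Made-up scores in percentage points, for illustration only.
y_true = [40.0, 41.0, 42.0, 43.0]
y_pred = [40.02, 40.01, 42.3, 43.1]

plain = kendall_tau_b(y_true, y_pred)
sparse = sparse_kendall_tau(y_true, y_pred)
print(plain, sparse)
```

In this example the first two predictions differ by only 0.01 points and are in the wrong order; the plain statistic counts that as a discordant pair, while the sparse variant treats the two predictions as tied.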
Simulated Author's Rebuttal
We are grateful to the referee for their positive evaluation of the paper's significance and for highlighting important aspects that require further clarification and validation. We address the two major comments below, committing to revisions that will strengthen the manuscript's claims regarding the surrogate's applicability and the robustness of the self-evolving approach.
- Referee: [Abstract] Abstract and experimental results: All performance numbers, ranking metrics, and the claim that evolved architectures surpass official YOLOv8–YOLO12 baselines are obtained exclusively on COCO-mini. No experiment trains the discovered models on the full COCO train set, evaluates mAP on the standard val set, or measures rank correlation between COCO-mini and full-COCO orderings. Because the usefulness of the surrogate for guiding real NAS hinges on this correlation, the absence of such validation is load-bearing for the central claim.
  Authors: We acknowledge that all reported results, including the outperformance claims, are based on COCO-mini. This dataset was selected as a computationally tractable proxy to enable sampling and full training of 1,500 architectures, which would otherwise demand infeasible resources on full COCO. To directly address transferability, we will add to the revised manuscript: (i) training and evaluation of a subset of the evolved architectures on the full COCO train set with mAP reported on the standard validation set, and (ii) rank-correlation analysis between COCO-mini and full-COCO performance for a representative sample of architectures. These additions will provide concrete evidence on the proxy's reliability for NAS guidance.
  Revision: yes
- Referee: [Self-Evolving Mechanism] Self-Evolving Mechanism (described in abstract and §3): The loop uses the current predictor to select new candidate architectures, evaluates them on COCO-mini, and folds the results back into the training pool. While presented as active learning, the manuscript does not report a held-out validation set, an analysis of selection bias, or an ablation showing that the observed R²/Kendall-Tau gains are not partly artifacts of the predictor reinforcing its own preferences. This circularity risk directly affects the reliability of the final predictor used for evolutionary search.
  Authors: We thank the referee for identifying this potential circularity concern. The self-evolving mechanism was intended to enrich the training distribution toward high-performing architectures relevant to NAS. In the revision we will add: (1) explicit description and results from a held-out validation set used to track predictor generalization across iterations, (2) distributional analysis comparing selected versus non-selected architectures to quantify selection bias, and (3) an ablation contrasting self-evolution against continued random or stratified sampling, demonstrating that the observed R² and Sparse Kendall Tau improvements arise from frontier enrichment rather than self-reinforcement alone. These elements will substantiate the mechanism's reliability.
  Revision: yes
Circularity Check
No significant circularity in the surrogate benchmark construction
The paper trains a LightGBM predictor on 1,000 YOLO architectures drawn by random, stratified, and Latin Hypercube sampling, whose true performance is obtained by full training on COCO-mini. The self-evolving step uses the current predictor only to propose additional candidates, which are then actually trained and added to the pool, raising R² and Sparse Kendall Tau on the expanded set; this is ordinary active learning, not a definitional loop in which a prediction is forced by its own inputs. The final evolutionary search employs the trained predictor as a cheap fitness function, after which the discovered candidates are evaluated to produce the reported superiority claim. No equation reduces to another by construction, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled. The derivation remains self-contained against the COCO-mini benchmark data.
Axiom & Free-Parameter Ledger
free parameters (2)
- LightGBM model hyperparameters
- Number of architectures added per self-evolving iteration
axioms (1)
- domain assumption COCO-mini performance rankings transfer to full COCO and deployment settings
invented entities (1)
- Self-Evolving Mechanism: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "We sample 1,000 architectures via random, stratified, and Latin Hypercube strategies, train them on COCO-mini, and build a LightGBM surrogate predictor. To sharpen the predictor in the high-performance regime most relevant to NAS, we propose a Self-Evolving Mechanism..."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "Using the final predictor as the fitness function for evolutionary search, we discover architectures that surpass all official YOLOv8-YOLO12 baselines at comparable latency on COCO-mini"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.