Mise en abyme with artificial intelligence: how to predict the accuracy of NN, applied to hyper-parameter tuning
Pith reviewed 2026-05-25 13:26 UTC · model grok-4.3
The pith
A support vector machine trained on early training runs can predict final neural network accuracy, enabling low-cost hyperparameter searches that recover known optimal results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By collecting initial and final accuracies across multiple trainings that differ in network characteristics, fitting curves to early performance, and training a support vector machine on the resulting database, the final accuracy of a network can be predicted from its primary iterations alone; applying this predictor inside a probabilistic hyperparameter search recovers the optimal accuracies known for MNIST and CIFAR-10 at substantially reduced computational cost.
What carries the argument
Support vector machine trained on a database of early accuracy observations paired with final accuracies from prior runs, together with curve fitting to model training trajectories.
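The curve-fitting half of that machinery can be illustrated with a saturating learning curve, acc(t) ≈ a - b/t. The sketch fits this curve to the accuracies observed in the first few epochs and reads the asymptote a as a naive estimate of final accuracy. The functional form and the names fit_saturating_curve and extrapolate are assumptions for illustration; the paper's own curve family is not stated here. Substituting x = 1/t makes the model linear in (a, b), so ordinary least squares gives a closed-form fit.

```python
from typing import Sequence, Tuple


def fit_saturating_curve(epochs: Sequence[float], accs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the hypothetical learning-curve model
        acc(t) = a - b / t.
    With x = 1/t the model is linear in (a, b), so ordinary least squares
    has a closed form. Returns (a, b); a is the asymptote as t -> infinity.
    """
    if len(epochs) != len(accs) or len(epochs) < 2:
        raise ValueError("need at least two (epoch, accuracy) pairs")
    xs = [1.0 / t for t in epochs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct epochs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs))
    slope = sxy / sxx                    # regression slope on x = 1/t, i.e. -b
    intercept = mean_y - slope * mean_x  # value at x = 0, i.e. t -> infinity
    return intercept, -slope


def extrapolate(a: float, b: float, t: float) -> float:
    """Accuracy predicted by the fitted curve at epoch t."""
    return a - b / t
```

The fitted coefficients (a, b) could then be fed to the SVM alongside the raw early accuracies or in place of them. The paper does not say which features it actually uses.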
Load-bearing premise
The mapping from early training behavior to final accuracy remains sufficiently stable across changes in network characteristics for an SVM trained on a modest database of prior runs to generalize to new configurations.
What would settle it
For a previously unseen hyperparameter configuration, measure the actual accuracy after full training and compare it to the accuracy predicted by the SVM from the first few iterations; a large discrepancy falsifies the claim.
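The falsification test becomes operational once a tolerance is fixed on the absolute accuracy error. The sketch below makes that concrete. It assumes predicted and measured accuracies sit in dictionaries keyed by a configuration name. The default tolerance is a hypothetical choice, because the review does not quantify what counts as a "large" discrepancy.

```python
from typing import Dict, Mapping


def flag_discrepancies(
    predicted: Mapping[str, float],
    measured: Mapping[str, float],
    tolerance: float = 0.02,  # hypothetical threshold on |error|, accuracies in [0, 1]
) -> Dict[str, float]:
    """Return {configuration: |predicted - measured|} for every configuration
    whose absolute accuracy error exceeds `tolerance`."""
    unmatched = set(predicted) ^ set(measured)
    if unmatched:
        raise ValueError(f"missing prediction or measurement for: {sorted(unmatched)}")
    errors = {name: abs(predicted[name] - measured[name]) for name in predicted}
    return {name: err for name, err in errors.items() if err > tolerance}
```

An empty result on a batch of genuinely unseen configurations supports the claim. Any flagged entry is a candidate counterexample.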
Original abstract
In the context of deep learning, the costliest phase from a computational point of view is the full training of the learning algorithm. However, this process must be repeated a significant number of times during the design of a new artificial neural network, which makes the design extremely expensive. Here, we propose a low-cost strategy to predict the accuracy of the algorithm based only on its initial behaviour. To do so, we train the network of interest up to convergence several times, modifying its characteristics at each training. The initial and final accuracies observed during this preliminary process are stored in a database. We then use both curve fitting and Support Vector Machines, the latter trained on the created database, to predict the accuracy of the network given its accuracy in the first iterations of its learning. This approach can be of particular interest when the space of the network's characteristics is notably large or when its full training is highly time-consuming. The results we obtained are promising and encouraged us to apply this strategy to a topical issue: hyper-parameter optimisation (HO). In particular, we focused on the HO of a convolutional neural network for the classification of the MNIST and CIFAR-10 datasets. By using our method of prediction, and an algorithm we implemented for a probabilistic exploration of the hyper-parameter space, we were able to find the hyper-parameter settings corresponding to the optimal accuracies already known in the literature, at quite a low cost.
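Below is a minimal stand-in for the SVM step: a linear ε-insensitive support vector regressor fitted by stochastic subgradient descent. It is written in plain Python so it runs without dependencies. Each database row is assumed to hold the accuracies from the first few iterations as features, with the converged accuracy as the target. The linear kernel, the solver and every parameter value (lam, epsilon, eta0, epochs) are assumptions, since the abstract does not say which kernel or solver the authors used. In practice, a library regressor such as scikit-learn's sklearn.svm.SVR (with its kernel, C and epsilon arguments) would be the usual choice.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_linear_svr(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float = 1e-4,
    epsilon: float = 0.01,
    eta0: float = 0.1,
    epochs: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit f(x) = w.x + b by minimising
        lam/2 * ||w||^2 + mean_i max(0, |y_i - f(x_i)| - epsilon)
    with stochastic subgradient steps of size eta0 / sqrt(t).
    (Up to scaling, this is the primal SVR problem with C = 1 / (lam * n).)
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    w = [0.0] * len(X[0])
    b = 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = eta0 / math.sqrt(t)
            residual = y[i] - predict(w, b, X[i])
            # Subgradient of the epsilon-insensitive loss w.r.t. f(x_i):
            # 0 inside the tube, -sign(residual) outside it.
            g = 0.0
            if abs(residual) > epsilon:
                g = -1.0 if residual > 0 else 1.0
            w = [wj - eta * (lam * wj + g * xj) for wj, xj in zip(w, X[i])]
            b -= eta * g
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted final accuracy for one row of early-accuracy features."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

With real data, X would hold one row per past training run and y its final accuracy. predict(w, b, early_accs) then scores a new configuration from its first iterations alone.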
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes training neural networks multiple times with varying characteristics to build a database of initial and final accuracies, then using curve fitting and an SVM trained on this database to predict final accuracy from early training behavior. This predictor is integrated into a custom probabilistic hyperparameter optimization procedure to search for optimal settings of a CNN on MNIST and CIFAR-10, with the claim that known literature optima are recovered at low computational cost.
Significance. If the early-to-final accuracy mapping generalizes reliably to unseen hyperparameter configurations, the approach could reduce the number of full trainings needed during hyperparameter search, offering a practical low-cost alternative for expensive models or large search spaces. The reported recovery of known optima on standard benchmarks would indicate potential utility, though the absence of quantitative validation leaves the practical impact unevaluated.
Major comments (3)
- [Abstract] Abstract: The assertion of 'promising results' and recovery of 'the hyper-parameter settings corresponding to the optimal accuracies already known in literature' is unsupported by any reported metrics, achieved accuracies, number of full trainings avoided, baseline comparisons, or error analysis of the predictions.
- [Method (database and SVM)] Database construction and SVM training: The SVM is fitted directly to accuracy pairs generated by the same class of network trainings it is later asked to forecast, conditioning the predictor on data drawn from the target distribution rather than independent external benchmarks; this makes generalization to new hyper-parameter combinations an unverified assumption.
- [Experiments (HO results)] Hyper-parameter optimisation experiments: No information is supplied on database size, diversity of network characteristics, held-out validation error of the SVM, or the probabilistic exploration algorithm; without these, the success on MNIST/CIFAR-10 provides no evidence that prediction errors would not cause the search to miss superior configurations.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed report. We address each major comment below, indicating revisions that will be incorporated into the next version of the manuscript to provide the requested quantitative support and methodological details.
Point-by-point responses
- Referee: [Abstract] Abstract: The assertion of 'promising results' and recovery of 'the hyper-parameter settings corresponding to the optimal accuracies already known in literature' is unsupported by any reported metrics, achieved accuracies, number of full trainings avoided, baseline comparisons, or error analysis of the predictions.
  Authors: We agree that the abstract would be strengthened by quantitative support. In the revision we will add specific metrics including the SVM prediction error on held-out data, the number of full trainings performed versus avoided, the final accuracies recovered on MNIST and CIFAR-10, and a brief comparison to a standard grid-search baseline. These additions will be drawn from the experimental results already obtained and will be stated concisely in the abstract.
  Revision: yes.
- Referee: [Method (database and SVM)] Database construction and SVM training: The SVM is fitted directly to accuracy pairs generated by the same class of network trainings it is later asked to forecast, conditioning the predictor on data drawn from the target distribution rather than independent external benchmarks; this makes generalization to new hyper-parameter combinations an unverified assumption.
  Authors: The database is generated by systematically varying hyper-parameters and architectural choices across repeated trainings, and the subsequent hyper-parameter search is allowed to propose combinations outside the exact training set. Nevertheless, the referee correctly notes that explicit verification of generalization is missing. We will add a held-out validation split of the accuracy-pair database and report the SVM test error on configurations not seen during SVM training, thereby quantifying the generalization assumption.
  Revision: partial.
- Referee: [Experiments (HO results)] Hyper-parameter optimisation experiments: No information is supplied on database size, diversity of network characteristics, held-out validation error of the SVM, or the probabilistic exploration algorithm; without these, the success on MNIST/CIFAR-10 provides no evidence that prediction errors would not cause the search to miss superior configurations.
  Authors: We acknowledge that the current manuscript omits these essential details. The revised version will report: (i) the exact number of trainings used to build the database and the ranges of characteristics varied, (ii) the held-out validation error of the SVM, and (iii) a concise description of the probabilistic exploration procedure (including how prediction uncertainty is propagated). These additions will allow readers to evaluate the risk that prediction errors could cause the search to overlook better configurations.
  Revision: yes.
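The held-out check promised in the rebuttal can be sketched as a split by configuration. That way, no configuration contributes runs to both the SVM's training set and its test set. The row layout (config_id, early-accuracy features, final accuracy) and the fit callable are assumptions. Here, fit stands for any routine that trains a predictor on the training rows and returns a function from features to predicted accuracy.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (configuration id, early-accuracy features, final accuracy); layout is assumed
Row = Tuple[str, Sequence[float], float]
Predictor = Callable[[Sequence[float]], float]


def split_by_configuration(
    rows: Sequence[Row], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Hold out whole configurations, not individual runs."""
    configs = sorted({cfg for cfg, _, _ in rows})
    if len(configs) < 2:
        raise ValueError("need at least two distinct configurations")
    random.Random(seed).shuffle(configs)
    n_test = min(len(configs) - 1, max(1, round(test_fraction * len(configs))))
    held_out = set(configs[:n_test])
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test


def heldout_mae(
    rows: Sequence[Row],
    fit: Callable[[List[Row]], Predictor],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> float:
    """Mean absolute error of the predictor on configurations unseen in training."""
    train, test = split_by_configuration(rows, test_fraction, seed)
    model = fit(train)
    return sum(abs(model(features) - final) for _, features, final in test) / len(test)
```

A plain random split of rows would let repeated runs of one configuration leak into both sets. The error would then look better than it would on genuinely new configurations.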
Circularity Check
No significant circularity; empirical surrogate model is self-contained
Full rationale
The paper builds an empirical database from multiple full trainings with varied network characteristics, fits an SVM (plus curve fitting) on that database to map early behavior to final accuracy, and deploys the fitted model as a cheap surrogate inside a probabilistic hyper-parameter search. This is a standard learned predictor whose validity rests on generalization from the database rather than any derivation that reduces by construction to its own inputs. No equations, self-citations, or uniqueness claims are present that would create a self-definitional or fitted-input loop; the reported recovery of known MNIST/CIFAR-10 optima is offered as external empirical evidence, not a mathematical identity.
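The surrogate's role in the search works as follows in the sketch below. It samples configurations at random, ranks them by predicted final accuracy, and pays for full training only on the top few. Plain random sampling, as in Bergstra and Bengio's random search, stands in for the authors' own probabilistic exploration algorithm, which is not described here. predict_final and full_train are hypothetical callables; the former is assumed to run the first few iterations and apply the trained predictor.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Config = Dict[str, object]


def surrogate_search(
    space: Dict[str, Sequence[object]],
    predict_final: Callable[[Config], float],  # hypothetical: short run + predictor
    full_train: Callable[[Config], float],     # hypothetical: train to convergence
    n_samples: int = 50,
    top_k: int = 3,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    rng = random.Random(seed)
    # Draw distinct random configurations (hyper-parameter values must be hashable).
    candidates: List[Config] = []
    seen = set()
    for _ in range(n_samples):
        cfg = {name: rng.choice(list(values)) for name, values in space.items()}
        key = tuple(sorted(cfg.items()))
        if key not in seen:
            seen.add(key)
            candidates.append(cfg)
    # Cheap step: rank every candidate by its predicted final accuracy.
    ranked = sorted(candidates, key=predict_final, reverse=True)
    # Expensive step: fully train only the most promising few.
    best_cfg: Optional[Config] = None
    best_acc = float("-inf")
    for cfg in ranked[:top_k]:
        acc = full_train(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

The search can still miss the true optimum if the predictor ranks it below top_k. That is exactly the risk the referee raises about prediction errors.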
Axiom & Free-Parameter Ledger
free parameters (2)
- SVM hyperparameters (kernel, regularization)
- Curve-fitting coefficients
axioms (1)
- Domain assumption: early training accuracy behavior correlates reliably with final accuracy across variations in network characteristics.
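If the SVM is used as a regressor, the ledger's SVM hyperparameters correspond to C, ε and the kernel k(x, x') = ⟨φ(x), φ(x')⟩ in the standard ε-insensitive SVR problem below. In it, x_i are the early-accuracy features of past run i and y_i is its final accuracy. That the paper uses this particular formulation is an assumption.

```latex
\min_{w,\,b,\,\xi,\,\xi^*}\; \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)
\quad\text{s.t.}\quad
\begin{cases}
y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i,\\
\langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^*,\\
\xi_i,\ \xi_i^* \ge 0, \qquad i = 1,\dots,n.
\end{cases}
```

Runs whose final accuracy falls within ε of the prediction incur no penalty. C trades that tolerance against the flatness of the fit.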
Reference graph
Works this paper leans on
- [1]
- [2] J. S. Bergstra, R. Bardenet, Y. Bengio and B. Kégl, Algorithms for hyper-parameter optimization, Advances in Neural Information Processing Systems (NIPS), 2011.
- [3] D. R. Jones, A taxonomy of global optimization methods based on response surfaces, Journal of Global Optimization 21 (2001) 345–383.
- [4] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams and N. de Freitas, Taking the human out of the loop: A review of Bayesian optimization, Proc. IEEE 104(1) (2016) 148–175.
- [5]
- [6] J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, J. Mach. Learn. Res. 13(1) (2012) 281–305.
- [7]
- [8] B. Zoph and Q. V. Le, Neural architecture search with reinforcement learning, Int. Conf. Learning Representations, Toulon, France, 2017, pp. 1–16.
- [9]
- [10] Z. Zhong, J. Yan, W. Wei, J. Shao and C.-L. Liu, Practical block-wise neural network architecture generation, Conf. Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, 2018, arXiv preprint: 1708.05552.
- [11] H. Cai, T. Chen, W. Zhang, Y. Yu and J. Wang, Efficient architecture search by network transformation, AAAI Conf. Artificial Intelligence, New Orleans, Louisiana, USA, 2018, pp. 2787–2794.
10 G. Franchini, M. Galinier, M. Verucchi
- [12] O. Chapelle and V. Vapnik, Model selection for support vector machines, Advances in Neural Information Processing Systems, Vol. 12, 1999.
- [13] S. L. Arlinghaus, PHB Practical Handbook of Curve Fitting, CRC Press, 1994.