pith. sign in

arxiv: 2606.04866 · v1 · pith:ANV5B7YPnew · submitted 2026-06-03 · 💻 cs.LG

Provably Reduced Sample Cost in Prior-Guided Hyperparameter Optimization

Pith reviewed 2026-06-28 07:38 UTC · model grok-4.3

classification 💻 cs.LG
keywords hyperparameter optimizationprior-guided HPObest-arm identificationsample complexitymulti-fidelity optimizationdistribution-dependent boundsAutoML
0
0 comments X

The pith

Informative priors concentrating probability mass on near-optimal arms reduce the evaluations needed in multi-fidelity hyperparameter optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that modeling priors directly over arm means as configuration performance allows derivation of distribution-dependent sample complexity bounds in the fixed-budget best-arm identification setting. These bounds show that priors concentrating on near-optimal configurations lower the number of required evaluations, while uninformative or misleading priors recover standard baseline performance. A reader would care because large-scale HPO in AutoML is computationally expensive, and this supplies the first quantitative link between prior informativeness and evaluation budget. Proof-of-concept runs on a synthetic benchmark and LCBench achieve up to 90 percent budget reduction while preserving solution quality.

Core claim

By modeling priors directly over arm means as configuration performance within the fixed-budget best-arm identification framework, we derive explicit distribution-dependent error bounds that quantify the relationship between prior informativeness and evaluation budget. Our analysis shows that informative priors, which concentrate probability mass on near-optimal arms, yield reductions in the number of required evaluations, whereas baseline performance is recovered with uninformative or misleading priors.

What carries the argument

Fixed-budget best-arm identification framework with priors modeled over arm means, yielding distribution-dependent sample complexity bounds.

If this is right

  • Informative priors reduce the number of evaluations required for multi-fidelity HPO.
  • Uninformative or misleading priors produce no improvement and recover baseline performance.
  • Up to 90 percent budget reduction is observed on synthetic and LCBench benchmarks while retaining solution quality.
  • The bounds supply a principled foundation for incorporating priors into compute-efficient AutoML.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bounds could guide construction or selection of priors that maximize efficiency in new HPO tasks.
  • The same modeling approach may extend to quantify prior benefits in other black-box optimization problems.
  • The results support analysis of energy savings when prior information is available in large-scale ML pipelines.

Load-bearing premise

Priors can be modeled directly over the means of arm performances inside a fixed-budget best-arm identification setup to produce explicit error bounds linking prior quality to sample needs.

What would settle it

An experiment in which priors concentrating on near-optimal arms produce no reduction in evaluations below the no-prior baseline, or in which observed sample needs violate the derived bounds.

Figures

Figures reproduced from arXiv: 2606.04866 by Barbara Hammer, Jasmin Brandt, Leona Hennig, Lukas Fehring, Marcel Wever, Marius Lindauer.

Figure 1
Figure 1. Figure 1: Performance of prior-guided successive halving (PSH) with different priors on the synthetic [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Learning curves of arms in the synthetic benchmark. Each arm [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Hyperband performance across prior types on regret (left) and probability of [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Regret for PriorBand with the PSH early stopping criterion across prior types, plotted [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Sensitivity analysis of the required prior gap across confidence ( [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Regret and consumed budget of PSH on LCBench with varying GP surrogate kernels. [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Regret distributions and consumed budgets for all PSH prior generation schemes under [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗
read the original abstract

Large-scale hyperparameter optimization (HPO) in automated machine learning (AutoML) consumes substantial computational resources, raising growing concerns about scalability and energy efficiency. Existing methods use prior information heuristically to accelerate both black-box and multi-fidelity settings, but they lack a characterization of how prior informativeness quantitatively reduces sample complexity. In this work, we provide the first distribution-dependent sample complexity bounds for multi-fidelity HPO with priors through the formal lens of fixed-budget best-arm identification. By modeling priors directly over arm means as configuration performance, we derive explicit, distribution-dependent error bounds that quantify the relationship between priors and evaluation budget. Our analysis shows that informative priors, which concentrate probability mass on near-optimal arms, yield reductions in the number of required evaluations, whereas baseline performance is recovered with uninformative or misleading priors. We conduct proof-of-concept experiments on a synthetic benchmark and on LCBench, a common multi-fidelity HPO benchmark for deep learning, to confirm our theoretical results, achieving up to 90% budget reduction while retaining solution quality. Together, our results provide a principled foundation for prior-guided and compute-efficient green AutoML.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims to provide the first distribution-dependent sample complexity bounds for multi-fidelity hyperparameter optimization with priors, derived via the fixed-budget best-arm identification framework. By modeling priors directly over arm means (as configuration performance), it derives explicit error bounds quantifying how informative priors (concentrating mass near optimal arms) reduce required evaluations, while uninformative or misleading priors recover baseline performance. Proof-of-concept experiments on synthetic data and LCBench are reported to achieve up to 90% budget reduction while retaining solution quality.

Significance. If the bounds correctly handle multi-fidelity structure, the work supplies a principled, quantitative link between prior informativeness and evaluation budget in HPO, offering a foundation for compute-efficient AutoML. The explicit distribution-dependent bounds and experimental validation on a standard benchmark constitute strengths that enable falsifiable predictions.

major comments (1)
  1. [theoretical analysis] Abstract and theoretical analysis: the claim of multi-fidelity HPO bounds with reduced sample cost is not supported by the stated modeling. The derivation places priors directly over arm means as configuration performance and produces distribution-dependent error bounds, but supplies no indication that a fidelity-dependent observation model, bias term, or cost-weighted allocation enters the bounds. If every evaluation is treated as a direct, unbiased observation of the target mean, the claimed reduction in total compute (rather than raw evaluation count) does not necessarily follow.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review. The feedback highlights an important gap between the modeling assumptions and the multi-fidelity claims, which we address directly below.

read point-by-point responses
  1. Referee: [theoretical analysis] Abstract and theoretical analysis: the claim of multi-fidelity HPO bounds with reduced sample cost is not supported by the stated modeling. The derivation places priors directly over arm means as configuration performance and produces distribution-dependent error bounds, but supplies no indication that a fidelity-dependent observation model, bias term, or cost-weighted allocation enters the bounds. If every evaluation is treated as a direct, unbiased observation of the target mean, the claimed reduction in total compute (rather than raw evaluation count) does not necessarily follow.

    Authors: We agree that the current theoretical derivation models each observation as an unbiased sample from the target arm mean and does not yet incorporate a fidelity-dependent observation model, bias terms for lower-fidelity evaluations, or explicit cost-weighted allocation. Consequently, the stated bounds quantify reductions in the number of evaluations rather than total computational cost. To resolve this, we will revise the theoretical analysis section to introduce a multi-fidelity observation model (with fidelity-specific bias and variance) and a cost function that enters the fixed-budget allocation rule. The revised bounds will then explicitly relate prior informativeness to reductions in total compute while recovering the original results as a special case when all evaluations are at full fidelity. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained in standard BAI framework; no reduction to inputs or self-citations

full rationale

The paper models priors directly over arm means within the fixed-budget best-arm identification setting and derives explicit distribution-dependent error bounds from that framework. No equations or steps are shown reducing by construction to fitted parameters, renamed predictions, or load-bearing self-citations; the claimed relationship between prior informativeness and evaluation budget follows from the standard BAI analysis once the prior is placed on the means. The multi-fidelity aspect is invoked but does not alter the core derivation into a circular form. This is the normal non-circular outcome for a theoretical bounds paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that priors can be modeled directly over arm means and that the fixed-budget best-arm identification lens applies to multi-fidelity HPO; no free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption Priors can be modeled directly over arm means representing configuration performance.
    Stated in the abstract as the modeling choice enabling the bounds.
  • domain assumption The fixed-budget best-arm identification framework applies to multi-fidelity HPO with priors.
    Described as the formal lens through which the sample complexity bounds are derived.

pith-pipeline@v0.9.1-grok · 5745 in / 1381 out tokens · 24480 ms · 2026-06-28T07:38:56.969151+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Atsidakou, S

    A. Atsidakou, S. Katariya, S. Sanghavi, and B. Kveton. Bayesian fixed-budget best-arm identification.arXiv preprint arXiv:2211.08572, 2022

  2. [2]

    Audibert and S

    J.-Y . Audibert and S. Bubeck. Best arm identification in multi-armed bandits. InCOLT-23th Conference on learning theory-2010, pages 13–p, 2010

  3. [3]

    Bischl, M

    B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A. Boulesteix, D. Deng, and M. Lindauer. Hyperparameter optimization: Foundations, algo- rithms, best practices, and open challenges.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, page e1484, 2023

  4. [4]

    Bohdal, L

    O. Bohdal, L. Balles, M. Wistuba, B. Ermis, C. Archambeau, and G. Zappella. PASHA: Efficient HPO and NAS with progressive resource allocation. InThe Eleventh International Conference on Learning Representations (ICLR’23). ICLR, 2023. Published online:iclr.cc

  5. [5]

    Brandt, E

    J. Brandt, E. Schede, B. Haddenhorst, V . Bengs, E. Hüllermeier, and K. Tierney. AC-Band: A combinatorial bandit-based approach to algorithm configuration. In B. Williams, S. Bernardini, Y . Chen, and J. Neville, editors,Proceedings of the Thirty-Seventh Conference on Artificial Intelligence (AAAI’23). Association for the Advancement of Artificial Intelli...

  6. [6]

    Brandt, M

    J. Brandt, M. Wever, V . Bengs, and E. Hüllermeier. Best arm identification with retroactively increased sampling budget for more resource-efficient hpo. In K. Larson, editor,Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI’24), 2024

  7. [7]

    Castellanos-Nieves and L

    D. Castellanos-Nieves and L. García-Forte. Strategies of automated machine learning for energy sustainability in green artificial intelligence.Applied Sciences, 14(14), 2024

  8. [8]

    Castellanos-Nieves and L

    D. Castellanos-Nieves and L. García-Forte. Improving automated machine-learning systems through green ai.Applied Sciences, 13(20), 2023

  9. [9]

    Eggensperger, P

    K. Eggensperger, P. Müller, N. Mallik, M. Feurer, R. Sass, A. Klein, N. Awad, M. Lindauer, and F. Hutter. HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Curran Associates, 2021. 10

  10. [10]

    AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon- tabular: Robust and accurate automl for structured data.arXiv:2003.06505 [stat.ML], 2020

  11. [11]

    Even-Dar, S

    E. Even-Dar, S. Mannor, and Y . Mansour. Pac bounds for multi-armed bandit and markov decision processes. InInternational Conference on Computational Learning Theory, pages 255–270. Springer, 2002

  12. [12]

    Falkner, G

    A. Falkner, G. Friedrich, K. Schekotihin, R. Taupe, and E. Teppan. Industrial applications of answer set programming.KI-Künstliche Intelligenz, pages 1–12, 2018

  13. [13]

    Feurer, J

    M. Feurer, J. Springenberg, and F. Hutter. Initializing Bayesian Hyperparameter Optimization via meta-learning. In B. Bonet and S. Koenig, editors,Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence (AAAI’15), pages 1128–1135. AAAI Press, 2015

  14. [14]

    Feurer, K

    M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands- free automl via meta-learning.Journal of Machine Learning Research, 23(261):1–61, 2022

  15. [15]

    Giovanelli, A

    J. Giovanelli, A. Tornede, T. Tornede, and M. Lindauer. Interactive hyperparameter opti- mization in multi-objective problems via preference learning. In M. J. Wooldridge, J. G. Dy, and S. Natarajan, editors,Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI’24). AAAI Press, 2024

  16. [16]

    Hutter, L

    F. Hutter, L. Kotthoff, and J. Vanschoren, editors.Automated Machine Learning: Methods, Systems, Challenges. Springer, 2019. Available for free athttp://automl.org/book

  17. [17]

    Hvarfner, D

    C. Hvarfner, D. Stoll, A. Souza, L. Nardi, M. Lindauer, and F. Hutter. πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization. InThe Tenth International Conference on Learning Representations (ICLR’22). ICLR, 2022. Published online: iclr.cc

  18. [18]

    Proceedings of the International Conference on Learning Representations (ICLR’17), 2017. ICLR. Published online:iclr.cc

  19. [19]

    Jamieson and A

    K. Jamieson and A. Talwalkar. Non-stochastic best arm identification and Hyperparameter Optimization. In A. Gretton and C. Robert, editors,Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS’16), volume 51. Proceedings of Machine Learning Research, 2016

  20. [20]

    Klein, S

    A. Klein, S. Falkner, J. Springenberg, and F. Hutter. Learning curve prediction with Bayesian neural networks. InThe Fifth International Conference on Learning Representations (ICLR’17) ICL [18]. Published online:iclr.cc

  21. [21]

    Lederer, J

    A. Lederer, J. Umlauft, and S. Hirche. Uniform error bounds for gaussian process regression with application to safe control.Advances in Neural Information Processing Systems, 32, 2019

  22. [22]

    F. Li, S. Wang, and K. Li. Lamda: Two-phase multi-fidelity hpo via learning promising regions from data. InProceedings of the AAAI Conference on Artificial Intelligence, 2026

  23. [23]

    L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: Bandit- based configuration evaluation for Hyperparameter Optimization. InThe Fifth International Conference on Learning Representations (ICLR’17)ICL [18]. Published online:iclr.cc

  24. [24]

    L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: A novel bandit-based approach to Hyperparameter Optimization.Journal of Machine Learning Research, 18(185):1–52, 2018

  25. [25]

    Mallik, C

    N. Mallik, C. Hvarfner, D. Stoll, M. Janowski, E. Bergman, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: HyperBand + human expert knowledge. In F. Ferreira, Q. Lei, E. Triantafillou, J. Vanschoren, and H. Yao, editors,NeurIPS 2022 Workshop on Meta-Learning, 2022

  26. [26]

    Mallik, C

    N. Mallik, C. Hvarfner, E. Bergman, D. Stoll, M. Janowski, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: Practical hyperparameter optimization in the age of deep learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ...

  27. [27]

    Mohr and M

    F. Mohr and M. Wever. Naive automated machine learning.Machine Learning, 112, 2023

  28. [28]

    Müller, M

    S. Müller, M. Feurer, N. Hollmann, and F. Hutter. PFNs4BO: In-Context Learning for Bayesian Optimization. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning (ICML’23), volume 202 ofProceedings of Machine Learning Research. PMLR, 2023

  29. [29]

    Ordoumpozanis and G

    K. Ordoumpozanis and G. A. Papakostas. Green ai: Assessing the carbon footprint of fine- tuning pre-trained deep learning models in medical imaging. In2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024

  30. [30]

    Perrone, H

    V . Perrone, H. Shen, M. Seeger, C. Archambeau, and R. Jenatton. Learning search spaces for bayesian optimization: Another view of hyperparameter transfer learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alche Buc, E. Fox, and R. Garnett, editors,Proceedings of the 33rd International Conference on Advances in Neural Information Processing Syst...

  31. [31]

    Pfisterer, L

    F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder, and B. Bischl. YAHPO Gym – an effi- cient multi-objective multi-fidelity benchmark for hyperparameter optimization. In I. Guyon, M. Lindauer, M. van der Schaar, F. Hutter, and R. Garnett, editors,Proceedings of the First International Conference on Automated Machine Learning. Proceedings of Machine Lear...

  32. [32]

    Rakotoarison, S

    H. Rakotoarison, S. Adriaensen, N. Mallik, S. Garibov, E. Bergman, and F. Hutter. In-context freeze-thaw bayesian optimization for hyperparameter optimization. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning (ICML’24), volume 251 o...

  33. [33]

    Rasmussen and C

    C. Rasmussen and C. Williams.Gaussian Processes for Machine Learning. The MIT Press, 2006

  34. [34]

    Rothfuss, V

    J. Rothfuss, V . Fortuin, M. Josifoski, and A. Krause. Pacoh: Bayes-optimal meta-learning with pac-guarantees. InInternational Conference on Machine Learning, pages 9116–9126. PMLR, 2021

  35. [35]

    Schede, J

    E. Schede, J. Brandt, A. Tornede, M. Wever, V . Bengs, E. Hüllermeier, and K. Tierney. A survey of methods for automated algorithm configuration.jair, 75:425–487, 2022

  36. [36]

    Schwartz, J

    R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni. Green AI.arXiv:1907.10597v3 [cs.CY], 2019

  37. [37]

    J. Seng, F. Ventola, Z. Yu, and K. Kersting. Hyperparameter optimization via interacting with probabilistic circuits. In R. Garnett, C. Doerr, J. van Rijn, and L. Akoglu, editors,Proceedings of the Third International Conference on Automated Machine Learning. Proceedings of Machine Learning Research, 2025

  38. [38]

    Souza, L

    A. Souza, L. Nardi, L. Oliveira, K. Olukotun, M. Lindauer, and F. Hutter. Bayesian optimization with a prior for the optimum. In N. Oliver, F. Pérez-Cruz, S. Kramer, J. Read, and J. A. Lozano, editors,Machine Learning and Knowledge Discovery in Databases. Research Track, volume 12975 ofLecture Notes in Artificial Intelligence, page 265–296. Springer-Verlag, 2021

  39. [39]

    Srinivas, A

    N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning (ICML’10), pages 1015–1022. Omnipress, 2010

  40. [40]

    Tornede, A

    T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever, and E. Hüllermeier. Towards green automated machine learning: Status quo and future directions.Journal of Artificial Intelligence Research, 77:427–457, 2023. 12

  41. [41]

    van Rijn and F

    J. van Rijn and F. Hutter. Hyperparameter importance across datasets. In Y . Guo and F. Fa- rooq, editors,Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), pages 2367–2376. ACM Press, 2018

  42. [42]

    Zimmer, M

    L. Zimmer, M. Lindauer, and F. Hutter. Auto-Pytorch: Multi-fidelity metalearning for efficient and robust AutoDL.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43: 3079–3090, 2021. 13 Overview of the Appendix A List of Symbols 15 B Theoretical Guarantees 16 B.1 Expected behavior of Prior-Guided Successive Halving . . . . . . . . . . . . ....

  43. [43]

    ln 2 log2(K)( K 2 −1) δ ! − (νj∗ −ν j) max{ϵ, µj∗ −µ j} 2σ2 0 # . (20) Proof.Case 1: Let for allj∈ K\{j ∗}:µ j∗ −µ j ≥ϵ. Then we have N≥max j∈K\{j ∗} 4RΣ (µj∗ −µ j)2 ·

    and arms are eliminated based on their posterior means (Algorithm 2). Let j∗ denote the optimal arm. We define the error eventEϵ as returning an arm j such that µj∗ −µ j > ϵ. The expected probability of error is bounded by summing the risk over allRrounds: E[P(Eϵ |µ)]≤ R−1X r=0 Cr,ϵ X j∈K,j̸=j ∗ exp −(νj∗ −ν j)2 4σ2 0 exp − nrϵ2 4Σ (10) Proof. The algorit...