Provably Reduced Sample Cost in Prior-Guided Hyperparameter Optimization

Barbara Hammer; Jasmin Brandt; Leona Hennig; Lukas Fehring; Marcel Wever; Marius Lindauer

arxiv: 2606.04866 · v1 · pith:ANV5B7YPnew · submitted 2026-06-03 · 💻 cs.LG

Provably Reduced Sample Cost in Prior-Guided Hyperparameter Optimization

Leona Hennig , Jasmin Brandt , Lukas Fehring , Barbara Hammer , Marius Lindauer , Marcel Wever This is my paper

Pith reviewed 2026-06-28 07:38 UTC · model grok-4.3

classification 💻 cs.LG

keywords hyperparameter optimizationprior-guided HPObest-arm identificationsample complexitymulti-fidelity optimizationdistribution-dependent boundsAutoML

0 comments

The pith

Informative priors concentrating probability mass on near-optimal arms reduce the evaluations needed in multi-fidelity hyperparameter optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that modeling priors directly over arm means as configuration performance allows derivation of distribution-dependent sample complexity bounds in the fixed-budget best-arm identification setting. These bounds show that priors concentrating on near-optimal configurations lower the number of required evaluations, while uninformative or misleading priors recover standard baseline performance. A reader would care because large-scale HPO in AutoML is computationally expensive, and this supplies the first quantitative link between prior informativeness and evaluation budget. Proof-of-concept runs on a synthetic benchmark and LCBench achieve up to 90 percent budget reduction while preserving solution quality.

Core claim

By modeling priors directly over arm means as configuration performance within the fixed-budget best-arm identification framework, we derive explicit distribution-dependent error bounds that quantify the relationship between prior informativeness and evaluation budget. Our analysis shows that informative priors, which concentrate probability mass on near-optimal arms, yield reductions in the number of required evaluations, whereas baseline performance is recovered with uninformative or misleading priors.

What carries the argument

Fixed-budget best-arm identification framework with priors modeled over arm means, yielding distribution-dependent sample complexity bounds.

If this is right

Informative priors reduce the number of evaluations required for multi-fidelity HPO.
Uninformative or misleading priors produce no improvement and recover baseline performance.
Up to 90 percent budget reduction is observed on synthetic and LCBench benchmarks while retaining solution quality.
The bounds supply a principled foundation for incorporating priors into compute-efficient AutoML.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounds could guide construction or selection of priors that maximize efficiency in new HPO tasks.
The same modeling approach may extend to quantify prior benefits in other black-box optimization problems.
The results support analysis of energy savings when prior information is available in large-scale ML pipelines.

Load-bearing premise

Priors can be modeled directly over the means of arm performances inside a fixed-budget best-arm identification setup to produce explicit error bounds linking prior quality to sample needs.

What would settle it

An experiment in which priors concentrating on near-optimal arms produce no reduction in evaluations below the no-prior baseline, or in which observed sample needs violate the derived bounds.

Figures

Figures reproduced from arXiv: 2606.04866 by Barbara Hammer, Jasmin Brandt, Leona Hennig, Lukas Fehring, Marcel Wever, Marius Lindauer.

**Figure 2.** Figure 2: Learning curves of arms in the synthetic benchmark. Each arm [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: Hyperband performance across prior types on regret (left) and probability of [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗

**Figure 4.** Figure 4: Regret for PriorBand with the PSH early stopping criterion across prior types, plotted [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗

**Figure 5.** Figure 5: Sensitivity analysis of the required prior gap across confidence ( [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗

**Figure 6.** Figure 6: Regret and consumed budget of PSH on LCBench with varying GP surrogate kernels. [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: Regret distributions and consumed budgets for all PSH prior generation schemes under [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗

read the original abstract

Large-scale hyperparameter optimization (HPO) in automated machine learning (AutoML) consumes substantial computational resources, raising growing concerns about scalability and energy efficiency. Existing methods use prior information heuristically to accelerate both black-box and multi-fidelity settings, but they lack a characterization of how prior informativeness quantitatively reduces sample complexity. In this work, we provide the first distribution-dependent sample complexity bounds for multi-fidelity HPO with priors through the formal lens of fixed-budget best-arm identification. By modeling priors directly over arm means as configuration performance, we derive explicit, distribution-dependent error bounds that quantify the relationship between priors and evaluation budget. Our analysis shows that informative priors, which concentrate probability mass on near-optimal arms, yield reductions in the number of required evaluations, whereas baseline performance is recovered with uninformative or misleading priors. We conduct proof-of-concept experiments on a synthetic benchmark and on LCBench, a common multi-fidelity HPO benchmark for deep learning, to confirm our theoretical results, achieving up to 90% budget reduction while retaining solution quality. Together, our results provide a principled foundation for prior-guided and compute-efficient green AutoML.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies the first distribution-dependent bounds linking prior quality to sample cost in multi-fidelity HPO via best-arm identification, but the multi-fidelity bias and cost modeling look underspecified.

read the letter

The main contribution is a set of explicit, distribution-dependent error bounds for fixed-budget best-arm identification when priors are placed directly on arm means. Informative priors that concentrate mass near the optimum reduce the number of evaluations needed; uninformative or bad priors recover the usual rates. That framing is new for this setting and gives a quantitative handle on something that has mostly been handled heuristically.

The experiments on synthetic data and LCBench are presented as confirmation and report up to 90% budget reduction while keeping solution quality. That is useful evidence if the runs actually track the derived bounds.

The soft spot is the multi-fidelity part. The abstract models priors on configuration performance but gives no indication that the bounds include a fidelity-dependent observation model, bias correction, or cost-weighted allocation. If every evaluation is treated as an unbiased draw from the target mean, the claimed reduction in total compute does not automatically follow from a reduction in raw evaluation count. The stress-test note on this point looks like it lands.

The paper is aimed at researchers who want a principled account of prior-guided HPO rather than another heuristic method. It shows clear engagement with the best-arm identification literature and the formal steps are worth checking, so it deserves a serious referee even if the multi-fidelity modeling needs tightening.

Referee Report

1 major / 0 minor

Summary. The paper claims to provide the first distribution-dependent sample complexity bounds for multi-fidelity hyperparameter optimization with priors, derived via the fixed-budget best-arm identification framework. By modeling priors directly over arm means (as configuration performance), it derives explicit error bounds quantifying how informative priors (concentrating mass near optimal arms) reduce required evaluations, while uninformative or misleading priors recover baseline performance. Proof-of-concept experiments on synthetic data and LCBench are reported to achieve up to 90% budget reduction while retaining solution quality.

Significance. If the bounds correctly handle multi-fidelity structure, the work supplies a principled, quantitative link between prior informativeness and evaluation budget in HPO, offering a foundation for compute-efficient AutoML. The explicit distribution-dependent bounds and experimental validation on a standard benchmark constitute strengths that enable falsifiable predictions.

major comments (1)

[theoretical analysis] Abstract and theoretical analysis: the claim of multi-fidelity HPO bounds with reduced sample cost is not supported by the stated modeling. The derivation places priors directly over arm means as configuration performance and produces distribution-dependent error bounds, but supplies no indication that a fidelity-dependent observation model, bias term, or cost-weighted allocation enters the bounds. If every evaluation is treated as a direct, unbiased observation of the target mean, the claimed reduction in total compute (rather than raw evaluation count) does not necessarily follow.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review. The feedback highlights an important gap between the modeling assumptions and the multi-fidelity claims, which we address directly below.

read point-by-point responses

Referee: [theoretical analysis] Abstract and theoretical analysis: the claim of multi-fidelity HPO bounds with reduced sample cost is not supported by the stated modeling. The derivation places priors directly over arm means as configuration performance and produces distribution-dependent error bounds, but supplies no indication that a fidelity-dependent observation model, bias term, or cost-weighted allocation enters the bounds. If every evaluation is treated as a direct, unbiased observation of the target mean, the claimed reduction in total compute (rather than raw evaluation count) does not necessarily follow.

Authors: We agree that the current theoretical derivation models each observation as an unbiased sample from the target arm mean and does not yet incorporate a fidelity-dependent observation model, bias terms for lower-fidelity evaluations, or explicit cost-weighted allocation. Consequently, the stated bounds quantify reductions in the number of evaluations rather than total computational cost. To resolve this, we will revise the theoretical analysis section to introduce a multi-fidelity observation model (with fidelity-specific bias and variance) and a cost function that enters the fixed-budget allocation rule. The revised bounds will then explicitly relate prior informativeness to reductions in total compute while recovering the original results as a special case when all evaluations are at full fidelity. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained in standard BAI framework; no reduction to inputs or self-citations

full rationale

The paper models priors directly over arm means within the fixed-budget best-arm identification setting and derives explicit distribution-dependent error bounds from that framework. No equations or steps are shown reducing by construction to fitted parameters, renamed predictions, or load-bearing self-citations; the claimed relationship between prior informativeness and evaluation budget follows from the standard BAI analysis once the prior is placed on the means. The multi-fidelity aspect is invoked but does not alter the core derivation into a circular form. This is the normal non-circular outcome for a theoretical bounds paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that priors can be modeled directly over arm means and that the fixed-budget best-arm identification lens applies to multi-fidelity HPO; no free parameters or invented entities are mentioned in the abstract.

axioms (2)

domain assumption Priors can be modeled directly over arm means representing configuration performance.
Stated in the abstract as the modeling choice enabling the bounds.
domain assumption The fixed-budget best-arm identification framework applies to multi-fidelity HPO with priors.
Described as the formal lens through which the sample complexity bounds are derived.

pith-pipeline@v0.9.1-grok · 5745 in / 1381 out tokens · 24480 ms · 2026-06-28T07:38:56.969151+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

43 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Atsidakou, S

A. Atsidakou, S. Katariya, S. Sanghavi, and B. Kveton. Bayesian fixed-budget best-arm identification.arXiv preprint arXiv:2211.08572, 2022

work page arXiv 2022
[2]

Audibert and S

J.-Y . Audibert and S. Bubeck. Best arm identification in multi-armed bandits. InCOLT-23th Conference on learning theory-2010, pages 13–p, 2010

2010
[3]

Bischl, M

B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A. Boulesteix, D. Deng, and M. Lindauer. Hyperparameter optimization: Foundations, algo- rithms, best practices, and open challenges.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, page e1484, 2023

2023
[4]

Bohdal, L

O. Bohdal, L. Balles, M. Wistuba, B. Ermis, C. Archambeau, and G. Zappella. PASHA: Efficient HPO and NAS with progressive resource allocation. InThe Eleventh International Conference on Learning Representations (ICLR’23). ICLR, 2023. Published online:iclr.cc

2023
[5]

Brandt, E

J. Brandt, E. Schede, B. Haddenhorst, V . Bengs, E. Hüllermeier, and K. Tierney. AC-Band: A combinatorial bandit-based approach to algorithm configuration. In B. Williams, S. Bernardini, Y . Chen, and J. Neville, editors,Proceedings of the Thirty-Seventh Conference on Artificial Intelligence (AAAI’23). Association for the Advancement of Artificial Intelli...

2023
[6]

Brandt, M

J. Brandt, M. Wever, V . Bengs, and E. Hüllermeier. Best arm identification with retroactively increased sampling budget for more resource-efficient hpo. In K. Larson, editor,Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI’24), 2024

2024
[7]

Castellanos-Nieves and L

D. Castellanos-Nieves and L. García-Forte. Strategies of automated machine learning for energy sustainability in green artificial intelligence.Applied Sciences, 14(14), 2024

2024
[8]

Castellanos-Nieves and L

D. Castellanos-Nieves and L. García-Forte. Improving automated machine-learning systems through green ai.Applied Sciences, 13(20), 2023

2023
[9]

Eggensperger, P

K. Eggensperger, P. Müller, N. Mallik, M. Feurer, R. Sass, A. Klein, N. Awad, M. Lindauer, and F. Hutter. HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Curran Associates, 2021. 10

2021
[10]

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon- tabular: Robust and accurate automl for structured data.arXiv:2003.06505 [stat.ML], 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003
[11]

Even-Dar, S

E. Even-Dar, S. Mannor, and Y . Mansour. Pac bounds for multi-armed bandit and markov decision processes. InInternational Conference on Computational Learning Theory, pages 255–270. Springer, 2002

2002
[12]

Falkner, G

A. Falkner, G. Friedrich, K. Schekotihin, R. Taupe, and E. Teppan. Industrial applications of answer set programming.KI-Künstliche Intelligenz, pages 1–12, 2018

2018
[13]

Feurer, J

M. Feurer, J. Springenberg, and F. Hutter. Initializing Bayesian Hyperparameter Optimization via meta-learning. In B. Bonet and S. Koenig, editors,Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence (AAAI’15), pages 1128–1135. AAAI Press, 2015

2015
[14]

Feurer, K

M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands- free automl via meta-learning.Journal of Machine Learning Research, 23(261):1–61, 2022

2022
[15]

Giovanelli, A

J. Giovanelli, A. Tornede, T. Tornede, and M. Lindauer. Interactive hyperparameter opti- mization in multi-objective problems via preference learning. In M. J. Wooldridge, J. G. Dy, and S. Natarajan, editors,Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI’24). AAAI Press, 2024

2024
[16]

Hutter, L

F. Hutter, L. Kotthoff, and J. Vanschoren, editors.Automated Machine Learning: Methods, Systems, Challenges. Springer, 2019. Available for free athttp://automl.org/book

2019
[17]

Hvarfner, D

C. Hvarfner, D. Stoll, A. Souza, L. Nardi, M. Lindauer, and F. Hutter. πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization. InThe Tenth International Conference on Learning Representations (ICLR’22). ICLR, 2022. Published online: iclr.cc

2022
[18]

Proceedings of the International Conference on Learning Representations (ICLR’17), 2017. ICLR. Published online:iclr.cc

2017
[19]

Jamieson and A

K. Jamieson and A. Talwalkar. Non-stochastic best arm identification and Hyperparameter Optimization. In A. Gretton and C. Robert, editors,Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS’16), volume 51. Proceedings of Machine Learning Research, 2016

2016
[20]

Klein, S

A. Klein, S. Falkner, J. Springenberg, and F. Hutter. Learning curve prediction with Bayesian neural networks. InThe Fifth International Conference on Learning Representations (ICLR’17) ICL [18]. Published online:iclr.cc
[21]

Lederer, J

A. Lederer, J. Umlauft, and S. Hirche. Uniform error bounds for gaussian process regression with application to safe control.Advances in Neural Information Processing Systems, 32, 2019

2019
[22]

F. Li, S. Wang, and K. Li. Lamda: Two-phase multi-fidelity hpo via learning promising regions from data. InProceedings of the AAAI Conference on Artificial Intelligence, 2026

2026
[23]

L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: Bandit- based configuration evaluation for Hyperparameter Optimization. InThe Fifth International Conference on Learning Representations (ICLR’17)ICL [18]. Published online:iclr.cc
[24]

L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: A novel bandit-based approach to Hyperparameter Optimization.Journal of Machine Learning Research, 18(185):1–52, 2018

2018
[25]

Mallik, C

N. Mallik, C. Hvarfner, D. Stoll, M. Janowski, E. Bergman, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: HyperBand + human expert knowledge. In F. Ferreira, Q. Lei, E. Triantafillou, J. Vanschoren, and H. Yao, editors,NeurIPS 2022 Workshop on Meta-Learning, 2022

2022
[26]

Mallik, C

N. Mallik, C. Hvarfner, E. Bergman, D. Stoll, M. Janowski, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: Practical hyperparameter optimization in the age of deep learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ...

2023
[27]

Mohr and M

F. Mohr and M. Wever. Naive automated machine learning.Machine Learning, 112, 2023

2023
[28]

Müller, M

S. Müller, M. Feurer, N. Hollmann, and F. Hutter. PFNs4BO: In-Context Learning for Bayesian Optimization. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning (ICML’23), volume 202 ofProceedings of Machine Learning Research. PMLR, 2023

2023
[29]

Ordoumpozanis and G

K. Ordoumpozanis and G. A. Papakostas. Green ai: Assessing the carbon footprint of fine- tuning pre-trained deep learning models in medical imaging. In2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024

2024
[30]

Perrone, H

V . Perrone, H. Shen, M. Seeger, C. Archambeau, and R. Jenatton. Learning search spaces for bayesian optimization: Another view of hyperparameter transfer learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alche Buc, E. Fox, and R. Garnett, editors,Proceedings of the 33rd International Conference on Advances in Neural Information Processing Syst...

2019
[31]

Pfisterer, L

F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder, and B. Bischl. YAHPO Gym – an effi- cient multi-objective multi-fidelity benchmark for hyperparameter optimization. In I. Guyon, M. Lindauer, M. van der Schaar, F. Hutter, and R. Garnett, editors,Proceedings of the First International Conference on Automated Machine Learning. Proceedings of Machine Lear...

2022
[32]

Rakotoarison, S

H. Rakotoarison, S. Adriaensen, N. Mallik, S. Garibov, E. Bergman, and F. Hutter. In-context freeze-thaw bayesian optimization for hyperparameter optimization. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning (ICML’24), volume 251 o...

2024
[33]

Rasmussen and C

C. Rasmussen and C. Williams.Gaussian Processes for Machine Learning. The MIT Press, 2006

2006
[34]

Rothfuss, V

J. Rothfuss, V . Fortuin, M. Josifoski, and A. Krause. Pacoh: Bayes-optimal meta-learning with pac-guarantees. InInternational Conference on Machine Learning, pages 9116–9126. PMLR, 2021

2021
[35]

Schede, J

E. Schede, J. Brandt, A. Tornede, M. Wever, V . Bengs, E. Hüllermeier, and K. Tierney. A survey of methods for automated algorithm configuration.jair, 75:425–487, 2022

2022
[36]

Schwartz, J

R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni. Green AI.arXiv:1907.10597v3 [cs.CY], 2019

work page arXiv 1907
[37]

J. Seng, F. Ventola, Z. Yu, and K. Kersting. Hyperparameter optimization via interacting with probabilistic circuits. In R. Garnett, C. Doerr, J. van Rijn, and L. Akoglu, editors,Proceedings of the Third International Conference on Automated Machine Learning. Proceedings of Machine Learning Research, 2025

2025
[38]

Souza, L

A. Souza, L. Nardi, L. Oliveira, K. Olukotun, M. Lindauer, and F. Hutter. Bayesian optimization with a prior for the optimum. In N. Oliver, F. Pérez-Cruz, S. Kramer, J. Read, and J. A. Lozano, editors,Machine Learning and Knowledge Discovery in Databases. Research Track, volume 12975 ofLecture Notes in Artificial Intelligence, page 265–296. Springer-Verlag, 2021

2021
[39]

Srinivas, A

N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning (ICML’10), pages 1015–1022. Omnipress, 2010

2010
[40]

Tornede, A

T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever, and E. Hüllermeier. Towards green automated machine learning: Status quo and future directions.Journal of Artificial Intelligence Research, 77:427–457, 2023. 12

2023
[41]

van Rijn and F

J. van Rijn and F. Hutter. Hyperparameter importance across datasets. In Y . Guo and F. Fa- rooq, editors,Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), pages 2367–2376. ACM Press, 2018

2018
[42]

Zimmer, M

L. Zimmer, M. Lindauer, and F. Hutter. Auto-Pytorch: Multi-fidelity metalearning for efficient and robust AutoDL.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43: 3079–3090, 2021. 13 Overview of the Appendix A List of Symbols 15 B Theoretical Guarantees 16 B.1 Expected behavior of Prior-Guided Successive Halving . . . . . . . . . . . . ....

2021
[43]

ln 2 log2(K)( K 2 −1) δ ! − (νj∗ −ν j) max{ϵ, µj∗ −µ j} 2σ2 0 # . (20) Proof.Case 1: Let for allj∈ K\{j ∗}:µ j∗ −µ j ≥ϵ. Then we have N≥max j∈K\{j ∗} 4RΣ (µj∗ −µ j)2 ·

and arms are eliminated based on their posterior means (Algorithm 2). Let j∗ denote the optimal arm. We define the error eventEϵ as returning an arm j such that µj∗ −µ j > ϵ. The expected probability of error is bounded by summing the risk over allRrounds: E[P(Eϵ |µ)]≤ R−1X r=0 Cr,ϵ X j∈K,j̸=j ∗ exp −(νj∗ −ν j)2 4σ2 0 exp − nrϵ2 4Σ (10) Proof. The algorit...

[1] [1]

Atsidakou, S

A. Atsidakou, S. Katariya, S. Sanghavi, and B. Kveton. Bayesian fixed-budget best-arm identification.arXiv preprint arXiv:2211.08572, 2022

work page arXiv 2022

[2] [2]

Audibert and S

J.-Y . Audibert and S. Bubeck. Best arm identification in multi-armed bandits. InCOLT-23th Conference on learning theory-2010, pages 13–p, 2010

2010

[3] [3]

Bischl, M

B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A. Boulesteix, D. Deng, and M. Lindauer. Hyperparameter optimization: Foundations, algo- rithms, best practices, and open challenges.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, page e1484, 2023

2023

[4] [4]

Bohdal, L

O. Bohdal, L. Balles, M. Wistuba, B. Ermis, C. Archambeau, and G. Zappella. PASHA: Efficient HPO and NAS with progressive resource allocation. InThe Eleventh International Conference on Learning Representations (ICLR’23). ICLR, 2023. Published online:iclr.cc

2023

[5] [5]

Brandt, E

J. Brandt, E. Schede, B. Haddenhorst, V . Bengs, E. Hüllermeier, and K. Tierney. AC-Band: A combinatorial bandit-based approach to algorithm configuration. In B. Williams, S. Bernardini, Y . Chen, and J. Neville, editors,Proceedings of the Thirty-Seventh Conference on Artificial Intelligence (AAAI’23). Association for the Advancement of Artificial Intelli...

2023

[6] [6]

Brandt, M

J. Brandt, M. Wever, V . Bengs, and E. Hüllermeier. Best arm identification with retroactively increased sampling budget for more resource-efficient hpo. In K. Larson, editor,Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI’24), 2024

2024

[7] [7]

Castellanos-Nieves and L

D. Castellanos-Nieves and L. García-Forte. Strategies of automated machine learning for energy sustainability in green artificial intelligence.Applied Sciences, 14(14), 2024

2024

[8] [8]

Castellanos-Nieves and L

D. Castellanos-Nieves and L. García-Forte. Improving automated machine-learning systems through green ai.Applied Sciences, 13(20), 2023

2023

[9] [9]

Eggensperger, P

K. Eggensperger, P. Müller, N. Mallik, M. Feurer, R. Sass, A. Klein, N. Awad, M. Lindauer, and F. Hutter. HPOBench: A collection of reproducible multi-fidelity benchmark problems for HPO. In J. Vanschoren and S. Yeung, editors,Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks. Curran Associates, 2021. 10

2021

[10] [10]

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon- tabular: Robust and accurate automl for structured data.arXiv:2003.06505 [stat.ML], 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003

[11] [11]

Even-Dar, S

E. Even-Dar, S. Mannor, and Y . Mansour. Pac bounds for multi-armed bandit and markov decision processes. InInternational Conference on Computational Learning Theory, pages 255–270. Springer, 2002

2002

[12] [12]

Falkner, G

A. Falkner, G. Friedrich, K. Schekotihin, R. Taupe, and E. Teppan. Industrial applications of answer set programming.KI-Künstliche Intelligenz, pages 1–12, 2018

2018

[13] [13]

Feurer, J

M. Feurer, J. Springenberg, and F. Hutter. Initializing Bayesian Hyperparameter Optimization via meta-learning. In B. Bonet and S. Koenig, editors,Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence (AAAI’15), pages 1128–1135. AAAI Press, 2015

2015

[14] [14]

Feurer, K

M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands- free automl via meta-learning.Journal of Machine Learning Research, 23(261):1–61, 2022

2022

[15] [15]

Giovanelli, A

J. Giovanelli, A. Tornede, T. Tornede, and M. Lindauer. Interactive hyperparameter opti- mization in multi-objective problems via preference learning. In M. J. Wooldridge, J. G. Dy, and S. Natarajan, editors,Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI’24). AAAI Press, 2024

2024

[16] [16]

Hutter, L

F. Hutter, L. Kotthoff, and J. Vanschoren, editors.Automated Machine Learning: Methods, Systems, Challenges. Springer, 2019. Available for free athttp://automl.org/book

2019

[17] [17]

Hvarfner, D

C. Hvarfner, D. Stoll, A. Souza, L. Nardi, M. Lindauer, and F. Hutter. πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization. InThe Tenth International Conference on Learning Representations (ICLR’22). ICLR, 2022. Published online: iclr.cc

2022

[18] [18]

Proceedings of the International Conference on Learning Representations (ICLR’17), 2017. ICLR. Published online:iclr.cc

2017

[19] [19]

Jamieson and A

K. Jamieson and A. Talwalkar. Non-stochastic best arm identification and Hyperparameter Optimization. In A. Gretton and C. Robert, editors,Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS’16), volume 51. Proceedings of Machine Learning Research, 2016

2016

[20] [20]

Klein, S

A. Klein, S. Falkner, J. Springenberg, and F. Hutter. Learning curve prediction with Bayesian neural networks. InThe Fifth International Conference on Learning Representations (ICLR’17) ICL [18]. Published online:iclr.cc

[21] [21]

Lederer, J

A. Lederer, J. Umlauft, and S. Hirche. Uniform error bounds for gaussian process regression with application to safe control.Advances in Neural Information Processing Systems, 32, 2019

2019

[22] [22]

F. Li, S. Wang, and K. Li. Lamda: Two-phase multi-fidelity hpo via learning promising regions from data. InProceedings of the AAAI Conference on Artificial Intelligence, 2026

2026

[23] [23]

L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: Bandit- based configuration evaluation for Hyperparameter Optimization. InThe Fifth International Conference on Learning Representations (ICLR’17)ICL [18]. Published online:iclr.cc

[24] [24]

L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband: A novel bandit-based approach to Hyperparameter Optimization.Journal of Machine Learning Research, 18(185):1–52, 2018

2018

[25] [25]

Mallik, C

N. Mallik, C. Hvarfner, D. Stoll, M. Janowski, E. Bergman, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: HyperBand + human expert knowledge. In F. Ferreira, Q. Lei, E. Triantafillou, J. Vanschoren, and H. Yao, editors,NeurIPS 2022 Workshop on Meta-Learning, 2022

2022

[26] [26]

Mallik, C

N. Mallik, C. Hvarfner, E. Bergman, D. Stoll, M. Janowski, M. Lindauer, L. Nardi, and F. Hutter. PriorBand: Practical hyperparameter optimization in the age of deep learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Proceedings of the 37th International Conference on Advances in Neural Information Processing Systems ...

2023

[27] [27]

Mohr and M

F. Mohr and M. Wever. Naive automated machine learning.Machine Learning, 112, 2023

2023

[28] [28]

Müller, M

S. Müller, M. Feurer, N. Hollmann, and F. Hutter. PFNs4BO: In-Context Learning for Bayesian Optimization. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning (ICML’23), volume 202 ofProceedings of Machine Learning Research. PMLR, 2023

2023

[29] [29]

Ordoumpozanis and G

K. Ordoumpozanis and G. A. Papakostas. Green ai: Assessing the carbon footprint of fine- tuning pre-trained deep learning models in medical imaging. In2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024

2024

[30] [30]

Perrone, H

V . Perrone, H. Shen, M. Seeger, C. Archambeau, and R. Jenatton. Learning search spaces for bayesian optimization: Another view of hyperparameter transfer learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alche Buc, E. Fox, and R. Garnett, editors,Proceedings of the 33rd International Conference on Advances in Neural Information Processing Syst...

2019

[31] [31]

Pfisterer, L

F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder, and B. Bischl. YAHPO Gym – an effi- cient multi-objective multi-fidelity benchmark for hyperparameter optimization. In I. Guyon, M. Lindauer, M. van der Schaar, F. Hutter, and R. Garnett, editors,Proceedings of the First International Conference on Automated Machine Learning. Proceedings of Machine Lear...

2022

[32] [32]

Rakotoarison, S

H. Rakotoarison, S. Adriaensen, N. Mallik, S. Garibov, E. Bergman, and F. Hutter. In-context freeze-thaw bayesian optimization for hyperparameter optimization. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning (ICML’24), volume 251 o...

2024

[33] [33]

Rasmussen and C

C. Rasmussen and C. Williams.Gaussian Processes for Machine Learning. The MIT Press, 2006

2006

[34] [34]

Rothfuss, V

J. Rothfuss, V . Fortuin, M. Josifoski, and A. Krause. Pacoh: Bayes-optimal meta-learning with pac-guarantees. InInternational Conference on Machine Learning, pages 9116–9126. PMLR, 2021

2021

[35] [35]

Schede, J

E. Schede, J. Brandt, A. Tornede, M. Wever, V . Bengs, E. Hüllermeier, and K. Tierney. A survey of methods for automated algorithm configuration.jair, 75:425–487, 2022

2022

[36] [36]

Schwartz, J

R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni. Green AI.arXiv:1907.10597v3 [cs.CY], 2019

work page arXiv 1907

[37] [37]

J. Seng, F. Ventola, Z. Yu, and K. Kersting. Hyperparameter optimization via interacting with probabilistic circuits. In R. Garnett, C. Doerr, J. van Rijn, and L. Akoglu, editors,Proceedings of the Third International Conference on Automated Machine Learning. Proceedings of Machine Learning Research, 2025

2025

[38] [38]

Souza, L

A. Souza, L. Nardi, L. Oliveira, K. Olukotun, M. Lindauer, and F. Hutter. Bayesian optimization with a prior for the optimum. In N. Oliver, F. Pérez-Cruz, S. Kramer, J. Read, and J. A. Lozano, editors,Machine Learning and Knowledge Discovery in Databases. Research Track, volume 12975 ofLecture Notes in Artificial Intelligence, page 265–296. Springer-Verlag, 2021

2021

[39] [39]

Srinivas, A

N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning (ICML’10), pages 1015–1022. Omnipress, 2010

2010

[40] [40]

Tornede, A

T. Tornede, A. Tornede, J. Hanselle, F. Mohr, M. Wever, and E. Hüllermeier. Towards green automated machine learning: Status quo and future directions.Journal of Artificial Intelligence Research, 77:427–457, 2023. 12

2023

[41] [41]

van Rijn and F

J. van Rijn and F. Hutter. Hyperparameter importance across datasets. In Y . Guo and F. Fa- rooq, editors,Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), pages 2367–2376. ACM Press, 2018

2018

[42] [42]

Zimmer, M

L. Zimmer, M. Lindauer, and F. Hutter. Auto-Pytorch: Multi-fidelity metalearning for efficient and robust AutoDL.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43: 3079–3090, 2021. 13 Overview of the Appendix A List of Symbols 15 B Theoretical Guarantees 16 B.1 Expected behavior of Prior-Guided Successive Halving . . . . . . . . . . . . ....

2021

[43] [43]

ln 2 log2(K)( K 2 −1) δ ! − (νj∗ −ν j) max{ϵ, µj∗ −µ j} 2σ2 0 # . (20) Proof.Case 1: Let for allj∈ K\{j ∗}:µ j∗ −µ j ≥ϵ. Then we have N≥max j∈K\{j ∗} 4RΣ (µj∗ −µ j)2 ·

and arms are eliminated based on their posterior means (Algorithm 2). Let j∗ denote the optimal arm. We define the error eventEϵ as returning an arm j such that µj∗ −µ j > ϵ. The expected probability of error is bounded by summing the risk over allRrounds: E[P(Eϵ |µ)]≤ R−1X r=0 Cr,ϵ X j∈K,j̸=j ∗ exp −(νj∗ −ν j)2 4σ2 0 exp − nrϵ2 4Σ (10) Proof. The algorit...