In-Context Decision Making for Optimizing Complex AutoML Pipelines

Amir Rezaei Balef; Katharina Eggensperger

arxiv: 2508.13657 · v2 · submitted 2025-08-19 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-18 22:34 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: AutoML · CASH · Posterior Sampling · Prior-data Fitted Networks · Bandit Algorithms · In-context Learning · Pipeline Optimization

The pith

PS-PFN extends posterior sampling to the max k-armed bandit problem to select and adapt complex ML pipelines including fine-tuning and ensembling via in-context learning with prior-data fitted networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that PS-PFN can efficiently identify the best-performing ML pipeline among heterogeneous options by modeling the task as a max k-armed bandit problem. It uses prior-data fitted networks to estimate the posterior distribution over the highest reward value through in-context learning. A sympathetic reader would care because modern AutoML must handle pipelines far more varied than traditional hyperparameter tuning, yet exhaustive search becomes impractical. If the method succeeds, it would allow faster adaptation of pre-trained models without prohibitive computation. The work also incorporates varying evaluation costs and per-pipeline reward modeling.

Core claim

We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning. We show how to extend this method to consider varying costs of pulling arms and to use different PFNs to model reward distributions individually per arm. Experimental results on one novel and two existing standard benchmark tasks demonstrate the superior performance of PS-PFN compared to other bandit and AutoML strategies.

What carries the argument

PS-PFN, which extends posterior sampling to the max k-armed bandit formulation and applies prior-data fitted networks to perform in-context estimation of the posterior distribution over the maximal reward.

If this is right

PS-PFN achieves superior performance over other bandit and AutoML strategies on both existing and new benchmark tasks.
The approach extends the CASH framework to modern pipelines that require fine-tuning and ensembling.
Varying costs of evaluating different pipelines can be incorporated directly into the decision process.
Separate prior-data fitted networks can model the reward distribution of each pipeline arm individually.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same in-context posterior estimation technique might apply to other selection tasks that involve choosing the single best option among many heterogeneous alternatives.
As prior-data fitted networks improve on broader training distributions, the method could scale to larger and more diverse pipeline search spaces.
Practitioners could combine PS-PFN with existing AutoML tools to reduce the number of pipeline evaluations needed for competitive results.

Load-bearing premise

Prior-data fitted networks can reliably estimate the posterior distribution of the maximal reward for heterogeneous AutoML pipelines that include fine-tuning and ensembling, even when reward structures and evaluation costs vary.

What would settle it

A new benchmark of pipelines whose reward distributions lie outside the training distribution of the prior-data fitted networks, on which PS-PFN selects the top performer less often than standard bandit or AutoML baselines.

Figures

Figures reproduced from arXiv: 2508.13657 by Amir Rezaei Balef, Katharina Eggensperger.

**Figure 1.** (Top) AutoML needs to perform algorithm selection and resource allocation across heterogeneous optimization tasks, such as hyperparameter tuning, fine-tuning, ensembling, and more. (Bottom) The performance of each workflow on two datasets demonstrates the variability of the optimization trajectories and the importance of algorithm selection.

…approaches either tackle this as a single-level optimization prob…

**Figure 2.** Exemplary HPO trajectories (from the Reshuffling benchmark) exhibit distributions that standard parametric models cannot capture.

Footnote 1: In MKB problems, there is no single best arm, and the oracle arm depends on the budget $T$ [41]. If $T$ is known and sufficiently large, with $f(t) = T$, the agent aims to achieve the best final performance. With $f(t) = n_i + T - t$, the agent accounts for how many more iterations it c…

**Figure 3.** A single iteration of PS-PFN. (1) Context construction: format observed rewards as PFN input; (2) Posterior prediction: query the PFN for time step $t$; (3) Decision: sample from the posterior to select an arm.

We use PFNs to model and estimate the unknown per-arm reward distributions as shown in …
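The three steps in Figure 3 can be sketched as a Thompson-style selection loop. This is a minimal illustration built on explicit assumptions, not the authors' implementation. First, the PFN is replaced by a hypothetical Gaussian stand-in (`gaussian_predictive`), even though Figure 2 notes that such parametric models cannot capture real HPO trajectories. Second, each arm's sampled value is the maximum over its remaining $T - t$ pulls, which matches the horizon $f(t) = n_i + T - t$ from the Figure 2 footnote. The maximum of $m$ i.i.d. draws is sampled in one step as $F^{-1}(U^{1/m})$. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def gaussian_predictive(rewards):
    """HYPOTHETICAL stand-in for the PFN: a Gaussian fitted to one arm's rewards.
    PS-PFN instead queries a PFN in context to obtain this distribution."""
    mu = fmean(rewards)
    sigma = pstdev(rewards) if len(rewards) > 1 else 1.0  # arbitrary prior width
    # Rough predictive widening for an estimated mean; the floor avoids sigma == 0.
    sigma = max(sigma * math.sqrt(1.0 + 1.0 / len(rewards)), 1e-6)
    return NormalDist(mu, sigma)


def sample_future_max(rewards, future_pulls, rng):
    """Sample the best reward after `future_pulls` more pulls of this arm.

    The max of m i.i.d. draws from F is distributed as F^{-1}(U^{1/m}).
    """
    best_seen = max(rewards)
    if future_pulls <= 0:
        return best_seen
    p = rng.random() ** (1.0 / future_pulls)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < p < 1
    return max(best_seen, gaussian_predictive(rewards).inv_cdf(p))


def select_arm(history, t, T, rng):
    """One decision: (1) per-arm reward histories form the context,
    (2) sample a plausible maximum per arm, (3) pull the argmax."""
    remaining = T - t  # f(t) = n_i + T - t leaves T - t further pulls for arm i
    samples = [sample_future_max(r, remaining, rng) for r in history]
    return max(range(len(samples)), key=samples.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical arms: arm 1 has a lower mean but a heavier upper tail.
    arms = [lambda: rng.gauss(0.70, 0.01), lambda: rng.gauss(0.65, 0.08)]
    T = 50
    history = [[arm()] for arm in arms]  # pull each arm once to initialise
    for t in range(len(arms), T):
        i = select_arm(history, t, T, rng)
        history[i].append(arms[i]())
    print([len(h) for h in history], max(max(h) for h in history))
```

A PFN would plug in where `gaussian_predictive` sits, by supplying a per-arm predictive distribution. The selection rule itself stays the same: sample a plausible maximum per arm, then pull the argmax.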

**Figure 4.** (Top) Different optimization trajectories for CASH+ tasks and their synthetically generated counterparts. (Bottom) Corresponding posterior predictions by PFNs given the same input (black line).

Instead, we propose a different perspective: rather than adjusting the reward process by a cost estimate, we predict the posterior distribution of the maximal reward for a future time step adjusted by the remaining…

**Figure 5.** (Left) For 30 iterations of HPO, CatBoost outperforms MLP, while with the same budget it is possible to run MLP for 50 iterations, outperforming CatBoost. (Right) The cost of one iteration is noisy.

…datasets (see Table B.1). For the existing CASH benchmarks, TabRepoRaw and YaHPOGym, we use available pre-computed HPO trajectories [47, 4]. For our newly developed CASH+ task, Complex, we run 5 different me…
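Figure 5 (left) suggests why the horizon should depend on cost: a cheaper arm can be pulled more often within the same budget. The sketch below converts a remaining budget into a per-arm horizon. That horizon is $n_i$ plus the number of pulls that fit at the arm's mean observed cost. Several details are hypothetical:

- the helper names;
- the use of the mean cost;
- the costs themselves (10 s per CatBoost iteration, 6 s per MLP iteration), chosen only so that a 300 s budget reproduces the 30-versus-50 split.

The paper's exact cost adjustment may differ.

```python
from statistics import fmean


def affordable_pulls(remaining_budget, observed_costs):
    """Estimate how many more pulls of one arm fit into the remaining budget.

    Uses the mean observed cost per pull; Figure 5 (right) notes these costs
    are noisy, so this is only a point estimate.
    """
    mean_cost = fmean(observed_costs)
    return int(remaining_budget // mean_cost)


def cost_adjusted_horizons(remaining_budget, pulls_so_far, costs_per_arm):
    """Per-arm horizon: pulls made so far plus affordable future pulls."""
    return [
        n + affordable_pulls(remaining_budget, costs)
        for n, costs in zip(pulls_so_far, costs_per_arm)
    ]


if __name__ == "__main__":
    # Hypothetical costs in seconds: CatBoost ~10 s/iteration, MLP ~6 s/iteration.
    catboost_costs = [9.0, 11.0, 10.0]
    mlp_costs = [5.0, 7.0, 6.0]
    print(cost_adjusted_horizons(300.0, [0, 0], [catboost_costs, mlp_costs]))
    # -> [30, 50]
```

Such a horizon could replace $T - t$ when sampling the future maximum of each arm, so a cheap arm is judged by what it can reach with more pulls.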

**Figure 6.** Average rank of algorithms on different benchmarks; lower is better. SMAC and random search perform combined CASH across the joint space.

…as a flexible framework for resource allocation. To efficiently model reward distributions that do not follow common forms, are different for each arm, and potentially shift over time, we exploit PFNs [39]. The resulting method, PS-PFN and its extensions, outperforms pri…

Original abstract

Combined Algorithm Selection and Hyperparameter Optimization (CASH) has been fundamental to traditional AutoML systems. However, with the advancements of pre-trained models, modern ML workflows go beyond hyperparameter optimization and often require fine-tuning, ensembling, and other adaptation techniques. While the core challenge of identifying the best-performing model for a downstream task remains, the increasing heterogeneity of ML pipelines demands novel AutoML approaches. This work extends the CASH framework to select and adapt modern ML pipelines. We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning. We show how to extend this method to consider varying costs of pulling arms and to use different PFNs to model reward distributions individually per arm. Experimental results on one novel and two existing standard benchmark tasks demonstrate the superior performance of PS-PFN compared to other bandit and AutoML strategies. We make our code and data available at https://github.com/amirbalef/CASHPlus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

PS-PFN combines posterior sampling with per-arm PFNs in a max k-armed bandit to handle modern AutoML pipelines beyond basic CASH, but the abstract gives almost no experimental detail.

The main point is that this paper takes posterior sampling and extends it to a max k-armed bandit, where the goal is to obtain the single highest reward within the budget rather than the highest cumulative reward. It then uses prior-data fitted networks for in-context estimation of the posterior over the maximum reward. They apply the approach to an extended CASH problem that includes fine-tuning, ensembling, and other adaptation steps, with separate PFNs per arm and explicit handling of varying evaluation costs. Code and data are released, which is useful.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes PS-PFN, which extends posterior sampling to a max k-armed bandit formulation for selecting and adapting heterogeneous ML pipelines (including fine-tuning and ensembling) in an extended CASH setting. It uses prior-data fitted networks (PFNs) for in-context estimation of the posterior over the maximal reward, incorporates per-arm PFNs and cost variations, and reports superior performance versus bandit and AutoML baselines on one novel benchmark plus two existing ones, with code and data released.

Significance. If the performance claims are substantiated, the work could provide a practical in-context learning route for modern AutoML pipelines whose heterogeneity exceeds traditional hyperparameter optimization. The explicit release of code and data at https://github.com/amirbalef/CASHPlus is a clear strength that supports reproducibility and follow-up work.

major comments (2)

[Abstract] The abstract states that PS-PFN demonstrates superior performance on one novel and two existing benchmarks, yet supplies no information on experimental design, baseline implementations, number of runs, statistical tests, or how cost variations and pipeline heterogeneity were controlled; this information is required to evaluate whether the data support the central empirical claim.
[Method] The claim that PFNs reliably estimate the posterior distribution of the maximal value via in-context learning (with per-arm models and cost-adjusted selection) rests on the unverified assumption that the PFN training distribution matches the reward structures, non-stationarity, and cost heterogeneity of AutoML tasks; no calibration checks or ground-truth posterior comparisons are described to substantiate this.
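As one illustration of the kind of calibration check the comment asks for, consider synthetic tasks where the true value is known. For each task, build a central credible interval from the posterior samples and record whether the truth falls inside. A calibrated posterior should reach roughly the nominal coverage, while an overconfident one falls short. The 80% level, the synthetic Gaussian data, and the function names are assumptions made for this sketch.

```python
import random


def central_interval(samples, level):
    """Empirical central credible interval from posterior samples."""
    s = sorted(samples)
    lo_idx = round((1.0 - level) / 2.0 * (len(s) - 1))
    hi_idx = round((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def empirical_coverage(posterior_samples, truths, level=0.8):
    """Fraction of tasks whose true value lies inside its central interval."""
    hits = 0
    for samples, truth in zip(posterior_samples, truths):
        lo, hi = central_interval(samples, level)
        hits += lo <= truth <= hi
    return hits / len(truths)


if __name__ == "__main__":
    rng = random.Random(0)
    calibrated, overconfident, truths = [], [], []
    for _ in range(2000):
        mu = rng.uniform(0.6, 0.9)  # hypothetical true location of one task
        calibrated.append([rng.gauss(mu, 0.02) for _ in range(200)])
        overconfident.append([rng.gauss(mu, 0.005) for _ in range(200)])
        truths.append(rng.gauss(mu, 0.02))  # truth follows the calibrated law
    print(empirical_coverage(calibrated, truths))     # close to the nominal 0.8
    print(empirical_coverage(overconfident, truths))  # well below 0.8
```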

minor comments (1)

[Abstract] The phrase 'max k-armed bandit problem setup' is introduced without a short formal definition or pointer to the precise formulation (e.g., reward as maximum rather than sum), which would aid readers.
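For orientation, the distinction can be written in standard bandit notation, assumed here rather than taken from the manuscript. A policy $\pi$ chooses arm $I_t$ at step $t$ and observes a reward $r_t$ drawn from that arm's distribution $\nu_{I_t}$, with budget $T$. The classical objective targets the expected sum of rewards, whereas the max k-armed (extreme) bandit targets the expected maximum.

```latex
\text{classical: } \max_{\pi}\; \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big]
\qquad \text{vs.} \qquad
\text{max } k\text{-armed: } \max_{\pi}\; \mathbb{E}\Big[\max_{1 \le t \le T} r_t\Big],
\qquad r_t \sim \nu_{I_t},\quad I_t = \pi(r_1, \dots, r_{t-1})
```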

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will incorporate to improve clarity and substantiation of our claims.

Referee: [Abstract] The abstract states that PS-PFN demonstrates superior performance on one novel and two existing benchmarks, yet supplies no information on experimental design, baseline implementations, number of runs, statistical tests, or how cost variations and pipeline heterogeneity were controlled; this information is required to evaluate whether the data support the central empirical claim.

Authors: We agree that the abstract would benefit from additional context to support the central empirical claim. In the revised manuscript we will expand the abstract to briefly note the experimental design, including the use of 10 independent runs per task, statistical significance testing via paired t-tests, and explicit controls for cost variations and pipeline heterogeneity. Full details on baseline implementations and the construction of the novel benchmark will continue to be provided in the Experiments section and appendix. revision: yes
Referee: [Method] The claim that PFNs reliably estimate the posterior distribution of the maximal value via in-context learning (with per-arm models and cost-adjusted selection) rests on the unverified assumption that the PFN training distribution matches the reward structures, non-stationarity, and cost heterogeneity of AutoML tasks; no calibration checks or ground-truth posterior comparisons are described to substantiate this.

Authors: We acknowledge that the current manuscript does not include explicit calibration checks or ground-truth posterior comparisons. The PFNs were trained on a synthetic distribution constructed to cover a broad range of reward structures, non-stationarities, and cost heterogeneities (detailed in Section 3), but we agree that direct validation would strengthen the methodological claims. We will add a new subsection (or appendix) in the revised manuscript that reports calibration results on synthetic tasks designed to mimic AutoML reward distributions, including posterior calibration plots and comparisons to exact posteriors where computationally feasible. revision: yes

Circularity Check

0 steps flagged

No circularity detected; the derivation extends established PS and PFN concepts to the max k-armed bandit without reducing to fitted inputs or self-citation chains.

full rationale

The provided abstract and context describe an extension of posterior sampling to a max k-armed bandit formulation for AutoML pipelines, using PFNs for in-context posterior estimation over maximal rewards, with per-arm PFNs and cost handling. No equations, definitions, or load-bearing steps are shown that equate a claimed prediction or uniqueness result to a fitted parameter or prior self-citation by construction. The central method relies on external PFN training and bandit theory as independent inputs rather than re-deriving them from the target AutoML results. This matches the default expectation of a self-contained extension with no reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that the max k-armed bandit model accurately captures the costs and rewards of selecting and adapting heterogeneous ML pipelines, and that PFNs can perform effective in-context posterior estimation in this setting. No explicit free parameters or additional invented entities beyond the method itself are detailed.

axioms (1)

domain assumption: The selection and adaptation of modern ML pipelines can be modeled as a max k-armed bandit problem with varying arm costs.
Invoked when extending Posterior Sampling to this setup for CASHPlus.

invented entities (1)

PS-PFN (no independent evidence)
purpose: To efficiently estimate posterior distributions of maximal values via in-context learning for pipeline optimization.
New method introduced to combine posterior sampling with per-arm PFNs.

pith-pipeline@v0.9.0 · 5730 in / 1394 out tokens · 51927 ms · 2026-05-18T22:34:19.908542+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

Relation between the paper passage and the cited Recognition theorem.

The semi-flat prior covers trajectories where rewards gradually improve over time, reflecting a shift with a rapid decay in the distribution

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

[1] S. Adriaensen, H. Rakotoarison, S. Müller, and F. Hutter. Efficient Bayesian learning curve extrapolation using prior-data fitted networks. In Proc. of NeurIPS’23, 2023
[2] S. P. Arango, F. Ferreira, A. Kadra, F. Hutter, and J. Grabocka. Quick-tune: Quickly learning which pretrained model to finetune and how. In Proc. of ICLR’24, 2024
[3] A. R. Balef, C. Vernade, and K. Eggensperger. Towards bandit-based optimization for automated machine learning. In 5th Workshop on practical ML for limited/low resource settings, 2024. URL https://openreview.net/forum?id=S5da3rzyuk
[4] A. R. Balef, C. Vernade, and K. Eggensperger. Put CASH on bandits: A max k-armed problem for automated machine learning. arXiv preprint arXiv:2505.05226, 2025
[5] D. Baudry, P. Saux, and O.-A. Maillard. From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits. In Proc. of NeurIPS’21, 2021
[6] E. Bergman, M. Feurer, A. Bahram, A. R. Balef, L. Purucker, S. Segel, M. Lindauer, F. Hutter, and K. Eggensperger. AMLTK: A Modular Automl Toolkit in Python. Journal of Open Source Software, 9(100):6367, 2024. doi: 10.21105/joss.06367. URL https://doi.org/10.21105/joss.06367
[7] B. Bischl, G. Casalicchio, T. Das, M. Feurer, S. Fischer, P. Gijsbers, S. Mukherjee, A. C. Müller, L. Németh, L. Oala, L. Purucker, S. Ravi, J. N. van Rijn, P. Singh, J. Vanschoren, J. van der Velde, and M. Wever. Openml: Insights from 10 years and more than a thousand papers. Patterns, 6(7):101317, 2025. ISSN 2666-3899. doi: https://doi.org/10.1016/j...
[8] F. d. Breejen, S. Bae, S. Cha, and S.-Y. Yun. Fine-tuned in-context learning transformers are excellent tabular data classifiers. arXiv preprint arXiv:2405.13396, 2024
[9] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. ...
[10] S. Cayci, A. Eryilmaz, and R. Srikant. Budget-constrained bandits over general cost and reward distributions. In Proc. of AISTATS’20, 2020
[11] L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In Proc. of NeurIPS’21, 2021
[12] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proc. of KDD’16, pages 785–794, 2016
[13] A. Cowen-Rivers, W. Lyu, R. Tutunov, Z. Wang, A. Grosnit, R. Griffiths, A. Maraval, H. Jianye, J. Wang, J. Peters, and H. Ammar. HEBO: Pushing the limits of sample-efficient hyper-parameter optimisation. Journal of Artificial Intelligence Research, 74:1269–1349, 2022
[14] L. Cui, H. Li, K. Chen, L. Shou, and G. Chen. Tabular data augmentation for machine learning: Progress and prospects of embracing generative ai. arXiv:2407.21523 [cs.LG], 2024
[15] W. Cui, R. Hosseinzadeh, J. Ma, T. Wu, Y. Sui, and K. Golestan. Tabular data contrastive learning via class-conditioned and feature-correlation based augmentation. arXiv preprint arXiv:2404.17489, 2024
[16] W. Ding, T. Qin, X.-D. Zhang, and T.-Y. Liu. Multi-armed bandit with budget constraint and variable costs. In Proc. of AAAI’13, volume 27, pages 232–238, 2013
[17] B. Feuer, R. T. Schirrmeister, V. Cherepanova, C. Hegde, F. Hutter, M. Goldblum, N. Cohen, and C. White. Tunetables: Context optimization for scalable prior-data fitted networks. In Proc. of NeurIPS’24, 2024
[18] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Proc. of NeurIPS’15, pages 2962–2970, 2015
[19] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands-free automl via meta-learning. Journal of Machine Learning Research, 23(261):1–61, 2022
[20] M. Fiandri, A. M. Metelli, and F. Trovò. Thompson sampling-like algorithms for stochastic rising rested bandits. In Seventeenth European Workshop on Reinforcement Learning, 2024. URL https://openreview.net/forum?id=jaFhipqjxR
[21] Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko. Revisiting deep learning models for tabular data. In Proc. of NeurIPS’21, 2021
[22] N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In Proc. of ICLR’23, 2023
[23] N. Hollmann, S. Müller, and F. Hutter. Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering. arXiv:2305.03403[v5] [cs.AI], 2023
[24] N. Hollmann, S. Müller, and F. Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637:319–326, 2025. doi: 10.1038/s41586-024-08328-6. URL https://www.nature.com/articles/s41586-024-08328-6
[25] D. Holzmüller, L. Grinsztajn, and I. Steinwart. Better by default: Strong pre-tuned mlps and boosted trees on tabular data. In Proc. of NeurIPS’24, 2024
[26] Y. Hu, X. Liu, and S. L. Y. Yu. Cascaded algorithm selection with extreme-region UCB bandit. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6782–6794, 2021
[27] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of ICLR’15, 2015
[28] B. Komer, J. Bergstra, and C. Eliasmith. Hyperopt-sklearn: Automatic hyperparameter configuration for scikit-learn. In ICML Workshop on AutoML, 2014
[29] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25):1–5, 2017
[30] B. Kveton, B. Oreshkin, Y. Park, A. A. Deshmukh, and R. Song. Online posterior sampling with a diffusion prior. In Proc. of NeurIPS’24, 2024
[31] T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2020
[32] E. H. Lee, V. Perrone, C. Archambeau, and M. Seeger. Cost-aware bayesian optimization. In Proc. of UAI’20, 2020
[33] J. Lee, A. Xie, A. Pacchiano, Y. Chandak, C. Finn, O. Nachum, and E. Brunskill. Supervised pretraining can learn in-context reinforcement learning. In Proc. of NeurIPS’23, 2023
[34] Y. Li, J. Jiang, J. Gao, Y. Shao, C. Zhang, and B. Cui. Efficient automatic CASH via rising bandits. In Proc. of AAAI’20, pages 4763–4771, 2020
[35] L. Lin, Y. Bai, and S. Mei. Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining. In Proc. of ICLR’24, 2024
[36] Y. Liu, B. Van Roy, and K. Xu. Nonstationary bandit learning via predictive sampling. In Proc. of AISTATS’23, 2023
[37] I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In Proc. of ICLR’17, 2017
[38] S. Müller, N. Hollmann, S. Arango, J. Grabocka, and F. Hutter. Transformers can do Bayesian inference. In Proc. of ICLR’22, 2022
[39] S. G. Müller, M. Feurer, N. Hollmann, and F. Hutter. PFNs4BO: In-context learning for bayesian optimization. In Proc. of ICML’23, 2023
[40] T. Nagler, L. Schneider, B. Bischl, and M. Feurer. Reshuffling resampling splits can improve generalization of hyperparameter optimization. 2024
[41] R. Nishihara, D. Lopez-Paz, and L. Bottou. No regret bound for extreme bandits. In Proc. of AISTATS’16, 2016
[42] F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder, and B. Bischl. YAHPO Gym – an efficient multi-objective multi-fidelity benchmark for hyperparameter optimization. In Proc. of AutoML Conf’22. PMLR, 2022
[43] M. Phan, Y. Abbasi Yadkori, and J. Domke. Thompson sampling and approximate inference. In Proc. of NeurIPS’19, 2019
[44] L. Prokhorenkova, G. Gusev, A. Vorobev, A. Dorogush, and A. Gulin. Catboost: Unbiased boosting with categorical features. In Proc. of NeurIPS’18, pages 6639–6649, 2018
[45] Y. Pushak and H. Hoos. Automl loss landscapes. ACM Transactions on Evolutionary Learning and Optimization, 2(3):1–30, 2022
[46] D. Russo and B. Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014
[47] D. Salinas and N. Erickson. TabRepo: A large scale repository of tabular model evaluations and its AutoML applications. In Proc. of AutoML Conf’24. PMLR, 2024
[48] C. Shen, X. Zhang, W. Wei, and J. Xu. Hyperbandit: Contextual bandit with hypernewtork for time-varying user preferences in streaming recommendation. In Proc. of CIKM’23, 2023
[49] J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Proc. of NeurIPS’12, 2012
[50] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285–294, 1933
[51] C. Thornton, F. Hutter, H. Hoos, and K. Leyton-Brown. Auto-WEKA: combined selection and Hyperparameter Optimization of classification algorithms. In Proc. of KDD’13, pages 847–855, 2013
[52] M. van den Nieuwenhuijzen, C. Doerr, J. N. van Rijn, and H. Gouk. Selecting pre-trained models for transfer learning with data-centric meta-features. In AutoML Conference 2024 (Workshop Track), 2024
[53] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Proc. of NeurIPS’17. Curran Associates, Inc., 2017
[54] C. Wang, Q. Wu, M. Weimer, and E. Zhu. Flaml: A fast and lightweight automl library. In Proc. of MLSys’21, pages 434–447, 2021
[55] Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu. Thompson sampling for budgeted multi-armed bandits. In Proc. of IJCAI’15, 2015
[56] Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu. Thompson sampling for budgeted multi-armed bandits. arXiv preprint arXiv:1505.00146, 2015
[57] Y. Xia, W. Ding, X.-D. Zhang, N. Yu, and T. Qin. Budgeted bandit problems with continuous random costs. In Proc. of ACML’16, 2016
[58] Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware bayesian optimization via the pandora’s box gittins index. In Proc. of NeurIPS’24, 2024
[59] B. Zhu, X. Shi, N. Erickson, M. Li, G. Karypis, and M. Shoaran. Xtab: Cross-table pretraining for tabular transformers. In Proc. of ICML’23, 2023

Draw $U \sim \mathrm{Uniform}(0, 1)$ and solve $F_{\max}(x) = U$:

$$[F(x)]^t = U \implies F(x) = U^{1/t} \implies x = F^{-1}\left(U^{1/t}\right)$$

Thus, $\max(r_{1:t}) = F^{-1}\left(U^{1/t}\right)$ follows the correct distribution.

Convergence Analysis. For the convergence analysis, let $F_n(x)$ denote the empirical CDF estimated from $n$ i.i.d. samples. Using the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, $P\left(\sup_x |F_n(x) - F(x)| \ge \epsilon\right) \le \delta$, with $\epsilon = \sqrt{\ln(2/\delta) \cdots}$ …
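The inverse-CDF result above can be checked numerically. This sketch is illustrative and is not the paper's code. It samples the maximum of $t$ i.i.d. draws from an empirical CDF $F_n$ in one step as $F_n^{-1}(U^{1/t})$. It uses the generalized inverse $F_n^{-1}(p) = x_{(\lceil np \rceil)}$ on sorted samples and compares the result with brute-force resampling. Function names are hypothetical.

```python
import math
import random


def empirical_quantile(sorted_xs, p):
    """Generalized inverse F_n^{-1}(p) = x_(ceil(n p)) for sorted samples."""
    n = len(sorted_xs)
    idx = max(math.ceil(n * p) - 1, 0)  # 1-based order statistic -> 0-based index
    return sorted_xs[min(idx, n - 1)]


def sample_max_inverse_cdf(sorted_xs, t, rng):
    """One draw of max(r_1, ..., r_t), r_i i.i.d. from F_n, via F_n^{-1}(U^{1/t})."""
    return empirical_quantile(sorted_xs, rng.random() ** (1.0 / t))


def sample_max_brute_force(sorted_xs, t, rng):
    """Reference: draw t values with replacement and take their maximum."""
    return max(rng.choice(sorted_xs) for _ in range(t))


if __name__ == "__main__":
    xs = sorted(range(10))  # empirical distribution uniform on {0, ..., 9}
    t, draws = 3, 100_000
    rng = random.Random(0)
    fast = sum(sample_max_inverse_cdf(xs, t, rng) for _ in range(draws)) / draws
    slow = sum(sample_max_brute_force(xs, t, rng) for _ in range(draws)) / draws
    # Exact E[max] = sum_k P(max > k) = 10 - sum_{j=1}^{10} (j/10)^3 = 6.975
    print(round(fast, 2), round(slow, 2))
```

The one-step sampler costs a single uniform draw regardless of $t$, while brute force needs $t$ draws. This matters when the horizon is large.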

[1] [1]

Adriaensen, H

S. Adriaensen, H. Rakotoarison, S. Müller, and F. Hutter. Efficient Bayesian learning curve extrapolation using prior-data fitted networks. In Proc. of NeurIPS’23, 2023

work page 2023

[2] [2]

S. P. Arango, F. Ferreira, A. Kadra, F. Hutter, and J. Grabocka. Quick- tune: Quickly learning which pretrained model to finetune and how. In Proc. of ICLR’24, 2024

work page 2024

[3] [3]

A. R. Balef, C. Vernade, and K. Eggensperger. Towards bandit- based optimization for automated machine learning. In 5th Work- shop on practical ML for limited/low resource settings , 2024. URL https://openreview.net/forum?id=S5da3rzyuk

work page 2024

[4] [4]

A. R. Balef, C. Vernade, and K. Eggensperger. Put CASH on bandits: A max k-armed problem for automated machine learning. arXiv preprint arXiv:2505.05226, 2025

work page arXiv 2025

[5] [5]

Baudry, P

D. Baudry, P. Saux, and O.-A. Maillard. From optimality to robust- ness: Adaptive re-sampling strategies in stochastic bandits. In Proc. of NeurIPS’21, 2021

work page 2021

[6] [6]

Bergman, M

E. Bergman, M. Feurer, A. Bahram, A. R. Balef, L. Purucker, S. Segel, M. Lindauer, F. Hutter, and K. Eggensperger. AMLTK: A Modular Automl Toolkit in Python. Journal of Open Source Software , 9(100): 6367, 2024. doi: 10.21105/joss.06367. URL https://doi.org/10.21105/ joss.06367

work page doi:10.21105/joss.06367 2024

[7] [7]

Bischl, G

B. Bischl, G. Casalicchio, T. Das, M. Feurer, S. Fischer, P. Gijs- bers, S. Mukherjee, A. C. Müller, L. Németh, L. Oala, L. Purucker, S. Ravi, J. N. van Rijn, P. Singh, J. Vanschoren, J. van der Velde, and M. Wever. Openml: Insights from 10 years and more than a thou- sand papers. Patterns, 6(7):101317, 2025. ISSN 2666-3899. doi: https://doi.org/10.1016/j...

work page doi:10.1016/j.patter.2025.101317 2025

[8] [8]

F. d. Breejen, S. Bae, S. Cha, and S.-Y . Yun. Fine-tuned in-context learn- ing transformers are excellent tabular data classifiers. arXiv preprint arXiv:2405.13396, 2024

work page arXiv 2024

[9] [9]

Brown, B

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert- V oss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. ...

work page 1901

[10]

S. Cayci, A. Eryilmaz, and R. Srikant. Budget-constrained bandits over general cost and reward distributions. In Proc. of AISTATS’20, 2020

[11]

L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch. Decision transformer: Reinforcement learning via sequence modeling. In Proc. of NeurIPS’21, 2021

[12]

T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proc. of KDD’16, pages 785–794, 2016

[13]

A. Cowen-Rivers, W. Lyu, R. Tutunov, Z. Wang, A. Grosnit, R. Griffiths, A. Maraval, H. Jianye, J. Wang, J. Peters, and H. Ammar. HEBO: Pushing the limits of sample-efficient hyper-parameter optimisation. Journal of Artificial Intelligence Research, 74:1269–1349, 2022

[14]

L. Cui, H. Li, K. Chen, L. Shou, and G. Chen. Tabular data augmentation for machine learning: Progress and prospects of embracing generative AI. arXiv:2407.21523 [cs.LG], 2024

[15]

W. Cui, R. Hosseinzadeh, J. Ma, T. Wu, Y. Sui, and K. Golestan. Tabular data contrastive learning via class-conditioned and feature-correlation based augmentation. arXiv preprint arXiv:2404.17489, 2024

[16]

W. Ding, T. Qin, X.-D. Zhang, and T.-Y. Liu. Multi-armed bandit with budget constraint and variable costs. In Proc. of AAAI’13, volume 27, pages 232–238, 2013

[17]

B. Feuer, R. T. Schirrmeister, V. Cherepanova, C. Hegde, F. Hutter, M. Goldblum, N. Cohen, and C. White. TuneTables: Context optimization for scalable prior-data fitted networks. In Proc. of NeurIPS’24, 2024

[18]

M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. Efficient and robust automated machine learning. In Proc. of NeurIPS’15, pages 2962–2970, 2015

[19]

M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. Auto-Sklearn 2.0: Hands-free AutoML via meta-learning. Journal of Machine Learning Research, 23(261):1–61, 2022

[20]

M. Fiandri, A. M. Metelli, and F. Trovò. Thompson sampling-like algorithms for stochastic rising rested bandits. In Seventeenth European Workshop on Reinforcement Learning, 2024. URL https://openreview.net/forum?id=jaFhipqjxR

[21]

Y. Gorishniy, I. Rubachev, V. Khrulkov, and A. Babenko. Revisiting deep learning models for tabular data. In Proc. of NeurIPS’21, 2021

[22]

N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In Proc. of ICLR’23, 2023

[23]

N. Hollmann, S. Müller, and F. Hutter. Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering. arXiv:2305.03403[v5] [cs.AI], 2023

[24]

N. Hollmann, S. Müller, and F. Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637:319–326, 2025. doi: 10.1038/s41586-024-08328-6. URL https://www.nature.com/articles/s41586-024-08328-6

[25]

D. Holzmüller, L. Grinsztajn, and I. Steinwart. Better by default: Strong pre-tuned MLPs and boosted trees on tabular data. In Proc. of NeurIPS’24, 2024

[26]

Y. Hu, X. Liu, and S. L. Y. Yu. Cascaded algorithm selection with extreme-region UCB bandit. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6782–6794, 2021

[27]

D. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proc. of ICLR’15, 2015

[28]

B. Komer, J. Bergstra, and C. Eliasmith. Hyperopt-sklearn: Automatic hyperparameter configuration for scikit-learn. In ICML Workshop on AutoML, 2014

[29]

L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 18(25):1–5, 2017

[30]

B. Kveton, B. Oreshkin, Y. Park, A. A. Deshmukh, and R. Song. Online posterior sampling with a diffusion prior. In Proc. of NeurIPS’24, 2024

[31]

T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2020

[32]

E. H. Lee, V. Perrone, C. Archambeau, and M. Seeger. Cost-aware Bayesian optimization. In Proc. of UAI’20, 2020

[33]

J. Lee, A. Xie, A. Pacchiano, Y. Chandak, C. Finn, O. Nachum, and E. Brunskill. Supervised pretraining can learn in-context reinforcement learning. In Proc. of NeurIPS’23, 2023

[34]

Y. Li, J. Jiang, J. Gao, Y. Shao, C. Zhang, and B. Cui. Efficient automatic CASH via rising bandits. In Proc. of AAAI’20, pages 4763–4771, 2020

[35]

L. Lin, Y. Bai, and S. Mei. Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining. In Proc. of ICLR’24, 2024

[36]

Y. Liu, B. Van Roy, and K. Xu. Nonstationary bandit learning via predictive sampling. In Proc. of AISTATS’23, 2023

[37]

I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In Proc. of ICLR’17, 2017

[38]

S. Müller, N. Hollmann, S. Arango, J. Grabocka, and F. Hutter. Transformers can do Bayesian inference. In Proc. of ICLR’22, 2022

[39]

S. G. Müller, M. Feurer, N. Hollmann, and F. Hutter. PFNs4BO: In-context learning for Bayesian optimization. In Proc. of ICML’23, 2023

[40]

T. Nagler, L. Schneider, B. Bischl, and M. Feurer. Reshuffling resampling splits can improve generalization of hyperparameter optimization. 2024

[41]

R. Nishihara, D. Lopez-Paz, and L. Bottou. No regret bound for extreme bandits. In Proc. of AISTATS’16, 2016

[42]

F. Pfisterer, L. Schneider, J. Moosbauer, M. Binder, and B. Bischl. YAHPO Gym – an efficient multi-objective multi-fidelity benchmark for hyperparameter optimization. In Proc. of AutoML Conf’22. PMLR, 2022

[43]

M. Phan, Y. Abbasi Yadkori, and J. Domke. Thompson sampling and approximate inference. In Proc. of NeurIPS’19, 2019

[44]

L. Prokhorenkova, G. Gusev, A. Vorobev, A. Dorogush, and A. Gulin. CatBoost: Unbiased boosting with categorical features. In Proc. of NeurIPS’18, pages 6639–6649, 2018

[45]

Y. Pushak and H. Hoos. AutoML loss landscapes. ACM Transactions on Evolutionary Learning and Optimization, 2(3):1–30, 2022

[46]

D. Russo and B. Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014

[47]

D. Salinas and N. Erickson. TabRepo: A large scale repository of tabular model evaluations and its AutoML applications. In Proc. of AutoML Conf’24. PMLR, 2024

[48]

C. Shen, X. Zhang, W. Wei, and J. Xu. HyperBandit: Contextual bandit with hypernewtork for time-varying user preferences in streaming recommendation. In Proc. of CIKM’23, 2023

[49]

J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Proc. of NeurIPS’12, 2012

[50]

W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285–294, 1933

[51]

C. Thornton, F. Hutter, H. Hoos, and K. Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proc. of KDD’13, pages 847–855, 2013

[52]

M. van den Nieuwenhuijzen, C. Doerr, J. N. van Rijn, and H. Gouk. Selecting pre-trained models for transfer learning with data-centric meta-features. In AutoML Conference 2024 (Workshop Track), 2024

[53]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In Proc. of NeurIPS’17. Curran Associates, Inc., 2017

[54]

C. Wang, Q. Wu, M. Weimer, and E. Zhu. FLAML: A fast and lightweight AutoML library. In Proc. of MLSys’21, pages 434–447, 2021

[55]

Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu. Thompson sampling for budgeted multi-armed bandits. In Proc. of IJCAI’15, 2015

[56]

Y. Xia, H. Li, T. Qin, N. Yu, and T.-Y. Liu. Thompson sampling for budgeted multi-armed bandits. arXiv preprint arXiv:1505.00146, 2015

[57]

Y. Xia, W. Ding, X.-D. Zhang, N. Yu, and T. Qin. Budgeted bandit problems with continuous random costs. In Proc. of ACML’16, 2016

[58]

Q. Xie, R. Astudillo, P. Frazier, Z. Scully, and A. Terenin. Cost-aware Bayesian optimization via the Pandora’s box Gittins index. In Proc. of NeurIPS’24, 2024

[59]

B. Zhu, X. Shi, N. Erickson, M. Li, G. Karypis, and M. Shoaran. XTab: Cross-table pretraining for tabular transformers. In Proc. of ICML’23, 2023

Table of Contents for the Appendices

• Appendix A: Preliminaries ...

Draw $U \sim \mathrm{Uniform}(0, 1)$.

Solve $F_{\max}(x) = U$: $[F(x)]^t = U \implies F(x) = U^{1/t} \implies x = F^{-1}(U^{1/t})$. Thus, $\max(r_{1:t}) = F^{-1}(U^{1/t})$ follows the correct distribution.

Convergence Analysis

Let $F_n(x)$ denote the empirical CDF estimated from $n$ i.i.d. samples. Using the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, $P\big(\sup_x |F_n(x) - F(x)| \ge \epsilon\big) \le \delta$, with $\epsilon = \sqrt{\ln(2/\delta) \cdots}$ ...
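The inverse-transform step for the maximum, together with the DKW radius that bounds how far an empirical CDF can drift from $F$, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's PS-PFN implementation (which obtains the posterior with a PFN): it assumes $F$ is replaced by the empirical CDF $F_n$ of the rewards observed for one arm, and the helper names `empirical_quantile`, `sample_max_of_t`, and `dkw_epsilon` are hypothetical. Because the expression for $\epsilon$ is cut off in the text, the code uses the standard two-sided DKW radius $\epsilon = \sqrt{\ln(2/\delta)/(2n)}$ (Massart's constant); treat that as an assumption about how the formula continues.

```python
import math
import random


def empirical_quantile(samples, q):
    """Generalized inverse of the empirical CDF F_n: smallest x with F_n(x) >= q."""
    xs = sorted(samples)
    n = len(xs)
    # F_n(xs[k]) = (k + 1) / n, so the first index with (k + 1) / n >= q is ceil(q * n) - 1.
    k = max(0, math.ceil(q * n) - 1)
    return xs[min(k, n - 1)]


def sample_max_of_t(samples, t, rng=random):
    """One draw of max(r_1, ..., r_t) via x = F^{-1}(U^{1/t}).

    Assumption: F is approximated by the empirical CDF F_n of `samples`.
    """
    u = rng.random()  # U ~ Uniform(0, 1)
    return empirical_quantile(samples, u ** (1.0 / t))


def dkw_epsilon(n, delta):
    """Assumed standard two-sided DKW radius: P(sup_x |F_n(x) - F(x)| >= eps) <= delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [rng.random() for _ in range(1000)]  # toy rewards drawn from Uniform(0, 1)
    draws = [sample_max_of_t(rewards, t=5, rng=rng) for _ in range(2000)]
    print("mean sampled max:", sum(draws) / len(draws))  # should be near 5/6 for uniform rewards
    print("DKW radius, n=1000, delta=0.05:", dkw_epsilon(1000, 0.05))
```

With uniform rewards and $t = 5$, the sampled maxima should average near $t/(t+1) = 5/6$, since the maximum of $t$ i.i.d. Uniform(0, 1) draws has CDF $x^t$. Sampling from $F_n^t$ is equivalent to taking the maximum of $t$ draws with replacement from the observed rewards, and the DKW radius shrinks as $1/\sqrt{n}$, so quadrupling $n$ halves $\epsilon$.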