In-Context Decision Making for Optimizing Complex AutoML Pipelines
Pith reviewed 2026-05-18 22:34 UTC · model grok-4.3
The pith
PS-PFN extends posterior sampling to the max k-armed bandit problem to select and adapt complex ML pipelines, including fine-tuning and ensembling, via in-context learning with prior-data fitted networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning. We show how to extend this method to consider varying costs of pulling arms and to use different PFNs to model reward distributions individually per arm. Experimental results on one novel and two existing standard benchmark tasks demonstrate the superior performance of PS-PFN compared to other bandit and AutoML strategies.
What carries the argument
PS-PFN, which extends posterior sampling to the max k-armed bandit formulation and applies prior-data fitted networks to perform in-context estimation of the posterior distribution over the maximal reward.
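The selection rule described above can be illustrated with a minimal sketch. It assumes a crude per-arm Gaussian model in place of a trained PFN, which is not reproduced here. For each arm it draws a mean from an approximate posterior, then draws the best reward that arm could reach within `horizon` further pulls using the inverse-CDF identity max = F^{-1}(U^{1/horizon}), and it pulls the arm with the largest draw. The names `sample_arm_max`, `ps_max_bandit`, `horizon`, and `prior_sigma` are hypothetical, and this is not the paper's implementation.

```python
import math
import random
import statistics
from statistics import NormalDist
from typing import Callable, List, Tuple


def sample_arm_max(rewards: List[float], horizon: int, rng: random.Random,
                   prior_sigma: float = 0.1) -> float:
    """Hypothetical stand-in for a PFN: one random draw of the best reward
    this arm could reach within `horizon` more pulls."""
    n = len(rewards)
    sigma = statistics.stdev(rewards) if n > 1 else prior_sigma
    sigma = max(sigma, 1e-6)  # avoid a degenerate distribution
    # Approximate posterior over the arm's mean reward (Thompson-style draw).
    mu = rng.gauss(statistics.fmean(rewards), sigma / math.sqrt(n))
    # Max of `horizon` i.i.d. draws via the inverse CDF: F^{-1}(U^(1/horizon)).
    p = rng.random() ** (1.0 / horizon)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < p < 1
    future_max = NormalDist(mu, sigma).inv_cdf(p)
    return max(max(rewards), future_max)


def ps_max_bandit(pull: Callable[[int], float], n_arms: int, budget: int,
                  horizon: int = 5, seed: int = 0) -> Tuple[float, List[int]]:
    """Posterior-sampling loop for the max k-armed objective (sketch)."""
    rng = random.Random(seed)
    history = [[pull(a)] for a in range(n_arms)]  # pull every arm once
    for _ in range(budget - n_arms):
        draws = [sample_arm_max(h, horizon, rng) for h in history]
        arm = max(range(n_arms), key=draws.__getitem__)
        history[arm].append(pull(arm))
    best = max(max(h) for h in history)  # the max k-armed objective
    return best, [len(h) for h in history]


# Usage with hypothetical pipelines given as (mean, std) of their scores.
env = random.Random(1)
arms = [(0.70, 0.01), (0.65, 0.08), (0.60, 0.02)]
best, counts = ps_max_bandit(lambda a: env.gauss(*arms[a]), n_arms=3, budget=30)
```

Only `sample_arm_max` encodes the reward model, so swapping the Gaussian for a learned per-arm predictor would leave the loop unchanged.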
If this is right
- PS-PFN achieves superior performance over other bandit and AutoML strategies on both existing and new benchmark tasks.
- The approach extends the CASH framework to modern pipelines that require fine-tuning and ensembling.
- Varying costs of evaluating different pipelines can be incorporated directly into the decision process.
- Separate prior-data fitted networks can model the reward distribution of each pipeline arm individually.
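The cost and per-arm extensions listed above can be sketched together under two assumptions. First, each arm gets its own hypothetical sampler, standing in for a per-arm PFN. Second, the arm is chosen by sampled improvement over the incumbent per unit cost. This cost-normalized rule is an illustrative choice, not necessarily the criterion PS-PFN uses. `bootstrap_max_sampler`, `trend_sampler`, and `select_arm_cost_aware` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

# A sampler maps an arm's observed rewards to one random draw of its attainable max.
Sampler = Callable[[Sequence[float], random.Random], float]


def bootstrap_max_sampler(rewards: Sequence[float], rng: random.Random) -> float:
    """Hypothetical model for a stationary arm: max of a bootstrap resample."""
    return max(rng.choice(rewards) for _ in range(len(rewards)))


def trend_sampler(rewards: Sequence[float], rng: random.Random) -> float:
    """Hypothetical model for an improving arm (e.g. ongoing fine-tuning):
    extrapolate a random fraction of the latest positive gain."""
    last_gain = rewards[-1] - rewards[-2] if len(rewards) > 1 else 0.0
    return max(rewards) + max(last_gain, 0.0) * rng.random()


def select_arm_cost_aware(histories: List[Sequence[float]], samplers: List[Sampler],
                          costs: List[float], rng: random.Random) -> int:
    """Pick the arm with the largest sampled improvement per unit cost."""
    incumbent = max(max(h) for h in histories)
    draws = [s(h, rng) for h, s in zip(histories, samplers)]
    scores = [max(d - incumbent, 0.0) / c for d, c in zip(draws, costs)]
    if max(scores) > 0.0:
        return max(range(len(scores)), key=scores.__getitem__)
    # No draw beats the incumbent: fall back to the largest raw draw.
    return max(range(len(draws)), key=draws.__getitem__)


# Usage: a cheap stationary arm and an expensive, still-improving arm.
rng = random.Random(0)
histories = [[0.71, 0.72, 0.72], [0.60, 0.66, 0.70]]
arm = select_arm_cost_aware(histories, [bootstrap_max_sampler, trend_sampler],
                            costs=[1.0, 4.0], rng=rng)
```

Clipping the improvement at zero before dividing by cost matters. Without it, a negative score divided by a large cost moves closer to zero, which would wrongly favor expensive arms.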
Where Pith is reading between the lines
- The same in-context posterior estimation technique might apply to other selection tasks that involve choosing the single best option among many heterogeneous alternatives.
- As prior-data fitted networks improve on broader training distributions, the method could scale to larger and more diverse pipeline search spaces.
- Practitioners could combine PS-PFN with existing AutoML tools to reduce the number of pipeline evaluations needed for competitive results.
Load-bearing premise
Prior-data fitted networks can reliably estimate the posterior distribution of the maximal reward for heterogeneous AutoML pipelines that include fine-tuning and ensembling, even when reward structures and evaluation costs vary.
What would settle it
A new benchmark of pipelines whose reward distributions lie outside the training distribution of the prior-data fitted networks, on which PS-PFN fails to select the top performer more often than standard bandit or AutoML baselines do.
Original abstract
Combined Algorithm Selection and Hyperparameter Optimization (CASH) has been fundamental to traditional AutoML systems. However, with the advancements of pre-trained models, modern ML workflows go beyond hyperparameter optimization and often require fine-tuning, ensembling, and other adaptation techniques. While the core challenge of identifying the best-performing model for a downstream task remains, the increasing heterogeneity of ML pipelines demands novel AutoML approaches. This work extends the CASH framework to select and adapt modern ML pipelines. We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning. We show how to extend this method to consider varying costs of pulling arms and to use different PFNs to model reward distributions individually per arm. Experimental results on one novel and two existing standard benchmark tasks demonstrate the superior performance of PS-PFN compared to other bandit and AutoML strategies. We make our code and data available at https://github.com/amirbalef/CASHPlus.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PS-PFN, which extends posterior sampling to a max k-armed bandit formulation for selecting and adapting heterogeneous ML pipelines (including fine-tuning and ensembling) in an extended CASH setting. It uses prior-data fitted networks (PFNs) for in-context estimation of the posterior over the maximal reward, incorporates per-arm PFNs and cost variations, and reports superior performance versus bandit and AutoML baselines on one novel benchmark plus two existing ones, with code and data released.
Significance. If the performance claims are substantiated, the work could provide a practical in-context learning route for modern AutoML pipelines whose heterogeneity exceeds traditional hyperparameter optimization. The explicit release of code and data at https://github.com/amirbalef/CASHPlus is a clear strength that supports reproducibility and follow-up work.
major comments (2)
- [Abstract] The abstract states that PS-PFN demonstrates superior performance on one novel and two existing benchmarks, yet supplies no information on experimental design, baseline implementations, number of runs, statistical tests, or how cost variations and pipeline heterogeneity were controlled; this information is required to evaluate whether the data support the central empirical claim.
- [Method] The claim that PFNs reliably estimate the posterior distribution of the maximal value via in-context learning (with per-arm models and cost-adjusted selection) rests on the unverified assumption that the PFN training distribution matches the reward structures, non-stationarity, and cost heterogeneity of AutoML tasks; no calibration checks or ground-truth posterior comparisons are described to substantiate this.
minor comments (1)
- [Abstract] The phrase 'max k-armed bandit problem setup' is introduced without a short formal definition or pointer to the precise formulation (e.g., reward as maximum rather than sum), which would aid readers.
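The following illustrates the distinction this minor comment asks about. In the classical k-armed bandit, the learner maximizes the sum of collected rewards. In the max k-armed (extreme) bandit setting, it maximizes the single largest reward observed, roughly E[max_{t ≤ T} r_t]. The toy reward sequences below are made up for illustration. They show that the two objectives can prefer different arms: a volatile pipeline with a high peak wins under the max objective.

```python
# Hypothetical validation scores from repeated evaluations of two pipelines.
rewards = {
    "stable": [0.80, 0.81, 0.80, 0.81],
    "volatile": [0.60, 0.95, 0.55, 0.70],
}

# Classical bandit objective: total reward collected.
best_by_sum = max(rewards, key=lambda arm: sum(rewards[arm]))
# Max k-armed objective: best single reward found.
best_by_max = max(rewards, key=lambda arm: max(rewards[arm]))

print(best_by_sum, best_by_max)  # stable volatile
```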
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will incorporate to improve clarity and substantiation of our claims.
Point-by-point responses
- Referee: [Abstract] The abstract states that PS-PFN demonstrates superior performance on one novel and two existing benchmarks, yet supplies no information on experimental design, baseline implementations, number of runs, statistical tests, or how cost variations and pipeline heterogeneity were controlled; this information is required to evaluate whether the data support the central empirical claim.
  Authors: We agree that the abstract would benefit from additional context to support the central empirical claim. In the revised manuscript we will expand the abstract to briefly note the experimental design, including the use of 10 independent runs per task, statistical significance testing via paired t-tests, and explicit controls for cost variations and pipeline heterogeneity. Full details on baseline implementations and the construction of the novel benchmark will continue to be provided in the Experiments section and appendix.
  Revision: yes
- Referee: [Method] The claim that PFNs reliably estimate the posterior distribution of the maximal value via in-context learning (with per-arm models and cost-adjusted selection) rests on the unverified assumption that the PFN training distribution matches the reward structures, non-stationarity, and cost heterogeneity of AutoML tasks; no calibration checks or ground-truth posterior comparisons are described to substantiate this.
  Authors: We acknowledge that the current manuscript does not include explicit calibration checks or ground-truth posterior comparisons. The PFNs were trained on a synthetic distribution constructed to cover a broad range of reward structures, non-stationarities, and cost heterogeneities (detailed in Section 3), but we agree that direct validation would strengthen the methodological claims. We will add a new subsection (or appendix) in the revised manuscript that reports calibration results on synthetic tasks designed to mimic AutoML reward distributions, including posterior calibration plots and comparisons to exact posteriors where computationally feasible.
  Revision: yes
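One simple form of the promised calibration check can be sketched as follows. The sketch assumes the model's predictive distribution for each held-out reward is available as a `statistics.NormalDist`. This is a simplification; a PFN typically outputs a discretized distribution instead. The helper `interval_coverage` is hypothetical. For a calibrated model, the fraction of observations inside the central 90% interval should be close to 0.9, while an overconfident model falls short.

```python
import random
from statistics import NormalDist
from typing import Sequence


def interval_coverage(pred_dists: Sequence[NormalDist], observations: Sequence[float],
                      level: float = 0.9) -> float:
    """Fraction of observations inside each prediction's central `level` interval."""
    lo_q, hi_q = (1.0 - level) / 2.0, (1.0 + level) / 2.0
    hits = sum(d.inv_cdf(lo_q) <= y <= d.inv_cdf(hi_q)
               for d, y in zip(pred_dists, observations))
    return hits / len(observations)


# Synthetic check: observations are drawn from the "true" predictive distributions.
rng = random.Random(0)
truth = [NormalDist(rng.uniform(0.5, 0.9), 0.05) for _ in range(2000)]
ys = [rng.gauss(d.mean, d.stdev) for d in truth]

calibrated = interval_coverage(truth, ys)  # close to the nominal 0.9
# A model whose predicted spread is half the true spread is overconfident.
overconfident = interval_coverage([NormalDist(d.mean, 0.025) for d in truth], ys)
```

Repeating this over several `level` values gives the points of a reliability curve. The comparison to exact posteriors mentioned in the response would need a prior where the posterior is available in closed form.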
Circularity Check
No circularity detected; the derivation extends established PS and PFN concepts to the max k-armed bandit without reducing to fitted inputs or self-citation chains
Full rationale
The provided abstract and context describe an extension of posterior sampling to a max k-armed bandit formulation for AutoML pipelines, using PFNs for in-context posterior estimation over maximal rewards, with per-arm PFNs and cost handling. No equations, definitions, or load-bearing steps are shown that equate a claimed prediction or uniqueness result to a fitted parameter or prior self-citation by construction. The central method relies on external PFN training and bandit theory as independent inputs rather than re-deriving them from the target AutoML results. This matches the default expectation of a self-contained extension with no reduction to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The selection and adaptation of modern ML pipelines can be modeled as a max k-armed bandit problem with varying arm costs.
invented entities (1)
- PS-PFN (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: "We propose PS-PFN to efficiently explore and exploit adapting ML pipelines by extending Posterior Sampling (PS) to the max k-armed bandit problem setup. PS-PFN leverages prior-data fitted networks (PFNs) to efficiently estimate the posterior distribution of the maximal value via in-context learning."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, theorem alpha_pin_under_high_calibration. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: "The semi-flat prior covers trajectories where rewards gradually improve over time, reflecting a shift with a rapid decay in the distribution"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
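Appendix excerpt: sampling the maximum of $t$ rewards
1. Draw $U \sim \mathrm{Uniform}(0, 1)$.
2. Solve $F_{\max}(x) = U$. For i.i.d. rewards, $\max(r_{1:t}) \le x$ holds exactly when every $r_i \le x$, so $F_{\max}(x) = [F(x)]^t$ and: $[F(x)]^t = U \implies F(x) = U^{1/t} \implies x = F^{-1}(U^{1/t})$.
Thus $\max(r_{1:t}) = F^{-1}(U^{1/t})$ follows the correct distribution.
Convergence analysis. Let $F_n(x)$ denote the empirical CDF estimated from $n$ i.i.d. samples. Using the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality: $P\left(\sup_x |F_n(x) - F(x)| \ge \epsilon\right) \le \delta$, with $\epsilon = \sqrt{\ln(2/\delta)\cdots}$ …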
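The sampling identity above, max(r_{1:t}) = F^{-1}(U^{1/t}), can be checked numerically with the following sketch. It replaces the unknown F with the empirical CDF of observed rewards; this is an assumption, because the excerpt does not say which estimate of F is used. It also computes the DKW band half-width from the standard two-sided form P(sup_x |F_n(x) − F(x)| > ε) ≤ 2e^{−2nε²}, i.e. ε = √(ln(2/δ)/(2n)). We assume this is what the truncated expression states. The helper names are hypothetical.

```python
import math
import random
from typing import Sequence


def sample_max_inverse_cdf(observed: Sequence[float], t: int, rng: random.Random) -> float:
    """One draw of max(r_1..r_t) via F^{-1}(U^(1/t)), with F replaced by the
    empirical CDF of `observed` (hypothetical helper)."""
    xs = sorted(observed)
    p = rng.random() ** (1.0 / t)  # U^(1/t) with U ~ Uniform[0, 1)
    # Empirical quantile function: F_n^{-1}(p) is the ceil(p*n)-th order statistic.
    k = max(math.ceil(p * len(xs)), 1)
    return xs[k - 1]


def dkw_epsilon(n: int, delta: float) -> float:
    """Half-width of the two-sided DKW band holding with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Check: for rewards 1..10 and t = 3, P(max <= k) = (k/10)^3 exactly.
rng = random.Random(0)
values = list(range(1, 11))
draws = [sample_max_inverse_cdf(values, 3, rng) for _ in range(20000)]
exact_mean = sum(k * ((k / 10) ** 3 - ((k - 1) / 10) ** 3) for k in values)
mc_mean = sum(draws) / len(draws)
```

Because |a^t − b^t| ≤ t|a − b| for a, b in [0, 1], an ε-accurate F_n gives a CDF for the sampled maximum within t·ε of F^t. This is a standard bound and not necessarily the argument the excerpt goes on to make.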