In-Context Black-Box Optimization with Unreliable Feedback
Pith reviewed 2026-05-08 13:37 UTC · model grok-4.3
The pith
A transformer pretrained with a structured feedback prior can estimate source reliability at test time to guide black-box optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Feedback-informed in-context black-box optimization (FICBO) pretrains a transformer on trajectories generated from a structured feedback prior that encodes variation in source access, relevance, and distortion. At test time the model receives both the optimization history and the current set of auxiliary feedback values, compares the latter against the observed objective values, and produces reliability estimates that are used to select the next query point. On synthetic and real-world tasks the resulting policy exploits informative feedback while remaining robust when sources are weak or misleading, outperforming baselines that either ignore auxiliary signals or assume they are always valid.
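The paper's transformer is not reproduced here, but the in-context comparison it describes can be approximated by a simple moment-matching sketch: weight each source by the correlation between its past feedback and the observed objective values, then score candidates by reliability-weighted feedback. The function names and the greedy selection rule are illustrative assumptions, not FICBO's actual mechanism.

```python
import numpy as np

def reliability_weights(history_y, history_feedback, eps=1e-9):
    """Per-source reliability as the signed Pearson correlation between each
    auxiliary source's past feedback and the observed objective values; an
    anti-correlated source gets a negative weight, an uninformative one ~0.
    (Illustrative stand-in for the transformer's in-context estimate.)"""
    y = np.asarray(history_y, dtype=float)
    F = np.asarray(history_feedback, dtype=float)  # (n_observations, n_sources)
    y_c = y - y.mean()
    F_c = F - F.mean(axis=0)
    return (F_c * y_c[:, None]).sum(axis=0) / (
        np.linalg.norm(F_c, axis=0) * np.linalg.norm(y_c) + eps
    )

def select_query(candidate_feedback, weights):
    """Score each candidate by its reliability-weighted feedback and return
    the index of the best one (a greedy placeholder for acquisition)."""
    scores = np.asarray(candidate_feedback, dtype=float) @ np.asarray(weights)
    return int(np.argmax(scores))
```

A source that anti-correlates with the objective receives a negative weight here, so its enthusiastic feedback counts against a candidate rather than for it.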
What carries the argument
The structured feedback prior, which generates synthetic training trajectories by varying how each feedback source accesses, distorts, and relates to the true objective, thereby allowing the transformer to learn test-time reliability estimation.
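Read operationally, the three axes can be sketched as sampled factors applied to a latent objective when a pretraining trajectory is generated. The parameterization below (uniform access probability, signed relevance, Gaussian bias and noise) is a toy assumption for illustration, not the paper's prior.

```python
import numpy as np

def sample_feedback_source(rng):
    """Sample one synthetic feedback source from a toy structured prior.
    The factor ranges are illustrative, not the paper's:
      access    - probability the source returns a value at all
      relevance - signed correlation strength with the true objective
      bias/noise- additive distortion applied to the source's view
    """
    return {
        "access": rng.uniform(0.2, 1.0),
        "relevance": rng.uniform(-1.0, 1.0),
        "bias": rng.normal(0.0, 0.5),
        "noise": rng.uniform(0.0, 0.5),
    }

def render_feedback(source, y_true, rng):
    """Produce the auxiliary signal this source would emit for the objective
    values y_true; entries the source cannot access are returned as NaN."""
    y = np.asarray(y_true, dtype=float)
    signal = source["relevance"] * y + source["bias"]
    signal = signal + rng.normal(0.0, source["noise"], size=y.shape)
    mask = rng.random(y.shape) < source["access"]
    return np.where(mask, signal, np.nan)
```

Sampling many such sources per trajectory exposes the transformer, during pretraining, to every mixture of helpful, noisy, and misleading feedback it might meet at test time.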
If this is right
- The optimizer can accelerate search by exploiting informative signals from experts, simulators, or pretrained predictors.
- Performance does not degrade when some or all auxiliary sources become biased or uncorrelated with the objective.
- A single pretrained model adapts to new tasks and new combinations of feedback sources without retraining.
- Reliability estimates produced by the model provide an interpretable view of which sources the optimizer is currently trusting.
- The approach scales to settings with multiple simultaneous feedback sources whose quality varies across the search.
Where Pith is reading between the lines
- The same pretraining strategy could be applied to other sequential decision problems that receive noisy auxiliary observations, such as active learning or experimental design.
- If feedback sources change their behavior partway through an optimization run, the model's in-context inference might still track the shift without explicit detection mechanisms.
- The reliability estimates could be used downstream to decide whether to query a particular source at all, potentially reducing the cost of obtaining feedback.
- Extending the prior to include temporal correlations between successive feedback values might further improve robustness on long-horizon tasks.
Load-bearing premise
That a structured feedback prior can be defined to capture real variations in source access, relevance, and distortion sufficiently well for the transformer to learn reliable test-time reliability estimation from observed objective values.
What would settle it
If, on a held-out collection of optimization tasks whose feedback distortions lie outside the support of the training prior, the model's query selections show no improvement over baselines that simply ignore the auxiliary signals, the central claim would be falsified.
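The proposed falsification test reduces to a paired comparison on out-of-prior tasks. A minimal decision rule, assuming simple regret as the metric (the paper's exact protocol is not specified here), might look like:

```python
import numpy as np

def falsification_check(ficbo_regret, baseline_regret, tol=0.0):
    """Compare final simple regret of FICBO against a feedback-ignoring
    baseline across a set of out-of-prior tasks. Returns True if FICBO
    shows a median improvement (claim survives), False otherwise
    (illustrative decision rule, not the paper's protocol)."""
    diff = np.asarray(baseline_regret, dtype=float) - np.asarray(ficbo_regret, dtype=float)
    return bool(np.median(diff) > tol)  # diff > 0 where FICBO is better
```

If the median improvement over the feedback-ignoring baseline is zero or negative on such tasks, the check fails and the central claim is falsified in the stated sense.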
Original abstract
Black-box optimization in science and engineering often comes with side information: experts, simulators, pretrained predictors, or heuristics can suggest which candidates look promising. This information can accelerate search, but it can also be biased, input-dependent, or misleading. Feedback-aware BO methods typically handle one task at a time, limiting their ability to generalize over multiple sources of feedback. In-context optimizers address cross-task adaptation, but usually assume that optimization history is the only available signal at test time. We study feedback-informed in-context black-box optimization (FICBO), where a pretrained optimizer conditions on both the observed history and cheap auxiliary feedback for the current candidate set. We introduce a structured feedback prior that models how feedback sources vary in their access, relevance, and distortion relative to the true objective, and use it to pretrain a feedback-aware transformer. At test time, the model estimates source reliability in context by comparing observed objective values with auxiliary signals, improving query selection. On synthetic and real-world tasks, FICBO effectively exploits informative feedback while remaining robust to weak or misleading sources, improving over other baselines. Empirical investigations further illustrate how the model perceives test-time sources, offering insights into its interpretability and decision-making process.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces feedback-informed in-context black-box optimization (FICBO), where a transformer is pretrained on a structured feedback prior that models variations in source access, relevance, and distortion; at test time the model compares observed objective values against auxiliary signals to estimate per-source reliability and select queries, claiming improved performance over baselines on both synthetic and real-world tasks while remaining robust to weak or misleading feedback.
Significance. If the central claim holds, the work would be significant for extending in-context optimization to handle unreliable side information in a task-general way, with potential impact on scientific and engineering applications where auxiliary signals from simulators or heuristics are available but noisy; the use of a pretraining prior to enable test-time reliability inference is a novel angle, and the interpretability analysis of how the model perceives sources is a strength.
major comments (2)
- [§3] §3 (Methods, structured feedback prior definition): the parameterization of access, relevance, and distortion is not shown to be sufficiently expressive to cover the statistical structure of the real-world feedback sources used in experiments; without this, test-time reliability estimation from observed values versus auxiliary signals risks failing to generalize beyond the pretraining distribution, as synthetic tasks can be constructed to match the prior by design.
- [§4] §4 (Experiments, real-world tasks): no out-of-prior ablation is reported; the claimed robustness on real-world tasks could be explained by accidental alignment with the pretraining distribution rather than genuine reliability estimation, which is load-bearing for the headline claim that FICBO 'effectively exploits informative feedback while remaining robust to weak or misleading sources'.
minor comments (2)
- [Abstract] Abstract: the claim of improvement over 'other baselines' lacks any enumeration of the baselines, statistical tests, or data-split details; this should be expanded for clarity even if full results appear later.
- [§2] Notation: the distinction between 'observed objective values' and 'auxiliary signals' is used throughout but would benefit from an explicit early definition or table to avoid reader confusion in the methods section.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify important aspects of the prior's expressiveness and the need for stronger evidence on generalization. We address each point below and describe the revisions we will make to the manuscript.
Point-by-point responses
Referee: [§3] §3 (Methods, structured feedback prior definition): the parameterization of access, relevance, and distortion is not shown to be sufficiently expressive to cover the statistical structure of the real-world feedback sources used in experiments; without this, test-time reliability estimation from observed values versus auxiliary signals risks failing to generalize beyond the pretraining distribution, as synthetic tasks can be constructed to match the prior by design.
Authors: We appreciate the referee's emphasis on this issue. The prior is intentionally factored into three interpretable dimensions—access (availability of the auxiliary signal), relevance (statistical dependence on the true objective), and distortion (bias and noise characteristics)—to allow the transformer to learn context-dependent reliability estimation. While we do not claim the parameterization exhausts every possible higher-order statistical dependence that could appear in real-world sources, the pretraining distribution samples broadly across these axes, and the in-context adaptation mechanism is designed to infer effective reliability from observed discrepancies at test time. The real-world results are consistent with this capability. To directly address the concern, we will revise §3 to include a more explicit discussion of the prior's coverage, along with a comparison of the induced marginal distributions against the empirical feedback statistics observed in the real-world tasks. revision: yes
Referee: [§4] §4 (Experiments, real-world tasks): no out-of-prior ablation is reported; the claimed robustness on real-world tasks could be explained by accidental alignment with the pretraining distribution rather than genuine reliability estimation, which is load-bearing for the headline claim that FICBO 'effectively exploits informative feedback while remaining robust to weak or misleading sources'.
Authors: We agree that an explicit out-of-prior ablation would provide clearer support for the robustness claim. The current real-world experiments use feedback sources with varying degrees of misalignment, but they were not deliberately constructed to lie outside the support of the pretraining distribution. We will add a new ablation subsection in §4 that introduces controlled mismatches (e.g., feedback with distortion levels and relevance patterns outside the pretraining ranges) on both synthetic and real-world tasks, and we will report whether FICBO retains its advantages relative to baselines that lack explicit reliability modeling. revision: yes
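The promised ablation can be sketched as a harness that samples feedback sources deliberately outside assumed pretraining ranges; the intervals and field names below are hypothetical, chosen only to make the out-of-support check explicit.

```python
import numpy as np

# Hypothetical pretraining ranges for the prior's distortion factor;
# the ablation draws test-time sources strictly outside them.
PRETRAIN_NOISE = (0.0, 0.5)

def sample_out_of_prior_source(rng, margin=0.5):
    """Draw a feedback source whose noise level lies outside the pretraining
    support, so any observed robustness cannot come from memorized priors.
    A heavy-tailed bias adds a distortion shape the prior never produces."""
    noise = PRETRAIN_NOISE[1] + margin + rng.uniform(0.0, 1.0)
    bias = rng.standard_cauchy()
    return {"noise": noise, "bias": bias}

def is_out_of_prior(source):
    """Verify the sampled source really falls outside the pretraining range."""
    lo, hi = PRETRAIN_NOISE
    return not (lo <= source["noise"] <= hi)
```

Running FICBO and the reliability-free baselines on sources drawn this way would separate genuine in-context reliability estimation from accidental alignment with the pretraining distribution.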
Circularity Check
No circularity: method is a standard pretrain-then-adapt pipeline with no self-referential derivations or fitted predictions.
Full rationale
The paper defines a structured feedback prior explicitly, uses it to generate pretraining data for a transformer, and then evaluates the resulting model on held-out synthetic and real tasks. No equations, uniqueness theorems, or self-citations are presented that reduce the central claim (test-time reliability estimation from observed values vs. auxiliary signals) to a tautology or to parameters fitted on the evaluation data itself. The approach is evaluated against external benchmarks and does not rename known results or smuggle in ansatzes via self-citation. Minor self-citations to prior in-context optimization work, if present, are not load-bearing for the new feedback-aware component.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Feedback sources vary in access, relevance, and distortion relative to the true objective in a structured way that can be modeled a priori.
invented entities (2)
- Structured feedback prior: no independent evidence
- Feedback-aware transformer: no independent evidence
Reference graph
Works this paper leans on
- [1] Di Wu, Natalie Maus, Anupama Jha, Kevin Yang, Benjamin D Wales-McGrath, San Jewell, Anna Tangiyan, Peter Choi, Jacob R Gardner, and Yoseph Barash. Generative modeling for RNA splicing predictions and design. eLife, 2025.
- [2] Iiris Sundin, Alexey Voronov, Haoping Xiao, Kostas Papadopoulos, Esben Jannik Bjerrum, Markus Heinonen, Atanas Patronov, Samuel Kaski, and Ola Engkvist. Human-in-the-loop assisted de novo molecular design. Journal of Cheminformatics, 2022.
- [3] Qiaohao Liang, Aldair E Gongora, Zekun Ren, Armi Tiihonen, Zhe Liu, Shijing Sun, James R Deneault, Daniil Bash, Flore Mekki-Berrada, Saif A Khan, et al. Benchmarking the performance of Bayesian optimization across multiple experimental materials science domains. npj Computational Materials, 2021.
- [4] Carl Hvarfner, Danny Stoll, Artur Souza, Luigi Nardi, Marius Lindauer, and Frank Hutter. πBO: Augmenting acquisition functions with user beliefs for Bayesian optimization. In International Conference on Learning Representations, 2022.
- [5] Petrus Mikkola, Julien Martinelli, Louis Filstroff, and Samuel Kaski. Multi-fidelity Bayesian optimization with unreliable information sources. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, 2023.
- [6] Mingzhou Fan, Byung-Jun Yoon, Edward Dougherty, Nathan Urban, Francis Alexander, Raymundo Arróyave, and Xiaoning Qian. Multi-fidelity Bayesian optimization with multiple information sources of input-dependent fidelity. In Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 2024.
- [7] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
- [8] Roman Garnett. Bayesian Optimization. Cambridge University Press, 2023.
- [9] Masaki Adachi, Brady Planden, David Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, and Siu Lun Chau. Looping in the human: Collaborative and explainable Bayesian optimization. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, 2024.
- [10] Artur Souza, Luigi Nardi, Leonardo Oliveira, Kunle Olukotun, Marius Lindauer, and Frank Hutter. Prior-guided Bayesian optimization. 2020.
- [11] Adam Foster, Desi R Ivanova, Ilyas Malik, and Tom Rainforth. Deep adaptive design: Amortizing sequential Bayesian experimental design. In Proceedings of the 38th International Conference on Machine Learning, 2021.
- [12] Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, and Haitham Bou Ammar. End-to-end meta-Bayesian optimisation with transformer neural processes. Advances in Neural Information Processing Systems, 36:11246–11260, 2023.
- [13] Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, and Luigi Acerbi. ALINE: Joint amortization for Bayesian inference and active data acquisition. In Thirty-ninth Conference on Neural Information Processing Systems, 2025.
- [14] Xinyu Zhang, Conor Hassan, Julien Martinelli, Daolang Huang, and Samuel Kaski. In-context multi-objective optimization. In The Fourteenth International Conference on Learning Representations, 2026.
- [15] Samuel Müller, Matthias Feurer, Noah Hollmann, and Frank Hutter. PFNs4BO: In-context learning for Bayesian optimization. In International Conference on Machine Learning, 2023.
- [16] Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. GIT-BO: High-dimensional Bayesian optimization with tabular foundation models. In The Fourteenth International Conference on Learning Representations, 2026.
- [17] Yu-Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, and Ping-Chun Hsieh. BOFormer: Learning to solve multi-objective Bayesian optimization via non-Markovian RL. arXiv preprint arXiv:2505.21974, 2025.
- [18] Xinyu Zhang, Daolang Huang, Samuel Kaski, and Julien Martinelli. PABBO: Preferential amortized black-box optimization. In The Thirteenth International Conference on Learning Representations, 2025.
- [19] Juan Ungredda and Juergen Branke. When to elicit preferences in multi-objective Bayesian optimization. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation, 2023.
- [20] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In International Conference on Learning Representations, 2023.
- [21] Kevin Swersky, Yulia Rubanova, David Dohan, and Kevin Murphy. Amortized Bayesian optimization over discrete spaces. In Conference on Uncertainty in Artificial Intelligence, 2020.
- [22] Petrus Mikkola. Humans as information sources in Bayesian optimization. 2024.
- [23] Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, and Samuel Kaski. Human-in-the-loop active learning for goal-oriented molecule generation. Journal of Cheminformatics, 2024.
- [24] Yiming Yao, Fei Liu, Liang Zhao, Xi Lin, Yilu Liu, and Qingfu Zhang. FoMEMO: Towards foundation models for expensive multi-objective optimization. arXiv preprint arXiv:2509.03244, 2025.
- [25] Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139, 2026.
- [26] Yifan Zhu, Sammie Katt, and Samuel Kaski. More than irrational: Modeling belief-biased agents. In Proceedings of the AAAI Conference on Artificial Intelligence, 2026.
- [27] Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Advances in Neural Information Processing Systems, 2020.
- [28] Alexander IJ Forrester, András Sóbester, and Andy J Keane. Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2007.
- [29] Mauricio A Alvarez, Lorenzo Rosasco, and Neil D Lawrence. Kernels for vector-valued functions: A review. Foundations and Trends® in Machine Learning, 2012.
- [30] David JJ Toal. Some considerations regarding the use of multi-fidelity kriging in the construction of surrogate models. Structural and Multidisciplinary Optimization, 2015.
- [31] H Scott Fogler. Elements of Chemical Reaction Engineering. Pearson Educacion, 1999.
- [32] Octave Levenspiel. Chemical Reaction Engineering. John Wiley & Sons, 1998.
- [33] Benjamin J Shields, Jason Stevens, Jun Li, Marvin Parasram, Farhan Damani, Jesus I Martinez Alvarado, Jacob M Janey, Ryan P Adams, and Abigail G Doyle. Bayesian reaction optimization as a tool for chemical synthesis. Nature, 2021.
- [34] Peter Sharpe and R John Hansman. NeuralFoil: An airfoil aerodynamics analysis tool using physics-informed machine learning. arXiv preprint arXiv:2503.16323, 2025.
- [35] R Priyanka and M Sivapragasam. Multi-fidelity surrogate model-based airfoil optimization at a transitional low Reynolds number. Sādhanā, 2021.
- [36] Xuanyi Dong and Yi Yang. NAS-Bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326, 2020.
- [37] Kaicheng Yu, Christian Sciuto, Martin Jaggi, Claudiu Musat, and Mathieu Salzmann. Evaluating the search phase of neural architecture search. arXiv preprint arXiv:1902.08142, 2019.
- [38] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
- [39] Leland McInnes, John Healy, and James Melville. UMAP: Uniform Manifold Approximation and Projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
- [40] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.