Bayesian Optimization with Structured Measurements: A Vector-Valued RKHS Framework
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-12 02:02 UTC · model grok-4.3
The pith
Bayesian optimization can use vector-valued measurements of trajectories or fields to learn faster than with single scalar values.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study Bayesian optimization over a vector-valued operator with structured measurements, where each measurement observes multidimensional or functional outputs rather than a single scalar value and the objective is defined as a linear functional of these measurements. Assuming the unknown operator lies in a vector-valued reproducing kernel Hilbert space, we derive high-probability concentration bounds for the kernel ridge regression estimator directly in the measurement space. Building on these results, we propose an algorithm based on the upper confidence bound acquisition function with regret guarantees under mild assumptions, recovering sublinear rates for common kernels.
What carries the argument
Vector-valued reproducing kernel Hilbert space for the unknown operator, together with high-probability concentration bounds on kernel ridge regression performed in the space of structured measurements.
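In symbols (the notation below is this review's own shorthand, not necessarily the paper's), the setting reads:

```latex
% Unknown operator F maps inputs to a Hilbert space of measurements:
%   F : \mathcal{X} \to \mathcal{Y}, \qquad F \in \mathcal{H}_K,
% where \mathcal{H}_K is a vector-valued RKHS with operator-valued kernel K.
y_t = F(x_t) + \varepsilon_t \in \mathcal{Y}
  \quad \text{(structured measurement at query } x_t\text{)},
\qquad
f(x) = \langle \ell, F(x) \rangle_{\mathcal{Y}}
  \quad \text{(scalar objective as a linear functional of the measurement)}.
```

The UCB rule then selects $x_{t+1} = \arg\max_x \, \mu_t(x) + \beta_t \sigma_t(x)$, where $\mu_t$ and $\sigma_t$ come from the kernel ridge regression estimator and its measurement-space concentration bound.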
If this is right
- The UCB acquisition function yields sublinear regret for common kernel choices.
- Structured measurements improve sample efficiency by transferring information across related objectives.
- The method supports adaptation when the underlying system varies over time.
- Uncertainty quantification works directly in general Hilbert spaces of measurements.
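To make the measurement-space regression concrete, here is a minimal numerical sketch of kernel ridge regression with a separable operator-valued kernel K(x, x') = k(x, x')·B, where B is a PSD matrix coupling the output coordinates. The RBF kernel, the function names, and the separable structure are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    """Scalar RBF kernel matrix between two sets of input points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def vv_krr_fit(X, Y, B, lam=1e-3, ls=1.0):
    """Vector-valued KRR with separable kernel K(x, x') = k(x, x') * B.
    Solves (G (x) B + lam I) vec(A) = vec(Y) and returns the dual
    coefficients A (n x m), one m-vector per training point."""
    n, m = Y.shape
    G = rbf(X, X, ls)                           # scalar Gram matrix, n x n
    K_big = np.kron(G, B) + lam * np.eye(n * m)  # block Gram in vec form
    a = np.linalg.solve(K_big, Y.reshape(-1))    # row-major vec(Y)
    return a.reshape(n, m)

def vv_krr_predict(Xs, X, A, B, ls=1.0):
    """Estimate F_hat(x) = sum_t k(x, x_t) B a_t at test points (B symmetric)."""
    Ks = rbf(Xs, X, ls)                          # n_test x n
    return Ks @ A @ B.T                          # n_test x m
```

With a tiny regularizer, the fit interpolates the training measurements, which is the sanity check one would expect of any measurement-space estimator.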
Where Pith is reading between the lines
- Similar structured-output modeling could be applied to other acquisition strategies or to non-Bayesian optimization routines that currently collapse outputs to scalars.
- Domains such as control or robotics might optimize directly over trajectory outputs without intermediate scalar reduction.
- The dependence of convergence rates on the choice of linear functional defining the objective could be studied further to tighten practical performance.
Load-bearing premise
The unknown operator that produces the structured vector or functional measurements belongs to a vector-valued reproducing kernel Hilbert space.
What would settle it
An experiment in which the true mapping from inputs to structured outputs lies outside any vector-valued RKHS and the observed cumulative regret grows faster than the sublinear rate predicted by the analysis.
Original abstract
Bayesian optimization (BO) is an efficient framework for optimizing expensive black-box functions. However, it is typically formulated as learning an end-to-end mapping from inputs to scalar objectives, thereby discarding the potentially rich information whenever a structured system output is available. In this work, we study Bayesian optimization over a vector-valued operator with structured measurements, where each measurement observes multidimensional or functional outputs, e.g., trajectories or spatial fields, rather than a single scalar value. The objective is then defined as a linear functional of these measurements. This allows each observation to reveal substantially richer information about the underlying system compared to scalar observations. Assuming the unknown operator lies in a vector-valued reproducing kernel Hilbert space (RKHS), we derive high-probability concentration bounds for the kernel ridge regression (KRR) estimator directly in the measurement space, characterizing uncertainty in a general Hilbert space. Building on these results, we propose an algorithm based on the upper confidence bound (UCB) acquisition function with regret guarantees under mild assumptions, recovering sublinear rates for common kernels. Empirically, we demonstrate that leveraging structured measurements leads to improved sample efficiency by enabling efficient transfer of information across objectives and adaptation to time-varying settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends Bayesian optimization to vector-valued operators observed via structured multidimensional or functional measurements (e.g., trajectories), rather than scalar outputs. The objective is a linear functional of these measurements. Under the assumption that the unknown operator lies in a vector-valued RKHS, the authors derive high-probability concentration bounds for the kernel ridge regression estimator directly in the measurement space. They then propose a UCB acquisition function algorithm with regret guarantees that recover sublinear rates for common kernels, and provide empirical evidence of improved sample efficiency through information transfer across objectives and adaptation to time-varying settings.
Significance. If the derived concentration bounds and regret guarantees hold, the work offers a principled extension of scalar BO to richer observation models, enabling more efficient optimization when structured outputs are available. The recovery of standard sublinear regret rates for common kernels is a strength, as is the explicit use of vector-valued RKHS to handle functional data. This could benefit applications in control, robotics, and scientific computing where measurements yield trajectories or fields rather than scalars.
minor comments (3)
- §2 (Preliminaries): the notation for the vector-valued RKHS and the associated operator-valued kernel could be clarified with an explicit example of how the linear functional defining the objective is represented in the measurement space.
- §4 (Algorithm): the transition from the KRR estimator to the UCB acquisition function is described at a high level; a short pseudocode block or explicit formula for the acquisition function in terms of the vector-valued posterior would improve readability.
- §5 (Experiments): the time-varying setting experiment would benefit from a clearer description of how the operator drift is modeled and whether the regret analysis extends directly or requires additional assumptions.
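On the second comment: a minimal, hypothetical sketch of what such a UCB acquisition could look like for the scalar objective f(x) = ⟨ℓ, F(x)⟩ under a separable multi-output prior Cov(F(x), F(x')) = k(x, x')·B. Every name, the RBF kernel, and the β schedule here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def rbf(X1, X2, ls=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def ucb(x, X, Y, ell_vec, B, sigma2=1e-6, beta=2.0, ls=1.0):
    """UCB for f(x) = <ell_vec, F(x)> under Cov(F(x), F(x')) = k(x,x') * B.
    x: (1, d) candidate; X: (n, d) past queries; Y: (n, m) measurements."""
    n, m = Y.shape
    K_big = np.kron(rbf(X, X, ls), B) + sigma2 * np.eye(n * m)
    cross = np.kron(rbf(x, X, ls), B)            # K(x, X) in vec form, m x (n*m)
    alpha = np.linalg.solve(K_big, Y.reshape(-1))
    mean_F = cross @ alpha                       # posterior mean of F(x), (m,)
    cov_F = rbf(x, x, ls)[0, 0] * B - cross @ np.linalg.solve(K_big, cross.T)
    mu = ell_vec @ mean_F                        # mean of the scalar objective
    var = max(float(ell_vec @ cov_F @ ell_vec), 0.0)
    return mu + beta * np.sqrt(var)
```

Near observed queries the width term collapses and the score tracks the observed objective value; far from the data the exploration bonus dominates.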
Simulated Author's Rebuttal
We thank the referee for the positive summary of our work and the recommendation for minor revision. The referee accurately captures the core contribution: extending Bayesian optimization to vector-valued operators observed through structured measurements in a vector-valued RKHS, with concentration bounds and UCB regret guarantees that recover standard sublinear rates.
Circularity Check
No significant circularity; derivation self-contained under stated RKHS assumptions
full rationale
The paper states the core assumption that the unknown operator lies in a vector-valued RKHS, derives high-probability KRR concentration bounds directly in the measurement space from that assumption, and then constructs a UCB acquisition function whose regret analysis recovers sublinear rates for common kernels under mild conditions. These steps are presented as standard derivations in the Hilbert-space setting with no reduction of a 'prediction' to a fitted quantity by construction, no load-bearing self-citation chain, and no ansatz or uniqueness claim imported from prior author work. The central claims therefore remain independent of the inputs they are derived from.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The unknown operator lies in a vector-valued reproducing kernel Hilbert space (RKHS).
Lean theorems connected to this paper
- `IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean` · `washburn_uniqueness_aczel` (J-cost uniqueness) · tag: unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
Assuming the unknown operator lies in a vector-valued reproducing kernel Hilbert space (RKHS), we derive high-probability concentration bounds for the kernel ridge regression (KRR) estimator directly in the measurement space... UCB acquisition function with regret guarantees... recovering sublinear rates for common kernels.
- `IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean` · `reality_from_one_distinction` · tag: unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
For the separable kernel with B_M having finite spectrum... Linear kernel: R_T ≤ O(log T · √T); Gaussian kernel: R_T ≤ O((log T)^{d+1} · √T); Matérn kernel: R_T ≤ O(T^{d(d+1)/(2ν + d(d+1))} · log T · √T)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Naum Ilich Akhiezer. Theory of Linear Operators in Hilbert Space. Pitman Publishing, 1981.
- [2] David Blum, Javier Arroyo, Sen Huang, Ján Drgoňa, Filip Jorissen, Harald Taxt Walnum, Yan Chen, Kyle Benne, Draguna Vrabie, Michael Wetter, et al. Building optimization testing framework (BOPTEST) for simulation-based benchmarking of control strategies in buildings. Journal of Building Performance Simulation, 14(5):586–610, 2021.
- [3] Edwin V. Bonilla, Kian Chai, and Christopher Williams. Multi-task Gaussian process prediction. Advances in Neural Information Processing Systems, 20, 2007.
- [4] Romain Brault, Alex Lambert, Zoltán Szabó, Maxime Sangnier, and Florence d'Alché-Buc. Infinite task learning in RKHSs. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1294–1302. PMLR, 2019.
- [5] Paul Brunzema, Alexander von Rohr, and Sebastian Trimpe. On controller tuning with time-varying Bayesian optimization. In 2022 IEEE 61st Conference on Decision and Control, pages 4046–4052. IEEE, 2022.
- [6] Andrea Caponnetto, Charles A. Micchelli, Massimiliano Pontil, and Yiming Ying. Universal multi-task kernels. The Journal of Machine Learning Research, 9:1615–1646, 2008.
- [7] Claudio Carmeli, Ernesto De Vito, and Alessandro Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4(04):377–408, 2006.
- [8] Claudio Carmeli, Ernesto De Vito, Alessandro Toigo, and Veronica Umanità. Vector valued reproducing kernel Hilbert spaces and universality. Analysis and Applications, 8(01):19–61, 2010.
- [9] Xiao Chen, Qian Wang, and Jelena Srebric. Model predictive control for indoor thermal comfort and energy optimization using occupant feedback. Energy and Buildings, 102:357–369, 2015.
- [10] Sayak Ray Chowdhury and Aditya Gopalan. On kernelized multi-armed bandits. In International Conference on Machine Learning, pages 844–853. PMLR, 2017.
- [11] Sayak Ray Chowdhury and Aditya Gopalan. No-regret algorithms for multi-task Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, pages 1873–1881. PMLR, 2021.
- [12] João P. L. Coutinho, You Peng, Ricardo Rendall, Kaiwen Ma, Swee-Teng Chin, Ivan Castillo, and Marco S. Reis. Efficient human-in-the-loop MPC tuning with multi-task preferential Bayesian optimization. Control Engineering Practice, 173:106972, 2026.
- [13] Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Multi-objective Bayesian optimization over high-dimensional search spaces. In Uncertainty in Artificial Intelligence, pages 507–517. PMLR, 2022.
- [14] Theodoros Evgeniou and Massimiliano Pontil. Regularized multi-task learning. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 109–117, 2004.
- [15] Peter I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
- [16] Roman Garnett. Bayesian Optimization. Cambridge University Press, 2023.
- [17] Linas Gelazanskas and Kelum A. A. Gamage. Demand side management in smart grid: A review and proposals for future direction. Sustainable Cities and Society, 11:22–30, 2014.
- [18] Jan Gertheiss, David Rügamer, Bernard X. W. Liew, and Sonja Greven. Functional data analysis: An introduction and recent developments. Biometrical Journal, 66(7):e202300363, 2024.
- [19] Henrik Hose, Paul Brunzema, Alexander von Rohr, Alexander Gräfe, Angela P. Schoellig, and Sebastian Trimpe. Fine-tuning of neural network approximate MPC without retraining via Bayesian optimization. arXiv preprint arXiv:2512.14350, 2025.
- [20] Jingru Huang, Haijie Xu, Manrui Jiang, and Chen Zhang. Function-on-function Bayesian optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 21994–22002, 2026.
- [21]
- [22] Hachem Kadri, Emmanuel Duflos, Manuel Davy, Philippe Preux, and Stephane Canu. General Framework for Nonlinear Functional Regression with Reproducing Kernel Hilbert Spaces. PhD thesis, INRIA, 2009.
- [23] Hachem Kadri, Emmanuel Duflos, Philippe Preux, Stéphane Canu, and Manuel Davy. Nonlinear functional regression: a functional RKHS approach. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 374–380. JMLR Workshop and Conference Proceedings, 2010.
- [24] Hachem Kadri, Emmanuel Duflos, Philippe Preux, Stéphane Canu, Alain Rakotomamonjy, and Julien Audiffren. Operator-valued kernels for learning from functional response data. Journal of Machine Learning Research, 17(20):1–54, 2016.
- [25] Nazan Khan, David E. Goldberg, and Martin Pelikan. Multi-objective Bayesian optimization algorithm. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, pages 684–684, 2002.
- [26] Anna Magdalena Kosek, Giuseppe Tommaso Costanzo, Henrik W. Bindner, and Oliver Gehrke. An overview of demand side management control schemes for buildings in smart grids. In 2013 IEEE International Conference on Smart Energy Grid Engineering, pages 1–9. IEEE, 2013.
- [27] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.
- [28] Andreas Krause and Cheng Ong. Contextual Gaussian process bandit optimization. Advances in Neural Information Processing Systems, 24, 2011.
- [29] Akshay Kudva and Joel A. Paulson. Bonsai: Structure-exploiting robust Bayesian optimization for networked black-box systems under uncertainty. Computers & Chemical Engineering, 204:109393, 2026.
- [30] Amon Lahr, Anna Scampicchio, Johannes Köhler, and Melanie N. Zeilinger. Optimal uncertainty bounds for multivariate kernel regression under bounded noise: A Gaussian process-based dual function. arXiv preprint arXiv:2603.16481, 2026.
- [31] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020.
- [32] Johanna Menn, David Stenger, and Sebastian Trimpe. Preferential Bayesian optimization with crash feedback. IEEE Robotics and Automation Letters, 2026.
- [33] Charles A. Micchelli and Massimiliano Pontil. On learning vector-valued functions. Neural Computation, 17(1):177–204, 2005.
- [34] Charles A. Micchelli, Yuesheng Xu, and Haizhang Zhang. Universal kernels. Journal of Machine Learning Research, 7(12), 2006.
- [35] Ha Quang Minh. Infinite-dimensional log-determinant divergences between positive definite trace class operators. Linear Algebra and Its Applications, 528:331–383, 2017.
- [36] James O. Ramsay and Bernard W. Silverman. Functional Data Analysis. Springer, 1997.
- [37] Carlos Ruiz, Carlos M. Alaíz, and José R. Dorronsoro. A survey on kernel-based multi-task learning. Neurocomputing, 577:127255, 2024.
- [38] Pier Giuseppe Sessa, Pierre Laforgue, Nicolò Cesa-Bianchi, and Andreas Krause. Multitask learning with no regret: from improved confidence bounds to active learning. Advances in Neural Information Processing Systems, 36:6770–6781, 2023.
- [39] Jicheng Shi, Christophe Salzmann, and Colin N. Jones. Disturbance-adaptive data-driven predictive control: Trading comfort violation for savings in building climate control. IEEE Transactions on Control Systems Technology, 2026.
- [40] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012.
- [41] Kevin Swersky, Jasper Snoek, and Ryan P. Adams. Multi-task Bayesian optimization. Advances in Neural Information Processing Systems, 26, 2013.
- [42] Ngo Anh Vien, Heiko Zimmermann, and Marc Toussaint. Bayesian functional optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [43] Wenbin Wang, Zhiyu He, Giuseppe Belgioioso, Saverio Bolognani, and Florian Dörfler. Decentralized feedback optimization via sensitivity decoupling: Stability and sub-optimality. In 2024 European Control Conference, pages 3201–3206. IEEE, 2024.
- [44] Wenbin Wang, Jicheng Shi, and Colin N. Jones. Personalized building climate control with contextual preferential Bayesian optimization. arXiv preprint arXiv:2512.09481, 2025.
- [45] Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, and Colin N. Jones. Data-driven adaptive building thermal controller tuning with constraints: A primal–dual contextual Bayesian optimization approach. Applied Energy, 358:122493, 2024.
- [46] Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, and Colin N. Jones. Data-driven adaptive building thermal controller tuning with constraints: A primal–dual contextual Bayesian optimization approach. Applied Energy, 358:122493, 2024.
- [47] Tao Yang, Konstantin Filonenko, Krzysztof Arendt, and Christian Veje. Implementation and performance analysis of a multi-energy building emulator. In 2020 6th IEEE International Energy Conference, pages 451–456. IEEE, 2020.
- [48] Jiarui Yu, Jicheng Shi, Wenjie Xu, and Colin N. Jones. Which price to pay? Auto-tuning building MPC controller for optimal economic cost. arXiv preprint arXiv:2501.10859, 2025.
- [49] Ming Yuan and T. Tony Cai. A reproducing kernel Hilbert space approach to functional linear regression. The Annals of Statistics, 38(6), December 2010.
Appendix material captured alongside the references (the regret rates and the start of the proof of Lemma 4):

- Linear kernel: R_T ≤ O(log T · √T)
- Gaussian kernel: R_T ≤ O((log T)^{d+1} · √T)
- Matérn kernel: R_T ≤ O(T^{d(d+1)/(2ν + d(d+1))} · log T · √T)

Proof for Lemma 4 (first part). By the Fredholm determinant for positive definite trace class operators [35],
log det(I + λ^{-1} G_{X_T X_T} ⊗ B_M) = Σ_{t=1}^{T} Σ_{i=1}^{∞} log(1 + λ^{-1} λ_t^{G} λ_i^{B_M}) ≤ λ^{-1} Σ_{t=1}^{T} Σ_{i=1}^{∞} λ_t^{G} λ_i^{B_M} < ∞.
Hence, applying Fubini's theorem for infinite series...
discussion (0)