KSOS-BO: Improving Sampling in Bayesian Optimization via Kernel Sum of Squares
Pith reviewed 2026-05-21 01:28 UTC · model grok-4.3
The pith
Formulating acquisition optimization as a kernel sum of squares semidefinite program enables better global search in Bayesian optimization than standard derivative-free methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across diverse benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks and faster wall-clock convergence with an average improvement of 93.55% on 10/15 benchmarks, with strong results especially on highly multimodal and unimodal but ill-conditioned functions.
What carries the argument
The kernel sum of squares representation that recasts acquisition function optimization as a globally solvable semidefinite program.
If this is right
- It delivers an average regret improvement of 81.16 percent on ten out of fifteen tested benchmarks.
- It reaches high-quality solutions with fewer evaluations, yielding 93.55 percent average wall-clock improvement on ten out of fifteen benchmarks.
- It performs especially well on highly multimodal functions and on unimodal but ill-conditioned functions.
- It shows reduced effectiveness on functions that contain steep drops or plate-shaped regions.
Where Pith is reading between the lines
- The semidefinite-program step may combine usefully with local refinement methods to handle remaining difficult landscape features.
- Similar kernel-sum-of-squares reformulations could be applied to other structured black-box optimization problems outside Bayesian optimization.
- Testing the method on higher-dimensional problems would reveal whether the semidefinite solver remains practical as dimension grows.
Load-bearing premise
The kernel sum of squares representation supplies a sufficiently accurate and globally solvable formulation of the acquisition function without missing important optima or adding significant bias on the tested landscape types.
What would settle it
If KSOS-BO fails to locate acquisition maxima that are better than those found by Sobol search or CMA-ES on a new set of functions containing steep drops or plateaus, the claim that the representation enables reliable global improvement would be challenged.
Figures
read the original abstract
Bayesian Optimization (BO) is an effective framework for globally optimizing functions whose evaluations are expensive. It is particularly effective for optimizing functions defined over continuous domains and explicitly handles stochastic noise in evaluations. As a result, it is widely applied in areas such as hyperparameter tuning, robotics policy search, and scientific experiment design, where sample efficiency is essential. Its two-step procedure consists of model fitting followed by optimization of the acquisition function, which is often treated as a generic black-box problem despite its structured nature. In this work, we introduce KSOS-BO, a kernel-based derivative-free framework for BO acquisition optimization. KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across a diverse set of benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks. In particular, KSOS-BO demonstrates strong performance in highly multimodal and unimodal but ill-conditioned functions, indicating its applicability to diverse landscape structures. Despite a higher per-iteration computational cost, it converges faster in wall-clock time with an average improvement of 93.55% on 10/15 benchmarks, as it reaches high-quality solutions with fewer evaluations. Limitations include reduced effectiveness on functions with steep drops or plate-shaped regions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces KSOS-BO, a kernel-based framework that reformulates acquisition-function optimization in Bayesian optimization as a semidefinite program using kernel-induced sum-of-squares representations. This is claimed to enable structured global search. On 15 benchmark functions, KSOS-BO is reported to outperform derivative-free baselines (Sobol search, Differential Evolution, CMA-ES) with an average regret improvement of 81.16% on 10/15 benchmarks and an average wall-clock convergence improvement of 93.55% on 10/15 benchmarks, performing especially well on multimodal and ill-conditioned landscapes while noting limitations on steep-drop or plate-shaped functions.
Significance. If the kernel-SOS SDP formulation recovers near-global optima of the acquisition function with negligible bias or duality gap, the approach could meaningfully improve sample efficiency in Bayesian optimization for expensive black-box problems. The reported regret and wall-clock gains are large enough to be practically relevant in hyperparameter tuning and robotics applications. However, the absence of reproducibility details (independent runs, statistical tests, baseline hyperparameter protocols) and explicit tightness guarantees for the SDP relaxation substantially weakens the assessed significance of the empirical claims.
major comments (2)
- [Method (kernel SOS formulation) and Experimental Results] The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.
- [Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly define the acquisition functions tested (EI, UCB, etc.) and the precise kernel family and truncation order used in the SOS representation.
- [Implementation details] Clarify whether the SDP is solved to global optimality in every iteration or whether early termination heuristics are employed; this directly affects the claimed wall-clock advantage.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive feedback. We have carefully considered the major comments and will revise the manuscript accordingly to strengthen the empirical validation and reporting. Below we provide point-by-point responses.
read point-by-point responses
-
Referee: The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.
Authors: We appreciate this observation. While the kernel sum-of-squares formulation provides a convex relaxation that enables global optimization of the approximated acquisition function, we acknowledge that explicit bounds on the approximation error or duality gaps are not provided in the current manuscript. To address this, we will include additional experiments comparing the attained acquisition values to those from a reference global optimizer (such as multi-start gradient descent or grid search in low dimensions) and report the observed duality gaps from the SDP solver. This will help quantify any potential bias introduced by the kernel representation. revision: yes
-
Referee: [Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.
Authors: We agree that reporting the number of independent runs, standard deviations, and statistical significance is essential for assessing the robustness of the results. In the revised manuscript, we will add this information to the relevant tables and figures, including details on the experimental protocol (e.g., 10 independent runs per benchmark) and results of paired t-tests or similar to establish statistical significance of the improvements. revision: yes
Circularity Check
No circularity in KSOS-BO derivation or claims
full rationale
The paper introduces KSOS-BO as a kernel sum-of-squares SDP formulation for acquisition-function optimization, solved via an external SDP solver. Performance claims rest on empirical regret and wall-clock comparisons against derivative-free baselines on independent benchmark functions with varying landscapes. No equations or steps reduce a central result to a fitted parameter, self-defined quantity, or self-citation chain by construction. The method is externally validated rather than internally forced, satisfying the criteria for a self-contained derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Acquisition functions admit a kernel-induced sum-of-squares representation that converts their maximization into a tractable semidefinite program.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations... c⋆_KSOS = max c,B⪰0 c−λTr(B) s.t. α(xi)−c=ϕ⊤i B ϕi
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanJ_uniquely_calibrated_via_higher_derivative unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Kernel Sum-of-Squares (KernelSOS) extends SOS beyond polynomial representations by replacing polynomial bases with kernel functions
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015
work page 2015
-
[2]
Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and Ryan P. Adams. Scalable Bayesian Optimization Using Deep Neural Networks. InProceedings of the International Conference on Machine Learning, pages 2171–2180. PMLR, 2015
work page 2015
-
[3]
A new acquisition function for Bayesian optimization based on the moment-generating function
Hao Wang, Bas van Stein, Michael Emmerich, and Thomas Back. A new acquisition function for Bayesian optimization based on the moment-generating function. In2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 507–512. IEEE, 2017
work page 2017
-
[4]
Acquisition functions in Bayesian optimization
Weiao Gan, Ziyuan Ji, and Yongqing Liang. Acquisition functions in Bayesian optimization. In 2021 2nd International Conference on Big Data & Artificial Intelligence & misc Engineering (ICBASE), pages 129–135. IEEE, 2021
work page 2021
-
[5]
James Wilson, Frank Hutter, and Marc Deisenroth. Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[6]
Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010
work page 2010
-
[7]
The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[8]
Efficient benchmarking of hyperparameter optimizers via surrogates
Katharina Eggensperger, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Efficient benchmarking of hyperparameter optimizers via surrogates. InProceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015
work page 2015
-
[9]
Jorge Nocedal and Stephen J Wright.Numerical optimization. Springer, 2006
work page 2006
-
[10]
Jean B Lasserre. Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001
work page 2001
-
[11]
Pablo A Parrilo. Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003
work page 2003
-
[12]
Jean Bernard Lasserre.Moments, positive polynomials and their applications, volume 1. World Scientific, 2009
work page 2009
-
[13]
Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012
work page 2012
-
[14]
Finding global minima via kernel approximations: A
Alessandro Rudi, Ulysse Marteau-Ferey, and Francis Bach. Finding global minima via kernel approximations: A. Rudi et al.Mathematical Programming, 209(1):703–784, 2025
work page 2025
-
[15]
PhD thesis, California Institute of Technology, 2000
Pablo A Parrilo.Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000
work page 2000
-
[16]
Russ Tedrake, Ian R Manchester, Mark Tobenkin, and John W Roberts. LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010
work page 2010
-
[17]
Heng Yang and Luca Carlone. Certifiably optimal outlier-robust geometric perception: Semidef- inite relaxations and scalable global optimization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2816–2834, 2022
work page 2022
-
[18]
Cambridge University Press, 2015
Jean Bernard Lasserre.An introduction to polynomial and semi-algebraic optimization, vol- ume 52. Cambridge University Press, 2015
work page 2015
-
[19]
Cambridge University Press, 2016
Vern I Paulsen and Mrinal Raghupathi.An introduction to the theory of reproducing kernel Hilbert spaces, volume 152. Cambridge University Press, 2016. 10
work page 2016
-
[20]
Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Bernhard Schölkopf and Alexander J. Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2018
work page 2018
-
[21]
Infinite-dimensional sums-of-squares for optimal control
Eloïse Berthier, Justin Carpentier, Alessandro Rudi, and Francis Bach. Infinite-dimensional sums-of-squares for optimal control. In2022 IEEE 61st Conference on Decision and Control (CDC), pages 577–582. IEEE, 2022
work page 2022
-
[22]
Jean B Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008
work page 2008
-
[23]
Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming
Antoine Groudiev, Fabian Schramm, Éloïse Berthier, Justin Carpentier, and Frederike Dümbgen. Sampling-based global optimal control and estimation via semidefinite programming.arXiv preprint arXiv:2507.17572, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[24]
Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS
Zhongqi Wei and Frederike Dümbgen. Global sampling-based trajectory optimization for contact-rich manipulation via kernelsos.arXiv preprint arXiv:2604.27175, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[25]
Peter I Frazier. Bayesian optimization. InRecent Advances in Optimization and Modeling of Contemporary Problems, pages 255–278. Informs, 2018
work page 2018
-
[26]
Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998
work page 1998
-
[27]
James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011
work page 2011
-
[28]
Andrew R Conn, Katya Scheinberg, and Luis N Vicente.Introduction to derivative-free optimization. SIAM, 2009
work page 2009
-
[29]
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical report, Department of Computer Science, University of British Columbia, 2009
work page 2009
-
[30]
Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004
Matthias Seeger. Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004
work page 2004
-
[31]
Ulysse Marteau-Ferey, Francis Bach, and Alessandro Rudi. Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020
work page 2020
-
[32]
S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test Functions and Datasets.http://www.sfu.ca/~ssurjano, 2026. Accessed: May 6, 2026
work page 2026
-
[33]
Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995
Christopher Williams and Carl Rasmussen. Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995
work page 1995
-
[34]
Antoine Groudiev and Frederike Dümbgen. ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers. https://github.com/ Simple-Robotics/ksos-tools, 2025. Version 0.2.2
work page 2025
-
[35]
Masahiro Nomura, Masashi Shibata, and Ryoki Hamano. cmaes: A Simple yet Practical Python Library for CMA-ES.arXiv preprint arXiv:2402.01373, 2026. 11 A THEORETICAL BACKGROUND A.1 Gaussian Process A Gaussian process (GP) is a collection of random variables {f(x)} x∈X such that any finite subset follows a joint Gaussian distribution. A GP is fully specified...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.