pith. sign in

arxiv: 2605.21179 · v1 · pith:IEFCL5WEnew · submitted 2026-05-20 · 💻 cs.CE

KSOS-BO: Improving Sampling in Bayesian Optimization via Kernel Sum of Squares

Pith reviewed 2026-05-21 01:28 UTC · model grok-4.3

classification 💻 cs.CE
keywords Bayesian optimizationacquisition function optimizationkernel sum of squaressemidefinite programmingglobal optimizationderivative-free methods
0
0 comments X

The pith

Formulating acquisition optimization as a kernel sum of squares semidefinite program enables better global search in Bayesian optimization than standard derivative-free methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian optimization selects points by maximizing an acquisition function after fitting a model, but this step is usually treated as a black-box problem. The paper introduces KSOS-BO, which uses kernel sum of squares representations to turn acquisition maximization into a semidefinite program that supports structured global search without derivatives. On a range of benchmark functions with different landscape features, the approach reduces regret more than Sobol search, differential evolution, or CMA-ES and reaches good solutions faster in wall-clock time despite higher per-iteration cost. A sympathetic reader would care because expensive evaluations appear in hyperparameter tuning, robotics, and experiment design, so fewer wasted samples directly lowers total cost.

Core claim

KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across diverse benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks and faster wall-clock convergence with an average improvement of 93.55% on 10/15 benchmarks, with strong results especially on highly multimodal and unimodal but ill-conditioned functions.

What carries the argument

The kernel sum of squares representation that recasts acquisition function optimization as a globally solvable semidefinite program.

If this is right

  • It delivers an average regret improvement of 81.16 percent on ten out of fifteen tested benchmarks.
  • It reaches high-quality solutions with fewer evaluations, yielding 93.55 percent average wall-clock improvement on ten out of fifteen benchmarks.
  • It performs especially well on highly multimodal functions and on unimodal but ill-conditioned functions.
  • It shows reduced effectiveness on functions that contain steep drops or plate-shaped regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The semidefinite-program step may combine usefully with local refinement methods to handle remaining difficult landscape features.
  • Similar kernel-sum-of-squares reformulations could be applied to other structured black-box optimization problems outside Bayesian optimization.
  • Testing the method on higher-dimensional problems would reveal whether the semidefinite solver remains practical as dimension grows.

Load-bearing premise

The kernel sum of squares representation supplies a sufficiently accurate and globally solvable formulation of the acquisition function without missing important optima or adding significant bias on the tested landscape types.

What would settle it

If KSOS-BO fails to locate acquisition maxima that are better than those found by Sobol search or CMA-ES on a new set of functions containing steep drops or plateaus, the claim that the representation enables reliable global improvement would be challenged.

Figures

Figures reproduced from arXiv: 2605.21179 by Buqing Ou, Frederike D\"umbgen.

Figure 1
Figure 1. Figure 1: Overview of KSOS-BO. A Gaussian Process surrogate is fitted to the current measurement [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Convergence comparison on 15 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Average runtime over 400 iterations for four acquisition optimizers across 6 benchmark [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Convergence comparison on 15 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of KernelSOS surrogate approximation on 1D EI after 9 BO steps. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Convergence comparison on 6 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Convergence comparison on 6 benchmarks in 2 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Convergence comparison on 6 benchmarks in 5 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Convergence comparison of Sobol and uniform sampling for KSOS-BO, along with [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Convergence comparison of CMA-ES with different population sizes to KSOS-BO on 6 10-dimensional benchmark functions using EI over 400 iterations. Curves show the simple regret across 5 seeds on a logarithmic scale, with shaded regions indicating the 95% confidence interval [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Convergence comparison of DE with different population sizes and crossover rate to KSOS-BO on 6 10-dimensional benchmark functions using EI over 400 iterations. Curves show the simple regret across 5 seeds on a logarithmic scale, with shaded regions indicating the 95% confidence interval. number of generations from 5 to 3, testing whether greater population diversity is more beneficial than additional ite… view at source ↗
Figure 16
Figure 16. Figure 16: Convergence comparison on 6 benchmark functions in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p025_16.png] view at source ↗
read the original abstract

Bayesian Optimization (BO) is an effective framework for globally optimizing functions whose evaluations are expensive. It is particularly effective for optimizing functions defined over continuous domains and explicitly handles stochastic noise in evaluations. As a result, it is widely applied in areas such as hyperparameter tuning, robotics policy search, and scientific experiment design, where sample efficiency is essential. Its two-step procedure consists of model fitting followed by optimization of the acquisition function, which is often treated as a generic black-box problem despite its structured nature. In this work, we introduce KSOS-BO, a kernel-based derivative-free framework for BO acquisition optimization. KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across a diverse set of benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks. In particular, KSOS-BO demonstrates strong performance in highly multimodal and unimodal but ill-conditioned functions, indicating its applicability to diverse landscape structures. Despite a higher per-iteration computational cost, it converges faster in wall-clock time with an average improvement of 93.55% on 10/15 benchmarks, as it reaches high-quality solutions with fewer evaluations. Limitations include reduced effectiveness on functions with steep drops or plate-shaped regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces KSOS-BO, a kernel-based framework that reformulates acquisition-function optimization in Bayesian optimization as a semidefinite program using kernel-induced sum-of-squares representations. This is claimed to enable structured global search. On 15 benchmark functions, KSOS-BO is reported to outperform derivative-free baselines (Sobol search, Differential Evolution, CMA-ES) with an average regret improvement of 81.16% on 10/15 benchmarks and an average wall-clock convergence improvement of 93.55% on 10/15 benchmarks, performing especially well on multimodal and ill-conditioned landscapes while noting limitations on steep-drop or plate-shaped functions.

Significance. If the kernel-SOS SDP formulation recovers near-global optima of the acquisition function with negligible bias or duality gap, the approach could meaningfully improve sample efficiency in Bayesian optimization for expensive black-box problems. The reported regret and wall-clock gains are large enough to be practically relevant in hyperparameter tuning and robotics applications. However, the absence of reproducibility details (independent runs, statistical tests, baseline hyperparameter protocols) and explicit tightness guarantees for the SDP relaxation substantially weakens the assessed significance of the empirical claims.

major comments (2)
  1. [Method (kernel SOS formulation) and Experimental Results] The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.
  2. [Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly define the acquisition functions tested (EI, UCB, etc.) and the precise kernel family and truncation order used in the SOS representation.
  2. [Implementation details] Clarify whether the SDP is solved to global optimality in every iteration or whether early termination heuristics are employed; this directly affects the claimed wall-clock advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. We have carefully considered the major comments and will revise the manuscript accordingly to strengthen the empirical validation and reporting. Below we provide point-by-point responses.

read point-by-point responses
  1. Referee: The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.

    Authors: We appreciate this observation. While the kernel sum-of-squares formulation provides a convex relaxation that enables global optimization of the approximated acquisition function, we acknowledge that explicit bounds on the approximation error or duality gaps are not provided in the current manuscript. To address this, we will include additional experiments comparing the attained acquisition values to those from a reference global optimizer (such as multi-start gradient descent or grid search in low dimensions) and report the observed duality gaps from the SDP solver. This will help quantify any potential bias introduced by the kernel representation. revision: yes

  2. Referee: [Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.

    Authors: We agree that reporting the number of independent runs, standard deviations, and statistical significance is essential for assessing the robustness of the results. In the revised manuscript, we will add this information to the relevant tables and figures, including details on the experimental protocol (e.g., 10 independent runs per benchmark) and results of paired t-tests or similar to establish statistical significance of the improvements. revision: yes

Circularity Check

0 steps flagged

No circularity in KSOS-BO derivation or claims

full rationale

The paper introduces KSOS-BO as a kernel sum-of-squares SDP formulation for acquisition-function optimization, solved via an external SDP solver. Performance claims rest on empirical regret and wall-clock comparisons against derivative-free baselines on independent benchmark functions with varying landscapes. No equations or steps reduce a central result to a fitted parameter, self-defined quantity, or self-citation chain by construction. The method is externally validated rather than internally forced, satisfying the criteria for a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that kernel sum of squares can faithfully represent acquisition functions for global optimization via SDP; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Acquisition functions admit a kernel-induced sum-of-squares representation that converts their maximization into a tractable semidefinite program.
    This is the central modeling step that enables the structured search described in the abstract.

pith-pipeline@v0.9.0 · 5798 in / 1377 out tokens · 39140 ms · 2026-05-21T01:28:14.025848+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 3 internal anchors

  1. [1]

    Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

  2. [2]

    Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and Ryan P. Adams. Scalable Bayesian Optimization Using Deep Neural Networks. InProceedings of the International Conference on Machine Learning, pages 2171–2180. PMLR, 2015

  3. [3]

    A new acquisition function for Bayesian optimization based on the moment-generating function

    Hao Wang, Bas van Stein, Michael Emmerich, and Thomas Back. A new acquisition function for Bayesian optimization based on the moment-generating function. In2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 507–512. IEEE, 2017

  4. [4]

    Acquisition functions in Bayesian optimization

    Weiao Gan, Ziyuan Ji, and Yongqing Liang. Acquisition functions in Bayesian optimization. In 2021 2nd International Conference on Big Data & Artificial Intelligence & misc Engineering (ICBASE), pages 129–135. IEEE, 2021

  5. [5]

    Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

    James Wilson, Frank Hutter, and Marc Deisenroth. Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

  6. [6]

    Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

    Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

  7. [7]

    The CMA Evolution Strategy: A Tutorial

    Nikolaus Hansen. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772, 2016

  8. [8]

    Efficient benchmarking of hyperparameter optimizers via surrogates

    Katharina Eggensperger, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Efficient benchmarking of hyperparameter optimizers via surrogates. InProceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015

  9. [9]

    Springer, 2006

    Jorge Nocedal and Stephen J Wright.Numerical optimization. Springer, 2006

  10. [10]

    Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

    Jean B Lasserre. Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

  11. [11]

    Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

    Pablo A Parrilo. Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

  12. [12]

    World Scientific, 2009

    Jean Bernard Lasserre.Moments, positive polynomials and their applications, volume 1. World Scientific, 2009

  13. [13]

    Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

    Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

  14. [14]

    Finding global minima via kernel approximations: A

    Alessandro Rudi, Ulysse Marteau-Ferey, and Francis Bach. Finding global minima via kernel approximations: A. Rudi et al.Mathematical Programming, 209(1):703–784, 2025

  15. [15]

    PhD thesis, California Institute of Technology, 2000

    Pablo A Parrilo.Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000

  16. [16]

    LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

    Russ Tedrake, Ian R Manchester, Mark Tobenkin, and John W Roberts. LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

  17. [17]

    Heng Yang and Luca Carlone. Certifiably optimal outlier-robust geometric perception: Semidef- inite relaxations and scalable global optimization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2816–2834, 2022

  18. [18]

    Cambridge University Press, 2015

    Jean Bernard Lasserre.An introduction to polynomial and semi-algebraic optimization, vol- ume 52. Cambridge University Press, 2015

  19. [19]

    Cambridge University Press, 2016

    Vern I Paulsen and Mrinal Raghupathi.An introduction to the theory of reproducing kernel Hilbert spaces, volume 152. Cambridge University Press, 2016. 10

  20. [20]

    Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

    Bernhard Schölkopf and Alexander J. Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2018

  21. [21]

    Infinite-dimensional sums-of-squares for optimal control

    Eloïse Berthier, Justin Carpentier, Alessandro Rudi, and Francis Bach. Infinite-dimensional sums-of-squares for optimal control. In2022 IEEE 61st Conference on Decision and Control (CDC), pages 577–582. IEEE, 2022

  22. [22]

    Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

    Jean B Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

  23. [23]

    Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming

    Antoine Groudiev, Fabian Schramm, Éloïse Berthier, Justin Carpentier, and Frederike Dümbgen. Sampling-based global optimal control and estimation via semidefinite programming.arXiv preprint arXiv:2507.17572, 2025

  24. [24]

    Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS

    Zhongqi Wei and Frederike Dümbgen. Global sampling-based trajectory optimization for contact-rich manipulation via kernelsos.arXiv preprint arXiv:2604.27175, 2026

  25. [25]

    Bayesian optimization

    Peter I Frazier. Bayesian optimization. InRecent Advances in Optimization and Modeling of Contemporary Problems, pages 255–278. Informs, 2018

  26. [26]

    Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

    Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

  27. [27]

    Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

    James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

  28. [28]

    SIAM, 2009

    Andrew R Conn, Katya Scheinberg, and Luis N Vicente.Introduction to derivative-free optimization. SIAM, 2009

  29. [29]

    Cora, and Nando de Freitas

    Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical report, Department of Computer Science, University of British Columbia, 2009

  30. [30]

    Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

    Matthias Seeger. Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

  31. [31]

    Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

    Ulysse Marteau-Ferey, Francis Bach, and Alessandro Rudi. Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

  32. [32]

    Surjanovic and D

    S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test Functions and Datasets.http://www.sfu.ca/~ssurjano, 2026. Accessed: May 6, 2026

  33. [33]

    Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

    Christopher Williams and Carl Rasmussen. Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

  34. [34]

    ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers

    Antoine Groudiev and Frederike Dümbgen. ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers. https://github.com/ Simple-Robotics/ksos-tools, 2025. Version 0.2.2

  35. [35]

    Nomura and M

    Masahiro Nomura, Masashi Shibata, and Ryoki Hamano. cmaes: A Simple yet Practical Python Library for CMA-ES.arXiv preprint arXiv:2402.01373, 2026. 11 A THEORETICAL BACKGROUND A.1 Gaussian Process A Gaussian process (GP) is a collection of random variables {f(x)} x∈X such that any finite subset follows a joint Gaussian distribution. A GP is fully specified...