KSOS-BO: Improving Sampling in Bayesian Optimization via Kernel Sum of Squares

Buqing Ou; Frederike D\"umbgen

arxiv: 2605.21179 · v1 · pith:IEFCL5WEnew · submitted 2026-05-20 · 💻 cs.CE

KSOS-BO: Improving Sampling in Bayesian Optimization via Kernel Sum of Squares

Buqing Ou , Frederike D\"umbgen This is my paper

Pith reviewed 2026-05-21 01:28 UTC · model grok-4.3

classification 💻 cs.CE

keywords Bayesian optimizationacquisition function optimizationkernel sum of squaressemidefinite programmingglobal optimizationderivative-free methods

0 comments

The pith

Formulating acquisition optimization as a kernel sum of squares semidefinite program enables better global search in Bayesian optimization than standard derivative-free methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian optimization selects points by maximizing an acquisition function after fitting a model, but this step is usually treated as a black-box problem. The paper introduces KSOS-BO, which uses kernel sum of squares representations to turn acquisition maximization into a semidefinite program that supports structured global search without derivatives. On a range of benchmark functions with different landscape features, the approach reduces regret more than Sobol search, differential evolution, or CMA-ES and reaches good solutions faster in wall-clock time despite higher per-iteration cost. A sympathetic reader would care because expensive evaluations appear in hyperparameter tuning, robotics, and experiment design, so fewer wasted samples directly lowers total cost.

Core claim

KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across diverse benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks and faster wall-clock convergence with an average improvement of 93.55% on 10/15 benchmarks, with strong results especially on highly multimodal and unimodal but ill-conditioned functions.

What carries the argument

The kernel sum of squares representation that recasts acquisition function optimization as a globally solvable semidefinite program.

If this is right

It delivers an average regret improvement of 81.16 percent on ten out of fifteen tested benchmarks.
It reaches high-quality solutions with fewer evaluations, yielding 93.55 percent average wall-clock improvement on ten out of fifteen benchmarks.
It performs especially well on highly multimodal functions and on unimodal but ill-conditioned functions.
It shows reduced effectiveness on functions that contain steep drops or plate-shaped regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The semidefinite-program step may combine usefully with local refinement methods to handle remaining difficult landscape features.
Similar kernel-sum-of-squares reformulations could be applied to other structured black-box optimization problems outside Bayesian optimization.
Testing the method on higher-dimensional problems would reveal whether the semidefinite solver remains practical as dimension grows.

Load-bearing premise

The kernel sum of squares representation supplies a sufficiently accurate and globally solvable formulation of the acquisition function without missing important optima or adding significant bias on the tested landscape types.

What would settle it

If KSOS-BO fails to locate acquisition maxima that are better than those found by Sobol search or CMA-ES on a new set of functions containing steep drops or plateaus, the claim that the representation enables reliable global improvement would be challenged.

Figures

Figures reproduced from arXiv: 2605.21179 by Buqing Ou, Frederike D\"umbgen.

**Figure 2.** Figure 2: Convergence comparison on 15 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Average runtime over 400 iterations for four acquisition optimizers across 6 benchmark [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Convergence comparison on 15 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Visualization of KernelSOS surrogate approximation on 1D EI after 9 BO steps. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Convergence comparison on 6 benchmarks in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

**Figure 9.** Figure 9: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Convergence comparison on 6 10-dimensional benchmark functions with [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗

**Figure 11.** Figure 11: Convergence comparison on 6 benchmarks in 2 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

**Figure 12.** Figure 12: Convergence comparison on 6 benchmarks in 5 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗

**Figure 13.** Figure 13: Convergence comparison of Sobol and uniform sampling for KSOS-BO, along with [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

**Figure 14.** Figure 14: Convergence comparison of CMA-ES with different population sizes to KSOS-BO on 6 10-dimensional benchmark functions using EI over 400 iterations. Curves show the simple regret across 5 seeds on a logarithmic scale, with shaded regions indicating the 95% confidence interval [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: Convergence comparison of DE with different population sizes and crossover rate to KSOS-BO on 6 10-dimensional benchmark functions using EI over 400 iterations. Curves show the simple regret across 5 seeds on a logarithmic scale, with shaded regions indicating the 95% confidence interval. number of generations from 5 to 3, testing whether greater population diversity is more beneficial than additional ite… view at source ↗

**Figure 16.** Figure 16: Convergence comparison on 6 benchmark functions in 10 dimensions using [PITH_FULL_IMAGE:figures/full_fig_p025_16.png] view at source ↗

read the original abstract

Bayesian Optimization (BO) is an effective framework for globally optimizing functions whose evaluations are expensive. It is particularly effective for optimizing functions defined over continuous domains and explicitly handles stochastic noise in evaluations. As a result, it is widely applied in areas such as hyperparameter tuning, robotics policy search, and scientific experiment design, where sample efficiency is essential. Its two-step procedure consists of model fitting followed by optimization of the acquisition function, which is often treated as a generic black-box problem despite its structured nature. In this work, we introduce KSOS-BO, a kernel-based derivative-free framework for BO acquisition optimization. KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations, enabling a structured global search. Across a diverse set of benchmark functions with varying landscape properties, KSOS-BO consistently outperforms derivative-free baselines using Sobol Search, Differential Evolution, or CMA-ES to optimize the acquisition function, achieving an average regret improvement of 81.16% on 10/15 benchmarks. In particular, KSOS-BO demonstrates strong performance in highly multimodal and unimodal but ill-conditioned functions, indicating its applicability to diverse landscape structures. Despite a higher per-iteration computational cost, it converges faster in wall-clock time with an average improvement of 93.55% on 10/15 benchmarks, as it reaches high-quality solutions with fewer evaluations. Limitations include reduced effectiveness on functions with steep drops or plate-shaped regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

KSOS-BO casts acquisition optimization as a kernel sum-of-squares SDP and claims large gains over derivative-free baselines, but the experiments lack key details on reproducibility.

read the letter

KSOS-BO reformulates acquisition function optimization in Bayesian optimization as a kernel sum-of-squares semidefinite program. The authors report that this leads to better regret and faster wall-clock convergence than derivative-free baselines on most of their test functions. The formulation is the main novelty here. By representing the acquisition function in a kernel space and using sum of squares, they turn the problem into an SDP that can be solved more reliably for global optima. This is a structured alternative to treating it as a generic optimization task. On the benchmarks, it shows strong results especially for multimodal landscapes, and the authors note the trade-off of higher per-step cost but overall faster convergence due to fewer evaluations needed. The experimental claims would be more convincing with more information on run counts and variability. The 81 percent average regret improvement is presented without those details, so it is difficult to know how robust it is. There is also the question of how well the SOS representation captures the true acquisition function for things like expected improvement, which are not straightforward polynomials. If there is approximation involved, that could explain some of the performance differences or limit applicability. This work is aimed at the Bayesian optimization crowd, particularly those thinking about better ways to solve the acquisition step. A reader who cares about SDP relaxations or kernel methods in optimization would find the technical setup useful. I would send it for peer review. The core idea has enough substance to warrant expert feedback on both the math and the experiments.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces KSOS-BO, a kernel-based framework that reformulates acquisition-function optimization in Bayesian optimization as a semidefinite program using kernel-induced sum-of-squares representations. This is claimed to enable structured global search. On 15 benchmark functions, KSOS-BO is reported to outperform derivative-free baselines (Sobol search, Differential Evolution, CMA-ES) with an average regret improvement of 81.16% on 10/15 benchmarks and an average wall-clock convergence improvement of 93.55% on 10/15 benchmarks, performing especially well on multimodal and ill-conditioned landscapes while noting limitations on steep-drop or plate-shaped functions.

Significance. If the kernel-SOS SDP formulation recovers near-global optima of the acquisition function with negligible bias or duality gap, the approach could meaningfully improve sample efficiency in Bayesian optimization for expensive black-box problems. The reported regret and wall-clock gains are large enough to be practically relevant in hyperparameter tuning and robotics applications. However, the absence of reproducibility details (independent runs, statistical tests, baseline hyperparameter protocols) and explicit tightness guarantees for the SDP relaxation substantially weakens the assessed significance of the empirical claims.

major comments (2)

[Method (kernel SOS formulation) and Experimental Results] The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.
[Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.

minor comments (2)

[Abstract] The abstract and introduction should explicitly define the acquisition functions tested (EI, UCB, etc.) and the precise kernel family and truncation order used in the SOS representation.
[Implementation details] Clarify whether the SDP is solved to global optimality in every iteration or whether early termination heuristics are employed; this directly affects the claimed wall-clock advantage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. We have carefully considered the major comments and will revise the manuscript accordingly to strengthen the empirical validation and reporting. Below we provide point-by-point responses.

read point-by-point responses

Referee: The central empirical claims rest on KSOS-BO locating superior acquisition-function values compared with the derivative-free baselines. Yet the manuscript provides no explicit error bounds, duality-gap certificates, or direct comparison of attained acquisition values against a reference global solver for the non-polynomial acquisition functions (e.g., EI) used in the experiments. Without such verification, it remains possible that the kernel representation introduces systematic bias on certain landscape types, undermining the 81% regret and 93% wall-clock improvements.

Authors: We appreciate this observation. While the kernel sum-of-squares formulation provides a convex relaxation that enables global optimization of the approximated acquisition function, we acknowledge that explicit bounds on the approximation error or duality gaps are not provided in the current manuscript. To address this, we will include additional experiments comparing the attained acquisition values to those from a reference global optimizer (such as multi-start gradient descent or grid search in low dimensions) and report the observed duality gaps from the SDP solver. This will help quantify any potential bias introduced by the kernel representation. revision: yes
Referee: [Experimental Results] Table or figure reporting the 81.16% and 93.55% average improvements: the text states results on 10/15 benchmarks but supplies no information on the number of independent runs, standard deviations, or statistical significance tests. This omission makes it impossible to judge whether the reported gains are robust or could be explained by variability in the baselines.

Authors: We agree that reporting the number of independent runs, standard deviations, and statistical significance is essential for assessing the robustness of the results. In the revised manuscript, we will add this information to the relevant tables and figures, including details on the experimental protocol (e.g., 10 independent runs per benchmark) and results of paired t-tests or similar to establish statistical significance of the improvements. revision: yes

Circularity Check

0 steps flagged

No circularity in KSOS-BO derivation or claims

full rationale

The paper introduces KSOS-BO as a kernel sum-of-squares SDP formulation for acquisition-function optimization, solved via an external SDP solver. Performance claims rest on empirical regret and wall-clock comparisons against derivative-free baselines on independent benchmark functions with varying landscapes. No equations or steps reduce a central result to a fitted parameter, self-defined quantity, or self-citation chain by construction. The method is externally validated rather than internally forced, satisfying the criteria for a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the domain assumption that kernel sum of squares can faithfully represent acquisition functions for global optimization via SDP; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption Acquisition functions admit a kernel-induced sum-of-squares representation that converts their maximization into a tractable semidefinite program.
This is the central modeling step that enables the structured search described in the abstract.

pith-pipeline@v0.9.0 · 5798 in / 1377 out tokens · 39140 ms · 2026-05-21T01:28:14.025848+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

KSOS-BO formulates the optimization of the acquisition function as a semidefinite program with kernel-induced representations... c⋆_KSOS = max c,B⪰0 c−λTr(B) s.t. α(xi)−c=ϕ⊤i B ϕi
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Kernel Sum-of-Squares (KernelSOS) extends SOS beyond polynomial representations by replacing polynomial bases with kernel functions

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 3 internal anchors

[1]

Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

work page 2015
[2]

Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and Ryan P. Adams. Scalable Bayesian Optimization Using Deep Neural Networks. InProceedings of the International Conference on Machine Learning, pages 2171–2180. PMLR, 2015

work page 2015
[3]

A new acquisition function for Bayesian optimization based on the moment-generating function

Hao Wang, Bas van Stein, Michael Emmerich, and Thomas Back. A new acquisition function for Bayesian optimization based on the moment-generating function. In2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 507–512. IEEE, 2017

work page 2017
[4]

Acquisition functions in Bayesian optimization

Weiao Gan, Ziyuan Ji, and Yongqing Liang. Acquisition functions in Bayesian optimization. In 2021 2nd International Conference on Big Data & Artificial Intelligence & misc Engineering (ICBASE), pages 129–135. IEEE, 2021

work page 2021
[5]

Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

James Wilson, Frank Hutter, and Marc Deisenroth. Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

work page 2018
[6]

Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

work page 2010
[7]

The CMA Evolution Strategy: A Tutorial

Nikolaus Hansen. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[8]

Efficient benchmarking of hyperparameter optimizers via surrogates

Katharina Eggensperger, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Efficient benchmarking of hyperparameter optimizers via surrogates. InProceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015

work page 2015
[9]

Springer, 2006

Jorge Nocedal and Stephen J Wright.Numerical optimization. Springer, 2006

work page 2006
[10]

Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

Jean B Lasserre. Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

work page 2001
[11]

Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

Pablo A Parrilo. Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

work page 2003
[12]

World Scientific, 2009

Jean Bernard Lasserre.Moments, positive polynomials and their applications, volume 1. World Scientific, 2009

work page 2009
[13]

Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

work page 2012
[14]

Finding global minima via kernel approximations: A

Alessandro Rudi, Ulysse Marteau-Ferey, and Francis Bach. Finding global minima via kernel approximations: A. Rudi et al.Mathematical Programming, 209(1):703–784, 2025

work page 2025
[15]

PhD thesis, California Institute of Technology, 2000

Pablo A Parrilo.Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000

work page 2000
[16]

LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

Russ Tedrake, Ian R Manchester, Mark Tobenkin, and John W Roberts. LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

work page 2010
[17]

Heng Yang and Luca Carlone. Certifiably optimal outlier-robust geometric perception: Semidef- inite relaxations and scalable global optimization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2816–2834, 2022

work page 2022
[18]

Cambridge University Press, 2015

Jean Bernard Lasserre.An introduction to polynomial and semi-algebraic optimization, vol- ume 52. Cambridge University Press, 2015

work page 2015
[19]

Cambridge University Press, 2016

Vern I Paulsen and Mrinal Raghupathi.An introduction to the theory of reproducing kernel Hilbert spaces, volume 152. Cambridge University Press, 2016. 10

work page 2016
[20]

Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf and Alexander J. Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2018

work page 2018
[21]

Infinite-dimensional sums-of-squares for optimal control

Eloïse Berthier, Justin Carpentier, Alessandro Rudi, and Francis Bach. Infinite-dimensional sums-of-squares for optimal control. In2022 IEEE 61st Conference on Decision and Control (CDC), pages 577–582. IEEE, 2022

work page 2022
[22]

Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

Jean B Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

work page 2008
[23]

Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming

Antoine Groudiev, Fabian Schramm, Éloïse Berthier, Justin Carpentier, and Frederike Dümbgen. Sampling-based global optimal control and estimation via semidefinite programming.arXiv preprint arXiv:2507.17572, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[24]

Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS

Zhongqi Wei and Frederike Dümbgen. Global sampling-based trajectory optimization for contact-rich manipulation via kernelsos.arXiv preprint arXiv:2604.27175, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[25]

Bayesian optimization

Peter I Frazier. Bayesian optimization. InRecent Advances in Optimization and Modeling of Contemporary Problems, pages 255–278. Informs, 2018

work page 2018
[26]

Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

work page 1998
[27]

Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

work page 2011
[28]

SIAM, 2009

Andrew R Conn, Katya Scheinberg, and Luis N Vicente.Introduction to derivative-free optimization. SIAM, 2009

work page 2009
[29]

Cora, and Nando de Freitas

Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical report, Department of Computer Science, University of British Columbia, 2009

work page 2009
[30]

Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

Matthias Seeger. Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

work page 2004
[31]

Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

Ulysse Marteau-Ferey, Francis Bach, and Alessandro Rudi. Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

work page 2020
[32]

Surjanovic and D

S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test Functions and Datasets.http://www.sfu.ca/~ssurjano, 2026. Accessed: May 6, 2026

work page 2026
[33]

Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

Christopher Williams and Carl Rasmussen. Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

work page 1995
[34]

ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers

Antoine Groudiev and Frederike Dümbgen. ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers. https://github.com/ Simple-Robotics/ksos-tools, 2025. Version 0.2.2

work page 2025
[35]

Nomura and M

Masahiro Nomura, Masashi Shibata, and Ryoki Hamano. cmaes: A Simple yet Practical Python Library for CMA-ES.arXiv preprint arXiv:2402.01373, 2026. 11 A THEORETICAL BACKGROUND A.1 Gaussian Process A Gaussian process (GP) is a collection of random variables {f(x)} x∈X such that any finite subset follows a joint Gaussian distribution. A GP is fully specified...

work page arXiv 2026

[1] [1]

Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando De Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, 2015

work page 2015

[2] [2]

Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Prabhat, and Ryan P. Adams. Scalable Bayesian Optimization Using Deep Neural Networks. InProceedings of the International Conference on Machine Learning, pages 2171–2180. PMLR, 2015

work page 2015

[3] [3]

A new acquisition function for Bayesian optimization based on the moment-generating function

Hao Wang, Bas van Stein, Michael Emmerich, and Thomas Back. A new acquisition function for Bayesian optimization based on the moment-generating function. In2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 507–512. IEEE, 2017

work page 2017

[4] [4]

Acquisition functions in Bayesian optimization

Weiao Gan, Ziyuan Ji, and Yongqing Liang. Acquisition functions in Bayesian optimization. In 2021 2nd International Conference on Big Data & Artificial Intelligence & misc Engineering (ICBASE), pages 129–135. IEEE, 2021

work page 2021

[5] [5]

Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

James Wilson, Frank Hutter, and Marc Deisenroth. Maximizing acquisition functions for Bayesian optimization.Advances in Neural Information Processing Systems, 31, 2018

work page 2018

[6] [6]

Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential Evolution: A survey of the state-of-the-art.IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010

work page 2010

[7] [7]

The CMA Evolution Strategy: A Tutorial

Nikolaus Hansen. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[8] [8]

Efficient benchmarking of hyperparameter optimizers via surrogates

Katharina Eggensperger, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. Efficient benchmarking of hyperparameter optimizers via surrogates. InProceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015

work page 2015

[9] [9]

Springer, 2006

Jorge Nocedal and Stephen J Wright.Numerical optimization. Springer, 2006

work page 2006

[10] [10]

Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

Jean B Lasserre. Global optimization with polynomials and the problem of moments.SIAM Journal on Optimization, 11(3):796–817, 2001

work page 2001

[11] [11]

Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

Pablo A Parrilo. Semidefinite programming relaxations for semialgebraic problems.Mathemat- ical Programming, 96(2):293–320, 2003

work page 2003

[12] [12]

World Scientific, 2009

Jean Bernard Lasserre.Moments, positive polynomials and their applications, volume 1. World Scientific, 2009

work page 2009

[13] [13]

Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms.Advances in Neural Information Processing Systems, 25, 2012

work page 2012

[14] [14]

Finding global minima via kernel approximations: A

Alessandro Rudi, Ulysse Marteau-Ferey, and Francis Bach. Finding global minima via kernel approximations: A. Rudi et al.Mathematical Programming, 209(1):703–784, 2025

work page 2025

[15] [15]

PhD thesis, California Institute of Technology, 2000

Pablo A Parrilo.Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. PhD thesis, California Institute of Technology, 2000

work page 2000

[16] [16]

LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

Russ Tedrake, Ian R Manchester, Mark Tobenkin, and John W Roberts. LQR-trees: Feedback motion planning via sums-of-squares verification.The International Journal of Robotics Research, 29(8):1038–1052, 2010

work page 2010

[17] [17]

Heng Yang and Luca Carlone. Certifiably optimal outlier-robust geometric perception: Semidef- inite relaxations and scalable global optimization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2816–2834, 2022

work page 2022

[18] [18]

Cambridge University Press, 2015

Jean Bernard Lasserre.An introduction to polynomial and semi-algebraic optimization, vol- ume 52. Cambridge University Press, 2015

work page 2015

[19] [19]

Cambridge University Press, 2016

Vern I Paulsen and Mrinal Raghupathi.An introduction to the theory of reproducing kernel Hilbert spaces, volume 152. Cambridge University Press, 2016. 10

work page 2016

[20] [20]

Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Bernhard Schölkopf and Alexander J. Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2018

work page 2018

[21] [21]

Infinite-dimensional sums-of-squares for optimal control

Eloïse Berthier, Justin Carpentier, Alessandro Rudi, and Francis Bach. Infinite-dimensional sums-of-squares for optimal control. In2022 IEEE 61st Conference on Decision and Control (CDC), pages 577–582. IEEE, 2022

work page 2022

[22] [22]

Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

Jean B Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear opti- mal control via occupation measures and LMI-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

work page 2008

[23] [23]

Sampling-Based Global Optimal Control and Estimation via Semidefinite Programming

Antoine Groudiev, Fabian Schramm, Éloïse Berthier, Justin Carpentier, and Frederike Dümbgen. Sampling-based global optimal control and estimation via semidefinite programming.arXiv preprint arXiv:2507.17572, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[24] [24]

Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS

Zhongqi Wei and Frederike Dümbgen. Global sampling-based trajectory optimization for contact-rich manipulation via kernelsos.arXiv preprint arXiv:2604.27175, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[25] [25]

Bayesian optimization

Peter I Frazier. Bayesian optimization. InRecent Advances in Optimization and Modeling of Contemporary Problems, pages 255–278. Informs, 2018

work page 2018

[26] [26]

Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

Donald R Jones, Matthias Schonlau, and William J Welch. Efficient global optimization of expensive black-box functions.Journal of Global optimization, 13(4):455–492, 1998

work page 1998

[27] [27]

Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper- parameter optimization.Advances in Neural Information Processing Systems, 24, 2011

work page 2011

[28] [28]

SIAM, 2009

Andrew R Conn, Katya Scheinberg, and Luis N Vicente.Introduction to derivative-free optimization. SIAM, 2009

work page 2009

[29] [29]

Cora, and Nando de Freitas

Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical report, Department of Computer Science, University of British Columbia, 2009

work page 2009

[30] [30]

Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

Matthias Seeger. Gaussian processes for machine learning.International Journal of Neural Systems, 14(02):69–106, 2004

work page 2004

[31] [31]

Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

Ulysse Marteau-Ferey, Francis Bach, and Alessandro Rudi. Non-parametric models for non- negative functions.Advances in Neural Information Processing Systems, 33:12816–12826, 2020

work page 2020

[32] [32]

Surjanovic and D

S. Surjanovic and D. Bingham. Virtual Library of Simulation Experiments: Test Functions and Datasets.http://www.sfu.ca/~ssurjano, 2026. Accessed: May 6, 2026

work page 2026

[33] [33]

Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

Christopher Williams and Carl Rasmussen. Gaussian processes for regression.Advances in Neural Information Processing Systems, 8, 1995

work page 1995

[34] [34]

ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers

Antoine Groudiev and Frederike Dümbgen. ksos-tools: Implementation of the Kernel Sum- of-Squares optimization framework and various related solvers. https://github.com/ Simple-Robotics/ksos-tools, 2025. Version 0.2.2

work page 2025

[35] [35]

Nomura and M

Masahiro Nomura, Masashi Shibata, and Ryoki Hamano. cmaes: A Simple yet Practical Python Library for CMA-ES.arXiv preprint arXiv:2402.01373, 2026. 11 A THEORETICAL BACKGROUND A.1 Gaussian Process A Gaussian process (GP) is a collection of random variables {f(x)} x∈X such that any finite subset follows a joint Gaussian distribution. A GP is fully specified...

work page arXiv 2026