Stochastic simultaneous optimistic optimization

Alexandra Carpentier; Michal Valko; R\'emi Munos

arxiv: 2604.24537 · v1 · submitted 2026-04-27 · 💻 cs.LG · stat.ML

Stochastic simultaneous optimistic optimization

Michal Valko , Alexandra Carpentier , R\'emi Munos This is my paper

Pith reviewed 2026-05-08 04:03 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords stochastic optimizationglobal maximizationnoisy evaluationsoptimistic algorithmshierarchical partitioningbandit problemssemi-metric smoothnessfinite-time analysis

0 comments

The pith

StoSOO maximizes noisy functions nearly as well as tuned algorithms without knowing the local smoothness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents StoSOO for global maximization of a function observed through noisy evaluations. It operates under the weak assumption that the function is locally smooth around one global maximum with respect to some unknown semi-metric. The method builds hierarchical partitions of the domain and selects sample points by constructing optimistic upper bounds on each partition cell. A finite-time analysis establishes that StoSOO attains performance close to that of algorithms tuned with full knowledge of the smoothness parameters. This removes the need for prior metric information while retaining near-optimal sample efficiency.

Core claim

StoSOO follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next. A finite-time analysis of StoSOO shows that it performs almost as well as the best specifically-tuned algorithms even though the local smoothness of the function is not known.

What carries the argument

StoSOO, which uses upper confidence bounds on hierarchical partitions to select samples without knowledge of the underlying semi-metric.

If this is right

Optimization problems in general spaces can proceed without separate tuning of smoothness parameters.
The same hierarchical optimistic strategy applies to other noisy global search tasks that admit suitable partitions.
Regret bounds remain controlled even when the semi-metric is discovered on the fly through sampling.
The approach extends earlier bandit algorithms to settings where the metric is not supplied in advance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hierarchical partitioning can serve as a substitute for explicit metric knowledge in a wider range of stochastic search problems.
The method may be combined with other partitioning schemes to handle functions that are only locally smooth near multiple maxima.
Similar optimistic bounding ideas could yield parameter-free algorithms for related sequential decision problems.

Load-bearing premise

The function is locally smooth around one of its global maxima with respect to some unknown semi-metric, and the domain allows a hierarchical partitioning that respects this semi-metric.

What would settle it

An empirical run on a test function known to be locally smooth around its maximum, where StoSOO requires substantially more samples than the best algorithm tuned with the true semi-metric to reach the same optimization accuracy.

Figures

Figures reproduced from arXiv: 2604.24537 by Alexandra Carpentier, Michal Valko, R\'emi Munos.

**Figure 1.** Figure 1: Functions with d = 0. Left: Two-sine product function f1(x) = 1 2 (sin(13x)· sin(27x)) + 0.5. Right: Garland function: f2(x) = 4x(1−x)·( 3 4 + 1 4 (1− p | sin(60x)|)). 5.2. Examples Let us consider the case of functions f defined on [0, 1]D that are locally equivalent to a polynomial of degree α around their maximum, i.e., f(x) − f(x ∗ ) = Θ(∥x − x ∗∥ α) for some α > 0, where ∥ · ∥ is any norm. The choice… view at source ↗

**Figure 3.** Figure 3: We illustrate the case of a function with different order in the upper and lower envelopes, when ℓ(x, y) = |x−y| α . Here f(x) = 1− √ x+(−x 2+ √ x)·(sin(1/x2 )+1)/2. The lower-envelope behaves like a square root whereas the upper one is quadratic. The maximum number of ℓ-balls with radius ε that can pack Xε (i.e., Euclidean balls with radius ε 1/α) is at most of order ε 1/2 /ε1/α ≤ ε −3/2 , since α ≤ 1/2 i… view at source ↗

**Figure 2.** Figure 2: Any function satisfying (11) lies in the gray area and possesses a lower- and upper-envelopes that are of same order around x ∗ . In the case when ε < η, due the upper envelope we have that: Xε ⊂ {x : cℓ(x, x∗ ) ≤ ε}, which corresponds to an ℓ-ball centered in x ∗ with radius ε/c. This ball can be packed by no more than a constant number of C ′ ℓ-balls of radius ε. C ′ is necessarily finite because the do… view at source ↗

**Figure 4.** Figure 4: Functions from view at source ↗

**Figure 5.** Figure 5: displays the performance of StoSOO for different levels of noise. We observe that as we increase the number of evaluations, the regret of StoSOO decreases. 10 50 100 500 1000 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 regret (loss) number of function evaluations 10 50 100 500 1000 −0.05 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 regret (loss) number of function evaluations 10 50 100 500 1000 0 0.05 0.1 0.15 0.2 0.2… view at source ↗

**Figure 6.** Figure 6: StoSOO (diamonds) vs. Stochastic DOO with ℓ1 (circles) and ℓ2 (squares) on f1. In view at source ↗

read the original abstract

We study the problem of global maximization of a function f given a finite number of evaluations perturbed by noise. We consider a very weak assumption on the function, namely that it is locally smooth (in some precise sense) with respect to some semi-metric, around one of its global maxima. Compared to previous works on bandits in general spaces (Kleinberg et al., 2008; Bubeck et al., 2011a) our algorithm does not require the knowledge of this semi-metric. Our algorithm, StoSOO, follows an optimistic strategy to iteratively construct upper confidence bounds over the hierarchical partitions of the function domain to decide which point to sample next. A finite-time analysis of StoSOO shows that it performs almost as well as the best specifically-tuned algorithms even though the local smoothness of the function is not known.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

StoSOO removes the need to know the smoothness semi-metric for near-optimal stochastic global optimization in general spaces.

read the letter

StoSOO is a new algorithm for noisy global optimization that works without knowing the local smoothness semi-metric, and its analysis shows it nearly matches the performance of metric-aware methods. The paper builds on simultaneous optimistic optimization by maintaining bounds over partitions in a way that doesn't require the semi-metric upfront. This is the key novelty compared to earlier bandit work in general spaces. It does well in providing a finite-time guarantee under weak assumptions: the function is locally smooth w.r.t. some semi-metric near a global max, and the domain has a suitable hierarchical partition. The stochastic noise is handled through the confidence bounds. The paper does a good job laying out the algorithm clearly and stating the assumptions precisely. The finite-time analysis is the main result, showing that the performance loss from not knowing the metric is only small factors. This removes a common barrier in applying these methods to real problems where the smoothness is hard to characterize. On the soft side, the analysis depends on the existence of a hierarchical partitioning that respects the semi-metric, even if the algorithm doesn't use the metric directly. In applications, selecting such a partition might still require some domain knowledge. The bounds are probably not tight in all cases, and there are no experiments to show how it behaves in practice compared to baselines. Since the full proof details aren't in the abstract, one would want to verify the derivation steps during review. This is targeted at people working on theoretical aspects of bandit algorithms and continuous optimization. Anyone interested in making optimistic methods more practical by reducing prior knowledge requirements would get something out of it. Overall, the work shows clear thinking and honest engagement with the literature on general space bandits. It deserves to go to peer review rather than a desk reject.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces StoSOO, an algorithm for noisy global maximization of a function that is locally smooth with respect to an unknown semi-metric around one of its global maxima. The algorithm follows an optimistic strategy by constructing upper confidence bounds over a given hierarchical partitioning of the domain and selects the next evaluation point accordingly. A finite-time analysis is provided showing that StoSOO achieves regret bounds comparable (up to small factors) to those of algorithms that require explicit knowledge of the semi-metric.

Significance. If the finite-time analysis holds, the result is significant because it removes the need for the algorithm to know the semi-metric, addressing a practical limitation of prior work on bandits in general spaces (e.g., Kleinberg et al. 2008, Bubeck et al. 2011). The analysis treats the hierarchical partitioning as given (standard in the SOO family) and uses the semi-metric only to quantify the resulting bound, which is consistent with the central claim.

minor comments (3)

[Abstract] Abstract: the phrase 'almost as well' is informal; the introduction or analysis section should explicitly state the multiplicative factors appearing in the regret bound relative to the tuned baselines.
[Analysis section] The finite-time analysis (presumably Theorem 1 or the main regret theorem) should include a short discussion of how the noise variance enters the concentration inequalities used for the upper confidence bounds, as this is load-bearing for the stochastic setting.
[Section 2] Related work: the comparison to Bubeck et al. (2011a) could be expanded to clarify the precise improvement in the dependence on the unknown smoothness parameters.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. We are pleased that the significance of StoSOO, which achieves regret bounds comparable to tuned algorithms without knowledge of the semi-metric, is recognized as addressing a practical limitation in prior work on bandits in general spaces.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain consists of a standard optimistic hierarchical partitioning strategy whose finite-time regret bound is obtained by applying concentration inequalities to the local smoothness assumption with respect to an unknown semi-metric. The algorithm itself never uses the semi-metric; the metric appears only in the analysis to quantify the resulting rate. No parameter is fitted to the paper's own data and then renamed as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The central claim therefore remains independent of its own outputs and is self-contained under the stated external assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the local smoothness assumption around a global maximum and the existence of a suitable hierarchical partition of the domain; no free parameters or invented entities are visible from the abstract.

axioms (1)

domain assumption The function is locally smooth with respect to some semi-metric around one of its global maxima
Explicitly stated in the abstract as the weak assumption used by the algorithm.

pith-pipeline@v0.9.0 · 5434 in / 1127 out tokens · 59527 ms · 2026-05-08T04:03:57.247218+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002

Auer, Peter, Cesa-Bianchi, Nicol` o, and Fischer, Paul. Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002

work page 2002
[2]

Online Optimization of X-armed Bandits

Bubeck, S´ ebastien, Munos, R´ emi, Stoltz, Gilles, and Szepesv´ ari, Csaba. Online Optimization of X-armed Bandits. InNeural Information Processing Systems (NeurIPS), pp. 201–208, 2008

work page 2008
[3]

Pure Exploration in Multi-armed Bandits Problems

Bubeck, S´ ebastien, Munos, R´ emi, and Stoltz, Gilles. Pure Exploration in Multi-armed Bandits Problems. Algorithmic Learning Theory, pp. 23–37, 2009

work page 2009
[4]

Convergence rates of efficient global optimization algorithms.The Journal of Machine Learning Research, 12:2879–2904, 2011

Bull, Adam. Convergence rates of efficient global optimization algorithms.The Journal of Machine Learning Research, 12:2879–2904, 2011

work page 2011
[5]

Bandit Algorithms for Tree Search

Coquelin, Pierre-Arnaud and Munos, R´ emi. Bandit Algorithms for Tree Search. InUncertainty in Arti- ficial Intelligence, pp. 67–74, 2007

work page 2007
[6]

The grand challenge of computer Go: Monte Carlo tree search and extensions.Com- mun

Teytaud, Olivier. The grand challenge of computer Go: Monte Carlo tree search and extensions.Com- mun. ACM, 55(3):106–113, March 2012

work page 2012
[7]

Pure and Applied Mathematics Series

Hansen, Eldon and Walster, William.Global Opti- mization Using Interval Analysis: Revised and Ex- panded. Pure and Applied Mathematics Series. Mar- cel Dekker, 2004

work page 2004
[8]

Optimistic Planning of Deterministic Systems

Hren, Jean-Francois and Munos, R´ emi. Optimistic Planning of Deterministic Systems. InEuropean Workshop on Reinforcement Learning, pp. 151–164, 2008

work page 2008
[9]

Lipschitzian optimization without the Lipschitz con- stant.Journal of Optimization Theory and Applica- tions, 79(1):157–181, 1993

Jones, David, Perttunen, Cary, and Stuckman, Bruce. Lipschitzian optimization without the Lipschitz con- stant.Journal of Optimization Theory and Applica- tions, 79(1):157–181, 1993

work page 1993
[10]

Nonconvex Optimization and Its Ap- plications

Kearfott, R Baker.Rigorous Global Search: Continu- ous Problems. Nonconvex Optimization and Its Ap- plications. Springer, 1996

work page 1996
[11]

Multi-armed bandit problems in metric spaces

Kleinberg, Robert, Slivkins, Alexander, and Upfal, Eli. Multi-armed bandit problems in metric spaces. In Proceedings of the 40th ACM symposium on Theory Of Computing, pp. 681–690, 2008

work page 2008
[12]

Bandit based Monte-Carlo Planning

Kocsis, Levente and Szepesv´ ari, Csaba. Bandit based Monte-Carlo Planning. InProceedings of the 15th European Conference on Machine Learning, pp. 282–293. Springer, 2006

work page 2006
[13]

Optimistic Optimization of Determinis- tic Functions without the Knowledge of its Smooth- ness

Munos, R´ emi. Optimistic Optimization of Determinis- tic Functions without the Knowledge of its Smooth- ness. InNeural Information Processing Systems (NeurIPS), pp. 783–791, 2011

work page 2011
[14]

Encyclopedia of Mathematics and its Applications

Neumaier, Arnold.Interval Methods for Systems of Equations. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2008

work page 2008
[15]

PhD thesis, University of Oxford, 2010

Osborne, Michael.Bayesian Gaussian processes for sequential prediction, optimisation and quadrature. PhD thesis, University of Oxford, 2010. Pint´ er, J´ anos.Global Optimization in Action: Contin- uous and Lipschitz Optimization: Algorithms, Im- plementations and Applications. Nonconvex Opti- mization and Its Applications. Springer, 1995

work page 2010
[16]

Multi-armed bandits on implicit metric spaces

Slivkins, Aleksandrs. Multi-armed bandits on implicit metric spaces. InNeural Information Processing Systems (NeurIPS), pp. 1602–1610. 2011

work page 2011
[17]

Gaussian Process Optimiza- tion in the Bandit Setting: No Regret and Exper- imental Design.International Conference on Ma- chine Learning (ICML), pp

Srinivas, Niranjan, Krause, Andreas, Kakade, Sham, and Seeger, Matthias. Gaussian Process Optimiza- tion in the Bandit Setting: No Regret and Exper- imental Design.International Conference on Ma- chine Learning (ICML), pp. 1015–1022, 2010

work page 2010
[18]

Nonconvex Optimization and Its Applications

Strongin, Roman and Sergeyev, Yaroslav.Global Opti- mization with Non-Convex Constraints: Sequential and Parallel Algorithms. Nonconvex Optimization and Its Applications. Springer, 2000

work page 2000

[1] [1]

Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002

Auer, Peter, Cesa-Bianchi, Nicol` o, and Fischer, Paul. Finite-time Analysis of the Multiarmed Bandit Problem.Machine Learning, 47(2-3):235–256, 2002

work page 2002

[2] [2]

Online Optimization of X-armed Bandits

Bubeck, S´ ebastien, Munos, R´ emi, Stoltz, Gilles, and Szepesv´ ari, Csaba. Online Optimization of X-armed Bandits. InNeural Information Processing Systems (NeurIPS), pp. 201–208, 2008

work page 2008

[3] [3]

Pure Exploration in Multi-armed Bandits Problems

Bubeck, S´ ebastien, Munos, R´ emi, and Stoltz, Gilles. Pure Exploration in Multi-armed Bandits Problems. Algorithmic Learning Theory, pp. 23–37, 2009

work page 2009

[4] [4]

Convergence rates of efficient global optimization algorithms.The Journal of Machine Learning Research, 12:2879–2904, 2011

Bull, Adam. Convergence rates of efficient global optimization algorithms.The Journal of Machine Learning Research, 12:2879–2904, 2011

work page 2011

[5] [5]

Bandit Algorithms for Tree Search

Coquelin, Pierre-Arnaud and Munos, R´ emi. Bandit Algorithms for Tree Search. InUncertainty in Arti- ficial Intelligence, pp. 67–74, 2007

work page 2007

[6] [6]

The grand challenge of computer Go: Monte Carlo tree search and extensions.Com- mun

Teytaud, Olivier. The grand challenge of computer Go: Monte Carlo tree search and extensions.Com- mun. ACM, 55(3):106–113, March 2012

work page 2012

[7] [7]

Pure and Applied Mathematics Series

Hansen, Eldon and Walster, William.Global Opti- mization Using Interval Analysis: Revised and Ex- panded. Pure and Applied Mathematics Series. Mar- cel Dekker, 2004

work page 2004

[8] [8]

Optimistic Planning of Deterministic Systems

Hren, Jean-Francois and Munos, R´ emi. Optimistic Planning of Deterministic Systems. InEuropean Workshop on Reinforcement Learning, pp. 151–164, 2008

work page 2008

[9] [9]

Lipschitzian optimization without the Lipschitz con- stant.Journal of Optimization Theory and Applica- tions, 79(1):157–181, 1993

Jones, David, Perttunen, Cary, and Stuckman, Bruce. Lipschitzian optimization without the Lipschitz con- stant.Journal of Optimization Theory and Applica- tions, 79(1):157–181, 1993

work page 1993

[10] [10]

Nonconvex Optimization and Its Ap- plications

Kearfott, R Baker.Rigorous Global Search: Continu- ous Problems. Nonconvex Optimization and Its Ap- plications. Springer, 1996

work page 1996

[11] [11]

Multi-armed bandit problems in metric spaces

Kleinberg, Robert, Slivkins, Alexander, and Upfal, Eli. Multi-armed bandit problems in metric spaces. In Proceedings of the 40th ACM symposium on Theory Of Computing, pp. 681–690, 2008

work page 2008

[12] [12]

Bandit based Monte-Carlo Planning

Kocsis, Levente and Szepesv´ ari, Csaba. Bandit based Monte-Carlo Planning. InProceedings of the 15th European Conference on Machine Learning, pp. 282–293. Springer, 2006

work page 2006

[13] [13]

Optimistic Optimization of Determinis- tic Functions without the Knowledge of its Smooth- ness

Munos, R´ emi. Optimistic Optimization of Determinis- tic Functions without the Knowledge of its Smooth- ness. InNeural Information Processing Systems (NeurIPS), pp. 783–791, 2011

work page 2011

[14] [14]

Encyclopedia of Mathematics and its Applications

Neumaier, Arnold.Interval Methods for Systems of Equations. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2008

work page 2008

[15] [15]

PhD thesis, University of Oxford, 2010

Osborne, Michael.Bayesian Gaussian processes for sequential prediction, optimisation and quadrature. PhD thesis, University of Oxford, 2010. Pint´ er, J´ anos.Global Optimization in Action: Contin- uous and Lipschitz Optimization: Algorithms, Im- plementations and Applications. Nonconvex Opti- mization and Its Applications. Springer, 1995

work page 2010

[16] [16]

Multi-armed bandits on implicit metric spaces

Slivkins, Aleksandrs. Multi-armed bandits on implicit metric spaces. InNeural Information Processing Systems (NeurIPS), pp. 1602–1610. 2011

work page 2011

[17] [17]

Gaussian Process Optimiza- tion in the Bandit Setting: No Regret and Exper- imental Design.International Conference on Ma- chine Learning (ICML), pp

Srinivas, Niranjan, Krause, Andreas, Kakade, Sham, and Seeger, Matthias. Gaussian Process Optimiza- tion in the Bandit Setting: No Regret and Exper- imental Design.International Conference on Ma- chine Learning (ICML), pp. 1015–1022, 2010

work page 2010

[18] [18]

Nonconvex Optimization and Its Applications

Strongin, Roman and Sergeyev, Yaroslav.Global Opti- mization with Non-Convex Constraints: Sequential and Parallel Algorithms. Nonconvex Optimization and Its Applications. Springer, 2000

work page 2000