Bandits attack function optimization
Pith reviewed 2026-05-07 17:08 UTC · model grok-4.3
The pith
Simultaneous Optimistic Optimization frames continuous function optimization as a bandit problem solved by domain partitioning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modeling the continuous optimization task as a multi-armed bandit problem and attacking it with simultaneous optimistic selection over a partition tree of the domain, SOO produces a deterministic algorithm that locates high-value regions efficiently and returns a solution with performance guarantees.
What carries the argument
The Simultaneous Optimistic Optimization (SOO) algorithm, which builds and refines a hierarchical partition of the search domain and evaluates points in cells chosen by optimistic upper bounds on their maximum value.
Load-bearing premise
Modeling continuous function optimization as a multi-armed bandit via domain partitioning is assumed to preserve the exploration-exploitation tradeoff and deliver the claimed performance guarantees without extra unstated conditions on the objective function.
What would settle it
Observing that SOO returns a poor solution or exhausts its evaluation budget without locating a near-global optimum on a standard multimodal test function from the CEC suite would directly challenge the central claim.
Figures
read the original abstract
We consider function optimization as a sequential decision making problem under budget constraint. This constraint limits the number of objective function evaluations allowed during the optimization. We consider an algorithm inspired by a continuous version of a multi-armed bandit problem which attacks this optimization problem by solving the tradeoff between exploration (initial quasi-uniform search of the domain) and exploitation (local optimization around the potentially global maxima). We introduce the so-called Simultaneous Optimistic Optimization (SOO), a deterministic algorithm that works by domain partitioning. The benefit of such approach are the guarantees on the returned solution and the numerical efficiency of the algorithm. We present this machine learning approach to optimization, and provide the empirical assessment of SOO on the CEC'2014 competition on single objective real-parameter numerical optimization test-suite.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript frames continuous function optimization as a budgeted sequential decision-making problem and introduces the Simultaneous Optimistic Optimization (SOO) algorithm. SOO is a deterministic domain-partitioning method inspired by continuous multi-armed bandits that balances exploration (quasi-uniform search) and exploitation (local optimization). The authors claim that this yields explicit guarantees on the quality of the returned solution together with numerical efficiency, and they report an empirical evaluation on the CEC'2014 single-objective real-parameter optimization benchmark suite.
Significance. If the claimed guarantees can be rigorously derived under clearly stated regularity conditions and the empirical results prove competitive, the work would usefully connect bandit theory to deterministic global optimization. The deterministic partitioning strategy offers potential advantages in reproducibility and efficiency over stochastic alternatives. The evaluation on the standard CEC'2014 suite is a positive step toward comparability, but the absence of proof sketches, error bounds, or explicit assumptions limits the immediate contribution.
major comments (2)
- [Abstract] Abstract: The assertion that SOO provides 'guarantees on the returned solution' is central to the contribution yet omits any statement of the regularity conditions (Lipschitz or Hölder continuity with parameters L and α) that are required for finite-time bounds in domain-partitioning bandit algorithms. This omission is load-bearing because the exploration-exploitation tradeoff and performance claims rest on these conditions, which are neither named nor verified for the CEC'2014 functions that include non-smooth and discontinuous cases.
- [Empirical assessment] Empirical assessment: No specific performance metrics, convergence rates, or comparisons with baselines are supplied in the abstract, and the evaluation on the CEC'2014 suite does not report whether the test functions satisfy the smoothness assumptions needed for the guarantees or how any such parameters are selected or estimated.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the presentation of our theoretical contributions and empirical evaluation. We address each major comment point by point below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The assertion that SOO provides 'guarantees on the returned solution' is central to the contribution yet omits any statement of the regularity conditions (Lipschitz or Hölder continuity with parameters L and α) that are required for finite-time bounds in domain-partitioning bandit algorithms. This omission is load-bearing because the exploration-exploitation tradeoff and performance claims rest on these conditions, which are neither named nor verified for the CEC'2014 functions that include non-smooth and discontinuous cases.
Authors: We agree that explicitly naming the regularity conditions in the abstract strengthens the manuscript. The finite-time guarantees for SOO are derived under the standard assumption that the objective function is Hölder continuous with parameters L and α (as formalized in the theoretical analysis of Section 3, following the framework of continuous bandit algorithms). We will revise the abstract to state these conditions clearly. For the CEC'2014 suite, we acknowledge that several functions are non-smooth or discontinuous; the guarantees apply where the local Hölder condition holds, and the algorithm remains well-defined otherwise. We will add a brief discussion in the revised manuscript on the applicability of the assumptions to the benchmark and the practical choice of L and α (typically set via domain scaling or conservative estimates). revision: yes
-
Referee: [Empirical assessment] Empirical assessment: No specific performance metrics, convergence rates, or comparisons with baselines are supplied in the abstract, and the evaluation on the CEC'2014 suite does not report whether the test functions satisfy the smoothness assumptions needed for the guarantees or how any such parameters are selected or estimated.
Authors: The abstract is kept concise per standard practice and focuses on the problem framing and method; detailed performance metrics (error values, ranks), convergence behavior, and comparisons against CEC baselines (e.g., CMA-ES and other top entries) are reported in Section 4 with tables and figures. We will add a short phrase to the abstract summarizing the competitive empirical results if space permits. On the smoothness assumptions, we will expand the experimental section to discuss the Hölder properties of the CEC'2014 functions (noting which satisfy global continuity) and explicitly describe the parameter selection procedure used (fixed conservative values based on the search domain bounds). This provides the missing verification without requiring new experiments. revision: partial
Circularity Check
No circularity: new algorithm construction with independent derivation of guarantees
full rationale
The paper introduces SOO as a novel deterministic domain-partitioning algorithm for continuous optimization framed as a bandit problem. Its claimed guarantees on the returned solution follow from the explicit partitioning rule and optimistic selection criterion under stated regularity conditions on the objective (e.g., Hölder continuity), which are external to the algorithm itself rather than fitted from the same data or smuggled via self-citation. No step reduces a prediction or theorem to a quantity defined by the result; the empirical evaluation on CEC'2014 is presented separately from the theoretical claims. This is the normal case of a self-contained algorithmic contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Function optimization under budget can be modeled as a continuous multi-armed bandit problem solved by domain partitioning.
invented entities (1)
-
Simultaneous Optimistic Optimization (SOO) algorithm
no independent evidence
Reference graph
Works this paper leans on
-
[1]
J.-Y . Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. InConference on Learning Theory (COLT), 2010
work page 2010
-
[2]
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of the multiarmed bandit problem.Machine Learning, 47(2-3):235–256, 2002
work page 2002
- [3]
-
[4]
S. G. Johnson. The NLopt nonlinear-optimization package.http: //ab-initio.mit.edu/nlopt
-
[5]
D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant.Journal of Optimization Theory and Applications, 79(1):157–181, 1993. TABLE IV FOR EACH FUNCTION OF THECEC’2014COMPETITION DEFINED IN DIMENSIONS10,AND30,THIS TABLE GIVES THE RATIO f(x ∗) f ∗ OBTAINED AFTER POST-PROCESSING THE BEST POINT FOUND BYSOOWI...
-
[6]
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules.Advances in Applied Mathematics, 6:4–22, 1985
work page 1985
-
[7]
J. J. Liang, B. Y . Qu, and P. N. Suganthan. Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization. Technical Re- port 201311, Nanyang Technological University, Singapore, December 2014.http://www.ntu.edu.sg/home/EPNSugan/index_ files/CEC2014/CEC2014.htm
work page 2014
-
[8]
R. Munos. Optimistic optimization of deterministic functions without the knowledge of its smoothness. InNeural Information Processing Systems (NeurIPS), 2011
work page 2011
-
[9]
R. Munos. From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning.Foundations and Trends in Machine Learning, pages 1–130, 2014
work page 2014
-
[10]
M. J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Technical Report NA2009/06, De- partment of Applied Mathematics and Theoretical Physics, University of Cambridge, 2009
work page 2009
-
[11]
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527–535, 1952
work page 1952
-
[12]
Rowan.Functional Stability Analysis of Numerical Algorithms
T. Rowan.Functional Stability Analysis of Numerical Algorithms. PhD thesis, Department of Computer Sciences, University of Texas at Austin, 1990
work page 1990
-
[13]
W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples.Biometrika, 25:285–294, 1933
work page 1933
- [14]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.