Bridging Spherical Black-Box Optimizers

Johannes Ackermann; Stefano Peluchetti

arxiv: 2606.25761 · v1 · pith:OY2SLGUBnew · submitted 2026-06-24 · 💻 cs.LG · math.OC

Pith reviewed 2026-06-25 20:30 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords black-box optimization · evolution strategies · consensus-based optimization · optimization via integration · hybrid optimizers · fitness aggregation · consensus scope · continuous control

The pith

ES, CBO and OVI black-box optimizers differ mainly by fitness aggregation and consensus scope, enabling hybrids that interpolate their behaviors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper unifies Evolution Strategies, Consensus-Based Optimization and Optimization via Integration inside one theoretical framework for gradient-free optimization. It argues that the methods vary along two axes: fitness aggregation, which sets preference for sharp versus flat solutions, and consensus scope, which sets how many particles or parameters participate in the update. From this view the authors construct hybrid algorithms, including an ES-OVI variant that lets users dial the flat-minima bias and CBO-OVI variants that mix parametric efficiency with particle-based multimodality. Experiments on continuous-control tasks and language-model merging show the hybrids can exceed the performance of the parent methods under fixed evaluation budgets.
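As a point of reference for the parametric end of this family, the following is a minimal sketch of a vanilla ES update in the style of Salimans et al. (2017). It is not the paper's implementation: the sphere objective, the step sizes, and the names `es_step` and `run_es` are assumptions chosen only for illustration.

```python
import random


def sphere(x):
    """Toy objective (an assumption for this sketch): f(x) = sum of squares."""
    return sum(v * v for v in x)


def es_step(f, mean, sigma=0.1, lr=0.05, pairs=32, rng=random):
    """One vanilla ES step that minimizes f around `mean`.

    Uses antithetic Gaussian perturbations to estimate the gradient of the
    Gaussian-smoothed objective E[f(mean + sigma * eps)].
    """
    dim = len(mean)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_plus = f([m + sigma * e for m, e in zip(mean, eps)])
        f_minus = f([m - sigma * e for m, e in zip(mean, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    # Plain gradient descent on the search-distribution mean.
    return [m - lr * g for m, g in zip(mean, grad)]


def run_es(steps=200, seed=0):
    """Run the sketch on the sphere function from a fixed start point."""
    rng = random.Random(seed)
    mean = [1.5, -2.0]
    for _ in range(steps):
        mean = es_step(sphere, mean, rng=rng)
    return mean
```

Calling `run_es()` returns a mean close to the origin. The only state carried between steps is the mean of a single Gaussian, which is what "parametric" refers to in the summary above.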

Core claim

We unify these approaches within a common theoretical framework, revealing that they differ primarily in two design choices: fitness aggregation (controlling sharpness preference) and consensus scope (controlling modality). Leveraging these insights, we introduce hybrid optimizers that interpolate between existing methods.

What carries the argument

The two design axes of fitness aggregation and consensus scope that parameterize a family of spherical black-box optimizers and allow construction of interpolating hybrids.
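For contrast with parametric methods, here is a minimal sketch of the standard isotropic CBO update described by Pinnau et al. (2017): every particle drifts toward a softmin-weighted consensus point and receives noise scaled by its distance to that point. The function names and default values (`beta`, `lam`, `sigma`, `dt`) are assumptions for illustration, not the paper's settings.

```python
import math
import random


def consensus_point(particles, f, beta=10.0):
    """Softmin-weighted mean of the particles (the CBO consensus point)."""
    values = [f(p) for p in particles]
    f_min = min(values)  # shift for numerical stability; cancels after normalization
    weights = [math.exp(-beta * (v - f_min)) for v in values]
    total = sum(weights)
    dim = len(particles[0])
    return [
        sum(w * p[i] for w, p in zip(weights, particles)) / total
        for i in range(dim)
    ]


def cbo_step(particles, f, beta=10.0, lam=1.0, sigma=0.5, dt=0.05, rng=random):
    """One Euler-Maruyama step of isotropic CBO.

    Drift: -lam * (x - x_star) * dt
    Noise: sigma * |x - x_star| * sqrt(dt) * N(0, 1), drawn per coordinate.
    """
    x_star = consensus_point(particles, f, beta)
    new_particles = []
    for p in particles:
        diff = [pi - ci for pi, ci in zip(p, x_star)]
        dist = math.sqrt(sum(d * d for d in diff))
        new_particles.append([
            pi - lam * dt * d + sigma * dist * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for pi, d in zip(p, diff)
        ])
    return new_particles
```

With `beta=0.0` the consensus point is the plain particle mean, and as `beta` grows it concentrates on the best particle. In this sketch all particles share one consensus point. How the paper restricts or widens that scope is not specified in the text available here.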

If this is right

ES-OVI hybrids give explicit control over preference for flat minima and thereby trade performance against robustness on continuous control tasks.
CBO-OVI hybrids combine the sample efficiency of parametric updates with the multimodal search of particle methods, producing competitive results on language-model merging under tight evaluation limits.
On standard BBO benchmarks and higher-dimensional locomotion tasks the hybrids can outperform the original constituent algorithms.
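The link between aggregation and sharpness preference can be illustrated with a toy comparison. The sketch below evaluates two ways of aggregating fitness over a Gaussian neighbourhood: the arithmetic mean (the objective ES descends) and an exponentially weighted softmin of the kind used in CBO-style weights. The test function `two_basin`, the values `sigma=0.3` and `beta=20`, and all function names are assumptions made for this example. It does not reproduce the paper's J_α or its hybrids.

```python
import math


def two_basin(x):
    """Toy 1D objective: a deep narrow basin at x = -1, a shallow wide one at x = +1."""
    sharp = 50.0 * (x + 1.0) ** 2
    flat = 0.5 * (x - 1.0) ** 2 + 0.2
    return min(sharp, flat)


def gaussian_nodes(sigma, n=2001, width=6.0):
    """Grid offsets and normalized Gaussian weights on [-width*sigma, width*sigma]."""
    xs = [(-width + 2.0 * width * i / (n - 1)) * sigma for i in range(n)]
    ws = [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]
    total = sum(ws)
    return xs, [w / total for w in ws]


def mean_aggregate(f, center, sigma):
    """Approximate E[f(center + sigma * eps)] by quadrature on a fixed grid."""
    xs, ws = gaussian_nodes(sigma)
    return sum(w * f(center + x) for x, w in zip(xs, ws))


def softmin_aggregate(f, center, sigma, beta):
    """Approximate -(1/beta) * log E[exp(-beta * f(center + sigma * eps))]."""
    xs, ws = gaussian_nodes(sigma)
    total = sum(w * math.exp(-beta * f(center + x)) for x, w in zip(xs, ws))
    return -math.log(total) / beta


if __name__ == "__main__":
    for name, c in (("sharp basin", -1.0), ("flat basin", 1.0)):
        print(name,
              round(mean_aggregate(two_basin, c, 0.3), 3),
              round(softmin_aggregate(two_basin, c, 0.3, 20.0), 3))
```

Under these assumed settings the mean aggregate scores the wide basin lower, while the softmin aggregate scores the narrow, deeper basin lower. The ranking of the two minima therefore depends on the aggregation rule alone.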

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the two-axis view is complete, the same parameterization could be used to generate additional hybrids that target other trade-offs not yet explored in the paper.
The framework suggests that tuning aggregation and scope separately may offer a more interpretable alternative to hand-crafted optimizer variants for new problem classes.
Extending the unification to methods outside the spherical family could reveal whether similar axes govern their design choices.

Load-bearing premise

The main distinctions among ES, CBO, OVI and related methods are captured exactly by the two axes of fitness aggregation and consensus scope.

What would settle it

A controlled test in which an ES-OVI or CBO-OVI hybrid loses a core property of one parent method that cannot be restored by any setting of the two axes.

Figures

Figures reproduced from arXiv: 2606.25761 by Johannes Ackermann, Stefano Peluchetti.

**Figure 1.** We investigate connections between the parametric ES, OVI, the nonparametric CBO, and further related optimizers. By utilizing these connections, we can derive hybrid methods, indicated by green arrows, that combine the strengths of existing optimizers: ES-OVI allows us to control convergence characteristics, SchedPol and AdaPol combine CBO and OVI updates, allowing us to obtain multiple optima in higher…

**Figure 3.** ES-OVI lets us control the flatness of the optimum. Markers indicate the minimum of J_α on the Rosenbrock function for different values of α, yellow (α = 0, ES) to red (α = 1, OVI).

**Figure 4.** AdaPol improves upon CBO in higher dimensional tasks. OVI is competitive with CMA-ES. Evaluation on Brax tasks with 10 seeds each. The shaded areas show 95% CIs across 10 seeds; hyperparameters were optimized for each method on each task.

… thus combine both approaches to obtain a method capable of finding multiple optima in higher-dimensional problems. AdaPol & SchedPol. Since CH is equivalent to OVI, we c…

**Figure 5.** Hybrid optimizers can outperform base methods. Evaluation on a selection of 2D BBO tasks. Note that in multiple problems, ES-OVI performs better than either ES or OVI. The shaded areas show 95% CIs across 10 random seeds; hyperparameters were optimized for each method in each problem.

5.1. Benchmarks. To our knowledge, neither CBO nor OVI has been evaluated on the popular BBO Benchmark (BBOB) (Hansen et a…

**Figure 6.** ES-OVI allows us to tune the optimization behavior for each environment. Performance on Brax of ES-OVI with different interpolation coefficients α. α = 1.0 corresponds to OVI, α = 0.0 corresponds to ES. 20 trials, 90% CIs.

Black-Box Optimization Benchmark. For low-dimensional problems, we evaluate each method on 23 of the BBOB benchmark problems (Hansen et al., 2009). The BBOB benchmark consists of a series…

**Figure 7.** ES-OVI allows us to trade performance vs robustness. Robustness on the Acrobot task, trained with ES-OVI with different α values. Mean across 20 different random seeds with bootstrapped 95% CIs.

… robustness than either OVI or ES under strong action or observation disturbances. This experiment also gives us some guidance on how we can approach picking the hyperparameter α: When we have less confidence in t…

**Figure 9.** Results on BBOB tasks in 2D.

**Figure 10.** Results on BBOB tasks in 5D.

**Figure 11.** Results on BBOB tasks in 7D.

**Figure 12.** Results on BBOB tasks in 10D.

**Figure 13.** Results on BBOB tasks in 15D.

**Figure 14.**

Original abstract

When gradient information is unavailable, black-box optimization (BBO) methods provide a practical alternative. While Evolution Strategies (ES), Consensus-Based Optimization (CBO), Optimization via Integration (OVI), and related methods have each been studied independently, their connections remain underexplored. We unify these approaches within a common theoretical framework, revealing that they differ primarily in two design choices: fitness aggregation (controlling sharpness preference) and consensus scope (controlling modality). Leveraging these insights, we introduce hybrid optimizers that interpolate between existing methods. Our ES-OVI hybrid allows explicit control over the preference for flat minima, enabling a trade-off between performance and robustness in continuous control tasks. Our CBO-OVI hybrids combine the higher-dimensional efficiency of parametric methods with the multimodal capabilities of particle-based approaches, achieving competitive results on language model merging under limited evaluation budgets. We validate our methods on standard BBO benchmarks and higher-dimensional locomotion tasks, demonstrating that the hybrid methods can outperform their constituent algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a clean two-axis framing for black-box methods and shows workable hybrids, but the unification claim needs the full derivations to judge.

The main thing here is a practical organizing frame for ES, CBO, and OVI that splits them on fitness aggregation (how sharply they prefer good points) and consensus scope (how local or global the update is). They then build two hybrids, ES-OVI and CBO-OVI, and test them on control tasks and language-model merging.

What lands is the hybrids themselves. The ES-OVI version lets you dial in flat-minima preference, and the CBO-OVI versions mix parametric speed with particle multimodality. The abstract reports competitive numbers on locomotion and merging under tight budgets, which is the kind of concrete result that matters for people who actually run these things.

The soft spot is the unification step. The abstract says the methods differ mainly on those two axes, but without the mappings or the equations that turn the original updates into the new frame, it is hard to know how exact the correspondence is or whether the hybrids keep the original convergence properties. If the full paper only shows empirical wins without closing that loop, the theoretical claim stays provisional.

This is for people who already work with derivative-free methods in ML or robotics and want knobs to trade off sharpness and modality. It is not a foundational rewrite of the field, but the hybrids look like they could be useful tools.

I would send it to referees. The empirical side is concrete enough to be worth checking, and the framing is simple enough that a clear write-up would help the subfield even if the unification turns out to be looser than claimed.

Referee Report

2 major / 2 minor

Summary. The paper claims to unify Evolution Strategies (ES), Consensus-Based Optimization (CBO), Optimization via Integration (OVI) and related spherical black-box methods in a common theoretical framework. The unification identifies two primary design axes—fitness aggregation (governing sharpness preference) and consensus scope (governing modality)—as the main distinctions among the methods. It then constructs hybrid optimizers (ES-OVI and CBO-OVI) that interpolate along these axes, with the ES-OVI hybrid providing explicit control over flat-minima preference and the CBO-OVI hybrids combining parametric efficiency with particle-based multimodality. Empirical results on standard BBO benchmarks and higher-dimensional locomotion tasks are reported to show that the hybrids can outperform their constituent algorithms.

Significance. If the two-axis unification is shown to be faithful and the hybrids preserve the essential properties of the source methods while delivering the claimed trade-offs, the work would offer a principled route to designing new black-box optimizers. This could be particularly useful for continuous-control and high-dimensional tasks where robustness to modality and preference for flat minima matter, and where evaluation budgets are limited.

major comments (2)

[Abstract / §3 (theoretical framework)] The central unification claim rests on the assertion that ES, CBO and OVI differ primarily along the two axes of fitness aggregation and consensus scope. Without explicit mappings (e.g., how the update rules or objective functionals of each method are recovered as special cases of the proposed framework), it is impossible to verify whether the hybrids truly interpolate without losing essential algorithmic properties.
[§4 (hybrid construction) and experimental section] The ES-OVI hybrid is said to enable explicit control over flat-minima preference. The manuscript should demonstrate that this control is achieved without introducing additional free parameters beyond those already present in the constituent methods, and that the resulting performance-robustness trade-off is not an artifact of hyper-parameter tuning.

minor comments (2)

[Abstract] The abstract refers to “spherical” black-box optimizers; the manuscript should clarify whether this refers to a specific geometric constraint on the search space or is simply descriptive of the methods considered.
[Experimental results] When reporting outperformance on locomotion tasks, the number of independent runs, statistical significance tests, and exact evaluation budgets should be stated explicitly so that the claimed superiority of the hybrids can be assessed.
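One common way to report the uncertainty this comment asks about is a percentile bootstrap over per-seed results, in the spirit of the bootstrapped 95% CIs mentioned in the Figure 7 caption. The sketch below is a generic illustration. The function name, the resample count, and the example returns are hypothetical and are not taken from the paper.

```python
import random
import statistics


def bootstrap_ci(values, level=0.95, resamples=2000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(resamples)
    )
    lo_idx = round((1.0 - level) / 2.0 * resamples)
    hi_idx = round((1.0 + level) / 2.0 * resamples) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Hypothetical final returns from 10 independent seeds (made-up numbers).
    returns = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0, 9.0, 13.0]
    low, high = bootstrap_ci(returns)
    print(f"mean={statistics.fmean(returns):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Reporting the interval together with the number of seeds and the evaluation budget lets a reader judge whether two methods' intervals overlap.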

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the presentation of the unification and the hybrid constructions.

Referee: [Abstract / §3 (theoretical framework)] The central unification claim rests on the assertion that ES, CBO and OVI differ primarily along the two axes of fitness aggregation and consensus scope. Without explicit mappings (e.g., how the update rules or objective functionals of each method are recovered as special cases of the proposed framework), it is impossible to verify whether the hybrids truly interpolate without losing essential algorithmic properties.

Authors: We agree that explicit recovery of the base methods would make the unification more verifiable. Section 3 already frames the two axes and derives the general update, but we will add a dedicated subsection with the precise algebraic mappings showing how the standard ES, CBO, and OVI update rules (and their objective functionals) arise as special cases. This will also confirm that the proposed hybrids remain within the same family and preserve the core algorithmic properties.

Revision: yes

Referee: [§4 (hybrid construction) and experimental section] The ES-OVI hybrid is said to enable explicit control over flat-minima preference. The manuscript should demonstrate that this control is achieved without introducing additional free parameters beyond those already present in the constituent methods, and that the resulting performance-robustness trade-off is not an artifact of hyper-parameter tuning.

Authors: The flat-minima preference in ES-OVI is governed by the fitness-aggregation parameter already present in both base methods; no new free parameters are introduced. We will revise §4 to include an explicit parameter-correspondence table and add targeted ablation experiments in which only the aggregation parameter is varied while all other hyperparameters remain fixed at the values used for the constituent algorithms. These results will be reported to show that the observed trade-off is attributable to the design axis rather than additional tuning.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; unification framework is self-contained

The paper's central claim is a unification of ES, CBO, OVI and related methods via two axes (fitness aggregation and consensus scope), followed by construction of hybrid optimizers. The provided abstract and text contain no equations, no fitted parameters renamed as predictions, no self-citations invoked as load-bearing uniqueness theorems, and no derivations that reduce outputs to inputs by construction. The framework is presented as an organizing lens that enables interpolation, without any step where the claimed differences or hybrids are defined circularly in terms of themselves. This is the normal case of an independent conceptual contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input provides no identifiable free parameters, axioms, or invented entities; full text would be required to audit these.

pith-pipeline@v0.9.1-grok · 5694 in / 1040 out tokens · 23262 ms · 2026-06-25T20:30:37.916403+00:00 · methodology

Reference graph

Works this paper leans on

47 extracted references · 3 linked inside Pith

[1] Akiba, T., Shing, M., Tang, Y., Sun, Q., and Ha, D. Evolutionary optimization of model merging recipes. Nature Machine Intelligence, 7:195–204, 2025.
[2] Andrieu, C., Chopin, N., Fincato, E., and Gerber, M. Gradient-free optimization via integration, 2024. URL https://arxiv.org/abs/2408.00888
[3] augmxnt. shisa-gamma-7b-v1, 2023. URL https://huggingface.co/augmxnt/shisa-gamma-7b-v1
[4] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax
[5] Braun, C. V., Lange, R. T., and Toussaint, M. Stein Variational Evolution Strategies. In UAI. arXiv, 2025. URL https://proceedings.mlr.press/v286/braun25a.html
[6] Bungert, L., Roith, T., and Wacker, P. Polarized consensus-based dynamics for optimization and sampling. Mathematical Programming, 211:125–155, 2025.
[7] Carrillo, J. A., Jin, S., Li, L., and Zhu, Y. A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus of Variations, 27:S5, 2021.
[8] Chern, E., Zou, H., Li, X., Hu, J., Feng, K., Li, J., and Liu, P. Generative AI for math: Abel. https://github.com/GAIR-NLP/abel, 2023.
[9] Foret, P., Kleiner, A., Mobahi, H., and Neyshabur, B. Sharpness-Aware Minimization for Efficiently Improving Generalization. In ICLR, 2021.
[10] Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., and Bachem, O. Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation. In NeurIPS Datasets and Benchmarks Track, 2021.
[11] Gao, L., Tow, J., Abbasi, B., Biderman, S., Black, S., DiPofi, A., Foster, C., Golding, L., Hsu, J., Le Noac'h, A., Li, H., McDonell, K., Muennighoff, N., Ociepa, C., Phang, J., Reynolds, L., Schoelkopf, H., Skowron, A., Sutawika, L., Tang, E., Thite, A., Wang, B., Wang, K., and Zou, A. The language model evaluation harness, 07 2024. URL https://zenodo.or...
[12] Glasserman, P. Monte Carlo Methods in Financial Engineering, volume 53 of Stochastic Modelling and Applied Probability. Springer, New York, 2004.
[13] Glynn, P. W. Likelihood ratio gradient estimation for stochastic systems. Commun. ACM, 33(10):75–84, 1990.
[14] Hansen, N. The CMA Evolution Strategy: A Tutorial, 2016. URL https://arxiv.org/abs/1604.00772
[15] Hansen, N., Finck, S., Ros, R., and Auger, A. Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions. Research Report RR-6829, INRIA, 2009. URL https://inria.hal.science/inria-00362633
[16] Hochreiter, S. and Schmidhuber, J. Flat Minima. Neural Computation, 9(1):1–42, 1997.
[17] Ilharco, G., Ribeiro, M. T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., and Farhadi, A. Editing Models with Task Arithmetic. In ICLR, 2023.
[18] Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. Mistral 7B, 2023. URL https://arxiv.org/abs/2310.06825
[19] Katkovnik, V. Ya. and Kulchitsky, O. Yu. Convergence of a Class of Random Search Algorithms. Automation and Remote Control, 33:1321–1326, 1972.
[20] Lange, R. T. evosax: Jax-based evolution strategies, 2022. URL https://arxiv.org/abs/2212.04180
[21] Lange, R. T., Schaul, T., Chen, Y., Zahavy, T., Dallibard, V., Lu, C., Singh, S., and Flennerhag, S. Discovering Evolution Strategies via Meta-Black-Box Optimization. In ICLR, 2023. URL https://openreview.net/forum?id=mFDU0fP3EQH
[22] Lee, H. K. and Yoon, S. W. Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning. In ICLR, 2025. URL https://openreview.net/forum?id=4OaO3GjP7k
[23] Luo, H., Sun, Q., Xu, C., Zhao, P., Lou, J., Tao, C., Geng, X., Lin, Q., Chen, S., and Zhang, D. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. In ICLR, 2025. URL https://openreview.net/forum?id=mMPMHWOdOy
[24] Mencattini, T., Minut, A. R., Crisostomi, D., Santilli, A., and Rodolà, E. MERGE^3: Efficient Evolutionary Merging on Consumer-grade GPUs. In ICML, 2025. URL https://proceedings.mlr.press/v267/mencattini25a.html
[25] Metz, L., Freeman, C. D., Schoenholz, S. S., and Kachman, T. Gradients are Not All You Need, 2022. URL http://arxiv.org/abs/2111.05803
[26] Nesterov, Y. and Spokoiny, V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
[27] Ollivier, Y., Arnold, L., Auger, A., and Hansen, N. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR, 2017. URL https://jmlr.org/papers/v18/14-467.html
[28] Pinnau, R., Totzeck, C., Tse, O., and Martin, S. A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27(1):183–204, 2017.
[29] Rechenberg, I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Problemata, 15. Frommann-Holzboog, Stuttgart-Bad Cannstatt, 1973. ISBN 978-3-7728-0373-4
[30] Riedl, K., Klock, T., Geldhauser, C., and Fornasier, M. Gradient is All You Need?, 2023. URL https://arxiv.org/abs/2306.09778
[31] Riedl, K., Klock, T., Geldhauser, C., and Fornasier, M. How Consensus-Based Optimization can be Interpreted as a Stochastic Relaxation of Gradient Descent. In Differentiable Almost Everything Workshop, ICML 2024, 2024.
[32] Rosenbrock, H. An automatic method for finding the greatest or least value of a function. The Computer Journal, 3(3):175–184, 1960.
[33] Rubinstein, R. Y. The score function approach for sensitivity analysis of computer simulation models. Mathematics and Computers in Simulation, 28(5):351–379, 1986.
[34] Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017. URL https://arxiv.org/abs/1703.03864
[35] Shi, F., Suzgun, M., Freitag, M., Wang, X., Srivats, S., Vosoughi, S., Chung, H. W., Tay, Y., Ruder, S., Zhou, D., Das, D., and Wei, J. Language Models are Multilingual Chain-of-Thought Reasoners. In ICLR, 2023.
[36] Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., and Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In ICML, 2015.
[37] Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In ICLR, 2021a.
[38] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR, 2021b.
[39] Spall, J. C. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, Hoboken, NJ, 2003.
[40] Wierstra, D., Schaul, T., Peters, J., and Schmidhuber, J. Natural Evolution Strategies. In 2008 IEEE Congress on Evolutionary Computation, pp. 3381–3387, 2008.
[41] Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., and Schmidhuber, J. Natural Evolution Strategies. Journal of Machine Learning Research, 15(27):949–980, 2014.
[42] Wortsman, M., Ilharco, G., Gadre, S. Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A. S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., and Schmidt, L. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML, 2022.
[43] Xu, P., Luo, W., Lin, X., Qiao, Y., and Zhu, T. Hybrid of PSO and CMA-ES for Global Optimization. In IEEE Congress on Evolutionary Computation, 2019.
[44] Yadav, P., Tam, D., Choshen, L., Raffel, C., and Bansal, M. TIES-Merging: Resolving Interference When Merging Models. In NeurIPS, 2023.
[45] Yu, L., Yu, B., Yu, H., Huang, F., and Li, Y. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. In ICML, 2024.
[46] Zhang, J. and Sanderson, A. C. JADE: Self-adaptive differential evolution with fast and reliable convergence performance. In 2007 IEEE Congress on Evolutionary Computation, 2007.
[47] Zhang, Y., Hartl, B., Hazan, H., and Levin, M. Diffusion Models are Evolutionary Algorithms. In ICLR, 2025. URL https://openreview.net/forum?id=xVefsBbG2O

[1] [1]

Evolutionary optimization of model merging recipes

Akiba, T., Shing, M., Tang, Y., Sun, Q., and Ha, D. Evolutionary optimization of model merging recipes. Nature Machine Intelligence, 7: 0 195--204, 2025

2025

[2] [2]

Gradient-free optimization via integration, 2024

Andrieu, C., Chopin, N., Fincato, E., and Gerber, M. Gradient-free optimization via integration, 2024. URL https://arxiv.org/abs/2408.00888

arXiv 2024

[3] [3]

shisa-gamma-7b-v1, 2023

augmxnt. shisa-gamma-7b-v1, 2023. URL https://huggingface.co/augmxnt/shisa-gamma-7b-v1

2023

[4] [4]

J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., Vander P las, J., Wanderman- M ilne, S., and Zhang, Q

Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., Vander P las, J., Wanderman- M ilne, S., and Zhang, Q. JAX : composable transformations of P ython+ N um P y programs, 2018. URL http://github.com/jax-ml/jax

2018

[5] [5]

V., Lange, R

Braun, C. V., Lange, R. T., and Toussaint, M. Stein Variational Evolution Strategies . In UAI . arXiv, 2025. URL https://proceedings.mlr.press/v286/braun25a.html

2025

[6] [6]

Polarized consensus-based dynamics for optimization and sampling

Bungert, L., Roith, T., and Wacker, P. Polarized consensus-based dynamics for optimization and sampling. Mathematical Programming, 211: 0 125--155, 2025

2025

[7] [7]

A., Jin, S., Li, L., and Zhu, Y

Carrillo, J. A., Jin, S., Li, L., and Zhu, Y. A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus of Variations, 27: 0 S5, 2021

2021

[8] [8]

Generative ai for math: Abel

Chern, E., Zou, H., Li, X., Hu, J., Feng, K., Li, J., and Liu, P. Generative ai for math: Abel. https://github.com/GAIR-NLP/abel, 2023

2023

[9] [9]

Sharpness- Aware Minimization for Efficiently Improving Generalization

Foret, P., Kleiner, A., Mobahi, H., and Neyshabur, B. Sharpness- Aware Minimization for Efficiently Improving Generalization . In ICLR , 2021

2021

[10] [10]

D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., and Bachem, O

Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., and Bachem, O. Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation . In NeurIPS Datasets and Benchmarks Track, 2021

2021

[11] [11]

The language model evaluation harness, 07 2024

Gao, L., Tow, J., Abbasi, B., Biderman, S., Black, S., DiPofi, A., Foster, C., Golding, L., Hsu, J., Le Noac'h, A., Li, H., McDonell, K., Muennighoff, N., Ociepa, C., Phang, J., Reynolds, L., Schoelkopf, H., Skowron, A., Sutawika, L., Tang, E., Thite, A., Wang, B., Wang, K., and Zou, A. The language model evaluation harness, 07 2024. URL https://zenodo.or...

arXiv 2024

[12] [12]

Monte Carlo Methods in Financial Engineering , volume 53 of Stochastic Modelling and Applied Probability

Glasserman, P. Monte Carlo Methods in Financial Engineering , volume 53 of Stochastic Modelling and Applied Probability. Springer, New York, 2004

2004

[13] [13]

Glynn, P. W. Likelihood ratio gradient estimation for stochastic systems. Commun. ACM, 33 0 (10): 0 75--84, 1990

1990

[14] [14]

The CMA Evolution Strategy : A Tutorial , 2016

Hansen, N. The CMA Evolution Strategy : A Tutorial , 2016. URL https://arxiv.org/abs/1604.00772

Pith/arXiv arXiv 2016

[15] [15]

Real- Parameter Black - Box Optimization Benchmarking 2009: Noiseless Functions Definitions

Hansen, N., Finck, S., Ros, R., and Auger, A. Real- Parameter Black - Box Optimization Benchmarking 2009: Noiseless Functions Definitions . Research Report RR-6829, INRIA, 2009. URL https://inria.hal.science/inria-00362633

2009

[16] [16]

and Schmidhuber, J

Hochreiter, S. and Schmidhuber, J. Flat Minima . Neural Computation, 9 0 (1): 0 1--42, 1997

1997

[17] [17]

T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., and Farhadi, A

Ilharco, G., Ribeiro, M. T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., and Farhadi, A. Editing Models with Task Arithmetic . In ICLR , 2023

2023

[18] Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. Mistral 7B, 2023. URL https://arxiv.org/abs/2310.06825

[19] Katkovnik, V. Ya. and Kulchitsky, O. Yu. Convergence of a Class of Random Search Algorithms. Automation and Remote Control, 33:1321--1326, 1972.

[20] Lange, R. T. evosax: Jax-based evolution strategies, 2022. URL https://arxiv.org/abs/2212.04180

[21] Lange, R. T., Schaul, T., Chen, Y., Zahavy, T., Dallibard, V., Lu, C., Singh, S., and Flennerhag, S. Discovering Evolution Strategies via Meta-Black-Box Optimization. In ICLR, 2023. URL https://openreview.net/forum?id=mFDU0fP3EQH

[22] Lee, H. K. and Yoon, S. W. Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning. In ICLR, 2025. URL https://openreview.net/forum?id=4OaO3GjP7k

[23] Luo, H., Sun, Q., Xu, C., Zhao, P., Lou, J., Tao, C., Geng, X., Lin, Q., Chen, S., and Zhang, D. WizardMath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. In ICLR, 2025. URL https://openreview.net/forum?id=mMPMHWOdOy

[24] Mencattini, T., Minut, A. R., Crisostomi, D., Santilli, A., and Rodolà, E. MERGE$^3$: Efficient Evolutionary Merging on Consumer-grade GPUs. In ICML, 2025. URL https://proceedings.mlr.press/v267/mencattini25a.html

[25] Metz, L., Freeman, C. D., Schoenholz, S. S., and Kachman, T. Gradients are Not All You Need, 2022. URL http://arxiv.org/abs/2111.05803

[26] Nesterov, Y. and Spokoiny, V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527--566, 2017.

[27] Ollivier, Y., Arnold, L., Auger, A., and Hansen, N. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR, 2017. URL https://jmlr.org/papers/v18/14-467.html

[28] Pinnau, R., Totzeck, C., Tse, O., and Martin, S. A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27(1):183--204, 2017.

[29] Rechenberg, I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Problemata, 15. Frommann-Holzboog, Stuttgart-Bad Cannstatt, 1973. ISBN 978-3-7728-0373-4

[30] Riedl, K., Klock, T., Geldhauser, C., and Fornasier, M. Gradient is All You Need?, 2023. URL https://arxiv.org/abs/2306.09778

[31] Riedl, K., Klock, T., Geldhauser, C., and Fornasier, M. How Consensus-Based Optimization can be Interpreted as a Stochastic Relaxation of Gradient Descent. In Differentiable Almost Everything Workshop, ICML 2024, 2024.

[32] Rosenbrock, H. An automatic method for finding the greatest or least value of a function. The Computer Journal, 3(3):175--184, 1960.

[33] Rubinstein, R. Y. The score function approach for sensitivity analysis of computer simulation models. Mathematics and Computers in Simulation, 28(5):351--379, 1986.

[34] Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017. URL https://arxiv.org/abs/1703.03864

[35] Shi, F., Suzgun, M., Freitag, M., Wang, X., Srivats, S., Vosoughi, S., Chung, H. W., Tay, Y., Ruder, S., Zhou, D., Das, D., and Wei, J. Language Models are Multilingual Chain-of-Thought Reasoners. In ICLR, 2023.

[36] Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N., and Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In ICML, 2015.

[37] Song, J., Meng, C., and Ermon, S. Denoising diffusion implicit models. In ICLR, 2021a.

[38] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR, 2021b.

[39] Spall, J. C. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, Hoboken, NJ, 2003.

[40] Wierstra, D., Schaul, T., Peters, J., and Schmidhuber, J. Natural Evolution Strategies. In 2008 IEEE Congress on Evolutionary Computation, pp. 3381--3387, 2008.

[41] Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., and Schmidhuber, J. Natural Evolution Strategies. Journal of Machine Learning Research, 15(27):949--980, 2014.

[42] Wortsman, M., Ilharco, G., Gadre, S. Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A. S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., and Schmidt, L. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML, 2022.

[43] Xu, P., Luo, W., Lin, X., Qiao, Y., and Zhu, T. Hybrid of PSO and CMA-ES for Global Optimization. In IEEE Congress on Evolutionary Computation, 2019.

[44] Yadav, P., Tam, D., Choshen, L., Raffel, C., and Bansal, M. TIES-Merging: Resolving Interference When Merging Models. In NeurIPS, 2023.

[45] Yu, L., Yu, B., Yu, H., Huang, F., and Li, Y. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. In ICML, 2024.

[46] Zhang, J. and Sanderson, A. C. JADE: Self-adaptive differential evolution with fast and reliable convergence performance. In 2007 IEEE Congress on Evolutionary Computation, 2007.

[47] Zhang, Y., Hartl, B., Hazan, H., and Levin, M. Diffusion Models are Evolutionary Algorithms. In ICLR, 2025. URL https://openreview.net/forum?id=xVefsBbG2O