Convergence of Consensus-Based Particle Methods for Nonconvex Bi-Level Optimization

Jalal Etesami; Konstantin Riedl; Majid Khadiv; Xudong Sun; Yutong Chao

arxiv: 2605.19667 · v1 · pith:JC4RI3YAnew · submitted 2026-05-19 · 🧮 math.OC · cs.LG

Yutong Chao, Xudong Sun, Konstantin Riedl, Majid Khadiv, Jalal Etesami

Pith reviewed 2026-05-20 04:37 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords consensus-based optimization, bi-level optimization, mean-field convergence, particle methods, nonconvex optimization, Wasserstein distance, exponential convergence

The pith

Consensus-based particle methods converge exponentially to bi-level optimization solutions in the mean-field limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a derivative-free consensus-based optimization approach for solving nonconvex bi-level problems by minimizing an upper-level function over the minimizers of a lower-level problem. It uses smooth quantile selection with a Gibbs-type Laplace approximation to build the consensus point. Convergence is proven for the mean-field dynamics and finite-particle systems, showing that under assumptions on quantile localization, error bounds, and stability, the mean-field law approaches any Wasserstein neighborhood of the target solution at an explicit exponential rate up to the hitting time. This matters for providing theoretical support to practical methods used in applications like constrained optimization and neural network training.

Core claim

Under suitable assumptions on smooth quantile localization, error bounds, and stability, the mean-field law reaches any arbitrary prescribed Wasserstein neighborhood of the target bi-level solution with an explicit exponential rate up to the hitting time.

What carries the argument

The mean-field dynamics of the consensus-based particle system with smooth quantile selection combined with Gibbs-type Laplace approximation.
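As a rough illustration of how such a consensus point could be computed for a finite particle cloud, the sketch below solves a soft β-quantile equation of the form quoted further down this page, mean ψ((q − L(θᵢ))/τ) = β, and then applies Gibbs weights on the upper-level values. Everything specific here is an assumption rather than the paper's construction: the logistic choice of ψ, the product form of the weights, the function names, the parameter values, and the toy problem (lower level minimized on the unit circle, upper level pulling toward the point (2, 0)).

```python
import numpy as np

def psi(z):
    # Logistic sigmoid written via tanh for numerical stability (assumed smoothing kernel).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def soft_quantile(lower_vals, beta, tau, iters=100):
    # Bisection for q solving mean_i psi((q - L_i) / tau) = beta.
    # The left-hand side is increasing in q, so bisection applies.
    lo = lower_vals.min() - 50.0 * tau
    hi = lower_vals.max() + 50.0 * tau
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi((mid - lower_vals) / tau).mean() < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def consensus_point(theta, lower_vals, upper_vals, beta, tau, alpha):
    # Soft selection of particles with small lower-level value (hypothetical form).
    q = soft_quantile(lower_vals, beta, tau)
    select = psi((q - lower_vals) / tau)
    # Gibbs weights on the upper-level value, shifted by the minimum for stability.
    gibbs = np.exp(-alpha * (upper_vals - upper_vals.min()))
    w = select * gibbs
    return (w[:, None] * theta).sum(axis=0) / w.sum()

# Toy problem (an assumption, not taken from the paper):
# lower level L(theta) = (|theta|^2 - 1)^2, minimized on the whole unit circle;
# upper level G(theta) = |theta - (2, 0)|^2, so the bi-level solution is (1, 0).
rng = np.random.default_rng(0)
theta = rng.uniform(-2.0, 2.0, size=(20000, 2))
lower_vals = (np.sum(theta**2, axis=1) - 1.0) ** 2
upper_vals = np.sum((theta - np.array([2.0, 0.0])) ** 2, axis=1)

q = soft_quantile(lower_vals, beta=0.05, tau=0.005)
c = consensus_point(theta, lower_vals, upper_vals, beta=0.05, tau=0.005, alpha=10.0)
print("soft quantile level:", q)
print("consensus point:", c)
```

In this toy setting the product α·τ is kept small on purpose: with a logistic ψ the selection weight decays only exponentially in the lower-level value, so a large Gibbs parameter could otherwise pull the consensus point away from the lower-level minimizers.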

If this is right

The finite-particle approximation also converges to the bi-level solution.
The method applies to two-dimensional constrained problems and neural network training as shown in experiments.
Explicit exponential rates provide practical guidance for implementation.
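To make the finite-particle side concrete, here is a minimal sketch of one Euler–Maruyama time step for a consensus-based particle system. It assumes the standard isotropic drift-plus-scaled-noise dynamics from the general consensus-based optimization literature; this page does not state the paper's exact dynamics, so the update rule, the helper names `cbo_step` and `gibbs_consensus`, and all parameter values are assumptions. For brevity the demo uses a plain single-level Gibbs consensus point as a stand-in for the bi-level one.

```python
import numpy as np

def cbo_step(theta, c, lam, sigma, dt, rng):
    # One Euler-Maruyama step of the assumed dynamics
    # d(theta) = -lam * (theta - c) dt + sigma * |theta - c| dB.
    diff = theta - c
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    noise = rng.standard_normal(theta.shape)
    return theta - lam * diff * dt + sigma * dist * np.sqrt(dt) * noise

def gibbs_consensus(theta, f_vals, alpha):
    # Single-level Gibbs-weighted mean, used here only as a stand-in consensus point.
    w = np.exp(-alpha * (f_vals - f_vals.min()))
    return (w[:, None] * theta).sum(axis=0) / w.sum()

rng = np.random.default_rng(1)
target = np.array([1.0, 1.0])            # minimizer of the toy objective
theta = rng.standard_normal((500, 2))    # initial particle cloud
spread_before = theta.var(axis=0).sum()

for _ in range(400):
    f_vals = np.sum((theta - target) ** 2, axis=1)
    c = gibbs_consensus(theta, f_vals, alpha=30.0)
    theta = cbo_step(theta, c, lam=1.0, sigma=0.5, dt=0.05, rng=rng)

spread_after = theta.var(axis=0).sum()
print("final consensus point:", c)
print("spread before/after:", spread_before, spread_after)
```

The noise is scaled by each particle's distance to the consensus point, so exploration shuts off as the cloud contracts. The parameters are chosen so that the drift dominates the noise in this two-dimensional toy case.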

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This particle method may extend to other nonconvex optimization challenges beyond bi-level settings.
Connections could be drawn to existing mean-field analyses in consensus algorithms.
Further experiments on high-dimensional problems would test the practical reach of the exponential convergence.

Load-bearing premise

Suitable assumptions on smooth quantile localization, error bounds, and stability hold.

What would settle it

Observing that the mean-field dynamics do not approach the target Wasserstein neighborhood at the claimed exponential rate in a setting satisfying the assumptions.
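A numerical proxy for such a check, run on a finite-particle simulation rather than on the mean-field law itself, might look like the sketch below. It relies on the fact that the squared Wasserstein-2 distance between an empirical measure and a Dirac mass at θ⋆ is the mean squared distance of the particles to θ⋆, and it fits a decay rate by least squares on the logarithm. The function names and the synthetic trajectory (a cloud contracting exactly at rate 0.5 per unit time) are hypothetical and serve only to exercise the estimator.

```python
import numpy as np

def w2_squared_to_dirac(theta, theta_star):
    # W_2^2 between the empirical measure of `theta` and a Dirac at `theta_star`.
    return np.mean(np.sum((theta - theta_star) ** 2, axis=1))

def fit_exponential_rate(times, values):
    # Least-squares line through (t, log value); the decay rate is minus the slope.
    slope, _ = np.polyfit(times, np.log(values), 1)
    return -slope

# Synthetic trajectory: particles contract toward theta_star like exp(-0.5 t),
# so W_2^2 decays like exp(-t) and the fitted rate should be close to 1.
rng = np.random.default_rng(2)
theta_star = np.array([1.0, 0.0])
z = rng.standard_normal((1000, 2))
times = np.linspace(0.0, 5.0, 51)
values = np.array(
    [w2_squared_to_dirac(theta_star + np.exp(-0.5 * t) * z, theta_star) for t in times]
)
rate = fit_exponential_rate(times, values)
print("fitted decay rate of W_2^2:", rate)
```

On real simulation output the fit would only be meaningful up to the time the distance first drops below the chosen tolerance, since the claimed rate holds up to the hitting time.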

Figures

Figures reproduced from arXiv: 2605.19667 by Jalal Etesami, Konstantin Riedl, Majid Khadiv, Xudong Sun, Yutong Chao.

**Figure 1.** CB2O vs SCB2O on the circle constraint (first row) and on the star-shaped constraint (second row). All metrics use log scale. Solid line depicts the mean over 5 seeds; shaded band is the min–max range. From left to right: L(c⋆), G(c⋆), ∥c⋆ − θ⋆∥₂, and σ(x).

**Figure 2.** MNIST training curves: N = 50, β = 0.04. Small values of ξ (e.g., 1 and 5) allow poorly performing particles to influence the consensus, which degrades performance. In contrast, ξ ≥ 10 stabilizes training and yields results close to CB2O, thereby confirming the theoretical connection between the two algorithms.

**Figure 3.** Logical dependency structure of the proof. For readability, only the main proof dependencies are shown.

**Figure 4.** Left: Upper-level objective value. Right: Lower-level objective value.

**Figure 5.** CB2O vs SCB2O on the circle constraint. All metrics use log scale.

**Figure 6.** CB2O vs SCB2O on the star-shaped constraint. All metrics use log scale.

**Figure 7.** MNIST training curves: N = 50, β = 0.06. Panels: training loss and test accuracy versus epoch (0 to 100), for CB2O and several SCB2O settings.

**Figure 8.** MNIST training curves: N = 50, β = 0.08. Panels: training loss and test accuracy versus epoch (0 to 100), for CB2O and several SCB2O settings.

**Figure 9.** MNIST training curves: N = 50, β = 0.10.

**Figure 10.** Final-epoch metrics vs. β for CB2O and SCB2O across all ξ values. Shaded band = min–max across seeds.

Abstract

In this paper, we study a consensus-based optimization method for nonconvex bi-level optimization, where the objective is to minimize an upper-level function over the set of global minimizers of a lower-level problem. The proposed approach is derivative-free, and constructs its consensus point via smooth quantile selection combined with a Gibbs-type Laplace approximation. We establish convergence guarantees for both the associated mean-field dynamics and its finite-particle approximation. In particular, under suitable assumptions on smooth quantile localization, error bounds, and stability, we show that the mean-field law reaches any arbitrary prescribed Wasserstein neighborhood of the target bi-level solution with an explicit exponential rate up to the hitting time. Numerical experiments on a two-dimensional constrained problem and neural network training further support the theoretical results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives mean-field exponential convergence for a consensus particle method on nonconvex bi-level problems, but the result rests on assumptions about quantile localization and stability that need checking.

The key takeaway is that this paper provides convergence guarantees for a derivative-free consensus-based particle method tailored to nonconvex bi-level optimization problems. It combines smooth quantile selection with a Gibbs-type Laplace approximation to construct the consensus point and then analyzes the mean-field limit. The new element is applying these particle dynamics specifically to bi-level settings where the lower level is nonconvex. The authors derive an exponential rate of convergence in Wasserstein distance for the mean-field law to any prescribed neighborhood of the target solution, up to a hitting time, under assumptions on smooth quantile localization, error bounds, and stability. The finite-particle approximation is also addressed. Numerically, they test on a two-dimensional constrained problem and on neural network training, which helps illustrate the method in practice.

The approach does a decent job of extending existing consensus optimization ideas to this more complex bi-level case. The theoretical framework looks structured, with clear statements of what is assumed to reach the exponential convergence. Its weaker point is the reliance on those assumptions, particularly smooth quantile localization and stability for nonconvex lower-level problems. When the lower level has a non-singleton set of global minimizers, the Gibbs approximation and quantile selection might not guarantee the localization or the Lyapunov stability needed for the rate to hold across the board. The paper uses these assumptions to obtain the result, so if they are difficult to verify or hold only in limited cases, the practical scope narrows. Reviewers should check the proofs for how these assumptions are justified and whether they are satisfied in the numerical examples.

This paper is for specialists in optimization methods, especially those working on bi-level problems in machine learning or constrained optimization who are interested in particle-based or derivative-free approaches. A reader familiar with mean-field analysis or consensus optimization would get the most out of the theoretical parts. I recommend sending it for peer review. The topic is relevant and the contribution adds a new angle with some supporting analysis and experiments, even if the assumptions will require close examination.

Referee Report

2 major / 2 minor

Summary. The paper proposes a derivative-free consensus-based particle method for nonconvex bi-level optimization problems, where an upper-level objective is minimized over the global minimizers of a nonconvex lower-level problem. The consensus point is constructed via smooth quantile selection combined with a Gibbs-type Laplace approximation. Convergence guarantees are established for the associated mean-field dynamics and the finite-particle approximation: under assumptions on smooth quantile localization, error bounds, and stability, the mean-field law converges exponentially in Wasserstein distance to any prescribed neighborhood of the target bi-level solution up to a hitting time. Numerical experiments on a two-dimensional constrained problem and neural network training are included to illustrate the results.

Significance. If the stated assumptions hold in the relevant regimes, the work would provide a useful theoretical foundation for applying consensus-based methods to bi-level optimization, an area with applications in hyperparameter tuning and machine learning. The explicit exponential convergence rate in Wasserstein distance for the mean-field limit, together with the particle approximation analysis, represents a clear strength. The inclusion of numerical support is also positive, though the conditional nature of the main theorem limits immediate applicability until the assumptions are more fully characterized.

major comments (2)

[Abstract] The central claim of exponential Wasserstein convergence to an arbitrary neighborhood of the bi-level solution (up to hitting time) is stated only under 'suitable assumptions on smooth quantile localization, error bounds, and stability'. For nonconvex lower-level objectives the set of global minimizers need not be a singleton; the manuscript provides no explicit verification or sufficient conditions ensuring that the Gibbs-type Laplace approximation plus quantile selection yields the required localization and global Lyapunov stability when the upper-level objective varies over that set. This assumption is load-bearing for the exponential rate.
[§4, mean-field analysis] The derivation of the exponential rate proceeds from the external assumptions on quantile localization and stability to the mean-field PDE without an intermediate step that confirms the constructed consensus point satisfies the stability condition in a basin containing the target solution. A concrete test would be to exhibit at least one nonconvex lower-level example with a non-singleton minimizer set for which the stability hypothesis can be checked directly.

minor comments (2)

[Abstract] The term 'hitting time' is used in the abstract and main theorem without an explicit definition or forward reference to its precise meaning in the context of the mean-field dynamics; adding a short clarifying sentence would improve readability.
[§2] Notation for the quantile selection operator and the Laplace approximation parameter could be introduced more explicitly at first use to avoid ambiguity for readers outside the immediate subfield.
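For illustration, one common way to formalize a hitting time in this kind of result is sketched below. It is an assumption about the intended meaning, not a quotation from the paper: the symbols ρ_t (mean-field law at time t), θ⋆ (target bi-level solution), ε (prescribed accuracy), and the unspecified rate constant c > 0 are all introduced here.

```latex
% Hypothetical formalization; notation is assumed, not taken from the paper.
T_\varepsilon := \inf\bigl\{\, t \ge 0 \;:\; W_2(\rho_t, \delta_{\theta^\star}) \le \varepsilon \,\bigr\},
\qquad
W_2^2(\rho_t, \delta_{\theta^\star}) \;\le\; W_2^2(\rho_0, \delta_{\theta^\star})\, e^{-c t}
\quad \text{for all } t \in [0, T_\varepsilon].
```

Read this way, "exponential rate up to the hitting time" means the decay estimate is only asserted until the law first enters the ε-neighborhood, which also bounds T_ε from above in terms of c, ε, and the initial distance.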

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We appreciate the acknowledgment of the potential value of the exponential Wasserstein convergence results and the numerical illustrations. We address the major comments point by point below, clarifying the role of the assumptions and indicating revisions where appropriate.

Referee: [Abstract] The central claim of exponential Wasserstein convergence to an arbitrary neighborhood of the bi-level solution (up to hitting time) is stated only under 'suitable assumptions on smooth quantile localization, error bounds, and stability'. For nonconvex lower-level objectives the set of global minimizers need not be a singleton; the manuscript provides no explicit verification or sufficient conditions ensuring that the Gibbs-type Laplace approximation plus quantile selection yields the required localization and global Lyapunov stability when the upper-level objective varies over that set. This assumption is load-bearing for the exponential rate.

Authors: We agree that the assumptions on smooth quantile localization, error bounds, and stability are central and load-bearing for the exponential rate. The manuscript is structured to derive convergence under these assumptions, which are motivated by the smooth quantile selection combined with the Gibbs-type Laplace approximation to handle potentially non-singleton minimizer sets in nonconvex lower-level problems. The construction is intended to promote localization around the relevant global minimizers, but we do not claim universal verification or sufficient conditions that hold for every possible upper-level variation over the minimizer set. We will revise the abstract and the discussion of assumptions to more explicitly note that verification is problem-dependent and may require case-specific analysis.
revision: partial
Referee: [§4, mean-field analysis] The derivation of the exponential rate proceeds from the external assumptions on quantile localization and stability to the mean-field PDE without an intermediate step that confirms the constructed consensus point satisfies the stability condition in a basin containing the target solution. A concrete test would be to exhibit at least one nonconvex lower-level example with a non-singleton minimizer set for which the stability hypothesis can be checked directly.

Authors: In the mean-field analysis of Section 4, the exponential rate is obtained by positing that the constructed consensus point satisfies the localization and stability conditions, from which the contraction in Wasserstein distance for the mean-field PDE follows. The proof does not include an additional intermediate verification step because the assumptions are taken as given for the general setting. We acknowledge that explicitly confirming the stability condition holds in a basin for the specific construction, particularly with a non-singleton minimizer set, would add clarity. We will add a remark in the revised Section 4 explaining how the quantile selection is designed to place the consensus point in the relevant basin, and we will reference the numerical experiments as empirical illustration of the overall behavior.
revision: partial

standing simulated objections not resolved

Providing a concrete nonconvex lower-level example with a non-singleton minimizer set together with direct analytical verification of the stability hypothesis lies outside the scope of the current general convergence analysis and would require substantial additional case-by-case theoretical work.

Circularity Check

0 steps flagged

Mean-field convergence derived conditionally from external assumptions without self-referential reduction

full rationale

The paper's central derivation establishes exponential Wasserstein convergence of the mean-field law to a neighborhood of the bi-level solution up to hitting time, explicitly conditioned on assumptions regarding smooth quantile localization, error bounds, and stability. These assumptions are invoked as prerequisites rather than derived from or equivalent to the target convergence result. No equations or steps in the provided abstract or description reduce the claimed prediction to a fitted parameter, self-definition, or load-bearing self-citation chain. The finite-particle approximation and numerical experiments are presented as supporting the theory under those assumptions, keeping the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available, so the ledger is limited to the assumptions explicitly named for the convergence proof; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption: smooth quantile localization, error bounds, and stability.
Invoked to obtain the exponential convergence rate of the mean-field law to a Wasserstein neighborhood of the target solution.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "under suitable assumptions on smooth quantile localization, error bounds, and stability, we show that the mean-field law reaches any arbitrary prescribed Wasserstein neighborhood of the target bi-level solution with an explicit exponential rate"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "soft β-quantile ... ∫ ψ((q − L(θ))/τ) dρ(θ) = β"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

177 extracted references · 177 canonical work pages · 2 internal anchors

[1]

Mathematical Models and Methods in Applied Sciences , volume=

An analytical framework for consensus-based global optimization method , author=. Mathematical Models and Methods in Applied Sciences , volume=. 2018 , publisher=

work page 2018
[3]

Modeling and Simulation for Collective Dynamics , pages=

Mean-field particle swarm optimization , author=. Modeling and Simulation for Collective Dynamics , pages=. 2023 , publisher=

work page 2023
[4]

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume=

Defending against diverse attacks in federated learning through consensus-based bi-level optimization , author=. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume=. 2025 , publisher=

work page 2025
[5]

Journal of Optimization Theory and Applications , volume=

A discrete consensus-based global optimization method with noisy objective function , author=. Journal of Optimization Theory and Applications , volume=. 2025 , publisher=

work page 2025
[7]

Journal of machine learning research , volume=

FedCBO: Reaching group consensus in clustered federated learning through consensus-based optimization , author=. Journal of machine learning research , volume=

work page
[8]

International conference on the applications of evolutionary computation (part of evostar) , pages=

Convergence of anisotropic consensus-based optimization in mean-field law , author=. International conference on the applications of evolutionary computation (part of evostar) , pages=. 2022 , organization=

work page 2022
[9]

ESAIM: Control, Optimisation and Calculus of Variations , volume=

A consensus-based global optimization method for high dimensional machine learning problems , author=. ESAIM: Control, Optimisation and Calculus of Variations , volume=. 2021 , publisher=

work page 2021
[10]

European Journal of Applied Mathematics , volume=

Consensus-based optimisation with truncated noise , author=. European Journal of Applied Mathematics , volume=. 2025 , publisher=

work page 2025
[11]

Modeling and simulation for collective dynamics , pages=

Consensus-based optimization and ensemble Kalman inversion for global optimization problems with constraints , author=. Modeling and simulation for collective dynamics , pages=. 2023 , publisher=

work page 2023
[12]

2006 , publisher=

Applied asymptotic analysis , author=. 2006 , publisher=

work page 2006
[13]

International Conference on Learning Representations , year=

Self-tuning networks: Bilevel optimization of hyperparameters using structured best-response functions , author=. International Conference on Learning Representations , year=

work page
[14]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

A stochastic approach to bi-level optimization for hyperparameter optimization and meta learning , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[15]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Rethinking bi-level optimization in neural architecture search: A gibbs sampling perspective , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[16]

, title =

Gammelli, Daniele and Harrison, James and Yang, Kaidi and Pavone, Marco and Rodrigues, Filipe and Pereira, Francisco C. , title =. Proceedings of the 40th International Conference on Machine Learning , articleno =. 2023 , publisher =

work page 2023
[17]

Mathematical Programming , volume=

An online convex optimization-based framework for convex bilevel optimization , author=. Mathematical Programming , volume=. 2023 , publisher=

work page 2023
[18]

Journal of machine learning research , volume=

Lower bounds and accelerated algorithms for bilevel optimization , author=. Journal of machine learning research , volume=

work page
[19]

Advances in Neural Information Processing Systems , volume=

An accelerated gradient method for convex smooth simple bilevel optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[21]

SIAM Journal on Optimization , volume=

Proximal point algorithm controlled by a slowly vanishing term: applications to hierarchical minimization , author=. SIAM Journal on Optimization , volume=. 2005 , publisher=

work page 2005
[22]

Bilevel Optimization: Advances and Next Challenges , pages=

Algorithms for simple bilevel programming , author=. Bilevel Optimization: Advances and Next Challenges , pages=. 2020 , publisher=

work page 2020
[23]

Mathematical Programming , volume=

A first order method for finding minimal norm-like solutions of convex optimization problems , author=. Mathematical Programming , volume=. 2014 , publisher=

work page 2014
[24]

International Conference on Artificial Intelligence and Statistics , pages=

A conditional gradient-based method for simple bilevel optimization with convex lower-level problem , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023
[25]

Mathematical Models and Methods in Applied Sciences , volume=

A consensus-based model for global optimization and its mean-field limit , author=. Mathematical Models and Methods in Applied Sciences , volume=. 2017 , publisher=

work page 2017
[26]

SIAM Journal on Optimization , volume=

Constrained consensus-based optimization , author=. SIAM Journal on Optimization , volume=. 2023 , publisher=

work page 2023
[28]

SIAM Journal on Optimization , volume=

Consensus-based optimization methods converge globally , author=. SIAM Journal on Optimization , volume=. 2024 , publisher=

work page 2024
[29]

SIAM Journal on Control and Optimization , volume=

Consensus-based optimization for saddle point problems , author=. SIAM Journal on Control and Optimization , volume=. 2024 , publisher=

work page 2024
[30]

Journal of Machine Learning Research , volume=

Consensus-based optimization on the sphere: Convergence to global minimizers and machine learning , author=. Journal of Machine Learning Research , volume=

work page
[31]

SIAM review , volume=

An algorithmic introduction to numerical simulation of stochastic differential equations , author=. SIAM review , volume=. 2001 , publisher=

work page 2001
[33]

Mathematical Models and Methods in Applied Sciences , volume=

A multiscale consensus-based algorithm for multilevel optimization , author=. Mathematical Models and Methods in Applied Sciences , volume=. 2025 , publisher=

work page 2025
[34]

Studies in Applied Mathematics , volume=

Consensus-based sampling , author=. Studies in Applied Mathematics , volume=. 2022 , publisher=

work page 2022
[35]

SIAM Journal on Optimization , volume=

Consensus-based algorithms for stochastic optimization problems , author=. SIAM Journal on Optimization , volume=. 2025 , publisher=

work page 2025
[36]

International conference on machine learning , pages=

Bilevel programming for hyperparameter optimization and meta-learning , author=. International conference on machine learning , pages=. 2018 , organization=

work page 2018
[38]

International Conference on Machine Learning , pages=

Revisiting and advancing fast adversarial training through the lens of bi-level optimization , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[39]

Advances in Neural Information Processing Systems , volume=

Advancing model pruning via bi-level optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[40]

IEEE Signal Processing Magazine , volume=

An introduction to bilevel optimization: Foundations and applications in signal processing and machine learning , author=. IEEE Signal Processing Magazine , volume=. 2024 , publisher=

work page 2024
[41]

Advances in Neural Information Processing Systems , volume=

Functional bilevel optimization for machine learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[42]

Applied Mathematics and Computation , volume=

A constrained consensus based optimization algorithm and its application to finance , author=. Applied Mathematics and Computation , volume=. 2022 , publisher=

work page 2022
[44]

An alternating optimization method for bilevel problems under the Polyak-

Xiao, Quan and Lu, Songtao and Chen, Tianyi , journal=. An alternating optimization method for bilevel problems under the Polyak-

work page
[45]

SIAM Journal on Mathematics of Data Science , volume=

Global minima of overparameterized neural networks , author=. SIAM Journal on Mathematics of Data Science , volume=. 2021 , publisher=

work page 2021
[46]

International conference on machine learning , pages=

Gradient descent finds global minima of deep neural networks , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019
[47]

Advances in Neural Information Processing Systems , volume=

Loss landscape characterization of neural networks without over-parametrization , author=. Advances in Neural Information Processing Systems , volume=

work page
[48]

Active Particles, Volume 3: Advances in Theory, Models, and Applications , pages=

Trends in consensus-based optimization , author=. Active Particles, Volume 3: Advances in Theory, Models, and Applications , pages=. 2021 , publisher=

work page 2021
[50]

Penalty-based methods for simple bilevel optimization under h

Chen, Pengyu and Shi, Xu and Jiang, Rujun and Wang, Jiulin , journal=. Penalty-based methods for simple bilevel optimization under h

work page
[51]

European Journal of Applied Mathematics , volume=

Leveraging memory effects and gradient information in consensus-based optimisation: On global convergence in mean-field law , author=. European Journal of Applied Mathematics , volume=. 2024 , publisher=

work page 2024
[53]

Journal of Optimization Theory and Applications , volume=

A consensus-based algorithm for non-convex multiplayer games , author=. Journal of Optimization Theory and Applications , volume=. 2025 , publisher=

work page 2025
[54]

Acta numerica , volume=

An introduction to numerical methods for stochastic differential equations , author=. Acta numerica , volume=. 1999 , publisher=

work page 1999
[55]

International conference on machine learning , pages=

On penalty-based bilevel gradient descent method , author=. International conference on machine learning , pages=. 2023 , organization=

work page 2023
[56]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

work page 2000
[57]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

work page 1980
[58]

M. J. Kearns , title =

work page
[59]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

work page 1983
[60]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

work page 2000
[61]

Suppressed for Anonymity , author=

work page
[62]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

work page 1981
[63]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

work page 1959
[64]

Mathematical programming , volume=

A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization , author=. Mathematical programming , volume=. 2003 , publisher=

work page 2003
[65]

Optimization-Online , year=

Alternating direction methods for non convex optimization with applications to second-order least-squares and risk parity portfolio selection , author=. Optimization-Online , year=

work page
[66]

Advances in neural information processing systems , volume=

Algorithms for non-negative matrix factorization , author=. Advances in neural information processing systems , volume=

work page
[67]

Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 1995.
[68]

Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 2012.
[69]

Training neural networks without gradients: A scalable ADMM approach. International Conference on Machine Learning, 2016.
[70]

SAR parametric super-resolution image reconstruction methods based on ADMM and deep neural network. IEEE Transactions on Geoscience and Remote Sensing, 2021.
[71]

BiConMP: A nonlinear model predictive control framework for whole body motion planning. IEEE Transactions on Robotics, 2023.
[72]

Gait and trajectory optimization for legged systems through phase-based end-effector parameterization. IEEE Robotics and Automation Letters, 2018.
[73]

Linear regression problem relaxations solved by nonconvex ADMM with convergence analysis. IEEE Transactions on Circuits and Systems for Video Technology.
[74]

Global convergence of ADMM in nonconvex nonsmooth optimization. Journal of Scientific Computing, 2019.
[75]

Convergence for nonconvex ADMM, with applications to CT imaging. Journal of Machine Learning Research.
[76]

The complexity of optimization problems. Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing.
[77]

An asynchronous parallel stochastic coordinate descent algorithm. International Conference on Machine Learning, 2014.
[78]

Gradient methods for convex minimization: better rates under weaker conditions. arXiv preprint arXiv:1303.4645.
[79]

Asynchronous optimization over graphs: Linear convergence under error bound conditions. IEEE Transactions on Automatic Control, 2020.
[80]

Fast convergence to non-isolated minima: four equivalent conditions for C^2 functions. Mathematical Programming, 2024.
[81]

Semidefinite relaxations for quadratically constrained quadratic programming: A review and comparisons. Mathematical Programming, 2011.
[82]

Global solution of non-convex quadratically constrained quadratic programs. Optimization Methods and Software, 2019.
[83]

A tutorial on geometric programming. Optimization and Engineering, 2007.
[84]

Global optimization of signomial geometric programming problems. European Journal of Operational Research, 2014.
[85]

Mixed integer nonlinear programming. 2011.
[86]

Mixed-integer nonlinear programming 2018. Optimization and Engineering, 2019.
[87]

Binary optimization via mathematical programming with equilibrium constraints. arXiv preprint arXiv:1608.04425.
[88]

Optimality and duality for nonsmooth mathematical programming problems with equilibrium constraints. Journal of Global Optimization, 2023.
[89]

A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 2013.

