Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Huaiyi Mu; Jie Song; Yujie Tang; Zhongkui Li

arxiv: 2409.19567 · v2 · submitted 2024-09-29 · 🧮 math.OC · cs.MA· cs.SY· eess.SY

Variance-Reduced Gradient Estimator for Nonconvex Zeroth-Order Distributed Optimization

Huaiyi Mu , Yujie Tang , Jie Song , Zhongkui Li This is my paper

Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3

classification 🧮 math.OC cs.MAcs.SYeess.SY

keywords zeroth-order optimizationdistributed optimizationvariance reductionnonconvex optimizationgradient trackingoracle complexity

0 comments

The pith

A Bernoulli-based variance-reduced estimator achieves O(d/ε) oracle complexity in distributed zeroth-order nonconvex optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a gradient estimator for distributed zeroth-order optimization that randomly chooses, via a Bernoulli draw, between renovating one orthogonal direction of the gradient or performing a full-dimensional correction. This choice reduces the number of function evaluations per step compared with always using 2d-point estimators while controlling variance. When the estimator is combined with a gradient tracking mechanism, the resulting algorithm attains an upper bound of O(d/ε) oracle calls to reach an ε-stationary point for smooth nonconvex objectives and O(dκ ln(1/ε)) calls when the objective is also gradient-dominated. The improvement matters because zeroth-order methods are used when explicit gradients are unavailable and each function evaluation is expensive in a distributed network.

Core claim

The paper claims that a variance-reduced zeroth-order gradient estimator that, with probability governed by a Bernoulli distribution, either renovates a single orthogonal direction or computes a full-dimensional correction, when integrated with gradient tracking, yields oracle complexity O(d/ε) for smooth nonconvex functions and O(dκ ln(1/ε)) for smooth gradient-dominated nonconvex functions.

What carries the argument

The variance-reduced gradient estimator that randomly selects between single-direction renovation and full-dimensional variance correction according to a Bernoulli distribution.

If this is right

The per-iteration sampling cost is lower than that of 2d-point estimators while still guaranteeing the same convergence rate order.
Gradient tracking compensates for the partial updates introduced by the single-direction renovation step.
For gradient-dominated problems the complexity improves from linear to logarithmic dependence on 1/ε.
Numerical comparisons with existing zeroth-order distributed methods show fewer total function calls on the tested instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Bernoulli choice could be tested in stochastic first-order settings to see whether similar variance reduction appears.
The estimator might be combined with other communication-efficient mechanisms such as compressed gradients.
Scaling the method to very large networks would test whether the per-node sampling savings remain dominant over communication overhead.

Load-bearing premise

The proposed estimator, when used with gradient tracking, produces no extra bias or variance that would prevent the stated oracle-complexity bounds from holding.

What would settle it

A numerical run on a smooth nonconvex test function in which the total number of function evaluations required to reach an ε-stationary point grows faster than O(d/ε) as ε decreases.

Figures

Figures reproduced from arXiv: 2409.19567 by Huaiyi Mu, Jie Song, Yujie Tang, Zhongkui Li.

**Figure 3.** Figure 3: Convergence of Algorithm 1 with different dimension [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

This paper investigates distributed zeroth-order optimization for smooth nonconvex problems, targeting the trade-off between convergence rate and sampling cost per zeroth-order gradient estimation in current algorithms that use either the $2$-point or $2d$-point gradient estimators. We propose a novel variance-reduced gradient estimator that either randomly renovates a single orthogonal direction of the true gradient or calculates the gradient estimation across all dimensions for variance correction, based on a Bernoulli distribution. Integrating this estimator with gradient tracking mechanism allows us to address the trade-off. We show that the oracle complexity of our proposed algorithm is upper bounded by $O(d/\epsilon)$ for smooth nonconvex functions and by $O(d\kappa\ln (1/\epsilon))$ for smooth and gradient dominated nonconvex functions, where $d$ denotes the problem dimension and $\kappa$ is the condition number. Numerical simulations comparing our algorithm with existing methods confirm the effectiveness and efficiency of the proposed gradient estimator.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The O(d/ε) complexity claim for smooth nonconvex functions contradicts standard lower bounds and points to an error in the analysis.

read the letter

The main issue is the claimed oracle complexity of O(d/ε) to reach an ε-stationary point for smooth nonconvex functions. That bound cannot hold. Even first-order methods require Ω(1/ε²) queries in the worst case for nonconvex smooth problems, and zeroth-order methods cannot improve on the information-theoretic limit. The stress-test concern lands directly on the abstract's central result, so the convergence analysis almost certainly contains a gap, such as an unaccounted bias term from the single-direction updates or a missing factor when summing the variance or tracking error over iterations.

Referee Report

2 major / 2 minor

Summary. The paper proposes a variance-reduced zeroth-order gradient estimator for distributed nonconvex optimization that randomly selects (via Bernoulli) between single-direction renovation and full-dimensional correction, combined with gradient tracking. It claims this yields oracle complexity O(d/ε) to reach an ε-stationary point for smooth nonconvex objectives and O(dκ ln(1/ε)) for the gradient-dominated case, with supporting numerical simulations.

Significance. If the stated complexity bounds were valid, the work would represent a substantial improvement over existing zeroth-order methods by achieving dimension-linear dependence and sub-quadratic dependence on 1/ε. However, the central claim is inconsistent with known lower bounds, limiting its potential impact.

major comments (2)

[Abstract] Abstract (and the main nonconvex convergence result): the claimed oracle complexity upper bound of O(d/ε) for smooth nonconvex functions to reach an ε-stationary point contradicts the information-theoretic lower bound of Ω(1/ε²) that holds even for first-order methods (Carmon et al., 2017; Nesterov, 2018). Zeroth-order estimators cannot circumvent this, and the Bernoulli single-direction renovation plus gradient tracking still requires Ω(1) queries per iteration on average; the analysis therefore contains a load-bearing error, most likely in the descent inequality, bias/variance accounting for the estimator, or telescoping of the tracking error.
[§3] The variance-reduced estimator definition (likely §3): the Bernoulli choice between single-direction renovation and full correction is asserted to produce an unbiased estimator whose variance yields the improved rate, but no explicit bias calculation or variance bound is provided that would allow the O(d/ε) summation; this assumption is load-bearing for both complexity theorems.

minor comments (2)

Notation for the distributed setting (e.g., local functions f_i, communication graph) is introduced without a dedicated preliminary section, making it difficult to track the precise assumptions on smoothness, gradient dominance, and network connectivity.
The simulation section compares against baselines but does not report the precise values of d, ε, or number of nodes used to generate the plotted curves, hindering reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thorough review and for highlighting the inconsistency between our claimed complexity and known lower bounds. We acknowledge that the O(d/ε) bound for general smooth nonconvex problems cannot hold and likely stems from an error in the convergence analysis. We will revise the manuscript to correct this and strengthen the supporting calculations.

read point-by-point responses

Referee: [Abstract] Abstract (and the main nonconvex convergence result): the claimed oracle complexity upper bound of O(d/ε) for smooth nonconvex functions to reach an ε-stationary point contradicts the information-theoretic lower bound of Ω(1/ε²) that holds even for first-order methods (Carmon et al., 2017; Nesterov, 2018). Zeroth-order estimators cannot circumvent this, and the Bernoulli single-direction renovation plus gradient tracking still requires Ω(1) queries per iteration on average; the analysis therefore contains a load-bearing error, most likely in the descent inequality, bias/variance accounting for the estimator, or telescoping of the tracking error.

Authors: We agree that the stated O(d/ε) oracle complexity for general smooth nonconvex objectives is inconsistent with the Ω(1/ε²) lower bound. This indicates an error in our analysis, most probably in the descent inequality or the handling of the estimator's variance and bias when combined with gradient tracking. The Bernoulli mechanism does not reduce the per-iteration query cost below a constant on average, so it cannot improve the fundamental 1/ε² dependence. We will revise the main theorem to remove or correct this claim. The O(dκ ln(1/ε)) bound for the gradient-dominated case may remain valid under the PL-like assumption and will be re-verified separately. revision: yes
Referee: [§3] The variance-reduced estimator definition (likely §3): the Bernoulli choice between single-direction renovation and full correction is asserted to produce an unbiased estimator whose variance yields the improved rate, but no explicit bias calculation or variance bound is provided that would allow the O(d/ε) summation; this assumption is load-bearing for both complexity theorems.

Authors: We will add explicit derivations of the bias (which we believe is zero) and a tight variance bound for the proposed estimator in the revised §3. These calculations will be used to re-derive the complexity results under corrected assumptions, ensuring the supporting lemmas are fully rigorous. revision: yes

standing simulated objections not resolved

The O(d/ε) complexity claim for smooth nonconvex functions cannot be defended, as it directly contradicts established lower bounds.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents its oracle complexity bounds O(d/ε) and O(dκ ln(1/ε)) as outcomes of a convergence analysis applied to the proposed Bernoulli-based variance-reduced estimator integrated with gradient tracking. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the abstract or description. The derivation chain is presented as independent of the target bounds themselves, with numerical simulations offered as external validation rather than internal fitting. This is the most common honest finding for a theoretical analysis paper whose central claims rest on stated assumptions and telescoping arguments rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the Bernoulli mechanism is part of the algorithmic design rather than a fitted quantity.

pith-pipeline@v0.9.0 · 5708 in / 1237 out tokens · 47455 ms · 2026-05-23T21:02:04.306112+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1]

Distributed smooth convex optimization with coupled constraints

Shu Liang, Le Yi Wang, and George Yin. Distributed smooth convex optimization with coupled constraints. IEEE Transactions on Automatic Control, 65(1):347–353, 2020

work page 2020
[2]

Distributed control for reaching optimal steady state in network systems: An optimization approach

Xuan Zhang, Antonis Papachristodoulou, and Na Li. Distributed control for reaching optimal steady state in network systems: An optimization approach. IEEE Transactions on Automatic Control , 63(3):864–871, 2018

work page 2018
[3]

Jie Liu, Daniel W. C. Ho, and Lulu Li. A generic algorithm framework for distributed optimization over the time-varying network with communication delays. IEEE Transactions on Automatic Control, 69(1):371–378, 2024

work page 2024
[4]

Distributed subgradient meth- ods for multi-agent optimization

Angelia Nedi ´c and Asuman Ozdaglar. Distributed subgradient meth- ods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009

work page 2009
[5]

EXTRA: An exact first-order algorithm for decentralized consensus optimization

Wei Shi, Qing Ling, Gang Wu, and Wotao Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization , 25(2):944–966, 2015

work page 2015
[6]

Distributed gradient methods for convex machine learning problems in networks: Distributed optimization

Angelia Nedi ´c. Distributed gradient methods for convex machine learning problems in networks: Distributed optimization. IEEE Signal Processing Magazine, 37(3):92–101, 2020

work page 2020
[7]

Non- convex distributed feedback optimization for aggregative cooperative robotics

Guido Carnevale, Nicola Mimmo, and Giuseppe Notarstefano. Non- convex distributed feedback optimization for aggregative cooperative robotics. Automatica, 167:111767, 2024

work page 2024
[8]

Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent

Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017
[9]

On nonconvex decentralized gradient descent

Jinshan Zeng and Wotao Yin. On nonconvex decentralized gradient descent. IEEE Transactions on signal processing , 66(11):2834–2848, 2018

work page 2018
[10]

Distributed nonconvex constrained optimization over time-varying digraphs

Gesualdo Scutari and Ying Sun. Distributed nonconvex constrained optimization over time-varying digraphs. Mathematical Programming, 176:497–544, 2019

work page 2019
[11]

Surrogate-based distributed optimisation for expensive black-box func- tions

Zhongguo Li, Zhen Dong, Zhongchao Liang, and Zhengtao Ding. Surrogate-based distributed optimisation for expensive black-box func- tions. Automatica, 125:109407, 2021

work page 2021
[12]

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S ´ebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Founda- tions and Trends® in Machine Learning , 5(1):1–122, 2012

work page 2012
[13]

Lee, Danqi Chen, and Sanjeev Arora

Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. In Advances in Neural Information Process- ing Systems, volume 36, pages 53038–53075, 2023

work page 2023
[14]

Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates

Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, and Soummya Kar. Communication-efficient distributed strongly convex stochastic optimization: Non-asymptotic rates. arXiv preprint arXiv:1809.02920, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Random gradient-free mini- mization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free mini- mization of convex functions. Foundations of Computational Mathe- matics, 17(2):527–566, 2017

work page 2017
[16]

Stochastic estimation of the maximum of a regression function

Jack Kiefer and Jacob Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, pages 462–466, 1952

work page 1952
[17]

Distributed zero-order algorithms for nonconvex multiagent optimization

Yujie Tang, Junshan Zhang, and Na Li. Distributed zero-order algorithms for nonconvex multiagent optimization. IEEE Transactions on Control of Network Systems , 8(1):269–281, 2021

work page 2021
[18]

Machine learning for variance reduction in online experiments

Yongyi Guo, Dominic Coey, Mikael Konutgan, Wenting Li, Chris Schoener, and Matt Goldman. Machine learning for variance reduction in online experiments. In Advances in Neural Information Processing Systems, volume 34, pages 8637–8648, 2021

work page 2021
[19]

Smola, and Eric P

Chong Wang, Xi Chen, Alexander J. Smola, and Eric P. Xing. Variance reduction for stochastic gradient optimization. In Advances in Neural Information Processing Systems , volume 26, 2013

work page 2013
[20]

Accelerating stochastic gradient descent using predictive variance reduction

Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, volume 26, 2013

work page 2013
[21]

Zeroth-order stochastic variance reduction for nonconvex optimization

Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, and Lisa Amini. Zeroth-order stochastic variance reduction for nonconvex optimization. In Advances in Neural Information Processing Systems, volume 31, 2018

work page 2018
[22]

Variance-reduced decentralized stochastic optimization with accelerated convergence

Ran Xin, Usman A Khan, and Soummya Kar. Variance-reduced decentralized stochastic optimization with accelerated convergence. IEEE Transactions on Signal Processing , 68:6255–6271, 2020

work page 2020
[23]

ZONE: Zeroth-order nonconvex multiagent optimization over networks

Davood Hajinezhad, Mingyi Hong, and Alfredo Garcia. ZONE: Zeroth-order nonconvex multiagent optimization over networks. IEEE Transactions on Automatic Control , 64(10):3995–4010, 2019

work page 2019
[24]

Prob- lem Complexity and Method Efficiency in Optimization

Arkadij Semenovi ˇc Nemirovskij and David Borisovich Yudin. Prob- lem Complexity and Method Efficiency in Optimization . Wiley- Interscience, 1983

work page 1983
[25]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM- SIAM Symposium on Discrete Algorithms , pages 385–394, 2005

work page 2005
[26]

Harnessing smoothness to accelerate distributed optimization

Guannan Qu and Na Li. Harnessing smoothness to accelerate distributed optimization. IEEE Transactions on Control of Network Systems, 5(3):1245–1260, 2017

work page 2017
[27]

Achieving geometric convergence for distributed optimization over time-varying graphs

Angelia Nedi ´c, Alex Olshevsky, and Wei Shi. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization , 27(4):2597–2633, 2017

work page 2017
[28]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2nd edition, 2012

work page 2012

[1] [1]

Distributed smooth convex optimization with coupled constraints

Shu Liang, Le Yi Wang, and George Yin. Distributed smooth convex optimization with coupled constraints. IEEE Transactions on Automatic Control, 65(1):347–353, 2020

work page 2020

[2] [2]

Distributed control for reaching optimal steady state in network systems: An optimization approach

Xuan Zhang, Antonis Papachristodoulou, and Na Li. Distributed control for reaching optimal steady state in network systems: An optimization approach. IEEE Transactions on Automatic Control , 63(3):864–871, 2018

work page 2018

[3] [3]

Jie Liu, Daniel W. C. Ho, and Lulu Li. A generic algorithm framework for distributed optimization over the time-varying network with communication delays. IEEE Transactions on Automatic Control, 69(1):371–378, 2024

work page 2024

[4] [4]

Distributed subgradient meth- ods for multi-agent optimization

Angelia Nedi ´c and Asuman Ozdaglar. Distributed subgradient meth- ods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009

work page 2009

[5] [5]

EXTRA: An exact first-order algorithm for decentralized consensus optimization

Wei Shi, Qing Ling, Gang Wu, and Wotao Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization , 25(2):944–966, 2015

work page 2015

[6] [6]

Distributed gradient methods for convex machine learning problems in networks: Distributed optimization

Angelia Nedi ´c. Distributed gradient methods for convex machine learning problems in networks: Distributed optimization. IEEE Signal Processing Magazine, 37(3):92–101, 2020

work page 2020

[7] [7]

Non- convex distributed feedback optimization for aggregative cooperative robotics

Guido Carnevale, Nicola Mimmo, and Giuseppe Notarstefano. Non- convex distributed feedback optimization for aggregative cooperative robotics. Automatica, 167:111767, 2024

work page 2024

[8] [8]

Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent

Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017

[9] [9]

On nonconvex decentralized gradient descent

Jinshan Zeng and Wotao Yin. On nonconvex decentralized gradient descent. IEEE Transactions on signal processing , 66(11):2834–2848, 2018

work page 2018

[10] [10]

Distributed nonconvex constrained optimization over time-varying digraphs

Gesualdo Scutari and Ying Sun. Distributed nonconvex constrained optimization over time-varying digraphs. Mathematical Programming, 176:497–544, 2019

work page 2019

[11] [11]

Surrogate-based distributed optimisation for expensive black-box func- tions

Zhongguo Li, Zhen Dong, Zhongchao Liang, and Zhengtao Ding. Surrogate-based distributed optimisation for expensive black-box func- tions. Automatica, 125:109407, 2021

work page 2021

[12] [12]

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S ´ebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Founda- tions and Trends® in Machine Learning , 5(1):1–122, 2012

work page 2012

[13] [13]

Lee, Danqi Chen, and Sanjeev Arora

Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. In Advances in Neural Information Process- ing Systems, volume 36, pages 53038–53075, 2023

work page 2023

[14] [14]

Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates

Anit Kumar Sahu, Dusan Jakovetic, Dragana Bajovic, and Soummya Kar. Communication-efficient distributed strongly convex stochastic optimization: Non-asymptotic rates. arXiv preprint arXiv:1809.02920, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Random gradient-free mini- mization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free mini- mization of convex functions. Foundations of Computational Mathe- matics, 17(2):527–566, 2017

work page 2017

[16] [16]

Stochastic estimation of the maximum of a regression function

Jack Kiefer and Jacob Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, pages 462–466, 1952

work page 1952

[17] [17]

Distributed zero-order algorithms for nonconvex multiagent optimization

Yujie Tang, Junshan Zhang, and Na Li. Distributed zero-order algorithms for nonconvex multiagent optimization. IEEE Transactions on Control of Network Systems , 8(1):269–281, 2021

work page 2021

[18] [18]

Machine learning for variance reduction in online experiments

Yongyi Guo, Dominic Coey, Mikael Konutgan, Wenting Li, Chris Schoener, and Matt Goldman. Machine learning for variance reduction in online experiments. In Advances in Neural Information Processing Systems, volume 34, pages 8637–8648, 2021

work page 2021

[19] [19]

Smola, and Eric P

Chong Wang, Xi Chen, Alexander J. Smola, and Eric P. Xing. Variance reduction for stochastic gradient optimization. In Advances in Neural Information Processing Systems , volume 26, 2013

work page 2013

[20] [20]

Accelerating stochastic gradient descent using predictive variance reduction

Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, volume 26, 2013

work page 2013

[21] [21]

Zeroth-order stochastic variance reduction for nonconvex optimization

Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, and Lisa Amini. Zeroth-order stochastic variance reduction for nonconvex optimization. In Advances in Neural Information Processing Systems, volume 31, 2018

work page 2018

[22] [22]

Variance-reduced decentralized stochastic optimization with accelerated convergence

Ran Xin, Usman A Khan, and Soummya Kar. Variance-reduced decentralized stochastic optimization with accelerated convergence. IEEE Transactions on Signal Processing , 68:6255–6271, 2020

work page 2020

[23] [23]

ZONE: Zeroth-order nonconvex multiagent optimization over networks

Davood Hajinezhad, Mingyi Hong, and Alfredo Garcia. ZONE: Zeroth-order nonconvex multiagent optimization over networks. IEEE Transactions on Automatic Control , 64(10):3995–4010, 2019

work page 2019

[24] [24]

Prob- lem Complexity and Method Efficiency in Optimization

Arkadij Semenovi ˇc Nemirovskij and David Borisovich Yudin. Prob- lem Complexity and Method Efficiency in Optimization . Wiley- Interscience, 1983

work page 1983

[25] [25]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM- SIAM Symposium on Discrete Algorithms , pages 385–394, 2005

work page 2005

[26] [26]

Harnessing smoothness to accelerate distributed optimization

Guannan Qu and Na Li. Harnessing smoothness to accelerate distributed optimization. IEEE Transactions on Control of Network Systems, 5(3):1245–1260, 2017

work page 2017

[27] [27]

Achieving geometric convergence for distributed optimization over time-varying graphs

Angelia Nedi ´c, Alex Olshevsky, and Wei Shi. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization , 27(4):2597–2633, 2017

work page 2017

[28] [28]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2nd edition, 2012

work page 2012