Distributed Inexact Successive Convex Approximation ADMM: Analysis-Part I

Daniel P. Palomar; Ketan Rajawat; Sandeep Kumar

arxiv: 1907.08969 · v1 · pith:RDRYZRTInew · submitted 2019-07-21 · 🧮 math.OC · cs.DC· stat.ML

Distributed Inexact Successive Convex Approximation ADMM: Analysis-Part I

Sandeep Kumar , Ketan Rajawat , Daniel P. Palomar This is my paper

Pith reviewed 2026-05-24 18:39 UTC · model grok-4.3

classification 🧮 math.OC cs.DCstat.ML

keywords non-convex optimizationADMMsuccessive convex approximationdistributed algorithmsconvergence analysisinexact methodsvariance reductionasynchronous optimization

0 comments

The pith

Distributed inexact SCA-ADMM achieves first-order convergence for non-convex problems under mild assumptions on errors and delays.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an algorithmic framework that combines the alternating direction method of multipliers with successive convex approximation to solve non-convex problems whose objective is the sum of smooth component functions plus a convex or smooth regularization term. The approach supports distributed operation both with and without a fusion center and includes variance-reduced methods as special cases. Under very mild assumptions, two algorithm variants are shown to deliver first-order convergence rates while remaining robust to random, deterministic, or adversarial uncertainties through a generic model of errors and delays. This setup directly enables the variance-reduced, asynchronous, and stochastic implementations developed in part II.

Core claim

The paper develops two variants of the distributed inexact successive convex approximation ADMM algorithm for non-convex problems and establishes first-order convergence rate guarantees under mild assumptions. The framework incorporates ideas from ADMM, SCA, distributed algorithms, and inexact gradient methods, allowing flexibility for non-convex objectives, distributed operation, and variance reduction as special cases. The proofs accommodate generic errors and delays from random, deterministic, or adversarial sources.

What carries the argument

The distributed inexact SCA-ADMM algorithm, which applies successive convex approximations to non-convex components inside an ADMM splitting structure while allowing inexact updates and generic delays.

Load-bearing premise

The component functions and the generic errors or delays satisfy the mild conditions required for the convergence analysis to hold.

What would settle it

A concrete non-convex instance satisfying the mild assumptions on the functions, together with bounded errors and delays, in which either variant diverges or fails to achieve the stated first-order rate would disprove the claim.

Figures

Figures reproduced from arXiv: 1907.08969 by Daniel P. Palomar, Ketan Rajawat, Sandeep Kumar.

**Figure 1.** Figure 1: Distributed optimization architectures 1) Fusion-centric architecture: The fusion-centric or master-worker architecture has been widely used in data science applications [31], [32], where the aim is to divide the [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

read the original abstract

In this two-part work, we propose an algorithmic framework for solving non-convex problems whose objective function is the sum of a number of smooth component functions plus a convex (possibly non-smooth) or/and smooth (possibly non-convex) regularization function. The proposed algorithm incorporates ideas from several existing approaches such as alternate direction method of multipliers (ADMM), successive convex approximation (SCA), distributed and asynchronous algorithms, and inexact gradient methods. Different from a number of existing approaches, however, the proposed framework is flexible enough to incorporate a class of non-convex objective functions, allow distributed operation with and without a fusion center, and include variance reduced methods as special cases. Remarkably, the proposed algorithms are robust to uncertainties arising from random, deterministic, and adversarial sources. The part I of the paper develops two variants of the algorithm under very mild assumptions and establishes first-order convergence rate guarantees. The proof developed here allows for generic errors and delays, paving the way for different variance-reduced, asynchronous, and stochastic implementations, outlined and evaluated in part II.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper offers a general inexact ADMM-SCA framework for non-convex distributed problems with convergence under mild error assumptions.

read the letter

The paper's key contribution is a flexible algorithmic framework that integrates ADMM and successive convex approximation for non-convex distributed optimization problems. It supports operation with or without a fusion center, treats variance-reduced methods as special cases, and proves first-order convergence while allowing for generic errors and delays from various sources. It does well by keeping the assumptions mild and by designing the analysis to be general enough to cover the stochastic and asynchronous extensions in part II. The robustness claims to random, deterministic, and adversarial uncertainties are backed by the proof structure. The soft spots are that the convergence guarantee is only first-order, which is typical but not strong, and this part I contains no numerical experiments. The exact details of the mild assumptions would need checking in the full proofs to confirm they do not restrict the non-convexity too much. Since the stress test found no inconsistencies, the argument seems internally consistent. The citation pattern is appropriate and there is no evidence of circular reasoning. This work is for researchers focused on distributed non-convex optimization in areas like signal processing and machine learning. A reader looking for general convergence results for inexact methods will find it useful. I recommend sending it for peer review as the framework adds a useful level of generality.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a distributed inexact successive convex approximation (SCA) ADMM framework for non-convex problems whose objective is the sum of smooth component functions plus a convex (possibly nonsmooth) or smooth (possibly nonconvex) regularizer. Part I develops two algorithm variants and proves first-order convergence under mild assumptions on the component functions, allowing generic (random, deterministic, or adversarial) errors and delays; the analysis is positioned to support variance-reduced, asynchronous, and stochastic implementations in Part II.

Significance. If the convergence claims hold, the framework's ability to unify ADMM, SCA, distributed operation (with/without fusion center), and variance reduction as special cases, while remaining robust to generic errors/delays, would be a useful contribution to distributed nonconvex optimization. The explicit allowance for inexactness and delays in the proof is a positive feature that directly enables the extensions claimed for Part II.

minor comments (3)

[Abstract / §1] The abstract and introduction repeatedly describe the assumptions as 'very mild' without listing them; a concise enumerated list of the precise assumptions (e.g., on smoothness, convexity of the regularizer, boundedness of errors) should appear early in §2 or §3 so readers can immediately assess the scope.
[§3] Notation for the two algorithm variants (e.g., how the inexact SCA step and the ADMM multiplier update differ) should be introduced with a side-by-side comparison table or clearly labeled equations to avoid confusion when the convergence proofs are presented.
[§1 / §4] The claim that variance-reduced methods arise as special cases is stated but not demonstrated with an explicit reduction; a short remark or corollary showing how a particular variance-reduction scheme fits the generic-error model would strengthen the flexibility argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment of its potential significance in unifying ADMM, SCA, distributed optimization, and robustness to errors/delays. The recommendation is listed as 'uncertain,' but the report does not enumerate any specific major comments. We therefore provide this general response and remain available to address any detailed points the referee may wish to raise.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper constructs an algorithmic framework combining ADMM, SCA, and related methods for non-convex sum-of-functions problems, then derives first-order convergence under explicitly stated mild assumptions on component functions, generic errors, and delays. These assumptions are independent inputs to the proof rather than outputs or fitted quantities; the flexibility claims (non-convex objectives, distributed operation, variance reduction as special case) follow directly from the algorithm definition without any self-referential reduction. No load-bearing step equates a prediction to its own fitted input, invokes a self-citation as an unverified uniqueness theorem, or renames a known result as a new derivation. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard optimization assumptions for convergence but introduces no new free parameters, axioms beyond domain standards, or invented entities; the analysis is built on mild assumptions about functions and errors that are not enumerated here.

pith-pipeline@v0.9.0 · 5721 in / 1212 out tokens · 17151 ms · 2026-05-24T18:39:04.857810+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The proposed algorithm uses updates where the non-convexity is handled within SCA framework... convex surrogate functions... quadratic upper bound property
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

first-order convergence rate guarantees... generic errors and delays

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 7 internal anchors

[1]

Non-convex optimization for machine learning,

P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Foundations and Trends R⃝ in Machine Learning , vol. 10, no. 3-4, pp. 142–336, 2017

work page 2017
[2]

Imagenet classiﬁcation with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation with deep convolutional neural networks,” in Advances in NIPS , 2012, pp. 1097–1105

work page 2012
[3]

Parallel and distributed successive convex approximation methods for big-data optimization,

G. Scutari and Y . Sun, “Parallel and distributed successive convex approximation methods for big-data optimization,” in Multi-agent Opti- mization. Springer, 2018, pp. 141–308

work page 2018
[4]

Asynchronous optimization over heterogeneous networks via consensus admm,

S. Kumar, R. Jain, and K. Rajawat, “Asynchronous optimization over heterogeneous networks via consensus admm,” IEEE Trans. on Signal and Info. Process. over Netw. , vol. 3, no. 1, pp. 114–129, 2017

work page 2017
[5]

A parallel de- composition method for nonconvex stochastic multi-agent optimization problems,

Y . Yang, G. Scutari, D. P. Palomar, and M. Pesavento, “A parallel de- composition method for nonconvex stochastic multi-agent optimization problems,” IEEE Trans. on Signal Process. , vol. 64, no. 11, pp. 2949– 2964, 2016

work page 2016
[6]

Variance reduction for faster non-convex optimization,

Z. Allen-Zhu and E. Hazan, “Variance reduction for faster non-convex optimization,” in Proc. of the 33rd ICML , 2016, pp. 699–707

work page 2016
[7]

Stochastic variance reduction for nonconvex optimization,

S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, “Stochastic variance reduction for nonconvex optimization,” in Proc. of the 33rd ICML, 2016, pp. 314–323

work page 2016
[8]

Stochastic Alternating Direction Method of Multipliers with Variance Reduction for Nonconvex Optimization

F. Huang, S. Chen, and Z. Lu, “Stochastic alternating direction method of multipliers with variance reduction for nonconvex optimization,” arXiv preprint arXiv:1610.02758 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[9]

Majorization-minimization algo- rithms in signal processing, communications, and machine learning,

Y . Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algo- rithms in signal processing, communications, and machine learning,” IEEE Trans. on Signal Process. , vol. 65, no. 3, pp. 794–816, 2017

work page 2017
[10]

Parallel and distributed methods for nonconvex optimization,

G. Scutari, F. Facchinei, L. Lampariello, and P. Song, “Parallel and distributed methods for nonconvex optimization,” in Proc. of 2014 ICASSP. IEEE, 2014, pp. 840–844

work page 2014
[11]

D. P. Bertsekas, Nonlinear programming. Athena scientiﬁc Belmont, 1999

work page 1999
[12]

Convergence of Stochastic Proximal Gradient Algorithm

L. Rosasco, S. Villa, and B. C. V ˜u, “Convergence of stochastic proximal gradient algorithm,” arXiv preprint arXiv:1403.5074 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[13]

A stochastic gradient method with an exponential convergence rate for ﬁnite training sets,

N. L. Roux, M. Schmidt, and F. R. Bach, “A stochastic gradient method with an exponential convergence rate for ﬁnite training sets,” in Advances in NIPS , 2012, pp. 2663–2671

work page 2012
[14]

First-order methods of smooth convex optimization with inexact oracle,

O. Devolder, F. Glineur, and Y . Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Mathematical Programming, vol. 146, no. 1-2, pp. 37–75, 2014

work page 2014
[15]

How to escape saddle points efﬁciently,

C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efﬁciently,” inProceedings of the 34th International Conference on Machine Learning-Volume 70 . JMLR. org, 2017, pp. 1724–1732

work page 2017
[16]

QSGD: Communication-efﬁcient sgd via gradient quantization and encoding,

D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. V ojnovic, “QSGD: Communication-efﬁcient sgd via gradient quantization and encoding,” in Advances in NIPS , 2017, pp. 1709–1720

work page 2017
[17]

Gradient Sparsification for Communication-Efficient Distributed Optimization

J. Wangni, J. Wang, J. Liu, and T. Zhang, “Gradient sparsiﬁcation for communication-efﬁcient distributed optimization,” arXiv preprint arXiv:1710.09854, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[18]

Generalized stochastic frank-wolfe algorithm with stochastic “Substitute

H. Lu and R. M. Freund, “Generalized stochastic frank-wolfe algorithm with stochastic “Substitute” gradient for structured convex optimization,” arXiv preprint arxiv:1807.07680 , 2018

work page arXiv 2018
[19]

Robust accelerated gradient method,

N. S. Aybat, A. Fallah, M. Gurbuzbalaban, and A. Ozdaglar, “Robust accelerated gradient method,” arXiv preprint arXiv:1805.10579 , 2018

work page arXiv 2018
[20]

Analysis of approximate stochastic gra- dient using quadratic constraints and sequential semideﬁnite programs,

B. Hu, P. Seiler, and L. Lessard, “Analysis of approximate stochastic gra- dient using quadratic constraints and sequential semideﬁnite programs,” arXiv preprint arXiv:1711.00987 , 2017

work page arXiv 2017
[21]

Convergence rates of inexact proximal-gradient methods for convex optimization,

M. Schmidt, N. L. Roux, and F. R. Bach, “Convergence rates of inexact proximal-gradient methods for convex optimization,” in Advances in NIPS, 2011, pp. 1458–1466

work page 2011
[22]

Smooth optimization with approximate gradient,

A. d’Aspremont, “Smooth optimization with approximate gradient,” SIAM Journal on Optimization , vol. 19, no. 3, pp. 1171–1183, 2008

work page 2008
[23]

Robustness of accelerated ﬁrst-order algorithms for strongly convex optimization problems,

H. Mohammadi, M. Razaviyayn, and M. R. Jovanovi ´c, “Robustness of accelerated ﬁrst-order algorithms for strongly convex optimization problems,” arXiv preprint arXiv:1905.11011 , 2019

work page arXiv 1905
[24]

A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An admm approach,

M. Hong, “A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An admm approach,” IEEE Trans. on Control of Network Systems , vol. 5, no. 3, pp. 935–945, 2018

work page 2018
[25]

NESTT: A nonconvex primal-dual splitting method for distributed and stochastic optimization,

D. Hajinezhad, M. Hong, T. Zhao, and Z. Wang, “NESTT: A nonconvex primal-dual splitting method for distributed and stochastic optimization,” in Advances in NIPS , 2016, pp. 3215–3223

work page 2016
[26]

Parallel stochastic successive convex approximation method for large-scale dictionary learning,

A. Koppel, A. Mokhtari, and A. Ribeiro, “Parallel stochastic successive convex approximation method for large-scale dictionary learning,” in Proc. of 2018 ICASSP . IEEE, 2018, pp. 2771–2775. 15

work page 2018
[27]

Stochastic successive convex ap- proximation for non-convex constrained stochastic optimization,

A. Liu, V . Lau, and B. Kananian, “Stochastic successive convex ap- proximation for non-convex constrained stochastic optimization,” arXiv preprint arXiv:1801.08266, 2018

work page arXiv 2018
[28]

A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

T. Sun, H. Jiang, L. Cheng, and W. Zhu, “A convergence frame for inexact nonconvex and nonsmooth algorithms and its applications to several iterations,” arXiv preprint arXiv:1709.04072 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

Gradient Method With Inexact Oracle for Composite Non-Convex Optimization

P. Dvurechensky, “Gradient method with inexact oracle for composite non-convex optimization,” arXiv preprint arXiv:1703.09180 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

A uniﬁed alternating direction method of multipliers by majorization minimization,

C. Lu, J. Feng, S. Yan, and Z. Lin, “A uniﬁed alternating direction method of multipliers by majorization minimization,” IEEE Trans. on Pattern Analysis and Machine Intelligence , vol. 40, no. 3, pp. 527–541, 2018

work page 2018
[31]

Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics,

V . Cevher, S. Becker, and M. Schmidt, “Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics,” IEEE Sig. Proc. Magazine , vol. 31, no. 5, pp. 32–43, 2014

work page 2014
[32]

D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods. Prentice hall Englewood Cliffs, NJ, 1989, vol. 23

work page 1989
[33]

Recovering sparse signals with a certain family of nonconvex penalties and dc programming,

G. Gasso, A. Rakotomamonjy, and S. Canu, “Recovering sparse signals with a certain family of nonconvex penalties and dc programming,”IEEE Trans. on Signal Process. , vol. 57, no. 12, pp. 4686–4698, 2009

work page 2009
[34]

Non-convex hybrid total variation for image denoising,

S. Oh, H. Woo, S. Yun, and M. Kang, “Non-convex hybrid total variation for image denoising,” Journal of Visual Communication and Image Representation, vol. 24, no. 3, pp. 332–344, 2013

work page 2013
[35]

Regularization paths for generalized linear models via coordinate descent,

J. Friedman, T. Hastie, and R. Tibshirani, “Regularization paths for generalized linear models via coordinate descent,” Journal of statistical software, vol. 33, no. 1, p. 1, 2010

work page 2010
[36]

Distributed optimization and statistical learning via the alternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trend. Mach. Learn. , vol. 3, no. 1, pp. 1–122, 2011

work page 2011
[37]

Distributed subgradient methods for multi- agent optimization,

A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi- agent optimization,” IEEE Trans. on Auto. Cont. , vol. 54, no. 1, pp. 48–61, 2009

work page 2009
[38]

Parallel and distributed methods for constrained nonconvex optimizationpart i: Theory,

G. Scutari, F. Facchinei, and L. Lampariello, “Parallel and distributed methods for constrained nonconvex optimizationpart i: Theory,” IEEE Trans. on Signal Process. , vol. 65, no. 8, pp. 1929–1944, April 2017

work page 1929
[39]

Par- allel and distributed methods for constrained nonconvex optimization- part ii: Applications in communications and machine learning,

G. Scutari, F. Facchinei, L. Lampariello, S. Sardellitti, and P. Song, “Par- allel and distributed methods for constrained nonconvex optimization- part ii: Applications in communications and machine learning,” IEEE Trans. on Signal Process. , vol. 65, no. 8, pp. 1945–1960, April 2017

work page 1945
[40]

Optimization with ﬁrst-order surrogate functions,

J. Mairal, “Optimization with ﬁrst-order surrogate functions,” in ICML, 2013, pp. 783–791

work page 2013
[41]

A uniﬁed convergence analysis of block successive minimization methods for nonsmooth optimization,

M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A uniﬁed convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization , vol. 23, no. 2, pp. 1126– 1153, 2013

work page 2013
[42]

Stochastic majorization-minimization algorithms for large- scale optimization,

J. Mairal, “Stochastic majorization-minimization algorithms for large- scale optimization,” in Advances in NIPS , 2013, pp. 2283–2291

work page 2013
[43]

Parallel succes- sive convex approximation for nonsmooth nonconvex optimization,

M. Razaviyayn, M. Hong, Z.-Q. Luo, and J.-S. Pang, “Parallel succes- sive convex approximation for nonsmooth nonconvex optimization,” in Advances in NIPS , 2014, pp. 1440–1448

work page 2014
[44]

Clustering with bregman divergences,

A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with bregman divergences,” Journal of machine learning research, vol. 6, no. Oct, pp. 1705–1749, 2005

work page 2005
[45]

Global convergence of admm in nonconvex nonsmooth optimization,

Y . Wang, W. Yin, and J. Zeng, “Global convergence of admm in nonconvex nonsmooth optimization,” Journal of Scientiﬁc Computing , pp. 1–35, 2015

work page 2015
[46]

Next: In-network nonconvex optimiza- tion,

P. Di Lorenzo and G. Scutari, “Next: In-network nonconvex optimiza- tion,” IEEE Trans. on Sig. and Info. Proc. over Netw. , vol. 2, no. 2, pp. 120–136, 2016

work page 2016
[47]

Asynchronous distributed admm for consensus optimization,

R. Zhang and J. Kwok, “Asynchronous distributed admm for consensus optimization,” in Proc. of the 31st ICML , 2014, pp. 1701–1709

work page 2014
[48]

Variance Reduction for Distributed Stochastic Gradient Descent

S. De, G. Taylor, and T. Goldstein, “Variance reduction for distributed stochastic gradient descent,” arXiv preprint arXiv:1512.01708 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[49]

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

Z. Huo and H. Huang, “Asynchronous stochastic gradient descent with variance reduction for non-convex optimization,” arXiv preprint arXiv:1604.03584, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[50]

Asynchronous distributed admm for large-scale optimizationpart i: algorithm and convergence analysis,

T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed admm for large-scale optimizationpart i: algorithm and convergence analysis,” IEEE Trans. on Signal Process. , vol. 64, no. 12, pp. 3118–3130, 2016

work page 2016
[51]

Asynchronous parallel stochastic gradient for nonconvex optimization,

X. Lian, Y . Huang, Y . Li, and J. Liu, “Asynchronous parallel stochastic gradient for nonconvex optimization,” in Advances in NIPS , 2015, pp. 2719–2727

work page 2015
[52]

Stochastic ﬁrst-and zeroth-order methods for nonconvex stochastic programming,

S. Ghadimi and G. Lan, “Stochastic ﬁrst-and zeroth-order methods for nonconvex stochastic programming,” SIAM Journal on Optimization , vol. 23, no. 4, pp. 2341–2368, 2013

work page 2013
[53]

Accelerated gradient methods for nonconvex nonlinear and stochastic programming,

——, “Accelerated gradient methods for nonconvex nonlinear and stochastic programming,” Mathematical Programming, pp. 1–41, 2015

work page 2015

[1] [1]

Non-convex optimization for machine learning,

P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Foundations and Trends R⃝ in Machine Learning , vol. 10, no. 3-4, pp. 142–336, 2017

work page 2017

[2] [2]

Imagenet classiﬁcation with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁcation with deep convolutional neural networks,” in Advances in NIPS , 2012, pp. 1097–1105

work page 2012

[3] [3]

Parallel and distributed successive convex approximation methods for big-data optimization,

G. Scutari and Y . Sun, “Parallel and distributed successive convex approximation methods for big-data optimization,” in Multi-agent Opti- mization. Springer, 2018, pp. 141–308

work page 2018

[4] [4]

Asynchronous optimization over heterogeneous networks via consensus admm,

S. Kumar, R. Jain, and K. Rajawat, “Asynchronous optimization over heterogeneous networks via consensus admm,” IEEE Trans. on Signal and Info. Process. over Netw. , vol. 3, no. 1, pp. 114–129, 2017

work page 2017

[5] [5]

A parallel de- composition method for nonconvex stochastic multi-agent optimization problems,

Y . Yang, G. Scutari, D. P. Palomar, and M. Pesavento, “A parallel de- composition method for nonconvex stochastic multi-agent optimization problems,” IEEE Trans. on Signal Process. , vol. 64, no. 11, pp. 2949– 2964, 2016

work page 2016

[6] [6]

Variance reduction for faster non-convex optimization,

Z. Allen-Zhu and E. Hazan, “Variance reduction for faster non-convex optimization,” in Proc. of the 33rd ICML , 2016, pp. 699–707

work page 2016

[7] [7]

Stochastic variance reduction for nonconvex optimization,

S. J. Reddi, A. Hefny, S. Sra, B. Poczos, and A. Smola, “Stochastic variance reduction for nonconvex optimization,” in Proc. of the 33rd ICML, 2016, pp. 314–323

work page 2016

[8] [8]

Stochastic Alternating Direction Method of Multipliers with Variance Reduction for Nonconvex Optimization

F. Huang, S. Chen, and Z. Lu, “Stochastic alternating direction method of multipliers with variance reduction for nonconvex optimization,” arXiv preprint arXiv:1610.02758 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[9] [9]

Majorization-minimization algo- rithms in signal processing, communications, and machine learning,

Y . Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algo- rithms in signal processing, communications, and machine learning,” IEEE Trans. on Signal Process. , vol. 65, no. 3, pp. 794–816, 2017

work page 2017

[10] [10]

Parallel and distributed methods for nonconvex optimization,

G. Scutari, F. Facchinei, L. Lampariello, and P. Song, “Parallel and distributed methods for nonconvex optimization,” in Proc. of 2014 ICASSP. IEEE, 2014, pp. 840–844

work page 2014

[11] [11]

D. P. Bertsekas, Nonlinear programming. Athena scientiﬁc Belmont, 1999

work page 1999

[12] [12]

Convergence of Stochastic Proximal Gradient Algorithm

L. Rosasco, S. Villa, and B. C. V ˜u, “Convergence of stochastic proximal gradient algorithm,” arXiv preprint arXiv:1403.5074 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[13] [13]

A stochastic gradient method with an exponential convergence rate for ﬁnite training sets,

N. L. Roux, M. Schmidt, and F. R. Bach, “A stochastic gradient method with an exponential convergence rate for ﬁnite training sets,” in Advances in NIPS , 2012, pp. 2663–2671

work page 2012

[14] [14]

First-order methods of smooth convex optimization with inexact oracle,

O. Devolder, F. Glineur, and Y . Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Mathematical Programming, vol. 146, no. 1-2, pp. 37–75, 2014

work page 2014

[15] [15]

How to escape saddle points efﬁciently,

C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efﬁciently,” inProceedings of the 34th International Conference on Machine Learning-Volume 70 . JMLR. org, 2017, pp. 1724–1732

work page 2017

[16] [16]

QSGD: Communication-efﬁcient sgd via gradient quantization and encoding,

D. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. V ojnovic, “QSGD: Communication-efﬁcient sgd via gradient quantization and encoding,” in Advances in NIPS , 2017, pp. 1709–1720

work page 2017

[17] [17]

Gradient Sparsification for Communication-Efficient Distributed Optimization

J. Wangni, J. Wang, J. Liu, and T. Zhang, “Gradient sparsiﬁcation for communication-efﬁcient distributed optimization,” arXiv preprint arXiv:1710.09854, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[18] [18]

Generalized stochastic frank-wolfe algorithm with stochastic “Substitute

H. Lu and R. M. Freund, “Generalized stochastic frank-wolfe algorithm with stochastic “Substitute” gradient for structured convex optimization,” arXiv preprint arxiv:1807.07680 , 2018

work page arXiv 2018

[19] [19]

Robust accelerated gradient method,

N. S. Aybat, A. Fallah, M. Gurbuzbalaban, and A. Ozdaglar, “Robust accelerated gradient method,” arXiv preprint arXiv:1805.10579 , 2018

work page arXiv 2018

[20] [20]

Analysis of approximate stochastic gra- dient using quadratic constraints and sequential semideﬁnite programs,

B. Hu, P. Seiler, and L. Lessard, “Analysis of approximate stochastic gra- dient using quadratic constraints and sequential semideﬁnite programs,” arXiv preprint arXiv:1711.00987 , 2017

work page arXiv 2017

[21] [21]

Convergence rates of inexact proximal-gradient methods for convex optimization,

M. Schmidt, N. L. Roux, and F. R. Bach, “Convergence rates of inexact proximal-gradient methods for convex optimization,” in Advances in NIPS, 2011, pp. 1458–1466

work page 2011

[22] [22]

Smooth optimization with approximate gradient,

A. d’Aspremont, “Smooth optimization with approximate gradient,” SIAM Journal on Optimization , vol. 19, no. 3, pp. 1171–1183, 2008

work page 2008

[23] [23]

Robustness of accelerated ﬁrst-order algorithms for strongly convex optimization problems,

H. Mohammadi, M. Razaviyayn, and M. R. Jovanovi ´c, “Robustness of accelerated ﬁrst-order algorithms for strongly convex optimization problems,” arXiv preprint arXiv:1905.11011 , 2019

work page arXiv 1905

[24] [24]

A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An admm approach,

M. Hong, “A distributed, asynchronous, and incremental algorithm for nonconvex optimization: An admm approach,” IEEE Trans. on Control of Network Systems , vol. 5, no. 3, pp. 935–945, 2018

work page 2018

[25] [25]

NESTT: A nonconvex primal-dual splitting method for distributed and stochastic optimization,

D. Hajinezhad, M. Hong, T. Zhao, and Z. Wang, “NESTT: A nonconvex primal-dual splitting method for distributed and stochastic optimization,” in Advances in NIPS , 2016, pp. 3215–3223

work page 2016

[26] [26]

Parallel stochastic successive convex approximation method for large-scale dictionary learning,

A. Koppel, A. Mokhtari, and A. Ribeiro, “Parallel stochastic successive convex approximation method for large-scale dictionary learning,” in Proc. of 2018 ICASSP . IEEE, 2018, pp. 2771–2775. 15

work page 2018

[27] [27]

Stochastic successive convex ap- proximation for non-convex constrained stochastic optimization,

A. Liu, V . Lau, and B. Kananian, “Stochastic successive convex ap- proximation for non-convex constrained stochastic optimization,” arXiv preprint arXiv:1801.08266, 2018

work page arXiv 2018

[28] [28]

A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

T. Sun, H. Jiang, L. Cheng, and W. Zhu, “A convergence frame for inexact nonconvex and nonsmooth algorithms and its applications to several iterations,” arXiv preprint arXiv:1709.04072 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

Gradient Method With Inexact Oracle for Composite Non-Convex Optimization

P. Dvurechensky, “Gradient method with inexact oracle for composite non-convex optimization,” arXiv preprint arXiv:1703.09180 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[30] [30]

A uniﬁed alternating direction method of multipliers by majorization minimization,

C. Lu, J. Feng, S. Yan, and Z. Lin, “A uniﬁed alternating direction method of multipliers by majorization minimization,” IEEE Trans. on Pattern Analysis and Machine Intelligence , vol. 40, no. 3, pp. 527–541, 2018

work page 2018

[31] [31]

Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics,

V . Cevher, S. Becker, and M. Schmidt, “Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics,” IEEE Sig. Proc. Magazine , vol. 31, no. 5, pp. 32–43, 2014

work page 2014

[32] [32]

D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods. Prentice hall Englewood Cliffs, NJ, 1989, vol. 23

work page 1989

[33] [33]

Recovering sparse signals with a certain family of nonconvex penalties and dc programming,

G. Gasso, A. Rakotomamonjy, and S. Canu, “Recovering sparse signals with a certain family of nonconvex penalties and dc programming,”IEEE Trans. on Signal Process. , vol. 57, no. 12, pp. 4686–4698, 2009

work page 2009

[34] [34]

Non-convex hybrid total variation for image denoising,

S. Oh, H. Woo, S. Yun, and M. Kang, “Non-convex hybrid total variation for image denoising,” Journal of Visual Communication and Image Representation, vol. 24, no. 3, pp. 332–344, 2013

work page 2013

[35] [35]

Regularization paths for generalized linear models via coordinate descent,

J. Friedman, T. Hastie, and R. Tibshirani, “Regularization paths for generalized linear models via coordinate descent,” Journal of statistical software, vol. 33, no. 1, p. 1, 2010

work page 2010

[36] [36]

Distributed optimization and statistical learning via the alternating direction method of multipliers,

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trend. Mach. Learn. , vol. 3, no. 1, pp. 1–122, 2011

work page 2011

[37] [37]

Distributed subgradient methods for multi- agent optimization,

A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi- agent optimization,” IEEE Trans. on Auto. Cont. , vol. 54, no. 1, pp. 48–61, 2009

work page 2009

[38] [38]

Parallel and distributed methods for constrained nonconvex optimizationpart i: Theory,

G. Scutari, F. Facchinei, and L. Lampariello, “Parallel and distributed methods for constrained nonconvex optimizationpart i: Theory,” IEEE Trans. on Signal Process. , vol. 65, no. 8, pp. 1929–1944, April 2017

work page 1929

[39] [39]

Par- allel and distributed methods for constrained nonconvex optimization- part ii: Applications in communications and machine learning,

G. Scutari, F. Facchinei, L. Lampariello, S. Sardellitti, and P. Song, “Par- allel and distributed methods for constrained nonconvex optimization- part ii: Applications in communications and machine learning,” IEEE Trans. on Signal Process. , vol. 65, no. 8, pp. 1945–1960, April 2017

work page 1945

[40] [40]

Optimization with ﬁrst-order surrogate functions,

J. Mairal, “Optimization with ﬁrst-order surrogate functions,” in ICML, 2013, pp. 783–791

work page 2013

[41] [41]

A uniﬁed convergence analysis of block successive minimization methods for nonsmooth optimization,

M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A uniﬁed convergence analysis of block successive minimization methods for nonsmooth optimization,” SIAM Journal on Optimization , vol. 23, no. 2, pp. 1126– 1153, 2013

work page 2013

[42] [42]

Stochastic majorization-minimization algorithms for large- scale optimization,

J. Mairal, “Stochastic majorization-minimization algorithms for large- scale optimization,” in Advances in NIPS , 2013, pp. 2283–2291

work page 2013

[43] [43]

Parallel succes- sive convex approximation for nonsmooth nonconvex optimization,

M. Razaviyayn, M. Hong, Z.-Q. Luo, and J.-S. Pang, “Parallel succes- sive convex approximation for nonsmooth nonconvex optimization,” in Advances in NIPS , 2014, pp. 1440–1448

work page 2014

[44] [44]

Clustering with bregman divergences,

A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with bregman divergences,” Journal of machine learning research, vol. 6, no. Oct, pp. 1705–1749, 2005

work page 2005

[45] [45]

Global convergence of admm in nonconvex nonsmooth optimization,

Y . Wang, W. Yin, and J. Zeng, “Global convergence of admm in nonconvex nonsmooth optimization,” Journal of Scientiﬁc Computing , pp. 1–35, 2015

work page 2015

[46] [46]

Next: In-network nonconvex optimiza- tion,

P. Di Lorenzo and G. Scutari, “Next: In-network nonconvex optimiza- tion,” IEEE Trans. on Sig. and Info. Proc. over Netw. , vol. 2, no. 2, pp. 120–136, 2016

work page 2016

[47] [47]

Asynchronous distributed admm for consensus optimization,

R. Zhang and J. Kwok, “Asynchronous distributed admm for consensus optimization,” in Proc. of the 31st ICML , 2014, pp. 1701–1709

work page 2014

[48] [48]

Variance Reduction for Distributed Stochastic Gradient Descent

S. De, G. Taylor, and T. Goldstein, “Variance reduction for distributed stochastic gradient descent,” arXiv preprint arXiv:1512.01708 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[49] [49]

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

Z. Huo and H. Huang, “Asynchronous stochastic gradient descent with variance reduction for non-convex optimization,” arXiv preprint arXiv:1604.03584, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[50] [50]

Asynchronous distributed admm for large-scale optimizationpart i: algorithm and convergence analysis,

T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed admm for large-scale optimizationpart i: algorithm and convergence analysis,” IEEE Trans. on Signal Process. , vol. 64, no. 12, pp. 3118–3130, 2016

work page 2016

[51] [51]

Asynchronous parallel stochastic gradient for nonconvex optimization,

X. Lian, Y . Huang, Y . Li, and J. Liu, “Asynchronous parallel stochastic gradient for nonconvex optimization,” in Advances in NIPS , 2015, pp. 2719–2727

work page 2015

[52] [52]

Stochastic ﬁrst-and zeroth-order methods for nonconvex stochastic programming,

S. Ghadimi and G. Lan, “Stochastic ﬁrst-and zeroth-order methods for nonconvex stochastic programming,” SIAM Journal on Optimization , vol. 23, no. 4, pp. 2341–2368, 2013

work page 2013

[53] [53]

Accelerated gradient methods for nonconvex nonlinear and stochastic programming,

——, “Accelerated gradient methods for nonconvex nonlinear and stochastic programming,” Mathematical Programming, pp. 1–41, 2015

work page 2015