High-Probability Convergence Theory for Distributed Composite Optimization with Sub-Weibull Noises

Deming Yuan; Zhan Yu; Zhongjie Shi

arxiv: 2506.12901 · v3 · submitted 2025-06-15 · 🧮 math.OC

High-Probability Convergence Theory for Distributed Composite Optimization with Sub-Weibull Noises

Zhan Yu , Zhongjie Shi , Deming Yuan This is my paper

Pith reviewed 2026-05-19 09:30 UTC · model grok-4.3

classification 🧮 math.OC

keywords distributed optimizationstochastic mirror descentsub-Weibull noisehigh-probability convergencecomposite optimizationtime-varying networksconvex optimization

0 comments

The pith

Distributed stochastic mirror descent achieves high-probability convergence for convex composite optimization under sub-Weibull gradient noises.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a distributed composite stochastic mirror descent scheme that incorporates sub-Weibull randomness for stochastic gradients. It establishes that this approach works for heavier-tailed noise environments common in real data scenarios and derives high-probability convergence rates without requiring smoothness on the functions. This provides a unified analytical framework applicable to various distributed optimization algorithms and noise types over time-varying multi-agent networks.

Core claim

The DCSMD-SW algorithm solves convex distributed composite optimization problems by using mirror descent updates with sub-Weibull stochastic gradients, and the analysis shows satisfactory high-probability convergence rates hold without any smoothness requirement on the objective.

What carries the argument

Distributed composite stochastic mirror descent with sub-Weibull gradient noise (DCSMD-SW), which enables high-probability bounds via moment-generating function properties of sub-Weibull distributions.

Load-bearing premise

The stochastic gradients are assumed to have sub-Weibull tails with finite parameters allowing moment-generating function bounds in the analysis.

What would settle it

A counterexample where gradients exhibit tails heavier than any sub-Weibull distribution but the algorithm fails to converge at the predicted high-probability rates.

Figures

Figures reproduced from arXiv: 2506.12901 by Deming Yuan, Zhan Yu, Zhongjie Shi.

**Figure 1.** Figure 1: Maximum, minimum, and median of the optimization errors versus the total number of iterations T of the DCSMD-SW algorithm [PITH_FULL_IMAGE:figures/full_fig_p028_1.png] view at source ↗

**Figure 2.** Figure 2: Median of the optimization errors versus the total number of iterations T of the DCSMD-SW algorithm for three different stochastic noises: uniform noise, Gaussian noise, and Laplace noise. x ∈ X . The parameters for the constraints are uj = −1 and vj = 1 for j = 1, 2, .., n. In each trial, for i = 1, 2, ..., m, each element of the input vector ai is generated from a uniform distribution over the interval [… view at source ↗

**Figure 3.** Figure 3: Median of the optimization errors versus the total number of iterations T of the DCSMD-SW algorithm for three different choices of the problem dimension n [PITH_FULL_IMAGE:figures/full_fig_p029_3.png] view at source ↗

**Figure 4.** Figure 4: Median of the optimization errors versus the total number of iterations T of the DCSMD-SW algorithm for three different number of agents m in a ring network. shown in [PITH_FULL_IMAGE:figures/full_fig_p029_4.png] view at source ↗

**Figure 5.** Figure 5: Median of the optimization errors versus the total number of iterations T of the DCSMD-SW algorithm for two different stepsizes: constant stepsize and varying stepsize [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗

**Figure 6.** Figure 6: Maximum, minimum, and median of the optimization errors versus the total number of iterations T of the DSED-SW algorithm. number of iterations T. The outcomes, depicted in [PITH_FULL_IMAGE:figures/full_fig_p030_6.png] view at source ↗

**Figure 7.** Figure 7: Median of the optimization errors versus the total number of iterations T of the DSED-SW algorithm for three different choices of the problem dimension n [PITH_FULL_IMAGE:figures/full_fig_p031_7.png] view at source ↗

**Figure 8.** Figure 8: Median of the optimization errors versus the total number of iterations T of the DSED-SW algorithm for three different number of agents m in a ring network. Acknowledgements The work by Zhan Yu is partial supported by the Research Grants Council of Hong Kong [Project No. HKBU 12301424] and the National Natural Science Foundation of China [Project No. 12401123]. References [1] A. Nedić, A. Ozdaglar, “Distri… view at source ↗

read the original abstract

With the rapid development of distributed optimization (DO) theory, the distributed stochastic gradient methods (DSGMs) occupy an important position. Although the theory of different DSGMs has been widely established, the main-stream results of existing work are still derived under the condition of light-tailed stochastic gradient noises. Increasing examples from various fields, indicate that, the light-tailed noise model is overly idealized in many practical instances, failing to capture the complexity and variability of noises in real-world scenarios, such as the presence of outliers or extreme values from data science and statistical learning. To address this issue, we propose a new DO framework that incorporates stochastic gradients under sub-Weibull randomness. We study a distributed composite stochastic mirror descent scheme with sub-Weibull gradient noise (DCSMD-SW) for solving a convex distributed composite optimization (DCO) problem over the time-varying multi-agent network. By investigating sub-Weibull randomness in DCSMD for the first time, we show that the algorithm is applicable in some common heavier-tailed noise environments while also guaranteeing good convergence properties. We comprehensively study the convergence performance of DCSMD-SW. Satisfactory high-probability convergence rates are derived for DCSMD-SW without any smoothness requirement. The work also offers a unified analytical framework for several critical cases of both algorithms and noise environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first high-probability rates for distributed composite mirror descent under sub-Weibull noise on time-varying networks, without smoothness.

read the letter

The main thing to know is that this work derives explicit high-probability convergence rates for a distributed composite stochastic mirror descent scheme when the stochastic gradients follow sub-Weibull tails, and it does so without any smoothness assumption on the objective. The rates are stated to hold over time-varying multi-agent networks and to degrade gracefully with the sub-Weibull parameters. That is the concrete advance. The authors position the result as the first treatment of sub-Weibull noise inside this particular algorithm class and supply a unified analysis that covers several algorithm variants and noise regimes at once. The technical route looks standard: they use the moment-generating-function bounds that come with the sub-Weibull definition to control the deviation terms, then feed those into the usual mirror-descent recursion plus the network mixing argument. The absence of a smoothness requirement is useful for composite problems that include nonsmooth regularizers. The handling of time-varying graphs is also carried through without extra restrictions that would limit applicability. On the downside, the abstract and stress-test summary do not show the precise dependence of the failure probability or the iteration complexity on the number of agents and the network connectivity parameter. If the union-bound or martingale steps introduce hidden logarithmic factors that grow with the horizon, the rates could be less attractive than claimed. The sub-Weibull assumption itself supplies the needed tail control, so there is no obvious circularity, but the full proof would need to confirm that the constants remain reasonable when the sub-Weibull index is only modestly larger than 1. This paper is written for researchers already working on stochastic distributed optimization who want to relax the light-tailed noise assumption. A reader who cares about high-probability guarantees for mirror-descent methods on networks will find the unified framework and the explicit rates worth examining. The work is coherent on its own terms and formally grounded enough to merit referee time, even if the constants or the tightness of the bounds turn out to need tightening. I would send it to peer review.

Referee Report

1 major / 2 minor

Summary. The paper proposes the DCSMD-SW algorithm for convex distributed composite optimization over time-varying multi-agent networks, where stochastic gradients obey sub-Weibull randomness. It derives high-probability convergence rates without smoothness assumptions and presents a unified analytical framework applicable to several algorithm and noise variants.

Significance. If the derivations hold, the result meaningfully extends distributed optimization theory to heavier-tailed noise regimes that arise in data science and statistical learning. The absence of smoothness requirements and the explicit high-probability bounds under sub-Weibull moment conditions constitute a concrete advance over light-tailed analyses; the unified framework is a further strength.

major comments (1)

[§4.2, Theorem 4.1] §4.2, Theorem 4.1: the high-probability bound is stated to hold uniformly over the network mixing time, yet the proof sketch invokes a union bound whose failure probability scaling with the number of agents and iterations is not made explicit; this affects the claimed rate when the network diameter grows.

minor comments (2)

[§2.3] Notation for the sub-Weibull parameters (e.g., the index and scale) is introduced in the abstract but first defined only in §2.3; an early consolidated table would improve readability.
[Figure 1] Figure 1 caption refers to 'empirical convergence' but the plotted quantity is the expected optimality gap; clarify whether the curves are averaged over multiple runs or single trajectories.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment and positive evaluation of our work. We address the major comment below and will make the requested clarifications in the revised manuscript.

read point-by-point responses

Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1: the high-probability bound is stated to hold uniformly over the network mixing time, yet the proof sketch invokes a union bound whose failure probability scaling with the number of agents and iterations is not made explicit; this affects the claimed rate when the network diameter grows.

Authors: We thank the referee for this observation. The full proof of Theorem 4.1 (in the appendix) applies a union bound over the n agents and T iterations to control the sub-Weibull concentration terms uniformly. To achieve an overall success probability of at least 1-δ, each individual term is controlled with failure probability δ/(nT), which introduces an explicit logarithmic factor log(nT/δ) into the high-probability bound. Because the network mixing time is bounded by a function of the diameter (which can grow with n), this factor must be stated explicitly to confirm uniformity. In the revised manuscript we will (i) restate Theorem 4.1 with the precise dependence on n, T, and δ, (ii) add a short remark in §4.2 clarifying the union-bound scaling, and (iii) verify that the leading convergence rate remains unchanged while the logarithmic terms are now fully explicit. These are minor textual and notational adjustments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from sub-Weibull assumptions

full rationale

The manuscript derives high-probability convergence rates for the DCSMD-SW algorithm directly from the stated sub-Weibull gradient noise conditions via standard mirror-descent recursions, network mixing lemmas, and concentration inequalities that follow from the moment-generating function bounds supplied by the sub-Weibull parameters. No step reduces a claimed rate or bound to a fitted parameter, a self-citation chain, or an ansatz smuggled from prior work by the same authors; the unified framework simply specializes the same recursion to different noise indices and network topologies without redefining the target quantities in terms of themselves. The absence of smoothness assumptions is handled explicitly by the analysis rather than assumed away, confirming the central claims remain independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard convexity and network connectivity assumptions plus the sub-Weibull noise model; no new entities introduced and no free parameters explicitly fitted in the abstract.

axioms (2)

domain assumption The objective function is convex and the problem is a composite optimization over a time-varying multi-agent network.
Stated in the abstract as the setting for DCSMD-SW.
domain assumption Stochastic gradients satisfy sub-Weibull randomness allowing moment-generating function control.
Core modeling choice enabling the high-probability analysis.

pith-pipeline@v0.9.0 · 5767 in / 1293 out tokens · 22146 ms · 2026-05-19T09:30:28.833282+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We study a distributed composite stochastic mirror descent scheme with sub-Weibull gradient noise (DCSMD-SW) ... high probability convergence rates ... without any smoothness requirement.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 4. ... ∥ξi,t∥∗ ∼ SubW(θ,κ) ... E[exp{(∥ξi,t∥∗/κ)1/θ}|Ft−1] ≤ 2.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Distributed subgradient methods for multi-agent optimization

A. Nedić, A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization.”IEEE Transactions on Automatic Control, 54(1), 48-61, 2009

work page 2009
[2]

A survey of distributed optimization

T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, K. H. Johansson. “A survey of distributed optimization.”Annual Reviews in Control. 47, 278-305, 2019

work page 2019
[3]

Power control in wireless cellular networks

M. Chiang, P. Hande, T. Lan, C. Tan. “Power control in wireless cellular networks.”Founda- tions and Trends in Networking, 2(4), pp. 381-533, 2008

work page 2008
[4]

Optimization for data-driven learning and control

U. A. Khan, W. U. Bajwa, A. Nedić, M. G.. Rabbat, A. H. Sayed. “Optimization for data-driven learning and control.”Proceedings of the IEEE, 108(11), 2020

work page 2020
[5]

Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and its application to economic dispatch of power systems,

P. Yi, Y. Hong and F. Liu, “Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and its application to economic dispatch of power systems,”Automatica, vol. 74, pp. 259-269, 2016

work page 2016
[6]

Distributed constrained optimization by consensus- based primal-dual perturbation method

T. H. Chang, A. Nedić, A. Scaglione. “Distributed constrained optimization by consensus- based primal-dual perturbation method.”IEEE Transactions on Automatic Control, 59(6), 1524-1538, 2014. 31

work page 2014
[7]

Dual averaging for distributed optimization: convergence analysis and network scaling

J. C. Duchi, A. Agarwal, M. J. Wainwright. “Dual averaging for distributed optimization: convergence analysis and network scaling.”IEEE Transactions on Automatic Control, 57(3), 592-606, 2012

work page 2012
[8]

Distributed optimization over time-varying directed graphs

A. Nedić, A. Olshevsky. “Distributed optimization over time-varying directed graphs.”IEEE Transactions on Automatic Control, 60(3), 601-615, 2015

work page 2015
[9]

Optimal distributed stochastic mirror descent for strongly convex optimization

D. Yuan, Y. Hong, D. W. Ho, G. Jiang. “Optimal distributed stochastic mirror descent for strongly convex optimization.”Automatica, 90, 196-203, 2018

work page 2018
[10]

Distributed bandit online con- vex optimization with time-varying coupled inequality constraints

X. Yi, X. Li, T. Yang, L. Xie, T. Chai, K. H. Johansson. “Distributed bandit online con- vex optimization with time-varying coupled inequality constraints.”IEEE Transactions on Automatic Control, 66(10), 4620-4635, 2020

work page 2020
[11]

Sundhar Ram, A

S. Sundhar Ram, A. Nedić, V. V. Veeravalli. Distributed stochastic subgradient projection algorithms for convex optimization.Journal of Optimization Theory and Applications, 147, 516-545, 2010

work page 2010
[12]

Distributed mirror descent for online composite optimization

D. Yuan, Y. Hong, D. W. C. Ho, S. Xu. “Distributed mirror descent for online composite optimization.”IEEE Transactions on Automatic Control, 66(2), 714-729, 2021

work page 2021
[13]

Distributed randomized gradient-free mirror descent algorithm for constrained optimization

Z. Yu, D. W. Ho, D. Yuan. “Distributed randomized gradient-free mirror descent algorithm for constrained optimization.”IEEE Transactions on Automatic Control, 67(2), 957 - 964, Feb 2022

work page 2022
[14]

Stochastic mirror descent method for distributed multi-agent optimization

J. Li, G. Li, Z. Wu, C. Wu. “Stochastic mirror descent method for distributed multi-agent optimization.”Optimization Letters, 12, pp. 1179-1197, 2018

work page 2018
[15]

Distributed mirror descent method for multi-agent opti- mization with delay

J. Li, G. Chen, Z. Dong, Z. Wu. “Distributed mirror descent method for multi-agent opti- mization with delay.”Neurocomputing, 177, 643-650, 2016

work page 2016
[16]

Distributed constrained optimization with delayed subgradient information over time-varying network under adaptive quantization

J. Liu, Z. Yu, D. W. Ho. “Distributed constrained optimization with delayed subgradient information over time-varying network under adaptive quantization.”IEEE Transactions on Neural Networks and Learning Systems, 35(1), 143-156, 2022

work page 2022
[17]

Event-triggered distributed stochastic mirror descent for convex optimization

M. Xiong, B. Zhang, D. W. C. Ho, D. Yuan, S. Xu. “Event-triggered distributed stochastic mirror descent for convex optimization.”IEEE Transactions on Neural Networks and Learning Systems, 34(9), 6480-6491, 2022

work page 2022
[18]

Gossip-based distributed stochastic mirror descent for con- strained optimization

X. Fang, B. Zhang, D. Yuan. “Gossip-based distributed stochastic mirror descent for con- strained optimization.”Neural Networks, 175, 106291, 2024

work page 2024
[19]

Distributed mirror descent algorithm with Bregman damping for nonsmooth constrained optimization

G. Chen, G. Xu, W. Li, Y. Hong. “Distributed mirror descent algorithm with Bregman damping for nonsmooth constrained optimization.”IEEE Transactions on Automatic Control, 68(11), 6921-6928, 2023

work page 2023
[20]

High probability convergence of clipped distributed dual averaging with heavy-tailed noises

Y. Qin, K. Lu, H. Xu, X. Chen. “High probability convergence of clipped distributed dual averaging with heavy-tailed noises.”IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025

work page 2025
[21]

Distributed stochastic strongly convex optimization under heavy-tailed noises

C. Sun, B Chen. “Distributed stochastic strongly convex optimization under heavy-tailed noises.”In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 150-155). IEEE. 32

work page 2024
[22]

Online distributed stochastic gradient algorithm for nonconvex optimization with compressed communication

J. Li, C. Li, J. Fan, T. Huang. “Online distributed stochastic gradient algorithm for nonconvex optimization with compressed communication.”IEEE Transactions on Automatic Control, 69(2), 936-951, 2023

work page 2023
[23]

A tail-index analysis of stochastic gradient noise in deep neural networks

U. Simsekli, L. Sagun, M. Gurbuzbalaban. “A tail-index analysis of stochastic gradient noise in deep neural networks.”International Conference on Machine Learning, pp. 5827-5837, 2019

work page 2019
[24]

Why are adaptive methods good for attention models?

J. Zhang, S. P. Karimireddy, A. Veit, S. Kim, S. Reddi, S. Kumar, S. Sra. “Why are adaptive methods good for attention models?”Advances in Neural Information Processing Systems, 33, pp. 15383-15393, 2020

work page 2020
[25]

Approximate projected consensus for convex intersection computation: Convergence analysis and critical error angle

Y. Lou, G. Shi, K. H. Johansson, Y. Hong. “Approximate projected consensus for convex intersection computation: Convergence analysis and critical error angle.”IEEE Transactions on Automatic Control, 59(7), 1722-1736, 2014

work page 2014
[26]

Convergence in high probability of distributed stochastic gradient descent algorithms

K. Lu, H. Wang, H. Zhang, L. Wang. “Convergence in high probability of distributed stochastic gradient descent algorithms.”IEEE Transactions on Automatic Control, 69(4), 2189-2204, 2023

work page 2023
[27]

Distributed gradient descent for functional learning

Z. Yu, J. Fan, Z. Shi, D. X. Zhou. “Distributed gradient descent for functional learning.” IEEE Transactions on Information Theory, 70(9), 6547 - 6571, 2024

work page 2024
[28]

Distributed minimum error entropy algorithms

X. Guo, T. Hu, Q. Wu. “Distributed minimum error entropy algorithms.”Journal of Machine Learning Research, 21(126), 1-31, 2020

work page 2020
[29]

Gradient descent for robust kernel-based regression

Z. C. Guo, T. Hu, L. Shi. “Gradient descent for robust kernel-based regression.”Inverse Problems, 34(6), 065009, 2018

work page 2018
[30]

Distributed stochastic Nash equilibrium seeking under heavy-tailed noises

C. Sun, B. Chen, J. Wang, Z Wang, L. Yu. “Distributed stochastic Nash equilibrium seeking under heavy-tailed noises.”Automatica, 173, 112081, 2025

work page 2025
[31]

A stochastic operator framework for optimization and learning with sub-weibull errors

N. Bastianello, L. Madden, R. Carli E. Dall’Anese. “A stochastic operator framework for optimization and learning with sub-weibull errors.”IEEE Transactions on Automatic Control, 2024

work page 2024
[32]

High probability convergence bounds for non-convex stochastic gradient descent with sub-weibull noise

L. Madden, E. Dall’Anese, S. Becker. “High probability convergence bounds for non-convex stochastic gradient descent with sub-weibull noise.”Journal of Machine Learning Research, 25(241), pp. 1-36, 2024

work page 2024
[33]

General tail bounds for non-smooth stochastic mirror descent

K. Eldowa, A. Paudice. “General tail bounds for non-smooth stochastic mirror descent.” International Conference on Artificial Intelligence and Statistics, pp. 3205-3213. PMLR, 2024

work page 2024
[34]

High probability guarantees for nonconvex stochastic gradient descent with heavy tails

S. Li, Y. Liu. “High probability guarantees for nonconvex stochastic gradient descent with heavy tails.”International Conference on Machine Learning, pp. 12931-12963. PMLR, 2022

work page 2022
[35]

Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions

M. Vladimirova, S. Girard, H. Nguyen, J. Arbel. “Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions.”Stat, 9(1), 2020

work page 2020
[36]

Concentration inequalities for polynomials inα-sub- exponential random variables

F. Götze, H. Sambale, A. Sinulis. “Concentration inequalities for polynomials inα-sub- exponential random variables.”Electronic Journal of Probability, 26. 2021

work page 2021
[37]

Convergence of Adaptive Stochastic Mirror Descent

T. Hu, X. Liu, K. Ji, Y. Lei. “Convergence of Adaptive Stochastic Mirror Descent.”IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2025.3545420, early access, 2025. 33

work page doi:10.1109/tnnls.2025.3545420 2025
[38]

Convergence of online mirror descent

Y. Lei, D. X. Zhou. “Convergence of online mirror descent.”Applied and Computational Harmonic Analysis, 48(1), 343-373, 2020

work page 2020
[39]

Nash equilibrium seeking forN-coalition noncooperative games

M. Ye, G. Hu, F. L. Lewis. “Nash equilibrium seeking forN-coalition noncooperative games.” Automatica, 95, 266-272, 2018

work page 2018
[40]

Distributed Nash equilibrium computation under round-robin scheduling protocol

Z. Feng, W. Xu, J. Cao. “Distributed Nash equilibrium computation under round-robin scheduling protocol.”IEEE Transactions on Automatic Control, 69(1), 339-346, 2023. 34

work page 2023

[1] [1]

Distributed subgradient methods for multi-agent optimization

A. Nedić, A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization.”IEEE Transactions on Automatic Control, 54(1), 48-61, 2009

work page 2009

[2] [2]

A survey of distributed optimization

T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, K. H. Johansson. “A survey of distributed optimization.”Annual Reviews in Control. 47, 278-305, 2019

work page 2019

[3] [3]

Power control in wireless cellular networks

M. Chiang, P. Hande, T. Lan, C. Tan. “Power control in wireless cellular networks.”Founda- tions and Trends in Networking, 2(4), pp. 381-533, 2008

work page 2008

[4] [4]

Optimization for data-driven learning and control

U. A. Khan, W. U. Bajwa, A. Nedić, M. G.. Rabbat, A. H. Sayed. “Optimization for data-driven learning and control.”Proceedings of the IEEE, 108(11), 2020

work page 2020

[5] [5]

Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and its application to economic dispatch of power systems,

P. Yi, Y. Hong and F. Liu, “Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and its application to economic dispatch of power systems,”Automatica, vol. 74, pp. 259-269, 2016

work page 2016

[6] [6]

Distributed constrained optimization by consensus- based primal-dual perturbation method

T. H. Chang, A. Nedić, A. Scaglione. “Distributed constrained optimization by consensus- based primal-dual perturbation method.”IEEE Transactions on Automatic Control, 59(6), 1524-1538, 2014. 31

work page 2014

[7] [7]

Dual averaging for distributed optimization: convergence analysis and network scaling

J. C. Duchi, A. Agarwal, M. J. Wainwright. “Dual averaging for distributed optimization: convergence analysis and network scaling.”IEEE Transactions on Automatic Control, 57(3), 592-606, 2012

work page 2012

[8] [8]

Distributed optimization over time-varying directed graphs

A. Nedić, A. Olshevsky. “Distributed optimization over time-varying directed graphs.”IEEE Transactions on Automatic Control, 60(3), 601-615, 2015

work page 2015

[9] [9]

Optimal distributed stochastic mirror descent for strongly convex optimization

D. Yuan, Y. Hong, D. W. Ho, G. Jiang. “Optimal distributed stochastic mirror descent for strongly convex optimization.”Automatica, 90, 196-203, 2018

work page 2018

[10] [10]

Distributed bandit online con- vex optimization with time-varying coupled inequality constraints

X. Yi, X. Li, T. Yang, L. Xie, T. Chai, K. H. Johansson. “Distributed bandit online con- vex optimization with time-varying coupled inequality constraints.”IEEE Transactions on Automatic Control, 66(10), 4620-4635, 2020

work page 2020

[11] [11]

Sundhar Ram, A

S. Sundhar Ram, A. Nedić, V. V. Veeravalli. Distributed stochastic subgradient projection algorithms for convex optimization.Journal of Optimization Theory and Applications, 147, 516-545, 2010

work page 2010

[12] [12]

Distributed mirror descent for online composite optimization

D. Yuan, Y. Hong, D. W. C. Ho, S. Xu. “Distributed mirror descent for online composite optimization.”IEEE Transactions on Automatic Control, 66(2), 714-729, 2021

work page 2021

[13] [13]

Distributed randomized gradient-free mirror descent algorithm for constrained optimization

Z. Yu, D. W. Ho, D. Yuan. “Distributed randomized gradient-free mirror descent algorithm for constrained optimization.”IEEE Transactions on Automatic Control, 67(2), 957 - 964, Feb 2022

work page 2022

[14] [14]

Stochastic mirror descent method for distributed multi-agent optimization

J. Li, G. Li, Z. Wu, C. Wu. “Stochastic mirror descent method for distributed multi-agent optimization.”Optimization Letters, 12, pp. 1179-1197, 2018

work page 2018

[15] [15]

Distributed mirror descent method for multi-agent opti- mization with delay

J. Li, G. Chen, Z. Dong, Z. Wu. “Distributed mirror descent method for multi-agent opti- mization with delay.”Neurocomputing, 177, 643-650, 2016

work page 2016

[16] [16]

Distributed constrained optimization with delayed subgradient information over time-varying network under adaptive quantization

J. Liu, Z. Yu, D. W. Ho. “Distributed constrained optimization with delayed subgradient information over time-varying network under adaptive quantization.”IEEE Transactions on Neural Networks and Learning Systems, 35(1), 143-156, 2022

work page 2022

[17] [17]

Event-triggered distributed stochastic mirror descent for convex optimization

M. Xiong, B. Zhang, D. W. C. Ho, D. Yuan, S. Xu. “Event-triggered distributed stochastic mirror descent for convex optimization.”IEEE Transactions on Neural Networks and Learning Systems, 34(9), 6480-6491, 2022

work page 2022

[18] [18]

Gossip-based distributed stochastic mirror descent for con- strained optimization

X. Fang, B. Zhang, D. Yuan. “Gossip-based distributed stochastic mirror descent for con- strained optimization.”Neural Networks, 175, 106291, 2024

work page 2024

[19] [19]

Distributed mirror descent algorithm with Bregman damping for nonsmooth constrained optimization

G. Chen, G. Xu, W. Li, Y. Hong. “Distributed mirror descent algorithm with Bregman damping for nonsmooth constrained optimization.”IEEE Transactions on Automatic Control, 68(11), 6921-6928, 2023

work page 2023

[20] [20]

High probability convergence of clipped distributed dual averaging with heavy-tailed noises

Y. Qin, K. Lu, H. Xu, X. Chen. “High probability convergence of clipped distributed dual averaging with heavy-tailed noises.”IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025

work page 2025

[21] [21]

Distributed stochastic strongly convex optimization under heavy-tailed noises

C. Sun, B Chen. “Distributed stochastic strongly convex optimization under heavy-tailed noises.”In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 150-155). IEEE. 32

work page 2024

[22] [22]

Online distributed stochastic gradient algorithm for nonconvex optimization with compressed communication

J. Li, C. Li, J. Fan, T. Huang. “Online distributed stochastic gradient algorithm for nonconvex optimization with compressed communication.”IEEE Transactions on Automatic Control, 69(2), 936-951, 2023

work page 2023

[23] [23]

A tail-index analysis of stochastic gradient noise in deep neural networks

U. Simsekli, L. Sagun, M. Gurbuzbalaban. “A tail-index analysis of stochastic gradient noise in deep neural networks.”International Conference on Machine Learning, pp. 5827-5837, 2019

work page 2019

[24] [24]

Why are adaptive methods good for attention models?

J. Zhang, S. P. Karimireddy, A. Veit, S. Kim, S. Reddi, S. Kumar, S. Sra. “Why are adaptive methods good for attention models?”Advances in Neural Information Processing Systems, 33, pp. 15383-15393, 2020

work page 2020

[25] [25]

Approximate projected consensus for convex intersection computation: Convergence analysis and critical error angle

Y. Lou, G. Shi, K. H. Johansson, Y. Hong. “Approximate projected consensus for convex intersection computation: Convergence analysis and critical error angle.”IEEE Transactions on Automatic Control, 59(7), 1722-1736, 2014

work page 2014

[26] [26]

Convergence in high probability of distributed stochastic gradient descent algorithms

K. Lu, H. Wang, H. Zhang, L. Wang. “Convergence in high probability of distributed stochastic gradient descent algorithms.”IEEE Transactions on Automatic Control, 69(4), 2189-2204, 2023

work page 2023

[27] [27]

Distributed gradient descent for functional learning

Z. Yu, J. Fan, Z. Shi, D. X. Zhou. “Distributed gradient descent for functional learning.” IEEE Transactions on Information Theory, 70(9), 6547 - 6571, 2024

work page 2024

[28] [28]

Distributed minimum error entropy algorithms

X. Guo, T. Hu, Q. Wu. “Distributed minimum error entropy algorithms.”Journal of Machine Learning Research, 21(126), 1-31, 2020

work page 2020

[29] [29]

Gradient descent for robust kernel-based regression

Z. C. Guo, T. Hu, L. Shi. “Gradient descent for robust kernel-based regression.”Inverse Problems, 34(6), 065009, 2018

work page 2018

[30] [30]

Distributed stochastic Nash equilibrium seeking under heavy-tailed noises

C. Sun, B. Chen, J. Wang, Z Wang, L. Yu. “Distributed stochastic Nash equilibrium seeking under heavy-tailed noises.”Automatica, 173, 112081, 2025

work page 2025

[31] [31]

A stochastic operator framework for optimization and learning with sub-weibull errors

N. Bastianello, L. Madden, R. Carli E. Dall’Anese. “A stochastic operator framework for optimization and learning with sub-weibull errors.”IEEE Transactions on Automatic Control, 2024

work page 2024

[32] [32]

High probability convergence bounds for non-convex stochastic gradient descent with sub-weibull noise

L. Madden, E. Dall’Anese, S. Becker. “High probability convergence bounds for non-convex stochastic gradient descent with sub-weibull noise.”Journal of Machine Learning Research, 25(241), pp. 1-36, 2024

work page 2024

[33] [33]

General tail bounds for non-smooth stochastic mirror descent

K. Eldowa, A. Paudice. “General tail bounds for non-smooth stochastic mirror descent.” International Conference on Artificial Intelligence and Statistics, pp. 3205-3213. PMLR, 2024

work page 2024

[34] [34]

High probability guarantees for nonconvex stochastic gradient descent with heavy tails

S. Li, Y. Liu. “High probability guarantees for nonconvex stochastic gradient descent with heavy tails.”International Conference on Machine Learning, pp. 12931-12963. PMLR, 2022

work page 2022

[35] [35]

Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions

M. Vladimirova, S. Girard, H. Nguyen, J. Arbel. “Sub-Weibull distributions: Generalizing sub-Gaussian and sub-Exponential properties to heavier tailed distributions.”Stat, 9(1), 2020

work page 2020

[36] [36]

Concentration inequalities for polynomials inα-sub- exponential random variables

F. Götze, H. Sambale, A. Sinulis. “Concentration inequalities for polynomials inα-sub- exponential random variables.”Electronic Journal of Probability, 26. 2021

work page 2021

[37] [37]

Convergence of Adaptive Stochastic Mirror Descent

T. Hu, X. Liu, K. Ji, Y. Lei. “Convergence of Adaptive Stochastic Mirror Descent.”IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2025.3545420, early access, 2025. 33

work page doi:10.1109/tnnls.2025.3545420 2025

[38] [38]

Convergence of online mirror descent

Y. Lei, D. X. Zhou. “Convergence of online mirror descent.”Applied and Computational Harmonic Analysis, 48(1), 343-373, 2020

work page 2020

[39] [39]

Nash equilibrium seeking forN-coalition noncooperative games

M. Ye, G. Hu, F. L. Lewis. “Nash equilibrium seeking forN-coalition noncooperative games.” Automatica, 95, 266-272, 2018

work page 2018

[40] [40]

Distributed Nash equilibrium computation under round-robin scheduling protocol

Z. Feng, W. Xu, J. Cao. “Distributed Nash equilibrium computation under round-robin scheduling protocol.”IEEE Transactions on Automatic Control, 69(1), 339-346, 2023. 34

work page 2023