Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

arxiv: 2605.19629 · v1 · pith:5FCX7CCBnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG· math.OC

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

Ilya Levin , Maksim Shuklin , Eric Moulines , Paul Mangold , Sergey Samsonov This is my paper

Pith reviewed 2026-05-20 02:05 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC

keywords federated learningstochastic approximationBerry-Esseen boundsGaussian approximationmultiplier bootstraplast-iterate inferenceheterogeneous data

0 comments p. Extension

pith:5FCX7CCB Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{5FCX7CCB}

Prints a linked pith:5FCX7CCB badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

Federated linear stochastic approximation admits explicit Berry-Esseen bounds that track local updates, communication rounds and client heterogeneity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes Berry-Esseen-type bounds for the last iterate of federated linear stochastic approximation. These bounds make the approach to a Gaussian limit explicit in the local step size, the number of local updates per communication round, and the level of heterogeneity across clients. The results hold for both constant step-size schedules and decreasing step sizes paired with increasing local iterations, recovering earlier centralized rates when heterogeneity vanishes. A direct application is an online multiplier bootstrap that supplies non-asymptotic validity guarantees for inference without estimating the asymptotic covariance matrix. Practitioners in distributed learning settings need such guarantees because communication is limited and data distributions differ across nodes.

Core claim

We establish Berry-Esseen-type bounds for federated linear stochastic approximation that capture communication-computation trade-offs and heterogeneity-aware error terms, quantifying the effects of local step size, number of local updates, and heterogeneity on convergence rates. The bounds are derived for both constant step-size regimes and decreasing step-size regimes with an increasing number of local iterations. As a primary application we develop an online multiplier bootstrap procedure for inference on the last iterate that avoids explicit estimation of the asymptotic covariance matrix and obtain non-asymptotic validity guarantees for this procedure.

What carries the argument

Federated Berry-Esseen bounds that bound the Kolmogorov distance of the scaled last-iterate estimator to a Gaussian limit through additive terms for local step size, local iteration count per round, and heterogeneity.

If this is right

Increasing the number of local updates per round reduces the contribution of heterogeneity to the overall error.
The multiplier bootstrap yields valid confidence intervals for the last iterate under the same communication and heterogeneity conditions used for the Gaussian approximation.
Centralized rates are recovered as a special case when the number of clients is one or heterogeneity is zero.
The same bounding technique applies to both constant and diminishing step-size policies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of federated systems could use the explicit trade-off terms to choose how many local updates to perform before each communication round.
The linear structure exploited here suggests the same bounding approach may extend to certain non-linear stochastic approximation problems in distributed settings.
Large-scale numerical checks on real heterogeneous datasets would reveal how tight the derived bounds are in practice.

Load-bearing premise

The noise terms satisfy uniform moment bounds across all heterogeneous local distributions and the step-size conditions permit a single global recursion whose error terms remain controllable.

What would settle it

A simulation in which the empirical distribution of the normalized last-iterate estimator in a federated linear regression task is compared against the Gaussian limit and the observed deviation exceeds the paper's explicit Berry-Esseen bound for moderate numbers of communication rounds.

Figures

Figures reproduced from arXiv: 2605.19629 by Eric Moulines, Ilya Levin, Maksim Shuklin, Paul Mangold, Sergey Samsonov.

read the original abstract

In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs and heterogeneity-aware error terms, quantifying the effects of local step size, number of local updates, and heterogeneity on convergence rates. We present results for both (i) constant step size regime and (ii) decreasing step size with an increasing number of local iterations, recovering the recent rates of Bonnerjee et al. [2025] as a special case. As a primary application of our results, we develop an online multiplier bootstrap procedure for inference on the last iterate, which avoids explicit estimation of the asymptotic covariance matrix, and obtain non-asymptotic validity guarantees for this procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives non-asymptotic Berry-Esseen bounds for federated LSA that track comms costs and heterogeneity plus a multiplier bootstrap with validity guarantees.

read the letter

The main takeaway is that this work establishes Berry-Esseen-type bounds for federated linear stochastic approximation that explicitly include communication-computation trade-offs and heterogeneity-aware terms, and it introduces an online multiplier bootstrap for last-iterate inference with non-asymptotic validity guarantees. They cover both constant step-size and decreasing step-size regimes with increasing local iterations, recovering the Bonnerjee et al. rates as a special case when the setup collapses to centralized. The bootstrap avoids explicit covariance estimation, which fits federated constraints well. The derivations embed local processes into a global recursion where heterogeneity enters as controllable additive errors under uniform bounded-moment and step-size assumptions. This produces explicit dependence on local step sizes, number of local updates, and communication rounds. A minor soft spot is that the uniform moment conditions across clients could prove fragile under strong real-world heterogeneity, potentially inflating the error terms, though the paper treats this as a standard modeling assumption rather than a hidden gap. No internal inconsistencies or circularity show up in the stated results. This paper is for researchers working on federated optimization and non-asymptotic inference in distributed stochastic approximation. A reader focused on uncertainty quantification beyond point estimates in FL would get direct value from the trade-off quantifications and the bootstrap procedure. It deserves peer review because the claims are specific, recover prior rates cleanly, and add a usable inference tool with formal guarantees.

Referee Report

2 major / 3 minor

Summary. The manuscript establishes Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). It provides the first federated Gaussian approximations that explicitly capture communication-computation trade-offs and heterogeneity-aware error terms, quantifying effects of local step size, number of local updates, and heterogeneity. Results are given for both constant step-size and decreasing step-size regimes (recovering Bonnerjee et al. [2025] rates as a special case). As an application, an online multiplier bootstrap for inference on the last iterate is developed with non-asymptotic validity guarantees.

Significance. If the derivations hold, the work is significant for extending non-asymptotic Gaussian approximation and bootstrap theory to the federated LSA setting while explicitly incorporating practical factors such as communication rounds and client heterogeneity. The recovery of prior centralized rates as a special case and the covariance-free bootstrap procedure are clear strengths. Non-asymptotic bounds and explicit trade-off quantification add practical value for distributed inference.

major comments (2)

[Main results on constant step-size regime] The central global recursion that absorbs local heterogeneity must be shown to keep all additive error terms controllable under the stated uniform bounded-moment assumptions; without an explicit bound on the heterogeneity-induced term in the Berry-Esseen error (e.g., in the constant-step-size case), the claimed communication-computation trade-off quantification is not yet load-bearing.
[Bootstrap validity theorem] The non-asymptotic validity of the online multiplier bootstrap is derived from the Gaussian approximation; the proof must verify that the bootstrap multiplier distribution matches the last-iterate fluctuation up to the same o(1) rate as the Gaussian approximation itself, particularly when the number of local updates grows.

minor comments (3)

[Introduction] The introduction should include a short table or paragraph contrasting the new federated rates with the centralized rates of Bonnerjee et al. [2025] to make the improvement explicit.
[Preliminaries] Notation for the heterogeneity measure (e.g., variance of local means) should be introduced once and used consistently in all error-term statements.
[Numerical experiments] Figure captions for any convergence plots should state the precise values of local step size, communication rounds, and heterogeneity level used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and positive recommendation for minor revision. We address the major comments below, providing clarifications and indicating planned revisions to strengthen the presentation of our results.

read point-by-point responses

Referee: [Main results on constant step-size regime] The central global recursion that absorbs local heterogeneity must be shown to keep all additive error terms controllable under the stated uniform bounded-moment assumptions; without an explicit bound on the heterogeneity-induced term in the Berry-Esseen error (e.g., in the constant-step-size case), the claimed communication-computation trade-off quantification is not yet load-bearing.

Authors: We agree with the referee that an explicit bound on the heterogeneity-induced term would make the argument more transparent. In the manuscript, the global recursion is given in (4.3), and under Assumption 2.1-2.3, the heterogeneity term is bounded by C * delta * eta * K in the error analysis for the constant step-size case (see the derivation leading to (4.12)). This term is then absorbed into the overall Berry-Esseen bound as an additive O(delta * sqrt(eta)) term, which is controllable and vanishes under the appropriate scaling of communication rounds and local steps. We will revise the manuscript to include a new display equation isolating this term and a short paragraph explaining its controllability, confirming the load-bearing nature of the trade-off quantification. revision: yes
Referee: [Bootstrap validity theorem] The non-asymptotic validity of the online multiplier bootstrap is derived from the Gaussian approximation; the proof must verify that the bootstrap multiplier distribution matches the last-iterate fluctuation up to the same o(1) rate as the Gaussian approximation itself, particularly when the number of local updates grows.

Authors: We appreciate this comment. The non-asymptotic validity in Theorem 5.1 is proven by coupling the bootstrap process to the Gaussian approximation via the multiplier central limit theorem, with the error rate matching that of the GA up to o(1). When the number of local updates increases, the additional variance from local steps is accounted for in the last-iterate analysis, and the proof shows the bootstrap distribution converges at the same rate provided the total iterations satisfy the conditions for the GA to hold. To make this explicit, we will add a remark in the proof of Theorem 5.1 detailing the rate matching for growing local updates. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained from standard assumptions

full rationale

The paper derives Berry-Esseen bounds and bootstrap validity for federated LSA from a global recursion under uniform standard linear SA assumptions (bounded moments, step-size conditions) across heterogeneous clients. It recovers Bonnerjee et al. [2025] rates only as a special case without reducing the central claims to fitted parameters or self-citation chains. No load-bearing step equates outputs to inputs by construction; the communication-computation trade-offs and heterogeneity terms are explicitly quantified from the recursion rather than smuggled in via ansatz or renaming. This is the expected non-circular outcome for a non-asymptotic analysis paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit list of free parameters, axioms, or invented entities; standard stochastic approximation assumptions (moment bounds, step-size conditions) are implicitly invoked but not detailed.

pith-pipeline@v0.9.0 · 5673 in / 1318 out tokens · 43700 ms · 2026-05-20T02:05:35.536826+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA)... online multiplier bootstrap procedure for inference on the last iterate
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

error decomposition... moment bounds... linear statistic Mt... convex distance dC

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

120 extracted references · 120 canonical work pages · 4 internal anchors

[1]

International Conference on Artificial Intelligence and Statistics , pages=

Stochastic methods in variational inequalities: Ergodicity, bias and refinements , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

work page 2024
[2]

Journal of the Operational Research Society , volume=

On the generation of markov decision processes , author=. Journal of the Operational Research Society , volume=. 1995 , publisher=

work page 1995
[3]

, author=

Off-policy learning with eligibility traces: a survey. , author=. J. Mach. Learn. Res. , volume=

work page
[4]

arXiv preprint arXiv:1810.08693 , year=

The total variation distance between high-dimensional Gaussians with the same mean , author=. arXiv preprint arXiv:1810.08693 , year=

work page arXiv
[5]

Fedoryuk, M. V. , TITLE =. 1977 , PAGES =

work page 1977
[6]

Olver, Frank W. J. , TITLE =. 1997 , PAGES =

work page 1997
[7]

Estimates for the closeness of Gaussian measures. Dokl. Akad. Nauk SSSR , year =

work page
[8]

Journal of Statistical Planning and Inference , volume =

On the dependence of the. Journal of Statistical Planning and Inference , volume =. 2003 , issn =. doi:https://doi.org/10.1016/S0378-3758(02)00094-0 , url =

work page doi:10.1016/s0378-3758(02)00094-0 2003
[9]

The Annals of Statistics , number =

Vladimir Spokoiny and Mayya Zhilova , title =. The Annals of Statistics , number =. 2015 , doi =

work page 2015
[10]

Conference on Learning Theory , pages=

Statistical estimation and online inference via local sgd , author=. Conference on Learning Theory , pages=. 2022 , organization=

work page 2022
[11]

Lee and Xin T

Xi Chen and Jason D. Lee and Xin T. Tong and Yichen Zhang , doi =. The Annals of Statistics , number =. 2020 , bdsk-url-1 =

work page 2020
[12]

Journal of the American Statistical Association , volume=

Statistical inference for online decision making via stochastic gradient descent , author=. Journal of the American Statistical Association , volume=. 2021 , publisher=

work page 2021
[13]

arXiv preprint arXiv:2212.14883 , year=

Online statistical inference for contextual bandits via stochastic gradient descent , author=. arXiv preprint arXiv:2212.14883 , year=

work page arXiv
[14]

2000 , publisher=

Probability for statisticians , author=. 2000 , publisher=

work page 2000
[15]

Statistical

Samsonov, Sergey and Sheshukova, Marina and Moulines, Eric and Naumov, Alexey , journal=. Statistical

work page
[16]

Central Limit Theorems for Asynchronous Averaged Q-Learning

Central Limit Theorems for Asynchronous Averaged Q-Learning , author=. arXiv preprint arXiv:2509.18964 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Proceedings of Machine learning and systems , volume=

Federated optimization in heterogeneous networks , author=. Proceedings of Machine learning and systems , volume=

work page
[18]

International Conference on Machine Learning , pages=

Proxskip: Yes! local gradient steps provably lead to communication acceleration! finally! , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022
[19]

International conference on artificial intelligence and statistics , pages=

Tighter theory for local SGD on identical and heterogeneous data , author=. International conference on artificial intelligence and statistics , pages=. 2020 , organization=

work page 2020
[20]

International conference on machine learning , pages=

Scaffold: Stochastic controlled averaging for federated learning , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020
[21]

Improved

Butyrin, Bogdan and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Shao, Qi-Man and Zhang, Zhuo-Song , journal=. Improved

work page
[22]

Uncertainty quantification for

Wu, Weichen and Wei, Yuting and Rinaldo, Alessandro , journal=. Uncertainty quantification for

work page
[23]

arXiv preprint arXiv:2410.16106 , year=

Statistical inference for temporal difference learning with linear function approximation , author=. arXiv preprint arXiv:2410.16106 , year=

work page arXiv
[24]

Advances in Neural Information Processing Systems , volume=

Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[25]

Foundations and trends

An introduction to matrix concentration inequalities , author=. Foundations and trends. 2015 , publisher=

work page 2015
[26]

Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent

Gaussian approximation and multiplier bootstrap for stochastic gradient descent , author=. arXiv preprint arXiv:2502.06719 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[27]

Efficient estimations from a slowly convergent

Ruppert, David , year=. Efficient estimations from a slowly convergent

work page
[28]

Advances in Neural Information Processing Systems , volume=

Tight high probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page
[29]

A. S. Poznyak , publisher =. Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques , year =

work page
[30]

arXiv preprint arXiv:2401.13884 , year=

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation , author=. arXiv preprint arXiv:2401.13884 , year=

work page arXiv
[31]

1987 , publisher=

Introduction to numerical analysis , author=. 1987 , publisher=

work page 1987
[32]

Bernoulli , volume=

Berry--Esseen bounds for multivariate nonlinear statistics with applications to M-estimators and stochastic gradient descent algorithms , author=. Bernoulli , volume=. 2022 , publisher=

work page 2022
[33]

Computing the

Allmeier, Sebastian and Gast, Nicolas , booktitle =. Computing the

work page
[34]

and Moulines, E

Douc, R. and Moulines, E. and Priouret, P. and Soulier, P. , TITLE =. 2018 , PAGES =

work page 2018
[35]

2012 , series =

Sharp Martingale and Semimartingale Inequalities , author =. 2012 , series =

work page 2012
[36]

Merlev. A. Probability Theory and Related Fields , volume=. 2011 , publisher=

work page 2011
[37]

Asymptotic Theory of Weakly Dependent Random Processes , volume =

Rio, Emmanuel , year =. Asymptotic Theory of Weakly Dependent Random Processes , volume =

work page
[38]

The Annals of Probability , number =

Iosif Pinelis , title =. The Annals of Probability , number =. 1994 , doi =

work page 1994
[39]

, TITLE =

Rosenthal, Haskell P. , TITLE =. Israel J. Math. , FJOURNAL =. 1970 , PAGES =. doi:10.1007/BF02771562 , URL =

work page doi:10.1007/bf02771562 1970
[40]

On linear stochastic approximation: Fine-grained

Mou, Wenlong and Li, Chris Junchi and Wainwright, Martin J and Bartlett, Peter L and Jordan, Michael I , booktitle=. On linear stochastic approximation: Fine-grained. 2020 , organization=

work page 2020
[41]

and Moulines, E

Bach, F. and Moulines, E. , booktitle =. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , url =

work page
[42]

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize , volume =

Huo, Dongyan Lucy and Zhang, Yixuan and Chen, Yudong and Xie, Qiaomin , booktitle =. The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize , volume =

work page
[43]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

work page 2015
[44]

Neural networks , volume=

On the momentum term in gradient descent learning algorithms , author=. Neural networks , volume=. 1999 , publisher=

work page 1999
[45]

Nesterov, Yu. E. , TITLE =. Dokl. Akad. Nauk SSSR , FJOURNAL =. 1983 , NUMBER =

work page 1983
[46]

Defazio, Aaron and Bach, Francis and Lacoste-Julien, Simon , journal=

work page
[47]

Fast and faster convergence of

Vaswani, Sharan and Bach, Francis and Schmidt, Mark , booktitle=. Fast and faster convergence of. 2019 , organization=

work page 2019
[48]

International Conference on Machine Learning , pages=

SARAH: A novel method for machine learning problems using stochastic recursive gradient , author=. International Conference on Machine Learning , pages=. 2017 , organization=

work page 2017
[49]

SIAM Journal on Optimization , volume=

Stochastic first-and zeroth-order methods for nonconvex stochastic programming , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

work page 2013
[50]

The Journal of Machine Learning Research , volume=

Harder, better, faster, stronger convergence rates for least-squares regression , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017
[51]

2016 , Address =

Deep Learning , Author =. 2016 , Address =

work page 2016
[52]

and Barto, Andrew G

Sutton, Richard S. and Barto, Andrew G. , biburl =. Reinforcement Learning: An Introduction , url =

work page
[53]

International conference on machine learning , pages=

Trust region policy optimization , author=. International conference on machine learning , pages=. 2015 , organization=

work page 2015
[54]

2000 , publisher=

Asymptotic statistics , author=. 2000 , publisher=

work page 2000
[55]

SIAM journal on control and optimization , volume=

Acceleration of stochastic approximation by averaging , author=. SIAM journal on control and optimization , volume=. 1992 , publisher=

work page 1992
[56]

Generative Adversarial Nets , year =

Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua , booktitle =. Generative Adversarial Nets , year =

work page
[57]

A stochastic approximation method , year =

Robbins, Herbert and Monro, Sutton , date-added =. A stochastic approximation method , year =. The annals of mathematical statistics , pages =

work page
[58]

and Talay, D

Bally, V. and Talay, D. , date =. The law of the Euler scheme for stochastic differential equations , url =. Probability Theory and Related Fields , number =. 1996 , bdsk-url-1 =. doi:10.1007/BF01303802 , id =

work page doi:10.1007/bf01303802 1996
[59]

Optimal non-asymptotic analysis of the

S. Optimal non-asymptotic analysis of the. Stochastic Processes and their Applications , keywords =. 2023 , bdsk-url-1 =. doi:https://doi.org/10.1016/j.spa.2022.11.012 , issn =

work page doi:10.1016/j.spa.2022.11.012 2023
[60]

Advances in neural information processing systems , volume=

Non-asymptotic analysis of stochastic approximation algorithms for machine learning , author=. Advances in neural information processing systems , volume=

work page
[61]

Asymptotic Almost Sure Efficiency of Averaged Stochastic Algorithms , url =

Pelletier, Mariane , doi =. Asymptotic Almost Sure Efficiency of Averaged Stochastic Algorithms , url =. 2000 , bdsk-url-1 =. https://doi.org/10.1137/S0363012998308169 , journal =

work page doi:10.1137/s0363012998308169 2000
[62]

Statistics , volume=

Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient , author=. Statistics , volume=. 2023 , publisher=

work page 2023
[63]

The Annals of Statistics , keywords =

Aymeric Dieuleveut and Alain Durmus and Francis Bach , doi =. The Annals of Statistics , keywords =. 2020 , bdsk-url-1 =

work page 2020
[64]

Rosenthal-type inequalities for linear statistics of

Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Sheshukova, Marina , journal=. Rosenthal-type inequalities for linear statistics of

work page
[65]

Advances in Neural Information Processing Systems , volume=

Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity , author=. Advances in Neural Information Processing Systems , volume=

work page
[66]

International Conference on Artificial Intelligence and Statistics , pages=

Stochastic gradient descent-ascent: Unified theory and new efficient methods , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023
[67]

International Conference on Machine Learning , pages=

Finite-time last-iterate convergence for multi-agent learning in games , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020
[68]

SIAM Journal on optimization , volume=

Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=

work page 2009
[69]

Advances in Neural Information Processing Systems , volume=

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems , author=. Advances in Neural Information Processing Systems , volume=

work page
[70]

arXiv preprint arXiv:2210.05995 , year=

Sgda with shuffling: faster convergence for nonconvex-p \ L \ minimax optimization , author=. arXiv preprint arXiv:2210.05995 , year=

work page arXiv
[71]

International Conference on Artificial Intelligence and Statistics , pages=

Local stochastic gradient descent ascent: Convergence analysis and communication efficiency , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021
[72]

International Conference on Artificial Intelligence and Statistics , pages=

Randomized stochastic gradient descent ascent , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=

work page 2022
[73]

IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

Gradient Descent Ascent for Minimax Problems on Riemannian Manifolds , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

work page
[74]

arXiv preprint arXiv:2306.11497 , year=

Convergence and concentration properties of constant step-size SGD through Markov chains , author=. arXiv preprint arXiv:2306.11497 , year=

work page arXiv
[75]

Yurii Nesterov , title =

work page
[76]

ESAIM: Probability and Statistics , volume=

Central limit theorems for stochastic approximation with controlled Markov chain dynamics , author=. ESAIM: Probability and Statistics , volume=. 2015 , publisher=

work page 2015
[77]

Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive

Yixuan Zhang and Dongyan Huo and Yudong Chen and Qiaomin Xie , year=. Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive. arXiv preprint arXiv:2303.05838 , url=

work page arXiv
[78]

Proceedings of Thirty Fifth Conference on Learning Theory , pages =

ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

work page 2022
[79]

Expansion of the global error for numerical schemes solving stochastic differential equations , url =

Denis Talay and Luciano Tubaro , doi =. Expansion of the global error for numerical schemes solving stochastic differential equations , url =. 1990 , bdsk-url-1 =. https://doi.org/10.1080/07362999008809220 , journal =

work page doi:10.1080/07362999008809220 1990
[80]

Monte Carlo Methods and Applications , doi =

, author =. Monte Carlo Methods and Applications , doi =

work page

Showing first 80 references.

[1] [1]

International Conference on Artificial Intelligence and Statistics , pages=

Stochastic methods in variational inequalities: Ergodicity, bias and refinements , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

work page 2024

[2] [2]

Journal of the Operational Research Society , volume=

On the generation of markov decision processes , author=. Journal of the Operational Research Society , volume=. 1995 , publisher=

work page 1995

[3] [3]

, author=

Off-policy learning with eligibility traces: a survey. , author=. J. Mach. Learn. Res. , volume=

work page

[4] [4]

arXiv preprint arXiv:1810.08693 , year=

The total variation distance between high-dimensional Gaussians with the same mean , author=. arXiv preprint arXiv:1810.08693 , year=

work page arXiv

[5] [5]

Fedoryuk, M. V. , TITLE =. 1977 , PAGES =

work page 1977

[6] [6]

Olver, Frank W. J. , TITLE =. 1997 , PAGES =

work page 1997

[7] [7]

Estimates for the closeness of Gaussian measures. Dokl. Akad. Nauk SSSR , year =

work page

[8] [8]

Journal of Statistical Planning and Inference , volume =

On the dependence of the. Journal of Statistical Planning and Inference , volume =. 2003 , issn =. doi:https://doi.org/10.1016/S0378-3758(02)00094-0 , url =

work page doi:10.1016/s0378-3758(02)00094-0 2003

[9] [9]

The Annals of Statistics , number =

Vladimir Spokoiny and Mayya Zhilova , title =. The Annals of Statistics , number =. 2015 , doi =

work page 2015

[10] [10]

Conference on Learning Theory , pages=

Statistical estimation and online inference via local sgd , author=. Conference on Learning Theory , pages=. 2022 , organization=

work page 2022

[11] [11]

Lee and Xin T

Xi Chen and Jason D. Lee and Xin T. Tong and Yichen Zhang , doi =. The Annals of Statistics , number =. 2020 , bdsk-url-1 =

work page 2020

[12] [12]

Journal of the American Statistical Association , volume=

Statistical inference for online decision making via stochastic gradient descent , author=. Journal of the American Statistical Association , volume=. 2021 , publisher=

work page 2021

[13] [13]

arXiv preprint arXiv:2212.14883 , year=

Online statistical inference for contextual bandits via stochastic gradient descent , author=. arXiv preprint arXiv:2212.14883 , year=

work page arXiv

[14] [14]

2000 , publisher=

Probability for statisticians , author=. 2000 , publisher=

work page 2000

[15] [15]

Statistical

Samsonov, Sergey and Sheshukova, Marina and Moulines, Eric and Naumov, Alexey , journal=. Statistical

work page

[16] [16]

Central Limit Theorems for Asynchronous Averaged Q-Learning

Central Limit Theorems for Asynchronous Averaged Q-Learning , author=. arXiv preprint arXiv:2509.18964 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Proceedings of Machine learning and systems , volume=

Federated optimization in heterogeneous networks , author=. Proceedings of Machine learning and systems , volume=

work page

[18] [18]

International Conference on Machine Learning , pages=

Proxskip: Yes! local gradient steps provably lead to communication acceleration! finally! , author=. International Conference on Machine Learning , pages=. 2022 , organization=

work page 2022

[19] [19]

International conference on artificial intelligence and statistics , pages=

Tighter theory for local SGD on identical and heterogeneous data , author=. International conference on artificial intelligence and statistics , pages=. 2020 , organization=

work page 2020

[20] [20]

International conference on machine learning , pages=

Scaffold: Stochastic controlled averaging for federated learning , author=. International conference on machine learning , pages=. 2020 , organization=

work page 2020

[21] [21]

Improved

Butyrin, Bogdan and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Shao, Qi-Man and Zhang, Zhuo-Song , journal=. Improved

work page

[22] [22]

Uncertainty quantification for

Wu, Weichen and Wei, Yuting and Rinaldo, Alessandro , journal=. Uncertainty quantification for

work page

[23] [23]

arXiv preprint arXiv:2410.16106 , year=

Statistical inference for temporal difference learning with linear function approximation , author=. arXiv preprint arXiv:2410.16106 , year=

work page arXiv

[24] [24]

Advances in Neural Information Processing Systems , volume=

Gaussian approximation and multiplier bootstrap for polyak-ruppert averaged linear stochastic approximation with applications to td learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[25] [25]

Foundations and trends

An introduction to matrix concentration inequalities , author=. Foundations and trends. 2015 , publisher=

work page 2015

[26] [26]

Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent

Gaussian approximation and multiplier bootstrap for stochastic gradient descent , author=. arXiv preprint arXiv:2502.06719 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[27] [27]

Efficient estimations from a slowly convergent

Ruppert, David , year=. Efficient estimations from a slowly convergent

work page

[28] [28]

Advances in Neural Information Processing Systems , volume=

Tight high probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page

[29] [29]

A. S. Poznyak , publisher =. Advanced Mathematical Tools for Automatic Control Engineers: Deterministic Techniques , year =

work page

[30] [30]

arXiv preprint arXiv:2401.13884 , year=

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation , author=. arXiv preprint arXiv:2401.13884 , year=

work page arXiv

[31] [31]

1987 , publisher=

Introduction to numerical analysis , author=. 1987 , publisher=

work page 1987

[32] [32]

Bernoulli , volume=

Berry--Esseen bounds for multivariate nonlinear statistics with applications to M-estimators and stochastic gradient descent algorithms , author=. Bernoulli , volume=. 2022 , publisher=

work page 2022

[33] [33]

Computing the

Allmeier, Sebastian and Gast, Nicolas , booktitle =. Computing the

work page

[34] [34]

and Moulines, E

Douc, R. and Moulines, E. and Priouret, P. and Soulier, P. , TITLE =. 2018 , PAGES =

work page 2018

[35] [35]

2012 , series =

Sharp Martingale and Semimartingale Inequalities , author =. 2012 , series =

work page 2012

[36] [36]

Merlev. A. Probability Theory and Related Fields , volume=. 2011 , publisher=

work page 2011

[37] [37]

Asymptotic Theory of Weakly Dependent Random Processes , volume =

Rio, Emmanuel , year =. Asymptotic Theory of Weakly Dependent Random Processes , volume =

work page

[38] [38]

The Annals of Probability , number =

Iosif Pinelis , title =. The Annals of Probability , number =. 1994 , doi =

work page 1994

[39] [39]

, TITLE =

Rosenthal, Haskell P. , TITLE =. Israel J. Math. , FJOURNAL =. 1970 , PAGES =. doi:10.1007/BF02771562 , URL =

work page doi:10.1007/bf02771562 1970

[40] [40]

On linear stochastic approximation: Fine-grained

Mou, Wenlong and Li, Chris Junchi and Wainwright, Martin J and Bartlett, Peter L and Jordan, Michael I , booktitle=. On linear stochastic approximation: Fine-grained. 2020 , organization=

work page 2020

[41] [41]

and Moulines, E

Bach, F. and Moulines, E. , booktitle =. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , url =

work page

[42] [42]

The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize , volume =

Huo, Dongyan Lucy and Zhang, Yixuan and Chen, Yudong and Xie, Qiaomin , booktitle =. The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize , volume =

work page

[43] [43]

nature , volume=

Human-level control through deep reinforcement learning , author=. nature , volume=. 2015 , publisher=

work page 2015

[44] [44]

Neural networks , volume=

On the momentum term in gradient descent learning algorithms , author=. Neural networks , volume=. 1999 , publisher=

work page 1999

[45] [45]

Nesterov, Yu. E. , TITLE =. Dokl. Akad. Nauk SSSR , FJOURNAL =. 1983 , NUMBER =

work page 1983

[46] [46]

Defazio, Aaron and Bach, Francis and Lacoste-Julien, Simon , journal=

work page

[47] [47]

Fast and faster convergence of

Vaswani, Sharan and Bach, Francis and Schmidt, Mark , booktitle=. Fast and faster convergence of. 2019 , organization=

work page 2019

[48] [48]

International Conference on Machine Learning , pages=

SARAH: A novel method for machine learning problems using stochastic recursive gradient , author=. International Conference on Machine Learning , pages=. 2017 , organization=

work page 2017

[49] [49]

SIAM Journal on Optimization , volume=

Stochastic first-and zeroth-order methods for nonconvex stochastic programming , author=. SIAM Journal on Optimization , volume=. 2013 , publisher=

work page 2013

[50] [50]

The Journal of Machine Learning Research , volume=

Harder, better, faster, stronger convergence rates for least-squares regression , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017

[51] [51]

2016 , Address =

Deep Learning , Author =. 2016 , Address =

work page 2016

[52] [52]

and Barto, Andrew G

Sutton, Richard S. and Barto, Andrew G. , biburl =. Reinforcement Learning: An Introduction , url =

work page

[53] [53]

International conference on machine learning , pages=

Trust region policy optimization , author=. International conference on machine learning , pages=. 2015 , organization=

work page 2015

[54] [54]

2000 , publisher=

Asymptotic statistics , author=. 2000 , publisher=

work page 2000

[55] [55]

SIAM journal on control and optimization , volume=

Acceleration of stochastic approximation by averaging , author=. SIAM journal on control and optimization , volume=. 1992 , publisher=

work page 1992

[56] [56]

Generative Adversarial Nets , year =

Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua , booktitle =. Generative Adversarial Nets , year =

work page

[57] [57]

A stochastic approximation method , year =

Robbins, Herbert and Monro, Sutton , date-added =. A stochastic approximation method , year =. The annals of mathematical statistics , pages =

work page

[58] [58]

and Talay, D

Bally, V. and Talay, D. , date =. The law of the Euler scheme for stochastic differential equations , url =. Probability Theory and Related Fields , number =. 1996 , bdsk-url-1 =. doi:10.1007/BF01303802 , id =

work page doi:10.1007/bf01303802 1996

[59] [59]

Optimal non-asymptotic analysis of the

S. Optimal non-asymptotic analysis of the. Stochastic Processes and their Applications , keywords =. 2023 , bdsk-url-1 =. doi:https://doi.org/10.1016/j.spa.2022.11.012 , issn =

work page doi:10.1016/j.spa.2022.11.012 2023

[60] [60]

Advances in neural information processing systems , volume=

Non-asymptotic analysis of stochastic approximation algorithms for machine learning , author=. Advances in neural information processing systems , volume=

work page

[61] [61]

Asymptotic Almost Sure Efficiency of Averaged Stochastic Algorithms , url =

Pelletier, Mariane , doi =. Asymptotic Almost Sure Efficiency of Averaged Stochastic Algorithms , url =. 2000 , bdsk-url-1 =. https://doi.org/10.1137/S0363012998308169 , journal =

work page doi:10.1137/s0363012998308169 2000

[62] [62]

Statistics , volume=

Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient , author=. Statistics , volume=. 2023 , publisher=

work page 2023

[63] [63]

The Annals of Statistics , keywords =

Aymeric Dieuleveut and Alain Durmus and Francis Bach , doi =. The Annals of Statistics , keywords =. 2020 , bdsk-url-1 =

work page 2020

[64] [64]

Rosenthal-type inequalities for linear statistics of

Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Sheshukova, Marina , journal=. Rosenthal-type inequalities for linear statistics of

work page

[65] [65]

Advances in Neural Information Processing Systems , volume=

Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity , author=. Advances in Neural Information Processing Systems , volume=

work page

[66] [66]

International Conference on Artificial Intelligence and Statistics , pages=

Stochastic gradient descent-ascent: Unified theory and new efficient methods , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023

[67] [67]

International Conference on Machine Learning , pages=

Finite-time last-iterate convergence for multi-agent learning in games , author=. International Conference on Machine Learning , pages=. 2020 , organization=

work page 2020

[68] [68]

SIAM Journal on optimization , volume=

Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=

work page 2009

[69] [69]

Advances in Neural Information Processing Systems , volume=

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems , author=. Advances in Neural Information Processing Systems , volume=

work page

[70] [70]

arXiv preprint arXiv:2210.05995 , year=

Sgda with shuffling: faster convergence for nonconvex-p \ L \ minimax optimization , author=. arXiv preprint arXiv:2210.05995 , year=

work page arXiv

[71] [71]

International Conference on Artificial Intelligence and Statistics , pages=

Local stochastic gradient descent ascent: Convergence analysis and communication efficiency , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021

[72] [72]

International Conference on Artificial Intelligence and Statistics , pages=

Randomized stochastic gradient descent ascent , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2022 , organization=

work page 2022

[73] [73]

IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

Gradient Descent Ascent for Minimax Problems on Riemannian Manifolds , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

work page

[74] [74]

arXiv preprint arXiv:2306.11497 , year=

Convergence and concentration properties of constant step-size SGD through Markov chains , author=. arXiv preprint arXiv:2306.11497 , year=

work page arXiv

[75] [75]

Yurii Nesterov , title =

work page

[76] [76]

ESAIM: Probability and Statistics , volume=

Central limit theorems for stochastic approximation with controlled Markov chain dynamics , author=. ESAIM: Probability and Statistics , volume=. 2015 , publisher=

work page 2015

[77] [77]

Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive

Yixuan Zhang and Dongyan Huo and Yudong Chen and Qiaomin Xie , year=. Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive. arXiv preprint arXiv:2303.05838 , url=

work page arXiv

[78] [78]

Proceedings of Thirty Fifth Conference on Learning Theory , pages =

ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

work page 2022

[79] [79]

Expansion of the global error for numerical schemes solving stochastic differential equations , url =

Denis Talay and Luciano Tubaro , doi =. Expansion of the global error for numerical schemes solving stochastic differential equations , url =. 1990 , bdsk-url-1 =. https://doi.org/10.1080/07362999008809220 , journal =

work page doi:10.1080/07362999008809220 1990

[80] [80]

Monte Carlo Methods and Applications , doi =

, author =. Monte Carlo Methods and Applications , doi =

work page