On ergodicity of the SAGA-LD algorithm
Pith reviewed 2026-05-10 14:19 UTC · model grok-4.3
The pith
The SAGA-LD algorithm converges to a limiting distribution with a law of large numbers holding for its time averages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a model-specific method, the SAGA-LD algorithm is proven to converge to a limiting distribution. A law of large numbers is shown to hold for the ergodic averages produced by the algorithm.
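In symbols, the two statements have roughly the following shape. The notation is an assumption made here for illustration: \theta_k is the k-th iterate, \mu the limiting distribution, and \varphi a test function. The abstract fixes neither the mode of convergence nor the class of admissible \varphi, so the arrows below are deliberately left unspecified.

```latex
\mathrm{Law}(\theta_k) \longrightarrow \mu \quad (k \to \infty),
\qquad
\frac{1}{n} \sum_{k=1}^{n} \varphi(\theta_k) \longrightarrow \int \varphi \, d\mu \quad (n \to \infty).
```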
What carries the argument
A model-specific method, built because SAGA-LD's intricate dynamics resist standard Markov chain theory, carries both the convergence proof and the law of large numbers.
If this is right
- The algorithm's iterates converge in law to a limiting distribution; the abstract does not say how close that limit is to the target distribution.
- The law of large numbers makes trajectory averages consistent estimators of expectations under the limiting distribution.
- SAGA-LD can be applied in high-dimensional settings with theoretical guarantees on its long-run behavior.
Where Pith is reading between the lines
- Similar custom methods could be developed for other non-standard sampling algorithms in machine learning.
- The result highlights the need for bespoke analysis when variance reduction adds a memory of past gradients, so that the iterates alone no longer form a Markov chain.
- One could test the convergence numerically for specific distributions to check the practical range of the theorem's assumptions.
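A numerical check of the kind suggested in the last item might look like the sketch below. It is an illustration under stated assumptions, not the paper's setup: the target is a one-dimensional Gaussian with potential U(theta) = sum_i (theta - x_i)^2 / 2, the update combines the SAGA gradient estimator with an unadjusted Langevin step using batch size 1, and the function name `saga_ld_averages`, the step size, and the data are all hypothetical choices. The paper's own scaling conventions and assumptions may differ.

```python
import math
import random


def saga_ld_averages(xs, step, n_steps, seed=0):
    """Run a 1-D SAGA-LD chain; return time averages of theta and theta**2.

    Assumed target (illustrative only): density proportional to
    exp(-U(theta)) with U(theta) = sum_i (theta - xs[i])**2 / 2,
    i.e. a Gaussian with mean mean(xs) and variance 1 / len(xs).
    """
    rng = random.Random(seed)
    n = len(xs)
    theta = 0.0
    # Gradient memory: one stored gradient per data point, plus its sum.
    table = [theta - x for x in xs]
    table_sum = sum(table)
    sum_theta = 0.0
    sum_theta_sq = 0.0
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        i = rng.randrange(n)
        fresh = theta - xs[i]  # gradient of the i-th term at the current point
        # SAGA estimate of the full gradient; unbiased given the current state.
        grad_est = n * (fresh - table[i]) + table_sum
        # Refresh the memory slot with the gradient just computed.
        table_sum += fresh - table[i]
        table[i] = fresh
        # Unadjusted Langevin step driven by the variance-reduced gradient.
        theta = theta - step * grad_est + noise_scale * rng.gauss(0.0, 1.0)
        sum_theta += theta
        sum_theta_sq += theta * theta
    return sum_theta / n_steps, sum_theta_sq / n_steps


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
    avg, avg_sq = saga_ld_averages(xs, step=0.01, n_steps=100_000)
    print("time average of theta:", avg)
    print("mean of the assumed target:", sum(xs) / len(xs))
    print("time-average variance:", avg_sq - avg * avg)
```

For this quadratic potential the time average of theta should settle near the target mean, while the time-average variance need not equal 1 / len(xs) exactly, because a constant step size generally biases the limiting distribution. Repeating the run with different seeds, step sizes, or non-Gaussian potentials is one way to probe how far the theorem's assumptions extend in practice.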
Load-bearing premise
The model-specific proof requires particular conditions on the target distribution, step sizes, and memory parameters that remain unspecified in the abstract.
What would settle it
A concrete counterexample consisting of a target distribution and algorithm parameters where SAGA-LD does not converge to a unique stationary distribution would falsify the result.
Original abstract
The so-called SAGA-LD algorithm is used for efficient sampling from high-dimensional distributions in machine learning. Its intricate dynamics resists standard approaches of Markov chain theory. We prove, using a model-specific method, that SAGA-LD converges to a limiting distribution and a law of large numbers holds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves that the SAGA-LD algorithm converges to a limiting distribution and satisfies a law of large numbers, using a model-specific method to handle its intricate dynamics that resist standard Markov chain theory.
Significance. A rigorous proof of ergodicity for SAGA-LD under conditions relevant to high-dimensional non-convex sampling would be significant for machine learning, as it would justify the algorithm's use with theoretical guarantees on convergence and averaging.
major comments (1)
- The central claim rests on a model-specific proof whose assumptions on the target distribution (e.g., convexity or smoothness requirements), step-size schedule, and memory parameters are not stated explicitly enough to verify applicability to typical ML settings; without a clear theorem statement listing these conditions, the scope of the result cannot be assessed.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for highlighting the need for greater clarity on the assumptions. We address the major comment below.
-
Referee: The central claim rests on a model-specific proof whose assumptions on the target distribution (e.g., convexity or smoothness requirements), step-size schedule, and memory parameters are not stated explicitly enough to verify applicability to typical ML settings; without a clear theorem statement listing these conditions, the scope of the result cannot be assessed.
Authors: We agree that a single, self-contained theorem statement listing all assumptions would improve readability and allow readers to readily assess applicability. In the revised manuscript we will add an explicit theorem statement (placed prominently at the start of the main results section) that enumerates every condition required for the convergence and law-of-large-numbers results. This statement will include the precise requirements on the target distribution (any smoothness, convexity, or other regularity assumptions used in the proof), the step-size schedule, and the memory parameters of the SAGA-LD recursion. The model-specific character of the argument will be retained, as it is required to handle the non-standard dynamics that fall outside conventional Markov-chain frameworks, but the conditions themselves will be stated upfront and unambiguously. revision: yes
Circularity Check
No circularity: direct proof of ergodicity via model-specific method with no self-referential reductions.
full rationale
The paper claims a mathematical proof that SAGA-LD converges to a limiting distribution and satisfies a law of large numbers, using a model-specific method. No equations, parameters, or steps in the provided abstract or description reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations. The derivation is presented as an independent argument rather than a renaming or tautological prediction, making the result self-contained against external benchmarks.