Central Limit Theorems for Stochastic Gradient Descent Quantile Estimators

Jiaqi Li; Likai Chen; Wei Biao Wu; Ziyang Wei

arxiv: 2503.02178 · v2 · submitted 2025-03-04 · 📊 stat.ML · cs.LG

Central Limit Theorems for Stochastic Gradient Descent Quantile Estimators

Ziyang Wei , Jiaqi Li , Likai Chen , Wei Biao Wu This is my paper

Pith reviewed 2026-05-23 02:05 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords quantile estimationstochastic gradient descentcentral limit theoremMarkov chainstationary distributionasymptotic normalityconstant learning rateconfidence intervals

0 comments

The pith

The stationary distribution of quantile SGD with constant learning rate converges to a Gaussian after centering and scaling as the learning rate approaches zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proves a central limit theorem for quantile estimators obtained from stochastic gradient descent that uses a fixed learning rate. The authors treat the iteration as a Markov chain that reaches a unique stationary distribution from any initialization. Analysis of the chain's characteristic function, together with bounds on its moment generating function and tails, establishes that the centered and scaled stationary distribution becomes normal in the small-learning-rate limit. The result supplies the first such guarantee for constant-rate quantile SGD and yields a recursive procedure for constructing confidence intervals with statistical validity.

Core claim

Viewing the quantile SGD iteration as an irreducible, periodic, and positive recurrent Markov chain that cyclically converges to its unique stationary distribution regardless of initialization, the paper derives the exact form of this distribution from the stationary equation satisfied by its characteristic function. Tight bounds on the moment generating function and tail probabilities are established, from which it follows that the centered and standardized stationary distribution converges in law to a Gaussian distribution as the learning rate η tends to zero. This supplies the first central limit theorem for quantile SGD estimators that employ constant learning rates.

What carries the argument

The irreducible, periodic, positive recurrent Markov chain representation of the quantile SGD iteration and the stationary distribution obtained from its characteristic function via the stationary equation.

If this is right

Quantile SGD estimators admit asymptotic normality under constant learning rates.
A recursive algorithm produces confidence intervals for the estimators that carry statistical guarantees.
The Markov-chain analysis extends to other SGD procedures whose losses are convex but neither smooth nor strongly convex.
Finite-sample performance of both the online estimator and the associated inference procedure is satisfactory in numerical experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stationary-distribution technique could be applied to derive CLTs for other robust M-estimators trained by constant-rate SGD.
Practitioners may use the normality result to justify online quantile tracking in streaming-data settings where only a fixed learning rate is feasible.
Quantifying the rate of convergence to the Gaussian limit would give explicit error bounds for the confidence-interval procedure.
The approach may connect to existing Markov-chain analyses of non-convex optimization algorithms that lack strong convexity.

Load-bearing premise

The quantile SGD updates form an irreducible, periodic, and positive recurrent Markov chain that converges to a unique stationary distribution from any starting point.

What would settle it

Numerical simulation of many independent runs of quantile SGD with successively smaller constant learning rates, followed by checking whether the empirical distribution of the centered and scaled final iterates deviates from normality by a fixed positive amount that does not shrink with the learning rate.

read the original abstract

This paper develops asymptotic theory for quantile estimation via stochastic gradient descent (SGD) with a constant learning rate. The quantile loss function is neither smooth nor strongly convex. Beyond conventional perspectives and techniques, we view quantile SGD iteration as an irreducible, periodic, and positive recurrent Markov chain, which cyclically converges to its unique stationary distribution regardless of the arbitrarily fixed initialization. To derive the exact form of the stationary distribution, we analyze the structure of its characteristic function by exploiting the stationary equation. We also derive tight bounds for its moment generating function (MGF) and tail probabilities. Synthesizing the aforementioned approaches, we prove that the centered and standardized stationary distribution converges to a Gaussian distribution as the learning rate $\eta\rightarrow0$. This finding provides the first central limit theorem (CLT)-type theoretical guarantees for the quantile SGD estimator with constant learning rates. We further propose a recursive algorithm to construct confidence intervals of the estimators with statistical guarantee. Numerical studies demonstrate the satisfactory finite-sample performance of the online estimator and inference procedure. The theoretical tools developed in this study are of independent interest for investigating general SGD algorithms formulated as Markov chains, particularly in non-strongly convex and non-smooth settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims the first CLT for constant-step quantile SGD by analyzing the stationary distribution of a Markov chain, but the assumptions needed for that chain look fragile.

read the letter

The core claim is that quantile SGD with fixed learning rate can be viewed as an irreducible periodic positive recurrent Markov chain on R^d. From any initialization it converges cyclically to a unique stationary distribution whose characteristic function satisfies a stationary equation. They bound the MGF and tails, then show that centering and scaling this stationary distribution yields a Gaussian limit as the step size eta goes to zero. They also sketch a recursive procedure for confidence intervals and report some numerical checks.

Referee Report

2 major / 2 minor

Summary. The paper develops asymptotic theory for quantile estimation via SGD with constant learning rate η. It models the quantile SGD iteration (using the pinball loss) as an irreducible, periodic, positive recurrent Markov chain on R^d that cyclically converges to a unique stationary distribution independent of initialization. From the stationary equation, the characteristic function of the stationary distribution is derived exactly; tight MGF bounds and tail probabilities are obtained. These are used to prove that the centered and standardized stationary distribution converges in law to a Gaussian as η→0, yielding the first CLT-type result for constant-step quantile SGD. A recursive algorithm for constructing statistically valid confidence intervals is proposed and supported by numerical experiments. The Markov-chain tools are claimed to be of independent interest for non-smooth, non-strongly-convex SGD.

Significance. If the Markov-chain modeling and subsequent derivations hold, the result supplies the first rigorous CLT guarantees for constant-learning-rate quantile SGD, a setting where standard smoothness/strong-convexity assumptions fail. The explicit characteristic-function analysis, MGF bounds, and tail controls for the stationary distribution constitute a technical contribution that could apply to other non-smooth SGD problems. The recursive CI procedure adds practical value.

major comments (2)

[§2 and §3] §2 (Markov-chain formulation) and §3 (stationary distribution): the assertion that the iteration defines an irreducible, periodic, and positive-recurrent chain on R^d for arbitrary initialization and any data distribution is load-bearing for the entire CLT. The pinball loss is convex but not smooth; the transition kernel therefore involves a set-valued subgradient at the quantile point. No explicit verification is given that the resulting kernel satisfies the Harris-recurrence or minorization conditions needed for positive recurrence in continuous state space, nor is the claimed periodicity reconciled with the fact that absolutely continuous transitions generically produce aperiodic chains.
[§4] §4 (CLT proof): the passage from the stationary characteristic function and MGF bounds to the Gaussian limit as η→0 relies on the existence of a unique stationary distribution independent of initialization. If the recurrence or uniqueness arguments in §3 contain gaps (e.g., dependence on unstated moment or density assumptions on the noise), the subsequent η→0 scaling and centering arguments do not apply uniformly.

minor comments (2)

[§2] Notation for the subgradient of the pinball loss is introduced without a displayed equation; a numbered display would clarify the transition kernel.
[§5] The recursive CI algorithm is described at a high level; pseudocode or a precise statement of the recursion and its coverage guarantee would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments correctly identify that the Markov-chain properties are foundational, and we address each point below with proposed revisions to strengthen the exposition.

read point-by-point responses

Referee: [§2 and §3] §2 (Markov-chain formulation) and §3 (stationary distribution): the assertion that the iteration defines an irreducible, periodic, and positive recurrent chain on R^d for arbitrary initialization and any data distribution is load-bearing for the entire CLT. The pinball loss is convex but not smooth; the transition kernel therefore involves a set-valued subgradient at the quantile point. No explicit verification is given that the resulting kernel satisfies the Harris-recurrence or minorization conditions needed for positive recurrence in continuous state space, nor is the claimed periodicity reconciled with the fact that absolutely continuous transitions generically produce aperiodic chains.

Authors: We agree that an explicit verification of Harris recurrence and the minorization condition is desirable for rigor in continuous state space. The manuscript derives irreducibility and positive recurrence from the fact that the subgradient of the pinball loss, combined with the data distribution, allows positive probability of moving toward or away from the quantile from any point; however, we will add a dedicated lemma in the revision that verifies the minorization condition under the mild assumption that the noise distribution admits a density that is positive in a neighborhood of the quantile. For periodicity, the set-valued subgradient at the exact quantile introduces a discrete choice (stay or move) that produces a deterministic period-2 cycle, which is compatible with the otherwise absolutely continuous transitions; we will insert a clarifying remark and a short proof of this periodicity in §3. revision: partial
Referee: [§4] §4 (CLT proof): the passage from the stationary characteristic function and MGF bounds to the Gaussian limit as η→0 relies on the existence of a unique stationary distribution independent of initialization. If the recurrence or uniqueness arguments in §3 contain gaps (e.g., dependence on unstated moment or density assumptions on the noise), the subsequent η→0 scaling and centering arguments do not apply uniformly.

Authors: The uniqueness of the stationary distribution (independent of initialization) is obtained in §3 by solving the stationary equation for the characteristic function and showing that the resulting distribution is the unique solution satisfying the MGF bounds; the CLT in §4 then follows directly from these bounds as η→0. We acknowledge that the current text states the result for arbitrary data distributions while the proofs implicitly use finite moments and a positive density near the quantile for the minorization step. In the revision we will state these assumptions explicitly at the beginning of §2 and confirm that the η→0 limit holds uniformly under them, thereby closing the potential gap. revision: partial

Circularity Check

0 steps flagged

No circularity: derivation proceeds from Markov chain stationary equation to explicit CLT limit

full rationale

The paper models the constant-step quantile SGD recursion as an irreducible periodic positive-recurrent Markov chain on R^d, invokes the stationary equation to obtain the characteristic function of the invariant measure, derives MGF and tail bounds from that equation, and proves the centered/scaled measure converges to N(0,1) as eta->0. None of these steps reduces a claimed result to a fitted parameter, renames an input, or rests on a self-citation whose content is the target theorem itself. The Markov-chain modeling is an assumption whose verification lies outside the derivation; the subsequent analysis is a direct functional-equation argument, not a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities; all arrays are therefore empty.

pith-pipeline@v0.9.0 · 5743 in / 1118 out tokens · 54482 ms · 2026-05-23T02:05:49.600337+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Atomicity.lean atomic_tick echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

the Markov chain induced by quantile SGD with τ = q/p has period q... Foster’s Theorem... positive recurrent

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.