Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation
8 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 8
citing papers explorer
- Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains
  The paper proves the first optimal O(n^{-1/2}) Wasserstein-1 CLT rates for locally dependent sequences and geometrically ergodic Markov chains, plus new W_p rates for p ≥ 2 under mild moment conditions, with an application to U-statistics.
- Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation
  Establishes non-asymptotic Gaussian approximation bounds for federated LSA with explicit communication-heterogeneity trade-offs and introduces an online multiplier bootstrap for last-iterate inference with validity guarantees.
- Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD
  Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.
- A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies
  Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.
- From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
  Establishes Õ(1/k) mean-square last-iterate convergence for asynchronous average-reward Q-learning with adaptive stepsizes and proves adaptivity is necessary.
- Elephant random walk with attributed steps and extractions of random sizes
  A market choice model with random-size sampling from past customers is represented as an elephant random walk variant, with proofs of almost sure convergence of S_n/n and regime-dependent distributional limits for scaled S_n.
- Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise
  Constant stepsize SA with decision-dependent Markovian noise has stationary bias O(alpha) under Poisson-Gateaux differentiability, plus finite-time moment bounds and weak convergence.
- Central Limit Theorems for Asynchronous Averaged Q-Learning
  Establishes non-asymptotic and functional central limit theorems for asynchronous averaged Q-learning with explicit rates depending on iterations, state-action space, discount factor, and exploration quality.