Constant stepsize Q-learning: Distributional convergence, bias and extrapolation

Zhang, Y · 2024 · arXiv 2401.13884

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

math.PR · 2026-01-13 · unverdicted · novelty 8.0

The paper proves the first optimal O(n^{-1/2}) Wasserstein-1 CLT rates for locally dependent sequences and geometrically ergodic Markov chains, plus new W_p rates for p ≥ 2 under mild moment conditions, with an application to U-statistics.

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

stat.ML · 2026-05-19 · unverdicted · novelty 7.0

Establishes non-asymptotic Gaussian approximation bounds for federated LSA with explicit communication-heterogeneity trade-offs and introduces an online multiplier bootstrap for last-iterate inference with validity guarantees.

Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD

math.OC · 2026-04-11 · unverdicted · novelty 7.0

Combining random reshuffling and Richardson-Romberg extrapolation yields cubic bias refinement and better MSE for constant-step SGD on structured non-monotone variational inequalities.

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

cs.LG · 2025-10-17 · unverdicted · novelty 7.0

Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

cs.LG · 2025-04-25 · unverdicted · novelty 7.0

Establishes Õ(1/k) mean-square last-iterate convergence for asynchronous average-reward Q-learning with adaptive stepsizes and proves adaptivity is necessary.

Elephant random walk with attributed steps and extractions of random sizes

math.PR · 2026-04-19 · unverdicted · novelty 6.0

A market choice model with random-size sampling from past customers is represented as an elephant random walk variant, with proofs of almost sure convergence of S_n/n and regime-dependent distributional limits for scaled S_n.

Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

math.OC · 2026-04-15 · unverdicted · novelty 6.0

Constant stepsize SA with decision-dependent Markovian noise has stationary bias O(α) under Poisson-Gateaux differentiability, plus finite-time moment bounds and weak convergence.

Central Limit Theorems for Asynchronous Averaged Q-Learning

cs.LG · 2025-09-23 · unverdicted · novelty 6.0

Establishes non-asymptotic and functional central limit theorems for asynchronous averaged Q-learning with explicit rates depending on iterations, state-action space, discount factor, and exploration quality.

citing papers explorer

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

math.PR · 2026-01-13 · unverdicted · none · ref 76

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

stat.ML · 2026-05-19 · unverdicted · none · ref 30

Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD

math.OC · 2026-04-11 · unverdicted · none · ref 141

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

cs.LG · 2025-10-17 · unverdicted · none · ref 38

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

cs.LG · 2025-04-25 · unverdicted · none · ref 87

Elephant random walk with attributed steps and extractions of random sizes

math.PR · 2026-04-19 · unverdicted · none · ref 60

Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

math.OC · 2026-04-15 · unverdicted · none · ref 32

Central Limit Theorems for Asynchronous Averaged Q-Learning

cs.LG · 2025-09-23 · unverdicted · none · ref 12
