Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.
A stochastic approximation method.The Annals of Mathematical Statistics, pages 400–407
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.
ASPIRE learns adaptive graph filters via bi-level optimization to overcome low-frequency explosion bias in spectral collaborative filtering, achieving strong performance and stability.
FedSLoP reduces communication and memory costs in federated learning through stochastic low-rank gradient projections, with a nonconvex convergence rate of O(1/sqrt(NT)) and competitive accuracy on heterogeneous MNIST data.
citing papers explorer
-
A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies
Establishes last-iterate convergence rates for on-policy Q-learning under minimal irreducibility assumptions, with sample complexity O(1/ξ²) matching off-policy up to exploration factors.
-
\mathsf{VISTA}: Decentralized Machine Learning in Adversary Dominated Environments
VISTA adaptively tunes consistency thresholds in decentralized SGD so that the system converges asymptotically like standard SGD even when adversaries dominate the worker pool.
-
ASPIRE: Make Spectral Graph Collaborative Filtering Great Again via Adaptive Filter Learning
ASPIRE learns adaptive graph filters via bi-level optimization to overcome low-frequency explosion bias in spectral collaborative filtering, achieving strong performance and stability.
-
FedSLoP: Memory-Efficient Federated Learning with Low-Rank Gradient Projection
FedSLoP reduces communication and memory costs in federated learning through stochastic low-rank gradient projections, with a nonconvex convergence rate of O(1/sqrt(NT)) and competitive accuracy on heterogeneous MNIST data.