Asynchronous Stochastic Approximation with Applications to Average-Reward Reinforcement Learning

Huizhen Yu; Richard S. Sutton; Yi Wan

arxiv: 2409.03915 · v3 · pith:DOCLTGMLnew · submitted 2024-09-05 · 💻 cs.LG · math.OC

classification 💻 cs.LG math.OC

keywords: asynchronous, average-reward, convergence, learning, reinforcement, algorithms, approximation, properties

This paper investigates the stability and convergence properties of asynchronous stochastic approximation (SA) algorithms, with a focus on extensions relevant to average-reward reinforcement learning. We first extend a stability proof method of Borkar and Meyn to accommodate more general noise conditions than previously considered, thereby yielding broader convergence guarantees for asynchronous SA. To sharpen the convergence analysis, we further examine the shadowing properties of asynchronous SA, building on a dynamical systems approach of Hirsch and Benaïm. These results provide a theoretical foundation for a class of relative value iteration-based reinforcement learning algorithms (developed and analyzed in a companion paper) for solving average-reward Markov and semi-Markov decision processes.
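
For readers unfamiliar with the term, the following is a minimal sketch of what "asynchronous" means in this setting: at each iteration only one component of the iterate is updated, using a stepsize indexed by that component's own update count. It is an illustrative toy, not the algorithm or the noise conditions analyzed in the paper. The linear map, the Gaussian noise, the 1/n stepsize, and the names `A`, `B`, `async_sa`, and `residual` are all assumptions made for this example.

```python
import random

# Hypothetical toy problem: find the fixed point of x = A x + B.
# Each row of A has absolute sum 0.5, so the map is a max-norm contraction.
A = [[0.20, 0.30, 0.00],
     [0.10, 0.00, 0.40],
     [0.25, 0.25, 0.00]]
B = [1.0, -1.0, 0.5]


def residual(x):
    """Largest absolute component of h(x) = A x + B - x."""
    return max(
        abs(sum(A[i][j] * x[j] for j in range(3)) + B[i] - x[i])
        for i in range(3)
    )


def async_sa(num_updates=90000, seed=0):
    """Asynchronous SA: one randomly chosen component is updated per step."""
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]  # number of updates each component has received
    for _ in range(num_updates):
        i = rng.randrange(3)         # component selected at this iteration
        counts[i] += 1
        step = 1.0 / counts[i]       # stepsize uses the local update count
        noise = rng.gauss(0.0, 0.1)  # zero-mean noise (assumed Gaussian)
        h_i = sum(A[i][j] * x[j] for j in range(3)) + B[i] - x[i]
        x[i] += step * (h_i + noise)
    return x


if __name__ == "__main__":
    x = async_sa()
    print("iterate:", x)
    print("residual:", residual(x))
```

Calling `residual(async_sa())` gives a rough check of how close the final iterate is to the fixed point; the starting point `[0, 0, 0]` has residual 1.0 in this toy problem.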

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
cs.LG 2025-04 unverdicted novelty 7.0

Establishes Õ(1/k) mean-square last-iterate convergence for asynchronous average-reward Q-learning with adaptive stepsizes and proves adaptivity is necessary.