pith. sign in

arxiv: 2506.01052 · v3 · pith:XCEBT2SSnew · submitted 2025-06-01 · 💻 cs.LG · math.OC· stat.ML

A Robust widetilde{mathcal{O}}(1/sqrt{T}) Rate for Unprojected TD Learning with Linear Function Approximation

classification 💻 cs.LG math.OCstat.ML
keywords learningconvergencefunctionrateadditionalapproximationboundedcondition
0
0 comments X
read the original abstract

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the potential function's minimal curvature. While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each iterate is projected onto a bounded set. Removing such a condition was left as an open problem by Bhandari et al. (COLT'18), hypothesizing the need for additional ``regularity conditions''. In this paper, we show that the simple unprojected TD(0) converges with a rate of $\widetilde{\mathcal{O}}\left(\frac{\|\theta^*\|^2_2}{\sqrt{T}}\right)$ in expectation, even in the presence of Markovian noise. We do not require an additional regularity condition, but only a minor polylog correction to the learning rate. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates

    cs.LG 2025-09 unverdicted novelty 6.0

    A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoreti...