A finite time analysis of temporal difference learning with linear function approximation

Jalaj Bhandari, Daniel Russo, Raghav Singal · 2018

2 Pith papers cite this work.

representative citing papers

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0 · ref 6

A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.

Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates

cs.LG · 2025-09-10 · unverdicted · novelty 6.0 · ref 50

A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoretic lower bound.