Finite sample analysis of linear temporal difference learning with arbitrary features
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2
verdicts
UNVERDICTED 2
citing papers explorer
- A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak-Ruppert Averaging
  A single stepsize for unprojected linear TD(0) under Markovian sampling yields simultaneous high-probability robust curvature-free and fast curvature-dependent rates via Polyak-Ruppert averaging.
- Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
  A novel robust asynchronous Q-learning algorithm achieves finite-time convergence rates that match clean-data bounds up to an additive term proportional to the corruption fraction, with a matching information-theoretic lower bound.