Szepesvári. Algorithms for Reinforcement Learning
1 Pith paper cites this work.

Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
Relative TD learning with linear function approximation is stable for any discount factor when the baseline is the empirical state-action distribution, with uniformly bounded asymptotic bias and covariance.