Learning to Predict Independent of Span

Hado van Hasselt; Richard S. Sutton

arxiv: 1508.04582 · v1 · pith:AXJYOOGQnew · submitted 2015-08-19 · 💻 cs.LG


classification 💻 cs.LG

keywords: predictions, algorithm, span, algorithms, computation, constructs, conventional, desiderata


We consider how to learn multi-step predictions efficiently. Conventional algorithms wait until observing actual outcomes before performing the computations to update their predictions. If predictions are made at a high rate or span a large amount of time, substantial computation can be required to store all relevant observations and to update all predictions when the outcome is finally observed. We show that the exact same predictions can be learned in a much more computationally congenial way, with uniform per-step computation that does not depend on the span of the predictions. We apply this idea to various settings of increasing generality, repeatedly adding desired properties and each time deriving an equivalent span-independent algorithm for the conventional algorithm that satisfies these desiderata. Interestingly, along the way several known algorithmic constructs emerge spontaneously from our derivations, including dutch eligibility traces, temporal difference errors, and averaging. This allows us to link these constructs one-to-one to the corresponding desiderata, unambiguously connecting the 'how' to the 'why'. At each step, we make sure that the derived algorithm subsumes the previous algorithms, thereby retaining their properties. Ultimately we arrive at a single general temporal-difference algorithm that is applicable to the full setting of reinforcement learning.
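As a rough illustration of the span-independence idea, the sketch below covers only the simplest setting: a linear prediction of a single final outcome, learned with a least-mean-squares update. This is an assumed toy setup, not the paper's general algorithm, and the names `conventional_update`, `SpanIndependentLearner`, `observe`, and `finish` are hypothetical. The conventional version stores every feature vector until the outcome arrives. The span-independent version keeps two fixed-size vectors, an outcome-independent part `a` and a trace `e` (of the dutch-trace form), updates both at every step, and combines them once the outcome is known.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def conventional_update(theta0, features, outcome, alpha):
    """Wait for the outcome, then replay every stored feature vector.

    Memory and final-step computation grow with the span len(features).
    """
    theta = list(theta0)
    for phi in features:
        error = outcome - dot(phi, theta)
        theta = [w + alpha * error * x for w, x in zip(theta, phi)]
    return theta


class SpanIndependentLearner:
    """Hypothetical sketch: same final weights, constant work per step.

    Writing F = I - alpha * phi * phi^T, the conventional update is
    theta <- F theta + alpha * outcome * phi, so the final weights are
    a + outcome * e, where a <- F a and e <- F e + alpha * phi.
    """

    def __init__(self, theta0, alpha):
        self.alpha = alpha
        self.a = list(theta0)         # part that does not depend on the outcome
        self.e = [0.0] * len(theta0)  # trace that multiplies the outcome

    def observe(self, phi):
        """Process one feature vector in O(n) time; nothing is stored."""
        pa = dot(phi, self.a)
        pe = dot(phi, self.e)
        self.a = [a - self.alpha * pa * x for a, x in zip(self.a, phi)]
        self.e = [e - self.alpha * pe * x + self.alpha * x
                  for e, x in zip(self.e, phi)]

    def finish(self, outcome):
        """Combine the two vectors once the outcome is finally observed."""
        return [a + outcome * e for a, e in zip(self.a, self.e)]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
    theta0 = [0.5, -0.2, 0.0, 1.0]
    reference = conventional_update(theta0, feats, 3.0, 0.1)
    learner = SpanIndependentLearner(theta0, 0.1)
    for phi in feats:
        learner.observe(phi)
    online = learner.finish(3.0)
    print(max(abs(x - y) for x, y in zip(reference, online)))
```

In this toy setting the two procedures agree up to floating-point rounding, while the second never stores past feature vectors, so its per-step cost does not depend on how far away the outcome is.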


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 5.0

The authors compare multiple methods for incorporating action information into RNN state updates for RL and report empirical results on illustrative domains.