Emphatic TD Bellman Operator is a Contraction

Assaf Hallak; Aviv Tamar; Shie Mannor

arxiv: 1508.03411 · v2 · pith:RKVXXUIUnew · submitted 2015-08-14 · 📊 stat.ML · cs.LG




keywords: contraction, error, algorithm, bounds, emphatic, evaluation, gamma, off-policy



Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator with contraction modulus $\sqrt{\gamma}$ (where $\gamma$ is the discount factor). This allows us to provide error bounds on the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
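
How a contraction modulus turns into an error bound can be sketched with the standard argument for projected fixed-point equations. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $T$ is the Bellman operator with fixed point $V$ (so $TV = V$), $\|\cdot\|$ is a weighted Euclidean norm in which $T$ is a $\sqrt{\gamma}$-contraction, $\Pi$ is the orthogonal projection onto the approximation subspace in that norm, and $\hat{V}$ is the fixed point of $\Pi T$. Since $\Pi T\hat{V} - \Pi V$ lies in the subspace while $\Pi V - V$ is orthogonal to it, the Pythagorean identity gives:

$$
\begin{aligned}
\|\hat{V} - V\|^2 &= \|\Pi T\hat{V} - \Pi T V\|^2 + \|\Pi V - V\|^2 \\
&\le \|T\hat{V} - TV\|^2 + \|\Pi V - V\|^2 \\
&\le \gamma\,\|\hat{V} - V\|^2 + \|\Pi V - V\|^2 ,
\end{aligned}
\qquad\text{hence}\qquad
\|\hat{V} - V\| \le \frac{1}{\sqrt{1-\gamma}}\,\|\Pi V - V\| .
$$

This is only the generic argument under the stated assumptions; the note itself may state its bounds with different constants or in a different form.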



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG 2020-05 unverdicted novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.