{"paper":{"title":"Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.LG"],"primary_cat":"stat.ML","authors_text":"Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor","submitted_at":"2015-09-17T09:03:35Z","abstract_excerpt":"We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced \\emph{emphatic temporal differences} (ETD) algorithm \\citep{SuttonMW15}, which encompasses the original ETD($\\lambda$), as well as several other off-policy evaluation algorithms as special cases. We call this framework ETD($\\lambda$, $\\beta$), where our introduced parameter $\\beta$ controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD($\\lambda$, $\\beta$) involves a contraction operator, allo"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1509.05172","kind":"arxiv","version":2},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}