Finite sample analysis of average-reward TD learning and Q-learning. Advances in Neural Information Processing Systems, 34:1230–1242, 2021

Sheng Zhang, Zhe Zhang, Siva Theja Maguluri · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.

citing papers explorer

Showing 1 of 1 citing paper.

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · none · ref 11
