On the performance of temporal difference learning with neural networks

Haoxing Tian, Ioannis Ch Paschalidis, Alex Olshevsky · 2023 · arXiv 2312.05397

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Bridging the Gap Between Average and Discounted TD Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

A new two-trajectory sampling algorithm for average-reward TD learning guarantees convergence with quadratic sample complexity and no explicit dimension dependence in both tabular and linear approximation settings.

citing papers explorer

Showing 1 of 1 citing paper.

Bridging the Gap Between Average and Discounted TD Learning · cs.LG · 2026-05-03 · unverdicted · none · ref 8
