Td algorithm for the variance of return and mean-variance reinforcement learning

Makoto Sato, Hajime Kimura, Shibenobu Kobayashi · 2001

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Uncertainty-aware Model-based Policy Optimization

cs.LG · 2019-06-25 · unverdicted · novelty 5.0

Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sample complexity than baselines on continuous control benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Uncertainty-aware Model-based Policy Optimization cs.LG · 2019-06-25 · unverdicted · none · ref 24
Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sample complexity than baselines on continuous control benchmarks.

Td algorithm for the variance of return and mean-variance reinforcement learning

fields

years

verdicts

representative citing papers

citing papers explorer