An emphatic approach to the problem of off-policy temporal-difference learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Relative Entropy Pathwise Policy Optimization
REPPO is an on-policy RL method that combines pathwise policy gradients with relative entropy constraints to achieve stable training and high sample efficiency without replay buffers.