2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
citing papers explorer
- Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
Approximate Next Policy Sampling approximates the next policy's state distribution during training to enable larger safe policy updates in deep RL, demonstrated by SV-PPO matching or exceeding standard PPO on Atari and continuous control tasks.
- Parameter-Efficient Distributional RL via Normalizing Flows and a Geometry-Aware Cramér Surrogate
NFDRL models return distributions via continuous normalizing flows paired with a geometry-aware Cramér surrogate distance, delivering fixed-size parameters, a sqrt(gamma) contraction, unbiased gradients, and competitive Atari-5 performance.