RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Muesli: Combining improvements in policy optimization.Preprint arXiv:2104.06159
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Approximate Next Policy Sampling approximates the next policy's state distribution during training to enable larger safe policy updates in deep RL, demonstrated by SV-PPO matching or exceeding standard PPO on Atari and continuous control tasks.
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
citing papers explorer
-
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
-
Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
Approximate Next Policy Sampling approximates the next policy's state distribution during training to enable larger safe policy updates in deep RL, demonstrated by SV-PPO matching or exceeding standard PPO on Atari and continuous control tasks.
-
A Generalist Agent
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.