2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers
- Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories
  A Bayesian method uses near-optimality constraints from expert trajectories to estimate transition dynamics in offline model-based reinforcement learning.
- Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
  A two-stage distillation plus reinforced fine-tuning approach produces a single humanoid locomotion controller that adapts across skills and irregular terrains.