DiffuseLoco: Real-time legged locomotion control with diffusion from offline datasets
5 Pith papers cite this work. Polarity classification is still indexing.
Citing papers

- SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
  SMP turns pre-trained motion diffusion models into task-agnostic, reusable reward functions via score distillation sampling, enabling style-specific and composable motion priors for humanoid control without retraining per task.
- AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning
  AID trains diffusion policies via behavior cloning on existing MAIPP planners followed by RL fine-tuning to achieve faster execution and higher information gain in multi-agent coordination.
- NavOL: Navigation Policy with Online Imitation Learning
  NavOL collects expert trajectory labels online from a global planner during policy rollouts in simulation to train a diffusion navigation policy, mitigating distribution shift and improving performance on visual navigation tasks.
- Constraint-Aware Diffusion Priors for High-Fidelity and Versatile Quadruped Locomotion
  Diff-CAST replaces GAN discriminators with diffusion-based priors and adds symmetric command conditioning plus constrained RL to enable versatile, drift-free, and hardware-safe quadruped locomotion.
- Positive-Only Drifting Policy Optimization
  PODPO is a likelihood-free generative policy optimization method for online RL that steers actions to high-return regions using only positive-advantage samples and local contrastive drifting.