NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control

Chia-Wen Chen; Korrawe Karunratanakul; Siyu Tang; Yan Wu

arxiv: 2605.20209 · v1 · pith:LYEEZBC6new · submitted 2026-04-15 · 💻 cs.GR · cs.LG · cs.RO

Pith reviewed 2026-05-21 09:37 UTC · model grok-4.3

classification 💻 cs.GR · cs.LG · cs.RO

keywords character control · diffusion models · reinforcement learning · physics-based animation · motion generation · latent noise manipulation · whole-body control

The pith

Reinforcement learning manipulates latent noise in a diffusion motion prior to deliver fast, task-specific character control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NaP-Control, a technique that trains a reinforcement learning agent to adjust the noise inputs to a pre-trained, task-agnostic diffusion policy. This steering produces motions that satisfy specific control objectives without requiring slow gradient guidance at every denoising step. Because the agent interacts with the physics environment during training, it learns to correct motions on the fly and optimize task rewards. The result is higher success rates, quicker inference, and preserved natural movement across varied animation tasks.

Core claim

NaP-Control uses reinforcement learning to directly predict task-optimized diffusion noise from a task-agnostic prior, eliminating iterative test-time guidance while still achieving robust whole-body control and high motion fidelity through online correction of motions.

What carries the argument

Reinforcement learning policy that outputs adjustments to the latent noise of a pre-trained diffusion model to steer generated character motions toward task goals.
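
The mechanism described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear `denoise` and `decode` functions are toy stand-ins for the pretrained diffusion prior and latent action decoder, and every name, dimension, and step count below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual dimensions are not given in this page.
STATE_DIM, NOISE_DIM, ACTION_DIM, NUM_STEPS = 8, 4, 6, 5

# Frozen "prior": fixed random weights standing in for pretrained networks.
W_DENOISE = rng.normal(scale=0.1, size=(NOISE_DIM + STATE_DIM, NOISE_DIM))
W_DECODE = rng.normal(scale=0.1, size=(NOISE_DIM, ACTION_DIM))


def denoise(noise, state, num_steps=NUM_STEPS):
    """Deterministic multi-step refinement of the noise, with no guidance gradients."""
    z = noise.copy()
    for _ in range(num_steps):
        eps = np.concatenate([z, state]) @ W_DENOISE  # toy noise estimate
        z = z - eps / num_steps
    return z


def decode(latent):
    """Toy latent action decoder; tanh keeps actions in [-1, 1]."""
    return np.tanh(latent @ W_DECODE)


class NoisePolicy:
    """Gaussian policy whose output is the noise fed to the frozen prior."""

    def __init__(self, rng):
        self.W = rng.normal(scale=0.1, size=(STATE_DIM, NOISE_DIM))
        self.log_std = np.zeros(NOISE_DIM)

    def act(self, state, rng, deterministic=False):
        mean = state @ self.W
        if deterministic:
            return mean
        return mean + np.exp(self.log_std) * rng.normal(size=NOISE_DIM)


def control_step(policy, state, rng, deterministic=False):
    """Policy picks the noise; the frozen prior turns it into an action."""
    noise = policy.act(state, rng, deterministic)
    latent = denoise(noise, state)
    return decode(latent), noise


state = rng.normal(size=STATE_DIM)
policy = NoisePolicy(rng)
action, noise = control_step(policy, state, rng, deterministic=True)
```

Only `NoisePolicy` would be trained in this sketch; the prior's weights stay fixed, which is what keeps generation anchored to the original motion distribution.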

If this is right

Inference becomes substantially faster because no per-step gradient computations are needed during denoising.
Success rates rise on diverse control tasks because the method corrects motions through direct environment interaction.
Natural motion quality is retained by keeping the generation process anchored to the original diffusion prior.
The approach supports adaptation to challenging scenarios that offline training alone cannot handle.
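
The speed argument in the first consequence can be written as a rough cost comparison. The symbols are assumptions introduced for illustration and do not appear in the source: $T$ is the number of denoising steps, $C_f$ the cost of one denoiser forward pass, $C_\nabla$ the extra cost of computing a guidance gradient at one step, and $C_\pi$ the cost of one forward pass of the RL noise policy. The comparison also assumes both approaches use the same prior and the same $T$.

```latex
C_{\text{guided}} \approx T\,(C_f + C_\nabla), \qquad
C_{\text{NaP}} \approx C_\pi + T\,C_f, \qquad
C_{\text{guided}} - C_{\text{NaP}} \approx T\,C_\nabla - C_\pi .
```

Under these assumptions the saving grows with the number of denoising steps, as long as the policy's single forward pass is cheaper than the accumulated gradient computations.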

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same noise-manipulation idea might transfer to other diffusion-based generation domains where test-time optimization currently dominates runtime cost.
Pre-trained motion priors may contain more task-flexible knowledge than is typically accessed through fixed guidance schemes.
Combining the method with newer reinforcement learning algorithms could further improve sample efficiency during the noise-steering training phase.

Load-bearing premise

The motions encoded in the task-agnostic diffusion prior are rich enough that noise manipulation can reliably reach new task objectives without creating artifacts or unstable behavior.

What would settle it

If side-by-side tests on standard character control benchmarks show that NaP-Control produces lower success rates or slower inference than gradient-guided diffusion baselines, the central claim would be falsified.
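
The falsification test above can be sketched as a small harness. This is a hedged illustration only: `evaluate` and `claim_falsified` are hypothetical helpers, the controller is assumed to be a callable that returns whether an episode succeeded, and wall-clock time per episode is used as a stand-in for inference latency.

```python
import time
from statistics import mean


def evaluate(controller, episodes):
    """Return (success_rate, mean_latency_seconds) over the given episodes."""
    successes, latencies = [], []
    for episode in episodes:
        start = time.perf_counter()
        ok = controller(episode)
        latencies.append(time.perf_counter() - start)
        successes.append(1.0 if ok else 0.0)
    return mean(successes), mean(latencies)


def claim_falsified(nap, baseline):
    """True if NaP has a lower success rate or slower inference than the baseline."""
    nap_success, nap_latency = nap
    base_success, base_latency = baseline
    return nap_success < base_success or nap_latency > base_latency
```

A real comparison would also need matched tasks, seeds, and hardware, plus some account of variance across runs; none of that is specified in the source.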

Figures

Figures reproduced from arXiv: 2605.20209 by Chia-Wen Chen, Korrawe Karunratanakul, Siyu Tang, Yan Wu.

**Figure 1.** NaP-Control is a latent noise optimization framework combining reinforcement learning and diffusion-based prior for physics-based character control. We showcase its effectiveness in (a) far goal reaching, (b) agile hand reaching, (c) velocity control, (d) object interaction tasks, as well as its adaptation on uneven terrains.

**Figure 2.** Framework overview. (a) The RL policy πθ receives environment and proprioceptive states from the physics simulator. (b) The actor learns to predict optimal noise ω ∈ W aligned with task goals. (c) These predicted noises are denoised and decoded into executable actions a via a pretrained diffusion prior and a latent action decoder. Resulting transitions are then used to iteratively optimize the noise navig…

**Figure 3.** Qualitative Comparison of Object Interaction Task.

**Figure 4.** Ablation studies. (a) Effect of state representation on flat-ground far goal reaching. (b–c) Effect of action chunk size k for agile hand reaching on flat ground (b) and uneven terrain (c). (d) Comparison of joint state-action noise optimizing versus action-only noise optimizing for agile hand reaching.
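
The training loop outlined in the Figure 2 caption (policy predicts noise, a frozen prior turns it into actions, transitions update the policy) can be illustrated with a toy example. Everything here is an assumption made for the sketch: the update rule is plain REINFORCE with a mean baseline, swapped in for whatever RL algorithm the paper actually uses, `frozen_prior` is a stand-in for denoising plus decoding, and the goal-matching reward is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA, LR, BATCH, ITERS = 0.3, 0.05, 64, 200  # hypothetical hyperparameters


def frozen_prior(noise):
    """Stand-in for denoising and decoding with a pretrained, frozen prior."""
    return np.tanh(noise)


def rollout(W, goals, rng):
    """Sample noise from the policy, run the frozen prior, and score the actions."""
    mean = goals @ W.T  # policy mean for each goal, shape (BATCH, 2)
    noise = mean + SIGMA * rng.normal(size=mean.shape)
    actions = frozen_prior(noise)
    rewards = -np.sum((actions - goals) ** 2, axis=1)  # toy task reward
    return mean, noise, rewards


W = np.zeros((2, 2))  # the only trainable parameters: the noise policy
history = []
for _ in range(ITERS):
    goals = rng.uniform(-0.5, 0.5, size=(BATCH, 2))
    mean, noise, rewards = rollout(W, goals, rng)
    advantage = rewards - rewards.mean()
    score = (noise - mean) / SIGMA**2  # gradient of Gaussian log-prob w.r.t. the mean
    grad_W = (advantage[:, None] * score).T @ goals / BATCH
    W += LR * grad_W  # gradient ascent on expected reward
    history.append(rewards.mean())
```

The prior never receives a gradient in this loop; only the mapping from goal to noise changes, which mirrors the division of labor the caption describes.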

Original abstract

Achieving precise, versatile whole-body character control in physics-based animation remains challenging. Recent diffusion-based policies generate rich and expressive motions but typically rely on gradient-based test-time guidance to satisfy task objectives, which is slow and can reduce robustness. We introduce NaP-Control (Navigating Diffusion Prior for Versatile and Fast Character Control), abbreviated as NaP. Our method uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, steering it toward task-specific behaviors for fast, robust control with high motion fidelity. In contrast to methods that rely solely on offline training, NaP interacts with the environment during training to correct motions and optimize task rewards, improving success rates and enabling adaptation to challenging scenarios. By directly predicting task-optimized diffusion noise, NaP eliminates iterative guidance during denoising and enables efficient inference. Experiments show that NaP attains higher success rates and faster inference while preserving natural motion across diverse tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

NaP-Control uses RL to pick task-optimized noise for a fixed diffusion prior so it can skip test-time guidance, but the abstract gives no numbers to back the speed and success claims.

NaP-Control's main move is training an RL agent to output the right latent noise for a task-agnostic diffusion prior. This lets the model generate motions that meet task goals in one forward pass instead of relying on slow gradient guidance during denoising. The authors are right that guidance is a practical bottleneck for real-time character control in animation and robotics. Letting the system interact with the environment during training to fix motions and chase rewards is a reasonable way to push the prior toward better task performance without retraining the whole diffusion model from scratch. If the experiments hold up, this could make diffusion policies more usable in settings that need both natural motion and quick adaptation.

The soft spots sit mostly in the evidence. The abstract states higher success rates and faster inference but supplies none of the actual metrics, baseline numbers, or ablation results needed to judge the size of the gains or the failure modes. The stress-test concern about the prior's noise space being smooth and rich enough for stable RL corrections is still open; the paper would have to show that the corrections stay reliable on out-of-distribution tasks rather than just masking problems inside the training distribution. Without that, it is hard to know whether the method truly improves robustness or simply trades one set of artifacts for another.

This work is aimed at people already building physics-based controllers with diffusion models. A reader who cares about inference speed in graphics pipelines or robotics simulation could get something useful out of it once the quantitative side is filled in. It deserves a serious referee because the core idea is distinct from pure offline training or pure guidance methods and targets a real deployment issue. I would send it to review but ask the referees to focus on the experimental comparisons and any analysis of when the RL noise navigation breaks down.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces NaP-Control (NaP), a method for whole-body character control in physics-based animation. It trains a reinforcement learning policy to directly predict task-optimized latent noise for a fixed, task-agnostic diffusion prior, thereby steering generated motions toward task objectives without test-time gradient guidance. The approach claims to improve success rates and inference speed while maintaining motion naturalness by allowing environment interaction during training to correct motions.

Significance. If the experimental claims are substantiated, the method would offer a practical way to combine the expressiveness of diffusion priors with the adaptability of RL, potentially reducing the computational cost of guidance-based diffusion control while improving robustness across diverse tasks.

major comments (2)

[Abstract] Abstract: the claims of 'higher success rates and faster inference' are presented without any quantitative metrics, baseline comparisons, ablation studies, or error analysis. This absence prevents verification of the central performance assertions and leaves the strength of the contribution unclear.
[Method] Method description (implicit in the abstract and skeptic note): the assumption that a fixed task-agnostic diffusion prior already encodes sufficiently rich and locally correctable motion distributions for RL-based latent noise manipulation to reach high task rewards without artifacts or instability is load-bearing but not yet supported by concrete evidence of stability or out-of-distribution behavior.

minor comments (1)

[Abstract] Abstract: consider including at least one concrete performance number or reference to a results table/figure to ground the performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on clarifying our performance claims and methodological assumptions. We address each major comment below and have revised the manuscript to strengthen the presentation of results and supporting evidence.

Referee: [Abstract] Abstract: the claims of 'higher success rates and faster inference' are presented without any quantitative metrics, baseline comparisons, ablation studies, or error analysis. This absence prevents verification of the central performance assertions and leaves the strength of the contribution unclear.

Authors: We agree that the abstract would benefit from explicit quantitative highlights to immediately substantiate the claims. The full manuscript already contains detailed metrics, baseline comparisons, ablations, and analysis in the Experiments section. In the revised version, we have updated the abstract to include specific results such as success rate improvements and inference speedups relative to guidance-based baselines, with pointers to the supporting tables and figures. This addresses the concern without altering the abstract's brevity. revision: yes

Referee: [Method] Method description (implicit in the abstract and skeptic note): the assumption that a fixed task-agnostic diffusion prior already encodes sufficiently rich and locally correctable motion distributions for RL-based latent noise manipulation to reach high task rewards without artifacts or instability is load-bearing but not yet supported by concrete evidence of stability or out-of-distribution behavior.

Authors: We acknowledge the importance of evidencing this core assumption. The RL policy is trained with direct environment interaction to correct motions toward task rewards, and our experiments demonstrate stable, high-fidelity outputs without artifacts across diverse tasks. To provide more concrete support, the revised manuscript adds a dedicated discussion subsection on the prior's motion distribution coverage, including qualitative visualizations and analysis of out-of-distribution handling via noise prediction. Stability is further supported by the reported success rates and motion quality metrics. revision: partial

Circularity Check

0 steps flagged

No circularity: method and claims remain independent of reported outcomes

The abstract and method description present NaP-Control as using RL to manipulate latent noise from a pre-existing task-agnostic diffusion policy prior, with environment interaction during training to correct motions and optimize rewards. No equations, derivations, or self-referential definitions are shown that reduce the claimed success rates or inference speed to fitted parameters or prior outputs by construction. The performance claims are positioned as experimental results rather than tautological consequences of the method definition itself. The central premise about the prior's richness is treated as an assumption to be validated externally, not derived internally from the paper's own fitted values or self-citations in a load-bearing way.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method depends on the existence of a high-quality task-agnostic diffusion prior and on RL being able to optimize noise inputs without destabilizing the generative process; no new physical entities are introduced.

free parameters (1)

RL reward function weights and noise manipulation hyperparameters
These are tuned during training to balance task success against motion naturalness.

axioms (1)

domain assumption: A pre-trained task-agnostic diffusion policy prior captures sufficiently diverse and physically plausible whole-body motions that can be steered by latent noise changes.
Invoked when the paper states that the prior is manipulated toward task-specific behaviors.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 7 internal anchors

[1]

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., et al.: Training a helpful and harmless assistant withreinforcementlearningfromhumanfeedback.arXivpreprintarXiv:2204.05862 (2022) 4

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

In: Pro- ceedings ofthe 26thannualinternational conferenceon machine learning.pp

Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Pro- ceedings ofthe 26thannualinternational conferenceon machine learning.pp. 41–48 (2009) 10

work page 2009
[3]

Training Diffusion Models with Reinforcement Learning

Black, K., Janner, M., Du, Y., Kostrikov, I., Levine, S.: Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301 (2023) 4

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Advances in neural information processing systems30(2017) 4

Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. Advances in neural information processing systems30(2017) 4

work page 2017
[5]

In: European Confer- ence on Computer Vision

Dai, W., Chen, L.H., Wang, J., Liu, J., Dai, B., Tang, Y.: Motionlcm: Real-time controllable motion generation via latent consistency model. In: European Confer- ence on Computer Vision. pp. 390–408. Springer (2024) 3

work page 2024
[6]

ACM transactions on graphics (TOG)29(4), 1–10 (2010) 3

De Lasa, M., Mordatch, I., Hertzmann, A.: Feature-based locomotion controllers. ACM transactions on graphics (TOG)29(4), 1–10 (2010) 3

work page 2010
[7]

Advances in Neural Information Processing Systems37, 125487–125519 (2024) 4

Eyring, L., Karthik, S., Roth, K., Dosovitskiy, A., Akata, Z.: Reno: Enhancing one-step text-to-image models through reward-based noise optimization. Advances in Neural Information Processing Systems37, 125487–125519 (2024) 4

work page 2024
[8]

Optimizing ddpm sampling with shortcut fine-tuning.arXiv preprint arXiv:2301.13362,

Fan, Y., Lee, K.: Optimizing ddpm sampling with shortcut fine-tuning. arXiv preprint arXiv:2301.13362 (2023) 4

work page arXiv 2023
[9]

Advances in Neural Information Processing Sys- tems36, 79858–79885 (2023) 4

Fan, Y., Watkins, O., Du, Y., Liu, H., Ryu, M., Boutilier, C., Abbeel, P., Ghavamzadeh, M., Lee, K., Lee, K.: Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Sys- tems36, 79858–79885 (2023) 4

work page 2023
[10]

In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Con- ference Papers

Gat, I., Raab, S., Tevet, G., Reshef, Y., Bermano, A.H., Cohen-Or, D.: Anytop: Character animation diffusion with any topology. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Con- ference Papers. pp. 1–10 (2025) 3

work page 2025
[11]

In: CVPR (2024) 6

Guo, X., Liu, J., Cui, M., Li, J., Yang, H., Huang, D.: Initno: Boosting text-to- image diffusion models via initial noise optimization. In: CVPR (2024) 6

work page 2024
[12]

IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

Hansen-Estruch, P., Kostrikov, I., Janner, M., Kuba, J.G., Levine, S.: Idql: Im- plicit q-learning as an actor-critic method with diffusion policies. arXiv preprint arXiv:2304.10573 (2023) 4

work page internal anchor Pith review Pith/arXiv arXiv 2023
[13]

ACM Transactions on Graphics (TOG)44(4), 1–12 (2025) 2, 4, 6, 8

Huang, X., Truong, T., Zhang, Y., Yu, F., Sleiman, J.P., Hodgins, J., Sreenath, K., Farshidian, F.: Diffuse-cloc: Guided diffusion for physics-based character look- ahead control. ACM Transactions on Graphics (TOG)44(4), 1–12 (2025) 2, 4, 6, 8

work page 2025
[14]

In: European Conference on Computer Vision

Huang, Y., Wan, W., Yang, Y., Callison-Burch, C., Yatskar, M., Liu, L.: Como: Controllable motion generation through language guided pose code editing. In: European Conference on Computer Vision. pp. 180–196. Springer (2024) 3

work page 2024
[15]

In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion

Karunratanakul, K., Preechakul, K., Aksan, E., Beeler, T., Suwajanakorn, S., Tang, S.: Optimizing diffusion noise can serve as universal motion priors. In: Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion. pp. 1334–1345 (2024) 2, 3, 6

work page 2024
[16]

In: Proceedings of the 16 C.-W

Karunratanakul, K., Preechakul, K., Suwajanakorn, S., Tang, S.: Guided mo- tion diffusion for controllable human motion synthesis. In: Proceedings of the 16 C.-W. Chen et al. IEEE/CVF International Conference on Computer Vision. pp. 2151–2162 (2023) 3

work page 2023
[17]

arXiv preprint arXiv:2505.21837 (2025) 3

Khani, A., Rampini, A., Atherton, E., Roy, B.: Unimogen: Universal motion gen- eration. arXiv preprint arXiv:2505.21837 (2025) 3

work page arXiv 2025
[18]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li,J.,Cao,J.,Zhang,H.,Rempe,D.,Kautz,J.,Iqbal,U.,Yuan,Y.:Genmo:Agen- eralist model for human motion. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11766–11776 (2025) 3

work page 2025
[19]

In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Li, Z., Cheng, K., Ghosh, A., Bhattacharya, U., Gui, L., Bera, A.: Simmotionedit: Text-based human motion editing with motion similarity prediction. In: Proceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 27827–27837 (2025) 3

work page 2025
[20]

In: Conference on Robot Learning

Liang, J., Makoviychuk, V., Handa, A., Chentanez, N., Macklin, M., Fox, D.: Gpu-accelerated robotic simulation for distributed reinforcement learning. In: Conference on Robot Learning. pp. 270–282. PMLR (2018),https : / / api . semanticscholar.org/CorpusID:5308461010

work page 2018
[21]

BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion

Liao, Q., Truong, T.E., Huang, X., Tevet, G., Sreenath, K., Liu, C.K.: Beyond- mimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241 (2025) 2, 3

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

In: The Twelfth International Conference on Learning Representations 4

Liu, H., Sferrazza, C., Abbeel, P.: Chain of hindsight aligns language models with feedback. In: The Twelfth International Conference on Learning Representations 4

work page
[23]

In: ACM SIGGRAPH 2010 papers, pp

Liu, L., Yin, K., Van de Panne, M., Shao, T., Xu, W.: Sampling-based contact-rich motion control. In: ACM SIGGRAPH 2010 papers, pp. 1–10 (2010) 3

work page 2010
[24]

Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A skinnedmulti-personlinearmodel.ACMTrans.Graphics(Proc.SIGGRAPHAsia) 34(6), 248:1–248:16 (Oct 2015) 10

work page 2015
[25]

In: The Twelfth Inter- national Conference on Learning Representations (2024),https://openreview

Luo, Z., Cao, J., Merel, J., Winkler, A., Huang, J., Kitani, K.M., Xu, W.: Universal humanoid motion representations for physics-based control. In: The Twelfth Inter- national Conference on Learning Representations (2024),https://openreview. net/forum?id=OrOd8PxOO22, 4, 6, 7, 10, 13

work page 2024
[26]

In: International Conference on Computer Vision (ICCV) (2023) 2, 3

Luo, Z., Cao, J., Winkler, A.W., Kitani, K., Xu, W.: Perpetual humanoid control for real-time simulated avatars. In: International Conference on Computer Vision (ICCV) (2023) 2, 3

work page 2023
[27]

In: International Conference on Com- puter Vision

Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: Archive of motion capture as surface shapes. In: International Conference on Com- puter Vision. pp. 5442–5451 (Oct 2019) 5

work page 2019
[28]

Deepmimic: Example-guided deep reinforcement learning of physics-based character skills

Peng, X.B., Abbeel, P., Levine, S., van de Panne, M.: Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37(4), 143:1–143:14 (Jul 2018).https://doi.org/10.1145/3197517.3201311, http://doi.acm.org/10.1145/3197517.32013112, 3, 10

work page doi:10.1145/3197517.3201311 2018
[29]

Peng, X.B., Guo, Y., Halper, L., Levine, S., Fidler, S.: Ase: Large-scale reusable adversarialskillembeddingsforphysicallysimulatedcharacters.ACMTransactions On Graphics (TOG)41(4), 1–17 (2022) 3

work page 2022
[30]

ACM Transactions on Graphics (ToG)40(4), 1–20 (2021) 3

Peng, X.B., Ma, Z., Abbeel, P., Levine, S., Kanazawa, A.: Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (ToG)40(4), 1–20 (2021) 3

work page 2021
[31]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Pinyoanuntapong, E., Saleem, M., Karunratanakul, K., Wang, P., Xue, H., Chen, C., Guo, C., Cao, J., Ren, J., Tulyakov, S.: Maskcontrol: Spatio-temporal con- trol for masked motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9955–9965 (2025) 3 NaP-Control 17

work page 2025
[32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pinyoanuntapong, E., Wang, P., Lee, M., Chen, C.: Mmm: Generative masked motion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1546–1555 (2024) 3

work page 2024
[33]

Diffusion Policy Policy Optimization

Ren, A.Z., Lidard, J., Ankile, L.L., Simeonov, A., Agrawal, P., Majumdar, A., Burchfiel, B., Dai, H., Simchowitz, M.: Diffusion policy policy optimization. arXiv preprint arXiv:2409.00588 (2024) 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

High-Dimensional Continuous Control Using Generalized Advantage Estimation

Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015) 8

work page internal anchor Pith review Pith/arXiv arXiv 2015
[36]

ACM Trans

Shi, Y., Wang, J., Jiang, X., Lin, B., Dai, B., Peng, X.B.: Interactive character con- trol with auto-regressive motion diffusion models. ACM Trans. Graph.43(4) (Jul 2024).https://doi.org/10.1145/3658140,https://doi.org/10.1145/3658140 3

work page doi:10.1145/3658140 2024
[37]

Song,J.,Meng,C.,Ermon,S.:Denoisingdiffusionimplicitmodels.In:International Conference on Learning Representations (ICLR) (2021) 6

work page 2021
[38]

ACM Trans- actions on Graphics (TOG)43(6), 1–21 (2024) 2, 3, 4, 10, 13

Tessler, C., Guo, Y., Nabati, O., Chechik, G., Peng, X.B.: Maskedmimic: Unified physics-based character control through masked motion inpainting. ACM Trans- actions on Graphics (TOG)43(6), 1–21 (2024) 2, 3, 4, 10, 13

work page 2024
[39]

arXiv preprint arXiv:2505.19086 (2025) 2, 3

Tessler, C., Jiang, Y., Coumans, E., Luo, Z., Chechik, G., Peng, X.B.: Masked- manipulator: Versatile whole-body control for loco-manipulation. arXiv preprint arXiv:2505.19086 (2025) 2, 3

work page arXiv 2025
[40]

In: ACM SIGGRAPH 2023 Conference Proceedings

Tessler, C., Kasten, Y., Guo, Y., Mannor, S., Chechik, G., Peng, X.B.: Calm: Conditional adversarial latent models for directable virtual characters. In: ACM SIGGRAPH 2023 Conference Proceedings. pp. 1–9 (2023) 3

work page 2023
[41]

In: The Thirteenth International Confer- ence on Learning Representations (2025),https://openreview.net/forum?id= pZISppZSTv3, 4, 10, 13

Tevet, G., Raab, S., Cohan, S., Reda, D., Luo, Z., Peng, X.B., Bermano, A.H., van de Panne, M.: CLoSD: Closing the loop between simulation and diffu- sion for multi-task character control. In: The Thirteenth International Confer- ence on Learning Representations (2025),https://openreview.net/forum?id= pZISppZSTv3, 4, 10, 13

work page 2025
[42]

In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=SJ1kSyO2jwu3

Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-or, D., Bermano, A.H.: Human motion diffusion model. In: The Eleventh International Conference on Learning Representations (2023),https://openreview.net/forum?id=SJ1kSyO2jwu3

work page 2023
[43]

In: SIGGRAPH Asia 2024 Conference Papers

Truong, T.E., Piseno, M., Xie, Z., Liu, K.: Pdp: Physics-based character animation via diffusion policy. In: SIGGRAPH Asia 2024 Conference Papers. pp. 1–10 (2024) 4

work page 2024
[44]

Conference on Robot Learning (2025) 2, 4, 6

Wagenmaker, A., Nakamoto, M., Zhang, Y., Park, S., Yagoub, W., Nagabandi, A., Gupta,A.,Levine,S.:Steeringyourdiffusionpolicywithlatentspacereinforcement learning. Conference on Robot Learning (2025) 2, 4, 6

work page 2025
[45]

arXiv preprint arXiv:2311.17135 (2023) 3

Wan, W., Dou, Z., Komura, T., Wang, W., Jayaraman, D., Liu, L.: Tlcontrol: Trajectory and language control for human motion synthesis. arXiv preprint arXiv:2311.17135 (2023) 3

work page arXiv 2023
[46]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Wang, J., Luo, Z., Yuan, Y., Li, Y., Dai, B.: Pacer+: On-demand pedestrian anima- tion controller in driving scenarios. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 718–728 (2024) 3

work page 2024
[47]

International Journal of Computer Vision133(7), 4277–4293 (2025) 3 18 C.-W

Wang, Y., Li, M., Liu, J., Leng, Z., Li, F.W., Zhang, Z., Liang, X.: Fg-t2m++: Llms-augmented fine-grained text driven human motion generation. International Journal of Computer Vision133(7), 4277–4293 (2025) 3 18 C.-W. Chen et al

work page 2025
[49]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2025) 2, 4, 5, 6, 8, 10, 13

Wu, Y., Karunratanakul, K., Luo, Z., Tang, S.: Uniphys: Unified planner and con- troller with diffusion for flexible physics-based character control. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2025) 2, 4, 5, 6, 8, 10, 13

work page 2025
[50]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Xiao, L., Lu, S., Pi, H., Fan, K., Pan, L., Zhou, Y., Feng, Z., Zhou, X., Peng, S., Wang, J.: Motionstreamer: Streaming motion generation via diffusion-based autoregressive model in causal latent space. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10086–10096 (October

work page
[51]

arXiv preprint arXiv:2309.07918 (2023)

Xiao, Z., Wang, T., Wang, J., Cao, J., Zhang, W., Dai, B., Lin, D., Pang, J.: Unified human-scene interaction via prompted chain-of-contacts. arXiv preprint arXiv:2309.07918 (2023) 2

work page arXiv 2023
[52]

Xie, Y., Jampani, V., Zhong, L., Sun, D., Jiang, H.: Omnicontrol: Control any joint atanytimeforhumanmotiongeneration.In:TheTwelfthInternationalConference on Learning Representations (2024) 3

work page 2024
[53]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Yang,H.,Su,K.,Zhang,Y.,Chen,J.,Qian,K.,Liu,G.,Gan,C.:Unimumo:Unified text, music, and motion generation. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 25615–25623 (2025) 3

work page 2025
[54]

ACM Transactions on Graphics (TOG) 41(6), 1–16 (2022) 2, 3

Yao, H., Song, Z., Chen, B., Liu, L.: Controlvae: Model-based learning of generative controllers for physics-based characters. ACM Transactions on Graphics (TOG) 41(6), 1–16 (2022) 2, 3

work page 2022
[55]

ACM Transactions on Graphics (TOG)43(4), 1–21 (2024) 2, 3

Yao, H., Song, Z., Zhou, Y., Ao, T., Chen, B., Liu, L.: Moconvq: Unified physics- based motion control via scalable discrete representations. ACM Transactions on Graphics (TOG)43(4), 1–21 (2024) 2, 3

work page 2024
[56]

In: Proceedings of the IEEE/CVF international con- ference on computer vision

Yuan, Y., Song, J., Iqbal, U., Vahdat, A., Kautz, J.: Physdiff: Physics-guided hu- man motion diffusion model. In: Proceedings of the IEEE/CVF international con- ference on computer vision. pp. 16010–16021 (2023) 3

work page 2023
[57]

IEEE transactions on pattern analysis and machine intelligence46(6), 4115–4128 (2024) 3

Zhang, M., Cai, Z., Pan, L., Hong, F., Guo, X., Yang, L., Liu, Z.: Motiondiffuse: Text-driven human motion generation with diffusion model. IEEE transactions on pattern analysis and machine intelligence46(6), 4115–4128 (2024) 3

work page 2024
[58]

In: European Conference on Computer Vision

Zhang, Y., Tzeng, E., Du, Y., Kislyuk, D.: Large-scale reinforcement learning for diffusion models. In: European Conference on Computer Vision. pp. 1–17. Springer (2024) 4

work page 2024
[59]

Zhao, K., Li, G., Tang, S.: DartControl: A diffusion-based autoregressive motion model for real-time text-driven motion control. In: The Thirteenth International Conference on Learning Representations (ICLR) (2025) 3 NaP-Control 1 1 Supplementary We provide comprehensive qualitative results and side-by-side baseline compar- isons in the accompanying suppl...

[60]

Laskin, M., Yarats, D., Liu, H., Lee, K., Zhan, A., Lu, K., Cang, C., Pinto, L., Abbeel, P.: Urlb: Unsupervised reinforcement learning benchmark. arXiv preprint arXiv:2110.15191 (2021) 4

[61]

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017), https://arxiv.org/abs/1707.06347 7, 8, 2

[62]

Touati, A., Ollivier, Y.: Learning one representation to optimize all rewards. Advances in Neural Information Processing Systems 34, 13–23 (2021) 4

[63]

Wang, Y., Jiang, H., Yao, S., Ding, Z., Lu, Z.: Sentinel: A fully end-to-end language-action model for humanoid whole body control. arXiv preprint arXiv:2511.19236 (2025) 4

