pith.

arxiv: 2607.01990 · v1 · pith:NZMHRKG6 · submitted 2026-07-02 · 💻 cs.CV

Training-free Controllable Human Motion Generation under Heterogeneous Constraints

Pith reviewed 2026-07-03 15:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords training-free motion generation · controllable human motion · diffusion models · stochastic control · criterion-based constraints · heterogeneous constraints · motion synthesis

The pith

Casting diffusion motion generation as stochastic control unifies handling of both differentiable objective constraints and non-differentiable criterion constraints without any training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that diffusion-based human motion generation can be reframed as a stochastic control problem to derive step-wise control laws. These laws support constraints that are either continuous and objective-based with differentiable losses, or criterion-based with only discontinuous, sparse, or black-box feedback. The same mechanism also supplies an adaptive coordination step that balances multiple constraints during the denoising process. A sympathetic reader would care because many practical motion requirements fall into the criterion-based category, which prior training-free methods cannot accommodate.

Core claim

MIC is the first training-free motion generation framework that handles both continuous objective-based and criterion-based motion constraints under a shared mechanism by casting diffusion-based motion generation as a stochastic control problem. This perspective provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability and naturally accommodate objective-based constraints as a special case, while also motivating a control-oriented constraint coordination mechanism that adaptively balances and reconciles motion constraints during generation.

What carries the argument

The stochastic control formulation of the diffusion denoising process, which yields explicit step-wise control laws for enforcing constraints at each generation step.
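To make "step-wise control law" concrete, here is a minimal Python sketch of one controlled reverse step that uses only forward evaluations of a constraint. It is not MIC's published law. It assumes a DDPM-style Gaussian reverse kernel and approximates the cost-to-go by the terminal cost of a Tweedie estimate of the clean motion (Efron [17]). It then tilts the kernel by exp(-cost / temperature) using self-normalized importance resampling, in the spirit of path-integral control (Theodorou et al. [62]). The names `controlled_step`, `reverse_mean`, `eps_model`, and `temperature` are hypothetical placeholders.

```python
import numpy as np


def tweedie_x0(x, eps_hat, alpha_bar):
    # Tweedie-style estimate of the clean motion under the DDPM parameterization
    # x = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    return (x - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)


def controlled_step(x_t, reverse_mean, eps_model, sigma_t, alpha_bar_prev, cost_fn,
                    num_candidates=64, temperature=0.05, rng=None):
    """One reverse step x_t -> x_{t-1} tilted toward low terminal cost (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = reverse_mean(x_t)
    # 1) Propose candidates from the unmodified reverse kernel N(mean, sigma_t^2 I).
    noise = rng.standard_normal((num_candidates,) + np.shape(x_t))
    candidates = mean + sigma_t * noise
    # 2) Score each candidate with forward evaluations only: the cost of its x0 estimate.
    costs = np.array([cost_fn(tweedie_x0(c, eps_model(c), alpha_bar_prev))
                      for c in candidates], dtype=float)
    # 3) Exponential tilting weights, shifted by the minimum for numerical stability.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # 4) Resample one candidate in proportion to its weight; cost_fn is never differentiated.
    return candidates[rng.choice(num_candidates, p=weights)]


# Usage on a toy 2-D state with a binary criterion on the first coordinate.
rng = np.random.default_rng(0)
criterion = lambda x0: 0.0 if x0[0] > 1.0 else 1.0
x_prev = controlled_step(np.zeros(2), lambda x: 0.9 * x, lambda x: np.zeros_like(x),
                         sigma_t=1.0, alpha_bar_prev=0.5, cost_fn=criterion, rng=rng)
```

Because `cost_fn` is only ever called and never differentiated, a binary rule such as "foot inside the danger zone" plugs in unchanged. A differentiable loss can be passed the same way, which mirrors the special-case relationship described above.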

If this is right

  • Criterion-based constraints with only discontinuous or black-box feedback become enforceable during generation.
  • Objective-based constraints with differentiable losses are handled as a direct special case.
  • Multiple heterogeneous constraints are balanced adaptively by a control-oriented coordination mechanism at each step.
  • Effectiveness holds across diverse constraint settings in reported experiments.
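The coordination bullet can be illustrated with a small feedback regulator. The sketch below is hypothetical and not taken from the paper. It assumes each constraint reports a violation score in [0, 1] at every denoising step (for instance, measured on the current clean-motion estimate). It applies an integral-type update, similar to dual ascent, that raises a constraint's weight while it stays above its tolerance and lowers the weight otherwise. `WeightRegulator`, `gains`, and `tolerances` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class WeightRegulator:
    """Hypothetical integral-feedback regulator for per-constraint weights."""
    gains: list[float]
    tolerances: list[float]
    weights: list[float]
    w_min: float = 0.0
    w_max: float = 100.0

    def update(self, violations):
        # Integral action: weight accumulates while the violation exceeds the tolerance
        # and drains once the constraint is satisfied with margin; results are clipped.
        for i, v in enumerate(violations):
            w = self.weights[i] + self.gains[i] * (v - self.tolerances[i])
            self.weights[i] = min(self.w_max, max(self.w_min, w))
        return list(self.weights)

    def combined_cost(self, costs):
        # Weighted sum that a step-wise controller would then minimize.
        return sum(w * c for w, c in zip(self.weights, costs))


# Usage: constraint 0 is violated, constraint 1 is satisfied.
reg = WeightRegulator(gains=[1.0, 1.0], tolerances=[0.1, 0.1], weights=[1.0, 1.0])
new_weights = reg.update([0.6, 0.0])  # approximately [1.5, 0.9]
```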

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The control perspective may transfer to other diffusion generation domains where constraints mix smooth and discrete forms.
  • Real-time user interaction could supply criterion feedback directly to the step-wise laws without retraining.
  • Longer or multi-person sequences might reveal whether the coordination mechanism scales without additional tuning.

Load-bearing premise

The stochastic control view of diffusion motion generation produces effective step-wise laws that can enforce criterion-based constraints without differentiability.

What would settle it

Generate motions under a non-differentiable criterion constraint such as requiring a foot to land exactly on a discrete target point at a specific frame; measure whether the success rate exceeds that of random sampling or existing training-free baselines on the same constraint.
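The test proposed above can be scored with a few lines of Python. The sketch makes three assumptions. Foot positions are given per frame on the ground plane. "Exactly on the target" is treated as landing within a small tolerance, and the 5 cm default is an arbitrary choice. Two success rates are compared with a one-sided pooled two-proportion z-test under a normal approximation. The function names are illustrative.

```python
import math


def foot_hits_target(foot_xy_per_frame, frame, target_xy, tol=0.05):
    # Binary criterion: 1 if the foot lies within `tol` (metres) of the target at `frame`.
    x, y = foot_xy_per_frame[frame]
    return int(math.hypot(x - target_xy[0], y - target_xy[1]) <= tol)


def one_sided_two_proportion_test(successes_a, n_a, successes_b, n_b):
    """z statistic and one-sided p-value for H1: the success rate of A exceeds that of B."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Both groups all-fail or all-succeed, so the rates are identical (z = 0).
        return 0.0, 0.5
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal, P(Z > z).
    return z, 0.5 * math.erfc(z / math.sqrt(2.0))


# Usage: 80/100 successes for the method under test versus 50/100 for a baseline.
z, p = one_sided_two_proportion_test(80, 100, 50, 100)
```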

Figures

Figures reproduced from arXiv: 2607.01990 by Bo Yan, Haoxuan Qu, Hossein Rahmani, Jun Liu, Xiaofei Hui.

Figure 1: Real-world motion constraints can be heterogeneous. Some are continuous objective-based constraints that can be naturally modeled with differentiable losses (e.g., reaching a target point), whereas many others are criterion-based and provide feedback only as sparse, event-triggered, or binary evaluations, such as simulation-based validity tests and safety-threshold rules (e.g., entering the danger zone at …
Figure 2: Illustration of our proposed MIC framework. MIC adopts a unified constraint interface that converts heterogeneous constraints into control signals: criterion-based constraints are handled via forward terminal costs, while continuous objective-based constraints employ gradient computation. A constraint coordination mechanism then integrates these signals, with a feedback regulator adapting constraint weight…
Figure 3: Visualization comparisons to previous methods. The compared methods produce motions with constraint violations such as boundary penetration and collisions (highlighted in red boxes), whereas our method consistently satisfies constraints while maintaining natural motion quality.
Qualitative Results. We also show qualitative comparisons. As shown in …
Figure 4: Qualitative comparisons with ProgMoGen [41] and ReAlign [67] in physical simulation. The motions generated by the compared methods exhibit clear constraint violations and unstable movements (highlighted in red boxes), while MIC consistently satisfies different constraints while maintaining motion naturalness.
4.2 Ablation Study. Below we conduct ablation studies on the HSI-2 task. More ablation studies, further…
Figure 5: User study. We report the preference rate (%) of pairwise comparisons between our MIC and each compared method.
User study. Following [9, 61], we conduct a user study to assess motion quality. We compare MIC with DNO [31], ProgMoGen [41], and ReAlign [67], using 30 motions generated from the same prompts and constraint settings. Twenty participants complete paired comparisons between MIC and each bas…
Original abstract

Training-free controllable motion generation has attracted growing interest for enabling flexible constraint enforcement without constraint-specific training. However, existing training-free methods require constraints to be continuous objective-based with differentiable losses, while many real-world requirements are criterion-based and provide only discontinuous, sparse, or even black-box feedback. In this paper, we propose Motion-Inference-as-Control (MIC), the first training-free motion generation framework that handles both continuous objective-based and criterion-based motion constraints under a shared mechanism. The key idea is to cast diffusion-based motion generation as a stochastic control problem. This perspective not only provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability and naturally accommodate objective-based constraints as a special case, but also motivates a control-oriented constraint coordination mechanism that adaptively balances and reconciles motion constraints during generation. Experiments across diverse constraint settings demonstrate the effectiveness of our framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes Motion-Inference-as-Control (MIC), which it presents as the first training-free framework for controllable human motion generation under heterogeneous constraints. It casts diffusion-based motion generation as a stochastic control problem to obtain step-wise control laws that handle both continuous objective-based constraints (treated as a special case) and criterion-based constraints (discontinuous, sparse, or black-box) without requiring differentiability. The approach also introduces a control-oriented mechanism to adaptively balance and reconcile multiple constraints during generation. Experiments across diverse constraint settings are reported to demonstrate effectiveness.

Significance. If the stochastic control reformulation yields effective step-wise laws for non-differentiable constraints, the work would meaningfully extend training-free motion generation to a wider range of real-world requirements that existing gradient-based methods cannot accommodate. The unified mechanism and adaptive coordination could reduce the need for constraint-specific engineering.

major comments (1)
  1. [Abstract] Abstract, paragraph 3: the central claim that the stochastic control perspective 'provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability' is asserted without any derivation, approximation scheme, or explicit control-law equation. Standard diffusion guidance relies on constraint gradients; extending to non-differentiable feedback requires additional mechanisms (e.g., zeroth-order estimates) whose correctness is not secured by the framing alone. This is load-bearing for the 'first training-free framework handling both types under a shared mechanism' claim.
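The referee's mention of zeroth-order estimates can be made concrete. One standard option, which is not necessarily what MIC uses, is the antithetic Gaussian-smoothing estimator from evolution strategies (Salimans et al. [51]). It gives an unbiased estimate of the gradient of the smoothed objective E[f(x + σu)], u ~ N(0, I), from forward evaluations alone, so it is defined even when f is a step function. The sketch is illustrative; `smoothed_gradient` and its defaults are assumptions.

```python
import numpy as np


def smoothed_gradient(f, x, sigma=0.1, num_pairs=256, rng=None):
    """Antithetic Gaussian-smoothing estimate of grad E_u[f(x + sigma * u)], u ~ N(0, I).

    Only forward evaluations of f are used, so f may be discontinuous or a black box.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(num_pairs):
        u = rng.standard_normal(x.shape)
        # Antithetic pair: evaluating f at x + sigma*u and x - sigma*u reduces variance.
        grad += (f(x + sigma * u) - f(x - sigma * u)) * u
    return grad / (2.0 * sigma * num_pairs)


# Usage: a step-function criterion whose ordinary gradient is zero almost everywhere.
step = lambda z: 1.0 if z[0] > 0.0 else 0.0
g = smoothed_gradient(step, [0.0, 0.0], sigma=0.5, num_pairs=20000,
                      rng=np.random.default_rng(0))
```

The estimate targets the smoothed objective, not f itself. σ trades bias against variance, and that approximation is exactly what the referee asks the authors to justify.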

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract, paragraph 3: the central claim that the stochastic control perspective 'provides principled and practically effective step-wise control laws that support criterion-based constraints without requiring differentiability' is asserted without any derivation, approximation scheme, or explicit control-law equation. Standard diffusion guidance relies on constraint gradients; extending to non-differentiable feedback requires additional mechanisms (e.g., zeroth-order estimates) whose correctness is not secured by the framing alone. This is load-bearing for the 'first training-free framework handling both types under a shared mechanism' claim.

    Authors: We thank the referee for this observation. The abstract is a concise summary; the full derivation appears in Section 3, where the reverse diffusion process is recast as a finite-horizon stochastic control problem. The resulting step-wise control law is obtained by solving the associated stochastic Hamilton-Jacobi-Bellman equation (Eq. 7), which yields an explicit feedback form that depends only on the current state and the constraint value. For criterion-based (non-differentiable) constraints the same law is applied by evaluating the constraint directly inside the cost functional, without any gradient; the derivation shows that the optimality condition remains valid under the weaker assumption that the constraint is measurable. Objective-based constraints emerge as the special case in which the cost is differentiable. The shared mechanism therefore follows directly from the control formulation rather than from an ad-hoc extension of gradient guidance. If the editor wishes, we will add a parenthetical reference to Eq. 7 in the abstract. revision: partial
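For context, the standard KL-regularized formulation of control over a discrete reverse chain makes the rebuttal's "no gradient needed" argument easy to check. This textbook sketch uses assumed notation (reverse kernel p_θ, terminal cost C, temperature λ > 0) and does not reproduce the paper's Eq. 7:

```latex
\min_{q}\; \mathbb{E}_{q}\big[C(x_0)\big]
  + \lambda \sum_{t=1}^{T} \mathbb{E}_{q}\Big[\mathrm{KL}\big(q(\cdot \mid x_t)\,\|\,p_\theta(\cdot \mid x_t)\big)\Big]

V_0(x_0) = C(x_0), \qquad
V_t(x_t) = -\lambda \log \mathbb{E}_{x_{t-1} \sim p_\theta(\cdot \mid x_t)}\Big[e^{-V_{t-1}(x_{t-1})/\lambda}\Big]

q^{*}(x_{t-1} \mid x_t) = p_\theta(x_{t-1} \mid x_t)\,
  \exp\!\Big(\frac{V_t(x_t) - V_{t-1}(x_{t-1})}{\lambda}\Big)
```

Only values of C enter these expressions. They are therefore well defined for any measurable cost that is bounded below, including binary criteria. The practical difficulty moves to estimating the expectations, where sampling or zeroth-order approximations are needed.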

Circularity Check

0 steps flagged

No circularity detected; the stochastic control framing is presented as an independent perspective.

Full rationale

The abstract and description introduce MIC by casting diffusion motion generation as a stochastic control problem to obtain step-wise laws for heterogeneous constraints. No equations, fitted parameters, or self-citations are exhibited that reduce the claimed control laws or coordination mechanism to prior inputs by construction. The derivation chain is self-contained as a novel perspective rather than a renaming or statistical forcing of existing quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted from the manuscript text.

pith-pipeline@v0.9.1-grok · 5686 in / 993 out tokens · 17576 ms · 2026-07-03T15:30:09.125228+00:00 · methodology


Reference graph

Works this paper leans on

85 extracted references · 16 canonical work pages · 1 internal anchor

  [1] Åström, K.J., Murray, R.: Feedback systems: an introduction for scientists and engineers. Princeton University Press (2021), https://books.google.co.uk/books?id=qZ0DEAAAQBAJ

  [2] Bae, J., Hwang, I., Lee, Y.Y., Guo, Z., Liu, J., Ben-Shabat, Y., Kim, Y.M., Kapadia, M.: Less is more: Improving motion diffusion models with sparse keyframes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11069–11078 (2025)

  [3] Blaha, T.M., Smeur, E.J.J., Remes, B.D.W.: A survey of optimal control allocation for aerial vehicle control. Actuators 12(7) (2023). https://doi.org/10.3390/act12070282, https://www.mdpi.com/2076-0825/12/7/282

  [4] Botev, Z.I., Kroese, D.P., Rubinstein, R.Y., L’ecuyer, P.: The cross-entropy method for optimization. In: Handbook of statistics, vol. 31, pp. 35–59. Elsevier (2013)

  [5] Cai, Y., Wang, Y., Zhu, Y., Cham, T.J., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., et al.: A unified 3d human motion synthesis model via conditional variational auto-encoder. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11645–11655 (2021)

  [6] Cao, B., Zheng, S., Wang, Y., Xia, L., Wei, Q., Jin, Q., Liu, J., Lu, Z.: Motionctrl: A real-time controllable vision-language-motion model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12253–12262 (2025)

  [7] Caserman, P., Garcia-Agundez, A., Göbel, S.: A survey of full-body motion reconstruction in immersive virtual reality applications. IEEE Transactions on Visualization and Computer Graphics 26(10), 3089–3108 (2020). https://doi.org/10.1109/TVCG.2019.2912607

  [8] Chen, L.H., Lu, S., Dai, W., Dou, Z., Ju, X., Wang, J., Komura, T., Zhang, L.: Pay attention and move better: Harnessing attention for interactive motion generation and training-free editing. arXiv preprint arXiv:2410.18977 (2024)

  [9] Chen, X., Jiang, B., Liu, W., Huang, Z., Fu, B., Chen, T., Yu, G.: Executing your commands via motion diffusion in latent space. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 18000–18010 (2023)

  [10] Cheng, X., Ji, Y., Chen, J., Yang, R., Yang, G., Wang, X.: Expressive Whole-Body Control for Humanoid Robots. In: Proceedings of Robotics: Science and Systems. Delft, Netherlands (July 2024). https://doi.org/10.15607/RSS.2024.XX.107

  [11] Christopher, J.K., Baek, S., Fioretto, F.: Constrained synthesis with projected diffusion models. In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural Information Processing Systems. vol. 37, pp. 89307–89333. Curran Associates, Inc. (2024). https://doi.org/10.52202/079017-2834, https://proce...

  [12] Chung, H., Kim, J., Mccann, M.T., Klasky, M.L., Ye, J.C.: Diffusion posterior sampling for general noisy inverse problems. In: The Eleventh International Conference on Learning Representations (2023), https://openreview.net/forum?id=OnD9zGAGT0k

  [13] Dabral, R., Mughal, M.H., Golyanik, V., Theobalt, C.: Mofusion: A framework for denoising-diffusion-based motion synthesis. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9760–9770 (2023)

  [14] Degardin, B., Neves, J., Lopes, V., Brito, J., Yaghoubi, E., Proença, H.: Generative adversarial graph convolutional networks for human action synthesis. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1150–1159 (2022)

  [15] Domingo-Enrich, C., Drozdzal, M., Karrer, B., Chen, R.T.Q.: Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=xQBRrtQM8u

  [16] Dou, H., Chen, Z., Li, Z., Li, H., Yang, L., Deng, Y.: Constrained particle seeking: Solving diffusion inverse problems with just forward passes. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 20870–20878 (2026)

  [17] Efron, B.: Tweedie’s formula and selection bias. Journal of the American Statistical Association 106(496), 1602–1614 (2011)

  [18] Foo, L.G., Gong, J., Rahmani, H., Liu, J.: Distribution-aligned diffusion for human mesh recovery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9221–9232 (2023)

  [19] Girolamo, M., Lorenzo, M., Bicchierai, M., Berretti, S., Bagdanov, A.D.: No mocap needed: Post-training motion diffusion models with reinforcement learning using only textual prompts. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 967–976 (2026)

  [20] Gong, J., Foo, L.G., Fan, Z., Ke, Q., Rahmani, H., Liu, J.: Diffpose: Toward more reliable 3d pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13041–13051 (2023)

  [21] Grenander, U., Miller, M.I.: Representations of knowledge in complex systems. Journal of the Royal Statistical Society: Series B (Methodological) 56(4), 549–581 (1994)

  [22] Guo, C., Mu, Y., Javed, M.G., Wang, S., Cheng, L.: Momask: Generative masked modeling of 3d human motions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1900–1910 (2024)

  [23] Guo, C., Zou, S., Zuo, X., Wang, S., Ji, W., Li, X., Cheng, L.: Generating diverse and natural 3d human motions from text. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5152–5161 (June 2022)

  [24] Guo, Z., Hu, Z., Soh, D.W., Zhao, N.: Motionlab: Unified human motion generation and editing via the motion-condition-motion paradigm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13869–13879 (2025)

  [25] Han, G., Liang, M., Tang, J., Cheng, Y., Liu, W., Huang, S.: Reindiffuse: Crafting physically plausible motions with reinforced diffusion model. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 2218–2227. IEEE (2025)

  [26] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851 (2020)

  [27] Huang, Y., Ghatare, A., Liu, Y., Hu, Z., Zhang, Q., Sastry, C.S., Gururani, S., Oore, S., Yue, Y.: Symbolic music generation with non-differentiable rule guided diffusion. In: International Conference on Machine Learning. pp. 19772–19797. PMLR (2024)

  [28] Hwang, I., Bae, J., Lim, D., Kim, Y.M.: Motion synthesis with sparse and flexible keyjoint control. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 13203–13213 (October 2025)

  [29] Jeong, M., Hwang, Y., Lee, J., Jung, S., Kim, W.H.: HGM³: Hierarchical generative masked motion modeling with hard token mining. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=IEul1M5pyk

  [30] Jiang, B., Chen, X., Liu, W., Yu, J., Yu, G., Chen, T.: Motiongpt: Human motion as a foreign language. Advances in Neural Information Processing Systems 36, 20067–20079 (2023)

  [31] Karunratanakul, K., Preechakul, K., Aksan, E., Beeler, T., Suwajanakorn, S., Tang, S.: Optimizing diffusion noise can serve as universal motion priors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1334–1345 (2024)

  [32] Karunratanakul, K., Preechakul, K., Suwajanakorn, S., Tang, S.: Guided motion diffusion for controllable human motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 2151–2162 (October 2023)

  [33] Kim, J., Kim, B.S., Ye, J.C.: Free2guide: Training-free text-to-video alignment using image lvlm. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 17920–17929 (October 2025)

  [34] Kroese, D.P., Rubinstein, R.Y., Taimre, T.: Application of the cross-entropy method to clustering and vector quantization. Journal of Global Optimization 37(1), 137–157 (2007)

  [35] Lee, K., Lee, S., Lee, J.: Interactive character animation by learning multi-objective control. ACM Trans. Graph. 37(6) (Dec 2018). https://doi.org/10.1145/3272127.3275071

  [36] Li, J., Clegg, A., Mottaghi, R., Wu, J., Puig, X., Liu, C.K.: Controllable human-object interaction synthesis. In: European Conference on Computer Vision. pp. 54–72. Springer (2024)

  [37] Li, Z., Cheng, K., Ghosh, A., Bhattacharya, U., Gui, L., Bera, A.: Simmotionedit: Text-based human motion editing with motion similarity prediction. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27827–27837 (2025)

  [38] Li, Z., Luo, M., Hou, R., Zhao, X., Liu, H., Chang, H., Liu, Z., Li, C.: Morph: A motion-free physics optimization framework for human motion generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 14580–14589 (2025)

  [39] Li, Z., Rahmani, H., Ke, Q., Liu, J.: Longdiff: Training-free long video generation in one go. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17789–17798 (June 2025)

  [40] Li, Z., Rahmani, H., Zhang, J., Xue, Y., Mirmehdi, M., Kuen, J., Gu, J., Liu, J.: Diffgraph: An automated agent-driven model merging framework for in-the-wild text-to-image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 36713–36723 (2026)

  [41] Liu, H., Zhan, X., Huang, S., Mu, T.J., Shan, Y.: Programmable motion generation for open-set motion control tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1399–1408 (2024)

  [42] Louis, N., Khoshlessan, M., Corso, J.J.: Measuring physical plausibility of 3d human poses using physics simulation. In: 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024. BMVA (2024), https://bmva-archive.org.uk/bmvc/2024/papers/Paper_615/paper.pdf

  [43] Luo, Z., Iwase, S., Yuan, Y., Kitani, K.: Embodied scene-aware human pose estimation. Advances in Neural Information Processing Systems 35, 6815–6828 (2022)

  [44] Meng, Z., Xie, Y., Peng, X., Han, Z., Jiang, H.: Rethinking diffusion for text-driven human motion generation: Redundant representations, evaluation, and masked autoregression. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27859–27871 (2025)

  [45] Naidu, D.S.: Optimal control systems. Electrical Engineering Series, CRC press (2018), https://books.google.co.uk/books?id=Kn50DwAAQBAJ

  [46] Oksendal, B.: Stochastic differential equations: an introduction with applications. Universitext, Springer Science & Business Media (2013), https://books.google.co.uk/books?id=gizqCAAAQBAJ

  [47] Ota, S., Yu, Q., Fujiwara, K., Ikehata, S., Sato, I.: Pino: Person-interaction noise optimization for long-duration and customizable motion generation of arbitrary-sized groups. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10676–10685 (2025)

  [48] Pandey, K., Sofian, F.M., Draxler, F., Karaletsos, T., Mandt, S.: Variational control for guidance in diffusion models. In: Forty-second International Conference on Machine Learning (2025), https://openreview.net/forum?id=Z0ffRRtOim

  [49] Pavon, M.: Stochastic control and nonequilibrium thermodynamical systems. Applied Mathematics and Optimization 19(1), 187–202 (1989)

  [50] Pinyoanuntapong, E., Saleem, M., Karunratanakul, K., Wang, P., Xue, H., Chen, C., Guo, C., Cao, J., Ren, J., Tulyakov, S.: Maskcontrol: Spatio-temporal control for masked motion synthesis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9955–9965 (2025)

  [51] Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)

  [52] Shafir, Y., Tevet, G., Kapon, R., Bermano, A.H.: Human motion diffusion as a generative prior. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=dTpbEdN9kr

  [53] Shen, Y., Jiang, X., Yang, Y., Wang, Y., Han, D., Li, D.: Understanding and improving training-free loss-based diffusion guidance. Advances in Neural Information Processing Systems 37, 108974–109002 (2024)

  [54] Song, J., Zhang, Q., Yin, H., Mardani, M., Liu, M.Y., Kautz, J., Chen, Y., Vahdat, A.: Loss-guided diffusion models for plug-and-play controllable generation. In: International Conference on Machine Learning. pp. 32483–32498. PMLR (2023)

  [55] Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (2021), https://openreview.net/forum?id=PxTIG12RRHS

  [56] Spall, J.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37(3), 332–341 (1992). https://doi.org/10.1109/9.119632

  [57] Stein, C.M.: Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9(6), 1135–1151 (1981), http://www.jstor.org/stable/2240405

  [58] Tan, X., Wang, H., Geng, X., Zhou, P.: Sopo: Text-to-motion generation using semi-online preference optimization. In: Belgrave, D., Zhang, C., Lin, H., Pascanu, R., Koniusz, P., Ghassemi, M., Chen, N. (eds.) Advances in Neural Information Processing Systems. vol. 38, pp. 60714–60747. Curran Associates, Inc. (2025), https://proceedings.neurips.cc/paper_fi...

  [59] Tashakori, A., Tashakori, A., Yang, G., Wang, Z.J., Servati, P.: Flexmotion: Lightweight, physics-aware, and controllable human motion generation. arXiv preprint arXiv:2501.16778 (2025)

  [60] Tevet, G., Raab, S., Cohan, S., Reda, D., Luo, Z., Peng, X.B., Bermano, A.H., van de Panne, M.: CLoSD: Closing the loop between simulation and diffusion for multi-task character control. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=pZISppZSTv

  61. [61]

    Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., Bermano, A.H.: Human motion diffusion model. In: The Eleventh International Conference on Learning Representations (2023), https://openreview.net/forum?id=SJ1kSyO2jwu

  62. [62]

    Theodorou, E., Buchli, J., Schaal, S.: A generalized path integral control approach to reinforcement learning. The Journal of Machine Learning Research 11, 3137–3181 (2010)

  63. [63]

    Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5026–5033 (2012). https://doi.org/10.1109/IROS.2012.6386109

  64. [64]

    Wang, Z., Yu, P., Zhao, Y., Zhang, R., Zhou, Y., Yuan, J., Chen, C.: Learning diverse stochastic human-action generators by learning smooth latent transitions. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 12281–12288 (2020)

  65. [65]

    Watanabe, A., Ren, J., Siyao, L., Peng, Y., Wu, E., Simo-Serra, E.: Simdiff: Simulator-constrained diffusion model for physically plausible motion generation. arXiv preprint arXiv:2509.20927 (2025)

  66. [66]

    Wen, Y., Shuai, Q., Kang, D., Li, J., Wen, C., Qian, Y., Jiao, N., Chen, C., Chen, W., Wang, Y., et al.: Hy-motion 1.0: Scaling flow matching models for text-to-motion generation. arXiv preprint arXiv:2512.23464 (2025)

  67. [67]

    Weng, W., Tan, X., Wang, J., Xie, G.S., Zhou, P., Wang, H.: Realign: text-to-motion generation via step-aware reward-guided alignment. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 40, pp. 10621–10629 (2026)

  68. [68]

    Williams, D.: Probability with Martingales. Cambridge mathematical textbooks, Cambridge University Press (1991), https://books.google.co.uk/books?id=e9saZ0YSi-AC

  69. [69]

    Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3), 229–256 (1992)

  70. [70]

    Xie, Y., Jampani, V., Zhong, L., Sun, D., Jiang, H.: Omnicontrol: Control any joint at any time for human motion generation. In: The Twelfth International Conference on Learning Representations (2024), https://openreview.net/forum?id=gd0lAEtWso

  71. [71]

    Xu, L., Qu, H., Cai, Y., Liu, J.: 6d-diff: A keypoint diffusion framework for 6d object pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9676–9686 (2024)

  72. [72]

    Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., Yang, M.H.: Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 56(4) (Nov 2023). https://doi.org/10.1145/3626235

  73. [73]

    Yin, T., Hoyet, L., Christie, M., Cani, M.P., Pettré, J.: The one-man-crowd: Single user generation of crowd motions using virtual reality. IEEE Transactions on Visualization and Computer Graphics 28(5), 2245–2255 (2022). https://doi.org/10.1109/TVCG.2022.3150507

  74. [74]

    Yong, J., Zhou, X.Y.: Stochastic Controls: Hamiltonian Systems and HJB Equations, vol. 43. Springer Science & Business Media (1999)

  75. [75]

    Yu, H., Liu, J., Gui, X., Wong, M., Hou, Y., Ong, Y.S.: A plug-and-play multi-criteria guidance for diverse in-betweening human motion generation. arXiv preprint arXiv:2508.01590 (2025)

  76. [76]

    Yuan, Y., Song, J., Iqbal, U., Vahdat, A., Kautz, J.: Physdiff: Physics-guided human motion diffusion model. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 16010–16021 (2023)

  77. [77]

    Zhang, J., Zhang, Y., Cun, X., Zhang, Y., Zhao, H., Lu, H., Shen, X., Shan, Y.: Generating human motion from textual descriptions with discrete representations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14730–14740 (2023)

  78. [78]

    Zhang, M., Cai, Z., Pan, L., Hong, F., Guo, X., Yang, L., Liu, Z.: Motiondiffuse: Text-driven human motion generation with diffusion model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(6), 4115–4128 (2024)

  79. [79]

    Zhang, R., Li, C., Lou, Y., Shi, Y., Wang, H., Huang, Y.: Steering where to diffuse: Generative modeling of phenotypic response simulation with steered diffusion bridge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 27367–27377 (June 2026)

  80. [80]

    Zhang, S., Zhang, W., Gu, Q.: Energy-weighted flow matching for offline reinforcement learning. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=HA0oLUvuGI