In: ICLR (2023)
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 4
years: 2026 4
verdicts: UNVERDICTED 4
roles: background 2
polarities: background 2
representative citing papers
HO-Flow synthesizes realistic hand-object motions from text and canonical 3D objects via an interaction-aware VAE and masked flow matching, reporting SOTA physical plausibility and diversity on GRAB, OakInk, and DexYCB.
UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.
MoCHA canonicalizes captions to motion-recoverable semantics before contrastive training, cutting within-motion embedding variance by 11-19% and lifting T2M R@1 by 3.1pp on HumanML3D and 10.3pp on KIT-ML.
citing papers explorer
- Diffusion Path Alignment for Long-Range Motion Generation and Domain Transitions
  An inference-time optimization using a control-energy objective on pretrained diffusion models enables coherent long-range human motion generation with explicit domain transitions.
- HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching
  HO-Flow synthesizes realistic hand-object motions from text and canonical 3D objects via an interaction-aware VAE and masked flow matching, reporting SOTA physical plausibility and diversity on GRAB, OakInk, and DexYCB.
- UNICA: A Unified Neural Framework for Controllable 3D Avatars
  UNICA unifies motion planning, rigging, physical simulation, and rendering into a single skeleton-free neural framework that produces next-frame 3D avatar geometry from action inputs and renders it with Gaussian splatting.
- MoCHA: Denoising Caption Supervision for Motion-Text Retrieval
  MoCHA canonicalizes captions to motion-recoverable semantics before contrastive training, cutting within-motion embedding variance by 11-19% and lifting T2M R@1 by 3.1pp on HumanML3D and 10.3pp on KIT-ML.