HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.
LHM: large animatable human reconstruction model from a single image in seconds
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 7years
2026 7representative citing papers
Pretraining on 1M wild videos followed by post-training on curated data yields high-fidelity feedforward 3D avatars that generalize across identities, clothing, and lighting with emergent relightability and loose-garment support.
VRGaussianAvatar enables real-time full-body 3D Gaussian Splatting avatars in VR from HMD tracking alone via inverse kinematics and binocular batching for efficient stereo rendering, outperforming mesh baselines in performance and user ratings.
A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
TrioMan is a tri-module data augmentation framework using a Generator for pose/camera perturbations, a Refiner with one-step diffusion, and an Examiner with dual-branch attention to improve 3D avatar learning from monocular videos, claiming better results than prior methods on two benchmarks.
HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.
citing papers explorer
-
HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation
HumANDiff improves motion consistency in human video generation by sampling diffusion noise on an articulated human body template and adding joint appearance-motion prediction plus a geometric consistency loss.
-
Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining
Pretraining on 1M wild videos followed by post-training on curated data yields high-fidelity feedforward 3D avatars that generalize across identities, clothing, and lighting with emergent relightability and loose-garment support.
-
VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR
VRGaussianAvatar enables real-time full-body 3D Gaussian Splatting avatars in VR from HMD tracking alone via inverse kinematics and binocular batching for efficient stereo rendering, outperforming mesh baselines in performance and user ratings.
-
High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topology and preserved facial identity.
-
Visually-grounded Humanoid Agents
A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
-
Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos
TrioMan is a tri-module data augmentation framework using a Generator for pose/camera perturbations, a Refiner with one-step diffusion, and an Examiner with dual-branch attention to improve 3D avatar learning from monocular videos, claiming better results than prior methods on two benchmarks.
-
Human Interaction-Aware 3D Reconstruction from a Single Image
HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.