Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Momask: Generative masked modeling of 3d human motions , author=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

cs.CV · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

AnyAct: Towards Human Reenactment of Character Motion From Video

cs.CV · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

AnyAct generates editable human reenactments from character videos via conditional motion generation from transferable sparse local 2D articulated cues, with designs for human-only supervision and global-local decoupling.

Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

Introduces dual pose-image representation, cross-modal alignment, and iterative construction to improve prompt alignment and diversity in multi-person text-to-image generation.

citing papers explorer

Showing 3 of 3 citing papers.

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow cs.CV · 2026-05-13 · unverdicted · none · ref 195 · 2 links
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
AnyAct: Towards Human Reenactment of Character Motion From Video cs.CV · 2026-05-15 · unverdicted · none · ref 23 · 2 links
AnyAct generates editable human reenactments from character videos via conditional motion generation from transferable sparse local 2D articulated cues, with designs for human-only supervision and global-local decoupling.
Composing People Together: Iterative Pose-Image Generation for Multi-Person Interaction Scenes cs.CV · 2026-05-22 · unverdicted · none · ref 36
Introduces dual pose-image representation, cross-modal alignment, and iterative construction to improve prompt alignment and diversity in multi-person text-to-image generation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

fields

years

verdicts

representative citing papers

citing papers explorer