Stacked hourglass networks for human pose estimation
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
  MoCapAnything reconstructs asset-specific BVH animations from monocular video by predicting 3D joint trajectories then applying constraint-aware inverse kinematics guided by a reference prompt encoder.
- 2D Pre-Training for 3D Pose Estimation
  2D pre-training for 3D human pose estimation yields lower error and higher efficiency than 3D-only training, reaching MPJPE below 64.5 mm on standard benchmarks.