{"paper":{"title":"PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"PhyMotion scores recovered 3D human meshes in a physics simulator to reward realistic motion in generated videos.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Dong-Ki Kim, Han Lin, Jaehong Yoon, Jaemin Cho, Mohit Bansal, Shayegan Omidshafiei, Yidong Huang, Yue Zhang, Zun Wang","submitted_at":"2026-05-14T02:12:13Z","abstract_excerpt":"Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. 
To address this, we propos"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That SMPL mesh recovery from generated videos is sufficiently accurate and that retargeting those meshes into MuJoCo faithfully captures the physical violations that matter to human viewers.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"PhyMotion scores generated human videos by grounding recovered 3D poses in a physics simulator across kinematic, contact, and dynamic axes, yielding stronger human correlation and larger RL post-training gains than prior 2D rewards.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"PhyMotion scores recovered 3D human meshes in a physics simulator to reward realistic motion in generated videos.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"76fc71d5ab934f433dc110564896fd240ba597b0a53eb81a559273f9b2f5aec0"},"source":{"id":"2605.14269","kind":"arxiv","version":1},"verdict":{"id":"906e2181-3369-4e1a-92ad-b00a732479d5","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:46:13.090754Z","strongest_claim":"Optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo 
gain).","one_line_summary":"PhyMotion scores generated human videos by grounding recovered 3D poses in a physics simulator across kinematic, contact, and dynamic axes, yielding stronger human correlation and larger RL post-training gains than prior 2D rewards.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That SMPL mesh recovery from generated videos is sufficiently accurate and that retargeting those meshes into MuJoCo faithfully captures the physical violations that matter to human viewers.","pith_extraction_headline":"PhyMotion scores recovered 3D human meshes in a physics simulator to reward realistic motion in generated videos."},"references":{"count":26,"sample":[{"doi":"","year":null,"title":"Onestory: Coherent multi-shot video generation with adaptive memory.CVPR, 2026a","work_id":"ee10761b-dd39-441e-9b68-159d0bbbf0c0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Videojam: Joint appearance-motion representations for enhanced motion generation in video models","work_id":"d22ef704-e6df-4caf-a9b3-f220ad768f8b","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","work_id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","ref_index":3,"cited_arxiv_id":"2506.09113","is_internal_anchor":true},{"doi":"","year":null,"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","ref_index":4,"cited_arxiv_id":"2501.12948","is_internal_anchor":true},{"doi":"","year":null,"title":"GARDO: Reinforcing diffusion models without reward 
hacking","work_id":"0f625cc6-9e6f-4185-9c68-eca92e12284a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":26,"snapshot_sha256":"a6fd45e36fcda394731252556139a8ca983c54f2facec3936e2ebca4662b51ef","internal_anchors":7},"formal_canon":{"evidence_count":2,"snapshot_sha256":"3c418c998b80d792f47145babb12428e273e4c73228fb6de04fe8a93b41ded53"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}