{"paper":{"title":"R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A learned rectification jump offset aligns arbitrary input mesh poses to video starting frames before animation.","cross_cats":["cs.GR","cs.LG"],"primary_cat":"cs.CV","authors_text":"Chunchao Guo, Lixin Xu, Puhua Jiang, Sicong Liu, Xiang Bai, Zijie Wu","submitted_at":"2026-05-13T17:58:13Z","abstract_excerpt":"Video-guided 3D animation holds immense potential for content creation, offering intuitive and precise control over dynamic assets. However, practical deployment faces a critical yet frequently overlooked hurdle: the pose misalignment dilemma. In real-world scenarios, the initial pose of a user-provided static mesh rarely aligns with the starting frame of a reference video. Naively forcing a mesh to follow a mismatched trajectory inevitably leads to severe geometric distortion or animation failure. To address this, we present Rectified Dynamic Mesh (R-DMesh), a unified framework designed to ge"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our method introduces a novel VAE that explicitly disentangles the input into a conditional base mesh, relative motion trajectories, and a crucial rectification jump offset. This offset is learned to automatically transform the arbitrary pose of the input mesh to match the video's initial state before animation begins.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a learned rectification jump offset can reliably map arbitrary input mesh poses onto the video's starting frame without introducing geometric distortion or breaking downstream physical consistency enforced by Triflow Attention.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A learned rectification jump offset aligns arbitrary input mesh poses to video starting frames before animation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"4423c62138ace08262e96711928fcebcf954d4a1effc7429cc6c6f700862d4c5"},"source":{"id":"2605.13838","kind":"arxiv","version":2},"verdict":{"id":"97057716-2bdd-4895-acdf-7a0aa8e700ca","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T06:05:18.661947Z","strongest_claim":"our method introduces a novel VAE that explicitly disentangles the input into a conditional base mesh, relative motion trajectories, and a crucial rectification jump offset. This offset is learned to automatically transform the arbitrary pose of the input mesh to match the video's initial state before animation begins.","one_line_summary":"R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a learned rectification jump offset can reliably map arbitrary input mesh poses onto the video's starting frame without introducing geometric distortion or breaking downstream physical consistency enforced by Triflow Attention.","pith_extraction_headline":"A learned rectification jump offset aligns arbitrary input mesh poses to video starting frames before animation."},"references":{"count":208,"sample":[{"doi":"10.1145/1188913.1188915","year":2007,"title":"Abril and Robert Plant","work_id":"288e32b0-ff7e-4036-8340-8e39e290d6fc","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1145/1219092.1219093","year":2007,"title":"Deciding equivalances among conjunctive aggregate queries","work_id":"d71462fa-fb30-40aa-a394-bc36f1e31240","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1996,"title":"Special issue: Digital Libraries. 1996","work_id":"023f03e9-ea8e-4c33-bf7c-8ed5ae78008e","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2001,"title":"Understanding Policy-Based Networking","work_id":"dd882f77-9db1-42d5-b9fe-fd2758b72e46","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1007/3-540-09237-4","year":2008,"title":"Editor (Ed.), title The title of book two , The name of the series two, edition 2nd","work_id":"d8ed6381-22ac-4b2e-9bb2-a87be605a09a","ref_index":7,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":208,"snapshot_sha256":"923fc444cd82b865caa790d7e520a31303afd31be62d69f3b52fbf65f23555a4","internal_anchors":27},"formal_canon":{"evidence_count":1,"snapshot_sha256":"ffa54078aba625e17f6cd5ad341be72cb06aca0ad31ea40b654d6d00389419ea"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}