TextMesh4D: Zero-shot Text-to-4D Mesh Generation

Kai Xu; Sisi Dai; Xinxin Su

arxiv: 2506.24121 · v3 · pith:3QKHEEISnew · submitted 2025-06-30 · 💻 cs.CV

keywords: deformation, text-to-4d, zero-shot, generation, meshes, surface, consistency, dynamic

Large-scale, high-quality dynamic 3D (4D) assets are essential for learning physically grounded representations, but remain costly to capture and annotate at scale. This limits the viability of supervised 4D learning and motivates zero-shot text-to-4D generation leveraging pretrained diffusion priors. To model complex dynamics, prior methods typically adopt implicit 3D representations (e.g., NeRFs or 3DGS) for their deformation capacity. However, their implicit nature provides limited control over surface topology, which hinders high-fidelity geometry and makes temporally coherent surface reconstruction challenging. To address these limitations, we explore zero-shot text-to-4D mesh generation. However, a structural mismatch arises when combining diffusion-based guidance with topology-constrained meshes: the guidance is noisy and spatially inconsistent, while meshes impose severe topological constraints, making direct vertex-level deformation unstable. In this paper, we introduce TextMesh4D, the first zero-shot framework for text-to-4D that directly generates dynamic meshes by addressing the above challenge at two complementary levels. Geometrically, we shift deformation modeling from vertices to faces via a Jacobian Deformation Field (JDF), enabling topology-aware surface reconstruction through an integrability-enforcing integration formulation. Semantically, we propose a Local-Global Semantic Regularizer (LGSR) that preserves identity over time by jointly constraining local deformation plausibility and global shape consistency. Extensive experiments demonstrate state-of-the-art temporal consistency, structural fidelity, and visual quality, while remaining efficient on a single 24GB GPU.
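
To illustrate the idea of moving deformation from vertices to faces, here is a minimal sketch of how per-face Jacobians might be integrated back into vertex positions. It is not the paper's implementation: the function name `integrate_face_jacobians`, the uniform (unweighted) edge-based least-squares formulation, and the single pinned anchor vertex are all assumptions made for illustration. The point it shows is that arbitrary per-face Jacobians need not agree along shared edges, so a global solve picks the connected mesh whose edges best match the prescribed deformations.

```python
import numpy as np

def integrate_face_jacobians(verts, faces, jacobians, anchor=0):
    """Hypothetical helper: recover vertex positions from per-face 3x3 Jacobians.

    For every edge (a, b) of every face f, we ask that the new edge
    v'_b - v'_a equals J_f @ (v_b - v_a), and solve for all v' in the
    least-squares sense. One vertex is pinned to remove the translation
    ambiguity. This is a simplified, unweighted stand-in for a Poisson solve.
    """
    verts = np.asarray(verts, dtype=float)
    n = len(verts)
    rows, rhs = [], []
    for f, (i, j, k) in enumerate(faces):
        for a, b in ((i, j), (i, k), (j, k)):
            row = np.zeros(n)
            row[a], row[b] = -1.0, 1.0          # encodes v'_b - v'_a
            rows.append(row)
            rhs.append(jacobians[f] @ (verts[b] - verts[a]))
    # Pin the anchor vertex at its rest position (assumption for this sketch).
    pin = np.zeros(n)
    pin[anchor] = 1.0
    rows.append(pin)
    rhs.append(verts[anchor])
    A = np.array(rows)                           # (num_constraints, n)
    B = np.array(rhs)                            # (num_constraints, 3)
    new_verts, *_ = np.linalg.lstsq(A, B, rcond=None)
    return new_verts

# Toy mesh: a unit square split into two triangles, anchored at the origin.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = np.array([[0, 1, 2], [0, 2, 3]])

# Same Jacobian on every face: the field is integrable, so the solve is exact.
scale = 2.0 * np.eye(3)
scaled = integrate_face_jacobians(verts, faces, np.stack([scale, scale]))

# Different Jacobians per face: no exact solution exists in general, and the
# solve returns the closest connected mesh instead of tearing the surface.
mixed = integrate_face_jacobians(verts, faces, np.stack([scale, np.eye(3)]))
```

Because vertices are shared between faces, the output is always a single connected surface with the original connectivity, which is the property that makes face-level deformation easier to keep stable than moving vertices independently under noisy guidance.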

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Follow Your Track: Precise Skeleton Animation Controlled by 3D Trajectories
cs.CV 2026-06 unverdicted novelty 6.0

ACT is a trajectory-conditioned framework for topology-general skeletal animation that injects 3D point trajectories from monocular video into skeletons via a Routed Trajectory Injector for improved fidelity and tempo...