Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

2026 · cs.CV · arXiv 2604.21592

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demands. We present Sculpt4D, a native 4D generative framework that integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design models complex spatiotemporal dependencies with high fidelity while sidestepping the quadratic overhead of full attention and reducing the network's total computation by 56%. Consequently, Sculpt4D establishes a new state of the art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
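
The abstract does not spell out how the mask is built, so the following is only an illustrative sketch of one way a first-frame-anchored, time-decaying block mask could look. Everything in it is an assumption rather than the paper's method: the function names `frame_block_mask` and `density`, the geometric `decay` parameter, and the rule for choosing which key blocks survive are all hypothetical.

```python
import math


def frame_block_mask(num_frames, blocks_per_frame, decay=0.5):
    """Build a boolean block mask; mask[q][k] is True if query block q may attend to key block k.

    Hypothetical rule (not taken from the paper): frame 0 and the query's own
    frame stay dense, while the number of key blocks kept in any other frame
    shrinks geometrically with temporal distance.
    """
    n = num_frames * blocks_per_frame
    mask = [[False] * n for _ in range(n)]
    for qf in range(num_frames):
        for kf in range(num_frames):
            dist = abs(qf - kf)
            if kf == 0 or dist == 0:
                # Anchor frame and the query's own frame: keep every block.
                keep = blocks_per_frame
            else:
                # Other frames: keep fewer blocks the further away they are.
                keep = max(1, math.ceil(blocks_per_frame * decay ** dist))
            for qb in range(blocks_per_frame):
                for offset in range(keep):
                    # Assumed selection rule: the `keep` blocks starting at the
                    # query's own block index, wrapping around the frame.
                    kb = (qb + offset) % blocks_per_frame
                    mask[qf * blocks_per_frame + qb][kf * blocks_per_frame + kb] = True
    return mask


def density(mask):
    """Fraction of (query block, key block) pairs that the mask keeps."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


mask = frame_block_mask(num_frames=4, blocks_per_frame=4, decay=0.5)
print(f"kept block pairs: {density(mask):.2%}")
```

The sketch only constructs the mask. Any compute saving would come from a block-sparse attention kernel that skips the masked-out blocks, and the density of this toy mask is not meant to reproduce the 56% figure reported in the abstract.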

representative citing papers

MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

MORPHOS introduces an autoregressive 4D generation method with Temporal Structured Latents (T-SLAT) that produces dynamic 3D assets from videos while handling topological changes and long sequences.

Helix4D: Complex 4D Mesh Generation

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Helix4D generates high-quality dynamic 4D meshes from videos by extending Trellis2 with sliding-window cross-frame attention anchored on the first frame and a repurposed 4D temporal encoding.
