Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
read the original abstract
High-fidelity 3D object synthesis remains significantly more challenging than 2D image generation due to the unstructured nature of mesh data and the cubic complexity of dense volumetric grids. Existing two-stage pipelines-compressing meshes with a VAE (using either 2D or 3D supervision), followed by latent diffusion sampling-often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced in VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation Sparcubes with a novel encoder Sparconv-VAE. Sparcubes converts raw meshes into high-resolution ($1024^3$) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modeling through latent diffusion. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry. It preserves fine-grained shape details, reduces training and inference cost, and integrates naturally with latent diffusion models for scalable, high-resolution 3D generation.
This paper has not been read by Pith yet.
Forward citations
Cited by 26 Pith papers
-
On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models
Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-...
-
PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation
PolyFlow converts discrete meshes to continuous per-vertex representations using a topology embedder and applies flow matching for parallel artist-style mesh generation that outperforms autoregressive baselines on Toy...
-
CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts
CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.
-
Velocity-Space 3D Asset Editing
VS3D performs local 3D asset editing by injecting reconstruction-anchored source signals, partial-mean guidance, and twin-agreement residuals into the velocity sampler to control edit strength and preserve identity.
-
Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation
Mix3R mixes feed-forward reconstruction and generative 3D priors via Mixture-of-Transformers and overlap-based attention bias to achieve better-aligned 3D shapes and more accurate poses than either approach alone.
-
SVG360: Editable Multiview Vector Graphics from a Single SVG
SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.
-
Mesh BDF: Barycentric Dominance Field for 3D Native Mesh Generation
Barycentric Dominance Field converts discrete mesh connectivity into a continuous surface signal that diffusion models can use directly for higher-quality native 3D mesh generation.
-
FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation
FLUX3D introduces Diffusion-Aligned Structured Latents (DA-SLAT) and Sparse-structure Multimodal Diffusion Transformer (SMDiT) with MARoPE to address representation and alignment bottlenecks in sparse-voxel 3DGS generation.
-
TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.
-
Pixal3D: Pixel-Aligned 3D Generation from Images
Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.
-
Generative 3D Gaussians with Learned Density Control
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
-
DVD: Discrete Voxel Diffusion for 3D Generation and Editing
DVD applies discrete diffusion directly to voxel occupancy for 3D generation, uncertainty estimation via entropy, and single-round editing via block perturbation fine-tuning.
-
DVD: Discrete Voxel Diffusion for 3D Generation and Editing
DVD treats voxel occupancy as a discrete variable in a diffusion framework to generate, assess, and edit sparse 3D voxels without continuous thresholding.
-
High-Fidelity Single-Image Head Modeling with Industry-Grade Topology
A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topolog...
-
MeshReGen: A Unified 3D Geometry Regeneration Framework
MeshReGen introduces a conditioned 3D geometry regenerator with VecSet that learns a regeneration prior via self-supervision and reports state-of-the-art results on controllable generation tasks.
-
MeshReGen: A Unified 3D Geometry Regeneration Framework
3D-ReGen is a conditioned 3D regenerator using VecSet that learns a regeneration prior from unlabeled 3D datasets via self-supervised tasks and achieves state-of-the-art results on controllable 3D geometry tasks.
-
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.
-
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.
-
Native and Compact Structured Latents for 3D Generation
Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials th...
-
MM-TRELLIS: Point-Cloud Guided Multi-Modal 3D Vehicle Generation in Autonomous Driving
MM-TRELLIS extends TRELLIS with LiDAR point-cloud guidance and multi-view image conditioning plus voxel filtering to generate high-fidelity 3D vehicle meshes from in-the-wild driving data.
-
MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
MeshWeaver uses sparse-voxel guidance for autoregressive surface weaving to achieve 18% compression and generate up to 16K-face meshes with improved fidelity.
-
SuperVoxelGPT: Adaptive and Ordered 3D Tokenization for Autoregressive Shape Generation
SuperVoxelGPT creates shape-adaptive, deterministically ordered supervoxel tokens via saliency-guided CVT, cutting sequence length to 12.8% of uniform voxels while claiming SOTA quality and 10x speedup on Trellis-500K.
-
WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes
WorldAct activates monolithic 3D worlds into interactive scenes via multimodal agent-guided decomposition, geometrically aligned mesh reconstruction, and 3D inpainting.
-
Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation
Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.
-
CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models
CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-f...
-
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.