
arxiv: 2505.14521 · v3 · pith:73UMZ6SOnew · submitted 2025-05-20 · 💻 cs.CV

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling

classification 💻 cs.CV
keywords high-resolution, sparse, diffusion, latent, sparc3d, challenging, generation, meshes

High-fidelity 3D object synthesis remains significantly more challenging than 2D image generation due to the unstructured nature of mesh data and the cubic complexity of dense volumetric grids. Existing two-stage pipelines, which compress meshes with a VAE (using either 2D or 3D supervision) and then sample with latent diffusion, often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced in the VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation, Sparcubes, with a novel encoder, Sparconv-VAE. Sparcubes converts raw meshes into high-resolution ($1024^3$) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modeling through latent diffusion. Sparc3D achieves state-of-the-art reconstruction fidelity on challenging inputs, including open surfaces, disconnected components, and intricate geometry. It preserves fine-grained shape details, reduces training and inference cost, and integrates naturally with latent diffusion models for scalable, high-resolution 3D generation.
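
The abstract's contrast between dense volumetric grids and a sparse representation can be illustrated with a small sketch. The code below is not the Sparcubes construction: the sphere, the narrow-band width, and the function names `dense_voxel_count` and `sparse_sphere_sdf` are all assumptions chosen for illustration. It only shows why storing signed distance values near the surface grows roughly quadratically with resolution, while a dense grid grows cubically.

```python
import numpy as np


def dense_voxel_count(resolution: int) -> int:
    """Number of cells in a dense resolution^3 grid."""
    return resolution ** 3


def sparse_sphere_sdf(resolution: int, radius: float = 0.4, band_voxels: float = 1.5):
    """Return integer coordinates and SDF values of voxels near a sphere surface.

    Hypothetical helper: the sphere and the narrow band are illustrative
    assumptions, not the representation used in the paper.
    """
    # Voxel centers spanning the cube [-0.5, 0.5]^3.
    axis = (np.arange(resolution) + 0.5) / resolution - 0.5
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")

    # Signed distance to the sphere: negative inside, positive outside.
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius

    # Keep only voxels within a few voxel widths of the surface.
    mask = np.abs(sdf) < band_voxels / resolution
    coords = np.argwhere(mask)   # (N, 3) integer voxel indices
    values = sdf[mask]           # (N,) signed distances, same ordering as coords
    return coords, values


if __name__ == "__main__":
    for res in (32, 64):
        coords, values = sparse_sphere_sdf(res)
        print(f"res={res}: dense={dense_voxel_count(res)}, sparse={len(coords)}")
    # A dense 1024^3 grid has 2^30 cells, which this sketch never allocates.
    print("dense cells at 1024^3:", dense_voxel_count(1024))
```

Doubling the resolution multiplies the dense cell count by eight, whereas the number of near-surface voxels in this toy example grows by a factor closer to four.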

This paper has not been read by Pith yet.


Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

    cs.CR 2026-05 conditional novelty 8.0

    Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-...

  2. PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

    cs.GR 2026-06 unverdicted novelty 7.0

    PolyFlow converts discrete meshes to continuous per-vertex representations using a topology embedder and applies flow matching for parallel artist-style mesh generation that outperforms autoregressive baselines on Toy...

  3. CelloCut: Constructive Watertight Remeshing via Tetrahedral Cell Cuts

    cs.GR 2026-05 unverdicted novelty 7.0

    CelloCut formulates watertight remeshing as binary labeling on a Delaunay tetrahedral partition solved by graph-cut minimization with one-sided constraints to guarantee volumetrically consistent solids.

  4. Velocity-Space 3D Asset Editing

    cs.GR 2026-05 unverdicted novelty 7.0

    VS3D performs local 3D asset editing by injecting reconstruction-anchored source signals, partial-mean guidance, and twin-agreement residuals into the velocity sampler to control edit strength and preserve identity.

  5. Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation

    cs.CV 2026-05 unverdicted novelty 7.0

    Mix3R mixes feed-forward reconstruction and generative 3D priors via Mixture-of-Transformers and overlap-based attention bias to achieve better-aligned 3D shapes and more accurate poses than either approach alone.

  6. SVG360: Editable Multiview Vector Graphics from a Single SVG

    cs.CV 2025-11 unverdicted novelty 7.0

    SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.

  7. Mesh BDF: Barycentric Dominance Field for 3D Native Mesh Generation

    cs.CV 2026-06 unverdicted novelty 6.0

    Barycentric Dominance Field converts discrete mesh connectivity into a continuous surface signal that diffusion models can use directly for higher-quality native 3D mesh generation.

  8. FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation

    cs.CV 2026-06 unverdicted novelty 6.0

    FLUX3D introduces Diffusion-Aligned Structured Latents (DA-SLAT) and Sparse-structure Multimodal Diffusion Transformer (SMDiT) with MARoPE to address representation and alignment bottlenecks in sparse-voxel 3DGS generation.

  9. TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

    cs.CV 2026-05 unverdicted novelty 6.0

    TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.

  10. Pixal3D: Pixel-Aligned 3D Generation from Images

    cs.CV 2026-05 unverdicted novelty 6.0

    Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.

  11. Generative 3D Gaussians with Learned Density Control

    cs.GR 2026-05 unverdicted novelty 6.0

    DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.

  12. DVD: Discrete Voxel Diffusion for 3D Generation and Editing

    cs.CV 2026-05 unverdicted novelty 6.0

    DVD applies discrete diffusion directly to voxel occupancy for 3D generation, uncertainty estimation via entropy, and single-round editing via block perturbation fine-tuning.

  13. DVD: Discrete Voxel Diffusion for 3D Generation and Editing

    cs.CV 2026-05 unverdicted novelty 6.0

    DVD treats voxel occupancy as a discrete variable in a diffusion framework to generate, assess, and edit sparse 3D voxels without continuous thresholding.

  14. High-Fidelity Single-Image Head Modeling with Industry-Grade Topology

    cs.CV 2026-05 unverdicted novelty 6.0

    A single-image head reconstruction method uses coarse-to-fine optimization with normal consistency, landmarks, and geometry-aware constraints on curvature and conformality to produce meshes with industry-grade topolog...

  15. MeshReGen: A Unified 3D Geometry Regeneration Framework

    cs.CV 2026-04 unverdicted novelty 6.0

    MeshReGen introduces a conditioned 3D geometry regenerator with VecSet that learns a regeneration prior via self-supervision and reports state-of-the-art results on controllable generation tasks.

  16. MeshReGen: A Unified 3D Geometry Regeneration Framework

    cs.CV 2026-04 unverdicted novelty 6.0

    3D-ReGen is a conditioned 3D regenerator using VecSet that learns a regeneration prior from unlabeled 3D datasets via self-supervised tasks and achieves state-of-the-art results on controllable 3D geometry tasks.

  17. LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

    cs.CV 2026-04 conditional novelty 6.0

    LSRM scales transformer context windows with native sparse attention and geometric routing to deliver high-fidelity feed-forward 3D reconstruction and inverse rendering that approaches dense optimization quality.

  18. UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.

  19. Native and Compact Structured Latents for 3D Generation

    cs.CV 2025-12 unverdicted novelty 6.0

    Introduces O-Voxel omni-voxel representation and Sparse Compression VAE for structured native 3D latents, enabling efficient training of large flow-matching models that produce higher-quality geometry and materials th...

  20. MM-TRELLIS: Point-Cloud Guided Multi-Modal 3D Vehicle Generation in Autonomous Driving

    cs.CV 2026-06 unverdicted novelty 5.0

    MM-TRELLIS extends TRELLIS with LiDAR point-cloud guidance and multi-view image conditioning plus voxel filtering to generate high-fidelity 3D vehicle meshes from in-the-wild driving data.

  21. MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

    cs.CV 2026-06 unverdicted novelty 5.0

    MeshWeaver uses sparse-voxel guidance for autoregressive surface weaving to achieve 18% compression and generate up to 16K-face meshes with improved fidelity.

  22. SuperVoxelGPT: Adaptive and Ordered 3D Tokenization for Autoregressive Shape Generation

    cs.CV 2026-05 unverdicted novelty 5.0

    SuperVoxelGPT creates shape-adaptive, deterministically ordered supervoxel tokens via saliency-guided CVT, cutting sequence length to 12.8% of uniform voxels while claiming SOTA quality and 10x speedup on Trellis-500K.

  23. WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes

    cs.CV 2026-05 unverdicted novelty 5.0

    WorldAct activates monolithic 3D worlds into interactive scenes via multimodal agent-guided decomposition, geometrically aligned mesh reconstruction, and 3D inpainting.

  24. Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.

  25. CG-MLLM: Captioning and Generating 3D content via Multi-modal Large Language Models

    cs.CV 2026-01 unverdicted novelty 5.0

    CG-MLLM is a multimodal LLM using a Mixture-of-Transformer architecture with separate TokenAR and BlockAR components integrated with a pre-trained vision-language backbone and 3D VAE to enable 3D captioning and high-f...

  26. DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation

    cs.CV 2025-09 unverdicted novelty 5.0

    LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.