Native and Compact Structured Latents for 3D Generation

Canonical reference. 80% of citing Pith papers cite this work as background.

28 Pith papers citing it
Background 80% of classified citations
abstract

Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with complex topologies and detailed appearance. This paper presents an approach for learning a structured latent representation from native 3D data to address this challenge. At its core is a new sparse voxel structure called O-Voxel, an omni-voxel representation that encodes both geometry and appearance. O-Voxel can robustly model arbitrary topology, including open, non-manifold, and fully-enclosed surfaces, while capturing comprehensive surface attributes beyond texture color, such as physically-based rendering parameters. Based on O-Voxel, we design a Sparse Compression VAE which provides a high spatial compression rate and a compact latent space. We train large-scale flow-matching models comprising 4B parameters for 3D generation using diverse public 3D asset datasets. Despite their scale, inference remains highly efficient. Meanwhile, the geometry and material quality of our generated assets far exceed those of existing models. We believe our approach offers a significant advancement in 3D generative modeling.

citation-role summary

background 8 · method 2

years

2026 · 28

representative citing papers

Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

cs.GR · 2026-05-13 · unverdicted · novelty 8.0

Rigel3D jointly generates rigged 3D meshes with geometry, skeleton topology, joint positions, and skinning weights using coupled surface and skeleton latent representations for image-conditioned animation-ready asset synthesis.

Count Anything at Any Granularity

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset generated via 3D synthesis and editing, and HieraCount model that combines text and visual exemplars for improved accuracy.

Velocity-Space 3D Asset Editing

cs.GR · 2026-05-08 · unverdicted · novelty 7.0

VS3D performs local 3D asset editing by injecting reconstruction-anchored source signals, partial-mean guidance, and twin-agreement residuals into the velocity sampler to control edit strength and preserve identity.

Helix4D: Complex 4D Mesh Generation

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Helix4D generates high-quality dynamic 4D meshes from videos by extending Trellis2 with sliding-window cross-frame attention anchored on the first frame and a repurposed 4D temporal encoding.

Pixal3D: Pixel-Aligned 3D Generation from Images

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.

Generative 3D Gaussians with Learned Density Control

cs.GR · 2026-05-08 · unverdicted · novelty 6.0

DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.

AssetGen: Deployable 3D Asset Generation at Interactive Speed

cs.GR · 2026-05-22 · unverdicted · novelty 5.0

AssetGen is a system that produces deployable 3D assets including meshes, baked normals, and textures from a single reference image in under 30 seconds via a coarse-to-refine VecSet pipeline and co-designed optimizations.

CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

CMAG combines 3D concept scaffolding, prompt decomposition, taxonomy routing, hybrid retrieval, and agentic VLM verification to assemble topologically consistent avatars from catalog assets given free-form text prompts.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

citing papers explorer

Showing 24 of 24 citing papers after filters.

  • Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation cs.GR · 2026-05-13 · unverdicted · none · ref 11 · internal anchor

    Rigel3D jointly generates rigged 3D meshes with geometry, skeleton topology, joint positions, and skinning weights using coupled surface and skeleton latent representations for image-conditioned animation-ready asset synthesis.

  • Feedforward 3D Editing Learns from Semantic-Part Transformation cs.CV · 2026-05-26 · unverdicted · none · ref 7 · 2 links · internal anchor

    Pxform provides 100K semantic-part 3D edit pairs; PartFlow uses them to deliver feedforward 3D editing with improved fidelity and preservation over prior methods.

  • GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction cs.CV · 2026-05-22 · unverdicted · none · ref 25 · internal anchor

    GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

  • Count Anything at Any Granularity cs.CV · 2026-05-11 · unverdicted · none · ref 74 · internal anchor

    Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset generated via 3D synthesis and editing, and HieraCount model that combines text and visual exemplars for improved accuracy.

  • Velocity-Space 3D Asset Editing cs.GR · 2026-05-08 · unverdicted · none · ref 7 · internal anchor

    VS3D performs local 3D asset editing by injecting reconstruction-anchored source signals, partial-mean guidance, and twin-agreement residuals into the velocity sampler to control edit strength and preserve identity.

  • Geometrically Consistent Multi-View Scene Generation from Freehand Sketches cs.CV · 2026-04-15 · unverdicted · none · ref 51 · internal anchor

    A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

  • Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale cs.CV · 2026-04-13 · unverdicted · none · ref 77 · internal anchor

    A 3D-grounded autoencoder and diffusion transformer allow direct generation of 3D scenes in an implicit latent space using a fixed 1K-token representation for arbitrary views and resolutions.

  • Helix4D: Complex 4D Mesh Generation cs.CV · 2026-05-25 · unverdicted · none · ref 30 · internal anchor

    Helix4D generates high-quality dynamic 4D meshes from videos by extending Trellis2 with sliding-window cross-frame attention anchored on the first frame and a repurposed 4D temporal encoding.

  • PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects cs.CV · 2026-05-20 · unverdicted · none · ref 6 · internal anchor

    PhysX-Omni unifies simulation-ready 3D asset generation across rigid, deformable, and articulated objects via a new geometry representation, the PhysXVerse dataset, and the PhysX-Bench evaluation suite.

  • Stream3D: Sequential Multi-View 3D Generation via Evidential Memory cs.CV · 2026-05-20 · unverdicted · none · ref 78 · 2 links · internal anchor

    Stream3D is a training-free method that maintains a fixed-size evidential memory of past frames to convert frozen view-conditioned 3D generators into consistent streaming generators.

  • ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation cs.CV · 2026-05-20 · unverdicted · none · ref 62 · internal anchor

    ROAR-3D adds a token-wise view router and dual-stream attention to pretrained single-view 3D generators so they can use arbitrary unposed images for higher-fidelity output.

  • Pixal3D: Pixel-Aligned 3D Generation from Images cs.CV · 2026-05-11 · unverdicted · none · ref 12 · internal anchor

    Pixal3D performs pixel-aligned 3D generation from images via back-projected multi-scale feature volumes, achieving fidelity close to reconstruction while supporting multi-view and scene synthesis.

  • Generative 3D Gaussians with Learned Density Control cs.GR · 2026-05-08 · unverdicted · none · ref 62 · internal anchor

    DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.

  • VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching cs.CV · 2026-05-29 · unverdicted · none · ref 84 · internal anchor

    VolFill uses a hybrid 3D VAE to compress sparse truncated unsigned distance function grids into latent space and a latent Diffusion Transformer to denoise complete scenes, conditioned on geometry foundation models, outperforming baselines on SCRREAM and NRGB-D datasets.

  • SuperVoxelGPT: Adaptive and Ordered 3D Tokenization for Autoregressive Shape Generation cs.CV · 2026-05-28 · unverdicted · none · ref 41 · internal anchor

    SuperVoxelGPT creates shape-adaptive, deterministically ordered supervoxel tokens via saliency-guided CVT, cutting sequence length to 12.8% of uniform voxels while claiming SOTA quality and 10x speedup on Trellis-500K.

  • AssetGen: Deployable 3D Asset Generation at Interactive Speed cs.GR · 2026-05-22 · unverdicted · none · ref 23 · internal anchor

    AssetGen is a system that produces deployable 3D assets including meshes, baked normals, and textures from a single reference image in under 30 seconds via a coarse-to-refine VecSet pipeline and co-designed optimizations.

  • CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation cs.CV · 2026-05-18 · unverdicted · none · ref 17 · internal anchor

    CMAG combines 3D concept scaffolding, prompt decomposition, taxonomy routing, hybrid retrieval, and agentic VLM verification to assemble topologically consistent avatars from catalog assets given free-form text prompts.

  • EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers cs.CV · 2026-05-16 · unverdicted · none · ref 64 · internal anchor

    EVA01 introduces a Mixture-of-Transformers model that natively adds 3D mesh understanding, generation, and multi-turn editing to MLLMs by decoupling understanding and generation experts with shared global self-attention.

  • Pose Tracking with a Foundation Pose Model and an Ensemble Directional Kalman Filter cs.LG · 2026-05-04 · unverdicted · none · ref 30 · internal anchor

    EnDKF combines ensemble Kalman filtering with directional statistics and unit quaternions to achieve lower pose tracking error than raw measurements in synthetic constant-velocity tests and FoundationPose-based head tracking.

  • From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation cs.GR · 2026-04-26 · unverdicted · none · ref 34 · 2 links · internal anchor

    The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.

  • Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation cs.CV · 2026-04-20 · unverdicted · none · ref 38 · internal anchor

    Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

  • Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation cs.CV · 2026-04-10 · unverdicted · none · ref 49 · internal anchor

    Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.

  • Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation cs.GR · 2026-04-22 · unverdicted · none · ref 21 · internal anchor

    Seed3D 2.0 advances 3D content generation via a coarse-to-fine geometry pipeline, unified PBR material model, and simulation-ready scene tools, reporting 69-89.9% win rates over commercial systems in human studies.

  • 3D Generation for Embodied AI and Robotic Simulation: A Survey cs.RO · 2026-04-29 · unverdicted · none · ref 102 · 3 links · internal anchor

    The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and transfer.