hub Mixed citations

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang · 2024 · cs.CV · arXiv 2412.01506

Mixed citation behavior. Most common role is background (40%).

39 Pith papers citing it

Background 40% of classified citations

open full Pith review browse 39 citing papers arXiv PDF

abstract

We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. We showcase flexible output format selection and local 3D editing capabilities which were not offered by previous models. Code, model, and data will be released.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 2 dataset 2 method 2

citation-polarity summary

background 4 baseline 2 use dataset 2 use method 2

representative citing papers

Garment Particles: A 2D--3D Symmetric Garment Representation for Generation and Editing

cs.GR · 2026-05-25 · unverdicted · novelty 7.0

Garment Particles is a 5D point cloud representation jointly encoding 2D sewing patterns and 3D geometry, supporting rectified flow generation from high-level inputs and diffusion-based editing of patterns or shapes.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

A conditional diffusion model using proprioception and multi-contact touch produces metric-scale, physically consistent 3D object reconstructions under hand occlusion.

SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.

MeshTailor: Cutting Seams via Generative Mesh Traversal

cs.GR · 2026-03-28 · unverdicted · novelty 7.0

MeshTailor is a mesh-native generative model that uses ChainingSeams serialization and a dual-stream transformer with pointer layers to trace coherent seams vertex-by-vertex on 3D surfaces.

ATATA: One Algorithm to Align Them All

cs.CV · 2026-01-16 · unverdicted · novelty 7.0

ATATA enables fast joint inference of structurally aligned pairs using Rectified Flow models via segment transport, improving state-of-the-art for image and video generation while matching 3D quality at much higher speed.

Affostruction: 3D Affordance Grounding with Generative Reconstruction

cs.CV · 2026-01-14 · unverdicted · novelty 7.0

Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.

Voxify3D: Pixel Art Meets Volumetric Rendering

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

SVG360: Editable Multiview Vector Graphics from a Single SVG

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.

GenHSI: Controllable Generation of Human-Scene Interaction Videos

cs.CV · 2025-06-24 · unverdicted · novelty 7.0

GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.

Lighting-Consistent Object Transfer Across Radiance Fields

cs.GR · 2026-06-21 · unverdicted · novelty 6.0

Diffusion-based per-view harmonization for lighting-consistent object transfer between 3DGS scenes, using heterogeneous training data and final 3D consolidation.

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

PhyGenHOI couples a motion diffusion model for humans with material point method simulation for objects on 3D Gaussians, using attraction loss, contact re-simulation, and masked video-SDS to produce physically consistent dynamic interactions from text.

PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

PhysX-Omni unifies simulation-ready 3D asset generation across rigid, deformable, and articulated objects via a new geometry representation, the PhysXVerse dataset, and the PhysX-Bench evaluation suite.

ROAR-3D: Routing Arbitrary Views for High-Fidelity 3D Generation

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

ROAR-3D adds a token-wise view router and dual-stream attention to pretrained single-view 3D generators so they can use arbitrary unposed images for higher-fidelity output.

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

PhysForge generates physics-grounded 3D assets via a VLM-planned Hierarchical Physical Blueprint and a KineVoxel Injection diffusion model, backed by the new PhysDB dataset of 150,000 annotated assets.

Velox: Learning Representations of 4D Geometry and Appearance

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.

MeshReGen: A Unified 3D Geometry Regeneration Framework

cs.CV · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

MeshReGen introduces a conditioned 3D geometry regenerator with VecSet that learns a regeneration prior via self-supervision and reports state-of-the-art results on controllable generation tasks.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

Pair2Scene: Learning Local Object Relations for Procedural Scene Generation

cs.CV · 2026-04-13 · unverdicted · novelty 6.0 · 2 refs

Pair2Scene generates complex 3D scenes beyond training data by training a network on local object-pair placement rules and applying them recursively with collision-aware sampling.

WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

cs.CV · 2026-04-12 · unverdicted · novelty 6.0

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.

UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

UniRecGen unifies reconstruction and generation via shared canonical space and disentangled cooperative learning to produce complete, consistent 3D models from sparse views.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Voxify3D: Pixel Art Meets Volumetric Rendering cs.CV · 2025-12-08 · unverdicted · none · ref 103 · internal anchor
Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.
Depth Anything 3: Recovering the Visual Space from Any Views cs.CV · 2025-11-13 · unverdicted · none · ref 104 · internal anchor
DA3 recovers consistent visual geometry from arbitrary views via a vanilla DINO transformer and depth-ray target, setting new SOTA on a visual geometry benchmark while outperforming DA2 on monocular depth.

Structured 3D Latents for Scalable and Versatile 3D Generation

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer