Mixed citations

Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang · 2024 · cs.CV · arXiv 2412.01506

Mixed citation behavior. Most common role is background (40%).

55 Pith papers cite this work.



abstract

We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. We showcase flexible output format selection and local 3D editing capabilities which were not offered by previous models. Code, model, and data will be released.
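The abstract names rectified flow transformers as the generative models. As a rough illustration of what sampling from a rectified flow involves, the sketch below Euler-integrates a velocity field from pure noise at t = 1 to data at t = 0. It assumes the common convention x_t = (1 - t)·x_0 + t·ε (so the velocity target is ε - x_0), works on plain Python lists instead of sparse structured latents, and replaces the trained transformer with a hypothetical `make_oracle_velocity` helper that returns the exact velocity for a dataset consisting of a single point. None of these names come from the paper or its released code.

```python
from typing import Callable, List

Vector = List[float]


def rectified_flow_sample(
    velocity: Callable[[Vector, float], Vector],
    noise: Vector,
    num_steps: int = 50,
) -> Vector:
    """Euler-integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0 (data).

    Assumed convention: x_t = (1 - t) * x_0 + t * eps, so v = eps - x_0
    and sampling walks t downward.
    """
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt  # current time; never reaches 0 inside the loop
        v = velocity(x, t)
        # Euler step toward t - dt: x_{t-dt} = x_t - dt * v(x_t, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for a trained flow transformer.

    Returns the exact velocity (x_t - x_0) / t when the data
    distribution is the single point `target`.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(xi - ti) / t for xi, ti in zip(x, target)]
    return velocity


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]
    sample = rectified_flow_sample(
        make_oracle_velocity(target), noise=[1.3, 0.2, -0.7], num_steps=10
    )
    print(sample)  # approximately [0.5, -1.0, 2.0]
```

Because the oracle velocity is constant along each straight noise-to-data path, the Euler steps land on the target up to floating-point error. A learned velocity field only approximates this, so in practice the number of steps trades sampling speed against accuracy.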


citation-role summary

background 4 · baseline 2 · dataset 2 · method 2
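The headline figure of 40% is consistent with background's share of the 10 classified citations counted in this summary (4 of 10), not of all 55 citing papers. The minimal sketch below reproduces that arithmetic; it assumes the percentage is simply each role's count divided by the total number of classified citations, and the `role_shares` helper is hypothetical.

```python
from collections import Counter

# Role counts as listed in the citation-role summary on this page.
role_counts = Counter({"background": 4, "baseline": 2, "dataset": 2, "method": 2})


def role_shares(counts: Counter) -> dict:
    """Hypothetical helper: each role's percentage of all classified citations."""
    total = sum(counts.values())
    return {role: 100.0 * n / total for role, n in counts.items()}


shares = role_shares(role_counts)
top_role, top_count = role_counts.most_common(1)[0]
print(top_role, shares[top_role])  # background 40.0
```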

citation-polarity summary

background 4 · baseline 2 · use dataset 2 · use method 2

representative citing papers

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

cs.CV · 2026-06-16 · unverdicted · novelty 7.0

AdaVoMP predicts accurate dense spatially-varying Young's modulus, Poisson's ratio and density for 3D objects using an adaptive sparse voxel structure generated by a sparse transformer encoder-decoder at 16^3 higher resolution than prior fixed-voxel methods.

Garment Particles: A 2D--3D Symmetric Garment Representation for Generation and Editing

cs.GR · 2026-05-25 · unverdicted · novelty 7.0

Garment Particles is a 5D point cloud representation jointly encoding 2D sewing patterns and 3D geometry, supporting rectified flow generation from high-level inputs and diffusion-based editing of patterns or shapes.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

cs.CV · 2026-04-07 · unverdicted · novelty 7.0

SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.

MeshTailor: Cutting Seams via Generative Mesh Traversal

cs.GR · 2026-03-28 · unverdicted · novelty 7.0

MeshTailor is a mesh-native generative model that uses ChainingSeams serialization and a dual-stream transformer with pointer layers to trace coherent seams vertex-by-vertex on 3D surfaces.

ATATA: One Algorithm to Align Them All

cs.CV · 2026-01-16 · unverdicted · novelty 7.0

ATATA enables fast joint inference of structurally aligned pairs using Rectified Flow models via segment transport, improving the state of the art for image and video generation while matching 3D quality at much higher speed.

Affostruction: 3D Affordance Grounding with Generative Reconstruction

cs.CV · 2026-01-14 · unverdicted · novelty 7.0

Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.

Voxify3D: Pixel Art Meets Volumetric Rendering

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

SVG360: Editable Multiview Vector Graphics from a Single SVG

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.

GenHSI: Controllable Generation of Human-Scene Interaction Videos

cs.CV · 2025-06-24 · unverdicted · novelty 7.0

GenHSI is a training-free three-stage pipeline that turns a scene image, character image, and complex HSI prompt into long videos with plausible chained interactions by generating atomic actions, 3D keyframes via 2D inpainting plus optimization, and then feeding them to pre-trained video diffusion.

PixGS: Pixel-Space Diffusion for Direct 3D Gaussian Splat Generation

cs.CV · 2026-07-02 · unverdicted · novelty 6.0

PixGS is a single-stage pixel-space diffusion model for direct 3D Gaussian Splat generation that bypasses latent compression and adds geometric supervision to outperform prior multi-stage methods.

HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

HiFiVe is a training-free framework using an auto-regressive texture refinement pipeline with depth-based warping, multi-view fusion, and symmetry to enhance both texture and geometry fidelity in vehicle generation from 2D priors.

Generative Relightable Avatars

cs.CV · 2026-06-21 · unverdicted · novelty 6.0

GRA combines UV-space material optimization and physics rendering with feed-forward texture refinement and a fine-tuned video-to-video diffusion model to achieve controllable, high-detail relighting of full-body avatars.

Lighting-Consistent Object Transfer Across Radiance Fields

cs.GR · 2026-06-21 · unverdicted · novelty 6.0

The method applies diffusion-based per-view harmonization to transfer objects between 3DGS scenes with consistent lighting, using heterogeneous training data and a final 3D consolidation step.

Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation

cs.LG · 2026-06-18 · unverdicted · novelty 6.0

A de-biased VLM judge protocol is applied to adapt TRELLIS for single-image furniture 3D generation but yields no improvement over the strong public base across six methods.

Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

cs.RO · 2026-06-17 · unverdicted · novelty 6.0

DO AS I DO reconstructs and retargets hand-object interactions from in-the-wild monocular RGB videos to produce dexterous robot manipulation trajectories, outperforming prior methods on ground-truth and online video datasets.

Surflo: Consistent 3D Surface Flow Model with Global State

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

Surflo compresses unposed RGB views into K global latent tokens and uses flow matching with photometric guidance to decode consistent arbitrary-resolution 3D surface points in one forward pass.

MeshFlow: Efficient Artistic Mesh Generation via MeshVAE and Flow-based Diffusion Transformer

cs.CV · 2026-06-03 · unverdicted · novelty 6.0

MeshFlow uses a contrastive MeshVAE for compact mesh latents and a flow transformer for parallel generation, claiming an 18× speedup over autoregressive methods with high accuracy on standard metrics.

PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

PerceptTwin creates interactive simulations from open-vocabulary object maps for verifying and refining LLM robot plans, reporting ~39% higher success rates and up to 18% better human verification.

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

PhyGenHOI couples a motion diffusion model for humans with material point method simulation for objects on 3D Gaussians, using attraction loss, contact re-simulation, and masked video-SDS to produce physically consistent dynamic interactions from text.

Fishbone: From One 3D Asset to a Million Controllable Edits

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

Fishbone introduces a unified rib-spine representation, computed via an adaptive heat method, iso-contour ribs, and a geometry-aware spine, that enables real-time parametric deformation, reduced-space simulation, and animation on general meshes.

citing papers explorer


ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

cs.CV · 2026-04-12 · unverdicted · none · ref 63

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.

Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch

cs.CV · 2026-04-10 · unreviewed · ref 68
