ShapeNet: An Information-Rich 3D Model Repository
Mixed citation behavior. Most common role is background (57%).
abstract
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 models out of which are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.
citing papers explorer
-
Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation
The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
-
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data
ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.
-
WarpHammer: Densifying Scene Warps with 3D Object Priors for Extreme View Synthesis
WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.
-
3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis
3D-CoS represents 3D objects as Blender code generated by VLMs, with workflows for planning, RAG, and agents, showing better edit fidelity than point-cloud baselines.
-
Rethinking 3D Shape Generation: Diffusion over Superquadrics
Diffusion for 3D shapes is moved from dense geometry to compact superquadric parameter sets, cutting state size to roughly 7 KB per shape and enabling faster generation plus new editing capabilities.
-
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
VLMs exhibit consistent vertical-distance entanglement in embeddings from perspective bias in natural images, producing accuracy gaps that a new synthetic benchmark SpatialTunnel exposes as model-intrinsic.
-
Category-Level 3D Correspondence in Camera Space via Morphable Object Priors
Morpheus learns morphable category-level shape priors to produce implicit 3D correspondences in camera space without explicit supervision and releases the HouseCorr3D benchmark with amodal and symmetry annotations.
-
Metric-Phase Fields: Decoupling Distance and Sign for Thin-Structure Reconstruction from Unoriented Point Clouds
Metric-Phase Fields decouple unsigned metric proximity from a smooth phase field with learnable sharpness to enable faithful reconstruction of thin and open structures from point clouds.
-
ArtSplat: Feed-Forward Articulated 3D Gaussian Splatting from Sparse Multi-State Uncalibrated Views
ArtSplat is the first feed-forward framework for articulated 3D Gaussian Splatting that reconstructs geometry and joints from sparse multi-state uncalibrated views in one pass.
-
MAPS: A Synthetic Dataset for Probing Vision Models in a Controlled 3D Scene Space
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
-
OffsetAxis: UDF Mesh Reconstruction via Offset-Volume Medial Axis Extraction
OffsetAxis reconstructs meshes from unsigned distance fields by extracting the medial axis of the alpha-offset volume using ray casting and variational medial ball optimization.
-
Min Generalized Sliced Gromov Wasserstein: A Scalable Path to Gromov Wasserstein
min-GSGW learns coupled nonlinear slicers to produce a rigid-motion-invariant, scalable approximation to the Gromov-Wasserstein distance and its transport plans.
-
Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
-
Count Anything at Any Granularity
Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset, generated via 3D synthesis and editing, and the HieraCount model, which combines text and visual exemplars for improved accuracy.
-
The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
-
MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation
MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.
-
Rollback-Free Stable Brick Structures Generation
Reinforcement learning internalizes physical stability rules for brick structures, enabling the first rollback-free generation with orders-of-magnitude faster inference.
-
Two Steps Are All You Need: Efficient 3D Point Cloud Anomaly Detection with Consistency Models
Consistency learning reformulates 3D point cloud anomaly detection to predict clean geometry directly in one or two steps, yielding up to 80 times faster inference while matching state-of-the-art accuracy.
-
ADS: Random Sampling of Occupancy Functions using Adaptive Delaunay Scaffolding
ADS adaptively refines a Delaunay scaffold to produce unbiased random samples on occupancy function surfaces together with a connecting mesh, using far fewer evaluations than existing approaches.
-
Generative Modeling with Orbit-Space Particle Flow Matching
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
-
AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision
AirZoo is a new dataset covering 378 regions across 22 countries with pixel-level metric depth and 6-DoF poses, shown via benchmarks to improve SoTA models on aerial image retrieval, cross-view matching, and multi-view 3D reconstruction.
-
Topo-ADV: Generating Topology-Driven Imperceptible Adversarial Point Clouds
Topo-ADV uses differentiable persistent homology to create topology-altering perturbations that achieve up to 100% attack success on point cloud classifiers like PointNet while remaining geometrically imperceptible.
-
Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)
XShapeEnc encodes arbitrary 2D spatially grounded shapes into compact invertible representations by decomposing them into unit-disk geometry and harmonic pose fields then applying Zernike bases with frequency propagation.
-
3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI and Gen3DSR while keeping diffusion efficiency.
-
Deformation-based In-Context Learning for Point Cloud Understanding
DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.
-
Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception
PointATA is a parameter-efficient transfer learning method that aligns 3D-4D modality gaps via optimal transport before adapting a frozen 3D model with video-specific modules to achieve strong 4D perception results.
-
CLIPoint3D: Language-Grounded Few-Shot Unsupervised 3D Point Cloud Domain Adaptation
CLIPoint3D is the first CLIP-based framework for few-shot unsupervised 3D point cloud domain adaptation that reports 3-16% accuracy gains on PointDA-10 and GraspNetPC-10.
-
Physically Guided Visual Mass Estimation from a Single RGB Image
A method estimates mass from single RGB images by fusing depth-based volume cues with vision-language model density semantics via adaptive gating and separate regression heads trained on mass labels only.
-
Streaming Sliced Optimal Transport
A low-memory streaming estimator for sliced Wasserstein distance using quantile approximations on random projections with theoretical error guarantees.
-
Hard-Label Black-Box Attacks on 3D Point Clouds
A spectrum-aware decision boundary algorithm enables effective hard-label black-box adversarial attacks on 3D point cloud models by fusing spectral information across classes and performing curvature-aware iterative optimization.
-
LRM: Large Reconstruction Model for Single Image to 3D
LRM is a large transformer that predicts a NeRF directly from a single image after training on a million-object multi-view dataset.
-
Objaverse-XL: A Universe of 10M+ 3D Objects
Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.
-
Fast Graph Representation Learning with PyTorch Geometric
PyTorch Geometric is a PyTorch library that delivers fast graph neural network training through sparse GPU kernels and variable-size mini-batching.
-
SuperFlex: Deformable Superquadrics for Point Cloud Decomposition
SuperFlex extends superquadrics with deformations and a new loss for higher-accuracy point cloud decomposition and trains a model robust to partial real-world data.
-
GenSP: Consistent Spherical Parameterization via Learning Shape Generative Models
GenSP learns a continuous neural deformation model from sphere coordinates and latent codes to produce consistent spherical parameterizations for genus-0 shapes.
-
From Grasps to Dexterity: Large-Scale Grasp Pretraining for Dexterous Manipulation
Grasp pretraining on 355k trajectories improves full-task success on six articulated tool-use tasks by 33.3 pp over DP3 in real-world experiments.
-
Emergence of a Shared Canonical Object Frame from In-the-Wild Videos
A coarse canonical mesh bottleneck plus multi-view consistency lets a shared object frame emerge from self-supervised training on in-the-wild videos without canonical labels or category conditioning.
-
GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising
GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.
-
Anomaly Factory 3D: A Modular Framework for Diverse Pseudo-Anomaly Synthesis in Unsupervised 3D Anomaly Detection
AF3AD is a modular synthesis framework using center-conditioned parametric deformations in local PCA frames to create diverse pseudo-anomalies, improving unsupervised 3D anomaly detection on AnomalyShapeNet and Real3D-AD.
-
DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation
DeepJEB++ expands a small seed set of jet engine brackets into 15,360 labeled 3D designs via 2D latent diffusion augmentation, VLM filtering, generative 3D lifting, and automated finite-element labeling.
-
MAMVI: 3D Test-Time Adaptation via Masked Multi-View Point Clouds
MAMVI performs unified single-step TTA on masked multi-view point clouds with hybrid masking and confidence-adaptive learning rates, reporting SOTA on ShapeNet-C and ScanObjectNN-C plus 4.9-8.9x speedup.
-
Tac-DINO: Learning Vision-Tactile Features with Patch Alignment
Tac-DINO constructs a large tactile dataset and Vis-Tac Holographic Matching Benchmark, then proposes Vision-Tactile Patch Alignment (VTPA) methods that outperform non-aligned baselines on local-to-global feature matching.
-
SynthICL: Scalable In-context Imitation Learning with Synthetic Data
SynthICL trains flow-matching transformer policies for in-context imitation learning entirely from synthetic RGB data and reports 79% average success on 16 unseen real manipulation tasks with one test-time demonstration.
-
PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding
PAR3D is a part-aware 3D-MLLM framework with ScenePart dataset, Part-Aware 3D Representation Learning, and Hierarchical Segmentation Query Generation to improve part-level 3D scene understanding.
-
EqGINO: Equivariant Geometry-Informed Fourier Neural Operators for 3D PDEs
EqGINO adds a spectral isotropy prior to FNOs to guarantee discrete equivariance and enable generalization to continuous SE(3) transformations on 3D PDEs with limited training data.
-
From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data
PRISM is a pre-training method that learns isometric latent embeddings by explicitly recovering surface geodesic distances with a topology-enforcing loss and a two-stage training schedule.
-
HOLA: Holistic Multi-Modal Alignment for Open-Set 3D Recognition
HOLA introduces multi-view multi-text alignment and a decoupled contrastive loss for state-of-the-art open-vocabulary 3D recognition on long-tail benchmarks.
-
FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
FoundObj uses foundation-model priors as RL rewards to discover multi-class 3D objects from point clouds without scene-level labels.
-
DinoComplete: 3D Shape Completion with Distilled Semantic Priors and State Space Models
DinoComplete augments geometric 3D shape completion with voxel-aligned DINO semantic priors and multi-scale voxel Mamba modeling to improve results on unseen categories with lower compute.
-
BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization
BrickAnything generates buildable brick structures from 3D point clouds via geometry-conditioned autoregressive prediction with structure-aware tree tokenization and post-training for stability.