arxiv: 1512.03012 · v1 · submitted 2015-12-09 · 💻 cs.GR · cs.AI· cs.CG· cs.CV· cs.RO

ShapeNet: An Information-Rich 3D Model Repository

Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu

Pith reviewed 2026-05-11 16:02 UTC · model grok-4.3

keywords ShapeNet · 3D CAD models · WordNet taxonomy · semantic annotations · computer graphics · computer vision · benchmark dataset · shape analysis

The pith

ShapeNet indexes over three million 3D CAD models, about 220,000 of which are classified into 3,135 WordNet categories, and supplies annotations such as alignments, parts, symmetries, sizes, and keywords.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ShapeNet as a large repository of 3D models drawn from many semantic categories and structured according to the WordNet taxonomy. It supplies each model with multiple annotations including rigid alignments, part decompositions, bilateral symmetry planes, physical sizes, and descriptive keywords, all accessible via a public web interface. These resources are intended to support data visualization, drive geometric analysis, and furnish a common quantitative benchmark for computer graphics and vision research. A reader would care because prior progress in 2D vision relied on large labeled collections, and an analogous 3D resource could enable similar scaling of shape-based algorithms.

Core claim

ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 models out of which are classified into 3,135 categories (WordNet synsets).

What carries the argument

The ShapeNet repository, which indexes 3D CAD models under WordNet synsets and attaches geometric and semantic annotations to each model for standardized access.

If this is right

Algorithms for 3D shape retrieval, segmentation, and symmetry detection can be evaluated on a shared, large-scale test set rather than on small private collections.
The taxonomy structure permits category-specific and cross-category experiments that were previously difficult to organize.
The web interface lets researchers inspect annotations visually before using them in experiments.
Planned additional annotations will further expand the range of tasks that can be benchmarked with the same data.
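
The category-specific and cross-category experiments mentioned above can be sketched with a toy hypernym table. This is a minimal illustration, not ShapeNet's actual API or data format: the synset names, model IDs, and the helpers `ancestors`, `models_under`, and `split_by_child` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child synset -> parent synset (hypernym).
# Real WordNet synsets and ShapeNet IDs differ; these names are illustrative.
HYPERNYM = {
    "armchair": "chair",
    "swivel_chair": "chair",
    "chair": "furniture",
    "table": "furniture",
    "furniture": "artifact",
    "airplane": "artifact",
}

# Hypothetical model index: model id -> the synset it was classified under.
MODEL_SYNSET = {
    "m001": "armchair",
    "m002": "swivel_chair",
    "m003": "chair",
    "m004": "table",
    "m005": "airplane",
}


def ancestors(synset):
    """Return the synset followed by all of its hypernyms, nearest first."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def models_under(category):
    """Collect models whose synset is the category or any descendant of it."""
    return sorted(m for m, s in MODEL_SYNSET.items() if category in ancestors(s))


def split_by_child(category):
    """Group models under a category by its immediate child synsets."""
    groups = defaultdict(list)
    for model in models_under(category):
        chain = ancestors(MODEL_SYNSET[model])
        idx = chain.index(category)
        # Models attached directly to the category keep the category's own name.
        key = chain[idx - 1] if idx > 0 else category
        groups[key].append(model)
    return dict(groups)


chairs = models_under("chair")            # category-specific subset
furniture = split_by_child("furniture")   # groups for a cross-category comparison
```

Querying at an inner node such as "furniture" pulls in every descendant synset, which is what makes a hierarchy more convenient than a flat label list for organizing experiments.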

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The scale and annotation density could support supervised learning of 3D representations at sizes comparable to those used in image classification.
Integration with image or text datasets might become straightforward once models carry both geometric and semantic labels.
The repository structure could serve as a template for similar collections in related domains such as scene understanding or robotic grasping.

Load-bearing premise

The collected CAD models are representative of real objects and the supplied annotations are accurate, consistent, and of sufficient quality to function as a reliable benchmark.

What would settle it

An independent check revealing that a large fraction of models are misclassified relative to their WordNet labels or that symmetry and part annotations disagree with human judgment on more than a small percentage of items.
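
One way to approximate part of such a check automatically is to reflect a model's sampled surface points across its annotated symmetry plane and measure how far the reflected points land from the original set. The sketch below is only a proxy under stated assumptions, not the human-judgment comparison itself and not a procedure described in the report: the plane parameterization (`normal`, `offset`), the residual measure, and the `tolerance` threshold are all illustrative choices.

```python
import math


def reflect(point, normal, offset):
    """Reflect a 3D point across the plane {x : normal . x = offset}."""
    norm = math.sqrt(sum(c * c for c in normal))
    unit = [c / norm for c in normal]
    # Signed distance from the point to the plane.
    dist = sum(p * u for p, u in zip(point, unit)) - offset / norm
    return tuple(p - 2.0 * dist * u for p, u in zip(point, unit))


def symmetry_residual(points, normal, offset):
    """Mean distance from each reflected point to its nearest original point."""
    total = 0.0
    for p in points:
        r = reflect(p, normal, offset)
        total += min(math.dist(r, q) for q in points)
    return total / len(points)


def disagreement_rate(residuals, tolerance):
    """Fraction of models whose annotated plane exceeds the tolerance."""
    return sum(r > tolerance for r in residuals) / len(residuals)


# Toy point set that is symmetric about x = 0 but not about y = 0.
points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
good = symmetry_residual(points, normal=(1.0, 0.0, 0.0), offset=0.0)
bad = symmetry_residual(points, normal=(0.0, 1.0, 0.0), offset=0.0)
rate = disagreement_rate([good, bad], tolerance=0.1)
```

The brute-force nearest-neighbor search is quadratic in the number of points, so a real audit over many models would need a spatial index; the structure of the check stays the same.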

Original abstract

We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 models out of which are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.
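
The two headline counts in the abstract are easy to conflate, so a quick back-of-the-envelope calculation helps. The figures come from the abstract; treating "more than 3,000,000" as exactly 3,000,000 is a simplifying assumption, so the classified fraction below is an upper estimate, and the per-synset figure is a plain average that says nothing about how unevenly models are spread across categories.

```python
# Counts quoted in the abstract.
indexed_models = 3_000_000      # "more than 3,000,000", taken as a lower bound
classified_models = 220_000
synsets = 3_135

# Share of the indexed collection that carries a WordNet category.
classified_fraction = classified_models / indexed_models

# Average number of classified models per synset (not a per-category guarantee).
mean_models_per_synset = classified_models / synsets

print(f"classified share: {classified_fraction:.1%}")
print(f"mean models per synset: {mean_models_per_synset:.1f}")
```

In other words, only a small minority of the indexed models carry a category label, which matters when reading any claim about the size of the benchmark.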

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

ShapeNet is a useful large-scale 3D CAD collection with WordNet organization and basic annotations, but the report supplies no evidence on data quality or annotation reliability.

ShapeNet collects about 220,000 classified 3D models across 3,135 WordNet categories and adds annotations for alignments, parts, symmetry planes, and sizes, all released through a public interface. That scale and organization under WordNet are the real addition; nothing like it existed as a single public resource before. The effort to index millions more models and plan further labels shows the authors are thinking about long-term use for data-driven 3D work in graphics and vision. The web browser for viewing attributes is a practical touch that lowers the barrier for others to explore the data.

The central weakness is the lack of any quantitative check on the annotations themselves. The report describes the types of labels but gives no inter-annotator agreement numbers, no error rates from verification, and no details on how models were sourced or filtered. Without those, the claim that this forms a reliable benchmark rests on an untested assumption that the data is consistent and accurate enough for quantitative experiments. This is a data-resource paper rather than a methods paper, so the usual standards for new algorithms do not apply, but the missing quality evidence is still a gap that affects how much weight the community can place on it.

Researchers working on 3D shape analysis or large-scale learning will want access to it and will cite the release when they use the models. It is worth sending to peer review so reviewers can ask for the missing validation details and any collection methodology that was left out of the report.

Referee Report

1 major / 1 minor

Summary. The manuscript presents ShapeNet as a large-scale repository of 3D CAD models, with more than 3 million models indexed and 220,000 classified into 3,135 WordNet categories. It details the provision of semantic annotations including consistent rigid alignments, parts, bilateral symmetry planes, physical sizes, and keywords, accessible via a public web-based interface intended to promote data-driven geometric analysis and serve as a quantitative benchmark for computer graphics and vision research.

Significance. If the annotations prove accurate and consistent, ShapeNet would be a highly significant resource, providing unprecedented scale and semantic richness for 3D shape research. The use of WordNet taxonomy for organization and the variety of annotations (alignments, parts, symmetry) address key needs in the field for standardized data, potentially enabling new data-driven methods similar to those facilitated by large 2D datasets.

major comments (1)

[Abstract] The positioning of ShapeNet as a 'large-scale quantitative benchmark' (Abstract) is undermined by the absence of any description of the annotation methodology, quality assurance processes, or validation metrics (such as accuracy or consistency measures) for the provided annotations like part labels and symmetry planes.
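
As an illustration of the kind of consistency measure the comment asks for, the sketch below computes Cohen's kappa between two annotators' part labels. The report itself gives no such numbers; the label sequences here are invented for the example, and kappa is only one of several agreement statistics that could be reported.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently at their own rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical part labels for six mesh segments of one chair model.
annotator_1 = ["seat", "seat", "leg", "leg", "back", "back"]
annotator_2 = ["seat", "seat", "leg", "back", "back", "back"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on five of six segments, and correcting for chance agreement brings the score down from 5/6 to 0.75.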

minor comments (1)

The phrasing in the abstract '220,000 models out of which are classified into 3,135 categories' is slightly awkward and could be clarified for readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of ShapeNet's potential impact and for the constructive feedback. We address the single major comment below and will revise the manuscript to strengthen the presentation of annotation details.

Referee: [Abstract] The positioning of ShapeNet as a 'large-scale quantitative benchmark' (Abstract) is undermined by the absence of any description of the annotation methodology, quality assurance processes, or validation metrics (such as accuracy or consistency measures) for the provided annotations like part labels and symmetry planes.

Authors: We agree that the abstract's reference to a 'large-scale quantitative benchmark' would be better supported by explicit discussion of how the annotations were produced and validated. The current manuscript describes the types of annotations provided (alignments, parts, symmetry planes, etc.) and their intended uses but does not detail the underlying pipelines, crowdsourcing protocols, or any quantitative quality metrics. In the revised version we will add a new section (or subsection) that outlines the annotation methodology for each major attribute, including the tools and semi-automatic procedures employed, the quality-assurance steps taken, and any consistency or accuracy checks that were performed during data collection. We will also clarify that comprehensive per-annotation validation numbers remain an ongoing effort and will be reported as they become available.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive dataset report with no derivations or predictions

This technical report describes the curation, taxonomy organization, and annotation of the ShapeNet repository without any mathematical derivations, equations, predictions, fitted parameters, or first-principles results. All claims are factual statements about data collection scale, WordNet synset classification, and annotation types (alignments, parts, symmetry planes, sizes). No load-bearing step reduces by construction to self-definition, self-citation, or renaming; the paper contains no derivation chain to inspect. It is self-contained as a data-release document.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset presentation paper with no mathematical derivations, so it introduces no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5524 in / 1242 out tokens · 93007 ms · 2026-05-11T16:02:46.819367+00:00 · methodology

Forward citations

Cited by 58 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation
cs.CV 2026-04 unverdicted novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
Min Generalized Sliced Gromov Wasserstein: A Scalable Path to Gromov Wasserstein
cs.LG 2026-05 unverdicted novelty 7.0

min-GSGW learns coupled nonlinear slicers to produce a rigid-motion-invariant, scalable approximation to the Gromov-Wasserstein distance and its transport plans.
Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion
cs.CV 2026-05 unverdicted novelty 7.0

Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffu...
Count Anything at Any Granularity
cs.CV 2026-05 unverdicted novelty 7.0

Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset generated via 3D synthesis and editing, and HieraCount model that combines text and visual exemplars for impro...
The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
cs.AI 2026-05 unverdicted novelty 7.0

Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation
cs.GR 2026-05 unverdicted novelty 7.0

MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topolog...
Rollback-Free Stable Brick Structures Generation
cs.LG 2026-05 unverdicted novelty 7.0

Reinforcement learning internalizes physical stability rules for brick structures, enabling the first rollback-free generation with orders-of-magnitude faster inference.
Two Steps Are All You Need: Efficient 3D Point Cloud Anomaly Detection with Consistency Models
cs.CV 2026-05 unverdicted novelty 7.0

Consistency learning reformulates 3D point cloud anomaly detection to predict clean geometry directly in one or two steps, yielding up to 80 times faster inference while matching state-of-the-art accuracy.
ADS: Random Sampling of Occupancy Functions using Adaptive Delaunay Scaffolding
cs.GR 2026-05 unverdicted novelty 7.0

ADS adaptively refines a Delaunay scaffold to produce unbiased random samples on occupancy function surfaces together with a connecting mesh, using far fewer evaluations than existing approaches.
Generative Modeling with Orbit-Space Particle Flow Matching
cs.GR 2026-05 unverdicted novelty 7.0

OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision
cs.CV 2026-04 conditional novelty 7.0

AirZoo is a new large-scale synthetic dataset for aerial 3D vision that improves state-of-the-art models on image retrieval, cross-view matching, and 3D reconstruction when used for fine-tuning.
3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 accept novelty 7.0

3D generation for embodied AI is shifting from visual realism toward interaction readiness, organized into data generation, simulation environments, and sim-to-real bridging roles.
AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI
cs.CV 2026-04 unverdicted novelty 7.0

AmaraSpatial-10K is a new dataset of over 10,000 metric-scaled and semantically anchored 3D assets that achieves 3.4 times higher text retrieval precision than Objaverse for embodied AI and spatial computing.
Topo-ADV: Generating Topology-Driven Imperceptible Adversarial Point Clouds
cs.CV 2026-04 unverdicted novelty 7.0

Topo-ADV uses differentiable persistent homology to create topology-altering perturbations that achieve up to 100% attack success on point cloud classifiers like PointNet while remaining geometrically imperceptible.
Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)
cs.CV 2026-04 unverdicted novelty 7.0

XShapeEnc encodes arbitrary 2D spatially grounded shapes into compact invertible representations by decomposing them into unit-disk geometry and harmonic pose fields then applying Zernike bases with frequency propagation.
3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image
cs.CV 2026-04 unverdicted novelty 7.0

3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI a...
Deformation-based In-Context Learning for Point Cloud Understanding
cs.CV 2026-04 unverdicted novelty 7.0

DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.
Fast Graph Representation Learning with PyTorch Geometric
cs.LG 2019-03 accept novelty 7.0

PyTorch Geometric is a PyTorch library that delivers fast graph neural network training through sparse GPU kernels and variable-size mini-batching.
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
cs.CV 2026-05 unverdicted novelty 6.0

Sat3DGen improves geometric RMSE from 6.76m to 5.20m and FID from ~40 to 19 for street-level 3D generation from satellite images via geometry-centric constraints and perspective training.
ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning
cs.RO 2026-05 unverdicted novelty 6.0

ObjView-Bench disentangles omnidirectional self-occlusion, saturation difficulty, and set-cover planning difficulty, then shows that budget regimes and reachable-view constraints change planner rankings and failure mo...
GenMed: A Pairwise Generative Reformulation of Medical Diagnostic Tasks
cs.CV 2026-05 unverdicted novelty 6.0

GenMed uses diffusion models to capture P(X,Y) for medical tasks and performs inference via gradient-based test-time optimization, supporting arbitrary observation combinations without retraining.
Beyond Spatial Compression: Interface-Centric Generative States for Open-World 3D Structure
cs.LG 2026-05 unverdicted novelty 6.0

C2LT-3D factorizes 3D tokenization into canonical local geometry, partition-conditioned context, and relational seam variables to make latent states operational for assembly-level validation and repair in open-world m...
Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport
math.ST 2026-05 unverdicted novelty 6.0

Estimators for transport-growth pairs in unbalanced OT achieve minimax optimal rates, supported by a value-based stability reduction through a UOT gap condition.
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
cs.RO 2026-05 unverdicted novelty 6.0

VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence
cs.HC 2026-05 unverdicted novelty 6.0

A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than ...
TAFA-GSGC: Group-wise Scalable Point Cloud Geometry Compression with Progressive Residual Refinement
cs.CV 2026-04 unverdicted novelty 6.0

TAFA-GSGC delivers scalable point cloud geometry compression supporting up to nine monotonic quality levels from a single trained model and bitstream while matching or slightly exceeding PCGCv2 rate-distortion performance.
TAFA-GSGC: Group-wise Scalable Point Cloud Geometry Compression with Progressive Residual Refinement
cs.CV 2026-04 unverdicted novelty 6.0

TAFA-GSGC is a scalable point cloud geometry compression codec using progressive residual refinement and group-wise entropy coding that achieves average BD-rate reductions of 4.99% (D1-PSNR) and 5.92% (D2-PSNR) over P...
ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching
cs.CV 2026-04 unverdicted novelty 6.0

ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints an...
Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows
cs.CV 2026-04 unverdicted novelty 6.0

Point-MF performs one-step point cloud reconstruction from single images by learning a mean velocity field in point space with a tailored Diffusion Transformer and a new auxiliary loss.
Text-Guided Multimodal Unified Industrial Anomaly Detection
cs.CV 2026-04 unverdicted novelty 6.0

A text-semantics-guided multimodal framework with geometry-aware mapping and object-conditioned text adaptation achieves state-of-the-art unsupervised anomaly detection and localization on RGB-3D industrial datasets w...
FILTR: Extracting Topological Features from Pretrained 3D Models
cs.CV 2026-04 unverdicted novelty 6.0

FILTR predicts persistence diagrams from pretrained 3D encoders on the new DONUT benchmark, showing limited topological signals in encoders but successful approximation via learnable feed-forward.
FurnSet: Exploiting Repeats for 3D Scene Reconstruction
cs.CV 2026-04 unverdicted novelty 6.0

FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and lay...
Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
cs.CV 2026-04 unverdicted novelty 6.0

A minimally modified vanilla Transformer called Volt achieves state-of-the-art 3D semantic and instance segmentation by using volumetric tokens, 3D rotary embeddings, and a data-efficient training recipe that scales b...
One-Shot Cross-Geometry Skill Transfer through Part Decomposition
cs.RO 2026-04 unverdicted novelty 6.0

Part decomposition with generative shape models allows one-shot robot skill transfer across unfamiliar object geometries in simulation and real settings.
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
cs.CV 2026-04 unverdicted novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
cs.CV 2026-04 unverdicted novelty 6.0

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization
cs.AR 2026-04 unverdicted novelty 6.0

L-PCN exploits spatial locality in point cloud networks via octree partitioning into islands and intra-island hub scheduling, delivering 55-94% less feature fetching, 45-81% less computation, and 1.2-3.2x additional s...
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
cs.CV 2026-04 unverdicted novelty 6.0

TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.
Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)
cs.CV 2026-04 unverdicted novelty 6.0

XShapeEnc decomposes 2D shapes into unit-disk geometry and harmonic pose, encodes both with orthogonal Zernike bases, and applies frequency propagation to produce invertible, adaptive, frequency-rich representations.
Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation
cs.AI 2026-04 unverdicted novelty 6.0

A new framework generates part-level animatable 3D Gaussian vehicles from images by adding modules for exclusive part ownership and kinematic joint/axis prediction.
FusionBERT: Multi-View Image-3D Retrieval via Cross-Attention Visual Fusion and Normal-Aware 3D Encoder
cs.CV 2026-04 unverdicted novelty 6.0

FusionBERT uses cross-attention to fuse multi-view images and a normal-aware encoder for 3D models, achieving higher image-3D retrieval accuracy than prior multimodal models in both single- and multi-view settings.
SAM 3D: 3Dfy Anything in Images
cs.CV 2025-11 unverdicted novelty 6.0

SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
cs.CV 2024-09 unverdicted novelty 6.0

ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
cs.RO 2024-03 accept novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
cs.CV 2026-05 unverdicted novelty 5.0

EvObj learns evolving object-centric representations for unsupervised 3D instance segmentation by dynamically refining object candidates and completing partial geometries to bridge the synthetic-to-real domain gap, ou...
Syn4D: A Multiview Synthetic 4D Dataset
cs.CV 2026-05 unverdicted novelty 5.0

Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.
Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis
cs.CV 2026-05 unverdicted novelty 5.0

PointCRA reduces information loss in deep point cloud networks by treating temporal trend variation as an extra evaluation dimension alongside spatial and channel attention, guided by a neighborhood homogeneity constraint.
From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
cs.GR 2026-04 unverdicted novelty 5.0

The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.
AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI
cs.CV 2026-04 unverdicted novelty 5.0

AmaraSpatial-10K supplies 10K deployment-ready 3D assets with metric scaling and metadata, delivering 3.4x higher CLIP Recall@5 than Objaverse and 99.1% physics stability in Habitat-Sim.
Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
cs.CV 2026-04 unverdicted novelty 5.0

Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
Neural Distribution Prior for LiDAR Out-of-Distribution Detection
cs.CV 2026-04 unverdicted novelty 5.0

NDP models prediction distributions and uses Perlin noise OOD synthesis to reach 61.31% point-level AP on STU LiDAR benchmark, over 10x prior best.
Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis
cs.CV 2026-05 unverdicted novelty 4.0

PointCRA improves point cloud feature aggregation by using channel-level metrics with temporal trend variation and neighborhood-homogeneity calibration to enhance discriminability and reduce weight collapse in deep networks.
RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics
eess.IV 2026-04 unverdicted novelty 4.0

RETO achieves relative L2 errors of 0.063 on ShapeNet and 0.089/0.097 on DrivAerML surface pressure/velocity, outperforming Transolver and other baselines.
From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
cs.GR 2026-04 unverdicted novelty 4.0

The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.
Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment
cs.CV 2026-04 unverdicted novelty 4.0

Geometric Reward Credit Assignment disentangles rewards to geometric tokens and adds reprojection consistency to boost 3D keypoint accuracy from 0.64 to 0.93 and bounding box IoU to 0.686 on a ShapeNetCore benchmark w...
3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 unverdicted novelty 3.0

The survey organizes 3D generation for embodied AI into data generators for assets, simulation environments for interaction, and sim-to-real bridges, noting a shift toward interaction readiness and listing bottlenecks...
Attention Is not Everything: Efficient Alternatives for Vision
cs.CV 2026-04 unverdicted novelty 3.0

A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.
3D Generation for Embodied AI and Robotic Simulation: A Survey
cs.RO 2026-04 unverdicted novelty 2.0

The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and...

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · cited by 51 Pith papers

[1]

The protein data bank

Helen M Berman, John Westbrook, Zukang Feng, Gary Gilliland, TN Bhat, Helge Weissig, Ilya N Shindyalov, and Philip E Bourne. The protein data bank. Nucleic Acids Res, 28:235–242, 2000. 2

work page 2000
[2]

A benchmark for 3D mesh segmentation

Xiaobai Chen, Aleksey Golovinskiy, and Thomas Funkhouser. A benchmark for 3D mesh segmentation. ACM TOG, 28(3):73:1–73:12, July 2009. 2 9

work page 2009
[3]

Schelling points on 3D surface meshes

Xiaobai Chen, Abulhair Saparov, Bill Pang, and Thomas Funkhouser. Schelling points on 3D surface meshes. ACM TOG, August 2012. 2

work page 2012
[4]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. 1, 2, 4

work page 2009
[5]

Aim@shape

Bianca Falcidieno. Aim@shape. http://www. aimatshape.net/ontologies/shapes/, 2005. 2

work page 2005
[6]

Example-based synthesis of 3D object arrangements

Matthew Fisher, Daniel Ritchie, Manolis Savva, Thomas Funkhouser, and Pat Hanrahan. Example-based synthesis of 3D object arrangements. ACM TOG, 31(6):135, 2012. 1

work page 2012
[7]

Paul-Louis George. Gamma. http://www.rocq. inria.fr/gamma/download/download.php,

work page
[8]

Fine-grained semi-supervised labeling of large shape collections

Qixing Huang, Hao Su, and Leonidas Guibas. Fine-grained semi-supervised labeling of large shape collections. ACM TOG, 32:190:1–190:10, 2013. 4, 11

work page 2013
[9]

Developing an engineering shape benchmark for CAD models.Computer-Aided Design,

Subramaniam Jayanti, Yagnanarayanan Kalyanaraman, Na- traj Iyer, and Karthik Ramani. Developing an engineering shape benchmark for CAD models.Computer-Aided Design,

work page
[10]

A probabilistic model for component-based shape synthesis

Evangelos Kalogerakis, Siddhartha Chaudhuri, Daphne Koller, and Vladlen Koltun. A probabilistic model for component-based shape synthesis. ACM TOG, 31:55, 2012. 1

work page 2012
[11]

Mobius transformations for global intrinsic symmetry analysis

Vladimir Kim, Yaron Lipman, Xiaobai Chen, and Thomas Funkhouser. Mobius transformations for global intrinsic symmetry analysis. Symposium on Geometry Processing , July 2010. 2

work page 2010
[12]

Learning part-based templates from large collections of 3D shapes

Vladimir G. Kim, Wilmot Li, Niloy J. Mitra, Siddhartha Chaudhuri, Stephen DiVerdi, and Thomas Funkhouser. Learning part-based templates from large collections of 3D shapes. ACM TOG, 32(4):70:1–70:12, July 2013. 2

work page 2013
[13]

Exploring collections of 3D models using fuzzy correspondences

Vladimir G. Kim, Wilmot Li, Niloy J. Mitra, Stephen DiVerdi, and Thomas Funkhouser. Exploring collections of 3D models using fuzzy correspondences. ACM TOG, 31(4):54:1–54:11, July 2012. 2, 4

work page 2012
[14]

3D object representations for fine-grained categorization

Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia, 2013. 2

work page 2013
[15]

PDBsum: A web-based database of summaries and analyses of all PDB structures

Roman A Laskowski, E Gail Hutchinson, Alex D Michie, Andrew C Wallace, Martin L Jones, and Janet M Thornton. PDBsum: A web-based database of summaries and analyses of all PDB structures. Trends Biochem. Sci., 22:488–490,

work page
[16]

SHREC’12 track: generic 3D shape retrieval

Bo Li, Afzal Godil, Masaki Aono, X Bai, Takahiko Furuya, L Li, R López-Sastre, Henry Johan, Ryutarou Ohbuchi, Carolina Redondo-Cabrera, et al. SHREC’12 track: generic 3D shape retrieval. In 5th Eurographics Conference on 3D Object Retrieval, 2012. 3

work page 2012
[17]

SHREC’14 track: Large scale comprehensive 3D shape retrieval

Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Takahiko Furuya, et al. SHREC’14 track: Large scale comprehensive 3D shape retrieval. In Eurographics Workshop on 3D Object Retrieval, 2014. 2

work page 2014
[18]

Multi-view object class detection with a 3D geometric model

Joerg Liebelt and Cordelia Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, pages 1688–

work page
[19]

Creating consistent scene graphs using a probabilistic grammar

Tianqiang Liu, Siddhartha Chaudhuri, Vladimir G. Kim, Qi-Xing Huang, Niloy J. Mitra, and Thomas Funkhouser. Creating consistent scene graphs using a probabilistic grammar. ACM TOG, December 2014. 2

work page 2014
[20]

Building a large annotated corpus of English: The Penn Treebank

Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330,

work page
[21]

George A. Miller. WordNet: a lexical database for English. CACM, 1995. 1, 2, 3, 4

work page 1995
[22]

Symmetry in 3D geometry: Extraction and applications

Niloy J Mitra, Mark Pauly, Michael Wand, and Duygu Ceylan. Symmetry in 3D geometry: Extraction and applications. In Computer Graphics Forum, volume 32, pages 1–23, 2013. 7

work page 2013
[23]

Simplification and repair of polygonal models using volumetric techniques

Fakir S. Nooruddin and Greg Turk. Simplification and repair of polygonal models using volumetric techniques. Visualization and Computer Graphics, IEEE Transactions on, 2003. 7

work page 2003
[24]

Building a database of 3D scenes from user annotations

Bryan C Russell and Antonio Torralba. Building a database of 3D scenes from user annotations. In CVPR, 2009. 2

work page 2009
[25]

On being the right scale: Sizing large collections of 3D models

Manolis Savva, Angel X. Chang, Gilbert Bernstein, Christopher D. Manning, and Pat Hanrahan. On being the right scale: Sizing large collections of 3D models. In SIGGRAPH Asia 2014 Workshop on Indoor Scene Understanding: Where Graphics meets Vision, 2014. 7

work page 2014
[26]

Semantically-Enriched 3D Models for Common-sense Knowledge

Manolis Savva, Angel X. Chang, and Pat Hanrahan. Semantically-Enriched 3D Models for Common-sense Knowledge. CVPR 2015 Workshop on Functionality, Physics, Intentionality and Causality, 2015. 7

work page 2015
[27]

The Princeton shape benchmark

Philip Shilane, Patrick Min, Michael Kazhdan, and Thomas Funkhouser. The Princeton shape benchmark. In Shape Modeling Applications. IEEE, 2004. 2, 3

work page 2004
[28]

Sliding shapes for 3D object detection in depth images

Shuran Song and Jianxiong Xiao. Sliding shapes for 3D object detection in depth images. In ECCV, 2014. 1

work page 2014
[29]

A large-scale shape benchmark for 3D object retrieval: Toyohashi shape benchmark

Atsushi Tatsuma, Hitoshi Koyanagi, and Masaki Aono. A large-scale shape benchmark for 3D object retrieval: Toyohashi shape benchmark. In Asia Pacific Signal and Information Processing Association, 2012. 3

work page 2012
[30]

LabelMe: Online image annotation and applications

Antonio Torralba, Bryan C Russell, and Jenny Yuen. LabelMe: Online image annotation and applications. Proceedings of the IEEE, 98(8):1467–1484, 2010. 7

work page 2010
[31]

SHREC 2007 3D shape retrieval contest

Remco C. Veltkamp and FB ter Haar. SHREC 2007 3D shape retrieval contest. Technical report, Utrecht University Technical Report UU-CS-2007-015, 2007. 3

work page 2007
[32]

3D model retrieval

Dejan V Vranić. 3D model retrieval. University of Leipzig, Germany, PhD thesis, 2004. 3

work page 2004
[33]

A 3D shape benchmark for retrieval and automatic classification of architectural data

Raoul Wessel, Ina Blümel, and Reinhard Klein. A 3D shape benchmark for retrieval and automatic classification of architectural data. In Eurographics 2009 Workshop on 3D Object Retrieval, pages 53–56. The Eurographics Association,

work page 2009
[34]

3D ShapeNets: A Deep Representation for Volumetric Shapes

Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D ShapeNets: A Deep Representation for Volumetric Shapes. CVPR, 2015. 1, 2, 4

work page 2015
[35]

Beyond PASCAL: A benchmark for 3D object detection in the wild

Yu Xiang, Roozbeh Mottaghi, and Silvio Savarese. Beyond PASCAL: A benchmark for 3D object detection in the wild. In WACV, 2014. 2, 7

work page 2014
[36]

SUN3D: A database of big spaces reconstructed using SfM and object labels

Jianxiong Xiao, Andrew Owens, and Antonio Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, pages 1625–1632, 2013. 2

work page 2013
[37]

Retrieving articulated 3-D models using medial surfaces and their graph spectra

Juan Zhang, Kaleem Siddiqi, Diego Macrini, Ali Shokoufandeh, and Sven Dickinson. Retrieving articulated 3-D models using medial surfaces and their graph spectra. In Energy minimization methods in computer vision and pattern recognition, 2005. 3

A. Appendix

A.1. Hierarchical Rigid Alignment

In the following, we describe our hierarchical rigid align- ...

work page 2005