Accelerating 3D Deep Learning with PyTorch3D

Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson · 2020 · cs.CV · arXiv 2007.08501

Mixed citation behavior. Most common role is method (62%).

26 Pith papers citing it

Method 62% of classified citations

abstract

Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However, despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state of the art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning.

citation-role summary

method: 5 · background: 1 · dataset: 1 · other: 1

citation-polarity summary

use method: 5 · background: 2 · unclear: 1

representative citing papers

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving

cs.CR · 2026-05-12 · unverdicted · novelty 8.0

Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.

Meschers: Geometry Processing of Impossible Objects

cs.GR · 2026-05-14 · unverdicted · novelty 7.0

Meschers are a new mesh representation for impossible geometric objects grounded in discrete exterior calculus that supports full discrete geometry processing including inverse rendering.

Human face perception reflects inverse-generative and naturalistic discriminative objectives

q-bio.NC · 2026-05-12 · unverdicted · novelty 7.0

Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.

Profile-Specific 3DMM Regression from a Single Lateral Face Image

cs.CV · 2026-05-03 · unverdicted · novelty 7.0

Introduces the ProfileSynth dataset and a profile-specific FLAME 3DMM regression baseline with visibility-aware jawline regularization for 3D reconstruction from single lateral face images.

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

LEXIS-Flow uses VQ-VAE-learned interaction signatures to guide diffusion-based reconstruction of 3D human-object meshes and dense proximity fields from single RGB images, outperforming SOTA on benchmarks.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

A two-stage autoregressive framework centered on BoxMesh recovers parametric sewing patterns from 3D garment surfaces, claiming state-of-the-art results on benchmarks and generalization to real scans and single-view images.

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

cs.CV · 2026-02-04 · unverdicted · novelty 7.0

AGILE generates complete object meshes via VLM-guided synthesis and tracks poses with anchor-and-track plus contact-aware optimization to achieve robust hand-object reconstruction from video.

UIKA: Fast Universal Head Avatar from Pose-Free Images

cs.CV · 2026-01-12 · conditional · novelty 7.0

UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.

NeuralBoneReg: An Instance-Specific Label-Free Point Cloud-Based Method for Multi-Modal Bone Surface Registration

cs.CV · 2025-11-18 · unverdicted · novelty 7.0

NeuralBoneReg is a self-supervised instance-specific method using neural UDF and MLP-based point cloud registration that matches supervised SOTA accuracy on CT-US and CT-RGB-D bone datasets without inter-subject training data.

Objaverse-XL: A Universe of 10M+ 3D Objects

cs.CV · 2023-07-11 · accept · novelty 7.0

Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.

ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

ObjView-Bench disentangles omnidirectional self-occlusion, saturation difficulty, and set-cover planning difficulty, then shows that budget regimes and reachable-view constraints change planner rankings and failure modes across classical, learned, and hybrid methods.

Learning a Delighting Prior for Facial Appearance Capture in the Wild

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

A delighting network trained via Dataset Latent Modulation on heterogeneous OLAT and Light Stage data enables high-quality in-the-wild facial reflectance capture from video and produces the NeRSemble-Scan dataset.

Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data

eess.IV · 2026-04-24 · unverdicted · novelty 6.0

A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.

TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.

Visually-grounded Humanoid Agents

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

cs.CV · 2026-03-30 · unverdicted · novelty 6.0

Stepper uses stepwise panoramic expansion with a multi-view 360-degree diffusion model and geometry reconstruction to produce high-fidelity, structurally consistent immersive 3D scenes from text.

MAGICIAN: Efficient Long-Term Planning with Imagined Gaussians for Active Mapping

cs.CV · 2026-03-23 · unverdicted · novelty 6.0

MAGICIAN uses Imagined Gaussians from occupancy networks for efficient coverage gain computation in tree-search based long-horizon planning for active mapping, achieving SOTA results on indoor and outdoor benchmarks.

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

cs.CV · 2024-09-03 · unverdicted · novelty 6.0

ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

cs.RO · 2024-03-19 · accept · novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

Shap-E: Generating Conditional 3D Implicit Functions

cs.CV · 2023-05-03 · accept · novelty 6.0

Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.

Human Interaction-Aware 3D Reconstruction from a Single Image

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints

cs.CV · 2025-07-30 · unverdicted · novelty 5.0

An unsupervised SfT approach using image observations and mesh inextensibility constraints reconstructs deforming 3D shapes 400x faster than prior unsupervised methods while handling severe occlusions better.

A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

cs.CV · 2026-05-16 · unverdicted · novelty 4.0

A systematic literature survey that categorizes deep learning architectures for point cloud classification, part segmentation, and semantic segmentation, evaluates them on benchmarks, and discusses innovations, limitations, and future directions.

citing papers explorer

Showing 26 of 26 citing papers.

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving

cs.CR · 2026-05-12 · unverdicted · none · ref 41 · internal anchor

Static adversarial camouflage exploits natural view-angle changes during relative motion to induce consistent feature drift in AV perception, leading to incorrect trajectory predictions and unnecessary braking.

Meschers: Geometry Processing of Impossible Objects

cs.GR · 2026-05-14 · unverdicted · none · ref 40 · internal anchor

Meschers are a new mesh representation for impossible geometric objects grounded in discrete exterior calculus that supports full discrete geometry processing including inverse rendering.

Human face perception reflects inverse-generative and naturalistic discriminative objectives

q-bio.NC · 2026-05-12 · unverdicted · none · ref 79 · internal anchor

Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.

Profile-Specific 3DMM Regression from a Single Lateral Face Image

cs.CV · 2026-05-03 · unverdicted · none · ref 25 · internal anchor

Introduces the ProfileSynth dataset and a profile-specific FLAME 3DMM regression baseline with visibility-aware jawline regularization for 3D reconstruction from single lateral face images.

LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

cs.CV · 2026-04-22 · unverdicted · none · ref 63 · internal anchor

LEXIS-Flow uses VQ-VAE-learned interaction signatures to guide diffusion-based reconstruction of 3D human-object meshes and dense proximity fields from single RGB images, outperforming SOTA on benchmarks.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · none · ref 38 · internal anchor

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

cs.CV · 2026-04-03 · unverdicted · none · ref 56 · internal anchor

A two-stage autoregressive framework centered on BoxMesh recovers parametric sewing patterns from 3D garment surfaces, claiming state-of-the-art results on benchmarks and generalization to real scans and single-view images.

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

cs.CV · 2026-02-04 · unverdicted · none · ref 20 · internal anchor

AGILE generates complete object meshes via VLM-guided synthesis and tracks poses with anchor-and-track plus contact-aware optimization to achieve robust hand-object reconstruction from video.

UIKA: Fast Universal Head Avatar from Pose-Free Images

cs.CV · 2026-01-12 · conditional · none · ref 65 · internal anchor

UIKA is a feed-forward animatable Gaussian head model using UV-guided correspondence estimation and learnable UV tokens with dual-level attention, trained on large-scale synthetic data to handle pose-free inputs.

NeuralBoneReg: An Instance-Specific Label-Free Point Cloud-Based Method for Multi-Modal Bone Surface Registration

cs.CV · 2025-11-18 · unverdicted · none · ref 85 · internal anchor

NeuralBoneReg is a self-supervised instance-specific method using neural UDF and MLP-based point cloud registration that matches supervised SOTA accuracy on CT-US and CT-RGB-D bone datasets without inter-subject training data.

Objaverse-XL: A Universe of 10M+ 3D Objects

cs.CV · 2023-07-11 · accept · none · ref 52 · internal anchor

Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.

ObjView-Bench: Rethinking Difficulty and Deployment for Object-Centric View Planning

cs.RO · 2026-05-11 · unverdicted · none · ref 9 · internal anchor

ObjView-Bench disentangles omnidirectional self-occlusion, saturation difficulty, and set-cover planning difficulty, then shows that budget regimes and reachable-view constraints change planner rankings and failure modes across classical, learned, and hybrid methods.

Learning a Delighting Prior for Facial Appearance Capture in the Wild

cs.CV · 2026-05-07 · unverdicted · none · ref 72 · internal anchor

A delighting network trained via Dataset Latent Modulation on heterogeneous OLAT and Light Stage data enables high-quality in-the-wild facial reflectance capture from video and produces the NeRSemble-Scan dataset.

Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data

eess.IV · 2026-04-24 · unverdicted · none · ref 16 · internal anchor

A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.

TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

cs.CV · 2026-04-10 · unverdicted · none · ref 34 · internal anchor

TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.

Visually-grounded Humanoid Agents

cs.CV · 2026-04-09 · unverdicted · none · ref 77 · internal anchor

A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

cs.CV · 2026-03-30 · unverdicted · none · ref 44 · internal anchor

Stepper uses stepwise panoramic expansion with a multi-view 360-degree diffusion model and geometry reconstruction to produce high-fidelity, structurally consistent immersive 3D scenes from text.

MAGICIAN: Efficient Long-Term Planning with Imagined Gaussians for Active Mapping

cs.CV · 2026-03-23 · unverdicted · none · ref 39 · internal anchor

MAGICIAN uses Imagined Gaussians from occupancy networks for efficient coverage gain computation in tree-search based long-horizon planning for active mapping, achieving SOTA results on indoor and outdoor benchmarks.

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

cs.CV · 2024-09-03 · unverdicted · none · ref 71 · internal anchor

ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

cs.RO · 2024-03-19 · accept · none · ref 44 · internal anchor

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

Shap-E: Generating Conditional 3D Implicit Functions

cs.CV · 2023-05-03 · accept · none · ref 51 · internal anchor

Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.

Human Interaction-Aware 3D Reconstruction from a Single Image

cs.CV · 2026-04-07 · unverdicted · none · ref 36 · internal anchor

HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints

cs.CV · 2025-07-30 · unverdicted · none · ref 39 · internal anchor

An unsupervised SfT approach using image observations and mesh inextensibility constraints reconstructs deforming 3D shapes 400x faster than prior unsupervised methods while handling severe occlusions better.

A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation

cs.CV · 2026-05-16 · unverdicted · none · ref 75 · internal anchor

A systematic literature survey that categorizes deep learning architectures for point cloud classification, part segmentation, and semantic segmentation, evaluates them on benchmarks, and discusses innovations, limitations, and future directions.

Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation

cs.GR · 2026-04-22 · unverdicted · none · ref 13 · internal anchor

Seed3D 2.0 advances 3D content generation via a coarse-to-fine geometry pipeline, unified PBR material model, and simulation-ready scene tools, reporting 69-89.9% win rates over commercial systems in human studies.

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

cs.CV · 2026-04-15 · unverdicted · none · ref 53 · internal anchor

HY-World 2.0 generates and reconstructs high-fidelity navigable 3D Gaussian Splatting worlds from text, images, or videos via upgraded panorama, planning, expansion, and composition modules, with released code claiming open-source SOTA performance.
