pith · machine review for the scientific record

arxiv: 2403.02151 · v1 · submitted 2024-03-04 · 💻 cs.CV

Recognition: no theorem link

TripoSR: Fast 3D Object Reconstruction from a Single Image

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 17:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · single image · transformer · feed-forward · mesh generation · fast inference · LRM

The pith

TripoSR produces a 3D mesh from one photo in under half a second by refining the LRM transformer design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TripoSR as a feed-forward transformer model that converts a single image into a textured 3D mesh. It starts from the LRM architecture and adds targeted changes to data handling, network structure, and training procedure. These changes produce meshes faster than prior open-source systems while scoring higher on standard quantitative and visual benchmarks. The model is released under the MIT license so that others can use and extend it for downstream 3D tasks.

Core claim

TripoSR is a transformer network that takes a single image and directly outputs a 3D mesh in under 0.5 seconds. By combining improvements in data processing, model design, and training techniques on top of the LRM backbone, the system achieves better numerical accuracy and visual quality than existing open-source single-image reconstruction methods on public test sets.

What carries the argument

A transformer-based feed-forward network derived from LRM that maps an input image to a 3D mesh through refined data pipelines, architectural tweaks, and training schedules.
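
As a rough illustration of that machinery (a minimal sketch under assumed module shapes, not the authors' released code), the pipeline reduces to one transformer forward pass that emits triplane features, a small MLP queried on a dense 3D grid, and standard iso-surface extraction. The encoder, decoder, and density_mlp handles below are hypothetical stand-ins:

    # Minimal sketch of an LRM-style feed-forward single-image-to-mesh pass.
    # encoder / decoder / density_mlp are hypothetical stand-ins, not the
    # released TripoSR modules; marching cubes is the standard extraction step.
    import torch
    import torch.nn.functional as F
    from skimage.measure import marching_cubes

    def sample_triplane(planes, pts):
        """planes: [3, C, R, R] feature planes (XY, XZ, YZ); pts: [N, 3] in [-1, 1]."""
        pairs = [(0, 1), (0, 2), (1, 2)]  # coordinate pair indexing each plane
        feats = []
        for plane, (a, b) in zip(planes, pairs):
            uv = pts[:, [a, b]].view(1, -1, 1, 2)  # grid_sample wants [1, N, 1, 2]
            feats.append(F.grid_sample(plane[None], uv, align_corners=True)[0, :, :, 0])
        return torch.cat(feats, dim=0).T  # [N, 3C] per-point features

    @torch.no_grad()
    def image_to_mesh(image, encoder, decoder, density_mlp, res=128, iso=0.5):
        tokens = encoder(image)   # ViT-style tokens from the single input photo
        planes = decoder(tokens)  # transformer output reshaped to [3, C, R, R]
        axis = torch.linspace(-1.0, 1.0, res)
        grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), -1)
        density = density_mlp(sample_triplane(planes, grid.view(-1, 3)))
        verts, faces, _, _ = marching_cubes(density.view(res, res, res).numpy(), level=iso)
        return verts, faces  # one forward pass end to end: no per-shape optimization

Under this shape, the sub-half-second figure is plausible: everything before marching cubes is a single batched forward pass, with no per-shape optimization loop.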

If this is right

  • Real-time single-image 3D capture becomes practical on consumer hardware.
  • Downstream applications such as AR object placement or rapid prototyping can start from casual photos.
  • Open release under the MIT license lowers the barrier for further model improvements by the community.
  • Quantitative benchmarks on public sets now have a stronger open-source baseline for comparison.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The speed may allow integration into video pipelines for frame-by-frame 3D lifting without multi-view capture.
  • If the same design pattern transfers to other modalities, similar speed gains could appear in related reconstruction tasks.
  • Casual users could generate 3D assets for games or VR directly from smartphone snapshots.
  • The approach might reduce reliance on expensive multi-camera rigs in industrial scanning workflows.

Load-bearing premise

The reported gains come from genuine generalization rather than fitting the specific datasets used for evaluation.

What would settle it

A demonstration that TripoSR produces lower accuracy or worse visual quality than the best prior open-source method on a fresh dataset of everyday photos drawn from outside the original training distribution.
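
Operationally, that test is a two-method comparison under one geometry metric on out-of-distribution captures. A minimal sketch, assuming each method handle returns a point cloud sampled from its predicted mesh and the dataset pairs everyday photos with scanned ground truth:

    # Sketch of the settling experiment: score TripoSR and the best prior
    # open-source method with symmetric Chamfer distance on a fresh photo set.
    # The method handles and dataset iterable are hypothetical.
    import torch

    def chamfer(a, b):
        """Symmetric Chamfer distance between point clouds a: [N, 3], b: [M, 3]."""
        d = torch.cdist(a, b)  # [N, M] pairwise Euclidean distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    def evaluate(method, dataset):
        scores = [chamfer(method(image), gt_points).item()
                  for image, gt_points in dataset]
        return sum(scores) / len(scores)  # lower is better

    # The claim fails if evaluate(triposr, fresh_set) > evaluate(best_prior, fresh_set).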

original abstract

This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architecture for fast feed-forward 3D generation, producing 3D mesh from a single image in under 0.5 seconds. Building upon the LRM network architecture, TripoSR integrates substantial improvements in data processing, model design, and training techniques. Evaluations on public datasets show that TripoSR exhibits superior performance, both quantitatively and qualitatively, compared to other open-source alternatives. Released under the MIT license, TripoSR is intended to empower researchers, developers, and creatives with the latest advancements in 3D generative AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TripoSR, a transformer-based feed-forward model for single-image 3D mesh reconstruction that runs in under 0.5 seconds. Building on the LRM architecture, it incorporates targeted improvements in data processing, model design, and training techniques, and reports superior quantitative and qualitative performance relative to other open-source baselines on public datasets.

Significance. If the reported gains prove robust to dataset shifts and are supported by ablations that isolate each change, the work would meaningfully advance practical 3D generative AI by delivering a fast, open-source alternative suitable for downstream applications. The MIT release is a concrete strength that supports reproducibility.

major comments (3)
  1. [§4] §4 (Experiments and Results): the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, architectural changes, or training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than hyper-parameter tuning or dataset alignment (a sketch of the missing ablation grid follows this report).
  2. [§3, §4.2] §3 (Method) and §4.2 (Dataset): the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) use fully disjoint splits from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that higher metrics reflect distribution matching rather than improved generalization.
  3. [§4.1] §4.1 (Baselines): the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support the claim of state-of-the-art performance.
minor comments (2)
  1. [Figure 1, §2] Figure 1 and §2: the high-level diagram of the TripoSR pipeline would benefit from explicit call-outs to the new components relative to LRM so readers can immediately see the modifications.
  2. [Abstract, §1] Abstract and §1: the phrase 'public datasets' is used without naming the specific benchmarks (e.g., Objaverse, ShapeNet) or providing a citation; this should be expanded on first mention.
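
Major comment 1 reduces to a factorial control: toggle each of the three claimed improvements independently and re-measure. A minimal sketch, where the flag names and the train_and_eval entry point are hypothetical:

    # Sketch of the requested ablation grid: 2^3 = 8 runs isolating the
    # marginal contribution of each claimed improvement. Flag names and the
    # train_and_eval entry point are hypothetical.
    from itertools import product

    COMPONENTS = ("improved_data_pipeline", "architecture_changes", "training_schedule")

    def run_ablations(train_and_eval, baseline_config):
        results = {}
        for flags in product((False, True), repeat=len(COMPONENTS)):
            config = dict(baseline_config, **dict(zip(COMPONENTS, flags)))
            results[flags] = train_and_eval(config)  # e.g., Chamfer distance, F-score
        return results  # compare rows differing in one flag to isolate a component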

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. We will incorporate revisions to address the concerns where feasible.

point-by-point responses
  1. Referee: [§4] §4 (Experiments and Results): the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, architectural changes, or training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than hyper-parameter tuning or dataset alignment.

    Authors: We acknowledge that the current version of the manuscript does not include comprehensive ablation studies isolating each component. To address this, we will add ablation tables in the revised manuscript that evaluate the contributions of the data-processing pipeline, architectural changes, and training schedule separately. These ablations will help confirm that the reported improvements are due to the proposed techniques. revision: yes

  2. Referee: [§3, §4.2] §3 (Method) and §4.2 (Dataset): the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) use fully disjoint splits from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that higher metrics reflect distribution matching rather than improved generalization.

    Authors: The evaluation datasets are indeed disjoint from the training data. We used standard splits where the test sets do not overlap with training samples. We will explicitly document this in the revised §4.2 to eliminate any ambiguity regarding data leakage and to strengthen the generalization claims (a sketch of this audit follows the responses). revision: yes

  3. Referee: [§4.1] §4.1 (Baselines): the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support the claim of state-of-the-art performance.

    Authors: We agree that including closed-source comparisons would be ideal, but since those models are not publicly available, direct quantitative comparison is not possible. We will revise the manuscript to explicitly justify our choice of open-source baselines, noting that they represent the current reproducible state-of-the-art and that our work aims to provide an open-source alternative. This justification will be added to §4.1. revision: partial
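
The disjointness claim in response 2 is mechanically checkable. A minimal sketch of such a leakage audit, assuming JSON-lines manifests with hypothetical asset_id and mesh_path fields:

    # Sketch of the split audit promised in response 2: assert that no training
    # asset reappears in evaluation, matching on both IDs and content hashes
    # (hashes catch re-exported duplicates that IDs miss). Field names are
    # hypothetical.
    import hashlib
    import json

    def asset_keys(manifest_path):
        keys = set()
        with open(manifest_path) as f:
            for record in map(json.loads, f):  # one JSON object per line
                keys.add(record["asset_id"])
                with open(record["mesh_path"], "rb") as mesh:
                    keys.add(hashlib.sha256(mesh.read()).hexdigest())
        return keys

    def check_disjoint(train_manifest, test_manifest):
        overlap = asset_keys(train_manifest) & asset_keys(test_manifest)
        assert not overlap, f"{len(overlap)} assets leak from train into eval"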

Circularity Check

0 steps flagged

No significant circularity in the empirical model and performance claims

full rationale

The paper presents TripoSR as an empirical feed-forward model that builds on the LRM architecture with described improvements to data processing, model design, and training. Its central claims consist of quantitative and qualitative performance results on public datasets rather than any mathematical derivation chain. No equations, uniqueness theorems, or fitted-parameter predictions are shown that reduce by construction to the inputs. Any self-citations to LRM or prior work are not load-bearing for the reported results, which remain independently falsifiable via the stated evaluations. The analysis therefore finds no circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard transformer and 3D reconstruction assumptions plus the specific data-processing and training changes introduced by the authors; no new physical entities or ad-hoc constants are described in the abstract.

free parameters (1)
  • model hyperparameters and training schedule
    Standard deep-learning settings tuned during training; exact values are not stated in the abstract.
axioms (1)
  • domain assumption: a transformer architecture can map single-image features to 3D mesh parameters effectively
    Inherited from LRM and standard in recent 3D generation work.

pith-pipeline@v0.9.0 · 5433 in / 1097 out tokens · 39545 ms · 2026-05-16T17:47:05.344727+00:00 · methodology

discussion (0)


Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

  2. Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffu...

  3. AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    AmaraSpatial-10K is a new dataset of over 10,000 metric-scaled and semantically anchored 3D assets that achieves 3.4 times higher text retrieval precision than Objaverse for embodied AI and spatial computing.

  4. Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

  5. AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 7.0

    AniGen directly generates animatable 3D assets with consistent shape, skeleton, and skinning from single images using unified S^3 fields and a two-stage flow-matching pipeline.

  6. Benchmarking Vision-Language Models under Contradictory Virtual Content Attacks in Augmented Reality

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    ContrAR benchmark reveals that current VLMs show reasonable understanding of contradictory virtual content in AR but need improvement in detection, reasoning, and balancing accuracy with latency.

  7. CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction

    cs.CV · 2025-12 · unverdicted · novelty 7.0

    CARI4D is the first category-agnostic pipeline that produces metric-scale, spatially and temporally consistent 4D reconstructions of human-object interactions from monocular RGB videos via foundation-model hypothesis ...

  8. Structured 3D Latents for Scalable and Versatile 3D Generation

    cs.CV · 2024-12 · unverdicted · novelty 7.0

    SLAT provides a unified 3D latent representation enabling versatile high-quality generation across multiple output formats from text or image inputs.

  9. Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    Sat3DGen improves geometric RMSE from 6.76m to 5.20m and FID from ~40 to 19 for street-level 3D generation from satellite images via geometry-centric constraints and perspective training.

  10. Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

    cs.RO · 2026-05 · unverdicted · novelty 6.0

    VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

  11. Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence

    cs.HC · 2026-05 · unverdicted · novelty 6.0

    A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than ...

  12. Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...

  13. Lyra 2.0: Explorable Generative 3D Worlds

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.

  14. A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures

    cs.CV · 2026-04 · conditional · novelty 6.0

    A pipeline using SAM segmentation and Hi3DGen mesh generation, evaluated on 69 medieval figures, produces usable 3D models for XR and tactile applications with Hi3DGen as the best starting point.

  15. TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

    cs.CV · 2025-02 · unverdicted · novelty 6.0

    TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.

  16. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    cs.CV · 2024-04 · unverdicted · novelty 6.0

    InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

  17. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV · 2026-05 · unverdicted · novelty 5.0

    R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.

  18. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 5.0

    The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.

  19. AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    AmaraSpatial-10K supplies 10K deployment-ready 3D assets with metric scaling and metadata, delivering 3.4x higher CLIP Recall@5 than Objaverse and 99.1% physics stability in Habitat-Sim.

  20. UniMesh: Unifying 3D Mesh Understanding and Generation

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.

  21. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 4.0

    The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.

  22. OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

    cs.CV · 2026-04 · unverdicted · novelty 4.0

    OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 19 Pith papers · 4 internal anchors

  1. [1]

    Emerging properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.

  2. [2]

    Efficient geometry-aware 3D generative adversarial networks

    Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16123–16133, 2022.

  3. [3]

    GeNVS: Generative novel view synthesis with 3D-aware diffusion models

    Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, and Gordon Wetzstein. GeNVS: Generative novel view synthesis with 3D-aware diffusion models. arXiv, 2023.

  4. [4]

    Objaverse: A universe of annotated 3D objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.

  5. [5]

    Objaverse-XL: A universe of 10M+ 3D objects

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-XL: A universe of 10M+ 3D objects. Advances in Neural Information Processing Systems, 36, 2024.

  6. [6]

    Google Scanned Objects: A high-quality dataset of 3D scanned household items

    Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B. McHugh, and Vincent Vanhoucke. Google Scanned Objects: A high-quality dataset of 3D scanned household items. In 2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022.

  7. [7]

    Mesh R-CNN

    Georgia Gkioxari, Jitendra Malik, and Justin Johnson. Mesh R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9785–9795, 2019.

  8. [8]

    A papier-mâché approach to learning 3D surface generation

    Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. A papier-mâché approach to learning 3D surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 216–224, 2018.

  9. [9]

    threestudio: A unified framework for 3D content generation

    Yuan-Chen Guo, Ying-Tian Liu, Chen Wang, Zi-Xin Zou, Guan Luo, Chia-Hao Chen, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3D content generation, 2023.

  10. [10]

    OpenLRM: Open-source large reconstruction models

    Zexin He and Tengfei Wang. OpenLRM: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM, 2023.

  11. [11]

    LRM: Large Reconstruction Model for Single Image to 3D

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. arXiv preprint arXiv:2311.04400, 2023.

  12. [12]

    ShapeClipper: Scalable 3D shape learning from single-view images via geometric and CLIP-based consistency

    Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, and James M. Rehg. ShapeClipper: Scalable 3D shape learning from single-view images via geometric and CLIP-based consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12912–12922, 2023.

  13. [13]

    ZeroShape: Regression-based zero-shot shape reconstruction

    Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, and James M. Rehg. ZeroShape: Regression-based zero-shot shape reconstruction. arXiv preprint arXiv:2312.14198, 2023.

  14. [14]

    Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model

    Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214, 2023.

  15. [15]

    Advances in 3D generation: A survey

    Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, and Ying Shan. Advances in 3D generation: A survey. arXiv preprint arXiv:2401.17807, 2024.

  16. [16]

    One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization

    Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 2024.

  17. [17]

    Zero-1-to-3: Zero-shot one image to 3D object

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3D object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9298–9309, 2023.

  18. [18]

    Marching cubes: A high resolution 3D surface construction algorithm

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. SIGGRAPH Comput. Graph., 21(4):163–169, 1987.

  19. [19]

    Occupancy networks: Learning 3D reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.

  20. [20]

    DreamFusion: Text-to-3D using 2D diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.

  21. [21]

    MVDream: Multi-view diffusion for 3D generation

    Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3D generation. arXiv preprint arXiv:2308.16512, 2023.

  22. [22]

    Deep generative models on 3D representations: A survey

    Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, and Yujun Shen. Deep generative models on 3D representations: A survey. arXiv preprint arXiv:2210.15663, 2022.

  23. [23]

    DreamGaussian: Generative Gaussian splatting for efficient 3D content creation

    Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv preprint arXiv:2309.16653, 2023.

  24. [24]

    LGM: Large multi-view Gaussian model for high-resolution 3D content creation

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large multi-view Gaussian model for high-resolution 3D content creation. arXiv preprint arXiv:2402.05054, 2024.

  25. [25]

    Pixel2Mesh: Generating 3D mesh models from single RGB images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.

  26. [26]

    PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction

    Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, and Kai Zhang. PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. arXiv preprint arXiv:2311.12024, 2023.

  27. [27]

    ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 2024.

  28. [28]

    Multiview compressive coding for 3D reconstruction

    Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, and Georgia Gkioxari. Multiview compressive coding for 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9065–9075, 2023.

  29. [29]

    ReconFusion: 3D reconstruction with diffusion priors

    Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, et al. ReconFusion: 3D reconstruction with diffusion priors. arXiv preprint arXiv:2312.02981, 2023.

  30. [30]

    OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation

    Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, et al. OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 803–814, 2023.

  31. [31]

    DMV3D: Denoising multi-view diffusion using 3D large reconstruction model

    Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, et al. DMV3D: Denoising multi-view diffusion using 3D large reconstruction model. arXiv preprint arXiv:2311.09217, 2023.

  32. [32]

    Learning to reconstruct shapes from unseen classes

    Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, and Jiajun Wu. Learning to reconstruct shapes from unseen classes. Advances in Neural Information Processing Systems, 31, 2018.

  33. [33]

    SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction

    Zhizhuo Zhou and Shubham Tulsiani. SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12588–12597, 2023.

  34. [34]

    Sparse3D: Distilling multiview-consistent diffusion for object reconstruction from sparse views

    Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, and Song-Hai Zhang. Sparse3D: Distilling multiview-consistent diffusion for object reconstruction from sparse views. arXiv preprint arXiv:2308.14078, 2023.

  35. [35]

    Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers

    Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. arXiv preprint arXiv:2312.09147, 2023.