pith · machine review for the scientific record

arxiv: 2403.02151 · v1 · submitted 2024-03-04 · 💻 cs.CV

Recognition: no theorem link

TripoSR: Fast 3D Object Reconstruction from a Single Image

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 17:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · single image · transformer · feed-forward · mesh generation · fast inference · LRM

The pith

TripoSR produces a 3D mesh from one photo in under half a second by refining the LRM transformer design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TripoSR as a feed-forward transformer model that converts a single image into a textured 3D mesh. It starts from the LRM architecture and adds targeted changes to data handling, network structure, and training procedure. These changes produce meshes faster than prior open-source systems while scoring higher on standard quantitative and visual benchmarks. The model is released under the MIT license so that others can use and extend it for downstream 3D tasks.

Core claim

TripoSR is a transformer network that takes a single image and directly outputs a 3D mesh in under 0.5 seconds. By combining improvements in data processing, model design, and training techniques on top of the LRM backbone, the system achieves better numerical accuracy and visual quality than existing open-source single-image reconstruction methods on public test sets.

What carries the argument

A transformer-based feed-forward network derived from LRM that maps an input image to a 3D mesh through refined data pipelines, architectural tweaks, and training schedules.
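
As a rough illustration of that machinery (a minimal sketch under assumed module shapes, not the authors' released code), the pipeline reduces to one transformer forward pass that emits triplane features, a small MLP queried on a dense 3D grid, and standard iso-surface extraction. The encoder, decoder, and density_mlp handles below are hypothetical stand-ins:

    # Minimal sketch of an LRM-style feed-forward single-image-to-mesh pass.
    # encoder / decoder / density_mlp are hypothetical stand-ins, not the
    # released TripoSR modules; marching cubes is the standard extraction step.
    import torch
    import torch.nn.functional as F
    from skimage.measure import marching_cubes

    def sample_triplane(planes, pts):
        """planes: [3, C, R, R] feature planes (XY, XZ, YZ); pts: [N, 3] in [-1, 1]."""
        pairs = [(0, 1), (0, 2), (1, 2)]  # coordinate pair indexing each plane
        feats = []
        for plane, (a, b) in zip(planes, pairs):
            uv = pts[:, [a, b]].view(1, -1, 1, 2)  # grid_sample wants [1, N, 1, 2]
            feats.append(F.grid_sample(plane[None], uv, align_corners=True)[0, :, :, 0])
        return torch.cat(feats, dim=0).T  # [N, 3C] per-point features

    @torch.no_grad()
    def image_to_mesh(image, encoder, decoder, density_mlp, res=128, iso=0.5):
        tokens = encoder(image)   # ViT-style tokens from the single input photo
        planes = decoder(tokens)  # transformer output reshaped to [3, C, R, R]
        axis = torch.linspace(-1.0, 1.0, res)
        grid = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), -1)
        density = density_mlp(sample_triplane(planes, grid.view(-1, 3)))
        verts, faces, _, _ = marching_cubes(density.view(res, res, res).numpy(), level=iso)
        return verts, faces  # one forward pass end to end: no per-shape optimization

Under this shape, the sub-half-second figure is plausible: everything before marching cubes is a single batched forward pass, with no per-shape optimization loop.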

If this is right

  • Real-time single-image 3D capture becomes practical on consumer hardware.
  • Downstream applications such as AR object placement or rapid prototyping can start from casual photos.
  • Open release under the MIT license lowers the barrier for further model improvements by the community.
  • Quantitative benchmarks on public sets now have a stronger open-source baseline for comparison.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The speed may allow integration into video pipelines for frame-by-frame 3D lifting without multi-view capture.
  • If the same design pattern transfers to other modalities, similar speed gains could appear in related reconstruction tasks.
  • Casual users could generate 3D assets for games or VR directly from smartphone snapshots.
  • The approach might reduce reliance on expensive multi-camera rigs in industrial scanning workflows.

Load-bearing premise

The reported gains come from genuine generalization rather than fitting the specific datasets used for evaluation.

What would settle it

A demonstration that TripoSR produces lower accuracy or worse visual quality than the best prior open-source method on a fresh dataset of everyday photos drawn from outside the original training distribution.
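
Operationally, that test is a two-method comparison under one geometry metric on out-of-distribution captures. A minimal sketch, assuming each method handle returns a point cloud sampled from its predicted mesh and the dataset pairs everyday photos with scanned ground truth:

    # Sketch of the settling experiment: score TripoSR and the best prior
    # open-source method with symmetric Chamfer distance on a fresh photo set.
    # The method handles and dataset iterable are hypothetical.
    import torch

    def chamfer(a, b):
        """Symmetric Chamfer distance between point clouds a: [N, 3], b: [M, 3]."""
        d = torch.cdist(a, b)  # [N, M] pairwise Euclidean distances
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    def evaluate(method, dataset):
        scores = [chamfer(method(image), gt_points).item()
                  for image, gt_points in dataset]
        return sum(scores) / len(scores)  # lower is better

    # The claim fails if evaluate(triposr, fresh_set) > evaluate(best_prior, fresh_set).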

original abstract

This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architecture for fast feed-forward 3D generation, producing 3D mesh from a single image in under 0.5 seconds. Building upon the LRM network architecture, TripoSR integrates substantial improvements in data processing, model design, and training techniques. Evaluations on public datasets show that TripoSR exhibits superior performance, both quantitatively and qualitatively, compared to other open-source alternatives. Released under the MIT license, TripoSR is intended to empower researchers, developers, and creatives with the latest advancements in 3D generative AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TripoSR, a transformer-based feed-forward model for single-image 3D mesh reconstruction that runs in under 0.5 seconds. Building on the LRM architecture, it incorporates targeted improvements in data processing, model design, and training techniques, and reports superior quantitative and qualitative performance relative to other open-source baselines on public datasets.

Significance. If the reported gains prove robust to dataset shifts and are supported by ablations that isolate each change, the work would meaningfully advance practical 3D generative AI by delivering a fast, open-source alternative suitable for downstream applications. The MIT release is a concrete strength that supports reproducibility.

major comments (3)
  1. [§4] §4 (Experiments and Results): the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, architectural changes, or training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than hyper-parameter tuning or dataset alignment (a sketch of the missing ablation grid follows this report).
  2. [§3, §4.2] §3 (Method) and §4.2 (Dataset): the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) use fully disjoint splits from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that higher metrics reflect distribution matching rather than improved generalization.
  3. [§4.1] §4.1 (Baselines): the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support the claim of state-of-the-art performance.
minor comments (2)
  1. [Figure 1, §2] Figure 1 and §2: the high-level diagram of the TripoSR pipeline would benefit from explicit call-outs to the new components relative to LRM so readers can immediately see the modifications.
  2. [Abstract, §1] Abstract and §1: the phrase 'public datasets' is used without naming the specific benchmarks (e.g., Objaverse, ShapeNet) or providing a citation; this should be expanded on first mention.
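
Major comment 1 reduces to a factorial control: toggle each of the three claimed improvements independently and re-measure. A minimal sketch, where the flag names and the train_and_eval entry point are hypothetical:

    # Sketch of the requested ablation grid: 2^3 = 8 runs isolating the
    # marginal contribution of each claimed improvement. Flag names and the
    # train_and_eval entry point are hypothetical.
    from itertools import product

    COMPONENTS = ("improved_data_pipeline", "architecture_changes", "training_schedule")

    def run_ablations(train_and_eval, baseline_config):
        results = {}
        for flags in product((False, True), repeat=len(COMPONENTS)):
            config = dict(baseline_config, **dict(zip(COMPONENTS, flags)))
            results[flags] = train_and_eval(config)  # e.g., Chamfer distance, F-score
        return results  # compare rows differing in one flag to isolate a component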

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. We will incorporate revisions to address the concerns where feasible.

point-by-point responses
  1. Referee: [§4] §4 (Experiments and Results): the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, architectural changes, or training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than hyper-parameter tuning or dataset alignment.

    Authors: We acknowledge that the current version of the manuscript does not include comprehensive ablation studies isolating each component. To address this, we will add ablation tables in the revised manuscript that evaluate the contributions of the data-processing pipeline, architectural changes, and training schedule separately. These ablations will help confirm that the reported improvements are due to the proposed techniques. revision: yes

  2. Referee: [§3, §4.2] §3 (Method) and §4.2 (Dataset): the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) use fully disjoint splits from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that higher metrics reflect distribution matching rather than improved generalization.

    Authors: The evaluation datasets are indeed disjoint from the training data. We used standard splits where the test sets do not overlap with training samples. We will explicitly document this in the revised §4.2 to eliminate any ambiguity regarding data leakage and to strengthen the generalization claims (a sketch of this audit follows the responses). revision: yes

  3. Referee: [§4.1] §4.1 (Baselines): the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support the claim of state-of-the-art performance.

    Authors: We agree that including closed-source comparisons would be ideal, but since those models are not publicly available, direct quantitative comparison is not possible. We will revise the manuscript to explicitly justify our choice of open-source baselines, noting that they represent the current reproducible state-of-the-art and that our work aims to provide an open-source alternative. This justification will be added to §4.1. revision: partial
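
The disjointness claim in response 2 is mechanically checkable. A minimal sketch of such a leakage audit, assuming JSON-lines manifests with hypothetical asset_id and mesh_path fields:

    # Sketch of the split audit promised in response 2: assert that no training
    # asset reappears in evaluation, matching on both IDs and content hashes
    # (hashes catch re-exported duplicates that IDs miss). Field names are
    # hypothetical.
    import hashlib
    import json

    def asset_keys(manifest_path):
        keys = set()
        with open(manifest_path) as f:
            for record in map(json.loads, f):  # one JSON object per line
                keys.add(record["asset_id"])
                with open(record["mesh_path"], "rb") as mesh:
                    keys.add(hashlib.sha256(mesh.read()).hexdigest())
        return keys

    def check_disjoint(train_manifest, test_manifest):
        overlap = asset_keys(train_manifest) & asset_keys(test_manifest)
        assert not overlap, f"{len(overlap)} assets leak from train into eval"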

Circularity Check

0 steps flagged

No significant circularity in the empirical model and performance claims

full rationale

The paper presents TripoSR as an empirical feed-forward model that builds on the LRM architecture with described improvements to data processing, model design, and training. Its central claims consist of quantitative and qualitative performance results on public datasets rather than any mathematical derivation chain. No equations, uniqueness theorems, or fitted-parameter predictions are shown that reduce by construction to the inputs. Any self-citations to LRM or prior work are not load-bearing for the reported results, which remain independently falsifiable via the stated evaluations. The analysis therefore finds no circular steps.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard transformer and 3D reconstruction assumptions plus the specific data-processing and training changes introduced by the authors; no new physical entities or ad-hoc constants are described in the abstract.

free parameters (1)
  • model hyperparameters and training schedule
    Standard deep-learning settings tuned during training; exact values are not stated in the abstract.
axioms (1)
  • domain assumption: a transformer architecture can map single-image features to 3D mesh parameters effectively
    Inherited from LRM and standard in recent 3D generation work.

pith-pipeline@v0.9.0 · 5433 in / 1097 out tokens · 39545 ms · 2026-05-16T17:47:05.344727+00:00 · methodology

discussion (0)


Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

  2. Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffu...

  3. AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    AmaraSpatial-10K is a new dataset of over 10,000 metric-scaled and semantically anchored 3D assets that achieves 3.4 times higher text retrieval precision than Objaverse for embodied AI and spatial computing.

  4. Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.

  5. AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 7.0

    AniGen directly generates animatable 3D assets with consistent shape, skeleton, and skinning from single images using unified S^3 fields and a two-stage flow-matching pipeline.

  6. Benchmarking Vision-Language Models under Contradictory Virtual Content Attacks in Augmented Reality

    cs.CV · 2026-04 · unverdicted · novelty 7.0

    ContrAR benchmark reveals that current VLMs show reasonable understanding of contradictory virtual content in AR but need improvement in detection, reasoning, and balancing accuracy with latency.

  7. CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction

    cs.CV · 2025-12 · unverdicted · novelty 7.0

    CARI4D is the first category-agnostic pipeline that produces metric-scale, spatially and temporally consistent 4D reconstructions of human-object interactions from monocular RGB videos via foundation-model hypothesis ...

  8. Structured 3D Latents for Scalable and Versatile 3D Generation

    cs.CV · 2024-12 · unverdicted · novelty 7.0

    SLAT provides a unified 3D latent representation enabling versatile high-quality generation across multiple output formats from text or image inputs.

  9. Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    Sat3DGen improves geometric RMSE from 6.76m to 5.20m and FID from ~40 to 19 for street-level 3D generation from satellite images via geometry-centric constraints and perspective training.

  10. Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

    cs.RO · 2026-05 · unverdicted · novelty 6.0

    VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

  11. Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence

    cs.HC · 2026-05 · unverdicted · novelty 6.0

    A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than ...

  12. Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...

  13. Lyra 2.0: Explorable Generative 3D Worlds

    cs.CV · 2026-04 · unverdicted · novelty 6.0

    Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.

  14. A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures

    cs.CV · 2026-04 · conditional · novelty 6.0

    A pipeline using SAM segmentation and Hi3DGen mesh generation, evaluated on 69 medieval figures, produces usable 3D models for XR and tactile applications with Hi3DGen as the best starting point.

  15. TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

    cs.CV · 2025-02 · unverdicted · novelty 6.0

    TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.

  16. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    cs.CV · 2024-04 · unverdicted · novelty 6.0

    InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

  17. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV · 2026-05 · unverdicted · novelty 5.0

    R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.

  18. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 5.0

    The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.

  19. AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    AmaraSpatial-10K supplies 10K deployment-ready 3D assets with metric scaling and metadata, delivering 3.4x higher CLIP Recall@5 than Objaverse and 99.1% physics stability in Habitat-Sim.

  20. UniMesh: Unifying 3D Mesh Understanding and Generation

    cs.CV · 2026-04 · unverdicted · novelty 5.0

    UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.

  21. From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

    cs.GR · 2026-04 · unverdicted · novelty 4.0

    The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.

  22. OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

    cs.CV · 2026-04 · unverdicted · novelty 4.0

    OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · cited by 19 Pith papers · 4 internal anchors

  1. [1]

    Emerging properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.

  2. [2]

    Efficient geometry-aware 3D generative adversarial networks

    Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J. Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16123–16133, 2022.

  3. [3]

    GeNVS: Generative novel view synthesis with 3D-aware diffusion models

    Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, and Gordon Wetzstein. GeNVS: Generative novel view synthesis with 3D-aware diffusion models. arXiv, 2023.

  4. [4]

    Objaverse: A universe of annotated 3D objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.

  5. [5]

    Objaverse-XL: A universe of 10M+ 3D objects

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-XL: A universe of 10M+ 3D objects. Advances in Neural Information Processing Systems, 36, 2024.

  6. [6]

    Google Scanned Objects: A high-quality dataset of 3D scanned household items

    Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B. McHugh, and Vincent Vanhoucke. Google Scanned Objects: A high-quality dataset of 3D scanned household items. In 2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022.

  7. [7]

    Mesh R-CNN

    Georgia Gkioxari, Jitendra Malik, and Justin Johnson. Mesh R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9785–9795, 2019.

  8. [8]

    A papier-mâché approach to learning 3D surface generation

    Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. A papier-mâché approach to learning 3D surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 216–224, 2018.

  9. [9]

    threestudio: A unified framework for 3D content generation

    Yuan-Chen Guo, Ying-Tian Liu, Chen Wang, Zi-Xin Zou, Guan Luo, Chia-Hao Chen, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3D content generation, 2023.

  10. [10]

    OpenLRM: Open-source large reconstruction models

    Zexin He and Tengfei Wang. OpenLRM: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM, 2023.

  11. [11]

    LRM: Large Reconstruction Model for Single Image to 3D

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. arXiv preprint arXiv:2311.04400, 2023.

  12. [12]

    ShapeClipper: Scalable 3D shape learning from single-view images via geometric and CLIP-based consistency

    Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, and James M. Rehg. ShapeClipper: Scalable 3D shape learning from single-view images via geometric and CLIP-based consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12912–12922, 2023.

  13. [13]

    ZeroShape: Regression-based zero-shot shape reconstruction

    Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, and James M. Rehg. ZeroShape: Regression-based zero-shot shape reconstruction. arXiv preprint arXiv:2312.14198, 2023.

  14. [14]

    Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model

    Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214, 2023.

  15. [15]

    Advances in 3D generation: A survey

    Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, and Ying Shan. Advances in 3D generation: A survey. arXiv preprint arXiv:2401.17807, 2024.

  16. [16]

    One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization

    Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 2024.

  17. [17]

    Zero-1-to-3: Zero-shot one image to 3D object

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3D object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9298–9309, 2023.

  18. [18]

    Marching cubes: A high resolution 3D surface construction algorithm

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. SIGGRAPH Comput. Graph., 21(4):163–169, 1987.

  19. [19]

    Occupancy networks: Learning 3D reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.

  20. [20]

    DreamFusion: Text-to-3D using 2D diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988, 2022.

  21. [21]

    MVDream: Multi-view diffusion for 3D generation

    Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3D generation. arXiv preprint arXiv:2308.16512, 2023.

  22. [22]

    Deep generative models on 3D representations: A survey

    Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, and Yujun Shen. Deep generative models on 3D representations: A survey. arXiv preprint arXiv:2210.15663, 2022.

  23. [23]

    DreamGaussian: Generative Gaussian splatting for efficient 3D content creation

    Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian splatting for efficient 3D content creation. arXiv preprint arXiv:2309.16653, 2023.

  24. [24]

    LGM: Large multi-view Gaussian model for high-resolution 3D content creation

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large multi-view Gaussian model for high-resolution 3D content creation. arXiv preprint arXiv:2402.05054, 2024.

  25. [25]

    Pixel2Mesh: Generating 3D mesh models from single RGB images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.

  26. [26]

    PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction

    Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, and Kai Zhang. PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. arXiv preprint arXiv:2311.12024, 2023.

  27. [27]

    ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 2024.

  28. [28]

    Multiview compressive coding for 3D reconstruction

    Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, and Georgia Gkioxari. Multiview compressive coding for 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9065–9075, 2023.

  29. [29]

    ReconFusion: 3D reconstruction with diffusion priors

    Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, et al. ReconFusion: 3D reconstruction with diffusion priors. arXiv preprint arXiv:2312.02981, 2023.

  30. [30]

    OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation

    Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, et al. OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 803–814, 2023.

  31. [31]

    DMV3D: Denoising multi-view diffusion using 3D large reconstruction model

    Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, et al. DMV3D: Denoising multi-view diffusion using 3D large reconstruction model. arXiv preprint arXiv:2311.09217, 2023.

  32. [32]

    Learning to reconstruct shapes from unseen classes

    Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, and Jiajun Wu. Learning to reconstruct shapes from unseen classes. Advances in Neural Information Processing Systems, 31, 2018.

  33. [33]

    SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction

    Zhizhuo Zhou and Shubham Tulsiani. SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12588–12597, 2023.

  34. [34]

    Sparse3D: Distilling multiview-consistent diffusion for object reconstruction from sparse views

    Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, and Song-Hai Zhang. Sparse3D: Distilling multiview-consistent diffusion for object reconstruction from sparse views. arXiv preprint arXiv:2308.14078, 2023.

  35. [35]

    Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers

    Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. arXiv preprint arXiv:2312.09147, 2023.