Recognition: no theorem link
TripoSR: Fast 3D Object Reconstruction from a Single Image
Pith reviewed 2026-05-16 17:47 UTC · model grok-4.3
The pith
TripoSR produces a 3D mesh from one photo in under half a second by refining the LRM transformer design.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TripoSR is a transformer network that takes a single image and directly outputs a 3D mesh in under 0.5 seconds. By combining improvements in data processing, model design, and training techniques on top of the LRM backbone, the system achieves better numerical accuracy and visual quality than existing open-source single-image reconstruction methods on public test sets.
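To make the headline claim concrete, here is a minimal usage sketch. It follows the example script in the public TripoSR repository, but the class path `tsr.system.TSR`, the checkpoint names, and the `extract_mesh` signature should be treated as assumptions about that release rather than a stable API.

```python
# Minimal single-image-to-mesh sketch modeled on the TripoSR repo's
# example script; exact paths and signatures are assumptions.
import time

import torch
from PIL import Image
from tsr.system import TSR  # class from the open-source TripoSR repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Weights as published on Hugging Face (stabilityai/TripoSR).
model = TSR.from_pretrained(
    "stabilityai/TripoSR",
    config_name="config.yaml",
    weight_name="model.ckpt",
).to(device)

image = Image.open("photo.png").convert("RGB")

start = time.time()
scene_codes = model([image], device=device)               # feed-forward pass
meshes = model.extract_mesh(scene_codes, resolution=256)  # marching cubes
print(f"reconstruction took {time.time() - start:.2f}s")

meshes[0].export("output.obj")  # trimesh mesh export
```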
What carries the argument
A transformer-based feed-forward network derived from LRM that maps an input image to a 3D mesh through refined data pipelines, architectural tweaks, and training schedules.
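The shape of that network can be sketched in a few dozen lines. Everything below is an illustrative stand-in for the LRM design the paper builds on (image tokens cross-attended into a triplane, decoded by an MLP into a density field); none of the module names or sizes come from the paper.

```python
# Illustrative LRM-style pipeline: learned triplane queries cross-attend
# to image tokens, and an MLP decodes sampled triplane features into
# density. Names and dimensions are hypothetical stand-ins.
import torch
import torch.nn as nn

class TriplaneDecoder(nn.Module):
    def __init__(self, dim=512, n_layers=4, plane_res=32):
        super().__init__()
        self.plane_res = plane_res
        # One learned query token per triplane cell (3 planes).
        self.queries = nn.Parameter(torch.randn(3 * plane_res**2, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)

    def forward(self, image_tokens):  # (B, N, dim), e.g. from a DINO ViT
        b = image_tokens.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        planes = self.decoder(q, image_tokens)  # cross-attend to the image
        return planes.view(b, 3, self.plane_res, self.plane_res, -1)

class DensityField(nn.Module):
    """Decodes a sampled triplane feature into a scalar density."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, 1))

    def forward(self, feats):  # (B, P, dim) features gathered from planes
        return self.mlp(feats)

# Inference: encode the image, predict the triplane, query densities on a
# dense 3D grid, then run marching cubes to extract the mesh.
```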
If this is right
- Real-time single-image 3D capture becomes practical on consumer hardware.
- Downstream applications such as AR object placement or rapid prototyping can start from casual photos.
- Open release under MIT license lowers the barrier for further model improvements by the community.
- Quantitative benchmarks on public sets now have a stronger open-source baseline for comparison.
Where Pith is reading between the lines
- The speed may allow integration into video pipelines for frame-by-frame 3D lifting without multi-view capture.
- If the same design pattern transfers to other modalities, similar speed gains could appear in related reconstruction tasks.
- Casual users could generate 3D assets for games or VR directly from smartphone snapshots.
- The approach might reduce reliance on expensive multi-camera rigs in industrial scanning workflows.
Load-bearing premise
The reported gains reflect genuine generalization rather than overfitting to the specific datasets used for evaluation.
What would settle it
TripoSR producing lower accuracy or worse visual quality than the best prior open-source method on a fresh dataset of everyday photos drawn from outside the original training distribution.
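One way to operationalize that test: sample points from the predicted and ground-truth meshes of a fresh scan set and compare symmetric Chamfer distance per method. A minimal sketch, where the point arrays are assumed to be pre-sampled and aligned:

```python
# Symmetric Chamfer distance, a standard geometric accuracy metric for
# exactly this kind of held-out comparison. Lower is better.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred (N, 3) and gt (M, 3): points sampled from each mesh surface."""
    d_pred_to_gt, _ = cKDTree(gt).query(pred)   # nearest GT point per pred
    d_gt_to_pred, _ = cKDTree(pred).query(gt)   # nearest pred point per GT
    return float(d_pred_to_gt.mean() + d_gt_to_pred.mean())

# Averaging this score over out-of-distribution photos for TripoSR and
# the strongest prior open-source method would settle the question.
```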
read the original abstract
This technical report introduces TripoSR, a 3D reconstruction model leveraging transformer architecture for fast feed-forward 3D generation, producing 3D mesh from a single image in under 0.5 seconds. Building upon the LRM network architecture, TripoSR integrates substantial improvements in data processing, model design, and training techniques. Evaluations on public datasets show that TripoSR exhibits superior performance, both quantitatively and qualitatively, compared to other open-source alternatives. Released under the MIT license, TripoSR is intended to empower researchers, developers, and creatives with the latest advancements in 3D generative AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TripoSR, a transformer-based feed-forward model for single-image 3D mesh reconstruction that runs in under 0.5 seconds. Building on the LRM architecture, it incorporates targeted improvements in data processing, model design, and training techniques, and reports superior quantitative and qualitative performance relative to other open-source baselines on public datasets.
Significance. If the reported gains prove robust to dataset shift and are supported by ablations that isolate each contribution, the work would meaningfully advance practical 3D generative AI by delivering a fast, open-source alternative suitable for downstream applications. The MIT release is a concrete strength that supports reproducibility.
major comments (3)
- [§4] Experiments and Results: the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, the architectural changes, or the training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than from hyper-parameter tuning or dataset alignment.
- [§3, §4.2] Method and Dataset: the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) are fully disjoint from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that the higher metrics reflect distribution matching rather than improved generalization.
- [§4.1] Baselines: the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support a state-of-the-art claim.
minor comments (2)
- [Figure 1, §2] the high-level diagram of the TripoSR pipeline would benefit from explicit call-outs to the components that are new relative to LRM, so readers can immediately see the modifications.
- [Abstract, §1] the phrase 'public datasets' is used without naming the specific benchmarks (e.g., Objaverse, ShapeNet) or providing a citation; this should be expanded on first mention.
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. We will incorporate revisions to address the concerns where feasible.
read point-by-point responses
- Referee: [§4] Experiments and Results: the central claim of superiority rests on quantitative metrics, yet no ablation tables isolate the individual contributions of the data-processing pipeline, the architectural changes, or the training schedule. Without these controls it is impossible to confirm that the gains arise from the stated improvements rather than from hyper-parameter tuning or dataset alignment.
  Authors: We acknowledge that the current version of the manuscript does not include comprehensive ablation studies isolating each component. To address this, we will add ablation tables in the revised manuscript that evaluate the contributions of the data-processing pipeline, the architectural changes, and the training schedule separately (a leave-one-out grid of the kind sketched after these responses). These ablations will help confirm that the reported improvements are due to the proposed techniques. revision: yes
- Referee: [§3, §4.2] Method and Dataset: the manuscript does not state whether the public evaluation sets (Objaverse-derived or otherwise) are fully disjoint from any data used during the reported training or fine-tuning stages. This omission leaves open the possibility that the higher metrics reflect distribution matching rather than improved generalization.
  Authors: The evaluation datasets are indeed disjoint from the training data. We used standard splits in which the test sets do not overlap with training samples; a mechanical check of this kind is sketched after these responses. We will explicitly document this in the revised §4.2 to eliminate any ambiguity regarding data leakage and to strengthen the generalization claims. revision: yes
- Referee: [§4.1] Baselines: the comparison set is limited to open-source alternatives; the paper should either include a stronger closed-source reference or explicitly justify why the chosen baselines suffice to support a state-of-the-art claim.
  Authors: We agree that including closed-source comparisons would be ideal, but since those models are not publicly available, direct quantitative comparison is not possible. We will revise the manuscript to explicitly justify our choice of open-source baselines, noting that they represent the current reproducible state of the art and that our work aims to provide an open-source alternative. This justification will be added to §4.1. revision: partial
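For the first exchange, the promised component isolation amounts to a leave-one-out grid over the three claimed improvements. A sketch of such a grid; the flag names are hypothetical, since the paper does not enumerate its components this way:

```python
# Hypothetical leave-one-out ablation grid: the full model plus one
# variant per disabled component isolates each contribution.
COMPONENTS = ["improved_data_pipeline", "arch_changes", "new_train_schedule"]

def ablation_configs():
    yield {c: True for c in COMPONENTS}  # full model
    for held_out in COMPONENTS:
        yield {c: (c != held_out) for c in COMPONENTS}

for cfg in ablation_configs():
    # train_and_eval(cfg) would stand in for a full training run here.
    print(cfg)
```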
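For the second exchange, the disjointness the authors assert is cheap to verify mechanically by intersecting asset identifiers across splits. A sketch, where the manifest filenames and the `uid` schema are assumptions:

```python
# Verify train/test disjointness by intersecting asset IDs.
import json

def load_ids(path: str) -> set:
    with open(path) as f:
        return {entry["uid"] for entry in json.load(f)}  # assumed schema

train_ids = load_ids("train_manifest.json")
test_ids = load_ids("test_manifest.json")

overlap = train_ids & test_ids
assert not overlap, f"data leakage: {len(overlap)} shared assets"
print("splits are disjoint")
```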
Circularity Check
No significant circularity in the empirical model and performance claims
full rationale
The paper presents TripoSR as an empirical feed-forward model that builds on the LRM architecture with described improvements to data processing, model design, and training. Its central claims consist of quantitative and qualitative performance results on public datasets rather than any mathematical derivation chain. No equations, uniqueness theorems, or fitted-parameter predictions are shown that reduce by construction to the inputs. Any self-citations to LRM or prior work are not load-bearing for the reported results, which remain independently falsifiable via the stated evaluations. The analysis therefore finds no circular steps.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters and training schedule
axioms (1)
- domain assumption: a transformer architecture can map single-image features to 3D mesh parameters effectively
Forward citations
Cited by 22 Pith papers
- R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
  R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
- Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion
  Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffu...
- AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI
  AmaraSpatial-10K is a new dataset of over 10,000 metric-scaled and semantically anchored 3D assets that achieves 3.4 times higher text retrieval precision than Objaverse for embodied AI and spatial computing.
- Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors
  A video generation approach conditions a base model with multi-scale 3D latent features and a cross-attention adapter to produce geometrically realistic and consistent orbital videos from one image.
- AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation
  AniGen directly generates animatable 3D assets with consistent shape, skeleton, and skinning from single images using unified S^3 fields and a two-stage flow-matching pipeline.
- Benchmarking Vision-Language Models under Contradictory Virtual Content Attacks in Augmented Reality
  ContrAR benchmark reveals that current VLMs show reasonable understanding of contradictory virtual content in AR but need improvement in detection, reasoning, and balancing accuracy with latency.
- CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction
  CARI4D is the first category-agnostic pipeline that produces metric-scale, spatially and temporally consistent 4D reconstructions of human-object interactions from monocular RGB videos via foundation-model hypothesis ...
- Structured 3D Latents for Scalable and Versatile 3D Generation
  SLAT provides a unified 3D latent representation enabling versatile high-quality generation across multiple output formats from text or image inputs.
- Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
  Sat3DGen improves geometric RMSE from 6.76m to 5.20m and FID from ~40 to 19 for street-level 3D generation from satellite images via geometry-centric constraints and perspective training.
- Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
  VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
- Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence
  A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than ...
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
- Lyra 2.0: Explorable Generative 3D Worlds
  Lyra 2.0 produces persistent 3D-consistent video sequences for large explorable worlds by using per-frame geometry for information routing and self-augmented training to correct temporal drift.
- A Semi-Automated Framework for 3D Reconstruction of Medieval Manuscript Miniatures
  A pipeline using SAM segmentation and Hi3DGen mesh generation, evaluated on 69 medieval figures, produces usable 3D models for XR and tactile applications with Hi3DGen as the best starting point.
- TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
  TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
  InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
- R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
  R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.
- From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
  The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.
- AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI
  AmaraSpatial-10K supplies 10K deployment-ready 3D assets with metric scaling and metadata, delivering 3.4x higher CLIP Recall@5 than Objaverse and 99.1% physics stability in Habitat-Sim.
- UniMesh: Unifying 3D Mesh Understanding and Generation
  UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
- From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
  The paper surveys 3D content generation literature using a taxonomy of asset types and production stages to evaluate progress toward engine-ready assets.
- OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
  OpenWorldLib offers a standardized codebase and definition for world models that combine perception, interaction, and memory to understand and predict the world.
Reference graph
Works this paper leans on
- [1] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), 2021.
- [2] Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16123–16133, 2022.
- [3] Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, and Gordon Wetzstein. GeNVS: Generative novel view synthesis with 3D-aware diffusion models. arXiv, 2023.
- [4] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.
- [5] Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-XL: A universe of 10M+ 3D objects. Advances in Neural Information Processing Systems, 36, 2024.
- [6] Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google Scanned Objects: A high-quality dataset of 3d scanned household items. In 2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022.
- [7] Georgia Gkioxari, Jitendra Malik, and Justin Johnson. Mesh R-CNN. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9785–9795, 2019.
- [8] Thibault Groueix, Matthew Fisher, Vladimir G Kim, Bryan C Russell, and Mathieu Aubry. A papier-mâché approach to learning 3d surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 216–224, 2018.
- [9] Yuan-Chen Guo, Ying-Tian Liu, Chen Wang, Zi-Xin Zou, Guan Luo, Chia-Hao Chen, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3d content generation, 2023.
- [10] Zexin He and Tengfei Wang. OpenLRM: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM, 2023.
- [11] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400, 2023.
- [12] Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, and James M. Rehg. ShapeClipper: Scalable 3d shape learning from single-view images via geometric and clip-based consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12912–12922, 2023.
- [13] Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, and James M Rehg. ZeroShape: Regression-based zero-shot shape reconstruction. arXiv preprint arXiv:2312.14198, 2023.
- [14] Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3D: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214, 2023.
- [15] Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, and Ying Shan. Advances in 3d generation: A survey. arXiv preprint arXiv:2401.17807, 2024.
- [16] Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems, 36, 2024.
- [17] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9298–9309, 2023.
- [18] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. SIGGRAPH Comput. Graph., 21(4):163–169, 1987.
- [19] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4460–4470, 2019.
- [20] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
- [21] Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:2308.16512, 2023.
- [22] Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, and Yujun Shen. Deep generative models on 3d representations: A survey. arXiv preprint arXiv:2210.15663, 2022.
- [23] Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023.
- [24] Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large multi-view gaussian model for high-resolution 3d content creation. arXiv preprint arXiv:2402.05054, 2024.
- [25] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2Mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 52–67, 2018.
- [26] Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, and Kai Zhang. PF-LRM: Pose-free large reconstruction model for joint pose and shape prediction. arXiv preprint arXiv:2311.12024, 2023.
- [27] Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems, 36, 2024.
- [28] Chao-Yuan Wu, Justin Johnson, Jitendra Malik, Christoph Feichtenhofer, and Georgia Gkioxari. Multiview compressive coding for 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9065–9075, 2023.
- [29] Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. ReconFusion: 3d reconstruction with diffusion priors. arXiv preprint arXiv:2312.02981, 2023.
- [30] Tong Wu, Jiarui Zhang, Xiao Fu, Yuxin Wang, Jiawei Ren, Liang Pan, Wayne Wu, Lei Yang, Jiaqi Wang, Chen Qian, et al. OmniObject3D: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 803–814, 2023.
- [31] Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, et al. DMV3D: Denoising multi-view diffusion using 3d large reconstruction model. arXiv preprint arXiv:2311.09217, 2023.
- [32] Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, and Jiajun Wu. Learning to reconstruct shapes from unseen classes. Advances in Neural Information Processing Systems, 31, 2018.
- [33] Zhizhuo Zhou and Shubham Tulsiani. SparseFusion: Distilling view-conditioned diffusion for 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12588–12597, 2023.
- [34] Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, and Song-Hai Zhang. Sparse3D: Distilling multiview-consistent diffusion for object reconstruction from sparse views. arXiv preprint arXiv:2308.14078, 2023.
- [35] Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. arXiv preprint arXiv:2312.09147, 2023.
discussion (0)