Efficient 3D Content Reconstruction and Generation
Pith reviewed 2026-05-20 11:35 UTC · model grok-4.3
The pith
Instant3D generates high-quality 3D assets from text or images in 5-20 seconds while FastMap speeds up structure-from-motion by up to 10 times with comparable accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The thesis advances automatic 3D content creation by presenting Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to produce high-quality assets in 5-20 seconds, and FastMap, a structure-from-motion pipeline that achieves up to 10x speedup over prior state-of-the-art by using first-order optimization with fused GPU kernels extensively while maintaining comparable pose accuracy and downstream novel view synthesis quality.
What carries the argument
Instant3D's combination of multi-view diffusion followed by feed-forward reconstruction, and FastMap's replacement of conventional solvers with first-order optimization plus fused GPU kernels inside the structure-from-motion pipeline.
If this is right
- Text or single-image inputs become sufficient to produce usable 3D assets without lengthy manual modeling.
- Large image collections can be processed for camera poses and geometry at speeds that support real-time or near-real-time workflows.
- Novel view synthesis quality stays comparable to slower pipelines, so downstream rendering and simulation tasks remain reliable.
- Rapid iteration on 3D content becomes feasible for applications such as video-game prototyping and virtual-world construction.
Where Pith is reading between the lines
- If the two components are chained, a single short pipeline could turn text prompts into editable 3D scenes with minimal additional compute.
- The GPU-kernel approach used for pose optimization may transfer to other iterative refinement steps common in multi-view geometry.
- Wider adoption could lower the cost of curating large-scale 3D training sets for foundation models.
Load-bearing premise
The premise that first-order optimization with fused GPU kernels can deliver the claimed speedup in pose estimation without introducing accuracy losses that would degrade later novel view synthesis.
What would settle it
A side-by-side run of FastMap against a prior structure-from-motion method on a standard multi-view benchmark that shows either pose error rising or novel view synthesis quality dropping at the reported 10x speedup.
Figures
read the original abstract
Automatic 3D content creation seeks to replace labor-intensive modeling and scanning pipelines with systems that can synthesize or recover 3D assets directly from text or images. Its applications span video games, virtual reality, robotics, and simulation, enabling rapid asset prototyping, diverse interactive world generation, and efficient 3D data collection for training foundation models. Contemporary solutions largely follow two complementary paradigms: (i) text- or image-to-3D generation, which learns priors over 3D geometry and appearance to create novel assets from natural language or a single view image; and (ii) 3D reconstruction, which estimates camera poses and geometry from RGB images. This thesis advances both directions. On the generation side, I introduce Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to produce high-quality assets in 5-20 seconds. On the reconstruction side, I develop FastMap, a structure-from-motion pipeline that achieves up to 10x speedup over prior state-of-the-art by using first-order optimization with fused GPU kernels extensively, while maintaining comparable pose accuracy and downstream novel view synthesis quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Instant3D, which combines multi-view diffusion with feed-forward sparse-view 3D reconstruction to generate high-quality 3D assets from text or images in 5-20 seconds, and FastMap, a structure-from-motion pipeline that uses first-order optimization with fused GPU kernels to achieve up to 10x speedup over prior state-of-the-art while maintaining comparable pose accuracy and downstream novel view synthesis quality.
Significance. If the empirical claims hold, the work would advance efficient 3D content creation by reducing generation and reconstruction times substantially, with direct relevance to VR, robotics, and large-scale 3D data pipelines. The practical focus on GPU-accelerated first-order methods for SfM represents a useful engineering contribution if accuracy preservation is rigorously demonstrated.
major comments (1)
- The FastMap description claims that first-order optimization with fused GPU kernels preserves rotation/translation errors and NVS quality at ~10x speedup relative to second-order baselines (e.g., Levenberg-Marquardt or bundle adjustment). No iteration counts, convergence curves, per-scene error tables, or ablation on step-size/conditioning are referenced, leaving open whether accuracy holds after controlling for initialization sensitivity and kernel artifacts.
minor comments (1)
- The abstract and introduction would benefit from explicit dataset names and scene counts used for the 5-20s generation and 10x speedup claims to allow direct comparison with prior work.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address the major comment on FastMap below and will incorporate the suggested clarifications and additional results in the revised version.
read point-by-point responses
-
Referee: The FastMap description claims that first-order optimization with fused GPU kernels preserves rotation/translation errors and NVS quality at ~10x speedup relative to second-order baselines (e.g., Levenberg-Marquardt or bundle adjustment). No iteration counts, convergence curves, per-scene error tables, or ablation on step-size/conditioning are referenced, leaving open whether accuracy holds after controlling for initialization sensitivity and kernel artifacts.
Authors: We agree that the current presentation would benefit from more explicit empirical details. In the revised manuscript we will add iteration counts for our first-order method versus the second-order baselines, convergence curves across representative scenes, per-scene rotation/translation error tables, and ablations on step-size and conditioning. These will appear in Section 4.2. We already control for initialization by using identical COLMAP-derived starting poses for all compared methods and report mean and standard deviation over the full test set; we will make this protocol explicit. Numerical equivalence of the fused kernels to their unfused counterparts (within FP32 precision) is verified in the supplementary material, which we will reference in the main text. revision: yes
Circularity Check
No load-bearing circularity; empirical claims rest on proposed pipelines without self-referential definitions or fitted predictions
full rationale
The abstract and description introduce Instant3D (multi-view diffusion + feed-forward reconstruction) and FastMap (first-order optimization + fused kernels) as new combinations for 3D tasks. No equations, fitted parameters, or self-citations appear in the provided text that would define a result in terms of itself or rename an input fit as a prediction. Speedup and accuracy claims are stated as outcomes of the methods rather than derived by construction. This matches the default non-finding for a methods paper whose central content is independent of any visible self-referential math.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
FastMap ... using first-order optimization with fused GPU kernels extensively, while maintaining comparable pose accuracy
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Instant3D ... transformer-based sparse-view 3D reconstructor ... triplane representation
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Learning representations and generative models for 3d point clouds
Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3d point clouds. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learn- ing, volume 80 ofProceedings of Machine Learning Research, pages 40–49. PMLR, 10–15 Jul 2018. URLhttps://...
work page 2018
-
[2]
Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly
Adobe. Adobe firefly.https://www.adobe.com/sensei/generative-ai/firefly. html, 2023. Accessed: 2023-11-15
work page 2023
-
[3]
Sameer Agarwal, Noah Snavely, Ian Simon, Steven M Seitz, and Richard Szeliski. Building rome in a day. In2009 IEEE 12th International Conference on Computer Vision, pages 72–79, September 2009. doi: 10.1109/ICCV.2009.5459148. URLhttp: //dx.doi.org/10.1109/ICCV.2009.5459148
-
[4]
Sameer Agarwal, Noah Snavely, Steven M. Seitz, and Richard Szeliski. Bundle adjust- ment in the large. InComputer Vision – ECCV 2010, pages 29–42, 2010. doi: 10.1007/ 978-3-642-15552-9 3. URLhttp://dx.doi.org/10.1007/978-3-642-15552-9_3
-
[5]
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, and Richard Szeliski. Building rome in a day.Communications of the ACM, 54(10):105–112, October 2011. doi: 10.1145/2001269.2001293. URL http://dx.doi.org/10.1145/2001269.2001293
-
[6]
Sameer Agarwal, Keir Mierle, and The Ceres Solver Team. Ceres Solver. Software, 10
-
[7]
URLhttps://github.com/ceres-solver/ceres-solver. 88
-
[8]
Mixvpr: Feature mixing for visual place recognition
Amar Ali-bey, Brahim Chaib-draa, and Philippe Gigu` ere. Mixvpr: Feature mixing for visual place recognition. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2998–3007, January 2023
work page 2023
-
[9]
Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation
Titas Anciukeviˇ cius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. Renderdiffusion: Image diffusion for 3d recon- struction, inpainting and generation. InCVPR, 2023
work page 2023
-
[10]
Netvlad: Cnn architecture for weakly supervised place recognition
Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Netvlad: Cnn architecture for weakly supervised place recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
work page 2016
-
[11]
Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova Das- Sarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson...
work page 2022
-
[12]
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...
work page 2022
-
[13]
A minimal solution for two-view focal-length estimation using two affine correspondences
Daniel Barath, Tekla Toth, and Levente Hajder. A minimal solution for two-view focal-length estimation using two affine correspondences. InCVPR, 2017
work page 2017
-
[14]
Fundamental matrix for cameras with radial distortion
Jo˜ ao Pedro Barreto and Kostas Daniilidis. Fundamental matrix for cameras with radial distortion. InICCV, 2005
work page 2005
-
[15]
Masked feature prediction for self-supervised visual pre-training
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hed- man. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR), pages 5460–5469, 2022. doi: 10.1109/CVPR52688.2022.00539. URLhttps://doi.org/10.1109/CVPR52688. 2022.00539
-
[16]
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. InIEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 19640–19648, 2023. doi: 10.1109/ICCV51070.2023.01804. URLhttps://doi.org/10.1109/ICCV51070.2023. 01804
-
[17]
Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019
Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. Key.net: Keypoint detection by handcrafted and learned cnn filters, 2019. URLhttps://arxiv. org/abs/1904.00889
-
[18]
Rethinking visual geo- localization for large-scale applications
Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo- localization for large-scale applications. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4878–4888, June 2022
work page 2022
-
[19]
DynaSLAM: Tracking, Mapping and Inpainting in Dynamic Scenes
Berta Bescos, Jos´ e M. F´ acil, Javier Civera, and Jos´ e Neira. Dynaslam: Tracking, mapping and inpainting in dynamic scenes, 2018. URLhttps://arxiv.org/abs/ 1806.05620
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[20]
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M¨ uller. Zoedepth: Zero-shot transfer by combining relative and metric depth.arXiv preprint arXiv:2302.12288, 2023. 90
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[21]
Training diffusion models with reinforcement learning, 2023
Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning, 2023
work page 2023
-
[22]
Oleksandr Bogdan, Viktor Eckstein, Francois Rameau, and Jean-Charles Bazin. Deep- calib: a deep learning approach for automatic intrinsic calibration of wide field-of-view cameras. InProceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, 2018
work page 2018
-
[23]
Neural-guided ransac: Learning where to sample model hypotheses
Eric Brachmann and Carsten Rother. Neural-guided ransac: Learning where to sample model hypotheses. In2019 IEEE/CVF International Conference on Computer Vision (ICCV), page 4321–4330. IEEE, October 2019. doi: 10.1109/iccv.2019.00442. URL http://dx.doi.org/10.1109/iccv.2019.00442
-
[24]
DSAC - Differentiable RANSAC for Camera Localization
Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac - differentiable ransac for camera local- ization, 2018. URLhttps://arxiv.org/abs/1611.05705
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[25]
Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses
Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated co- ordinate encoding: Learning to relocalize in minutes using rgb and poses. InCVPR, 2023
work page 2023
-
[26]
Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, ´Aron Monszpart, Daniyar Turmukhambetov, and Victor Adrian Prisacariu. Scene coordinate recon- struction: Posing of image collections via incremental learning of a relocalizer. In ECCV, 2024
work page 2024
-
[27]
Language models are few-shot learners.Advances in neural information processing systems, 33, 2020
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33, 2020
work page 2020
-
[28]
Springer International Publishing, 2020
Bingyi Cao, Andr´ e Araujo, and Jack Sim.Unifying Deep Local and Global Features for Image Search, page 726–743. Springer International Publishing, 2020. ISBN 91 9783030585655. doi: 10.1007/978-3-030-58565-5 43. URLhttp://dx.doi.org/10. 1007/978-3-030-58565-5_43
-
[29]
Emerging properties in self-supervised vision trans- formers
Mathilde Caron, Hugo Touvron, Ishan Misra, Herv´ e J´ egou, Julien Mairal, Piotr Bo- janowski, and Armand Joulin. Emerging properties in self-supervised vision trans- formers. InProceedings of the International Conference on Computer Vision (ICCV), 2021
work page 2021
-
[30]
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, J´ er´ emy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Rapha¨ el Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud,...
work page 2023
-
[31]
Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein
Eric R. Chan, Mariano Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021
work page 2021
-
[32]
Eric R. Chan, Connor Z. Lin, Matthew A. Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas Guibas, Jonathan Tremblay, Sameh Khamis, Tero Karras, and Gordon Wetzstein. Efficient geometry-aware 3d generative adversarial networks. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
work page 2022
-
[33]
Efficient geometry-aware 3d generative adversarial networks
Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InCVPR, 2022
work page 2022
-
[34]
Efficient and robust large-scale ro- tation averaging
Avishek Chatterjee and Venu Madhav Govindu. Efficient and robust large-scale ro- tation averaging. In2013 IEEE International Conference on Computer Vision, pages 92 521–528, December 2013. doi: 10.1109/ICCV.2013.70. URLhttp://dx.doi.org/10. 1109/ICCV.2013.70
-
[35]
Robust relative rotation averaging
Avishek Chatterjee and Venu Madhav Govindu. Robust relative rotation averaging. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):958–972, April
-
[36]
URLhttp://dx.doi.org/10.1109/TPAMI
doi: 10.1109/TPAMI.2017.2693984. URLhttp://dx.doi.org/10.1109/TPAMI. 2017.2693984
-
[37]
Tensorf: Ten- sorial radiance fields
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Ten- sorial radiance fields. InComputer Vision – ECCV 2022 – 17th European Confer- ence, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXII, volume 13692 ofLecture Notes in Computer Science, pages 333–350. Springer, 2022. doi: 10.1007/ 978-3-031-19824-3 20. URLhttps://doi....
-
[38]
Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction
Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, and Hao Su. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. InICCV, 2023
work page 2023
-
[39]
Aspanformer: Detector-free image matching with adaptive span transformer, 2022
Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer, 2022. URLhttps://arxiv.org/abs/2208.14201
-
[40]
Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation
Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 20622– 20633, October 2023
work page 2023
-
[41]
Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation.arXiv preprint arXiv:2303.13873, 2023
-
[42]
Learning implicit fields for generative shape modeling
Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 93
work page 2019
-
[43]
V3d: Video diffusion models are effective 3d generators, 2024
Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, and Huaping Liu. V3d: Video diffusion models are effective 3d generators, 2024. URLhttps://arxiv.org/abs/ 2403.06738
-
[44]
Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, and Kyoung Mu Lee. Luciddreamer: Domain-free generation of 3d gaussian splatting scenes.arXiv preprint arXiv:2311.13384, 2023
-
[45]
Abo: Dataset and benchmarks for real-world 3d object understanding
Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, et al. Abo: Dataset and benchmarks for real-world 3d object understanding. InCVPR, pages 21126–21136, 2022
work page 2022
-
[46]
Discrete- continuous optimization for large-scale structure from motion
David Crandall, Andrew Owens, Noah Snavely, and Dan Huttenlocher. Discrete- continuous optimization for large-scale structure from motion. InCVPR 2011, June
work page 2011
-
[47]
URLhttp://dx.doi.org/10.1109/CVPR
doi: 10.1109/CVPR.2011.5995626. URLhttp://dx.doi.org/10.1109/CVPR. 2011.5995626
-
[48]
Global structure-from-motion by similarity averaging
Zhaopeng Cui and Ping Tan. Global structure-from-motion by similarity averaging. In2015 IEEE International Conference on Computer Vision (ICCV), pages 864–872, December 2015. doi: 10.1109/ICCV.2015.105. URLhttp://dx.doi.org/10.1109/ ICCV.2015.105
-
[49]
Obja- verse: A universe of annotated 3d objects, 2022
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects, 2022
work page 2022
-
[50]
Objaverse-XL: A Universe of 10M+ 3D Objects
Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects.arXiv preprint arXiv:2307.05663, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[51]
Obja- 94 verse: A universe of annotated 3d objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- 94 verse: A universe of annotated 3d objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023
work page 2023
-
[52]
Obja- verse: A universe of annotated 3d objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander- Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Obja- verse: A universe of annotated 3d objects. InCVPR, pages 13142–13153, 2023
work page 2023
-
[53]
Superpoint: Self- supervised interest point detection and description
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self- supervised interest point detection and description. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), page 337–33712. IEEE, June 2018. doi: 10.1109/cvprw.2018.00060. URLhttp://dx.doi.org/10. 1109/cvprw.2018.00060
-
[54]
Theia: A library for structure-from-motion
Theia Developers. Theia: A library for structure-from-motion. urlhttp://www.theia-sfm.org/, 2026. Accessed 2026-02-03
work page 2026
-
[55]
Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-scale training of relative camera pose regression for generalizable, fast, and accurate visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16739–16752, June 2025
work page 2025
-
[56]
Google scanned objects: A high-quality dataset of 3d scanned household items
Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items. In2022 International Conference on Robotics and Automation (ICRA), pages 2553–2560. IEEE, 2022
work page 2022
-
[57]
Close-range camera calibration.Photogramm
C Brown Duane. Close-range camera calibration.Photogramm. Eng, 37(8), 1971
work page 1971
-
[58]
MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion
Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R-SfM: A fully-integrated solution for uncon- strained structure-from-motion.arXiv preprint arXiv:2409.19152, 2024
-
[59]
MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion
Bardienus Pieter Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3r-sfm: a fully-integrated solution for unconstrained 95 structure-from-motion. InInternational Conference on 3D Vision (3DV), 2025. URL https://openreview.net/forum?id=5uw1GRBFoT
work page 2025
-
[60]
D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint detection and description of local features, 2019. URLhttps://arxiv.org/abs/1905.03561
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[62]
Hyperparameters in rein- forcement learning and how to tune them, 2023
Theresa Eimer, Marius Lindauer, and Roberta Raileanu. Hyperparameters in rein- forcement learning and how to tune them, 2023
work page 2023
-
[63]
Syncity: Training-free generation of 3d worlds
Jakob Engstler, Gaisi Xia, Brandon Trabucco, Huicheng Yan, Rui Ding, Yun-Chun Chen, and Gordon Wetzstein. Syncity: Training-free generation of 3d worlds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18577–18587, 2025
work page 2025
-
[64]
Disentangled 3d scene gener- ation with layout learning
Dave Epstein, Alisa Kim, Xueyue Wang, Hsin-Ying Chen, Hung-Yu Tseng, Hsiao- Yu Fish Chen, Ming-Hsuan Yang, and Wei Wang. Disentangled 3d scene gener- ation with layout learning. In Dylan Fortune, Paul Vicol, Serena Yeung, Han- nah Zhang, Frank Hutter, and Hugo Larochelle, editors,Proceedings of the 41st International Conference on Machine Learning, volume...
work page 2024
-
[65]
Bastian Erdn¨ uß. A review of the one-parameter division undistortion model.ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 1, 2021
work page 2021
-
[66]
Haoqiang Fan, Hao Su, and Leonidas J. Guibas. A point set generation network for 96 3d object reconstruction from a single image. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
work page 2017
-
[67]
Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023
Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023
work page 2023
-
[68]
Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints
Chuan Fang, Yuan Dong, Kunming Luo, Xiaotao Hu, Rakesh Shrestha, and Ping Tan. Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints. InInternational Conference on 3D Vision (3DV), pages 692–701, 2025. doi: 10.1109/ 3DV66043.2025.00069
-
[69]
Olivier Faugeras.Three-dimensional computer vision: A geometric viewpoint. MIT Press, 1993
work page 1993
-
[70]
Simultaneous linear estimation of multiple view geometry and lens distortion
Andrew W Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. InCVPR, 2001
work page 2001
-
[71]
Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images.Advances in Neural Information Processing Systems, 35:31841–31854, 2022
work page 2022
-
[72]
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, and Ying Shan. SEED-X: multimodal models with unified multi-granularity comprehension and generation.CoRR, abs/2404.14396, 2024. doi: 10.48550/ARXIV. 2404.14396. URLhttps://doi.org/10.48550/arXiv.2404.14396
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024
-
[73]
Learning shape templates with structured implicit functions
Kyle Genova, Ethan Highlander, Ozgur Sirin, Wonmin Liao, Kostas Daniilidis, and Thomas Funkhouser. Learning shape templates with structured implicit functions. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
work page 2020
-
[74]
S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020
Hugo Germain, Guillaume Bourmaud, and Vincent Lepetit. S2dnet: Learning accurate 97 correspondences for sparse-to-dense feature matching, 2020. URLhttps://arxiv. org/abs/2004.01673
-
[75]
Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023
Jeong gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, and Kwang Moo Yi. Vivid-1-to-3: Novel view synthesis with video diffusion models, 2023. URLhttps: //arxiv.org/abs/2312.01305
-
[77]
Jose L G´ omez, Manuel Silva, Antonio Seoane, Agn` es Borr´ as, Mario Noriega, Germ´ an Ros, Jose A Iglesias-Guitian, and Antonio M L´ opez. All for one, and one for all: Urbansyn dataset, the third musketeer of synthetic driving scenes.Neurocomputing, 637:130038, 2025
work page 2025
-
[79]
V. M. Govindu. Lie-algebraic averaging for globally consistent motion estimation. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., volume 1, pages 684–691, 2004. doi: 10.1109/ CVPR.2004.1315098. URLhttp://dx.doi.org/10.1109/CVPR.2004.1315098
-
[80]
Kubric: A scalable dataset generator
Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duck- worth, David J Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, et al. Kubric: A scalable dataset generator. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3749–3761, 2022
work page 2022
-
[81]
Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu 98 Aubry. A papier-mache approach to learning 3d surface generation. InThe IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
work page 2018
-
[82]
Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020
Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, and Ping Tan. Cas- cade cost volume for high-resolution multi-view stereo and stereo matching, 2020. URL https://arxiv.org/abs/1912.06378
-
[83]
Yuan-Chen Guo, Ying-Tian Liu, Ruizhi Shao, Christian Laforte, Vikram Voleti, Guan Luo, Chia-Hao Chen, Zi-Xin Zou, Chen Wang, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A unified framework for 3d content generation.https://github.com/ threestudio-project/threestudio, 2023
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.