Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
Pith reviewed 2026-05-10 13:10 UTC · model grok-4.3
The pith
Feed-forward 3D reconstruction methods share common design patterns that are best captured by a taxonomy of five problems rather than by output formats.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Despite diverse geometric output representations ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. The authors therefore abstract away from output differences and organize the literature by five key problems that shape model design: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The taxonomy is further supported by comprehensive coverage of benchmarks, datasets, categorized applications, and discussion of future challenges.
What carries the argument
The proposed taxonomy centered on five key problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. This structure abstracts from output representation differences to group methods by shared architectural strategies and design choices.
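To make the representation-agnostic grouping concrete, here is a minimal sketch in which a few well-known feed-forward methods (names drawn from the paper's reference list) are tagged with the five problems as multi-label annotations. The specific problem assignments below are illustrative guesses of ours, not the survey's actual categorization.

```python
# Illustrative sketch: a multi-label, output-agnostic taxonomy.
# The problem tags assigned to each method are assumptions for
# illustration only, not the survey's actual categorization.
PROBLEMS = {
    "feature_enhancement",
    "geometry_awareness",
    "model_efficiency",
    "augmentation_strategies",
    "temporal_awareness",
}

# method -> (output representation, tagged problems)
methods = {
    "pixelSplat": ("3D Gaussians", {"feature_enhancement", "geometry_awareness"}),
    "MVSplat":    ("3D Gaussians", {"geometry_awareness", "model_efficiency"}),
    "DUSt3R":     ("point maps",   {"feature_enhancement", "geometry_awareness"}),
    "SparseNeuS": ("implicit SDF", {"geometry_awareness"}),
}

def group_by_problem(methods):
    """Invert the mapping: problem -> methods, ignoring output format."""
    groups = {p: [] for p in PROBLEMS}
    for name, (_repr, tags) in methods.items():
        for tag in tags & PROBLEMS:
            groups[tag].append(name)
    return groups

groups = group_by_problem(methods)
# Methods with three different output formats land in the same problem bucket:
assert set(groups["geometry_awareness"]) == {"pixelSplat", "MVSplat", "DUSt3R", "SparseNeuS"}
```

The point of the inversion is that the output representation (`_repr`) plays no role in the grouping, which is exactly the abstraction the taxonomy claims.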
If this is right
- Advances in feature enhancement improve robustness to varied input images across many output formats.
- Geometry awareness mechanisms enforce multi-view consistency in reconstructed scenes.
- Efficiency-focused designs enable practical deployment of feed-forward models.
- Augmentation strategies support better generalization to new scenes and categories.
- Temporal-aware extensions allow the same feed-forward approach to handle dynamic or video inputs.
Where Pith is reading between the lines
- The taxonomy could guide hybrid models that combine solutions from multiple problem areas without regard to final output type.
- It implies that progress on any one problem may transfer across representation choices, accelerating overall field progress.
- Researchers might test the taxonomy by checking whether new papers naturally cluster under one or more of the five problems.
- Similar problem-driven groupings could be explored in adjacent tasks such as 4D reconstruction or neural rendering.
Load-bearing premise
Abstracting away from differences in geometric output representations yields a more useful organization of the literature than taxonomies based on those representations.
What would settle it
A detailed comparison that finds methods with different output representations require fundamentally incompatible architectural choices not captured by the five problems would undermine the taxonomy.
read the original abstract
Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys feed-forward 3D scene reconstruction methods that map 2D images to 3D representations in a single forward pass. It observes that approaches using diverse output formats (implicit fields to explicit primitives) nevertheless share high-level architectural patterns in image feature extraction, multi-view fusion, and geometry-aware design. From this observation the authors derive a representation-agnostic taxonomy organized around five driving problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. The survey additionally reviews benchmarks and datasets, categorizes real-world applications, and outlines open challenges in scalability and evaluation.
Significance. A well-substantiated taxonomy that successfully decouples design strategies from output representation could usefully reorganize the rapidly expanding feed-forward 3D literature and surface shared research directions. The promised empirical review of benchmarks and applications would further increase the manuscript's value as a reference for the computer-vision community.
major comments (1)
- [Abstract and §1] Abstract and §1: the central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.
minor comments (2)
- [Taxonomy section] Ensure that every cited method is explicitly mapped to at least one of the five taxonomy categories so readers can verify coverage.
- [Taxonomy section] Clarify the exact criteria used to assign papers to 'augmentation strategies' versus 'model efficiency' when a method addresses both.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our taxonomy. We address the major comment point by point below.
read point-by-point responses
-
Referee: [Abstract and §1] Abstract and §1: the central claim that the five-problem taxonomy is 'agnostic to the output format' and more useful than prior representation-centered surveys is load-bearing, yet the manuscript provides no side-by-side re-categorization of the same papers under both schemes nor quantitative evidence (e.g., performance or efficiency trends grouped by the new axes) that the reorganization better predicts generalization behavior. Without such validation the taxonomy risks being a relabeling rather than an advance.
Authors: We agree that a side-by-side comparison would help readers see the reorganization in action. In the revised version we will add a table in Section 1 that maps 12–15 representative papers (covering implicit, explicit, and hybrid outputs) to both the traditional representation-centered categories and our five-problem taxonomy. This table will illustrate how methods that differ in output format nevertheless share design choices for feature enhancement, geometry awareness, etc. We also plan to expand the discussion in §1 with concrete examples of cross-representation patterns already noted in the manuscript. Regarding quantitative evidence (performance or efficiency trends grouped by the new axes), we note that a rigorous meta-analysis would require standardized re-implementations and controlled re-evaluations of many methods, an undertaking that exceeds the scope of a survey. Our taxonomy is motivated by the observed architectural commonalities across the literature rather than by new empirical meta-results; the separate benchmark and dataset review in the paper is intended to facilitate such future quantitative studies. We believe the value of the taxonomy lies in its ability to surface shared research directions that representation-centric surveys obscure, even without new performance numbers.
Revision: partial
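The dual categorization the authors promise could be sketched as follows; the representation labels and problem tags below are illustrative assumptions of ours, not the actual table described in the rebuttal.

```python
# Hypothetical side-by-side categorization of a few feed-forward papers
# under the traditional representation axis and the five-problem axis.
# All assignments are illustrative assumptions, not the authors' table.
rows = [
    # (paper, representation-centered bucket, problem-driven tags)
    ("LRM",        "triplane (implicit)", ["model_efficiency"]),
    ("pixelSplat", "3D Gaussians",        ["feature_enhancement", "geometry_awareness"]),
    ("DUSt3R",     "point maps",          ["feature_enhancement", "geometry_awareness"]),
]

by_repr, by_problem = {}, {}
for paper, rep, problems in rows:
    # Representation axis: each paper falls in exactly one bucket.
    by_repr.setdefault(rep, []).append(paper)
    # Problem axis: a paper may appear under several problems.
    for p in problems:
        by_problem.setdefault(p, []).append(paper)

# Papers with different output formats share a problem bucket:
assert by_problem["geometry_awareness"] == ["pixelSplat", "DUSt3R"]
```

A table of this shape would let a reader check the referee's concern directly: if the problem-axis buckets simply mirror the representation-axis buckets, the taxonomy is a relabeling; if they cut across them (as in the last assertion), it is doing real organizational work.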
Circularity Check
No circularity: survey taxonomy is observational synthesis, not derived from self-referential inputs
full rationale
The manuscript is a literature survey whose central contribution is an observational claim that feed-forward 3D methods share high-level architectural patterns (feature backbones, multi-view fusion, geometry-aware designs) across output representations, followed by a proposed taxonomy of five design problems. This claim is grounded in review of external literature rather than any equation, fitted parameter, or self-citation chain that reduces to the paper's own inputs. No predictions, first-principles derivations, uniqueness theorems, or ansatzes are introduced; the taxonomy is presented as an organizational lens, not a result forced by construction. Self-citations, if present for specific methods, are not load-bearing for the taxonomy itself. The paper therefore contains no circular steps of the enumerated kinds.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Nerf: Representing scenes as neural radiance fields for view synthesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021
2021
-
[2]
3d gaussian splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023
2023
-
[3]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In IEEE Conf. Comput. Vis. Pattern Recog., pages 20697–20709, 2024
2024
-
[4]
Deepsdf: Learning continuous signed distance functions for shape representation
Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 165–174, 2019
2019
-
[5]
Occupancy networks: Learning 3d reconstruction in function space
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4460–4470, 2019
2019
-
[6]
Texture fields: Learning texture representations in function space
Michael Oechsle, Lars Mescheder, Michael Niemeyer, Thilo Strauss, and Andreas Geiger. Texture fields: Learning texture representations in function space. In Int. Conf. Comput. Vis., pages 4531–4540, 2019
2019
-
[7]
Learning implicit surface light fields
Michael Oechsle, Michael Niemeyer, Christian Reiser, Lars Mescheder, Thilo Strauss, and Andreas Geiger. Learning implicit surface light fields. In Int. Conf. 3D Vision (3DV), pages 452–462. IEEE, 2020
2020
-
[8]
Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision
Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3504–3515, 2020
2020
-
[9]
A computer algorithm for reconstructing a scene from two projections
H Christopher Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Nature, 293(5828):133–135, 1981
1981
-
[10]
Multi-view stereo: A tutorial
Yasutaka Furukawa, Carlos Hernández, et al. Multi-view stereo: A tutorial. Foundations and Trends® in Computer Graphics and Vision, 9(1-2):1–148, 2015
2015
-
[11]
Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling
Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. Advances in Neural Information Processing Systems, 29, 2016
2016
-
[12]
A point set generation network for 3d object reconstruction from a single image
Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In IEEE Conf. Comput. Vis. Pattern Recog., pages 605–613, 2017
2017
-
[14]
Light field networks: Neural scene representations with single-evaluation rendering
Vincent Sitzmann, Semon Rezchikov, Bill Freeman, Josh Tenenbaum, and Fredo Durand. Light field networks: Neural scene representations with single-evaluation rendering. Adv. Neural Inf. Process. Syst., 34:19313–19325, 2021
2021
-
[15]
pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In IEEE Conf. Comput. Vis. Pattern Recog., pages 19457–19467, 2024
2024
-
[16]
Splatter image: Ultra-fast single-view 3d reconstruction
Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single-view 3d reconstruction. In IEEE Conf. Comput. Vis. Pattern Recog., 2024
2024
-
[17]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In Eur. Conf. Comput. Vis., pages 370–386. Springer, 2024
2024
-
[18]
Depthsplat: Connecting gaussian splatting and depth
Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In IEEE Conf. Comput. Vis. Pattern Recog., 2025
2025
-
[19]
Vggt: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. arXiv preprint arXiv:2503.11651, 2025
-
[20]
Advances in feed-forward 3d reconstruction and view synthesis: A survey
Jiahui Zhang, Yuelei Li, Anpei Chen, Muyu Xu, Kunhao Liu, Jianyuan Wang, Xiao-Xiao Long, Hanxue Liang, Zexiang Xu, Hao Su, et al. Advances in feed-forward 3d reconstruction and view synthesis: A survey. arXiv preprint arXiv:2507.14501, 2025
-
[21]
Large scale multi-view stereopsis evaluation
Rasmus Jensen, Anders Dahl, George Vogiatzis, Engin Tola, and Henrik Aanæs. Large scale multi-view stereopsis evaluation. In IEEE Conf. Comput. Vis. Pattern Recog., pages 406–413, 2014
2014
-
[22]
Scannet: Richly-annotated 3d reconstructions of indoor scenes
Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5828–5839, 2017
2017
-
[23]
The Replica Dataset: A Digital Replica of Indoor Spaces
Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, et al. The replica dataset: A digital replica of indoor spaces.arXiv preprint arXiv:1906.05797, 2019
2019
-
[24]
Stereo magnification: learning view synthesis using multiplane images
Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: learning view synthesis using multiplane images. ACM Trans. Graph., 37(4):1–12, 2018
2018
-
[25]
Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In IEEE Conf. Comput. Vis. Pattern Recog., pages 22160–22169, 2024
2024
-
[26]
Grounding image matching in 3d with mast3r
Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. In Eur. Conf. Comput. Vis., pages 71–91. Springer, 2024
2024
-
[27]
Meshsplat: Generalizable sparse-view surface reconstruction via gaussian splatting
Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao Lu, Zhuoyuan Li, and Tianzhu Zhang. Meshsplat: Generalizable sparse-view surface reconstruction via gaussian splatting. arXiv preprint arXiv:2508.17811, 2025
-
[28]
Revisiting depth representations for feed-forward 3d gaussian splatting
Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jiawang Bian, Bohan Zhuang, and Chunhua Shen. Revisiting depth representations for feed-forward 3d gaussian splatting. arXiv preprint arXiv:2506.05327, 2025
-
[29]
Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps
Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Adv. Neural Inf. Process. Syst., 37:140138–140158, 2024
2024
-
[30]
Compact 3d gaussian representation for radiance field
Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field. In IEEE Conf. Comput. Vis. Pattern Recog., pages 21719–21728, 2024
2024
-
[31]
Mip-nerf 360: Unbounded anti-aliased neural radiance fields
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. IEEE Conf. Comput. Vis. Pattern Recog., 2022
2022
-
[32]
Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs
Michael Niemeyer, Jonathan T Barron, Ben Mildenhall, Mehdi SM Sajjadi, Andreas Geiger, and Noha Radwan. Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5480–5490, 2022
2022
-
[33]
Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats
Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats. arXiv preprint arXiv:2410.12781, 2024
-
[34]
Optical models for direct volume rendering
Nelson Max. Optical models for direct volume rendering. IEEE Trans. Vis. Comput. Graph., 1(2):99–108, 2002
2002
-
[35]
Volume rendering digest (for nerf)
Andrea Tagliasacchi and Ben Mildenhall. Volume rendering digest (for nerf). arXiv preprint arXiv:2209.02417, 2022
-
[36]
Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields
Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. Int. Conf. Comput. Vis., 2021
2021
-
[37]
Ref-NeRF: Structured view-dependent appearance for neural radiance fields
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, and Pratul P. Srinivasan. Ref-NeRF: Structured view-dependent appearance for neural radiance fields. IEEE Conf. Comput. Vis. Pattern Recog., 2022
2022
-
[38]
Depth-supervised nerf: Fewer views and faster training for free
Kangle Deng, Andrew Liu, Jun-Yan Zhu, and Deva Ramanan. Depth-supervised nerf: Fewer views and faster training for free. In IEEE Conf. Comput. Vis. Pattern Recog., pages 12882–12891, 2022
2022
-
[39]
Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo
Yi Wei, Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu, and Jie Zhou. Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In Int. Conf. Comput. Vis., 2021
2021
-
[40]
Instant neural graphics primitives with a multiresolution hash encoding
Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):1–15, 2022
2022
-
[41]
Fregs: 3d gaussian splatting with progressive frequency regularization
Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, and Eric Xing. Fregs: 3d gaussian splatting with progressive frequency regularization. In IEEE Conf. Comput. Vis. Pattern Recog., pages 21424–21433, 2024
2024
-
[42]
Scaffold-gs: Structured 3d gaussians for view-adaptive rendering
Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In IEEE Conf. Comput. Vis. Pattern Recog., pages 20654–20664, 2024
2024
-
[43]
Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes
Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Trans. Graph., 43(6):1–13, 2024
2024
-
[44]
Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces
Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, and Yuexin Ma. Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5322–5332, 2024
2024
-
[45]
Spec-gaussian: Anisotropic view-dependent appearance for 3d gaussian splatting
Ziyi Yang, Xinyu Gao, Yang-Tian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, and Xiaogang Jin. Spec-gaussian: Anisotropic view-dependent appearance for 3d gaussian splatting. Adv. Neural Inf. Process. Syst., 37:61192–61216, 2024
2024
-
[46]
Mirror-3dgs: Incorporating mirror reflections into 3d gaussian splatting
Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, and Siwei Ma. Mirror-3dgs: Incorporating mirror reflections into 3d gaussian splatting. In IEEE Int. Conf. Vis. Commun. Image Process., pages 1–5. IEEE, 2024
2024
-
[47]
Bags: Blur agnostic gaussian splatting through multi-scale kernel modeling
Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, and Rama Chellappa. Bags: Blur agnostic gaussian splatting through multi-scale kernel modeling. In Eur. Conf. Comput. Vis., pages 293–310. Springer, 2024
2024
-
[48]
Bad-gaussians: Bundle adjusted deblur gaussian splatting
Lingzhe Zhao, Peng Wang, and Peidong Liu. Bad-gaussians: Bundle adjusted deblur gaussian splatting. In Eur. Conf. Comput. Vis., pages 233–250. Springer, 2024
2024
-
[49]
Reducing the memory footprint of 3d gaussian splatting
Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, and George Drettakis. Reducing the memory footprint of 3d gaussian splatting. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 7(1):1–17, 2024
2024
-
[50]
Compressed 3d gaussian splatting for accelerated novel view synthesis
Simon Niedermayr, Josef Stumpfegger, and Rüdiger Westermann. Compressed 3d gaussian splatting for accelerated novel view synthesis. In IEEE Conf. Comput. Vis. Pattern Recog., pages 10349–10358, 2024
2024
-
[51]
Hac: Hash-grid assisted context for 3d gaussian splatting compression
Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, and Jianfei Cai. Hac: Hash-grid assisted context for 3d gaussian splatting compression. In Eur. Conf. Comput. Vis., pages 422–438. Springer, 2024
2024
-
[52]
Dsac-differentiable ransac for camera localization
Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. Dsac-differentiable ransac for camera localization. In IEEE Conf. Comput. Vis. Pattern Recog., pages 6684–6692, 2017
2017
-
[53]
Learning less is more – 6d camera localization via 3d surface regression
Eric Brachmann and Carsten Rother. Learning less is more – 6d camera localization via 3d surface regression. In IEEE Conf. Comput. Vis. Pattern Recog., pages 4654–4662, 2018
2018
-
[54]
Visual camera re-localization from rgb and rgb-d images using dsac
Eric Brachmann and Carsten Rother. Visual camera re-localization from rgb and rgb-d images using dsac. IEEE Trans. Pattern Anal. Mach. Intell., 44(9):5847–5865, 2021
2021
-
[55]
Sacreg: Scene-agnostic coordinate regression for visual localization
Jerome Revaud, Yohann Cabon, Romain Brégier, JongMin Lee, and Philippe Weinzaepfel. Sacreg: Scene-agnostic coordinate regression for visual localization. In IEEE Conf. Comput. Vis. Pattern Recog., pages 688–698, 2024
2024
-
[56]
Learning camera localization via dense scene matching
Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, and Ping Tan. Learning camera localization via dense scene matching. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1831–1841, 2021
2021
-
[57]
Sanet: Scene agnostic network for camera localization
Luwei Yang, Ziqian Bai, Chengzhou Tang, Honghua Li, Yasutaka Furukawa, and Ping Tan. Sanet: Scene agnostic network for camera localization. In Int. Conf. Comput. Vis., pages 42–51, 2019
2019
-
[58]
Learning efficient point cloud generation for dense 3d object reconstruction
Chen-Hsuan Lin, Chen Kong, and Simon Lucey. Learning efficient point cloud generation for dense 3d object reconstruction. In AAAI, volume 32, 2018
2018
-
[59]
Pixels, voxels, and views: A study of shape representations for single view 3d object shape prediction
Daeyun Shin, Charless C Fowlkes, and Derek Hoiem. Pixels, voxels, and views: A study of shape representations for single view 3d object shape prediction. In IEEE Conf. Comput. Vis. Pattern Recog., pages 3061–3069, 2018
2018
-
[60]
Multi-view 3d models from single images with a convolutional network
Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Multi-view 3d models from single images with a convolutional network. In Eur. Conf. Comput. Vis., pages 322–337. Springer, 2016
2016
-
[61]
Synsin: End-to-end view synthesis from a single image
Olivia Wiles, Georgia Gkioxari, Richard Szeliski, and Justin Johnson. Synsin: End-to-end view synthesis from a single image. In IEEE Conf. Comput. Vis. Pattern Recog., pages 7467–7477, 2020
2020
-
[62]
Croco: Self-supervised pre-training for 3d vision tasks by cross-view completion
Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, and Jérôme Revaud. Croco: Self-supervised pre-training for 3d vision tasks by cross-view completion. Adv. Neural Inf. Process. Syst., 35:3502–3516, 2022
2022
-
[63]
Learning implicit fields for generative shape modeling
Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5939–5948, 2019
2019
-
[64]
Convolutional occupancy networks
Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. Convolutional occupancy networks. In Eur. Conf. Comput. Vis., 2020
2020
-
[65]
Sparseneus: Fast generalizable neural surface reconstruction from sparse views
Xiaoxiao Long, Cheng Lin, Peng Wang, Taku Komura, and Wenping Wang. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. In Eur. Conf. Comput. Vis., pages 210–227. Springer, 2022
2022
-
[66]
Volrecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction
Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys, and Sabine Süsstrunk. Volrecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction. In IEEE Conf. Comput. Vis. Pattern Recog., pages 16685–16695, 2023
2023
-
[67]
Retr: Modeling rendering via transformer for generalizable neural surface reconstruction
Yixun Liang, Hao He, and Yingcong Chen. Retr: Modeling rendering via transformer for generalizable neural surface reconstruction. Advances in Neural Information Processing Systems, 36:62332–62351, 2023
2023
-
[68]
C2f2neus: Cascade cost frustum fusion for high fidelity and generalizable neural surface reconstruction
Luoyuan Xu, Tao Guan, Yuesong Wang, Wenkai Liu, Zhaojie Zeng, Junle Wang, and Wei Yang. C2f2neus: Cascade cost frustum fusion for high fidelity and generalizable neural surface reconstruction. In Int. Conf. Comput. Vis., pages 18245–18255, 2023. doi: 10.1109/ICCV51070.2023.01677
-
[69]
Uforecon: Generalizable sparse-view surface reconstruction from arbitrary and unfavorable sets
Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, and Sung-Eui Yoon. Uforecon: Generalizable sparse-view surface reconstruction from arbitrary and unfavorable sets. In IEEE Conf. Comput. Vis. Pattern Recog., pages 5094–5104, 2024
2024
-
[70]
Surfacesplat: Connecting surface reconstruction and gaussian splatting
Zihui Gao, Jia-Wang Bian, Guosheng Lin, Hao Chen, and Chunhua Shen. Surfacesplat: Connecting surface reconstruction and gaussian splatting. arXiv preprint arXiv:2507.15602, 2025
-
[71]
Meshlrm: Large reconstruction model for high-quality meshes
Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, and Zexiang Xu. Meshlrm: Large reconstruction model for high-quality meshes. arXiv preprint arXiv:2404.12385, 2024
-
[72]
Meshformer: High-quality mesh generation with 3d-guided reconstruction model
Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, et al. Meshformer: High-quality mesh generation with 3d-guided reconstruction model. Adv. Neural Inf. Process. Syst., 37:59314–59341, 2024
2024
-
[73]
Renderformer: Transformer-based neural rendering of triangle meshes with global illumination
Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, and Xin Tong. Renderformer: Transformer-based neural rendering of triangle meshes with global illumination. In ACM SIGGRAPH Conf. Comput. Graph. Interact. Tech., 2025
2025
-
[74]
Lara: Efficient large-baseline radiance fields
Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, and Andreas Geiger. Lara: Efficient large-baseline radiance fields. In Eur. Conf. Comput. Vis., pages 338–355. Springer, 2024
2024
-
[75]
Lrm: Large reconstruction model for single image to 3d
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. Lrm: Large reconstruction model for single image to 3d. In Int. Conf. Learn. Represent., 2024
2024
-
[76]
Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. In Int. Conf. Learn. Represent., 2024
2024
-
[77]
Agg: Amortized generative 3d gaussians for single image to 3d
Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, and Arash Vahdat. Agg: Amortized generative 3d gaussians for single image to 3d. arXiv preprint arXiv:2401.04099, 2024
-
[78]
Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers
Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. In IEEE Conf. Comput. Vis. Pattern Recog., pages 10324–10335, 2024
2024
-
[79]
Generalizable patch-based neural rendering
Mohammed Suhail, Carlos Esteves, Leonid Sigal, and Ameesh Makadia. Generalizable patch-based neural rendering. In Eur. Conf. Comput. Vis., pages 156–174. Springer, 2022
2022
-
[80]
Lvsm: A large view synthesis model with minimal 3d inductive bias
Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. Lvsm: A large view synthesis model with minimal 3d inductive bias. In Int. Conf. Learn. Represent., 2025. URL https://openreview.net/forum?id=QQBPWtvtcn
2025
-
[81]
Pluckerf: A line-based 3d representation for few-view reconstruction
Sam Bahrami and Dylan Campbell. Pluckerf: A line-based 3d representation for few-view reconstruction. In IEEE Conf. Comput. Vis. Pattern Recog., pages 317–326, 2025
2025