Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
Pith reviewed 2026-05-07 05:39 UTC · model grok-4.3
The pith
GenWildSplat reconstructs 3D outdoor scenes from sparse unposed photos in one forward pass.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GenWildSplat is a feed-forward framework that ingests sparse, unposed images and directly outputs depth, camera parameters, and 3D Gaussians placed in a canonical space using learned geometric priors. An appearance adapter modulates the Gaussians to match target lighting, while semantic segmentation removes transient objects. Curriculum training on combined synthetic and real data enables generalization to diverse real-world illumination and occlusion patterns, delivering state-of-the-art rendering quality on PhotoTourism and MegaScenes benchmarks at real-time speeds with no test-time optimization.
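The claimed pipeline has a simple "predict, then modulate" shape. The sketch below is illustrative only: the module names and interfaces are assumptions standing in for the paper's actual components, not its API.

```python
# Illustrative "predict, then modulate" feed-forward pipeline in the style
# the abstract describes. All module internals are hypothetical stand-ins.

def feed_forward_reconstruct(images, target_lighting, predict_geometry,
                             appearance_adapter, transient_mask):
    """One forward pass: unposed images -> renderable scene parameters.

    predict_geometry  : images -> (depth, cameras, gaussians)  # learned priors
    appearance_adapter: (gaussians, lighting) -> gaussians     # relight
    transient_mask    : images -> per-image masks              # drop movers
    """
    depth, cameras, gaussians = predict_geometry(images)       # canonical space
    gaussians = appearance_adapter(gaussians, target_lighting) # match lighting
    masks = transient_mask(images)                             # suppress transients
    return {"depth": depth, "cameras": cameras,
            "gaussians": gaussians, "masks": masks}
```

Everything happens in a single forward pass; no per-scene optimization loop appears anywhere in this shape, which is the point of the core claim.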
What carries the argument
GenWildSplat, which predicts depth, poses, and canonical 3D Gaussians from unposed images, then modulates them via an appearance adapter and semantic segmentation.
Load-bearing premise
Curriculum training on a blend of synthetic and real scenes produces priors strong enough to handle arbitrary real-world lighting changes and moving objects without any per-scene optimization or fine-tuning.
What would settle it
Run the trained model on a fresh set of unposed outdoor photos that contain lighting or transient patterns outside the training distribution and measure whether rendering quality falls below baseline methods or requires per-scene optimization to recover.
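In code, that settling experiment reduces to a small evaluation harness: score the model and a baseline on the same held-out out-of-distribution scenes and compare. The PSNR formula below is standard; the model/baseline interfaces are hypothetical stand-ins.

```python
import math

def psnr(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)

def generalization_gap(model_mse, baseline_mse):
    """Mean PSNR advantage of the model over a baseline, in dB.

    model_mse / baseline_mse: per-scene rendering MSE lists computed on the
    same held-out, out-of-distribution photo sets (e.g., unseen lighting or
    transient patterns). A negative gap would undercut the core claim.
    """
    m = [psnr(e) for e in model_mse]
    b = [psnr(e) for e in baseline_mse]
    return sum(m) / len(m) - sum(b) / len(b)
```

If the gap stays positive without any per-scene optimization, the load-bearing premise holds on that test set; if quality only recovers after fine-tuning, it does not.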
Original abstract
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse views. Moreover, evaluations on limited scenes raise questions about generalization. We present GenWildSplat, a feed-forward framework for sparse-view outdoor reconstruction that requires no per-scene optimization. Given unposed internet images, GenWildSplat predicts depth, camera parameters, and 3D Gaussians in a canonical space using learned geometric priors. An appearance adapter modulates appearance for target lighting conditions, while semantic segmentation handles transient objects. Through curriculum learning on synthetic and real data, GenWildSplat generalizes across diverse illumination and occlusion patterns. Evaluations on the PhotoTourism and MegaScenes benchmarks demonstrate state-of-the-art feed-forward rendering quality, achieving real-time inference without test-time optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GenWildSplat, a feed-forward neural framework for sparse-view 3D reconstruction from unposed, unconstrained outdoor internet images. It predicts depth, camera parameters, and canonical 3D Gaussians using learned geometric priors from curriculum training on synthetic and real data; an appearance adapter modulates lighting conditions while semantic segmentation suppresses transients. The method claims to eliminate per-scene optimization, delivering real-time inference and state-of-the-art feed-forward rendering quality on the PhotoTourism and MegaScenes benchmarks.
Significance. If the empirical claims are substantiated, this would be a meaningful step toward generalizable, optimization-free 3D reconstruction for real-world sparse views. Removing the need for per-scene training or test-time adaptation addresses a central practical limitation of NeRF-style and 3D Gaussian Splatting pipelines, potentially enabling scalable applications on internet photo collections. The curriculum-learning strategy for bridging synthetic-to-real gaps and handling illumination/transient variation is a relevant direction, though its effectiveness remains to be fully demonstrated.
Major comments (3)
- [§5, Tables 1–2] The state-of-the-art feed-forward claim is asserted via PSNR/SSIM/LPIPS numbers, yet the text supplies no error bars across scenes, no explicit description of how optimization-based baselines (e.g., 3DGS variants) were converted to a feed-forward setting, and no ablation isolating the contribution of the appearance adapter or semantic module under high illumination variance. These omissions are load-bearing for the central generalization claim.
- [§4.2, Curriculum Learning] The training schedule mixes synthetic and real data but provides no quantitative metrics (e.g., per-stage PSNR on held-out lighting/transient subsets) or distribution-coverage analysis showing that extreme illumination changes and transient occluders are adequately sampled. Without such evidence, the claim that the learned priors suffice for arbitrary real-world conditions without test-time optimization rests on an unverified assumption.
- [§3.3–3.4, Appearance Adapter & Semantic Segmentation] The integration of the adapter and segmentation mask into the Gaussian rendering pipeline is described only at a high level; the paper does not report an ablation that removes either component and measures degradation on scenes with strong lighting shifts or moving objects, which would directly test the robustness argument.
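The error-bar request in the first major comment is mechanical to satisfy: report each metric's per-scene mean together with a sample standard deviation. A minimal sketch:

```python
import math

def mean_std(per_scene_scores):
    """Mean and sample standard deviation of a per-scene metric (e.g. PSNR).

    Reporting 'mean +/- std' across scenes is the usual way to add the
    error bars the referee asks for in Tables 1-2.
    """
    n = len(per_scene_scores)
    mean = sum(per_scene_scores) / n
    var = sum((s - mean) ** 2 for s in per_scene_scores) / (n - 1)  # Bessel
    return mean, math.sqrt(var)
```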
Minor comments (2)
- [Figure 3] The qualitative renderings would be more informative if accompanied by per-pixel error maps or depth visualizations to illustrate where the feed-forward predictions deviate from ground truth.
- [§3.1] The mapping from predicted depth and cameras to canonical Gaussians is introduced without an explicit equation; adding a compact formulation would improve clarity.
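The §3.1 comment can be made concrete. A compact formulation of the standard pinhole-unprojection form (a sketch of what such an equation could look like, not the paper's actual definition) would place the Gaussian mean for pixel $(u,v)$ of view $i$ at

```latex
\boldsymbol{\mu}_i(u,v)
  \;=\;
  \mathbf{R}_i \, D_i(u,v) \, \mathbf{K}_i^{-1}
  \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  \;+\; \mathbf{t}_i ,
```

where $D_i$ is the predicted depth map and $(\mathbf{K}_i, \mathbf{R}_i, \mathbf{t}_i)$ are the predicted intrinsics and the camera-to-canonical rigid transform.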
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment point by point below. Revisions will be incorporated into the next version of the manuscript to provide additional evidence and clarity for the central claims.
Point-by-point responses
-
Referee: [§5, Tables 1–2] The state-of-the-art feed-forward claim is asserted via PSNR/SSIM/LPIPS numbers, yet the text supplies no error bars across scenes, no explicit description of how optimization-based baselines (e.g., 3DGS variants) were converted to a feed-forward setting, and no ablation isolating the contribution of the appearance adapter or semantic module under high illumination variance. These omissions are load-bearing for the central generalization claim.
Authors: We agree that the presentation of results can be improved for greater rigor. In the revised manuscript, we will add error bars (standard deviations across scenes) to all metrics in Tables 1 and 2. We will also expand the experimental setup to explicitly describe the feed-forward evaluation protocol for optimization-based baselines: these were run using publicly released pre-trained models with no per-scene optimization or test-time adaptation, matching the protocol used for our method. Additionally, we will include a new ablation study that isolates the appearance adapter and semantic segmentation module, reporting performance on scene subsets with high illumination variance and transient objects. These changes will directly support the generalization claims. revision: yes
-
Referee: [§4.2, Curriculum Learning] The training schedule mixes synthetic and real data but provides no quantitative metrics (e.g., per-stage PSNR on held-out lighting/transient subsets) or distribution-coverage analysis showing that extreme illumination changes and transient occluders are adequately sampled. Without such evidence, the claim that the learned priors suffice for arbitrary real-world conditions without test-time optimization rests on an unverified assumption.
Authors: We acknowledge that additional quantitative support for the curriculum learning strategy would strengthen the paper. In the revised version, we will report per-stage PSNR and SSIM metrics evaluated on held-out subsets that specifically contain extreme lighting variations and transient occluders. We will also add a distribution-coverage analysis, including statistics and visualizations of illumination ranges and occlusion patterns sampled at each curriculum stage. This will provide concrete evidence that the training distribution adequately covers the target real-world conditions. revision: yes
-
Referee: [§3.3–3.4, Appearance Adapter & Semantic Segmentation] The integration of the adapter and segmentation mask into the Gaussian rendering pipeline is described only at a high level; the paper does not report an ablation that removes either component and measures degradation on scenes with strong lighting shifts or moving objects, which would directly test the robustness argument.
Authors: We agree that component-specific ablations on challenging conditions would provide stronger validation of the robustness argument. We will revise Sections 3.3 and 3.4 and add corresponding results in the experiments section. These ablations will remove the appearance adapter and the semantic segmentation module individually (while keeping all other components fixed) and quantify the resulting drop in rendering quality on scenes with strong lighting shifts and moving objects. The new results will be presented alongside the main tables to directly demonstrate the contribution of each module. revision: yes
Circularity Check
No circularity: empirical feed-forward network with no derivation chain or self-referential predictions
Full rationale
The paper describes a neural network (GenWildSplat) that predicts depth, camera parameters, and 3D Gaussians from unposed images using learned priors, followed by an appearance adapter and semantic segmentation. Training occurs via curriculum learning on external synthetic and real datasets. No equations, derivations, or mathematical claims appear in the provided text; the method is purely empirical and evaluated on external benchmarks (PhotoTourism, MegaScenes). There are no fitted inputs renamed as predictions, no self-definitional steps, and no load-bearing self-citations that reduce the central claim to its own inputs. The approach is validated against external data and benchmarks, with generalization claims resting on empirical results rather than internal construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- network weights
axioms (1)
- domain assumption: Learned geometric priors from mixed synthetic and real training data generalize to arbitrary real-world illumination and transient patterns.
Reference graph
Works this paper leans on
- [1] Hadi Alzayer, Philipp Henzler, Jonathan T. Barron, Jia-Bin Huang, Pratul P. Srinivasan, and Dor Verbin. Generative multiview relighting for 3D reconstruction under extreme illumination variation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10933–10942, 2025.
- [2] Hadi Alzayer, Yunzhi Zhang, Chen Geng, Jia-Bin Huang, and Jiajun Wu. Coupled diffusion sampling for training-free multi-view image editing. arXiv preprint arXiv:2510.14981, 2025.
- [3] David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelSplat: 3D Gaussian splats from image pairs for scalable generalizable 3D reconstruction. In CVPR, 2024.
- [4] Xingyu Chen, Qi Zhang, Xiaoyu Li, Yue Chen, Ying Feng, Xuan Wang, and Jue Wang. Hallucinated neural radiance fields in the wild. In CVPR, 2022.
- [5] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images. In ECCV, 2024.
- [6] Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. MVSplat360: Feed-forward 360 scene synthesis from sparse views. NeurIPS, 2024.
- [7] Worameth Chinchuthakun, Pakkapon Phongthawee, Amit Raj, Varun Jampani, Pramook Khungurn, and Supasorn Suwajanakorn. DiffusionLight-Turbo: Accelerated light probes for free via single-pass chrome ball inpainting. arXiv preprint arXiv:2507.01305, 2025.
- [8] Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, and Dzmitry Tsishkou. SWAG: Splatting in the wild images with appearance-conditioned Gaussians. In ECCV, 2024.
- [9] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. In ICLR, 2023.
- [10] Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, et al. RayZer: A self-supervised large view synthesis model. In ICCV, 2025.
- [11] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. AnySplat: Feed-forward 3D Gaussian splatting from unconstrained views. In ACM SIGGRAPH Asia, 2025.
- [12] Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. LVSM: A large view synthesis model with minimal 3D inductive bias. In ICLR, 2024.
- [13] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLOv8, 2023.
- [14] Joanna Kaleta, Kacper Kania, Tomasz Trzciński, and Marek Kowalski. LumiGauss: Relightable Gaussian splatting in the wild. In WACV, 2025.
- [15] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4), 2023.
- [16] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. WildGaussians: 3D Gaussian splatting in the wild. In NeurIPS, 2024.
- [17] Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3D with MASt3R. In ECCV, 2024.
- [18] Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, and Cheng Peng. MS-GS: Multi-appearance sparse-view 3D Gaussian splatting in the wild. arXiv preprint arXiv:2509.15548, 2025.
- [19] Deming Li, Abhay Yadav, Cheng Peng, Rama Chellappa, and Anand Bhattad. SyncFix: Fixing 3D reconstructions via multi-view synchronization. arXiv preprint arXiv:2604.11797, 2026.
- [20] Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, and Zhi Jin. SparseGS-W: Sparse-view 3D Gaussian splatting in the wild with generative priors. arXiv preprint arXiv:2503.19452, 2025.
- [21] Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Chih-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, et al. Diffusion Renderer: Neural inverse and forward rendering with video diffusion models. In CVPR, 2025.
- [22] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- [23] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. DL3DV-10K: A large-scale scene dataset for deep learning-based 3D vision. In CVPR, 2024.
- [24] Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yanchao Yang, Qingnan Fan, and Baoquan Chen. SLAM3R: Real-time dense scene reconstruction from monocular RGB videos. In CVPR, 2025.
- [25] Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.
- [26] Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D. Tardos. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5):1147–1163, 2015.
- [27] Riku Murai, Eric Dexheimer, and Andrew J. Davison. MASt3R-SLAM: Real-time dense SLAM with 3D reconstruction priors. In CVPR, 2025.
- [28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research, pages 1–31, 2024.
- [29] Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, and Nima Khademi Kalantari. CoherentGS: Sparse novel view synthesis with coherent 3D Gaussians. In ECCV, 2024.
- [30] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. NeRF On-the-go: Exploiting uncertainty for distractor-free NeRFs in the wild. In CVPR, 2024.
- [31] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
- [32] Viktor Rudnev, Mohamed Elgharib, William Smith, Lingjie Liu, Vladislav Golyanik, and Christian Theobalt. NeRF for outdoor scene relighting. In ECCV, 2022.
- [33] Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, and Andrea Tagliasacchi. RobustNeRF: Ignoring distractors with robust losses. In CVPR, 2023.
- [34] Noah Snavely, Steven M. Seitz, and Richard Szeliski. Photo Tourism: Exploring photo collections in 3D. In ACM SIGGRAPH 2006 Papers, 2006.
- [35] Jiaming Sun, Xi Chen, Qianqian Wang, Zhengqi Li, Hadar Averbuch-Elor, Xiaowei Zhou, and Noah Snavely. Neural 3D reconstruction in the wild. In ACM SIGGRAPH 2022 Conference Proceedings, 2022.
- [36] Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, and Min Jiang. NexusSplats: Efficient 3D Gaussian splatting in the wild. arXiv preprint arXiv:2411.14514, 2024.
- [37] Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, and Zhicheng Yan. MV-DUSt3R+: Single-stage scene reconstruction from sparse views in 2 seconds. In CVPR, 2025.
- [38] Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. MegaScenes: Scene-level view synthesis at scale. In ECCV, 2024.
- [39] Hengyi Wang and Lourdes Agapito. 3D reconstruction with spatial memory. In 2025 International Conference on 3D Vision (3DV), 2025.
- [40] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, pages 5294–5306, 2025.
- [41] Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, and Angjoo Kanazawa. Continuous 3D perception model with persistent state. In CVPR, 2025.
- [42] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D vision made easy. In CVPR, 2024.
- [43] Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. FreeSplat: Generalizable 3D Gaussian splatting towards free view synthesis of indoor scenes. NeurIPS, 37, 2024.
- [44] Yuze Wang, Junyi Wang, and Yue Qi. WE-GS: An in-the-wild efficient 3D Gaussian representation for unconstrained photo collections. arXiv preprint arXiv:2406.02407, 2024.
- [45] Yuze Wang, Junyi Wang, Ruicheng Gao, Yansong Qu, Wantong Duan, Shuo Yang, and Yue Qi. Look at the Sky: Sky-aware efficient 3D Gaussian splatting in the wild. IEEE Transactions on Visualization and Computer Graphics, 2025.
- [46] Zijie Wu, Zhen Zhu, Junping Du, and Xiang Bai. CCPL: Contrastive coherence preserving loss for versatile style transfer. In ECCV, 2022.
- [47] Haolin Xiong, Sairisheek Muttukuru, Hanyuan Xiao, Rishi Upadhyay, Pradyumna Chari, Yajie Zhao, and Achuta Kadambi. SparseGS: Sparse view synthesis using 3D Gaussian splatting. In 2025 International Conference on 3D Vision (3DV), 2025.
- [48] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. DepthSplat: Connecting Gaussian splatting and depth. In CVPR, 2025.
- [49] Jiacong Xu, Yiqun Mei, and Vishal Patel. Wild-GS: Real-time novel view synthesis from unconstrained photo collections. In NeurIPS, 2024.
- [50] Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. GRM: Large Gaussian reconstruction model for efficient 3D reconstruction and generation. In ECCV, 2024.
- [51] Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3R: Towards 3D reconstruction of 1000+ images in one forward pass. In CVPR, 2025.
- [52] Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, and Mingkui Tan. Cross-ray neural radiance fields for novel-view synthesis from unconstrained image collections. In ICCV, 2023.
- [53] Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, and Theo Gevers. FewViewGS: Gaussian splatting with few view matching and multi-stage training. NeurIPS, 2024.
- [54] Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, and Haoqian Wang. Gaussian in the Wild: 3D Gaussian splatting for unconstrained image collections. In ECCV, 2024.
- [55] Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large reconstruction model for 3D Gaussian splatting. In ECCV, 2024.
- [56] Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David Forsyth, and Anand Bhattad. Latent intrinsics emerge from training to relight. In NeurIPS, 2024.
- [57] Linglong Zhou, Guoxin Wu, Yunbo Zuo, Xuanyu Chen, and Hongle Hu. A comprehensive review of vision-based 3D reconstruction methods. Sensors, 24(7):2314, 2024.
- [58] Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. FSGS: Real-time few-shot view synthesis using Gaussian splatting. In ECCV, 2024.
- [59] Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-LRM: Long-sequence large reconstruction model for wide-coverage Gaussian splats. In ICCV, 2025.

Appendix A. Dataset Details
A.1. Training Dataset
For training GenWildSplat, we constructed a large-scale synthetic dataset derived from the DL3DV [23] dataset. ...