Cross-View Splatter: Feed-Forward View Synthesis with Georeferenced Images
Pith reviewed 2026-05-20 06:11 UTC · model grok-4.3
The pith
Fusing satellite and ground images in one 3D frame improves novel-view synthesis for outdoor scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Cross-View Splatter predicts pixel-aligned Gaussian splats for outdoor scenes by fusing orthorectified satellite views with GPS-tagged ground photos inside a single 3D coordinate frame; aligning the ground and bird's-eye feature representations produces better scene coverage and novel-view synthesis than ground imagery alone.
What carries the argument
The argument rests on aligning ground and bird's-eye feature representations inside a unified 3D coordinate frame, so that satellite and ground imagery can be fused for Gaussian splat prediction.
If this is right
- Ground capture campaigns for large outdoor sites can be reduced while still obtaining usable 3D reconstructions.
- Novel views become feasible in regions visible only from the satellite vantage.
- Publicly available satellite data can serve as a geometric prior for any GPS-tagged ground collection.
- The same feed-forward pipeline supports evaluation on a new georeferenced benchmark that includes both image types.
Where Pith is reading between the lines
- The approach could extend to other multi-source capture settings, such as drone footage paired with ground images, if similar feature alignment is applied.
- Real-time mapping applications might benefit if the feed-forward prediction is further optimized for speed on mobile hardware.
- Errors in public satellite orthorectification would directly limit reconstruction fidelity in practice.
Load-bearing premise
Orthorectified satellite imagery and GPS-tagged ground photos can be aligned into a shared 3D frame without large systematic errors in pose or scale.
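To make the premise concrete, GPS tags can be converted to metric east/north offsets around a chosen scene origin, which is one simple way to place ground photos in the same frame as an orthorectified tile. The sketch below is illustrative only and is not taken from the paper: the function name `gps_to_local_enu`, the choice of origin, and the flat-Earth (equirectangular) approximation are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius, in metres


def gps_to_local_enu(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg):
    """Approximate (east, north) offsets in metres from a reference point.

    Assumption: a local equirectangular approximation, which is only
    reasonable for scenes spanning a few kilometres at most.
    """
    d_lat = math.radians(lat_deg - origin_lat_deg)
    d_lon = math.radians(lon_deg - origin_lon_deg)
    # Longitude spacing shrinks with the cosine of the latitude.
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat_deg))
    north = EARTH_RADIUS_M * d_lat
    return east, north
```

Any systematic error in the GPS tag or in the tile's georeferencing shows up directly as an offset in these coordinates, which is why the premise is load-bearing.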
What would settle it
Measure novel-view quality on a test scene after deliberately shifting satellite poses or scales by known amounts; if quality gains over ground-only disappear or reverse, the central claim does not hold.
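A minimal sketch of the kind of controlled shift this test calls for, assuming the satellite data can be represented as 2D ground-plane points in metres. The helper `perturb_points` and its parameters are hypothetical and are not part of the paper's code.

```python
import math


def perturb_points(points, shift_m=(0.0, 0.0), yaw_deg=0.0, scale=1.0):
    """Rotate about the origin, scale, then translate 2D ground-plane points.

    All three amounts are known in advance, so any change in downstream
    quality can be attributed to the injected misalignment.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    out = []
    for x, y in points:
        xr = scale * (c * x - s * y)
        yr = scale * (s * x + c * y)
        out.append((xr + shift_m[0], yr + shift_m[1]))
    return out
```

Running the model once on the original points and once on the perturbed ones, with everything else held fixed, isolates the effect of pose and scale error on novel-view quality.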
Original abstract
We present Cross-View Splatter, a feed-forward method that predicts pixel-aligned Gaussian splats for outdoor scenes captured at ground level AND by satellite. Faithful reconstructions require good camera coverage, but ground imagery is time-consuming and hard to capture at scale for large outdoor scenes. Fortunately, satellite imagery can provide a global geometric prior that is easy to access via public APIs. Cross-View Splatter fuses orthorectified satellite views with GPS-tagged ground photos to predict Gaussian splats in a unified 3D coordinate frame. By aligning ground and bird's-eye feature representations, our model improves scene coverage and novel-view synthesis, compared to ground imagery alone. We train on curated georeferenced datasets and paired satellite-terrain data, mined from open mapping services. We evaluate our method on a new benchmark for novel-view synthesis with georeferenced imagery allowing comparison to prior state-of-the-art methods. Our code and data preparation will be available at https://nianticspatial.github.io/cross-view-splatter/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Cross-View Splatter, a feed-forward neural method for novel-view synthesis of outdoor scenes. It predicts pixel-aligned 3D Gaussian splats by fusing features from GPS-tagged ground-level photographs and orthorectified satellite imagery within a single georeferenced 3D coordinate frame. The central claim is that this cross-view fusion improves scene coverage and synthesis quality relative to ground imagery alone. The approach is trained on curated pairs mined from open mapping services and evaluated on a newly introduced benchmark that supports comparison against prior state-of-the-art methods.
Significance. If the alignment between views is shown to be reliable and the reported gains are not artifacts of data curation, the work offers a practical route to scalable outdoor reconstruction by exploiting freely available satellite priors. The feed-forward design and planned release of code and data-preparation scripts would support reproducibility and adoption in computer vision and robotics applications that require large-scale 3D models.
major comments (2)
- §3.2 (Cross-View Feature Alignment): The method relies on aligning ground and bird's-eye feature representations into a unified 3D frame using GPS tags and orthorectified satellite data. However, no quantitative alignment-error statistics (e.g., mean translation or rotation residuals, or scale-drift measurements) are reported on the training or test pairs. Given that consumer-grade GPS and public orthorectification typically exhibit meter-scale inconsistencies, it is unclear whether the observed improvements in coverage and PSNR/SSIM (Tables 2 and 3) would persist under realistic residual misalignment.
- §5.1 (Benchmark and Baselines): The new georeferenced benchmark is used to claim superiority over ground-only baselines. Without an ablation that perturbs the satellite poses by amounts consistent with reported GPS accuracy (e.g., ±2 m translation, ±1° rotation) and re-measures the synthesis metrics, it remains possible that the gains are specific to the curated, well-aligned pairs rather than a general property of the cross-view fusion.
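The alignment-error statistics requested in the first major comment could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: poses are simplified to planar `(x, y, yaw_deg)` tuples, a trusted reference pose is assumed to exist for each image, and the names `angle_diff_deg` and `alignment_residuals` are hypothetical.

```python
import math


def angle_diff_deg(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def alignment_residuals(estimated, reference):
    """Mean translation residual (m) and mean absolute heading residual (deg).

    Each pose is an (x, y, yaw_deg) tuple in a shared metric frame.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("pose lists must be non-empty and of equal length")
    n = len(estimated)
    trans = sum(
        math.hypot(e[0] - r[0], e[1] - r[1])
        for e, r in zip(estimated, reference)
    )
    rot = sum(
        abs(angle_diff_deg(e[2], r[2])) for e, r in zip(estimated, reference)
    )
    return trans / n, rot / n
```

Wrapping the heading difference matters: without it, a pair of headings at 359° and 1° would be scored as a 358° error rather than 2°.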
minor comments (2)
- The abstract states that 'code and data preparation will be available'; the final version should include a permanent repository link and confirm that the benchmark dataset (including the satellite-ground pairings) is released under an open license.
- [§3.1] Notation for the unified coordinate frame (e.g., the transformation between ground and satellite cameras) should be defined explicitly in §3.1 before being used in the feature-alignment equations.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The concerns about alignment reliability and robustness to realistic pose noise are valid and point to useful additions that will strengthen the claims. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: §3.2 (Cross-View Feature Alignment): The method relies on aligning ground and bird's-eye feature representations into a unified 3D frame using GPS tags and orthorectified satellite data. However, no quantitative alignment-error statistics (e.g., mean translation or rotation residuals, or scale-drift measurements) are reported on the training or test pairs. Given that consumer-grade GPS and public orthorectification typically exhibit meter-scale inconsistencies, it is unclear whether the observed improvements in coverage and PSNR/SSIM (Tables 2 and 3) would persist under realistic residual misalignment.
  Authors: We agree that explicit quantitative alignment-error statistics are missing from the current version. In the revision we will add a new table (or subsection in §3.2) reporting mean translation and rotation residuals, as well as scale-drift statistics, computed directly from the GPS tags and orthorectified satellite metadata on both the training and test pairs. These numbers will be derived from the same curated data used for all experiments and will allow readers to judge whether the reported gains remain plausible under the meter-scale errors typical of consumer GPS and public orthorectification.
  Revision: yes
- Referee: §5.1 (Benchmark and Baselines): The new georeferenced benchmark is used to claim superiority over ground-only baselines. Without an ablation that perturbs the satellite poses by amounts consistent with reported GPS accuracy (e.g., ±2 m translation, ±1° rotation) and re-measures the synthesis metrics, it remains possible that the gains are specific to the curated, well-aligned pairs rather than a general property of the cross-view fusion.
  Authors: We accept that an explicit robustness ablation is needed to rule out the possibility that gains are an artifact of unusually clean alignments. In the revised manuscript we will add an ablation in §5.1 (or a new supplementary section) that applies controlled perturbations of ±2 m translation and ±1° rotation to the satellite poses, re-runs the cross-view fusion, and reports the resulting changes in PSNR, SSIM, and coverage metrics on the benchmark. This will directly test whether the cross-view advantage persists under realistic residual misalignment.
  Revision: yes
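For readers who want to see how the proposed ablation could be scored, the sketch below computes PSNR and the per-setting gain of a cross-view result over a ground-only baseline. It is an illustration under assumptions, not the authors' evaluation code: images are flattened to sequences of values in `[0, max_value]`, and the names `psnr` and `gains_over_baseline` are hypothetical.

```python
import math


def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(rendered) != len(target) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def gains_over_baseline(cross_view_psnr_by_setting, ground_only_psnr):
    """PSNR gain (dB) of the cross-view model for each perturbation setting.

    Keys are free-form labels for the perturbation applied; a gain that
    stays positive across settings would indicate the advantage persists.
    """
    return {
        label: value - ground_only_psnr
        for label, value in cross_view_psnr_by_setting.items()
    }
```

If the gain shrinks toward zero or turns negative as the perturbation grows, that would support the referee's concern that the benefit depends on unusually clean alignment.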
Circularity Check
No circularity; derivation relies on external training data and empirical evaluation
full rationale
The paper trains a feed-forward model on curated georeferenced datasets and paired satellite-terrain data mined from open mapping services, then evaluates on a new benchmark for novel-view synthesis. No equations or steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the alignment of ground and bird's-eye features occurs via learned prediction on independent external data rather than tautological renaming or forced statistical equivalence. The central claim of improved coverage therefore rests on standard supervised training against held-out test views, remaining self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Satellite imagery supplies a reliable global geometric prior that can be fused with ground photos without large pose or scale errors.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We introduce Cross-View Splatter, a feed-forward method that uses both ground-level imagery and orthographic satellite imagery... inject cross-attention layers... regress 3D Gaussian splat attributes."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We adopt the per-batch ℓ2-scaling... height map regression... orthographic projection model"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.