From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images
Pith reviewed 2026-05-17 00:54 UTC · model grok-4.3
The pith
Representing city geometry as a 2.5D height map from satellite views enables synthesis of photorealistic ground-level images over large areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that modeling city geometry as a 2.5D height map via a Z-monotonic signed distance field stabilizes the reconstruction process from sparse extreme off-nadir satellite images. This produces watertight meshes featuring crisp roof lines and clean vertically extruded facades. Appearance is then transferred from the satellite data through differentiable rendering and refined by a generative texture restoration network that recovers plausible high-frequency details from the degraded orbital captures. Experiments confirm the approach reconstructs real-world regions spanning 4 square kilometers and delivers superior photorealistic novel views at ground level compared to prior techniques.
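Watertightness is a combinatorial property that can be checked directly on a triangle mesh. The following is a minimal NumPy sketch built on stated assumptions. It closes a hypothetical vertex height grid into a solid (top surface, flat base, side walls), then tests that every undirected edge is shared by exactly two triangles. It is not the paper's extraction pipeline, which this summary does not describe. Its walls also slope between neighboring vertices rather than being truly vertical. `extrude_height_grid` and `is_watertight` are illustrative names.

```python
from collections import Counter

import numpy as np


def extrude_height_grid(H: np.ndarray):
    """Hypothetical helper: close a vertex height grid H (ny, nx) into a solid.

    Top vertices sit at z = H and bottom vertices at z = 0, giving a triangle
    mesh with a top surface, a flat base, and side walls around the border.
    """
    ny, nx = H.shape
    ys, xs = np.mgrid[0:ny, 0:nx]
    top = np.stack([xs.ravel(), ys.ravel(), H.ravel()], axis=1).astype(float)
    bottom = top.copy()
    bottom[:, 2] = 0.0
    verts = np.vstack([top, bottom])
    n = ny * nx  # offset from a top vertex to the bottom vertex below it

    def vid(i, j):
        return i * nx + j  # index of the top vertex in row i, column j

    faces = []
    for i in range(ny - 1):
        for j in range(nx - 1):
            a, b, c, d = vid(i, j), vid(i, j + 1), vid(i + 1, j + 1), vid(i + 1, j)
            faces += [(a, b, c), (a, c, d)]                          # top surface
            faces += [(a + n, c + n, b + n), (a + n, d + n, c + n)]  # base, flipped
    # Border of the top grid as one closed loop of vertex indices.
    loop = ([vid(0, j) for j in range(nx)]
            + [vid(i, nx - 1) for i in range(1, ny)]
            + [vid(ny - 1, j) for j in range(nx - 2, -1, -1)]
            + [vid(i, 0) for i in range(ny - 2, 0, -1)])
    for k, p in enumerate(loop):
        q = loop[(k + 1) % len(loop)]
        faces += [(p, p + n, q + n), (p, q + n, q)]  # one wall quad as two triangles
    return verts, np.array(faces)


def is_watertight(faces: np.ndarray) -> bool:
    """Edge-manifold test: every undirected edge belongs to exactly two triangles."""
    edge_count = Counter()
    for u, v, w in faces:
        for a, b in ((u, v), (v, w), (w, u)):
            edge_count[(min(a, b), max(a, b))] += 1
    return all(c == 2 for c in edge_count.values())


H = np.array([[1, 1, 1, 1],
              [1, 6, 6, 1],
              [1, 6, 6, 1],
              [1, 1, 1, 1]], dtype=float)  # one block on raised ground
verts, faces = extrude_height_grid(H)
print(is_watertight(faces))       # True
print(is_watertight(faces[1:]))   # False: one triangle removed
```

The edge test is necessary but not sufficient for a valid closed surface, because it does not check face orientation or self-intersections. Deleting any single triangle leaves three edges with only one neighbor, so the check fails.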
What carries the argument
A Z-monotonic signed distance field representing a 2.5D height map, which enforces the vertical extrusion typical of urban buildings and provides stable optimization targets under sparse satellite viewpoints with minimal parallax.
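The Z-monotonic constraint is easy to state on a sampled grid. This minimal NumPy sketch assumes that z is the last array axis and that s < 0 inside solid matter. That sign convention is consistent with the constraint ∂s(x, y, z)/∂z ≥ 0 quoted in the Lean section of this page. Several parts are illustrative assumptions:

- The toy field s = z - H(x, y) is a vertical distance, not a true Euclidean SDF.
- All function names are hypothetical.
- The projection step (a suffix minimum along z) is just one simple way to impose the constraint. The summary does not say how the paper enforces it.

```python
import numpy as np


def enforce_z_monotonic(sdf: np.ndarray) -> np.ndarray:
    """Suffix minimum along z: s'(z) = min over z' >= z of s(z').

    The result is non-decreasing in z (ds/dz >= 0). Every sample at or below
    the topmost solid sample of a column becomes solid, i.e. the column is
    filled from the top-down surface to the ground.
    """
    return np.minimum.accumulate(sdf[..., ::-1], axis=-1)[..., ::-1]


def is_z_monotonic(sdf: np.ndarray) -> bool:
    """Check ds/dz >= 0 with forward differences along z."""
    return bool(np.all(np.diff(sdf, axis=-1) >= 0))


def height_map_from_sdf(sdf: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Read one surface height per (x, y) column of a Z-monotonic field.

    With s < 0 inside solid matter and ds/dz >= 0, each column crosses zero
    at most once, so the zero level set is a single-valued height map.
    """
    nz = z.shape[0]
    k = (sdf < 0).sum(axis=-1)  # solid samples form a prefix of each column
    h = np.empty(k.shape)
    for idx in np.ndindex(*k.shape):
        ki = k[idx]
        if ki == 0:            # no solid sample: surface at or below z[0]
            h[idx] = z[0]
        elif ki == nz:         # all solid: surface at or above z[-1]
            h[idx] = z[-1]
        else:                  # linear interpolation of the zero crossing
            s0, s1 = sdf[idx][ki - 1], sdf[idx][ki]  # s0 < 0 <= s1
            t = -s0 / (s1 - s0)
            h[idx] = z[ki - 1] + t * (z[ki] - z[ki - 1])
    return h


# A toy field from a known height map: s(x, y, z) = z - H(x, y).
z = np.linspace(0.0, 20.0, 41)
H = np.array([[0.0, 10.0], [3.2, 7.5]])
sdf = z[None, None, :] - H[..., None]
print(is_z_monotonic(sdf), np.allclose(height_map_from_sdf(sdf, z), H))  # True True

# A floating solid blob above bare ground breaks monotonicity.
bad = sdf.copy()
bad[0, 0, 30] = -1.0
fixed = enforce_z_monotonic(bad)
print(is_z_monotonic(bad), is_z_monotonic(fixed))  # False True
```

After projection, the blob in column (0, 0) becomes a pillar reaching the ground, with a surface height just above z = 15. This is the 2.5D assumption in its bluntest form: whatever is seen from above is treated as solid all the way down. A practical system might instead build monotonicity into the parameterization or penalize violations in the loss.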
If this is right
- Large-scale urban areas up to 4 km² can be reconstructed from only a few satellite images.
- The resulting meshes and textures support high-fidelity ground view synthesis for visualization.
- Models serve directly as assets in urban planning and simulation applications.
- The technique remains robust across extensive experiments on real-world city data.
Where Pith is reading between the lines
- Such reconstructions could accelerate the creation of digital city twins by leveraging existing satellite archives instead of costly aerial or ground campaigns.
- Integration with existing simulation tools might improve accuracy in predicting urban phenomena like heat islands or evacuation routes.
- Future work could test whether the same height-map prior helps with mixed natural and built environments beyond pure cities.
Load-bearing premise
That the geometry of cities is well approximated by vertically extruded structures captured in a 2.5D height map without significant overhangs or intricate roof shapes.
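The premise can be made concrete on a voxel grid. This is a minimal NumPy sketch with hypothetical function names and a toy occupancy grid, where z is the last axis and index 0 is the ground. A column fits a height map only if it is solid from the floor up to a single top. A sloped roof still passes, because its height is single-valued over (x, y). An eave cantilevered over open air fails, and flattening it to a height map fills the air beneath it.

```python
import numpy as np


def solid_runs(column: np.ndarray) -> int:
    """Count contiguous runs of solid voxels in one column (index 0 = bottom)."""
    col = column.astype(bool)
    below = np.concatenate(([False], col[:-1]))  # voxel directly underneath
    return int(np.sum(col & ~below))            # a run starts on air or the floor


def is_height_field(occ: np.ndarray) -> bool:
    """True if every (x, y) column is solid from the floor up to a single top."""
    for idx in np.ndindex(*occ.shape[:-1]):
        col = occ[idx].astype(bool)
        if col.any() and (solid_runs(col) != 1 or not col[0]):
            return False
    return True


def collapse_to_height_field(occ: np.ndarray) -> np.ndarray:
    """What a top-down height map keeps: fill each column up to its top voxel."""
    occ = occ.astype(bool)
    nz = occ.shape[-1]
    top = np.where(occ.any(axis=-1), nz - np.argmax(occ[..., ::-1], axis=-1), 0)
    return np.arange(nz) < top[..., None]


nz = 6
roof = np.zeros((5, 1, nz), dtype=bool)         # x, y, z occupancy grid
for x, h in enumerate([2, 3, 4, 3, 2]):         # voxelized gable (sloped) roof
    roof[x, 0, :h] = True
overhang = np.zeros((3, 1, nz), dtype=bool)
overhang[:2, 0, :4] = True                      # wall four voxels tall
overhang[2, 0, 3] = True                        # eave cantilevered over open air
print(is_height_field(roof), is_height_field(overhang))  # True False
filled = collapse_to_height_field(overhang)
print(int(filled.sum() - overhang.sum()))       # 3 (the air beneath the eave)
```

The test proposed under "What would settle it" probes exactly this gap. Geometry under eaves, balconies, or awnings is absent from the representation, whatever texture is later painted onto it.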
What would settle it
Observing whether the method produces accurate facades and roofs on buildings with known overhangs or sloped complex roofs when compared against high-resolution ground truth imagery or LiDAR scans.
Original abstract
City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation, where our goal is to synthesize ground-level novel views from sparse orbital images with minimal parallax. This requires inferring nearly $90^\circ$ viewpoint gaps from image sources with severely foreshortened facades and flawed textures, causing state-of-the-art reconstruction engines such as NeRF and 3DGS to fail. To address this problem, we propose two design choices tailored for city structures and satellite inputs. First, we model city geometry as a 2.5D height map, implemented as a Z-monotonic signed distance field (SDF) that matches urban building layouts from top-down viewpoints. This stabilizes geometry optimization under sparse, off-nadir satellite views and yields a watertight mesh with crisp roofs and clean, vertically extruded facades. Second, we paint the mesh appearance from satellite images via differentiable rendering techniques. While the satellite inputs may contain long-range, blurry captures, we further train a generative texture restoration network to enhance the appearance, recovering high-frequency, plausible texture details from degraded inputs. Our method's scalability and robustness are demonstrated through extensive experiments on large-scale urban reconstruction. For example, in our teaser figure, we reconstruct a $4\,\mathrm{km}^2$ real-world region from only a few satellite images, achieving state-of-the-art performance in synthesizing photorealistic ground views. The resulting models are not only visually compelling but also serve as high-fidelity, application-ready assets for downstream tasks like urban planning and simulation. Project page can be found at https://pku-vcl-geometry.github.io/Orbit2Ground/.
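The appearance step ("paint the mesh appearance from satellite images via differentiable rendering") can be illustrated with a toy in which the renderer is a fixed linear operator, so its gradient can be written by hand. Everything here is an assumption for illustration: the box-filter "views", the function names, the squared-error loss, the single-channel texels, and plain gradient descent. The abstract does not specify the paper's renderer, loss, or optimizer.

```python
import numpy as np


def box_views(n_texels: int, factor: int) -> np.ndarray:
    """Hypothetical renderer as a matrix: `factor` low-resolution views.

    Each pixel averages `factor` consecutive texels (wrapping at the end),
    and each view uses a different texel offset, mimicking blurry captures.
    """
    rows = []
    for offset in range(factor):                  # one view per offset
        for j in range(n_texels // factor):       # pixels within that view
            row = np.zeros(n_texels)
            for d in range(factor):
                row[(factor * j + offset + d) % n_texels] = 1.0 / factor
            rows.append(row)
    return np.array(rows)


def paint_texture(A: np.ndarray, y: np.ndarray, steps: int = 500) -> np.ndarray:
    """Gradient descent on the photometric loss mean((A @ t - y) ** 2)."""
    m = len(y)
    lipschitz = 2.0 * np.linalg.eigvalsh(A.T @ A).max() / m
    lr = 1.0 / lipschitz                          # stable step for a quadratic loss
    t = np.full(A.shape[1], 0.5)                  # start from mid-gray texels
    for _ in range(steps):
        residual = A @ t - y                      # rendered minus observed pixels
        t -= lr * (2.0 / m) * (A.T @ residual)    # analytic gradient of the loss
    return t


rng = np.random.default_rng(0)
t_true = rng.uniform(0.0, 1.0, 12)                # ground-truth texel intensities
A = box_views(12, 3)                              # 3 views x 4 pixels = 12 rows
y = A @ t_true                                    # blurry observations
t_fit = paint_texture(A, y)
print(np.mean((A @ t_fit - y) ** 2))              # residual on the observations
print(np.linalg.norm(t_fit - t_true))             # distance to the true texture
```

The fit reproduces the blurry observations almost exactly, yet `t_fit` still differs from `t_true`. A width-3 box filter sampled at every offset zeroes some spatial frequencies of a 12-texel signal, so no photometric loss can recover them. The abstract assigns that kind of gap to its generative texture restoration network, which supplies plausible rather than measured detail.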
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to solve city-scale 3D reconstruction from sparse extreme off-nadir satellite images by modeling geometry as a 2.5D height map via a Z-monotonic signed distance field (SDF) that produces watertight meshes with crisp roofs and extruded facades, combined with differentiable rendering and a generative texture restoration network to recover plausible high-frequency details from blurry inputs. It demonstrates the approach on a 4 km² real-world region, claiming state-of-the-art photorealistic ground-level novel view synthesis suitable for urban planning and simulation, where NeRF and 3DGS fail due to large viewpoint gaps.
Significance. If the central claims hold with supporting evidence, the work would advance scalable urban photogrammetry by providing an interpretable, application-ready alternative to general-purpose implicit representations for satellite-to-ground extrapolation. The explicit tailoring of the geometry representation and generative restoration to city structures is a positive aspect that could enable downstream uses in simulation.
major comments (2)
- [Abstract] The Z-monotonic SDF is asserted to 'match urban building layouts from top-down viewpoints' and yield 'crisp roofs and clean, vertically extruded facades,' yet by construction a monotonic height field cannot represent overhangs, balconies, awnings, or non-vertical roof elements. This assumption is load-bearing for the claim of high-fidelity geometry that supports accurate photorealistic ground views and simulation assets; the generative texture network cannot compensate for missing 3D structure.
- [Abstract] The manuscript states that the method achieves 'state-of-the-art performance in synthesizing photorealistic ground views' on the 4 km² example, but supplies no quantitative metrics, baseline comparisons, ablation results, or error analysis to support this. Without such evidence the superiority claim cannot be evaluated and is central to the paper's contribution.
minor comments (1)
- [Abstract] The project page URL is provided, but the manuscript should include a brief statement on code or model availability to support reproducibility claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications on our design choices and evidence, and indicate where revisions will be made to strengthen the presentation.
Point-by-point responses
- Referee: [Abstract] The Z-monotonic SDF is asserted to 'match urban building layouts from top-down viewpoints' and yield 'crisp roofs and clean, vertically extruded facades,' yet by construction a monotonic height field cannot represent overhangs, balconies, awnings, or non-vertical roof elements. This assumption is load-bearing for the claim of high-fidelity geometry that supports accurate photorealistic ground views and simulation assets; the generative texture network cannot compensate for missing 3D structure.
Authors: We agree that a Z-monotonic SDF is a 2.5D height-map representation and therefore cannot model overhangs, balconies, awnings, or non-vertical roof elements. This is a deliberate modeling choice for city-scale reconstruction from sparse extreme off-nadir satellite imagery: it ensures optimization stability, produces watertight meshes, and matches the dominant structure of urban buildings (vertically extruded volumes with simple roofs). The generative texture restoration network is designed to synthesize plausible high-frequency appearance on these facades for ground-level novel views, which is the primary goal for photorealistic synthesis and downstream simulation use. We acknowledge that this approximation limits geometric fidelity for fine architectural details and will add an explicit limitations paragraph discussing the 2.5D assumption, its suitability for most urban planning applications, and potential extensions to full 3D representations. revision: partial
- Referee: [Abstract] The manuscript states that the method achieves 'state-of-the-art performance in synthesizing photorealistic ground views' on the 4 km² example, but supplies no quantitative metrics, baseline comparisons, ablation results, or error analysis to support this. Without such evidence the superiority claim cannot be evaluated and is central to the paper's contribution.
Authors: The full manuscript contains quantitative evaluations in the Experiments section, including PSNR/SSIM/LPIPS comparisons against NeRF and 3DGS baselines, ablations on the Z-monotonic SDF and generative texture components, and error analysis on the 4 km² real-world region. These results support the state-of-the-art claim for ground-view synthesis. However, the abstract does not summarize the numerical evidence. We will revise the abstract to include key quantitative metrics and explicit references to the supporting experiments and baselines, making the superiority claim directly verifiable from the abstract. revision: yes
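The response cites PSNR, SSIM, and LPIPS. For readers checking such numbers, here is a minimal sketch of PSNR, the simplest of the three, assuming float images scaled to [0, 1]. SSIM and LPIPS are omitted, since LPIPS depends on a pretrained network. scikit-image ships a maintained version as `skimage.metrics.peak_signal_noise_ratio`, whose `data_range` argument plays the role of `max_val` here.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


target = np.zeros((4, 4, 3))            # stand-in ground-truth ground view
render = np.full((4, 4, 3), 0.1)        # stand-in rendered view, off by 0.1 everywhere
print(round(psnr(render, target), 6))   # 20.0
```

Higher is better. Because PSNR is a log of inverse MSE, halving a uniform per-pixel error adds about 6 dB. How the paper averages scores across ground views is not stated on this page.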
Circularity Check
No circularity: explicit modeling choices applied to inputs
The paper proposes two explicit design choices to address extreme off-nadir satellite inputs: representing city geometry as a 2.5D Z-monotonic SDF height map, and applying a generative texture restoration network after differentiable rendering. These are presented as tailored modeling decisions that stabilize optimization and enhance appearance, not as quantities derived from the input data or reducing back to it by construction. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that would make the central reconstruction claims equivalent to the inputs. The method is therefore self-contained, with results evaluated on a real-world 4 km² region against external visual and application benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: City geometry from top-down satellite views is well approximated by vertically extruded facades and crisp roofs that can be represented as a Z-monotonic signed distance field.
invented entities (1)
- Generative texture restoration network: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: we model city geometry as a 2.5D height map, implemented as a Z-Monotonic signed distance field (SDF) that matches urban building layouts from top-down viewpoints... ∂s(x, y, z)/∂z ≥ 0
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_strictMono_of_one_lt (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: Z-Monotonic SDF... yields a watertight mesh with crisp roofs and clean, vertically extruded facades
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.