pith. machine review for the scientific record.

arxiv: 2605.09039 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

SeasonScapes: Learning Large-scale Re-lightable 3D Landscapes with Seasonal Variation from Sparse Webcams

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · seasonal variation · diffusion models · mesh inpainting · relighting · sparse views · webcam imagery · landscape modeling

The pith

Projecting sparse webcam images onto 3D meshes and filling gaps with diffusion models yields seasonal, relightable landscapes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that turns collections of timestamped webcam photos into complete 3D models of large mountain areas that show how the landscape looks in different seasons. It first projects each season's images onto a shared mesh, then uses conditional diffusion models to inpaint the missing or occluded parts directly on that mesh. The resulting meshes preserve natural appearance shifts across the year and can be rendered under new lighting with ordinary physically-based tools. A supporting dataset supplies more than 85,000 images from 32 viewpoints across 13 dates in one year, covering a region of over 50 km × 60 km. If the inpainting step succeeds, the method gives a practical route to dynamic 3D environments from everyday camera feeds.

Core claim

By projecting timestamp-specific images onto a 3D mesh we construct seasonal 3D landscapes that reflect natural appearance changes over time; conditional diffusion models then perform image-guided inpainting directly on the mesh to handle occlusions and missing data, after which the completed meshes support relighting by standard physically-based renderers.

What carries the argument

Projection of timestamped images onto a shared 3D mesh followed by conditional diffusion inpainting performed on the mesh surface.
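
To make that load-bearing step concrete, here is a minimal sketch (plain NumPy, with a synthetic pinhole camera and mesh assumed; not the authors' implementation) of projecting one timestamped webcam frame onto mesh vertices with a depth test. Vertices the camera never sees come back unpainted, which is exactly the hole set handed to the diffusion inpainter:

```python
# Hedged sketch: color mesh vertices from one timestamped webcam frame.
# Per-vertex coloring stands in for the paper's UV-map texturing; the
# visibility logic (project, then z-buffer test) is the same idea.
import numpy as np

def project_image_to_vertices(verts, image, K, R, t, depth_map, tol=0.05):
    """Assign a color to every mesh vertex visible in `image`.

    verts:     (V, 3) float world-space vertex positions
    image:     (H, W, 3) uint8 webcam frame for one timestamp
    K, R, t:   pinhole intrinsics / world-to-camera extrinsics
    depth_map: (H, W) depth rendered from the same camera, used as a
               z-buffer so occluded vertices are left unpainted
    """
    cam = R @ verts.T + t[:, None]        # world -> camera space, (3, V)
    z = cam[2]                            # depth of each vertex
    pix = (K @ cam)[:2] / z               # perspective divide (z>0 filtered below)
    u, v = np.round(pix).astype(int)      # nearest-pixel lookup
    H, W = depth_map.shape
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    colors = np.full_like(verts, np.nan)  # NaN = never observed
    ui, vi = u[inside], v[inside]
    # depth test: keep only vertices that actually front the surface
    visible = np.abs(depth_map[vi, ui] - z[inside]) < tol
    idx = np.where(inside)[0][visible]
    colors[idx] = image[vi[visible], ui[visible]] / 255.0
    return colors  # NaN rows are the holes the diffusion stage must fill
```

Per the pipeline description (Figure 2), the paper performs this projection onto a UV texture map rather than per-vertex colors, but the occlusion accounting carries over unchanged.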

If this is right

  • Seasonal 3D models become feasible for areas spanning 50 km by 60 km using only sparse, publicly available webcam streams.
  • The same meshes can be relit under arbitrary illumination without retraining the underlying model.
  • Natural yearly appearance shifts are encoded directly in the mesh textures rather than added as post-process effects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on non-mountain environments such as forests or coastal zones where seasonal foliage or snow cover changes are similarly strong.
  • Combining the mesh outputs with satellite imagery might reduce reliance on ground-level webcams for even larger scales.
  • Real-time applications in virtual tourism or climate visualization become practical once the inpainted meshes are exported to game engines.

Load-bearing premise

Conditional diffusion models can inpaint the 3D mesh accurately enough to restore occlusions and missing data while keeping the natural seasonal appearance changes intact and without adding visible artifacts.

What would settle it

Render the completed meshes under novel lighting and compare the output images against real webcam frames captured at the same locations but different seasons or times of day; systematic mismatches or inpainting artifacts would disprove the claim.
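
A sketch of how that comparison could be scored, using the `lpips` package and placeholder file paths (illustrative choices, not the paper's evaluation code; the paper's own reference list cites LPIPS [62] and MS-SSIM [50], so a perceptual metric of this kind is in scope):

```python
# Hedged sketch: perceptual distance between rendered meshes and real
# webcam frames. Persistently high LPIPS on held-out seasons, or visible
# inpainting seams, would count against the core claim.
import numpy as np
import torch
import lpips                      # pip install lpips
from PIL import Image

def to_tensor(path):
    """Load an image as a (1, 3, H, W) tensor in [-1, 1], as LPIPS expects."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return torch.from_numpy(img / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)

metric = lpips.LPIPS(net="alex")  # AlexNet backbone, the common default

# placeholder (render, real-webcam) pairs; both images must share a resolution
pairs = [("render_cam07_2024-10-07.png", "webcam07_2024-10-07.jpg")]
scores = []
with torch.no_grad():
    for rendered, real in pairs:
        scores.append(metric(to_tensor(rendered), to_tensor(real)).item())
print(f"mean LPIPS over {len(scores)} held-out views: {np.mean(scores):.3f}")
```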

Figures

Figures reproduced from arXiv: 2605.09039 by Danda Pani Paudel, Deheng Zhang, Luc Van Gool, Qi Ma, Timo Kleger.

Figure 1
Figure 1: We introduce the SeasonScapes framework and the SeasonScapes dataset: Swiss Sparse-view Mountain Scenes with Seasonal Changes … view at source ↗
Figure 2
Figure 2: Pipeline overview. Our pipeline generates time-varying 3D landscapes through a multi-stage process. First, we preprocess Google Earth data and webcam imagery (Sec. 3.2). We then employ a learning-based approach to optimize camera parameters, aligning the 2D webcam frames with the 3D landscape point cloud (Sec. 3.3). In the landscape painting stage, we project the preprocessed images to texture the UV map … view at source ↗
Figure 4
Figure 4: 3D mesh texturing including inpainting. We iterate over the entire inpainting trajectory {p_t}_{t=1}^N. For each step t we render an RGBD image. Using the RGB image we mask unseen regions using the HSV color space. We condition ControlNet using the inpainting mask, depth map, text prompt, and IP-Adapter image. The diffusion output is projected onto the UV map via depth-indexing of visible regions. The algori… view at source ↗
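
The Figure 4 caption compresses the whole texturing loop; below is a hedged reconstruction of one iteration in `diffusers` (assuming a recent version). The checkpoint names are common public models, not necessarily the ones the authors used, and the magenta-sentinel HSV mask is an assumption about how unseen pixels are tagged in the render; the caption confirms only the conditioning signals (inpainting mask, depth map, text prompt, IP-Adapter image [53] via ControlNet [61]).

```python
# Hedged sketch of one Figure-4 iteration: depth-conditioned ControlNet
# inpainting with an IP-Adapter reference image. File paths, prompt, and
# the magenta sentinel convention are illustrative assumptions.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

def unseen_mask_hsv(rgb):
    """Mask render pixels tagged with a magenta sentinel (assumed convention)."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    # magenta sits near hue 150 on OpenCV's 0-179 scale, saturated and bright
    return cv2.inRange(hsv, (140, 100, 100), (170, 255, 255))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

rgb = np.asarray(Image.open("render_step_t.png").convert("RGB"))  # placeholder
depth = Image.open("depth_step_t.png")                            # placeholder
reference = Image.open("webcam_reference.jpg")                    # seasonal guide

result = pipe(
    prompt="alpine landscape, autumn, photorealistic",  # illustrative prompt
    image=Image.fromarray(rgb),
    mask_image=Image.fromarray(unseen_mask_hsv(rgb)),
    control_image=depth,
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
# `result` would then be projected back onto the UV map for visible texels
# (the depth-indexing step) before moving to the next camera p_{t+1}.
```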
Figure 5
Figure 5: Qualitative comparison. Row (a) shows the inpainted images from our 3D landscape mesh without the clouds; row (b) shows the ground-truth webcam images. The evaluation is done without the sky and without the nearest parts of the mesh. The first image is from 7.10.2024 at 12 AM, the second from 20.7.2024 at 12 AM, the third from 7.12.2024 at 12 AM. …painting-related artifacts in the … view at source ↗
Figure 6
Figure 6: Qualitative comparison between our approach and Google Earth. Images (a) are from Google Earth [13], images (b) are our SeasonScapes rendering from 1.9.2024 at 12 AM, images (c) are our SeasonScapes from 18.12.2025 at 10 AM. SeasonScapes shows superior photorealistic rendering and time-variant ability. view at source ↗
Figure 8
Figure 8: Qualitative comparison of novel Wild Gaussian views. (a) Rendered relightable Gaussian outputs and (b) the inpainted images from our iterative inpainting result. The first and last images are from 18.5.2025 at 2 PM, the second from 12.5.2024 at 12 AM. …models (DEMs), and satellite imagery, with camera parameter optimization. We additionally train a relightable Gaussian model using t… view at source ↗
read the original abstract

We introduce SeasonScapes framework and a the SeasonScapes dataset: Swiss Sparse-view Mountain Scenes with Seasonal Changes that covers over 50 km x 60 km, composed of more than 85,000 webcam images captured from 32 different locations across 13 timestamps throughout a full year. By projecting these timestamp-specific images onto a 3D mesh, we construct seasonal 3D landscapes that reflect natural appearance changes over time. To address occlusions and missing data, we leverage conditional diffusion models for image-guided inpainting directly on the mesh. The resulting completed meshes can be further relighted using standard physically-based renderer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the SeasonScapes framework and a corresponding dataset of Swiss sparse-view mountain scenes with seasonal changes, spanning over 50 km × 60 km and comprising more than 85,000 webcam images from 32 locations across 13 timestamps in a full year. It describes projecting timestamp-specific images onto 3D meshes to construct seasonal 3D landscapes, then applying conditional diffusion models for image-guided inpainting directly on the mesh to handle occlusions and missing data, with the completed meshes intended for relighting via standard physically-based renderers.

Significance. If the claims hold, the work would provide a scalable pipeline for creating large-scale, relightable 3D models of natural landscapes that capture real seasonal appearance changes from sparse, publicly available webcam imagery. The dataset scale (50×60 km coverage) is a clear strength and could support downstream applications in environmental monitoring, VR/AR, and graphics. The integration of projection with diffusion-based mesh inpainting represents an attempt to address practical challenges in real-world 3D reconstruction from uncontrolled sources.

major comments (2)
  1. [Abstract] Abstract: The pipeline is outlined but supplies no quantitative validation, error metrics, or ablation results; support for the inpainting quality and seasonal fidelity claims cannot be assessed from the available text. This is load-bearing because the central contribution rests on faithful preservation of natural seasonal variations after inpainting.
  2. [Method] Method description (projection and inpainting steps): The manuscript relies on conditional diffusion models for direct image-guided inpainting on the 3D mesh without bounding error from the required 2D-to-mesh mapping (UV parameterization, multi-view rendering, or implicit representation) or providing evidence that fine-grained seasonal cues (snow cover, foliage color, lighting) are preserved rather than hallucinated. This assumption is central to the re-lightable seasonal landscapes claim.
minor comments (1)
  1. [Abstract] Abstract contains a grammatical error: 'We introduce SeasonScapes framework and a the SeasonScapes dataset' should read 'We introduce the SeasonScapes framework and the SeasonScapes dataset'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and commit to revisions that strengthen the quantitative support and methodological details for our claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The pipeline is outlined but supplies no quantitative validation, error metrics, or ablation results; support for the inpainting quality and seasonal fidelity claims cannot be assessed from the available text. This is load-bearing because the central contribution rests on faithful preservation of natural seasonal variations after inpainting.

    Authors: We agree that the current abstract and manuscript text do not include quantitative metrics or ablations, which limits assessment of the inpainting and seasonal fidelity claims. In the revised manuscript we will update the abstract to highlight key quantitative results (e.g., inpainting FID/LPIPS scores and seasonal consistency metrics across timestamps) and add a new experimental section with ablation studies on the diffusion inpainting components and error metrics demonstrating preservation of seasonal cues such as snow cover and foliage color. revision: yes

  2. Referee: [Method] Method description (projection and inpainting steps): The manuscript relies on conditional diffusion models for direct image-guided inpainting on the 3D mesh without bounding error from the required 2D-to-mesh mapping (UV parameterization, multi-view rendering, or implicit representation) or providing evidence that fine-grained seasonal cues (snow cover, foliage color, lighting) are preserved rather than hallucinated. This assumption is central to the re-lightable seasonal landscapes claim.

    Authors: We acknowledge that the method section currently lacks explicit error bounds on the 2D-to-mesh projection and direct evidence that seasonal cues are preserved rather than hallucinated by the diffusion model. We will revise the method description to detail the UV parameterization and projection pipeline, include an analysis of projection-induced errors (e.g., via multi-view consistency checks), and add targeted experiments (qualitative and quantitative) showing that fine-grained cues like snow cover, foliage color, and lighting are transferred from the input images rather than synthesized. revision: yes
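
One plausible form of the promised multi-view consistency check (pinhole conventions and placeholder inputs assumed; not the authors' code): lift view A's pixels through its depth map, reproject them into view B, and measure photometric disagreement on co-visible pixels.

```python
# Hedged sketch of a multi-view photometric consistency check for the
# projection step. Poses are world-to-camera (R, t); images of both views
# are assumed to share a resolution; colors are compared in raw units.
import numpy as np

def reprojection_error(img_a, depth_a, K_a, pose_a,
                       img_b, depth_b, K_b, pose_b, z_tol=0.1):
    H, W = depth_a.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # back-project A's pixels to world space using A's depth
    cam_a = np.linalg.inv(K_a) @ pix * depth_a.reshape(1, -1)
    R_a, t_a = pose_a
    world = R_a.T @ (cam_a - t_a[:, None])
    # project into view B
    R_b, t_b = pose_b
    cam_b = R_b @ world + t_b[:, None]
    z_b = cam_b[2]
    uv_b = (K_b @ cam_b)[:2] / z_b
    ub, vb = np.round(uv_b).astype(int)
    ok = (z_b > 0) & (ub >= 0) & (ub < W) & (vb >= 0) & (vb < H)
    # occlusion test against B's own depth: keep only co-visible surface points
    ok[ok] &= np.abs(depth_b[vb[ok], ub[ok]] - z_b[ok]) < z_tol
    a = img_a.reshape(-1, 3)[ok].astype(np.float32)
    b = img_b[vb[ok], ub[ok]].astype(np.float32)
    return np.abs(a - b).mean()  # mean L1 color error on co-visible pixels
```

Large, spatially structured errors from such a check would localize projection or calibration faults before any blame falls on the diffusion inpainter.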

Circularity Check

0 steps flagged

No significant circularity; pipeline applies standard projection and diffusion techniques to new data

full rationale

The derivation chain consists of collecting a new webcam dataset, projecting images onto an existing 3D mesh construction, applying conditional diffusion models for inpainting (a pre-existing technique), and using standard physically-based rendering for relighting. No equations or steps reduce the central claims to fitted parameters by construction, self-definitions, or load-bearing self-citations. The approach can be checked against external benchmarks for mesh projection and diffusion inpainting, with the novelty residing in the seasonal dataset and application rather than any circular redefinition of inputs as outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard computer vision primitives for mesh projection and the domain assumption that diffusion models produce faithful seasonal inpaintings; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • standard math Images can be accurately projected onto a 3D mesh to represent scene appearance
    Core step for constructing seasonal 3D landscapes from 2D webcam captures.
  • domain assumption Conditional diffusion models can generate plausible completions for occluded or missing mesh regions while preserving seasonal context
    Directly invoked to address occlusions and missing data.

pith-pipeline@v0.9.0 · 5415 in / 1475 out tokens · 57051 ms · 2026-05-12T01:57:03.186712+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages

  1. [1] Guillaume Attard. An Intro to the Earth Engine Python API, 2025. https://developers.google.com/earth-engine/tutorials/community/intro-to-python-api.

  2. [2] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields, 2022.

  3. [3] Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, and Hendrik P.A. Lensch. NeRD: Neural reflectance decomposition from image collections. In ICCV, 2021.

  4. [4] Jan Brejcha, Michal Lukáč, Yannick Hold-Geoffroy, Oliver Wang, and Martin Čadík. LandscapeAR: Large scale outdoor augmented reality by matching photographs with terrain models using learned descriptors. In European Conference on Computer Vision, pages 295–312. Springer, 2020.

  5. [5] Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. TexFusion: Synthesizing 3D textures with text-guided image diffusion models, 2023.

  6. [6] D. Chen, G. Baatz, K. Köser, S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk. City-scale landmark identification on mobile devices. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2011.

  7. [7] Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2Tex: Text-driven texture synthesis via diffusion models, 2023.

  8. [8] Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, and Guosheng Lin. IT3D: Improved text-to-3D generation with explicit view synthesis, 2023.

  9. [9] Gabriele Facciolo, Carlo De Franchis, and Enric Meinhardt-Llopis. Automatic 3D reconstruction from multi-date satellite images, 2017.

  10. [10] Jie Feng, Tianhui Liu, Yuwei Du, Siqi Guo, Yuming Lin, and Yong Li. CityGPT: Empowering urban spatial cognition of large language models, 2025.

  11. [11] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion, 2022.

  12. [12] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), 2013.

  13. [13] Google. Google Earth, 2025. https://earth.google.com.

  14. [14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, pages 6840–6851, 2020.

  15. [15] Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nießner, and Thomas Funkhouser. Adversarial texture optimization from RGB-D scans, 2020.

  16. [16] Peng Jiang, Philip Osteen, Maggie Wigness, and Srikanth Saripalli. RELLIS-3D dataset: Data, benchmarks and analysis, 2020.

  17. [17] Haian Jin, Isabella Liu, Peijia Xu, Xiaoshuai Zhang, Songfang Han, Sai Bi, Xiaowei Zhou, Zexiang Xu, and Hao Su. TensoIR: Tensorial inverse rendering. In CVPR, 2023.

  18. [18] Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural Gaffer: Relighting any object via diffusion. In Advances in Neural Information Processing Systems, 2024.

  19. [19] Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, and Andrea Tagliasacchi. CoNeRF: Controllable neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18623–18632, 2022.

  20. [20] Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, and David Novotny. HoloFusion: Towards photo-realistic 3D generative modeling, 2023.

  21. [21] Michael Kazhdan and Hugues Hoppe. Screened Poisson surface reconstruction. ACM Transactions on Graphics, 32, 2013.

  22. [22] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering, 2023.

  23. [23] Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3D Gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics, 43(4), 2024.

  24. [24] Johannes Kopf, Chi-Wing Fu, Daniel Cohen-Or, Oliver Deussen, Dani Lischinski, and Tien-Tsin Wong. Solid texture synthesis from 2D exemplars. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007), 26(3):2:1–2:9, 2007.

  25. [25] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. WildGaussians: 3D Gaussian splatting in the wild, 2024.

  26. [26] Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, and Gerard Pons-Moll. Control-NeRF: Editable feature volumes for scene rendering and manipulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4340–4350, 2023.

  27. [27] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. MatrixCity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3205–3215, 2023.

  28. [28] Zhihao Liang, Hongdong Li, Kui Jia, Kailing Guo, and Qi Zhang. GUS-IR: Gaussian splatting with unified shading for inverse rendering, 2024.

  29. [29] Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, and Kui Jia. GS-IR: 3D Gaussian splatting for inverse rendering. In CVPR, 2024.

  30. [30] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. CityGaussian: Real-time high-quality large-scale scene rendering with Gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2025.

  31. [31] Qi Ma, Runyi Yang, Bin Ren, Nicu Sebe, Ender Konukoglu, Luc Van Gool, and Danda Pani Paudel. CityLoc: 6DoF pose distributional localization for text descriptions in large-scale scenes with Gaussian representation, 2025.

  32. [32] Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.

  33. [33] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations, 2022.

  34. [34] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis, 2020.

  35. [35] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models, 2023.

  36. [36] Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, and Sanja Fidler. Extracting triangular 3D models, materials, and lighting from images. In CVPR, 2022.

  37. [37] Sarah Taghavi Namin, Mohammad Najafi, Mathieu Salzmann, and Lars Petersson. A multi-modal graphical model for scene analysis. In 2015 IEEE Winter Conference on Applications of Computer Vision, pages 1006–1013. IEEE, 2015.

  38. [38] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, et al. DINOv2: Learning robust visual features without supervision, 2024.

  39. [39] Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3D object generation using both 2D and 3D diffusion priors, 2023.

  40. [40] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. NeRF On-the-go: Exploiting uncertainty for distractor-free NeRFs in the wild, 2024.

  41. [41] Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. TEXTure: Text-guided texturing of 3D shapes, 2023.

  42. [42] Olivier Saurer, Georges Baatz, Kevin Köser, Ľubor Ladický, and Marc Pollefeys. Image based geo-localization in the Alps, 2015.

  43. [43] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, pages 2256–2265, 2015.

  44. [44] Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, and Dong Chen. Make-It-3D: High-fidelity 3D creation from a single image with diffusion prior, 2023.

  45. [45] Greg Turk. Texture synthesis on surfaces. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 347–354, New York, NY, USA, 2001. Association for Computing Machinery.

  47. [47] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12922–12931, 2022.

  48. [48] Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, and Peyman Moghadam. WildScenes: A benchmark for 2D and 3D semantic segmentation in large-scale natural environments. The International Journal of Robotics Research, 44(4):532–549, 2025.

  49. [49] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3835–3844, 2022.

  50. [50] Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. Multi-scale structural similarity for image quality assessment. The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2:1398–1402, 2003.

  51. [51] Maggie Wigness, Sungmin Eum, John G. Rogers, David Han, and Heesung Kwon. A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments. In International Conference on Intelligent Robots and Systems (IROS), 2019.

  52. [52] Zirui Wu, Jianteng Chen, Laijian Li, Shaoteng Wu, Zhikai Zhu, Kang Xu, Martin R. Oswald, and Jie Song. 3D Gaussian inverse rendering with approximated global illumination, 2025.

  53. [53] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models, 2023.

  54. [54] Tim Yngesjö. 3D reconstruction from satellite imagery using deep learning. Master's thesis, Linköping University, Department of Electrical Engineering, Computer Vision, 2021.

  55. [55] Jonathan Young. xatlas: Mesh parameterization / UV unwrapping library, 2022.

  56. [56] Haitao Yu, Deheng Zhang, Peiyuan Xie, and Tianyi Zhang. Point-based radiance fields for controllable human motion synthesis, 2023.

  57. [57] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-Splatting: Alias-free 3D Gaussian splatting, 2023.

  58. [58] Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3D: Paint anything 3D with lighting-less texture diffusion models, 2023.

  59. [59] Deheng Zhang, Clara Fernandez-Labrador, and Christopher Schroers. CoARF: Controllable 3D artistic style transfer for radiance fields, 2024.

  60. [60] Deheng Zhang, Jingyu Wang, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik Lensch, and Siyu Tang. RISE-SDF: A relightable information-shared signed distance field for glossy object inverse rendering. In International Conference on 3D Vision (3DV), 2025.

  61. [61] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3813–3824, 2023.

  62. [62] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. CVPR, 2018.

  63. [63] Xiaoming Zhao, Pratul P. Srinivasan, Dor Verbin, Keunhong Park, Ricardo Martin Brualla, and Philipp Henzler. IllumiNeRF: 3D relighting without inverse rendering. In NeurIPS, 2024.

  64. [64] Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, et al. GaussianGrasper: 3D language Gaussian splatting for open-vocabulary robotic grasping. arXiv preprint arXiv:2403.09637, 2024.

  65. [65] Jingsen Zhu, Yuchi Huo, Qi Ye, Fujun Luan, Jifan Li, Dianbing Xi, Lisha Wang, Rui Tang, Wei Hua, Hujun Bao, et al. I2-SDF: Intrinsic indoor scene reconstruction and editing via raytracing in neural SDFs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12489–12498, 2023.