pith. machine review for the scientific record.

arxiv: 2605.09039 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

SeasonScapes: Learning Large-scale Re-lightable 3D Landscapes with Seasonal Variation from Sparse Webcams

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · seasonal variation · diffusion models · mesh inpainting · relighting · sparse views · webcam imagery · landscape modeling

The pith

Projecting sparse webcam images onto 3D meshes and filling gaps with diffusion models yields seasonal, relightable landscapes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that turns collections of timestamped webcam photos into complete 3D models of large mountain areas that show how the landscape looks in different seasons. It first projects each season's images onto a shared mesh, then uses conditional diffusion models to inpaint the missing or occluded parts directly on that mesh. The resulting meshes preserve natural appearance shifts across the year and can be rendered under new lighting with ordinary physically-based tools. A supporting dataset supplies more than 85,000 images from 32 viewpoints across 13 dates in one year, covering a region of over 50 km × 60 km. If the inpainting step succeeds, the method gives a practical route to dynamic 3D environments from everyday camera feeds.

Core claim

By projecting timestamp-specific images onto a 3D mesh we construct seasonal 3D landscapes that reflect natural appearance changes over time; conditional diffusion models then perform image-guided inpainting directly on the mesh to handle occlusions and missing data, after which the completed meshes support relighting by standard physically-based renderers.

What carries the argument

Projection of timestamped images onto a shared 3D mesh followed by conditional diffusion inpainting performed on the mesh surface.
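
To make that load-bearing step concrete, here is a minimal sketch (plain NumPy, with a synthetic pinhole camera and mesh assumed; not the authors' implementation) of projecting one timestamped webcam frame onto mesh vertices with a depth test. Vertices the camera never sees come back unpainted, which is exactly the hole set handed to the diffusion inpainter:

```python
# Hedged sketch: color mesh vertices from one timestamped webcam frame.
# Per-vertex coloring stands in for the paper's UV-map texturing; the
# visibility logic (project, then z-buffer test) is the same idea.
import numpy as np

def project_image_to_vertices(verts, image, K, R, t, depth_map, tol=0.05):
    """Assign a color to every mesh vertex visible in `image`.

    verts:     (V, 3) float world-space vertex positions
    image:     (H, W, 3) uint8 webcam frame for one timestamp
    K, R, t:   pinhole intrinsics / world-to-camera extrinsics
    depth_map: (H, W) depth rendered from the same camera, used as a
               z-buffer so occluded vertices are left unpainted
    """
    cam = R @ verts.T + t[:, None]        # world -> camera space, (3, V)
    z = cam[2]                            # depth of each vertex
    pix = (K @ cam)[:2] / z               # perspective divide (z>0 filtered below)
    u, v = np.round(pix).astype(int)      # nearest-pixel lookup
    H, W = depth_map.shape
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    colors = np.full_like(verts, np.nan)  # NaN = never observed
    ui, vi = u[inside], v[inside]
    # depth test: keep only vertices that actually front the surface
    visible = np.abs(depth_map[vi, ui] - z[inside]) < tol
    idx = np.where(inside)[0][visible]
    colors[idx] = image[vi[visible], ui[visible]] / 255.0
    return colors  # NaN rows are the holes the diffusion stage must fill
```

Per the pipeline description (Figure 2), the paper performs this projection onto a UV texture map rather than per-vertex colors, but the occlusion accounting carries over unchanged.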

If this is right

  • Seasonal 3D models become feasible for areas spanning 50 km by 60 km using only sparse, publicly available webcam streams.
  • The same meshes can be relit under arbitrary illumination without retraining the underlying model.
  • Natural yearly appearance shifts are encoded directly in the mesh textures rather than added as post-process effects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on non-mountain environments such as forests or coastal zones where seasonal foliage or snow cover changes are similarly strong.
  • Combining the mesh outputs with satellite imagery might reduce reliance on ground-level webcams for even larger scales.
  • Real-time applications in virtual tourism or climate visualization become practical once the inpainted meshes are exported to game engines.

Load-bearing premise

Conditional diffusion models can inpaint the 3D mesh accurately enough to restore occlusions and missing data while keeping the natural seasonal appearance changes intact and without adding visible artifacts.

What would settle it

Render the completed meshes under novel lighting and compare the output images against real webcam frames captured at the same locations but different seasons or times of day; systematic mismatches or inpainting artifacts would disprove the claim.
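
A sketch of how that comparison could be scored, using the `lpips` package and placeholder file paths (illustrative choices, not the paper's evaluation code; the paper's own reference list cites LPIPS [62] and MS-SSIM [50], so a perceptual metric of this kind is in scope):

```python
# Hedged sketch: perceptual distance between rendered meshes and real
# webcam frames. Persistently high LPIPS on held-out seasons, or visible
# inpainting seams, would count against the core claim.
import numpy as np
import torch
import lpips                      # pip install lpips
from PIL import Image

def to_tensor(path):
    """Load an image as a (1, 3, H, W) tensor in [-1, 1], as LPIPS expects."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return torch.from_numpy(img / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)

metric = lpips.LPIPS(net="alex")  # AlexNet backbone, the common default

# placeholder (render, real-webcam) pairs; both images must share a resolution
pairs = [("render_cam07_2024-10-07.png", "webcam07_2024-10-07.jpg")]
scores = []
with torch.no_grad():
    for rendered, real in pairs:
        scores.append(metric(to_tensor(rendered), to_tensor(real)).item())
print(f"mean LPIPS over {len(scores)} held-out views: {np.mean(scores):.3f}")
```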

Figures

Figures reproduced from arXiv: 2605.09039 by Danda Pani Paudel, Deheng Zhang, Luc Van Gool, Qi Ma, Timo Kleger.

Figure 1
Figure 1: We introduce the SeasonScapes framework and the SeasonScapes dataset: Swiss Sparse-view Mountain Scenes with Seasonal Changes … view at source ↗
Figure 2
Figure 2: Pipeline overview. Our pipeline generates time-varying 3D landscapes through a multi-stage process. First, we preprocess Google Earth data and webcam imagery (Sec. 3.2). We then employ a learning-based approach to optimize camera parameters, aligning the 2D webcam frames with the 3D landscape point cloud (Sec. 3.3). In the landscape painting stage, we project the preprocessed images to texture the UV map … view at source ↗
Figure 4
Figure 4: 3D mesh texturing including inpainting. We iterate over the entire inpainting trajectory {p_t}_{t=1}^N. For each step t we render an RGBD image. Using the RGB image we mask unseen regions using the HSV color space. We condition ControlNet using the inpainting mask, depth map, text prompt, and IP-Adapter image. The diffusion output is projected onto the UV map via depth-indexing of visible regions. The algori… view at source ↗
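
The Figure 4 caption compresses the whole texturing loop; below is a hedged reconstruction of one iteration in `diffusers` (assuming a recent version). The checkpoint names are common public models, not necessarily the ones the authors used, and the magenta-sentinel HSV mask is an assumption about how unseen pixels are tagged in the render; the caption confirms only the conditioning signals (inpainting mask, depth map, text prompt, IP-Adapter image [53] via ControlNet [61]).

```python
# Hedged sketch of one Figure-4 iteration: depth-conditioned ControlNet
# inpainting with an IP-Adapter reference image. File paths, prompt, and
# the magenta sentinel convention are illustrative assumptions.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

def unseen_mask_hsv(rgb):
    """Mask render pixels tagged with a magenta sentinel (assumed convention)."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    # magenta sits near hue 150 on OpenCV's 0-179 scale, saturated and bright
    return cv2.inRange(hsv, (140, 100, 100), (170, 255, 255))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")

rgb = np.asarray(Image.open("render_step_t.png").convert("RGB"))  # placeholder
depth = Image.open("depth_step_t.png")                            # placeholder
reference = Image.open("webcam_reference.jpg")                    # seasonal guide

result = pipe(
    prompt="alpine landscape, autumn, photorealistic",  # illustrative prompt
    image=Image.fromarray(rgb),
    mask_image=Image.fromarray(unseen_mask_hsv(rgb)),
    control_image=depth,
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
# `result` would then be projected back onto the UV map for visible texels
# (the depth-indexing step) before moving to the next camera p_{t+1}.
```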
Figure 5
Figure 5: Qualitative comparison. Row (a) shows the inpainted images from our 3D landscape mesh without the clouds; row (b) shows the ground-truth webcam images. The evaluation is done without the sky and without the nearest parts of the mesh. The first image is from 7.10.2024 at 12 AM, the second from 20.7.2024 at 12 AM, the third from 7.12.2024 at 12 AM. …painting-related artifacts in the … view at source ↗
Figure 6
Figure 6: Qualitative comparison between our approach and Google Earth. Images (a) are from Google Earth [13], images (b) are our SeasonScapes rendering from 1.9.2024 at 12 AM, images (c) are our SeasonScapes from 18.12.2025 at 10 AM. SeasonScapes shows superior photorealistic rendering and time-variant ability. view at source ↗
Figure 8
Figure 8: Qualitative comparison of novel Wild Gaussian views. (a) Rendered relightable Gaussian outputs and (b) the inpainted images from our iterative inpainting result. The first and last images are from 18.5.2025 at 2 PM, the second from 12.5.2024 at 12 AM. …models (DEMs), and satellite imagery, with camera parameter optimization. We additionally train a relightable Gaussian model using t… view at source ↗
read the original abstract

We introduce SeasonScapes framework and a the SeasonScapes dataset: Swiss Sparse-view Mountain Scenes with Seasonal Changes that covers over 50 km x 60 km, composed of more than 85,000 webcam images captured from 32 different locations across 13 timestamps throughout a full year. By projecting these timestamp-specific images onto a 3D mesh, we construct seasonal 3D landscapes that reflect natural appearance changes over time. To address occlusions and missing data, we leverage conditional diffusion models for image-guided inpainting directly on the mesh. The resulting completed meshes can be further relighted using standard physically-based renderer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the SeasonScapes framework and a corresponding dataset of Swiss sparse-view mountain scenes with seasonal changes, spanning over 50 km × 60 km and comprising more than 85,000 webcam images from 32 locations across 13 timestamps in a full year. It describes projecting timestamp-specific images onto 3D meshes to construct seasonal 3D landscapes, then applying conditional diffusion models for image-guided inpainting directly on the mesh to handle occlusions and missing data, with the completed meshes intended for relighting via standard physically-based renderers.

Significance. If the claims hold, the work would provide a scalable pipeline for creating large-scale, relightable 3D models of natural landscapes that capture real seasonal appearance changes from sparse, publicly available webcam imagery. The dataset scale (50×60 km coverage) is a clear strength and could support downstream applications in environmental monitoring, VR/AR, and graphics. The integration of projection with diffusion-based mesh inpainting represents an attempt to address practical challenges in real-world 3D reconstruction from uncontrolled sources.

major comments (2)
  1. [Abstract] Abstract: The pipeline is outlined but supplies no quantitative validation, error metrics, or ablation results; support for the inpainting quality and seasonal fidelity claims cannot be assessed from the available text. This is load-bearing because the central contribution rests on faithful preservation of natural seasonal variations after inpainting.
  2. [Method] Method description (projection and inpainting steps): The manuscript relies on conditional diffusion models for direct image-guided inpainting on the 3D mesh without bounding error from the required 2D-to-mesh mapping (UV parameterization, multi-view rendering, or implicit representation) or providing evidence that fine-grained seasonal cues (snow cover, foliage color, lighting) are preserved rather than hallucinated. This assumption is central to the re-lightable seasonal landscapes claim.
minor comments (1)
  1. [Abstract] Abstract contains a grammatical error: 'We introduce SeasonScapes framework and a the SeasonScapes dataset' should read 'We introduce the SeasonScapes framework and the SeasonScapes dataset'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and commit to revisions that strengthen the quantitative support and methodological details for our claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The pipeline is outlined but supplies no quantitative validation, error metrics, or ablation results; support for the inpainting quality and seasonal fidelity claims cannot be assessed from the available text. This is load-bearing because the central contribution rests on faithful preservation of natural seasonal variations after inpainting.

    Authors: We agree that the current abstract and manuscript text do not include quantitative metrics or ablations, which limits assessment of the inpainting and seasonal fidelity claims. In the revised manuscript we will update the abstract to highlight key quantitative results (e.g., inpainting FID/LPIPS scores and seasonal consistency metrics across timestamps) and add a new experimental section with ablation studies on the diffusion inpainting components and error metrics demonstrating preservation of seasonal cues such as snow cover and foliage color. revision: yes

  2. Referee: [Method] Method description (projection and inpainting steps): The manuscript relies on conditional diffusion models for direct image-guided inpainting on the 3D mesh without bounding error from the required 2D-to-mesh mapping (UV parameterization, multi-view rendering, or implicit representation) or providing evidence that fine-grained seasonal cues (snow cover, foliage color, lighting) are preserved rather than hallucinated. This assumption is central to the re-lightable seasonal landscapes claim.

    Authors: We acknowledge that the method section currently lacks explicit error bounds on the 2D-to-mesh projection and direct evidence that seasonal cues are preserved rather than hallucinated by the diffusion model. We will revise the method description to detail the UV parameterization and projection pipeline, include an analysis of projection-induced errors (e.g., via multi-view consistency checks), and add targeted experiments (qualitative and quantitative) showing that fine-grained cues like snow cover, foliage color, and lighting are transferred from the input images rather than synthesized. revision: yes
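
One plausible form of the promised multi-view consistency check (pinhole conventions and placeholder inputs assumed; not the authors' code): lift view A's pixels through its depth map, reproject them into view B, and measure photometric disagreement on co-visible pixels.

```python
# Hedged sketch of a multi-view photometric consistency check for the
# projection step. Poses are world-to-camera (R, t); images of both views
# are assumed to share a resolution; colors are compared in raw units.
import numpy as np

def reprojection_error(img_a, depth_a, K_a, pose_a,
                       img_b, depth_b, K_b, pose_b, z_tol=0.1):
    H, W = depth_a.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # back-project A's pixels to world space using A's depth
    cam_a = np.linalg.inv(K_a) @ pix * depth_a.reshape(1, -1)
    R_a, t_a = pose_a
    world = R_a.T @ (cam_a - t_a[:, None])
    # project into view B
    R_b, t_b = pose_b
    cam_b = R_b @ world + t_b[:, None]
    z_b = cam_b[2]
    uv_b = (K_b @ cam_b)[:2] / z_b
    ub, vb = np.round(uv_b).astype(int)
    ok = (z_b > 0) & (ub >= 0) & (ub < W) & (vb >= 0) & (vb < H)
    # occlusion test against B's own depth: keep only co-visible surface points
    ok[ok] &= np.abs(depth_b[vb[ok], ub[ok]] - z_b[ok]) < z_tol
    a = img_a.reshape(-1, 3)[ok].astype(np.float32)
    b = img_b[vb[ok], ub[ok]].astype(np.float32)
    return np.abs(a - b).mean()  # mean L1 color error on co-visible pixels
```

Large, spatially structured errors from such a check would localize projection or calibration faults before any blame falls on the diffusion inpainter.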

Circularity Check

0 steps flagged

No significant circularity; pipeline applies standard projection and diffusion techniques to new data

full rationale

The derivation chain consists of collecting a new webcam dataset, projecting images onto an existing 3D mesh construction, applying conditional diffusion models for inpainting (a pre-existing technique), and using standard physically-based rendering for relighting. No equations or steps reduce the central claims to fitted parameters by construction, self-definitions, or load-bearing self-citations. The approach can be checked against external benchmarks for mesh projection and diffusion inpainting, with the novelty residing in the seasonal dataset and application rather than any circular redefinition of inputs as outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard computer vision primitives for mesh projection and the domain assumption that diffusion models produce faithful seasonal inpaintings; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • standard math Images can be accurately projected onto a 3D mesh to represent scene appearance
    Core step for constructing seasonal 3D landscapes from 2D webcam captures.
  • domain assumption Conditional diffusion models can generate plausible completions for occluded or missing mesh regions while preserving seasonal context
    Directly invoked to address occlusions and missing data.

pith-pipeline@v0.9.0 · 5415 in / 1475 out tokens · 57051 ms · 2026-05-12T01:57:03.186712+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages

  1. [1] Guillaume Attard. An Intro to the Earth Engine Python API, 2025. https://developers.google.com/earth-engine/tutorials/community/intro-to-python-api.

  2. [2] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields, 2022.

  3. [3] Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, and Hendrik P.A. Lensch. NeRD: Neural reflectance decomposition from image collections. In ICCV, 2021.

  4. [4] Jan Brejcha, Michal Lukáč, Yannick Hold-Geoffroy, Oliver Wang, and Martin Čadík. LandscapeAR: Large scale outdoor augmented reality by matching photographs with terrain models using learned descriptors. In European Conference on Computer Vision, pages 295–312. Springer, 2020.

  5. [5] Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. TexFusion: Synthesizing 3D textures with text-guided image diffusion models, 2023.

  6. [6] D. Chen, G. Baatz, K. Köser, S. Tsai, R. Vedantham, T. Pylvanainen, K. Roimela, X. Chen, J. Bach, M. Pollefeys, B. Girod, and R. Grzeszczuk. City-scale landmark identification on mobile devices. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2011.

  7. [7] Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2Tex: Text-driven texture synthesis via diffusion models, 2023.

  8. [8] Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, and Guosheng Lin. IT3D: Improved text-to-3D generation with explicit view synthesis, 2023.

  9. [9] Gabriele Facciolo, Carlo De Franchis, and Enric Meinhardt-Llopis. Automatic 3D reconstruction from multi-date satellite images, 2017.

  10. [10] Jie Feng, Tianhui Liu, Yuwei Du, Siqi Guo, Yuming Lin, and Yong Li. CityGPT: Empowering urban spatial cognition of large language models, 2025.

  11. [11] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion, 2022.

  12. [12] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), 2013.

  13. [13] Google. Google Earth, 2025. https://earth.google.com.

  14. [14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, pages 6840–6851, 2020.

  15. [15] Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nießner, and Thomas Funkhouser. Adversarial texture optimization from RGB-D scans, 2020.

  16. [16] Peng Jiang, Philip Osteen, Maggie Wigness, and Srikanth Saripalli. RELLIS-3D dataset: Data, benchmarks and analysis, 2020.

  17. [17] Haian Jin, Isabella Liu, Peijia Xu, Xiaoshuai Zhang, Songfang Han, Sai Bi, Xiaowei Zhou, Zexiang Xu, and Hao Su. TensoIR: Tensorial inverse rendering. In CVPR, 2023.

  18. [18] Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, and Noah Snavely. Neural Gaffer: Relighting any object via diffusion. In Advances in Neural Information Processing Systems, 2024.

  19. [19] Kacper Kania, Kwang Moo Yi, Marek Kowalski, Tomasz Trzciński, and Andrea Tagliasacchi. CoNeRF: Controllable neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18623–18632, 2022.

  20. [20] Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, and David Novotny. HoloFusion: Towards photo-realistic 3D generative modeling, 2023.

  21. [21] Michael Kazhdan and Hugues Hoppe. Screened Poisson surface reconstruction. ACM Transactions on Graphics, 32, 2013.

  22. [22] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering, 2023.

  23. [23] Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3D Gaussian representation for real-time rendering of very large datasets. ACM Transactions on Graphics, 43(4), 2024.

  24. [24] Johannes Kopf, Chi-Wing Fu, Daniel Cohen-Or, Oliver Deussen, Dani Lischinski, and Tien-Tsin Wong. Solid texture synthesis from 2D exemplars. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007), 26(3):2:1–2:9, 2007.

  25. [25] Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, and Torsten Sattler. WildGaussians: 3D Gaussian splatting in the wild, 2024.

  26. [26] Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, and Gerard Pons-Moll. Control-NeRF: Editable feature volumes for scene rendering and manipulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4340–4350, 2023.

  27. [27] Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. MatrixCity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3205–3215, 2023.

  28. [28] Zhihao Liang, Hongdong Li, Kui Jia, Kailing Guo, and Qi Zhang. GUS-IR: Gaussian splatting with unified shading for inverse rendering, 2024.

  29. [29] Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, and Kui Jia. GS-IR: 3D Gaussian splatting for inverse rendering. In CVPR, 2024.

  30. [30] Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. CityGaussian: Real-time high-quality large-scale scene rendering with Gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2025.

  31. [31] Qi Ma, Runyi Yang, Bin Ren, Nicu Sebe, Ender Konukoglu, Luc Van Gool, and Danda Pani Paudel. CityLoc: 6DoF pose distributional localization for text descriptions in large-scale scenes with Gaussian representation, 2025.

  32. [32] Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.

  33. [33] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations, 2022.

  34. [34] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis, 2020.

  35. [35] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models, 2023.

  36. [36] Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, and Sanja Fidler. Extracting triangular 3D models, materials, and lighting from images. In CVPR, 2022.

  37. [37] Sarah Taghavi Namin, Mohammad Najafi, Mathieu Salzmann, and Lars Petersson. A multi-modal graphical model for scene analysis. In 2015 IEEE Winter Conference on Applications of Computer Vision, pages 1006–1013. IEEE, 2015.

  38. [38] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, et al. DINOv2: Learning robust visual features without supervision, 2024.

  39. [39] Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3D object generation using both 2D and 3D diffusion priors, 2023.

  40. [40] Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, and Songyou Peng. NeRF On-the-go: Exploiting uncertainty for distractor-free NeRFs in the wild, 2024.

  41. [41] Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. TEXTure: Text-guided texturing of 3D shapes, 2023.

  42. [42] Olivier Saurer, Georges Baatz, Kevin Köser, Ľubor Ladický, and Marc Pollefeys. Image based geo-localization in the Alps, 2015.

  43. [43] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, pages 2256–2265, 2015.

  44. [44] Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, and Dong Chen. Make-It-3D: High-fidelity 3D creation from a single image with diffusion prior, 2023.

  45. [45] Greg Turk. Texture synthesis on surfaces. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 347–354, New York, NY, USA, 2001. Association for Computing Machinery.

  47. [47] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12922–12931, 2022.

  48. [48] Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, and Peyman Moghadam. WildScenes: A benchmark for 2D and 3D semantic segmentation in large-scale natural environments. The International Journal of Robotics Research, 44(4):532–549, 2025.

  49. [49] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. CLIP-NeRF: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3835–3844, 2022.

  50. [50] Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. Multi-scale structural similarity for image quality assessment. The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2:1398–1402, 2003.

  51. [51] Maggie Wigness, Sungmin Eum, John G. Rogers, David Han, and Heesung Kwon. A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments. In International Conference on Intelligent Robots and Systems (IROS), 2019.

  52. [52] Zirui Wu, Jianteng Chen, Laijian Li, Shaoteng Wu, Zhikai Zhu, Kang Xu, Martin R. Oswald, and Jie Song. 3D Gaussian inverse rendering with approximated global illumination, 2025.

  53. [53] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models, 2023.

  54. [54] Tim Yngesjö. 3D reconstruction from satellite imagery using deep learning. Master's thesis, Linköping University, Department of Electrical Engineering, Computer Vision, 2021.

  55. [55] Jonathan Young. xatlas: Mesh parameterization / UV unwrapping library, 2022.

  56. [56] Haitao Yu, Deheng Zhang, Peiyuan Xie, and Tianyi Zhang. Point-based radiance fields for controllable human motion synthesis, 2023.

  57. [57] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-Splatting: Alias-free 3D Gaussian splatting, 2023.

  58. [58] Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3D: Paint anything 3D with lighting-less texture diffusion models, 2023.

  59. [59] Deheng Zhang, Clara Fernandez-Labrador, and Christopher Schroers. CoARF: Controllable 3D artistic style transfer for radiance fields, 2024.

  60. [60] Deheng Zhang, Jingyu Wang, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik Lensch, and Siyu Tang. RISE-SDF: A relightable information-shared signed distance field for glossy object inverse rendering. In International Conference on 3D Vision (3DV), 2025.

  61. [61] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3813–3824, 2023.

  62. [62] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. CVPR, 2018.

  63. [63] Xiaoming Zhao, Pratul P. Srinivasan, Dor Verbin, Keunhong Park, Ricardo Martin Brualla, and Philipp Henzler. IllumiNeRF: 3D relighting without inverse rendering. In NeurIPS, 2024.

  64. [64] Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, et al. GaussianGrasper: 3D language Gaussian splatting for open-vocabulary robotic grasping. arXiv preprint arXiv:2403.09637, 2024.

  65. [65] Jingsen Zhu, Yuchi Huo, Qi Ye, Fujun Luan, Jifan Li, Dianbing Xi, Lisha Wang, Rui Tang, Wei Hua, Hujun Bao, et al. I2-SDF: Intrinsic indoor scene reconstruction and editing via raytracing in neural SDFs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12489–12498, 2023.