arxiv: 2605.03610 · v1 · submitted 2026-05-05 · 💻 cs.CV · eess.IV


deSEO: Physics-Aware Dataset Creation for High-Resolution Satellite Image Shadow Removal

Lorenzo Beltrame, Jules Salzinger, Filip Svoboda, Phillipp Fanta-Jende, Jasmin Lampert, Radu Timofte, Marco Körner


Pith reviewed 2026-05-08 01:14 UTC · model grok-4.3


keywords: shadow removal · satellite imagery · paired dataset · Earth observation · geometry-aware registration · DSM integration · deep learning


The pith

deSEO builds the first paired dataset for removing shadows from high-resolution satellite images by aligning clear reference acquisitions with shadowed ones through geometric and physics constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the absence of suitable paired data for training shadow removal models on satellite Earth observation imagery. It develops deSEO as a replicable pipeline that selects minimally shadowed images as references and registers them to shadowed counterparts using temporal filtering, orientation normalization, and feature matching with validity masks to create supervised pairs. The authors then train a DSM-aware model that incorporates residual translation, perceptual losses, and adversarial training constrained by alignment masks. This setup produces deshadowed outputs with better structural and visual quality than direct adaptations of ground-level shadow removal networks. If the approach holds, it would supply reproducible training resources and a baseline method for handling terrain and structure shadows in high-resolution satellite analysis.

Core claim

deSEO derives the first reproducible geometry-aware paired dataset for satellite shadow removal from existing shadow detection data by selecting a minimally shadowed acquisition as weak reference, applying Jacobian-based orientation normalisation and LoFTR-RANSAC registration, and restricting supervision to per-pixel validity masks; this is paired with a DSM-aware deshadowing model using residual translation, perceptual objectives, and mask-constrained adversarial learning that converges where standard UAV-based architectures fail.
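The two ideas in this claim that bear directly on the loss, residual translation and mask-restricted supervision, can be illustrated with a minimal sketch. It assumes single-channel images stored as nested lists with values in [0, 1]; the function names `apply_residual` and `masked_l1` are hypothetical and are not taken from the paper, whose actual objective also includes perceptual and adversarial terms.

```python
def apply_residual(shadowed, residual):
    """Residual translation: output = input + predicted correction, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, s + r)) for s, r in zip(s_row, r_row)]
        for s_row, r_row in zip(shadowed, residual)
    ]


def masked_l1(pred, ref, valid):
    """Mean absolute error over pixels where the validity mask is set.

    Pixels outside the mask (for example, regions with residual parallax)
    contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for p_row, r_row, v_row in zip(pred, ref, valid):
        for p, r, v in zip(p_row, r_row, v_row):
            if v:
                total += abs(p - r)
                count += 1
    return total / count if count else 0.0


# Toy 2x2 example: the top-right pixel is treated as misaligned and is masked out.
shadowed = [[0.2, 0.4], [0.6, 0.9]]
residual = [[0.3, 0.0], [0.0, 0.5]]
reference = [[0.5, 0.9], [0.5, 1.0]]
valid = [[1, 0], [1, 1]]

prediction = apply_residual(shadowed, residual)
loss = masked_l1(prediction, reference, valid)
```

The large error at the masked pixel (0.4 against 0.9) is ignored, which is the behaviour the validity mask is meant to provide when the reference is only weakly aligned.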

What carries the argument

The deSEO pipeline that selects a minimally shadowed reference tile and registers it to shadowed acquisitions via temporal filtering, Jacobian orientation normalisation, LoFTR-RANSAC alignment, and a per-pixel validity mask to generate reliable paired supervision.
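The selection and filtering stage can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Acquisition` fields, the function names, and all threshold values are hypothetical, since this page does not report the criteria actually used. Only the general idea comes from the source: pick the least-shadowed acquisition per tile as the weak reference and keep partners that pass temporal and geometric checks.

```python
from dataclasses import dataclass


@dataclass
class Acquisition:
    name: str
    day: int                 # acquisition date as a day index
    off_nadir_deg: float     # sensor off-nadir angle
    azimuth_deg: float       # sensor azimuth angle
    shadow_fraction: float   # share of tile pixels flagged as shadow


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_pairs(acqs, max_days=365, max_off_nadir_diff=10.0, max_azimuth_diff=30.0):
    """Return (reference, partners) for one tile.

    The reference is the acquisition with the lowest shadow fraction.
    The thresholds are placeholders chosen for illustration only.
    """
    reference = min(acqs, key=lambda a: a.shadow_fraction)
    partners = [
        a for a in acqs
        if a is not reference
        and abs(a.day - reference.day) <= max_days
        and abs(a.off_nadir_deg - reference.off_nadir_deg) <= max_off_nadir_diff
        and angular_difference(a.azimuth_deg, reference.azimuth_deg) <= max_azimuth_diff
    ]
    return reference, partners


tile = [
    Acquisition("a", 0, 5.0, 350.0, 0.30),
    Acquisition("b", 40, 8.0, 10.0, 0.05),    # least shadowed: becomes the reference
    Acquisition("c", 500, 9.0, 12.0, 0.40),   # rejected: too far apart in time
    Acquisition("d", 60, 25.0, 15.0, 0.35),   # rejected: off-nadir difference
    Acquisition("e", 90, 7.0, 120.0, 0.20),   # rejected: azimuth difference
]
reference, partners = select_pairs(tile)
```

Registration and validity masking would then run only on the surviving pairs.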

If this is right

Supervised models can now be trained directly on satellite viewpoint variability instead of relying on unpaired or weakly supervised formulations.
Downstream tasks including land classification, object detection, and 3D reconstruction receive inputs with reduced shadow artifacts.
The same selection and registration steps can generate additional paired data from other temporal satellite collections.
DSM integration in the model improves handling of terrain-induced shadows compared with purely image-based approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be adapted to correct other illumination variations such as seasonal or atmospheric effects in multi-date satellite stacks.
Extending the validity mask concept might support joint training across different satellite sensors with varying off-nadir angles.
The paired data opens the possibility of benchmarking multiple shadow removal architectures under consistent satellite geometry conditions.

Load-bearing premise

That choosing a minimally shadowed image as reference and performing the described registration steps produces alignments accurate enough for pixel-level training despite leftover parallax and scene changes over time.

What would settle it

Quantitative comparison of shadow removal results on a held-out collection of satellite images that includes independently verified shadow-free acquisitions or dense shadow boundary annotations, measuring metrics such as structural similarity and perceptual error against the current baseline.

Figures

Figures reproduced from arXiv: 2605.03610 by Filip Svoboda, Jasmin Lampert, Jules Salzinger, Lorenzo Beltrame, Marco Körner, Phillipp Fanta-Jende, Radu Timofte.

**Figure 1.** Diagram of deSEO, the proposed framework for shadow removal dataset creation. The processing pipeline uses multitemporal

**Figure 2.** Acquisition from the UCSD tile: (a) LiDAR-derived

**Figure 3.** Rejected examples for (a) azimuth, (b) off-nadir, and (c)

**Figure 4.** Examples for (a) the unprocessed acquisition with

**Figure 5.** Examples from the validation set during pretraining.

**Figure 6.** Qualitative comparison of the full model and the

**Figure 7.** Examples from the test set on the baseline model.

Original abstract

Shadows cast by terrain and tall structures remain a major obstacle for high-resolution satellite image analysis, degrading classification, detection, and 3D reconstruction performance. Public resources offering geometry-consistent paired shadow/shadow-free satellite imagery are essentially missing, and most Earth-observation datasets are designed for shadow detection or 3D modelling rather than removal. Existing deep shadow-removal datasets either target ground-level or aerial scenes or rely on unpaired and weakly supervised formulations rather than explicit satellite pairs. We address this gap with deSEO, a geometry-aware and physics-informed methodology that, to the best of our knowledge, is the first to derive paired supervision for satellite shadow removal from the S-EO shadow detection dataset through a fully replicable pipeline. For each tile, deSEO selects a minimally shadowed acquisition as a weak reference and pairs it with shadowed counterparts using temporal and geometric filtering, Jacobian-based orientation normalisation, and LoFTR-RANSAC registration. A per-pixel validity mask restricts learning to reliably aligned regions, enabling supervision despite residual off-nadir parallax. In addition to this paired dataset, we develop a DSM-aware deshadowing model that combines residual translation, perceptual objectives, and mask-constrained adversarial learning. In contrast, a direct adaptation of a UAV-based SRNet/pix2pix architecture fails to converge under satellite viewpoint variability. Our model consistently reduces the visual impact of cast shadows across diverse illumination and viewing conditions, achieving improved structural and perceptual fidelity on held-out scenes. deSEO therefore provides the first reproducible, geometry-aware paired dataset and baseline for shadow removal in satellite Earth observation.
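The RANSAC half of the LoFTR-RANSAC registration step can be illustrated with a small sketch. It assumes matched point pairs are already available (in the paper they come from LoFTR; here they are synthetic) and fits a 2D similarity transform, which is a simplification chosen for brevity. The transform model used by the authors is not stated on this page, and the function names and parameter defaults below are hypothetical.

```python
import cmath
import random


def fit_similarity(z1, w1, z2, w2):
    """Exact similarity w = a*z + b from two correspondences.

    Points are complex numbers x + 1j*y; `a` encodes rotation and scale,
    `b` encodes translation.
    """
    a = (w1 - w2) / (z1 - z2)
    return a, w1 - a * z1


def ransac_similarity(src, dst, threshold=1.0, iterations=200, seed=0):
    """Return the model with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    n = len(src)
    for _ in range(iterations):
        i, j = rng.sample(range(n), 2)
        if src[i] == src[j]:
            continue  # degenerate sample, cannot define a transform
        a, b = fit_similarity(src[i], dst[i], src[j], dst[j])
        inliers = [k for k in range(n) if abs(a * src[k] + b - dst[k]) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers


# Synthetic matches: eight consistent pairs plus two gross outliers.
true_a, true_b = cmath.rect(1.0, 0.1), 5 + 3j
src = [complex(x, y) for x in (0, 10, 20, 30) for y in (0, 15)]
dst = [true_a * z + true_b for z in src]
src += [complex(5, 5), complex(25, 7)]
dst += [complex(100, 100), complex(-50, 40)]

model, inliers = ransac_similarity(src, dst)
```

A production pipeline would typically refit the transform on all inliers by least squares and would likely use a richer model (affine or homography); the inlier set is also a natural input for building a validity mask.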

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper delivers the first replicable paired satellite shadow-removal dataset via geometry-aware pairing but leaves the supervision accuracy unquantified.


The paper's main contribution is a replicable pipeline that creates paired shadowed and shadow-free high-resolution satellite images from the existing S-EO dataset, plus a baseline DSM-aware removal model. For each tile, the authors pick a low-shadow acquisition as the weak reference, normalise orientations with a Jacobian-based step, register with LoFTR-RANSAC, and use a validity mask to supervise only aligned pixels. The model adds DSM guidance to residual translation, perceptual loss, and masked adversarial training, which helps it handle satellite viewpoint variation better than direct adaptations of UAV models.

This is new because prior shadow-removal work stayed at ground or aerial level or used unpaired setups, and no public paired satellite resource existed. The replicable, geometry-aware approach is a solid step for Earth-observation tasks such as classification and 3D reconstruction.

The weak part is the unproven quality of the pairs. Residual off-nadir parallax on buildings, small registration errors, and changes over time (vegetation, cars) could introduce label noise even inside the mask. The abstract claims that the mask enables supervision and reports visual improvements, but without alignment-error statistics, non-shadow consistency metrics, or mask-threshold ablations, the reliability of the training signal is unclear. This is the concern flagged above as the load-bearing premise.

The work suits researchers in satellite image processing and shadow handling in remote sensing. A reader building models for high-resolution EO data would get value from the pipeline description and, if it is released, from the dataset. I would send it for peer review. The gap it targets is real and the method is concrete, but referees should push for quantitative validation of the pairing accuracy to confirm the claims.

Referee Report

2 major / 2 minor

Summary. The paper introduces deSEO, a pipeline that derives paired shadow/shadow-free supervision for high-resolution satellite imagery from the public S-EO dataset. For each tile it selects a minimally shadowed acquisition as weak reference, applies Jacobian orientation normalisation and LoFTR-RANSAC registration, and restricts supervision to a per-pixel validity mask; a DSM-aware residual network is then trained with perceptual and adversarial losses. The authors claim this yields the first reproducible geometry-aware paired dataset and a baseline model that improves structural and perceptual fidelity on held-out scenes where direct adaptation of UAV shadow-removal architectures fails.

Significance. If the masked pairs prove sufficiently accurate, deSEO would supply the first publicly replicable supervised training resource for satellite shadow removal, addressing a documented gap between existing shadow-detection/3D datasets and removal tasks. The reproducible pipeline and DSM integration are concrete strengths that could accelerate follow-on work in Earth-observation restoration.

major comments (2)

[Abstract and §4 (evaluation)] The statements that the model 'consistently reduces the visual impact of cast shadows' and achieves 'improved structural and perceptual fidelity' are unsupported by any reported quantitative metrics, PSNR/SSIM values, error bars, ablation tables, or failure-case analysis. Without these numbers the central claim that the DSM-aware model outperforms adapted baselines cannot be assessed.
[§3 (dataset construction)] The per-pixel validity mask is asserted to 'enable supervision despite residual off-nadir parallax,' yet no alignment-error statistics (e.g., mean pixel displacement, fraction of valid pixels per tile, or non-shadow region consistency checks) are supplied. Because the quality of the weak-reference pairs is load-bearing for the 'first reproducible geometry-aware paired dataset' claim, the absence of these diagnostics leaves the training-signal fidelity unverified.
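One way the quantitative evaluation requested in the first major comment could respect the weak-reference setup is to compute metrics only over valid pixels. The sketch below shows a mask-restricted PSNR on nested-list images in [0, 1]. The function name `masked_psnr` is hypothetical, and nothing here reflects numbers or code from the paper.

```python
import math


def masked_psnr(pred, ref, valid, peak=1.0):
    """PSNR in dB computed only where the validity mask is set."""
    squared_error, count = 0.0, 0
    for p_row, r_row, v_row in zip(pred, ref, valid):
        for p, r, v in zip(p_row, r_row, v_row):
            if v:
                squared_error += (p - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("validity mask selects no pixels")
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# The second pixel disagrees strongly but lies outside the mask.
score = masked_psnr([[0.5, 0.0]], [[0.4, 1.0]], [[1, 0]])
```

With an error of 0.1 on the single valid pixel, the mean squared error is 0.01 and the score is about 20 dB. Reporting the same metric with and without the mask would also show how much the mask changes the picture.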

minor comments (2)

[§3] Notation for the Jacobian-based orientation normalisation and the exact form of the mask threshold should be defined with an equation or pseudocode for full replicability.
[§3] The manuscript would benefit from a table listing the number of tiles, average valid-pixel fraction, and temporal separation statistics for the derived pairs.
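Since the exact form of the Jacobian-based orientation normalisation is not given on this page, the following is only one plausible reading, offered to show what such pseudocode might look like. It assumes a hypothetical `project(x, y)` function mapping ground coordinates to image (column, row) coordinates, estimates its local 2x2 Jacobian by central differences, and extracts the nearest rotation angle, which could then be undone by rotating the image. It is not the authors' formula.

```python
import math


def numerical_jacobian(project, x, y, h=1e-3):
    """Central-difference Jacobian of project(x, y) -> (col, row) at (x, y)."""
    cx1, rx1 = project(x + h, y)
    cx0, rx0 = project(x - h, y)
    cy1, ry1 = project(x, y + h)
    cy0, ry0 = project(x, y - h)
    return [
        [(cx1 - cx0) / (2 * h), (cy1 - cy0) / (2 * h)],
        [(rx1 - rx0) / (2 * h), (ry1 - ry0) / (2 * h)],
    ]


def rotation_angle_deg(jac):
    """Angle of the rotation closest to the local linear map.

    Maximising trace(R^T J) over 2D rotations R gives
    theta = atan2(J10 - J01, J00 + J11).
    """
    return math.degrees(math.atan2(jac[1][0] - jac[0][1], jac[0][0] + jac[1][1]))


def toy_projection(x, y):
    """Stand-in camera model: rotate by 30 degrees, scale by 2, then shift."""
    c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
    return 2.0 * (c * x - s * y) + 100.0, 2.0 * (s * x + c * y) + 50.0


angle = rotation_angle_deg(numerical_jacobian(toy_projection, 10.0, 20.0))
```

For the toy projection the recovered angle is approximately 30 degrees. With a real sensor model the Jacobian would vary across the tile, so the evaluation point (for example, the tile centre) is another choice the manuscript would need to state.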

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger quantitative support and dataset validation. We address each major comment below and will revise the manuscript to incorporate the suggested additions.


Referee: [Abstract and §4 (evaluation)] The statements that the model 'consistently reduces the visual impact of cast shadows' and achieves 'improved structural and perceptual fidelity' are unsupported by any reported quantitative metrics, PSNR/SSIM values, error bars, ablation tables, or failure-case analysis. Without these numbers the central claim that the DSM-aware model outperforms adapted baselines cannot be assessed.

Authors: We agree that the current manuscript relies primarily on qualitative visual comparisons and textual descriptions of baseline non-convergence to support the claims of reduced shadow impact and improved fidelity. To enable direct assessment of the DSM-aware model's performance, the revised version will add a quantitative evaluation table in §4 reporting PSNR, SSIM, LPIPS, and additional perceptual metrics on held-out scenes, with error bars from repeated training runs, an ablation study isolating the DSM component, and expanded failure-case analysis including quantitative error maps.
Revision: yes

Referee: [§3 (dataset construction)] The per-pixel validity mask is asserted to 'enable supervision despite residual off-nadir parallax,' yet no alignment-error statistics (e.g., mean pixel displacement, fraction of valid pixels per tile, or non-shadow region consistency checks) are supplied. Because the quality of the weak-reference pairs is load-bearing for the 'first reproducible geometry-aware paired dataset' claim, the absence of these diagnostics leaves the training-signal fidelity unverified.

Authors: We acknowledge that aggregate alignment statistics are necessary to verify the fidelity of the derived pairs and the effectiveness of the validity mask. While §3 describes the temporal-geometric filtering, Jacobian normalisation, LoFTR-RANSAC registration, and mask construction, specific error metrics are not reported. In the revision we will add these diagnostics to §3, including mean post-registration pixel displacement, the fraction of valid pixels per tile, and consistency checks on non-shadow regions across the dataset.
Revision: yes
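The three diagnostics named in this response are simple to compute once matches, masks, and registered images are available. The sketch below is an assumption-laden illustration: the function names are hypothetical, points are (x, y) tuples, images and masks are nested lists, and the definitions may differ from whatever the revised manuscript adopts.

```python
import math


def mean_displacement(src_pts, dst_pts, transform):
    """Mean distance (pixels) between transformed source points and their matches."""
    distances = [math.dist(transform(p), q) for p, q in zip(src_pts, dst_pts)]
    return sum(distances) / len(distances)


def valid_fraction(valid):
    """Share of tile pixels kept by the validity mask."""
    flat = [v for row in valid for v in row]
    return sum(1 for v in flat if v) / len(flat)


def non_shadow_consistency(warped_ref, shadowed, valid, shadow):
    """Mean absolute difference over pixels that are valid and not in shadow.

    In a well-aligned pair with no scene change this should be small,
    because both images see the same sunlit surface.
    """
    diffs = [
        abs(a - b)
        for a_row, b_row, v_row, s_row in zip(warped_ref, shadowed, valid, shadow)
        for a, b, v, s in zip(a_row, b_row, v_row, s_row)
        if v and not s
    ]
    return sum(diffs) / len(diffs) if diffs else None


# Toy values: identity transform, one match off by a 3-4-5 triangle.
displacement = mean_displacement(
    [(0.0, 0.0), (10.0, 0.0)], [(3.0, 4.0), (10.0, 0.0)], lambda p: p
)
fraction = valid_fraction([[1, 0], [1, 1]])
consistency = non_shadow_consistency(
    [[0.5, 0.5]], [[0.75, 0.1]], [[1, 1]], [[0, 1]]
)
```

Here `displacement` is 2.5, `fraction` is 0.75, and `consistency` is 0.25. Aggregating these per tile would populate the table suggested in the minor comments.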

Circularity Check

0 steps flagged

No circularity: pipeline applies external tools to public data without self-referential reductions


The paper describes a replicable pipeline that selects minimally-shadowed tiles from the public S-EO dataset, applies temporal/geometric filtering, Jacobian orientation normalisation, LoFTR-RANSAC registration, and a per-pixel validity mask to produce paired supervision. No equations, fitted parameters, or derivations are shown that reduce the output pairs or the DSM-aware deshadowing model (residual translation + perceptual + mask-constrained adversarial objectives) to the inputs by construction. The central claim of providing the 'first reproducible geometry-aware paired dataset' rests on the novelty of applying these standard external components to satellite shadow removal, which is independent of the target results and externally falsifiable via the cited public dataset and registration methods. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way. This is a normal self-contained case.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the availability of sufficiently dense temporal acquisitions per tile and on the assumption that geometric normalisation plus registration can produce usable supervision masks despite parallax.

axioms (2)

Domain assumption: Minimally shadowed acquisitions exist and can be reliably identified as weak references for each tile. Invoked when selecting the reference image for pairing.
Domain assumption: Jacobian-based orientation normalisation and LoFTR-RANSAC registration produce alignment accurate enough for per-pixel supervision after masking. Central to creating valid training pairs despite off-nadir parallax.

