SARU: A Shadow-Aware and Removal Unified Framework for Remote Sensing Images with New Benchmarks

Bin Luo; Hongruixuan Chen; Si-Bao Chen; Wei Lu; Zi-Yang Bo

arxiv: 2604.25432 · v2 · submitted 2026-04-28 · 💻 cs.CV

Zi-Yang Bo, Wei Lu, Hongruixuan Chen, Si-Bao Chen, Bin Luo

Pith reviewed 2026-05-13 06:22 UTC · model grok-4.3

classification 💻 cs.CV

keywords shadow detection · shadow removal · remote sensing images · unified framework · training-free algorithm · benchmark datasets

The pith

A unified two-stage framework detects shadows in remote sensing images with high fidelity and removes them via a fast training-free physical algorithm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the SARU framework to treat shadow detection and removal as an integrated process for remote sensing imagery instead of independent cascaded steps. A dual-branch network first produces accurate shadow masks by fusing multi-color space and semantic features to separate shadows from dark objects. These masks then guide a novel training-free physical restoration step that transfers illumination properties from adjacent non-shadow areas within the single input image. New benchmark datasets are introduced to enable consistent evaluation. A reader would care because shadows degrade many downstream analyses of satellite and aerial photos, and a practical single-image solution reduces the need for scarce paired training data.

Core claim

The central claim is that pairing two components yields state-of-the-art detection performance and competitive removal quality at high speed on remote sensing images, without requiring paired training data. The first component is a dual-branch detection module (DBCSF-Net) that fuses multi-color space and semantic features to generate high-fidelity shadow masks. The second is a training-free physical algorithm (N²SGSR) that restores illumination by transferring properties from adjacent non-shadow regions within the single input image.

What carries the argument

The DBCSF-Net dual-branch detection module fusing multi-color space and semantic features to create shadow masks, together with the N²SGSR training-free algorithm that performs physical illumination transfer from non-shadow regions in one image.
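
As a rough illustration of what mask-guided illumination transfer within a single image can look like, the sketch below matches the per-channel mean and standard deviation of the masked shadow pixels to those of all non-shadow pixels. It is a deliberately simplified global stand-in and not the N²SGSR algorithm, which according to the Figure 2 caption operates on SLIC superpixels with nearest-neighbor guidance and boundary smoothing. The function name `transfer_illumination` and its parameters are assumptions made for this example.

```python
import numpy as np


def transfer_illumination(image, shadow_mask, eps=1e-6):
    """Relight masked pixels by matching per-channel mean/std to lit pixels.

    image:       H x W x C array (values assumed to lie in 0..255)
    shadow_mask: H x W boolean array, True where a shadow was detected
    Simplified global statistic matching; NOT the paper's N2SGSR.
    """
    img = np.asarray(image, dtype=np.float64)
    mask = np.asarray(shadow_mask, dtype=bool)
    out = img.copy()
    if not mask.any() or mask.all():
        return out  # nothing to relight, or no lit reference available

    for c in range(img.shape[2]):
        channel = img[..., c]
        shadow = channel[mask]
        lit = channel[~mask]
        # The gain rescales shadow contrast to the contrast of lit pixels.
        gain = lit.std() / (shadow.std() + eps)
        out[..., c][mask] = (shadow - shadow.mean()) * gain + lit.mean()

    return np.clip(out, 0.0, 255.0)
```

Because the statistics come straight from the mask, any mask error changes both the reference and the target populations, which is why the quality of the detection stage matters so much for this family of methods.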

If this is right

Accurate masks from the detection stage enable the physical restoration algorithm to transfer illumination properties from adjacent non-shadow regions without introducing visible artifacts or losing fine detail.
The training-free design removes dependence on paired shadow and non-shadow training images that are often unavailable in practice.
An average processing time of about 1.3 seconds per image makes the removal step more than 10 times faster than the SOTA MAOSD method, while the SRI stays close to 0.9, a level comparable to RS-GSSR.
The introduced RSISD and SiSRB benchmark datasets allow standardized and rigorous evaluation of future shadow detection and removal methods for remote sensing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Improved shadow handling could raise accuracy in downstream remote sensing tasks such as object detection and semantic segmentation by reducing error propagation from shadows.
The physical restoration step could be hybridized with learned refinement modules to further preserve texture details in complex scenes.
The unified detection-plus-physical-removal pattern may transfer to shadow problems in other single-image domains such as medical imaging or underwater photography.

Load-bearing premise

That the shadow masks produced by detection are accurate enough for the physical restoration step to transfer illumination from non-shadow regions without creating visible artifacts or losing fine detail.

What would settle it

Apply the full SARU pipeline to a collection of remote sensing images containing complex shadow boundaries or dark objects that resemble shadows and check whether the output images exhibit artifacts or fail to match ground-truth shadow-free versions in visual quality metrics.
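
One way to make such a check concrete is to score restored pixels against a shadow-free reference inside the shadow mask only. The sketch below uses a masked RMSE for this. That choice is an assumption: the paper reports SRI, whose definition is not given in this summary, so RMSE is only a generic stand-in, and `masked_rmse` is a hypothetical helper name.

```python
import numpy as np


def masked_rmse(restored, reference, mask):
    """Root-mean-square error over the pixels where mask is True.

    restored, reference: H x W x C arrays on the same intensity scale
    mask:                H x W boolean array marking the shadow region
    """
    m = np.asarray(mask, dtype=bool)
    if not m.any():
        raise ValueError("mask selects no pixels")
    a = np.asarray(restored, dtype=np.float64)[m]
    b = np.asarray(reference, dtype=np.float64)[m]
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

Restricting the error to the mask keeps untouched lit pixels from diluting the score, so failures at complex boundaries or on dark objects remain visible in the number.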

Figures

Figures reproduced from arXiv: 2604.25432 by Bin Luo, Hongruixuan Chen, Si-Bao Chen, Wei Lu, Zi-Yang Bo.

**Figure 1.** The left image shows a comparison of sampled images before and after shadow removal. The top-right image illustrates that randomly sampled shaded and unshaded regions within the same background exhibit similar pixel intensity distributions across the RGB channels. The bottom-right image further demonstrates a similar trend in pixel value variation when sorted in ascending order for both regions. This sugge…

**Figure 2.** The overall pipeline of the SARU framework. Step 1: Shadow detection via DBCSF-Net; Step 2: Superpixel segmentation using SLIC; Step 3: Shadow removal guided by nearest-neighbor superpixels (N²SGSR); Step 4: Bilateral boundary smoothing for penumbra transition. Diagram labels: RGB, Lab, HSV; Decouple Block Stage 1 ×1, Stage 2 ×2, Stage 3 ×3, Stage 4 ×4; FFM; Seg_Cls; MCSC (Multi-Color…

**Figure 3.** Illustration of the architecture of DBCSF-Net. … the MCSC Encoder. This module employs parallel encoding branches across RGB, HSV, and Lab color spaces, as depicted in the multi-color space path of …

**Figure 4.** Structural diagram of FFM. … 3.2.2. DecoupleNet Encoder. Distinguishing shadows from semantically similar dark objects (e.g., black buildings) in RSI is a significant challenge. To address this, we incorporate a semantic information extraction branch using the lightweight DecoupleNet (Lu et al., 2024), shown in the semantic path of …

**Figure 5.** The left image illustrates the eleven different cities in China covered by the RSISD dataset. The right image presents a sampling site, which includes the original shadow image, the manually annotated shadow mask, and the shadow removal result generated by our proposed method. … and test sets. The RSISD dataset encompasses a diverse range of urban scenarios, including factories, residential areas, schools, a…

**Figure 6.** For the selected images, we annotate shadowed and non-shadowed areas of similar land cover types, ensuring that the number of pixels within pairs of annotated areas is roughly balanced, thus providing a fair basis for comparative analysis. … learning rate of 2 × 10⁻⁴. The batch size was set to 4, and the training process spanned 30 epochs. Regarding the shadow removal phase using N²SGSR, the number of pixels…

**Figure 7.** Shadow detection results on the AISD dataset. (a) Input image, (b) Ground truth, (c) RSiSD, (d) AFFPN, (e) ECA-SD, (f) SDCM, (g) RSD, (h) SDDNet, (i) SILT, (j) CADDN, (k) Ours. The rendered colors denote TP (green), FN (red), FP (blue), TN (white).

**Figure 8.** Shadow detection results on our proposed RSISD dataset. (a) Input image, (b) Ground truth, (c) RSiSD, (d) AFFPN, (e) ECA-SD, (f) SDCM, (g) RSD, (h) SDDNet, (i) SILT, (j) CADDN, (k) Ours. The rendered colors denote TP (green), FN (red), FP (blue), TN (white). … shadows while effectively avoiding the common pitfall of misclassifying dark vehicles as shadow regions. The results in the second and fourth rows further…

**Figure 9.** Shadow removal results on the AISD dataset. (a) Input image, (b) RSiSR, (c) G2R-ShadowNet, (d) MEF-SR, (e) SRMF, (f) Self-ShadowGAN, (g) MAOSD, (h) RS-GSSR, (i) Ours.

**Figure 10.** Shadow removal results on our proposed RSISD dataset. (a) Input image, (b) RSiSR, (c) G2R-ShadowNet, (d) MEF-SR, (e) SRMF, (f) Self-ShadowGAN, (g) MAOSD, (h) RS-GSSR, (i) Ours. … originally for hyperspectral images, showed limited applicability to three-channel RGB images due to its reliance on additional spectral information. The RS-GSSR method, which constructs a paired training dataset through image cropp…

**Figure 11.** Vehicle detection visualization results for different shadow removal methods. (a) Input image, (b) RSiSR, (c) G2R-ShadowNet, (d) MEF-SR, (e) SRMF, (f) Self-ShadowGAN, (g) MAOSD, (h) RS-GSSR, (i) Ours.

**Figure 14.** Comparison of different search strategies in extreme scenarios lacking local homogeneous references. (a) Shadowed image after superpixel segmentation. (b) Naive global averaging method. (c) Proposed similarity-weighted global search method. … of shadow edges is assessed. Owing to the discrete characteristics of initial binary masks, the direct restoration using estimated illumination ratios frequently indu…

**Figure 12.** Ablation study on the impact of the number of nearest neighbors $n$ in the N²SGSR algorithm. (a) Input image, (b)–(h) Shadow removal results obtained using $n = 1$, $n = 3$, $n = 5$, $n = 7$, $n = 9$, $n = 12$, and $n = 15$, respectively.

**Figure 13.** Ablation study on the Bilateral Boundary Smoothing. (a) Input image, (b) Shadow removal result without boundary smoothing, (c) Result with the proposed smoothing mechanism. … Equation (19) effectively suppresses localized noise and leverages neighborhood redundancy, thereby achieving a more natural and consistent spectral transition. Experimental results indicate that the algorithm performance reaches its op…
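
The color coding described in the captions of Figures 7 and 8 (TP green, FN red, FP blue, TN white) can be reproduced from a predicted mask and a ground-truth mask with a few lines of NumPy. The sketch below is an illustration only. The exact RGB values and the name `render_error_map` are assumptions, since the captions name the colors but not their numeric values.

```python
import numpy as np

# Assumed RGB values for the four outcomes named in the figure captions.
COLORS = {
    "TP": (0, 255, 0),      # shadow predicted, shadow in ground truth
    "FN": (255, 0, 0),      # shadow missed by the prediction
    "FP": (0, 0, 255),      # non-shadow wrongly predicted as shadow
    "TN": (255, 255, 255),  # non-shadow in both
}


def render_error_map(pred_mask, gt_mask):
    """Return an H x W x 3 uint8 image coloring each pixel by its outcome."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    out = np.empty(pred.shape + (3,), dtype=np.uint8)
    out[pred & gt] = COLORS["TP"]
    out[~pred & gt] = COLORS["FN"]
    out[pred & ~gt] = COLORS["FP"]
    out[~pred & ~gt] = COLORS["TN"]
    return out
```

In a map like this, dark objects misread as shadows show up as blue regions and missed shadow boundaries as red fringes, which is the failure pattern the detection comparison is meant to expose.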

Original abstract

Shadows are a prevalent problem in remote sensing imagery (RSI), degrading visual quality and severely limiting the performance of downstream tasks like object detection and semantic segmentation. Most prior works treat shadow detection and removal as separate, cascaded tasks, which can lead to a cumbersome process and error accumulation. Furthermore, many deep learning methods rely on paired shadow and non-shadow images for training, which are often unavailable in practice. To address these challenges, we propose the Shadow-Aware and Removal Unified (SARU) framework, a cohesive two-stage framework. First, its dual-branch detection module (DBCSF-Net) fuses multi-color space and semantic features to generate high-fidelity shadow masks, effectively distinguishing shadows from dark objects. Then, leveraging these masks, a novel, training-free physical algorithm (N$^2$SGSR) restores illumination by transferring properties from adjacent non-shadow regions within the single input image. To facilitate rigorous evaluation and foster future work, we also introduce two new benchmark datasets: the RSI Shadow Detection (RSISD) dataset and the Single-image Shadow Removal Benchmark (SiSRB). Extensive experiments on the AISD and RSISD datasets demonstrate that SARU achieves SOTA shadow detection performance. For shadow removal, our training-free N$^2$SGSR algorithm attains an average processing speed of approximately $1.3$ s, which is over $10$ times faster than the SOTA MAOSD while maintaining an SRI value close to 0.9 on both the AISD and SiSRB datasets, a level comparable to the advanced RS-GSSR method. By holistically integrating shadow detection and removal to mitigate error propagation and eliminating the dependency on paired training data, SARU establishes a robust, practical framework for real-world RSI analysis. The code and datasets are publicly available at: https://github.com/AeroVILab-AHU/SARU

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SARU delivers two new public benchmarks and a fast training-free removal step that sidesteps paired data needs, but the framework is still a plain detection-then-restore cascade.

The paper's real value sits in the two released datasets (RSISD for detection and SiSRB for removal) and the N²SGSR physical restoration routine, which runs in about 1.3 seconds without any training. Those pieces are concrete and immediately usable for anyone dealing with remote sensing shadows. The dual-branch DBCSF-Net also looks like a reasonable way to pull in color-space and semantic cues to separate shadows from dark objects, a common failure mode in this domain. Releasing code and data is the part that actually moves the field forward here. The training-free angle is a practical win because paired shadow/non-shadow RSI images are hard to get at scale.

On the downside, the stress-test note is accurate: the abstract still describes a strict two-stage pipeline where the removal step simply consumes the masks from the first stage. There is no joint optimization, no feedback, and no end-to-end component mentioned, so any boundary errors or dark-object misclassifications in detection will directly degrade the illumination transfer. The claim that this setup “mitigates error propagation” therefore rests on the assumption that the masks are already near-perfect rather than on any architectural robustness. The SOTA numbers are reported without error bars or full protocol details in the abstract, and the physical algorithm’s success is tied to mask quality that is not independently validated in the summary.

This is the kind of work that belongs in a remote-sensing or computer-vision applications venue. Readers who need ready-to-use benchmarks or a quick baseline removal method will find it useful; theorists looking for a new joint formulation will not. It is coherent enough and grounded enough in released artifacts to deserve a full referee process rather than a desk reject, mainly so the experimental claims and the physical model can be checked against the actual code and data.

Referee Report

2 major / 2 minor

Summary. The paper proposes the SARU framework for shadow detection and removal in remote sensing images. It consists of a dual-branch detection network (DBCSF-Net) that fuses multi-color space and semantic features to produce shadow masks, followed by a training-free physical restoration algorithm (N²SGSR) that transfers illumination properties from non-shadow regions using the masks. The work also introduces two new benchmark datasets, RSISD for detection and SiSRB for removal, and claims state-of-the-art performance in detection on AISD and RSISD, along with fast processing (1.3s) and high SRI (~0.9) for removal comparable to existing methods.

Significance. If the claims hold, particularly the effectiveness of the training-free removal algorithm and the mitigation of error accumulation in the unified framework, this would represent a practical advance for remote sensing image analysis where paired training data is scarce. The speed improvement and new benchmarks could facilitate further research in the field.

major comments (2)

Abstract: The assertion that SARU mitigates error propagation by holistically integrating detection and removal is undermined by the described two-stage cascade architecture, where N²SGSR directly leverages masks from DBCSF-Net without joint optimization or feedback. This sequential design means detection inaccuracies (e.g., boundary errors or dark object misclassifications) would propagate to the restoration step, contradicting the claim of reduced error accumulation compared to prior cascaded methods.
Abstract: Quantitative results for shadow removal report an SRI value close to 0.9 and 1.3s processing time, but no error bars, standard deviations, or details on the number of test images are provided. Additionally, the validation that the physical algorithm maintains quality relies on the assumption of sufficiently accurate masks, which is not demonstrated through ablation studies or error analysis in the reported experiments.
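
The statistical reporting requested in the second major comment amounts to summarizing per-image scores with a sample size, a mean, a standard deviation, and an interval. The sketch below shows one minimal way to do that. The helper name `summarize` is hypothetical, the 95% interval uses a normal approximation (an assumption that is only reasonable for moderately large test sets), and any scores passed in are the caller's own measurements, not values from the paper.

```python
import math
import statistics


def summarize(scores):
    """Summarize per-image scores as n, mean, sample std, and 95% CI half-width.

    The interval uses the normal approximation 1.96 * std / sqrt(n).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores for a standard deviation")
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = 1.96 * std / math.sqrt(n)
    return {"n": n, "mean": mean, "std": std, "ci95_half_width": half_width}
```

Reporting `n` alongside the mean lets readers judge whether an average "close to 0.9" rests on tens or thousands of test images.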

minor comments (2)

The abstract mentions extensive experiments on AISD and RSISD, but lacks specifics on the experimental protocol, such as training details for DBCSF-Net or parameter settings for N²SGSR.
Clarify the exact definition of the SRI metric and how it is computed, as it is central to the removal performance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the valuable feedback on our manuscript. We have carefully considered the comments and provide point-by-point responses below. Revisions have been made to address the concerns raised.

Referee: Abstract: The assertion that SARU mitigates error propagation by holistically integrating detection and removal is undermined by the described two-stage cascade architecture, where N²SGSR directly leverages masks from DBCSF-Net without joint optimization or feedback. This sequential design means detection inaccuracies (e.g., boundary errors or dark object misclassifications) would propagate to the restoration step, contradicting the claim of reduced error accumulation compared to prior cascaded methods.

Authors: We acknowledge that the SARU framework operates as a sequential two-stage process without joint optimization or feedback loops between detection and removal. The claim in the abstract regarding mitigation of error propagation may have been overstated. While the dual-branch detection network improves mask accuracy by fusing multi-color space and semantic features, thereby reducing certain types of errors like misclassifying dark objects as shadows, detection inaccuracies can still propagate to the N²SGSR restoration step. The training-free physical algorithm helps avoid additional error sources from learning-based removal methods. We have revised the abstract to clarify the framework's design and remove the specific claim about mitigating error propagation through holistic integration. revision: yes
Referee: Abstract: Quantitative results for shadow removal report an SRI value close to 0.9 and 1.3s processing time, but no error bars, standard deviations, or details on the number of test images are provided. Additionally, the validation that the physical algorithm maintains quality relies on the assumption of sufficiently accurate masks, which is not demonstrated through ablation studies or error analysis in the reported experiments.

Authors: We agree that the quantitative results in the abstract lack sufficient statistical details. The SRI value of approximately 0.9 and the processing time of 1.3s are averages computed over the test images from the AISD and SiSRB datasets. In the revised manuscript, we will include error bars, standard deviations, and explicit information on the number of test images used for these metrics. Furthermore, to address the reliance on accurate masks, we will add ablation studies comparing removal performance using predicted masks versus ground-truth masks, along with an error analysis to show how mask inaccuracies affect the final SRI. These additions will demonstrate the robustness of the N²SGSR algorithm. revision: yes
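
A mask-sensitivity analysis of the kind promised here could start by degrading a ground-truth mask in a controlled way and recording how far it drifts before rerunning removal. The sketch below covers only that first half: a square-kernel dilation and an IoU score, written in plain NumPy. The names `dilate` and `iou` are assumptions for this example, and the removal step itself is left out because its interface is not described in this summary.

```python
import numpy as np


def dilate(mask, radius=1):
    """Grow a boolean mask with a (2*radius+1) square structuring element."""
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    padded = np.pad(m, radius, mode="constant", constant_values=False)
    out = np.zeros_like(m)
    # OR together every shifted copy of the mask inside the square window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)
```

Sweeping `radius` upward and plotting removal quality against `iou(ground_truth, dilate(ground_truth, radius))` would give one axis of the predicted-versus-ground-truth comparison; an erosion sweep would cover masks that are too small.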

Circularity Check

0 steps flagged

No circularity in SARU derivation chain

The paper presents a two-stage pipeline consisting of a trained dual-branch detection network (DBCSF-Net) followed by a training-free, physics-based restoration algorithm (N²SGSR) that transfers illumination properties from non-shadow regions using the generated masks. Reported metrics such as SRI ≈ 0.9 and 1.3s processing speed are obtained via direct evaluation on the AISD, RSISD, and SiSRB benchmarks; no equations, fitted parameters, or self-citations reduce these quantities to quantities defined by the authors' own inputs. The 'unified' framing is an architectural description rather than a mathematical derivation that loops back on itself, and the new benchmarks provide external test data independent of any internal fitting.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard computer-vision assumptions about feature fusion and illumination transfer rather than new invented physical entities; the detection network contains many learned parameters but the removal step is explicitly training-free.

free parameters (1)

Network hyperparameters and weights in DBCSF-Net
The dual-branch detection module is trained on data, so its parameters are fitted; exact count and values not stated in abstract.

axioms (2)

domain assumption: Multi-color-space and semantic features suffice to distinguish shadows from dark objects
Invoked in the design of the dual-branch detection module.
domain assumption: Illumination properties can be transferred from adjacent non-shadow regions to restore shadowed areas accurately
Core premise of the N²SGSR physical restoration algorithm.
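
To make the first assumption more tangible, the sketch below builds a multi-color-space input by stacking normalized RGB channels with their HSV counterparts. It is only an illustration of the general idea. The paper's MCSC encoder is described as using parallel branches over RGB, HSV, and Lab, whereas this example concatenates channels, omits Lab, and uses hypothetical function names (`rgb_to_hsv`, `stack_color_spaces`).

```python
import numpy as np


def rgb_to_hsv(rgb_uint8):
    """Vectorized RGB -> HSV conversion; H, S, and V are all scaled to [0, 1]."""
    rgb = np.asarray(rgb_uint8, dtype=np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    chroma = v - rgb.min(axis=-1)
    safe_c = np.where(chroma > 0, chroma, 1.0)  # avoid division by zero
    safe_v = np.where(v > 0, v, 1.0)
    s = np.where(v > 0, chroma / safe_v, 0.0)
    # Hue sector depends on which channel holds the maximum.
    h = np.where(
        v == r,
        ((g - b) / safe_c) % 6.0,
        np.where(v == g, (b - r) / safe_c + 2.0, (r - g) / safe_c + 4.0),
    )
    h = np.where(chroma > 0, h / 6.0, 0.0)  # gray pixels get hue 0
    return np.stack([h, s, v], axis=-1)


def stack_color_spaces(rgb_uint8):
    """Return an H x W x 6 array: normalized RGB followed by HSV."""
    rgb = np.asarray(rgb_uint8, dtype=np.float64) / 255.0
    return np.concatenate([rgb, rgb_to_hsv(rgb_uint8)], axis=-1)
```

Shadows tend to lower the value channel while leaving hue comparatively stable, which is one intuition for why extra color spaces might help separate them from intrinsically dark objects; whether that is sufficient is exactly what the assumption asserts.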

pith-pipeline@v0.9.0 · 5656 in / 1583 out tokens · 180025 ms · 2026-05-13T06:22:04.570301+00:00 · methodology
