Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

Jiangong Xu; Jun Pan; Mi Wang; Weibao Xue; Xiaoyu Yu; Xinlian Liang

arxiv: 2606.17713 · v1 · pith:CD7OF2I6new · submitted 2026-06-16 · 💻 cs.CV

Pith reviewed 2026-06-27 01:27 UTC · model grok-4.3

classification 💻 cs.CV

keywords land use land cover mapping · SAR-optical fusion · cloud contamination · remote sensing · deep learning · Sentinel-1 Sentinel-2 · benchmark dataset · semantic segmentation

The pith

CloudLULC-Net fuses cloud-contaminated optical images with adjacent radar data to produce accurate land cover maps directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an end-to-end network can map land use and land cover in near real time by taking cloud-covered Sentinel-2 optical images together with nearby Sentinel-1 radar observations, rather than first trying to reconstruct clear optical images. A sympathetic reader would care because optical sensors fail under clouds in many regions, while radar sees through them, yet prior fusion methods left semantic gaps that limited reliability for timely mapping. The authors introduce CloudLULC-Net with three main modules plus a new optimization step, and they release CloudLULC-Set, a global collection of 40,223 SAR-optical-label triplets, to test the approach. Experiments report 86.60 percent overall accuracy and better results than earlier methods across varying cloud levels.

Core claim

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion framework that directly predicts LULC maps from cloud-contaminated Sentinel-2 imagery and temporally adjacent Sentinel-1 SAR observations. The network incorporates optical reliability modulation to suppress unreliable optical responses, heterogeneous information adaptive aggregation to model high-order spatial-channel interactions between optical and SAR representations, and a unified semantic mapping transformer to organize fused features in a LULC-oriented latent space, together with a semantic anchor-guided optimization strategy. On the CloudLULC-Set benchmark of 40,223 triplets, the method reaches 86.60 percent overall accuracy.

What carries the argument

CloudLULC-Net, an end-to-end network that applies optical reliability modulation, heterogeneous information adaptive aggregation, and a unified semantic mapping transformer to fuse SAR and optical inputs into LULC predictions.
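
To make the idea of suppressing unreliable optical responses concrete, here is a minimal NumPy sketch of a per-pixel reliability gate. The gating form, the function name `reliability_modulated_fusion`, and the `cloud_logit` input are all illustrative assumptions; the paper's actual modulation module is not specified on this page and is likely more elaborate.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def reliability_modulated_fusion(opt_feat, sar_feat, cloud_logit):
    """Blend optical and SAR features with a per-pixel reliability mask.

    opt_feat, sar_feat: arrays of shape (C, H, W).
    cloud_logit: array of shape (H, W); larger values mean "more likely cloud".
    Hypothetical gating form for illustration only, not the paper's module.
    """
    # Reliability is high where the cloud score is low.
    reliability = 1.0 - sigmoid(cloud_logit)      # shape (H, W), values in (0, 1)
    gate = reliability[None, :, :]                # broadcast over the channel axis
    # Trust optical features where reliable, fall back to SAR elsewhere.
    fused = gate * opt_feat + (1.0 - gate) * sar_feat
    return fused, reliability


if __name__ == "__main__":
    opt = np.ones((2, 2, 2))
    sar = np.zeros((2, 2, 2))
    logit = np.array([[-10.0, 10.0], [0.0, 0.0]])
    fused, rel = reliability_modulated_fusion(opt, sar, logit)
    print(fused[0])  # near 1 where clear, near 0 where cloudy, 0.5 where undecided
```

In this toy form a clear pixel keeps its optical feature almost unchanged, a cloudy pixel is replaced almost entirely by the SAR feature, and an ambiguous pixel gets an even mix.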

If this is right

Enables direct target-date mapping in cloud-prone areas without separate cloud-removal preprocessing.
Maintains accuracy across varying cloud-cover percentages.
Surpasses both reconstruction-first pipelines and other joint SAR-optical networks on the same data.
Supports comparisons against existing global LULC products to show practical gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion structure might apply to other sensor pairs where one modality is weather-sensitive.
If radar data can be delivered within hours of the optical pass, near-real-time operational mapping becomes feasible.
The approach could lessen dependence on long time-series optical stacks for gap filling.

Load-bearing premise

Temporally adjacent radar observations are always available and supply enough complementary information to offset uncertainty in the clouded optical data without creating large mismatches in land features.

What would settle it

Performance falls sharply on a held-out test collection where the radar images were acquired more than a week apart from the optical images, revealing errors traceable to temporal mismatch.
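
The test described above amounts to partitioning a held-out set by the SAR-optical acquisition gap and evaluating each part separately. A minimal sketch of that partition follows; the tuple layout `(sample_id, optical_date, sar_date)` and the function names are assumptions made for illustration, not the dataset's actual metadata schema.

```python
from datetime import date


def temporal_gap_days(optical_date: date, sar_date: date) -> int:
    """Absolute number of days between the optical and SAR acquisitions."""
    return abs((optical_date - sar_date).days)


def split_by_gap(pairs, threshold_days=7):
    """Split sample ids into (near, far) by acquisition gap.

    pairs: iterable of (sample_id, optical_date, sar_date); hypothetical layout.
    A sample is "far" when its gap is strictly greater than threshold_days.
    """
    near, far = [], []
    for sample_id, opt_d, sar_d in pairs:
        if temporal_gap_days(opt_d, sar_d) > threshold_days:
            far.append(sample_id)
        else:
            near.append(sample_id)
    return near, far


if __name__ == "__main__":
    pairs = [
        ("a", date(2024, 1, 10), date(2024, 1, 12)),  # 2-day gap
        ("b", date(2024, 1, 10), date(2024, 1, 20)),  # 10-day gap
        ("c", date(2024, 1, 10), date(2024, 1, 3)),   # 7-day gap
    ]
    print(split_by_gap(pairs))
```

Comparing the same metric on the two subsets would show whether accuracy degrades once the radar acquisition is more than a week from the optical one.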

Figures

Figures reproduced from arXiv: 2606.17713 by Jiangong Xu, Jun Pan, Mi Wang, Weibao Xue, Xiaoyu Yu, Xinlian Liang.

**Figure 5.** Architecture of the Heterogeneous Information Adaptive Aggregation (HIAA) block, comprising Spatial Higher-Order Interaction (SHOI) and Channel Higher-Order Interaction (CHOI) modules for cross-modal feature integration in both spatial and channel dimensions.

**Figure 6.** Class-wise comparison of CloudLULC-Net and representative methods on CloudLULC-Set. (a) IoU of different LULC categories and mIoU. (b) F1-score of different LULC categories and mean F1-score.

**Figure 10.** Quantitative comparison of CloudLULC-Net and selected representative methods under different cloud-coverage levels. The selected methods include representative reconstruction-first and end-to-end SAR–optical mapping baselines.

**Figure 11.** Visual comparison of LULC mapping results generated by CloudLULC-Net and selected representative methods under different cloud-coverage levels, with corresponding local magnifications. Subfigures (a)–(d) show the cloud-contaminated optical images, corresponding SAR images, nearest cloud-free optical images used for reference interpretation, and manually annotated LULC reference labels, respectively. Subfi…

**Figure 12.** Visual comparison of LULC mapping results generated by CloudLULC-Net under different input configurations in the Biobío region. The upper panels show (a) cloud-contaminated optical imagery, (b) SAR imagery, (c) temporally closest cloud-free optical imagery used as an ideal reference condition, and (d) manually annotated reference labels. The lower panels show the LULC mapping results obtained using (e) cl…

**Figure 13.** Sensitivity analysis of CloudLULC-Net with respect to the number of stacked HIAA blocks 𝑁 and the maximum interaction order 𝒪. (a) OA, (b) F1-score, and (c) mIoU under different combinations of 𝑁 and 𝒪.

Original abstract

Optical remote sensing imagery is frequently degraded by cloud and cloud-shadow contamination, which limits its reliability for near-real-time land use and land cover (LULC) mapping. Although synthetic aperture radar (SAR) can provide cloud-penetrating structural information, existing SAR-optical fusion methods often assume reliable optical observations and insufficiently address the semantic uncertainty introduced by cloud contamination. To address this issue, we propose CloudLULC-Net, an end-to-end heterogeneous SAR-optical fusion framework that directly predicts LULC maps from cloud-contaminated Sentinel-2 imagery and temporally adjacent Sentinel-1 SAR observations. The proposed network incorporates optical reliability modulation to suppress unreliable optical responses, heterogeneous information adaptive aggregation to model high-order spatial-channel interactions between optical and SAR representations, and a unified semantic mapping transformer to organize fused features in a LULC-oriented latent space. A semantic anchor-guided optimization strategy is further introduced to improve the consistency of intermediate semantic representations. To support this task, we construct CloudLULC-Set, a large-scale benchmark dataset containing 40,223 curated SAR-optical-label triplets with pixel-level LULC annotations across diverse geographic regions and cloud conditions. Experimental results show that CloudLULC-Net achieves an OA of 86.60%, an F1-score of 83.29%, and an mIoU of 73.51%, outperforming representative heterogeneous reconstruction-first and end-to-end SAR-optical mapping methods. Comparisons with existing global LULC products and analyses under different cloud-cover levels further demonstrate the robustness and practical value of CloudLULC-Net for target-date LULC mapping in cloud-prone regions. The project is publicly available at: https://github.com/RSIIPAC/CloudLULC
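
For readers unfamiliar with the three reported metrics, the sketch below computes overall accuracy (OA), mean F1, and mean IoU from a confusion matrix using their standard definitions. Macro-averaging over classes is an assumption here; the abstract does not state how the paper aggregates F1, and the function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the reference class, columns the predicted class."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return OA, macro-averaged F1, and mean IoU from a confusion matrix."""
    cm = cm.astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class k but labelled otherwise
    fn = cm.sum(axis=1) - tp   # labelled class k but predicted otherwise
    oa = tp.sum() / cm.sum()
    # Classes absent from both labels and predictions give 0/0 = NaN
    # and are skipped by nanmean.
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
    return {"OA": oa, "mF1": np.nanmean(f1), "mIoU": np.nanmean(iou)}


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
    print(segmentation_metrics(cm))
```

The same confusion matrix also yields the per-class IoU and F1 values of the kind plotted in Figure 6, before averaging.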

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper delivers a new end-to-end fusion network and a sizable global dataset for cloud-affected LULC mapping, with reported gains over baselines, but the results rest on an unexamined assumption that temporally adjacent SAR data introduces no meaningful mismatch.

The core offering here is CloudLULC-Net, which fuses cloud-contaminated Sentinel-2 with nearby Sentinel-1 to output LULC maps directly, using optical reliability modulation, heterogeneous adaptive aggregation, and semantic anchor-guided losses. They also release CloudLULC-Set, a collection of 40,223 triplets spanning multiple regions and cloud conditions, along with code.

The dataset and public release are the clearest positives. A large, curated global set with pixel labels is useful for anyone testing fusion methods, and the reported metrics (86.60% OA, 83.29% F1, 73.51% mIoU) beat the reconstruction-first and end-to-end baselines they ran. Checking performance across cloud-cover levels adds some practical grounding.

The main soft spot is the operating assumption that temporally adjacent SAR always supplies usable structural information without semantic drift. The abstract gives no quantitative limit on time gaps, no ablation on gap size, and no breakdown by land-cover classes that change quickly. If mismatch errors matter in real deployments, the claimed edge over other approaches is not guaranteed. Dataset construction details—selection criteria, label consistency across continents—also need verification to rule out curation effects.

This work is aimed at remote sensing groups building operational mapping pipelines in cloudy areas. The dataset alone could see reuse even if the network needs adjustments.

It deserves peer review. The scale of the data and the direct-prediction framing are enough to warrant referee time, though reviewers will likely press on the timing sensitivity and baseline fairness.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CloudLULC-Net, an end-to-end heterogeneous SAR-optical fusion network for near-real-time LULC mapping from cloud-contaminated Sentinel-2 optical imagery paired with temporally adjacent Sentinel-1 SAR observations. The network includes optical reliability modulation, heterogeneous adaptive aggregation, a semantic mapping transformer, and semantic anchor-guided optimization. The authors also release CloudLULC-Set, a benchmark of 40,223 curated SAR-optical-label triplets spanning diverse regions and cloud conditions. Experiments report OA of 86.60%, F1 of 83.29%, and mIoU of 73.51%, outperforming reconstruction-first and end-to-end baselines, with additional comparisons to global LULC products under varying cloud cover.

Significance. If the empirical results prove robust, the work supplies a practical framework and large public benchmark for operational LULC mapping in persistently cloudy regions, directly addressing a common limitation of optical-only approaches. The dataset release and code availability constitute clear strengths for reproducibility and future method development in remote sensing.

major comments (2)

[Method description and Experiments] The central performance claims rest on the untested operating assumption that temporally adjacent Sentinel-1 acquisitions are always available and supply semantically consistent structural information without appreciable mismatch error. The method description and experimental section provide no quantitative bound on acceptable time gaps, no ablation varying the temporal separation, and no per-class analysis for land-cover types that change on sub-weekly timescales. If mismatch errors are non-negligible, the heterogeneous aggregation and semantic-anchor losses cannot be guaranteed to deliver the reported gains over reconstruction-first baselines.
[Dataset Construction] CloudLULC-Set is described as curated, yet the dataset-construction section supplies insufficient detail on selection criteria, cloud-condition sampling strategy, and safeguards against post-hoc curation choices that could favor the proposed network. Without these details, it is difficult to assess whether the held-out test performance (OA 86.60 %, mIoU 73.51 %) reflects genuine generalization or dataset-construction artifacts.

minor comments (2)

[Abstract] The abstract states that the method outperforms 'representative' baselines but does not name the specific reconstruction-first and end-to-end methods or cite their implementations; this should be clarified for reproducibility.
[Method] Notation for the optical reliability modulation and heterogeneous aggregation modules could be made more explicit (e.g., by defining the exact form of the modulation mask and the high-order interaction operator) to aid readers attempting re-implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript accordingly to strengthen the presentation of temporal consistency and dataset construction details.

Referee: [Method description and Experiments] The central performance claims rest on the untested operating assumption that temporally adjacent Sentinel-1 acquisitions are always available and supply semantically consistent structural information without appreciable mismatch error. The method description and experimental section provide no quantitative bound on acceptable time gaps, no ablation varying the temporal separation, and no per-class analysis for land-cover types that change on sub-weekly timescales. If mismatch errors are non-negligible, the heterogeneous aggregation and semantic-anchor losses cannot be guaranteed to deliver the reported gains over reconstruction-first baselines.

Authors: We agree that an explicit analysis of temporal mismatch is needed to bound the operating regime. In CloudLULC-Set, SAR-optical pairs were formed using the closest available Sentinel-1 acquisition within a 6-day window of the target Sentinel-2 date (reflecting the combined revisit characteristics of the two sensors). To address the referee's concern, the revised manuscript will add: (1) a quantitative summary of the actual time gaps present in the 40,223 triplets, (2) an ablation that retrains and evaluates CloudLULC-Net on subsets with increasing maximum temporal separation (0-2, 2-4, 4-6 days), and (3) per-class F1 and IoU breakdowns for dynamic categories (e.g., cropland, water bodies) under these gap conditions. These additions will either confirm that mismatch remains negligible within the chosen window or identify the practical limit beyond which performance degrades. revision: yes
Referee: [Dataset Construction] CloudLULC-Set is described as curated, yet the dataset-construction section supplies insufficient detail on selection criteria, cloud-condition sampling strategy, and safeguards against post-hoc curation choices that could favor the proposed network. Without these details, it is difficult to assess whether the held-out test performance (OA 86.60 %, mIoU 73.51 %) reflects genuine generalization or dataset-construction artifacts.

Authors: We acknowledge that the current description of CloudLULC-Set construction is too brief. The revised manuscript will expand this section with: (i) explicit inclusion/exclusion rules (minimum 10 % cloud cover, exclusion of scenes with >80 % cloud or permanent snow/ice, geographic stratification across 12 Köppen climate zones), (ii) the cloud-cover sampling distribution (uniform sampling within 10-30 %, 30-50 %, 50-70 % bins), and (iii) safeguards including independent annotation by two experts with adjudication, temporal consistency checks against higher-resolution reference imagery, and a fully documented train/validation/test split that was frozen before any model development. These details will allow readers to evaluate potential curation bias. revision: yes
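
The inclusion rules and cloud-cover bins quoted in the simulated response can be written down as a small filter, which also makes their edge cases visible. This is a sketch under stated assumptions: the function names are hypothetical, and treating bins as half-open intervals `[lo, hi)` is a choice made here because the response does not say which bin a boundary value belongs to.

```python
# Bin edges in percent cloud cover, as listed in the simulated response.
CLOUD_BINS = [(10, 30), (30, 50), (50, 70)]


def passes_inclusion(cloud_pct: float, has_permanent_snow_ice: bool) -> bool:
    """Apply the stated rules: at least 10 % cloud, not above 80 %, no permanent snow/ice."""
    return 10 <= cloud_pct <= 80 and not has_permanent_snow_ice


def cloud_bin(cloud_pct: float):
    """Return the sampling-bin label, or None if the value falls outside every bin.

    Half-open intervals [lo, hi) are an assumption of this sketch.
    """
    for lo, hi in CLOUD_BINS:
        if lo <= cloud_pct < hi:
            return f"{lo}-{hi}%"
    return None


if __name__ == "__main__":
    for pct in (5, 10, 30, 69.9, 75, 85):
        print(pct, passes_inclusion(pct, False), cloud_bin(pct))
```

Written this way, a scene with 70 to 80 % cloud cover passes the inclusion rules but belongs to none of the three listed sampling bins, which is a detail the expanded dataset section would need to settle.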

Circularity Check

0 steps flagged

No significant circularity; empirical results on held-out test data

The paper presents CloudLULC-Net as an end-to-end neural architecture with optical reliability modulation, heterogeneous aggregation, and semantic-anchor losses, plus the new CloudLULC-Set dataset of 40,223 triplets. Reported metrics (OA 86.60%, F1 83.29%, mIoU 73.51%) are obtained via standard supervised training and evaluation on held-out portions of that dataset against reconstruction-first and end-to-end baselines. No equations, parameter fits, or derivations appear that reduce by construction to the inputs; the central claims rest on empirical performance rather than any self-definitional, fitted-input-called-prediction, or self-citation-load-bearing steps. The assumption about temporally adjacent SAR availability is an operating-regime limitation, not a circularity in the reported results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claims rest on the learned behavior of the proposed network modules and the assumption that the new dataset adequately represents global geographic and cloud conditions; no additional ad-hoc constants beyond standard neural network training are described in the abstract.

free parameters (1)

network weights and hyperparameters
Learned from the CloudLULC-Set training split to optimize LULC prediction accuracy.

axioms (1)

domain assumption: Temporally adjacent SAR observations provide reliable complementary structural information when optical data is cloud-contaminated.
Core premise enabling the heterogeneous fusion approach.

Reference graph

Works this paper leans on

57 extracted references · 52 canonical work pages

[1]

https://doi.org/10.1038/s41597-022-01307-4 Chen, B., Huang, B., Chen, L., Xu, B.,

work page doi:10.1038/s41597-022-01307-4
[2]

IEEE Trans

Spatially and temporally weighted regression: A novel method to produce continuous cloud- free Landsat imagery. IEEE Trans. Geosci. Remote Sens. 55, 27 -37. https://doi.org/10.1109/TGRS.2016.2580576 Chen, H., Zhang, J., Wang, H., Wang, S., Huang, P., Li, J., Guo, H., Wang, D., Wang, Z., Du, B.,

work page doi:10.1109/tgrs.2016.2580576 2016
[3]

IEEE Trans

Frequency-aware feature fusion for dense image prediction. IEEE Trans. Pattern Anal. Mach. Intell. 46, 10763-10780. https://doi.org/10.1109/TPAMI.2024.3449959 Chen, Y ., Bruz zone, L.,

work page doi:10.1109/tpami.2024.3449959 2024
[4]

IEEE Trans

Self -supervised SAR -optical data fusion of Sentinel-1/-2 images. IEEE Trans. Geosci. Remote Sens. 60, 1 -11. https://doi.org/10.1109/TGRS.2021.3128072 Chi, L., Jiang, B., Mu, Y .,

work page doi:10.1109/tgrs.2021.3128072 2021
[5]

In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops

Deepglobe 2018: A challenge to parse the earth through satellite images. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 172 -181. https://doi.org/10.1109/CVPRW.2018.00031 Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S.,

work page doi:10.1109/cvprw.2018.00031 2018
[6]

arXiv preprint arXiv:2010.11929 Ebel, P., Meraner, A., Schmitt, M., Zhu, X.,

An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 Ebel, P., Meraner, A., Schmitt, M., Zhu, X.,

Pith/arXiv arXiv 2010
[7]

https://doi.org/10.1109/TGRS.2020.3024744 Ghorbanian, A., Kakooei, M., Amani, M., Mahdavi, S., Mohammadzadeh, A., Hasanlou, M.,

work page doi:10.1109/tgrs.2020.3024744 2020
[8]

https://doi.org/10.1016/j.isprsjprs.2020.07.013 Guo, S., Wu, W., Shao, Z., Teng, J., Li, D.,

work page doi:10.1016/j.isprsjprs.2020.07.013 2020
[9]

Extracting urban impervious surface based on optical and SAR images cross -modal multi-scale features fusion network. Int. J. Digit. Earth 17, 2301675. https://doi.org/10.1080/17538947.2023.2301675 Karra, K., Kontgis, C., Statman-Weil, Z., Mazzariello, J.C., Mathis, M., Brumby, S.P.,

work page doi:10.1080/17538947.2023.2301675 2023
[10]

and Chini, M

Global land use/land cover with Sentinel 2 and deep learning. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE. pp. 4704-4707. https://doi.org/10.1109/IGARSS47720.2021.9553499 Kattenborn, T., Lopatin, J., Förster, M., Braun, A.C., Fassnacht, F.E.,

work page doi:10.1109/igarss47720.2021.9553499 2021
[11]

Remote Sens

UA V data as alternative to field sampling to map woody invasive species based on combined Sentinel-1 and Sentinel-2 data. Remote Sens. Environ. 227, 61-73. https://doi.org/10.1016/j.rse.2019.03.025 Lee-Thorp, J., Ainslie, J., Eckstein, I., Ontanon, S.,

work page doi:10.1016/j.rse.2019.03.025 2019
[12]

In: Proceedings of the 2022 Conference of the north American chapter of the Association for Computational Linguistics: human language technologies

Fnet: Mixing tokens with fourier transforms. In: Proceedings of the 2022 Conference of the north American chapter of the Association for Computational Linguistics: human language technologies. pp. 4296 -4313. https://doi.org/10.18653/v1/2022.naacl-main.319 Li, C., Lyu, H., Jing, W., Yuan, Y ., Cheng, G., 2025a. MFFNet: a wavelet transform-based multimodal...

work page doi:10.18653/v1/2022.naacl-main.319 2022
[13]

IEEE Trans

Collaborative attention -based heterogeneous gated fusion network for land cover classification. IEEE Trans. Geosci. Remote Sens. 59, 3829 -3845. https://doi.org/10.1109/TGRS.2020.3015389 Li, X., Zhang, G., Cui, H., Hou, S., Wang, S., Li, X., Chen, Y ., Li, Z., Zhang, L., 2022a. MCANet: A joint semantic segmentation framework of optical and SAR images for...

work page doi:10.1109/tgrs.2020.3015389 2020
[14]

HS2P: Hierarchical spectral and structure -preserving fusion network for multimodal remote sensing image cloud and shadow removal. Inf. Fusion 94, 215 -228. https://doi.org/10.1016/j.inffus.2023.02.002 Li, Y ., Xue, Y ., Xin, Z., Liao, G., Huang, P ., 2025b. Multi -modal cross Swin transformer network for multi -label classification landslide detection wi...

work page doi:10.1016/j.inffus.2023.02.002 2023
[15]

https://doi.org/10.1016/j.isprsjprs.2022.02.013 Li, Z., Weng, Q., Zhou, Y ., Dou, P., Ding, X.,

work page doi:10.1016/j.isprsjprs.2022.02.013 2022
[16]

Remote Sens

Learning spectral-indices- fused deep models for time-series land use and land cover mapping in cloud- prone areas: The case of Pearl River Delta. Remote Sens. Environ. 308, 114190. https://doi.org/10.1016/j.rse.2024.114190 Liu, C., Huang, W., Zhu, X.X.,

work page doi:10.1016/j.rse.2024.114190 2024
[17]

https://doi.org/10.1016/j.isprsjprs.2026.04.056 Liu, R., Ling, J., Zhang, H.,

work page doi:10.1016/j.isprsjprs.2026.04.056 2026
[18]

SoftFormer: SAR -optical fusion transformer for urban land use and land cover classifica tion. ISPRS J. Photogramm. Remote Sens. 218, 277-293. https://doi.org/10.1016/j.isprsjprs.2024.09.012 Long, J., Shelhamer, E., Darrell, T.,

work page doi:10.1016/j.isprsjprs.2024.09.012 2024
[19]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3431 -3440. https://doi.org/10.1109/CVPR.2015.7298965 Ma, J., Chen, Y ., Pan, J., Xu, J., Li, Z., Xu, R., Chen, R., 2024a. SCT -CR: A synergistic convolution -transformer modeling method using SAR -optical data fus...

work page doi:10.1109/cvpr.2015.7298965 2015
[20]

https://doi.org/10.1016/j.isprsjprs.2019.04.015 Ma, W., Karakuş, O., Rosin, P.L.,

work page doi:10.1016/j.isprsjprs.2019.04.015 2019
[21]

A multilevel multimodal fusion Transformer for remote sensing semantic segmentation

https://doi.org/10.3390/rs14184458 Ma, X., Zhang, X., Pun, M.-O., Liu, M., 2024b. A multilevel multimodal fusion Transformer for remote sensing semantic segmentation. IEEE Trans. Geosci. Remote Sens

work page doi:10.3390/rs14184458
[22]

https://doi.org/10.1109/TGRS.2024.3373033 Milletari, F., Navab, N., Ahmadi, S.-A.,

work page doi:10.1109/tgrs.2024.3373033 2024
[23]

In: 2016 fourth international conference on 3D vision (3DV)

V-net: Fully convolutional neural networks for volumetric medi cal image segmentation. In: 2016 fourth international conference on 3D vision (3DV). IEEE. pp. 565 -571. https://doi.org/10.1109/3DV .2016.79 Moreira, A., Prats-Iraola, P., Younis, M., Krieger, G., Hajnsek, I., Papathanassiou, K.P.,

work page doi:10.1109/3dv 2016
[24]

IEEE Geosci

A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 1, 6-43. https://doi.org/10.1109/MGRS.2013.2248301 Pan, J., Xu, J., Yu, X., Ye, G., Wang, M., Chen, Y ., Ma, J.,

work page doi:10.1109/mgrs.2013.2248301 2013
[25]

HDRSA-Net: Hybrid dynamic residual self -attention network for SAR -assisted optical image cloud and shadow removal. ISPRS J. Photogramm. Remote Sens. 218, 258-278. https://doi.org/10.1016/j.isprsjprs.2024.10.026 Peebles, W., Xie, S.,

work page doi:10.1016/j.isprsjprs.2024.10.026 2024
[26]

In: 2023 IEEE/CVF International Conference on Computer Vi- sion (ICCV)

Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 4195-4205. https://doi.org/10.1109/ICCV51070.2023.00387 Qiu, S., Zhu, Z., He, B.,

work page doi:10.1109/iccv51070.2023.00387 2023
[27]

Remote Sens

Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4 –8 and Sentinel -2 imagery. Remote Sens. Environ. 231, 111205. https://doi.org/10.1016/j.rse.2019.05.024 Ren, B., Ma, S., Hou, B., Hong, D., Chanussot, J., Wang, J., Jiao, L.,

work page doi:10.1016/j.rse.2019.05.024 2019
[28]

A dual-stream high resolution network: Deep fusion of GF-2 and GF-3 data for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 112, 102896. https://doi.org/10.1016/j.jag.2022.102896 Schmitt, M., Hughes, L.H., Qiu, C., Zhu, X.X.,

work page doi:10.1016/j.jag.2022.102896 2022
[29]

arXiv preprint arXiv:1906.07789 Shang, X., Li, G., Jiang, Z., Zhang, S., Ding, N., Liu, J.,

SEN12MS --A curated dataset of georeferenced multi -spectral sentinel -1/2 imagery for deep learning and data fusion. arXiv preprint arXiv:1906.07789 Shang, X., Li, G., Jiang, Z., Zhang, S., Ding, N., Liu, J.,

Pith/arXiv arXiv 1906
[30]

Holistic dynamic frequency transformer for image fusion and exposure correction. Inf. Fusion 102, 102073. https://doi.org/10.1016/j.inffus.2023.102073 Shermeyer, J., Hogan, D., Brown, J., Van Etten, A., Weir, N., Pacifici, F., Hansch, R., Bastidas, A., Soenen, S., Bacastow, T.,

work page doi:10.1016/j.inffus.2023.102073 2023
[31]

https://doi.org/10.1109/CVPRW50498.2020.00106 Shu, Q., Zhu, X., Xu, S., Wang, Y ., Liu, D.,

work page doi:10.1109/cvprw50498.2020.00106 2020
[32]

Remote Sens

RESTORE -DiT: Reliable satellite image time series reconstruction by multimodal sequential diffusion transformer. Remote Sens. Environ. 328, 114872. https://doi.org/10.1016/j.rse.2025.114872 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.,

work page doi:10.1016/j.rse.2025.114872 2025
[33]

Sensors 8, 4213-4248

On the soil roughness parameterization problem in soil moisture retrieval of bare surfaces from synthetic aperture radar. Sensors 8, 4213-4248. https://doi.org/10.3390/s8074213 Wang, M., Hu, S., Song, Y ., Shi, Y .,

work page doi:10.3390/s8074213
[34]

https://doi.org/10.3390/rs17132241 Wang, Y ., Sun, Y ., Cao, X., Wang, Y ., Zhang, W., Cheng, X.,

work page doi:10.3390/rs17132241
[35]

A review of regional and Global scale Land Use/Land Cover (LULC) mapping products generated from satellite remote sensing. ISPRS J. Photogramm. Remote Sens. 206, 311-334. https://doi.org/10.1016/j.isprsjprs.2023.11.014 Wei, K., Dai, J., Hong, D., Ye, Y .,

work page doi:10.1016/j.isprsjprs.2023.11.014 2023
[36]

MGFNet: An MLP -dominated gated fusion network for semantic segmentation of high- resolution multi-modal remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 135, 104241. https://doi.org/10.1016/j.jag.2024.104241 Wei, Y ., Xiao, A., Chen, H., Xia, J., Yokoya, N.,

work page doi:10.1016/j.jag.2024.104241 2024
[37]

arXiv preprint arXiv:2603.17528 Williamson, C.J., Kupc, A., Axisa, D., Bilsback, K.R., Bui, T., Campuzano-Jost, P., Dollner, M., Froyd, K.D., Hodshire, A.L., Jimenez, J.L.,

MM-OVSeg: Multimodal Optical-SAR Fusion for Open-V ocabulary Segmentation in Remote Sensing. arXiv preprint arXiv:2603.17528 Williamson, C.J., Kupc, A., Axisa, D., Bilsback, K.R., Bui, T., Campuzano-Jost, P., Dollner, M., Froyd, K.D., Hodshire, A.L., Jimenez, J.L.,

arXiv
[38]

Nature 574, 399-403

A large source of cloud condensation nuclei from new particle formation in the tropics. Nature 574, 399-403. https://doi.org/10.1038/s41586-019-1638-9 Wu, P ., Yao, Y ., Wan, Y ., Zhang, W., Zhao, R., Li, J., Zhang, Y .,

work page doi:10.1038/s41586-019-1638-9
[39]

arXiv preprint arXiv:2602.05480 Wu, S., Zhu, J., Gu, Y ., Han, W., Jiang, W., Geng, J.,

SOMA- 1M: A Large -Scale SAR -Optical Multi -resolution Alignment Dataset for Multi-Task Remote Sensing. arXiv preprint arXiv:2602.05480 Wu, S., Zhu, J., Gu, Y ., Han, W., Jiang, W., Geng, J.,

arXiv
[40]

Quantifying the sensitivity of SAR and optical images three -level fusions in land cover classification to registration errors

Wu, W., Shao, Z., Huang, X., Teng, J., Guo, S., Li, D., 2022 . Quantifying the sensitivity of SAR and optical images three -level fusions in land cover classification to registration errors. Int. J. Appl. Earth Obs. Geoinf. 112, 102868. https://doi.org/10.1016/j.jag.2022.102868 Xia, J., Chen, H., Broni -Bediako, C., Wei, Y ., Song, J., Yokoya, N.,

work page doi:10.1016/j.jag.2022.102868 2022
[41]

IEEE Geosci

OpenEarthMap-SAR: A benchmark synthetic aperture radar dataset for global high-resolution land cover mapping [Software and Data Sets]. IEEE Geosci. Remote Sens. M ag. 13, 476 -487. https://doi.org/10.1109/MGRS.2025.3599512 Xie, E., Wang, W., Y u, Z., Anandkumar, A., Alvarez, J.M., Luo, P.,

work page doi:10.1109/mgrs.2025.3599512 2025
[42]

CloudSeg: A multi-modal learning framework for robust land cover mapping under cloudy conditions. ISPRS J. Ph otogramm. Remote Sens. 214, 21 -32. https://doi.org/10.1016/j.isprsjprs.2024.06.001 Xu, F., Shi, Y ., Yang, W., Zhu, X.,

work page doi:10.1016/j.isprsjprs.2024.06.001 2024
[43] Multi-modal multi-task learning for semantic segmentation of land cover under cloudy conditions. In: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE. pp. 6274-6277. https://doi.org/10.1109/IGARSS52108.2023.10281865 Xu, J., Yu, X., Pan, J., Cao, L., Wang, M.,
[44] PolNet-CR: Spatial-channel collaborative interaction network for PolSAR incremental information-assisted optical satellite imagery cloud removal. Inf. Fusion 124, 103367. https://doi.org/10.1016/j.inffus.2025.103367 Yeung, M., Sala, E., Schönlieb, C.-B., Rundo, L.,
[45] Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 95, 102026. https://doi.org/10.1016/j.compmedimag.2021.102026 Yu, B., Li, J., Huang, X.,
[46] STSNet: A cross-spatial resolution multi-modal remote sensing deep fusion network for high resolution land-cover segmentation. Inf. Fusion 114, 102689. https://doi.org/10.1016/j.inffus.2024.102689 Yu, H., Li, G., Liu, H., Zhu, S., Dong, W., Li, C.,
[47] SpecSAR-former: a lightweight transformer-based network for global LULC mapping using integrated Sentinel-1 and Sentinel-2. arXiv preprint arXiv:2410.03962 Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S.,
[48] ESA WorldCover 10 m 2021 v200. https://doi.org/10.5281/zenodo.5571936 Zhang, J., Liu, H., Yang, K., Hu, X., Liu, R., Stiefelhagen, R.,
[49] CMX: Cross-modal fusion for RGB-X semantic segmentation with Transformers. IEEE Trans. Intell. Transp. Syst. 24, 14679-14694. https://doi.org/10.1109/TITS.2023.3300537 Zhang, P., Peng, B., Lu, C., Huang, Q., Liu, D.,
[50] ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification. ISPRS J. Photogramm. Remote Sens. 218, 574-587. https://doi.org/10.1016/j.isprsjprs.2024.09.025 Zhang, R., Yang, Y., Li, Z., Li, P., Wang, H., 2025a. Optical and SAR Image Fusion: A Review of Theories, Methods, and Applications. Remote Sens. 18,
[51] https://doi.org/10.3390/rs18010073 Zhang, X., Liu, L., Zhao, T., Zhang, W., Guan, L., Bai, M., Chen, X., 2025b. GLC_FCS10: a global 10-m land-cover dataset with a fine classification system from Sentinel-1 and Sentinel-2 time-series data in Google Earth Engine. Earth Syst. Sci. Data Discuss, 1-27. https://doi.org/10.5194/essd-2025-73 Zhang, Z., Yan...
[52] https://doi.org/10.1109/TGRS.2025.3574799 Zheng, H., Zhong, X., Liu, B., Xiao, Y., Wen, B., Li, X.,
[53] https://doi.org/10.1109/TGRS.2025.3621902 Zheng, N., Zhou, M., Huang, J., Hou, J., Li, H., Xu, Y., Zhao, F.,
[54] Probing synergistic high-order interaction in infrared and visible image fusion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 26384-26395. https://doi.org/10.1109/CVPR52733.2024.02492 Zhou, M., Zheng, N., He, X., Hong, D., Chanussot, J.,
[55] Probing Synergistic High-Order Interaction for Multi-modal Image Fusion. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2024.3475485 Zhu, X.X., Tuia, D., Mou, L., Xia, G.-S., Zhang, L., Xu, F., Fraundorfer, F.,
[56] Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 5, 8-36. https://doi.org/10.1109/MGRS.2017.2762307 Zhu, Z., Woodcock, C.E.,
[57] Continuous change detection and classification of land cover using all available Landsat data. Remote Sens. Environ. 144, 152-171. https://doi.org/10.1016/j.rse.2014.01.011
