pith.

arxiv: 2606.17713 · v1 · pith:CD7OF2I6 · submitted 2026-06-16 · 💻 cs.CV

Heterogeneous SAR-optical fusion for near-real-time land use and land cover mapping under cloud contamination: A novel framework and global benchmark dataset

Pith reviewed 2026-06-27 01:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords: land use land cover mapping · SAR-optical fusion · cloud contamination · remote sensing · deep learning · Sentinel-1 Sentinel-2 · benchmark dataset · semantic segmentation

The pith

CloudLULC-Net fuses cloud-contaminated optical images with temporally adjacent radar data to produce accurate land cover maps directly, without a separate cloud-removal step.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an end-to-end network can map land use and land cover in near real time by taking cloud-covered Sentinel-2 optical images together with temporally adjacent Sentinel-1 radar observations, rather than first trying to reconstruct clear optical images. A sympathetic reader would care because optical sensors fail under clouds in many regions while radar sees through them, and prior SAR-optical fusion methods often assumed reliable optical input, leaving the semantic uncertainty introduced by clouds unaddressed and limiting their reliability for timely mapping. The authors introduce CloudLULC-Net, with three main modules plus a semantic anchor-guided optimization step, and release CloudLULC-Set, a global collection of 40,223 paired SAR-optical images with pixel-level labels, to test the approach. Experiments report 86.60 percent overall accuracy and better results than earlier methods across varying cloud levels.

Core claim

CloudLULC-Net is an end-to-end heterogeneous SAR-optical fusion framework that directly predicts LULC maps from cloud-contaminated Sentinel-2 imagery and temporally adjacent Sentinel-1 SAR observations. The network incorporates optical reliability modulation to suppress unreliable optical responses, heterogeneous information adaptive aggregation to model high-order spatial-channel interactions between optical and SAR representations, and a unified semantic mapping transformer to organize fused features in a LULC-oriented latent space, together with a semantic anchor-guided optimization strategy. On the CloudLULC-Set benchmark of 40,223 triplets, the method reaches 86.60 percent overall accuracy, with an F1-score of 83.29 percent and an mIoU of 73.51 percent.
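The three headline numbers (OA, F1, mIoU) follow standard semantic-segmentation definitions. As a sketch, assuming a pixel-level confusion matrix with reference classes as rows and macro-averaging over classes (the paper may average differently or skip classes absent from a tile), they can be computed like this:

```python
def segmentation_metrics(cm):
    """Return (OA, mean F1, mIoU) from a square confusion matrix.

    cm[i][j] counts pixels with reference class i predicted as class j.
    Per-class scores are macro-averaged; a class absent from both the
    reference and the prediction is scored 0.0 here (one of several
    conventions).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    f1s, ious = [], []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, reference differs
        fn = sum(cm[c]) - tp                        # reference c, predicted otherwise
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return correct / total, sum(f1s) / k, sum(ious) / k


oa, mean_f1, miou = segmentation_metrics([[50, 10], [5, 35]])
print(f"OA={oa:.4f}  mean F1={mean_f1:.4f}  mIoU={miou:.4f}")
```

Because IoU penalizes each error in both the false-positive and false-negative counts without the doubled true-positive term that F1 uses, mIoU is always at or below mean F1 for the same matrix, which is consistent with the reported 73.51% versus 83.29%.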

What carries the argument

CloudLULC-Net, an end-to-end network that applies optical reliability modulation, heterogeneous information adaptive aggregation, and a unified semantic mapping transformer to fuse SAR and optical inputs into LULC predictions.
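The modules are learned, and their exact form is not given in this summary. The toy sketch below only illustrates the intuition behind optical reliability modulation: an optical feature is down-weighted where it is unreliable, and the SAR feature fills the gap. It assumes a per-pixel cloud probability as the reliability cue and uses hypothetical constants `w` and `b` with a simple convex blend in place of the paper's learned modulation and aggregation.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def modulate_and_fuse(opt_feat, sar_feat, cloud_prob, w=6.0, b=3.0):
    """Toy per-pixel fusion (hypothetical, not CloudLULC-Net's actual modules).

    reliability = sigmoid(b - w * cloud_prob) is close to 1 for clear pixels
    and close to 0 under thick cloud. The optical value is scaled by that
    reliability and the SAR value receives the complementary weight.
    """
    fused = []
    for o, s, p in zip(opt_feat, sar_feat, cloud_prob):
        r = sigmoid(b - w * p)
        fused.append(r * o + (1.0 - r) * s)
    return fused


# A clear pixel keeps mostly optical signal; a fully clouded pixel leans on SAR.
print(modulate_and_fuse([1.0, 1.0], [0.0, 0.0], [0.0, 1.0]))
```

In a trained network the fixed cloud-probability cue would presumably be replaced by a gate predicted from the optical and SAR features themselves.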

If this is right

  • Enables direct target-date mapping in cloud-prone areas without separate cloud-removal preprocessing.
  • Maintains accuracy across varying cloud-cover percentages.
  • Surpasses both reconstruction-first pipelines and other joint SAR-optical networks on the same data.
  • Supports comparisons against existing global LULC products to show practical gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same fusion structure might apply to other sensor pairs where one modality is weather-sensitive.
  • If radar data can be delivered within hours of the optical pass, near-real-time operational mapping becomes feasible.
  • The approach could lessen dependence on long time-series optical stacks for gap filling.

Load-bearing premise

Temporally adjacent radar observations are always available and supply enough complementary information to offset uncertainty in the clouded optical data without creating large mismatches in land features.
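One way to make this premise checkable is to pair each Sentinel-2 date with the nearest Sentinel-1 acquisition inside a fixed window and record the gap. The sketch assumes plain acquisition dates and a hypothetical helper name; the 6-day default mirrors the window stated in the simulated rebuttal, not a value confirmed by the paper.

```python
from datetime import date


def pair_nearest_sar(optical_date, sar_dates, max_gap_days=6):
    """Return (sar_date, signed_gap_days) for the closest SAR acquisition.

    Returns None when no SAR acquisition falls within +/- max_gap_days.
    A negative gap means the SAR image precedes the optical image.
    Ties keep the first candidate encountered.
    """
    best = None
    for d in sar_dates:
        gap = (d - optical_date).days
        if abs(gap) > max_gap_days:
            continue
        if best is None or abs(gap) < abs(best[1]):
            best = (d, gap)
    return best


s2_date = date(2024, 5, 10)
s1_dates = [date(2024, 5, 3), date(2024, 5, 13), date(2024, 5, 8)]
print(pair_nearest_sar(s2_date, s1_dates))
```

A `None` result marks an optical scene for which the premise fails outright; the recorded gaps give the time-gap distribution the referee asks for.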

What would settle it

Performance falls sharply on a held-out test collection where the radar images were acquired more than a week apart from the optical images, revealing errors traceable to temporal mismatch.
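This test amounts to pooling pixel accuracy separately for pairs inside and outside a one-week gap. A minimal sketch, assuming each test sample is summarized as a hypothetical `(gap_days, n_correct_pixels, n_pixels)` tuple:

```python
from collections import defaultdict


def accuracy_by_gap(samples, threshold_days=7):
    """Pixel-pooled overall accuracy, split by SAR-optical acquisition gap.

    samples: iterable of (gap_days, n_correct_pixels, n_pixels); gap_days may
    be negative when SAR precedes the optical image.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gap, n_ok, n in samples:
        key = "within" if abs(gap) <= threshold_days else "beyond"
        correct[key] += n_ok
        total[key] += n
    return {key: correct[key] / total[key] for key in total}


print(accuracy_by_gap([(2, 90, 100), (-5, 80, 100), (10, 60, 100)]))
```

A sharp drop in `beyond` relative to `within`, concentrated in fast-changing classes such as cropland or water, would be the signature of temporal mismatch described above.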

Figures

Figures reproduced from arXiv: 2606.17713 by Jiangong Xu, Jun Pan, Mi Wang, Weibao Xue, Xiaoyu Yu, Xinlian Liang.

Figure 5. Architecture of the Heterogeneous Information Adaptive Aggregation (HIAA) block, comprising Spatial Higher-Order Interaction (SHOI) and Channel Higher-Order Interaction (CHOI) modules for cross-modal feature integration in both spatial and channel dimensions. Section 4.3 (Heterogeneous information adaptive aggregation): After optical reliability modulation, the reliability-aware optical feature $\tilde{\mathbf{F}}_o$ and the SAR fea…
Figure 6. Class-wise comparison of CloudLULC-Net and representative methods on CloudLULC-Set. (a) IoU of different LULC categories and mIoU. (b) F1-score of different LULC categories and mean F1-score. Section 5.2.2 (Comparative validation with existing global LULC products): To further evaluate the practical value of CloudLULC-Net for target-date LULC mapping under cloud-contaminated conditions, we compared the generated maps…
Figure 10. Quantitative comparison of CloudLULC-Net and selected representative methods under different cloud-coverage levels. The selected methods include representative reconstruction-first and end-to-end SAR–optical mapping baselines.
Figure 11. Visual comparison of LULC mapping results generated by CloudLULC-Net and selected representative methods under different cloud-coverage levels, with corresponding local magnifications. Subfigures (a)–(d) show the cloud-contaminated optical images, corresponding SAR images, nearest cloud-free optical images used for reference interpretation, and manually annotated LULC reference labels, respectively. Subfi…
Figure 12. Visual comparison of LULC mapping results generated by CloudLULC-Net under different input configurations in the Biobío region. The upper panels show (a) cloud-contaminated optical imagery, (b) SAR imagery, (c) temporally closest cloud-free optical imagery used as an ideal reference condition, and (d) manually annotated reference labels. The lower panels show the LULC mapping results obtained using (e) cl…
Figure 13. Sensitivity analysis of CloudLULC-Net with respect to the number of stacked HIAA blocks $N$ and the maximum interaction order $\mathcal{O}$. (a) OA, (b) F1-score, and (c) mIoU under different combinations of $N$ and $\mathcal{O}$. Section 6.4 (Benefits of LULC mapping for fine-grained spatiotemporal surface analysis): Beyond benchmark accuracy, an important advantage of CloudLULC-Net lies in its potential for fine-grained target-date LULC…
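Figure 13 varies the number of stacked HIAA blocks N and the maximum interaction order O. The toy sketch below only shows what those two knobs control in a recursive element-wise interaction of the kind used in high-order fusion work: O sets the polynomial degree of the cross-modal term inside one block, and N sets how many blocks are chained. The scalar weights, the residual form, and the function names are assumptions; the actual SHOI/CHOI operators are not specified in this summary.

```python
def high_order_block(opt, sar, order, weights):
    """Toy order-`order` cross-modal interaction with a residual connection.

    p_0 = opt and p_k = p_{k-1} * weights[k-1] * sar (element-wise), so the
    interaction term is the optical feature times the SAR feature raised to
    the power `order`, scaled by the product of the weights.
    """
    if len(weights) < order:
        raise ValueError("need one weight per interaction order")
    p = list(opt)
    for k in range(order):
        p = [pi * weights[k] * si for pi, si in zip(p, sar)]
    return [o + pi for o, pi in zip(opt, p)]


def stacked_blocks(opt, sar, n_blocks, order, weights):
    """Chain `n_blocks` toy blocks; every block reuses the same SAR feature."""
    x = list(opt)
    for _ in range(n_blocks):
        x = high_order_block(x, sar, order, weights)
    return x


print(stacked_blocks([1.0], [2.0], n_blocks=2, order=2, weights=[0.5, 0.5]))
```

Raising O increases the degree of each block's interaction term, while raising N compounds those terms across blocks, which is why the two are swept jointly in a sensitivity grid.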
Original abstract

Optical remote sensing imagery is frequently degraded by cloud and cloud-shadow contamination, which limits its reliability for near-real-time land use and land cover (LULC) mapping. Although synthetic aperture radar (SAR) can provide cloud-penetrating structural information, existing SAR-optical fusion methods often assume reliable optical observations and insufficiently address the semantic uncertainty introduced by cloud contamination. To address this issue, we propose CloudLULC-Net, an end-to-end heterogeneous SAR-optical fusion framework that directly predicts LULC maps from cloud-contaminated Sentinel-2 imagery and temporally adjacent Sentinel-1 SAR observations. The proposed network incorporates optical reliability modulation to suppress unreliable optical responses, heterogeneous information adaptive aggregation to model high-order spatial-channel interactions between optical and SAR representations, and a unified semantic mapping transformer to organize fused features in a LULC-oriented latent space. A semantic anchor-guided optimization strategy is further introduced to improve the consistency of intermediate semantic representations. To support this task, we construct CloudLULC-Set, a large-scale benchmark dataset containing 40,223 curated SAR-optical-label triplets with pixel-level LULC annotations across diverse geographic regions and cloud conditions. Experimental results show that CloudLULC-Net achieves an OA of 86.60%, an F1-score of 83.29%, and an mIoU of 73.51%, outperforming representative heterogeneous reconstruction-first and end-to-end SAR-optical mapping methods. Comparisons with existing global LULC products and analyses under different cloud-cover levels further demonstrate the robustness and practical value of CloudLULC-Net for target-date LULC mapping in cloud-prone regions. The project is publicly available at: https://github.com/RSIIPAC/CloudLULC
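The abstract does not give the form of the semantic anchor-guided optimization. One plausible reading, sketched here purely as an assumption, adds to pixel-wise cross-entropy a term that pulls each pixel's intermediate embedding toward an anchor vector for its reference class, in the spirit of center loss. The anchors, the weight `lam`, the squared-distance penalty, and the function names are all hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one pixel."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def anchor_guided_loss(logits, embeddings, labels, anchors, lam=0.1):
    """Hypothetical CE + anchor-consistency objective averaged over pixels.

    anchors[c] is an embedding vector for class c; each pixel embedding is
    penalized by its squared distance to the anchor of its reference class.
    """
    n = len(labels)
    ce = sum(cross_entropy(lg, y) for lg, y in zip(logits, labels)) / n
    pull = sum(
        sum((e - a) ** 2 for e, a in zip(emb, anchors[y]))
        for emb, y in zip(embeddings, labels)
    ) / n
    return ce + lam * pull


anchors = [[1.0, 0.0], [0.0, 1.0]]
print(anchor_guided_loss([[2.0, 0.5]], [[0.8, 0.1]], [0], anchors))
```

The anchor term only shapes intermediate representations; predictions still come from the logits, so `lam` trades classification fit against how tightly each class clusters in the latent space.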

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CloudLULC-Net, an end-to-end heterogeneous SAR-optical fusion network for near-real-time LULC mapping from cloud-contaminated Sentinel-2 optical imagery paired with temporally adjacent Sentinel-1 SAR observations. The network includes optical reliability modulation, heterogeneous information adaptive aggregation, a unified semantic mapping transformer, and semantic anchor-guided optimization. The authors also release CloudLULC-Set, a benchmark of 40,223 curated SAR-optical-label triplets spanning diverse regions and cloud conditions. Experiments report OA of 86.60%, F1 of 83.29%, and mIoU of 73.51%, outperforming reconstruction-first and end-to-end baselines, with additional comparisons to global LULC products and analyses under varying cloud cover.

Significance. If the empirical results prove robust, the work supplies a practical framework and large public benchmark for operational LULC mapping in persistently cloudy regions, directly addressing a common limitation of optical-only approaches. The dataset release and code availability constitute clear strengths for reproducibility and future method development in remote sensing.

major comments (2)
  1. [Method description and Experiments] The central performance claims rest on the untested operating assumption that temporally adjacent Sentinel-1 acquisitions are always available and supply semantically consistent structural information without appreciable mismatch error. The method description and experimental section provide no quantitative bound on acceptable time gaps, no ablation varying the temporal separation, and no per-class analysis for land-cover types that change on sub-weekly timescales. If mismatch errors are non-negligible, the heterogeneous aggregation and semantic-anchor losses cannot be guaranteed to deliver the reported gains over reconstruction-first baselines.
  2. [Dataset Construction] CloudLULC-Set is described as curated, yet the dataset-construction section supplies insufficient detail on selection criteria, cloud-condition sampling strategy, and safeguards against post-hoc curation choices that could favor the proposed network. Without these details, it is difficult to assess whether the held-out test performance (OA 86.60%, mIoU 73.51%) reflects genuine generalization or dataset-construction artifacts.
minor comments (2)
  1. [Abstract] The abstract states that the method outperforms 'representative' baselines but does not name the specific reconstruction-first and end-to-end methods or cite their implementations; this should be clarified for reproducibility.
  2. [Method] Notation for the optical reliability modulation and heterogeneous aggregation modules could be made more explicit (e.g., by defining the exact form of the modulation mask and the high-order interaction operator) to aid readers attempting re-implementation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below and will revise the manuscript accordingly to strengthen the presentation of temporal consistency and dataset construction details.

Point-by-point responses
  1. Referee: [Method description and Experiments] The central performance claims rest on the untested operating assumption that temporally adjacent Sentinel-1 acquisitions are always available and supply semantically consistent structural information without appreciable mismatch error. The method description and experimental section provide no quantitative bound on acceptable time gaps, no ablation varying the temporal separation, and no per-class analysis for land-cover types that change on sub-weekly timescales. If mismatch errors are non-negligible, the heterogeneous aggregation and semantic-anchor losses cannot be guaranteed to deliver the reported gains over reconstruction-first baselines.

    Authors: We agree that an explicit analysis of temporal mismatch is needed to bound the operating regime. In CloudLULC-Set, SAR-optical pairs were formed using the closest available Sentinel-1 acquisition within a 6-day window of the target Sentinel-2 date (reflecting the combined revisit characteristics of the two sensors). To address the referee's concern, the revised manuscript will add: (1) a quantitative summary of the actual time gaps present in the 40,223 triplets, (2) an ablation that retrains and evaluates CloudLULC-Net on subsets with increasing maximum temporal separation (0-2, 2-4, 4-6 days), and (3) per-class F1 and IoU breakdowns for dynamic categories (e.g., cropland, water bodies) under these gap conditions. These additions will either confirm that mismatch remains negligible within the chosen window or identify the practical limit beyond which performance degrades.
    Revision: yes

  2. Referee: [Dataset Construction] CloudLULC-Set is described as curated, yet the dataset-construction section supplies insufficient detail on selection criteria, cloud-condition sampling strategy, and safeguards against post-hoc curation choices that could favor the proposed network. Without these details, it is difficult to assess whether the held-out test performance (OA 86.60%, mIoU 73.51%) reflects genuine generalization or dataset-construction artifacts.

    Authors: We acknowledge that the current description of CloudLULC-Set construction is too brief. The revised manuscript will expand this section with: (i) explicit inclusion/exclusion rules (minimum 10% cloud cover, exclusion of scenes with >80% cloud or permanent snow/ice, geographic stratification across 12 Köppen climate zones), (ii) the cloud-cover sampling distribution (uniform sampling within 10-30%, 30-50%, and 50-70% bins), and (iii) safeguards including independent annotation by two experts with adjudication, temporal consistency checks against higher-resolution reference imagery, and a fully documented train/validation/test split that was frozen before any model development. These details will allow readers to evaluate potential curation bias.
    Revision: yes
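The inclusion rules and cloud-cover bins listed in the simulated response translate into a filter-then-stratify step. The sketch assumes scene records with hypothetical `id`, `cloud_pct`, and `snow_ice` fields, half-open bins, and a fixed random seed; geographic stratification by climate zone is omitted. Scenes between 70% and 80% cloud fall outside every listed bin, so this sketch drops them even though the stated exclusion threshold is >80%.

```python
import random

# Cloud-cover bins (percent) as listed in the simulated response; half-open.
BINS = [(10, 30), (30, 50), (50, 70)]


def cloud_bin(cloud_pct):
    """Index of the bin containing cloud_pct, or None if it is outside all bins."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= cloud_pct < hi:
            return i
    return None


def stratified_sample(scenes, per_bin, seed=0):
    """Filter scenes, then draw up to `per_bin` per cloud bin without replacement.

    scenes: list of dicts with hypothetical keys "id", "cloud_pct", "snow_ice".
    A fixed seed keeps the draw reproducible, so the split can be frozen.
    """
    rng = random.Random(seed)
    buckets = {i: [] for i in range(len(BINS))}
    for scene in scenes:
        if scene["snow_ice"]:
            continue  # exclusion rule: permanent snow/ice
        b = cloud_bin(scene["cloud_pct"])
        if b is not None:
            buckets[b].append(scene)
    return {i: rng.sample(group, min(per_bin, len(group))) for i, group in buckets.items()}


scenes = [
    {"id": 1, "cloud_pct": 5, "snow_ice": False},   # below 10%: dropped
    {"id": 2, "cloud_pct": 15, "snow_ice": False},
    {"id": 3, "cloud_pct": 45, "snow_ice": True},   # snow/ice: dropped
    {"id": 4, "cloud_pct": 60, "snow_ice": False},
    {"id": 5, "cloud_pct": 85, "snow_ice": False},  # above all bins: dropped
]
out = stratified_sample(scenes, per_bin=5)
print({i: [s["id"] for s in group] for i, group in out.items()})
```

Sampling the same number of scenes per bin (rather than in proportion to how often each cloud level occurs) is what makes the distribution "uniform" across bins.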

Circularity Check

0 steps flagged

No significant circularity; empirical results on held-out test data

Full rationale

The paper presents CloudLULC-Net as an end-to-end neural architecture with optical reliability modulation, heterogeneous aggregation, and semantic-anchor losses, plus the new CloudLULC-Set dataset of 40,223 triplets. Reported metrics (OA 86.60%, F1 83.29%, mIoU 73.51%) are obtained via standard supervised training and evaluation on held-out portions of that dataset against reconstruction-first and end-to-end baselines. No equations, parameter fits, or derivations appear that reduce by construction to the inputs; the central claims rest on empirical performance rather than any self-definitional, fitted-input-called-prediction, or self-citation-load-bearing steps. The assumption about temporally adjacent SAR availability is an operating-regime limitation, not a circularity in the reported results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claims rest on the learned behavior of the proposed network modules and the assumption that the new dataset adequately represents global geographic and cloud conditions; no additional ad-hoc constants beyond standard neural network training are described in the abstract.

free parameters (1)
  • network weights and hyperparameters
    Learned from the CloudLULC-Set training split to optimize LULC prediction accuracy.
axioms (1)
  • domain assumption: Temporally adjacent SAR observations provide reliable complementary structural information when optical data is cloud-contaminated.
    Core premise enabling the heterogeneous fusion approach.

pith-pipeline@v0.9.1-grok · 5869 in / 1328 out tokens · 43825 ms · 2026-06-27T01:27:07.075333+00:00 · methodology

