Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

Lujun Li; Radu State; Yiqun Wang

arxiv: 2506.19591 · v2 · submitted 2025-06-24 · 💻 cs.CV · cs.AI· cs.LG· eess.IV

Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

Lujun Li , Yiqun Wang , Radu State This is my paper

Pith reviewed 2026-05-19 07:54 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.LGeess.IV

keywords cloud fillingmultispectral imagerysynthetic aperture radarvision transformertime seriesimage reconstructionremote sensingcrop mapping

0 comments

The pith

A time-series Vision Transformer reconstructs multispectral satellite images blocked by clouds using radar data and temporal patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that a Vision Transformer framework can fill in missing spectral details in cloud-obscured multispectral images by processing sequences of images over time together with synthetic aperture radar inputs. This matters because persistent cloud cover disrupts early-season crop mapping, which relies on complete spectral information to track plant conditions. The approach relies on the transformer's attention to link consistent patterns across the time series with radar's cloud-penetrating properties. Experiments compare the full method against versions that drop either the time-series element or the radar data and report better reconstruction quality with the combined inputs.

Core claim

The paper claims that its Time-series MSI Image Reconstruction using Vision Transformer framework, which applies attention mechanisms to fuse temporal coherence from multispectral imagery sequences with complementary information from synthetic aperture radar, produces more accurate reconstructions in cloud-covered regions than baselines that use non-time-series multispectral and SAR data or time-series multispectral data alone.

What carries the argument

The Vision Transformer attention mechanism applied to paired time-series multispectral and SAR images, which identifies and restores missing spectral values by drawing on historical image consistency and radar complementarity.

Load-bearing premise

Temporal coherence across multispectral images over time can be combined with SAR data through the Vision Transformer's attention to yield accurate fills for cloud-obscured areas.

What would settle it

A side-by-side evaluation on cloud-free validation images showing that the time-series Vision Transformer reconstructions produce higher error rates than the non-time-series baselines under standard metrics such as PSNR or SSIM would disprove the superiority claim.

Figures

Figures reproduced from arXiv: 2506.19591 by Lujun Li, Radu State, Yiqun Wang.

**Figure 2.** Figure 2: The proposed Time-Series ViT reconstruction Structure [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: The reconstructed images from the time-series input model are shown, [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

read the original abstract

Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping, as it leads to missing or corrupted spectral information. Synthetic aperture radar (SAR) data, which is not affected by cloud interference, offers a complementary solution, but lack sufficient spectral detail for precise crop mapping. To address this, we propose a novel framework, Time-series MSI Image Reconstruction using Vision Transformer (ViT), to reconstruct MSI data in cloud-covered regions by leveraging the temporal coherence of MSI and the complementary information from SAR from the attention mechanism. Comprehensive experiments, using rigorous reconstruction evaluation metrics, demonstrate that Time-series ViT framework significantly outperforms baselines that use non-time-series MSI and SAR or time-series MSI without SAR, effectively enhancing MSI image reconstruction in cloud-covered regions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ViT time-series MSI reconstruction with SAR targets a real remote sensing gap but the abstract supplies no numbers and leaves the temporal split question open.

read the letter

Look, the main thing to know about this paper is that it puts a Vision Transformer on time-series multispectral imagery to fill in clouds, using SAR as extra input through the attention mechanism. The goal is better reconstruction for early season crop mapping. What it does is take the temporal sequence of MSI and the cloud-free SAR and let the ViT attention mix them to predict the missing parts. The abstract says this beats baselines that either ignore the time dimension or skip SAR. That framing of the combination seems new enough for the specific use case. The practical angle is solid. Cloud cover is a real headache for optical satellite data in agriculture, and mixing in SAR makes sense since it sees through clouds but needs the spectral detail from MSI. Where it gets thin is the evidence. The abstract talks about comprehensive experiments and significant outperformance but gives no actual metric values, no dataset names or sizes, and no mention of ablations. It's hard to judge if this is a real step forward or modest. On top of that, the stress-test note about data splits is on point. For any claim that relies on temporal coherence, the splits have to be chronological so the model can't peek at future clear images when reconstructing an earlier cloudy one. If they used random or windowed splits instead, the results could be misleading. The text doesn't clarify this. This is the kind of paper that would interest remote sensing folks and CV people working on satellite applications. It might be worth a read for someone building cloud removal pipelines, but probably not for theory-focused readers. I'd say send it for peer review. The idea is grounded and the problem matters, even if the current writeup needs more on the experiments and controls to stand up.

Referee Report

2 major / 2 minor

Summary. The paper proposes a Time-series Vision Transformer (ViT) framework for reconstructing multispectral imagery (MSI) in cloud-covered regions. It leverages temporal coherence in MSI sequences together with complementary SAR information through the transformer's attention mechanism, claiming this yields significantly better reconstruction than baselines that use either non-time-series MSI+SAR or time-series MSI alone.

Significance. If the performance gains are shown to arise from proper exploitation of temporal structure and multi-sensor fusion rather than experimental artifacts, the work could provide a practical advance for cloud-filling in remote-sensing pipelines, especially for early-season crop mapping where missing MSI data is a recurring obstacle. The approach applies an established architecture to a multi-modal time-series setting but does not introduce fundamentally new theoretical machinery.

major comments (2)

[§4] §4 (Experimental Setup): The manuscript does not describe the temporal partitioning strategy used for the time-series dataset. Because the central claim attributes superiority to the ViT attention mechanism's ability to exploit temporal coherence, the train/test split must be strictly chronological (forward-chaining or date-blocked) to preclude leakage of future clear-sky observations into reconstructions of earlier cloudy scenes; without this control the reported gains over non-time-series baselines cannot be unambiguously credited to the architecture.
[§4] §4 and Abstract: The text asserts that the Time-series ViT 'significantly outperforms' the baselines on 'rigorous reconstruction evaluation metrics' yet supplies no numerical values, error bars, dataset sizes, or ablation results in the sections examined. This absence leaves the primary empirical claim without verifiable quantitative support.

minor comments (2)

[Abstract] Abstract: The phrase 'rigorous reconstruction evaluation metrics' should name the concrete measures (RMSE, PSNR, SSIM, etc.) so readers can immediately assess the evaluation protocol.
[§3] §3: The description of how SAR and time-series MSI patches are tokenized and fed into the shared ViT encoder would benefit from an explicit diagram or pseudocode block clarifying the fusion point.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor that we will address in the revision to strengthen the presentation of our temporal and multi-modal fusion approach.

read point-by-point responses

Referee: [§4] §4 (Experimental Setup): The manuscript does not describe the temporal partitioning strategy used for the time-series dataset. Because the central claim attributes superiority to the ViT attention mechanism's ability to exploit temporal coherence, the train/test split must be strictly chronological (forward-chaining or date-blocked) to preclude leakage of future clear-sky observations into reconstructions of earlier cloudy scenes; without this control the reported gains over non-time-series baselines cannot be unambiguously credited to the architecture.

Authors: We agree that explicit description of the temporal split is necessary to support our claims. Our experiments employed a strictly chronological forward-chaining partition: training used all available MSI/SAR sequences from earlier dates in the time series, with testing performed on later dates to ensure no future clear-sky observations could influence reconstructions of prior cloudy scenes. We will revise §4 to document the exact date ranges, number of time steps per split, and confirmation that the split precludes temporal leakage, allowing unambiguous attribution of gains to the time-series attention mechanism. revision: yes
Referee: [§4] §4 and Abstract: The text asserts that the Time-series ViT 'significantly outperforms' the baselines on 'rigorous reconstruction evaluation metrics' yet supplies no numerical values, error bars, dataset sizes, or ablation results in the sections examined. This absence leaves the primary empirical claim without verifiable quantitative support.

Authors: We acknowledge that the current manuscript version presents the performance claims at a high level without sufficient inline numerical detail in the examined sections. The full results—including specific metric values (PSNR, SSIM, RMSE), standard deviations across runs, dataset sizes (e.g., number of patches and sequences), and ablation comparisons (time-series vs. non-time-series, with/without SAR)—appear in Section 5 and the associated tables. In the revision we will add key quantitative results and error bars directly into §4 and the abstract, along with clearer references to the ablation studies, to make the empirical support immediately verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework claims rest on standard ViT components and independent experiments

full rationale

The paper describes a Time-series ViT framework that reconstructs cloud-covered MSI regions by feeding temporal MSI sequences and complementary SAR data into standard Vision Transformer attention layers. No equations, derivations, or parameter-fitting steps are presented that reduce by construction to the inputs (e.g., no self-definitional ratios, fitted inputs renamed as predictions, or uniqueness theorems imported via self-citation). The central performance claims are justified solely by comparative experiments against non-time-series and SAR-free baselines using reconstruction metrics; these evaluations are external to the model's architectural definitions and do not rely on self-referential loops. The work is therefore self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described beyond standard components of Vision Transformers and attention mechanisms.

pith-pipeline@v0.9.0 · 5664 in / 1024 out tokens · 33702 ms · 2026-05-19T07:54:48.679152+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a novel framework, Time-series MSI Image Reconstruction using Vision Transformer (ViT), to reconstruct MSI data in cloud-covered regions by leveraging the temporal coherence of MSI and the complementary information from SAR from the attention mechanism.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The framework proposed in this paper consists of Convolutional Patch Projection (CPP), a Multi-Head Self-Attention (MHSA) Encoder, and a Patch Decoder... multi-scale loss function that combines the Mean Squared Error (MSE) loss and the Spectral Angle Mapper (SAM) loss.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 2 internal anchors

[1]

Cross domain early crop mapping using cropstgan,

Y . Wang, H. Huang, and R. State, “Cross domain early crop mapping using cropstgan,” IEEE Access, 2024

work page 2024
[2]

Cross domain early crop mapping with label spaces discrepancies using multicropgan,

——, “Cross domain early crop mapping with label spaces discrepancies using multicropgan,” ISPRS Annals of the Pho- togrammetry, Remote Sensing and Spatial Information Sciences, vol. 10, pp. 241–248, 2024

work page 2024
[3]

An introduction to synthetic aperture radar (sar),

Y . K. Chan and V . Koo, “An introduction to synthetic aperture radar (sar),” Progress In Electromagnetics Research B , vol. 2, pp. 27–60, 2008

work page 2008
[4]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” CoRR, vol. abs/2010.11929, 2020. [Online]. Available: https://arxiv.org/abs/2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2010
[5]

A com- mentary review on the use of normalized difference vegetation index (ndvi) in the era of popular remote sensing,

S. Huang, L. Tang, J. P. Hupy, Y . Wang, and G. Shao, “A com- mentary review on the use of normalized difference vegetation index (ndvi) in the era of popular remote sensing,” Journal of Forestry Research, vol. 32, no. 1, pp. 1–6, 2021

work page 2021
[6]

Early crop mapping us- ing dynamic ecoregion clustering: A usa-wide study,

Y . Wang, H. Huang, and R. State, “Early crop mapping us- ing dynamic ecoregion clustering: A usa-wide study,” Remote Sensing, vol. 15, no. 20, p. 4962, 2023

work page 2023
[7]

Mapping crop types in complex farming areas using sar imagery with dynamic time warping,

G. W. Gella, W. Bijker, and M. Belgiu, “Mapping crop types in complex farming areas using sar imagery with dynamic time warping,” ISPRS journal of photogrammetry and remote sensing, vol. 175, pp. 171–183, 2021

work page 2021
[8]

Spatio-temporal multi-level attention crop mapping method using time-series sar imagery,

Z. Han, C. Zhang, L. Gao, Z. Zeng, B. Zhang, and P. M. Atkinson, “Spatio-temporal multi-level attention crop mapping method using time-series sar imagery,” ISPRS Journal of Pho- togrammetry and Remote Sensing, vol. 206, pp. 293–310, 2023

work page 2023
[9]

Integration of optical and synthetic aperture radar imagery for improving crop mapping in northwestern benin, west africa,

G. Forkuor, C. Conrad, M. Thiel, T. Ullmann, and E. Zoungrana, “Integration of optical and synthetic aperture radar imagery for improving crop mapping in northwestern benin, west africa,” Remote sensing, vol. 6, no. 7, pp. 6472–6499, 2014

work page 2014
[10]

Digital mapping of land cover changes using the fusion of sar and msi satellite data,

G. Metrikaityte, J. Suziedelyte Visockiene, and K. Papsys, “Digital mapping of land cover changes using the fusion of sar and msi satellite data,” Land, vol. 11, no. 7, p. 1023, 2022

work page 2022
[11]

Synergic use of sar and optical data for feature extraction,

A. Mazza, M. Ciotola, G. Poggi, and G. Scarpa, “Synergic use of sar and optical data for feature extraction,” in IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023, pp. 2061–2064

work page 2023
[12]

Identification of soybean based on sentinel-1/2 sar and msi imagery under a complex planting structure,

M. Zhu, B. She, L. Huang, D. Zhang, H. Xu, and X. Yang, “Identification of soybean based on sentinel-1/2 sar and msi imagery under a complex planting structure,” Ecological Infor- matics, vol. 72, p. 101825, 2022

work page 2022
[13]

Enhanced crop classification through integrated optical and sar data: a deep learning approach for multi-source image fusion,

N. Liu, Q. Zhao, R. Williams, and B. Barrett, “Enhanced crop classification through integrated optical and sar data: a deep learning approach for multi-source image fusion,” International Journal of Remote Sensing , vol. 45, no. 19-20, pp. 7605–7633, 2024

work page 2024
[14]

A machine learning approach for accurate crop type mapping using combined sar and optical time series data,

R. Tufail, A. Ahmad, M. A. Javed, and S. R. Ahmad, “A machine learning approach for accurate crop type mapping using combined sar and optical time series data,” Advances in Space Research, vol. 69, no. 1, pp. 331–346, 2022

work page 2022
[15]

Cloud removal based on sar-optical remote sensing data fusion via a two-flow network,

R. Mao, H. Li, G. Ren, and Z. Yin, “Cloud removal based on sar-optical remote sensing data fusion via a two-flow network,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , vol. 15, pp. 7677–7686, 2022

work page 2022
[16]

Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,

A. Meraner, P. Ebel, X. X. Zhu, and M. Schmitt, “Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 333–346, 2020

work page 2020
[17]

Multi-scale restoration of missing data in optical time-series images with masked spatial-temporal attention network,

Z. Zhang, J. Yan, Y . Liang, J. Feng, H. He, and L. Cao, “Multi-scale restoration of missing data in optical time-series images with masked spatial-temporal attention network,” 2024. [Online]. Available: https://arxiv.org/abs/2406.13358

work page arXiv 2024
[18]

Vits for sits: Vision transformers for satellite image time series,

M. Tarasiou, E. Chavez, and S. Zafeiriou, “Vits for sits: Vision transformers for satellite image time series,” 2023. [Online]. Available: https://arxiv.org/abs/2301.04944

work page arXiv 2023
[19]

Is space-time attention all you need for video understanding?

G. Bertasius, H. Wang, and L. Torresani, “Is space- time attention all you need for video understanding?” CoRR, vol. abs/2102.05095, 2021. [Online]. Available: https: //arxiv.org/abs/2102.05095

work page arXiv 2021
[20]

Gmes sentinel-1 mission,

R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown et al. , “Gmes sentinel-1 mission,”Remote sensing of environment, vol. 120, pp. 9–24, 2012

work page 2012
[21]

Sen2cor for sentinel-2,

M. Main-Knorn, B. Pflug, J. Louis, V . Debaecker, U. M ¨uller- Wilm, and F. Gascon, “Sen2cor for sentinel-2,” in Image and signal processing for remote sensing XXIII , vol. 10427. SPIE, 2017, pp. 37–48

work page 2017
[22]

Monitoring us agriculture: the us department of agriculture, national agricul- tural statistics service, cropland data layer program,

C. Boryan, Z. Yang, R. Mueller, and M. Craig, “Monitoring us agriculture: the us department of agriculture, national agricul- tural statistics service, cropland data layer program,” Geocarto International, vol. 26, no. 5, pp. 341–358, 2011

work page 2011
[23]

Cloud mask intercomparison exercise (cmix): An evaluation of cloud masking algorithms for landsat 8 and sentinel-2,

S. Skakun, J. Wevers, C. Brockmann, G. Doxani, M. Alek- sandrov, M. Bati ˇc, D. Frantz, F. Gascon, L. G ´omez-Chova, O. Hagolle et al., “Cloud mask intercomparison exercise (cmix): An evaluation of cloud masking algorithms for landsat 8 and sentinel-2,” Remote Sensing of Environment , vol. 274, p. 112990, 2022

work page 2022
[24]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,”

work page
[25]

Attention Is All You Need

[Online]. Available: https://arxiv.org/abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv
[26]

Scikit-learn: Machine learning in Python,

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011

work page 2011
[27]

A compar- ison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,

P. E. Dennison, K. Q. Halligan, and D. A. Roberts, “A compar- ison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,” Remote Sensing of Environment, vol. 93, no. 3, pp. 359–367, 2004

work page 2004
[28]

The aster spectral library version 2.0,

A. M. Baldridge, S. J. Hook, C. Grove, and G. Rivera, “The aster spectral library version 2.0,” Remote sensing of environ- ment, vol. 113, no. 4, pp. 711–715, 2009

work page 2009
[29]

Image quality assessment: from error visibility to structural similarity,

Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing , vol. 13, no. 4, pp. 600–612, 2004

work page 2004
[30]

Mean squared error: Love it or leave it? a new look at signal fidelity measures,

Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? a new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, 2009

work page 2009

[1] [1]

Cross domain early crop mapping using cropstgan,

Y . Wang, H. Huang, and R. State, “Cross domain early crop mapping using cropstgan,” IEEE Access, 2024

work page 2024

[2] [2]

Cross domain early crop mapping with label spaces discrepancies using multicropgan,

——, “Cross domain early crop mapping with label spaces discrepancies using multicropgan,” ISPRS Annals of the Pho- togrammetry, Remote Sensing and Spatial Information Sciences, vol. 10, pp. 241–248, 2024

work page 2024

[3] [3]

An introduction to synthetic aperture radar (sar),

Y . K. Chan and V . Koo, “An introduction to synthetic aperture radar (sar),” Progress In Electromagnetics Research B , vol. 2, pp. 27–60, 2008

work page 2008

[4] [4]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” CoRR, vol. abs/2010.11929, 2020. [Online]. Available: https://arxiv.org/abs/2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2010

[5] [5]

A com- mentary review on the use of normalized difference vegetation index (ndvi) in the era of popular remote sensing,

S. Huang, L. Tang, J. P. Hupy, Y . Wang, and G. Shao, “A com- mentary review on the use of normalized difference vegetation index (ndvi) in the era of popular remote sensing,” Journal of Forestry Research, vol. 32, no. 1, pp. 1–6, 2021

work page 2021

[6] [6]

Early crop mapping us- ing dynamic ecoregion clustering: A usa-wide study,

Y . Wang, H. Huang, and R. State, “Early crop mapping us- ing dynamic ecoregion clustering: A usa-wide study,” Remote Sensing, vol. 15, no. 20, p. 4962, 2023

work page 2023

[7] [7]

Mapping crop types in complex farming areas using sar imagery with dynamic time warping,

G. W. Gella, W. Bijker, and M. Belgiu, “Mapping crop types in complex farming areas using sar imagery with dynamic time warping,” ISPRS journal of photogrammetry and remote sensing, vol. 175, pp. 171–183, 2021

work page 2021

[8] [8]

Spatio-temporal multi-level attention crop mapping method using time-series sar imagery,

Z. Han, C. Zhang, L. Gao, Z. Zeng, B. Zhang, and P. M. Atkinson, “Spatio-temporal multi-level attention crop mapping method using time-series sar imagery,” ISPRS Journal of Pho- togrammetry and Remote Sensing, vol. 206, pp. 293–310, 2023

work page 2023

[9] [9]

Integration of optical and synthetic aperture radar imagery for improving crop mapping in northwestern benin, west africa,

G. Forkuor, C. Conrad, M. Thiel, T. Ullmann, and E. Zoungrana, “Integration of optical and synthetic aperture radar imagery for improving crop mapping in northwestern benin, west africa,” Remote sensing, vol. 6, no. 7, pp. 6472–6499, 2014

work page 2014

[10] [10]

Digital mapping of land cover changes using the fusion of sar and msi satellite data,

G. Metrikaityte, J. Suziedelyte Visockiene, and K. Papsys, “Digital mapping of land cover changes using the fusion of sar and msi satellite data,” Land, vol. 11, no. 7, p. 1023, 2022

work page 2022

[11] [11]

Synergic use of sar and optical data for feature extraction,

A. Mazza, M. Ciotola, G. Poggi, and G. Scarpa, “Synergic use of sar and optical data for feature extraction,” in IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023, pp. 2061–2064

work page 2023

[12] [12]

Identification of soybean based on sentinel-1/2 sar and msi imagery under a complex planting structure,

M. Zhu, B. She, L. Huang, D. Zhang, H. Xu, and X. Yang, “Identification of soybean based on sentinel-1/2 sar and msi imagery under a complex planting structure,” Ecological Infor- matics, vol. 72, p. 101825, 2022

work page 2022

[13] [13]

Enhanced crop classification through integrated optical and sar data: a deep learning approach for multi-source image fusion,

N. Liu, Q. Zhao, R. Williams, and B. Barrett, “Enhanced crop classification through integrated optical and sar data: a deep learning approach for multi-source image fusion,” International Journal of Remote Sensing , vol. 45, no. 19-20, pp. 7605–7633, 2024

work page 2024

[14] [14]

A machine learning approach for accurate crop type mapping using combined sar and optical time series data,

R. Tufail, A. Ahmad, M. A. Javed, and S. R. Ahmad, “A machine learning approach for accurate crop type mapping using combined sar and optical time series data,” Advances in Space Research, vol. 69, no. 1, pp. 331–346, 2022

work page 2022

[15] [15]

Cloud removal based on sar-optical remote sensing data fusion via a two-flow network,

R. Mao, H. Li, G. Ren, and Z. Yin, “Cloud removal based on sar-optical remote sensing data fusion via a two-flow network,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , vol. 15, pp. 7677–7686, 2022

work page 2022

[16] [16]

Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,

A. Meraner, P. Ebel, X. X. Zhu, and M. Schmitt, “Cloud removal in sentinel-2 imagery using a deep residual neural network and sar-optical data fusion,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 333–346, 2020

work page 2020

[17] [17]

Multi-scale restoration of missing data in optical time-series images with masked spatial-temporal attention network,

Z. Zhang, J. Yan, Y . Liang, J. Feng, H. He, and L. Cao, “Multi-scale restoration of missing data in optical time-series images with masked spatial-temporal attention network,” 2024. [Online]. Available: https://arxiv.org/abs/2406.13358

work page arXiv 2024

[18] [18]

Vits for sits: Vision transformers for satellite image time series,

M. Tarasiou, E. Chavez, and S. Zafeiriou, “Vits for sits: Vision transformers for satellite image time series,” 2023. [Online]. Available: https://arxiv.org/abs/2301.04944

work page arXiv 2023

[19] [19]

Is space-time attention all you need for video understanding?

G. Bertasius, H. Wang, and L. Torresani, “Is space- time attention all you need for video understanding?” CoRR, vol. abs/2102.05095, 2021. [Online]. Available: https: //arxiv.org/abs/2102.05095

work page arXiv 2021

[20] [20]

Gmes sentinel-1 mission,

R. Torres, P. Snoeij, D. Geudtner, D. Bibby, M. Davidson, E. Attema, P. Potin, B. Rommen, N. Floury, M. Brown et al. , “Gmes sentinel-1 mission,”Remote sensing of environment, vol. 120, pp. 9–24, 2012

work page 2012

[21] [21]

Sen2cor for sentinel-2,

M. Main-Knorn, B. Pflug, J. Louis, V . Debaecker, U. M ¨uller- Wilm, and F. Gascon, “Sen2cor for sentinel-2,” in Image and signal processing for remote sensing XXIII , vol. 10427. SPIE, 2017, pp. 37–48

work page 2017

[22] [22]

Monitoring us agriculture: the us department of agriculture, national agricul- tural statistics service, cropland data layer program,

C. Boryan, Z. Yang, R. Mueller, and M. Craig, “Monitoring us agriculture: the us department of agriculture, national agricul- tural statistics service, cropland data layer program,” Geocarto International, vol. 26, no. 5, pp. 341–358, 2011

work page 2011

[23] [23]

Cloud mask intercomparison exercise (cmix): An evaluation of cloud masking algorithms for landsat 8 and sentinel-2,

S. Skakun, J. Wevers, C. Brockmann, G. Doxani, M. Alek- sandrov, M. Bati ˇc, D. Frantz, F. Gascon, L. G ´omez-Chova, O. Hagolle et al., “Cloud mask intercomparison exercise (cmix): An evaluation of cloud masking algorithms for landsat 8 and sentinel-2,” Remote Sensing of Environment , vol. 274, p. 112990, 2022

work page 2022

[24] [24]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,”

work page

[25] [25]

Attention Is All You Need

[Online]. Available: https://arxiv.org/abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

Scikit-learn: Machine learning in Python,

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011

work page 2011

[27] [27]

A compar- ison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,

P. E. Dennison, K. Q. Halligan, and D. A. Roberts, “A compar- ison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,” Remote Sensing of Environment, vol. 93, no. 3, pp. 359–367, 2004

work page 2004

[28] [28]

The aster spectral library version 2.0,

A. M. Baldridge, S. J. Hook, C. Grove, and G. Rivera, “The aster spectral library version 2.0,” Remote sensing of environ- ment, vol. 113, no. 4, pp. 711–715, 2009

work page 2009

[29] [29]

Image quality assessment: from error visibility to structural similarity,

Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing , vol. 13, no. 4, pp. 600–612, 2004

work page 2004

[30] [30]

Mean squared error: Love it or leave it? a new look at signal fidelity measures,

Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? a new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, 2009

work page 2009