A Comparative Study of Transformer and Convolutional Models for Crop Segmentation from Satellite Image Time Series

Anwar Ur Rehman; Christian Loschiavo; Ignazio Gallo; Mattia Gatti; Mirco Boschetti; Nicola Landro; Riccardo La Grassa

arXiv: 2412.01944 · v2 · pith:W23ENRLQ · submitted 2024-12-02 · 💻 cs.CV · eess.IV



Pith reviewed 2026-05-23 07:42 UTC · model grok-4.3

classification 💻 cs.CV eess.IV

keywords crop segmentation · satellite image time series · transformer · CNN · Sentinel-2 · temporal modeling · semantic segmentation


The pith

TSViT slightly surpasses 3D U-Net for crop segmentation from satellite time series, with VistaFormer offering best efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares convolutional and transformer-based models for segmenting crops in Sentinel-2 satellite image time series. It evaluates three 3D CNN variants and three transformer architectures that handle temporal dependencies differently. TSViT delivers the highest accuracy, narrowly ahead of 3D U-Net, while VistaFormer leads in computational efficiency. This suggests that how temporal information is modeled matters more for performance than whether the base network is convolutional or transformer-based. Readers interested in remote sensing applications would care because better crop maps improve agricultural monitoring and land-use analysis.

Core claim

Experiments on the Munich and Lombardia datasets show that TSViT achieves the best overall results, slightly surpassing 3D U-Net, which remains a strong CNN baseline. VistaFormer offers the best efficiency, while Swin UNETR performs competitively but is less effective than transformers that explicitly model temporal dynamics. These results highlight that temporal modelling is critical for SITS.

What carries the argument

Different strategies for capturing temporal dependencies in transformer architectures versus 3D convolutional networks for processing multispectral time series data.
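To make the contrast concrete, here is a toy NumPy sketch. It is not code from the paper or from the TSViT or 3D U-Net implementations: the weights are random, there is a single attention head, and there is no positional encoding. All function names are hypothetical. A stack of stride-1 3D convolutions treats time like another spatial axis, so each output sees only a local window of dates that widens with depth. Self-attention along the time axis, by contrast, lets every date in a pixel's series attend to every other date within a single layer. As I understand TSViT, it applies such a temporal encoder before a separate spatial one. The sketch covers only the temporal step, so it deliberately ignores spatial context.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(x, wq, wk, wv):
    """Single-head self-attention along the time axis, run independently per pixel.

    x: (T, H, W, D) array, T acquisition dates and D features per pixel.
    wq, wk, wv: (D, D) projections (random here; learned in a real model).
    Returns the attended features, shape (T, H, W, D), and the attention
    weights, shape (H*W, T, T).
    """
    t, h, w, d = x.shape
    seq = x.transpose(1, 2, 0, 3).reshape(h * w, t, d)  # one length-T series per pixel
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (H*W, T, T)
    out = attn @ v  # every date mixes information from all T dates of its pixel
    return out.reshape(h, w, t, d).transpose(2, 0, 1, 3), attn


def conv3d_temporal_reach(kernel_t, n_layers):
    """Dates that can influence one output after n stride-1, undilated 3D convs."""
    return 1 + n_layers * (kernel_t - 1)


rng = np.random.default_rng(0)
T, H, W, D = 12, 4, 4, 8
x = rng.normal(size=(T, H, W, D))
wq, wk, wv = (rng.normal(size=(D, D)) for _ in range(3))
out, attn = temporal_self_attention(x, wq, wk, wv)
print(out.shape)                    # (12, 4, 4, 8)
print(conv3d_temporal_reach(3, 2))  # 5 (of the 12 dates) after two layers with temporal kernel 3
```

The reach formula assumes no pooling, striding, or dilation along time. Real 3D CNNs such as 3D U-Net downsample, so their temporal reach grows faster than this.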

If this is right

Temporal modelling is critical for satellite image time series tasks.
Transformers that explicitly model temporal dynamics outperform those that treat time as an additional spatial dimension.
TSViT outperforms the tested CNN models on the given datasets.
VistaFormer provides a strong efficiency-performance trade-off.
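This page does not say which efficiency measure supports the VistaFormer claim. Parameter count and measured inference latency are two common measures, and the sketch below covers both. The helper names are hypothetical. The per-layer formulas assume standard layouts: a 3D convolution with bias, and a transformer encoder layer with biased Q/K/V/output projections, a two-layer MLP, and two LayerNorms. I believe the latter matches the defaults of PyTorch's `nn.TransformerEncoderLayer`, but treat that correspondence as an assumption. Per-layer counts say little about model-level efficiency on their own, because attention compute also grows quadratically with the number of tokens (for example, dates in a temporal encoder).

```python
import statistics
import time


def conv3d_params(c_in, c_out, k_t, k_h, k_w, bias=True):
    """Weights (plus optional biases) of one 3D convolution layer."""
    return c_in * c_out * k_t * k_h * k_w + (c_out if bias else 0)


def transformer_layer_params(d_model, d_ff):
    """Parameters of one encoder layer (pre- or post-norm; the count is the same)."""
    attention = 4 * d_model * d_model + 4 * d_model  # Q, K, V, output projections + biases
    mlp = 2 * d_model * d_ff + d_ff + d_model         # two linear layers + biases
    norms = 2 * 2 * d_model                           # two LayerNorms, scale + shift each
    return attention + mlp + norms


def median_latency(fn, inputs, warmup=2, repeats=5):
    """Median wall-clock seconds per call of fn over the inputs, after a short warmup."""
    for x in inputs[:warmup]:
        fn(x)
    timings = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)


print(conv3d_params(64, 64, 3, 3, 3))     # 110656
print(transformer_layer_params(64, 256))  # 49984
print(median_latency(lambda n: sum(range(n)), [10_000, 20_000]))
```

For GPU models, the timed function would also need to synchronize the device before the clock stops. Otherwise the measurement only captures kernel launch time.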

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The observed superiority of TSViT may not hold for other geographic regions or sensor types not tested here.
VistaFormer's efficiency could enable deployment on resource-constrained systems for large-area monitoring.
Extending the comparison to include more recent transformer variants or hybrid models could refine the efficiency-accuracy frontier.

Load-bearing premise

The Munich and Lombardia datasets together with the chosen training and evaluation protocols are representative enough that the observed ranking of models will generalize to other regions, sensors, or crop types.

What would settle it

Evaluating the models on a new Sentinel-2 dataset from a different agricultural region where 3D U-Net or another CNN achieves higher accuracy than TSViT would falsify the claim that TSViT is generally superior.
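The falsification criterion above is mechanical enough to write down. The sketch below assumes each model has one score per dataset. The dictionary layout, the function name, and the region names are all hypothetical, and the scores are invented placeholders rather than results. A strictly higher score on one dataset meets the letter of the criterion, but whether such a margin is meaningful is a separate statistical question.

```python
CNN_MODELS = ("3D U-Net", "3D FPN", "3D DeepLabv3")


def falsifying_datasets(results, reference="TSViT", challengers=CNN_MODELS):
    """Return {dataset: [challengers scoring strictly above the reference]}.

    results: {dataset_name: {model_name: score}}, where higher is better.
    Datasets that lack a score for the reference model are skipped.
    """
    hits = {}
    for dataset, scores in results.items():
        if reference not in scores:
            continue
        better = sorted(
            m for m in challengers
            if scores.get(m, float("-inf")) > scores[reference]
        )
        if better:
            hits[dataset] = better
    return hits


# Hypothetical regions and invented scores, for illustration only.
results = {
    "region_a": {"TSViT": 0.80, "3D U-Net": 0.79},
    "region_b": {"TSViT": 0.70, "3D U-Net": 0.72, "3D FPN": 0.71},
}
print(falsifying_datasets(results))  # {'region_b': ['3D FPN', '3D U-Net']}
```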

Figures

Figures reproduced from arXiv: 2412.01944 by Anwar Ur Rehman, Christian Loschiavo, Ignazio Gallo, Mattia Gatti, Mirco Boschetti, Nicola Landro, Riccardo La Grassa.

**Figure 2.** Three random samples of input-output pairs from the Munich dataset. On the top, the input is shown as an …

**Figure 3.** Three random samples of input-output pairs from the Lombardia dataset. On the top, the input is shown as an …

**Figure 4.** An example of a good prediction made by the Swin UNETR model on the Munich dataset.

**Figure 5.** An example of a bad prediction made by the Swin UNETR model on the Munich dataset.

**Figure 6.** An example of a bad prediction made by the Swin UNETR model on the Lombardia dataset.

**Figure 7.** An example of a good prediction made by the Swin UNETR model on the Lombardia dataset.

**Figure 8.** Model’s predictions for Lombardia Test A.

Original abstract

Crop segmentation from satellite image time series (SITS) is a fundamental task for agricultural monitoring and land-use analysis. While convolutional neural networks (CNNs) have been widely used, transformer-based architectures offer alternative mechanisms for representing spatial and temporal dependencies in multispectral data. This paper presents a comparative study of CNN and transformer-based segmentation models for crop mapping from Sentinel-2 time series, including 3D U-Net, 3D FPN, 3D DeepLabv3, and three transformer architectures: Swin UNETR, TSViT, and VistaFormer, which adopt different strategies for capturing temporal dependencies. Experiments on the Munich and Lombardia datasets show that TSViT achieves the best overall results, slightly surpassing 3D U-Net, which remains a strong CNN baseline. VistaFormer offers the best efficiency, while Swin UNETR performs competitively but is less effective than transformers that explicitly model temporal dynamics. These results highlight that temporal modelling is critical for SITS: TSViT outperforms CNNs and approaches that treat time as an additional spatial dimension, while VistaFormer provides a strong efficiency-performance trade-off.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Desk Editor's Note (private letter to a colleague)

Straightforward head-to-head on two European Sentinel-2 datasets shows TSViT slightly ahead of 3D U-Net and VistaFormer most efficient, but the ranking rests on narrow data.


The paper compares TSViT, VistaFormer, Swin UNETR, and three 3D CNN baselines on crop segmentation from Sentinel-2 time series. TSViT comes out on top overall and VistaFormer wins on efficiency. The results point to explicit temporal modeling, rather than treating time as an extra spatial dimension, as the key factor. That is the core message a reader takes away after one pass through the abstract and results summary.

Nothing in the work introduces new architectures or theory. It applies already-published models to two fixed datasets and reports the ordering plus runtime numbers. The experiments are presented cleanly enough that someone facing the same task on similar data can use the numbers as a starting point without re-implementing everything from scratch. The citation list stays within the expected SITS and vision-transformer literature without obvious gaps or padding.

The soft spot is exactly the one flagged by the stress test, the load-bearing premise above. Munich and Lombardia are both mid-latitude European agricultural scenes captured by the same sensor with overlapping crop calendars. No cross-region, cross-sensor, or cross-climate checks appear, so TSViT's advantage could be tied to those data distributions rather than being a general property of the architecture. The abstract gives no detail on statistical significance, hyperparameter search scope, or cross-validation scheme. That leaves the ranking plausible but not yet ironclad.

This paper is for practitioners who need a quick reference when selecting a model for operational European-style crop mapping. A methods-focused reader, or someone working in tropical or arid zones, will find little to take away. It is coherent on its own terms and engages honestly with the empirical question, so it clears the bar for peer review even though the scope is limited. I would send it out but would expect referees to ask for at least one additional dataset from a different region.
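The letter notes that no statistical-significance detail is given. One common way to test whether a small margin, such as TSViT over 3D U-Net, holds up is a paired bootstrap over test tiles. The paper does not report this procedure. The sketch below uses only the standard library and assumes per-tile scores (for example, per-tile mIoU) are available for both models on the same tiles. The function name and inputs are hypothetical.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Mean paired difference (a - b) and a percentile bootstrap CI for it.

    scores_a[i] and scores_b[i] must be the two models' scores on the same
    test tile i; tiles are resampled with replacement, keeping pairs intact.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists paired by test tile")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)
    boot_means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n
        for _resample in range(n_resamples)
    )
    lower = boot_means[int(alpha / 2 * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, (lower, upper)


# Invented per-tile scores, for illustration only (not results from the paper).
model_a = [0.81, 0.77, 0.85, 0.79, 0.83]
model_b = [0.80, 0.78, 0.83, 0.79, 0.81]
mean_diff, (lower, upper) = paired_bootstrap_ci(model_a, model_b)
print(f"mean difference {mean_diff:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

If the interval contains zero, the margin cannot be distinguished from tile-sampling noise. Resampling whole tiles rather than individual pixels respects the strong spatial correlation between neighboring pixels.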

Referee Report

1 major / 0 minor

Summary. The manuscript presents an empirical comparison of CNN-based (3D U-Net, 3D FPN, 3D DeepLabv3) and transformer-based (Swin UNETR, TSViT, VistaFormer) segmentation models for crop mapping from Sentinel-2 satellite image time series. On the Munich and Lombardia datasets, TSViT achieves the highest overall performance (slightly above the 3D U-Net baseline), VistaFormer offers the best efficiency, and explicit temporal modeling is identified as critical, with approaches treating time as an extra spatial dimension performing less well.

Significance. If the reported ranking is robust, the study supplies actionable guidance for architecture selection in SITS-based agricultural monitoring, particularly the benefit of dedicated temporal mechanisms over 3D convolutions and the efficiency of VistaFormer. The work is a straightforward empirical ranking with no parameter-free derivations or machine-checked proofs.

major comments (1)

[Abstract] The central claim that TSViT is best overall and that explicit temporal modeling is critical rests exclusively on results from the Munich and Lombardia datasets. Both are mid-latitude European Sentinel-2 scenes with overlapping crop calendars. No cross-region, cross-sensor, or cross-crop-type experiments are described that would separate architecture effects from data-distribution effects.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the scope of our empirical comparison. We address the major comment below.


Referee: [Abstract] The central claim that TSViT is best overall and that explicit temporal modeling is critical rests exclusively on results from the Munich and Lombardia datasets. Both are mid-latitude European Sentinel-2 scenes with overlapping crop calendars. No cross-region, cross-sensor, or cross-crop-type experiments are described that would separate architecture effects from data-distribution effects.

Authors: We agree that the reported ranking and the emphasis on explicit temporal modeling are derived solely from the Munich and Lombardia datasets, which share similar mid-latitude European characteristics and crop calendars. The abstract already names these datasets, but we acknowledge that the central claims would benefit from clearer qualification regarding generalizability. In the revised version we will (1) update the abstract to state that the performance ordering holds on these two Sentinel-2 scenes and (2) add a short limitations paragraph noting that architecture effects have not been isolated from data-distribution effects and that cross-region or cross-sensor validation remains future work. We believe the comparative results still offer actionable guidance for similar agricultural monitoring settings. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical model ranking on fixed datasets


The paper reports an empirical comparison of off-the-shelf CNN and transformer segmentation architectures (3D U-Net, TSViT, VistaFormer, etc.) on the Munich and Lombardia Sentinel-2 datasets. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the abstract or described content. All claims reduce to direct performance metrics on the chosen data splits; the ranking is therefore not forced by construction or by prior self-referential results. This is the normal, non-circular outcome for a benchmark study.
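Since every claim reduces to performance metrics on fixed splits, it helps to be concrete about what those metrics usually are. This page does not name them. I am assuming overall accuracy and mean IoU computed from a pixel confusion matrix, which are the usual choices for crop segmentation. The sketch below is pure Python. The helper names and the `ignore_index` convention for unlabeled pixels are also assumptions.

```python
def confusion_matrix(y_true, y_pred, n_classes, ignore_index=None):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_index:
            continue  # e.g. unlabeled background pixels (an assumption)
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes absent from
    both the ground truth and the predictions."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fp = sum(row[c] for row in cm) - tp
        fn = sum(cm[c]) - tp
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Toy flattened label maps with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(round(overall_accuracy(cm), 3))  # 0.667
print(round(mean_iou(cm), 3))          # 0.5
```

Overall accuracy is dominated by frequent classes. Mean IoU weights every crop class equally, which is why a model can rank differently under the two metrics.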

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a purely empirical comparative study. No free parameters are fitted as part of a derivation, no mathematical axioms are invoked beyond standard deep-learning assumptions, and no new physical or mathematical entities are postulated.

pith-pipeline@v0.9.0 · 5757 in / 1080 out tokens · 28088 ms · 2026-05-23T07:42:44.895449+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references

[1] Xiaohui Yuan, Jianfang Shi, and Lichuan Gu. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Systems with Applications, 169:114417, 2021.

[2] Aparna, Yukti Bhatia, Rachna Rai, Varun Gupta, Naveen Aggarwal, and Aparna Akula. Convolutional neural networks based potholes detection using thermal imaging. Journal of King Saud University - Computer and Information Sciences, 34(3):578–588, 2022.

[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.

[4] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.

[5] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision – ECCV 2020, pages 213–229, Cham, 2020. Springer International Publishing.

[6] Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, and Li Zhang. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6881–6890, June 2021.

[7] Yuan Yuan, Lei Lin, Qingshan Liu, Renlong Hang, and Zeng-Guang Zhou. SITS-Former: A pre-trained spatio-spectral-temporal representation model for Sentinel-2 time series classification. International Journal of Applied Earth Observation and Geoinformation, 106:102651, 2022.

[8] Gi-Luen Huang and Pei-Yuan Wu. CTGAN: Cloud transformer generative adversarial network. In 2022 IEEE International Conference on Image Processing (ICIP), pages 511–515, 2022.

[9] Libo Wang, Rui Li, Ce Zhang, Shenghui Fang, Chenxi Duan, Xiaoliang Meng, and Peter Atkinson. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 190:196–214, June 2022.

[10] Ji He, Lina Zhao, Hongwei Yang, Mengmeng Zhang, and Wei Li. HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers. IEEE Transactions on Geoscience and Remote Sensing, 58(1):165–178, 2020.

[11] Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza, and Jocelyn Chanussot. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Transactions on Geoscience and Remote Sensing, 60:1–15, 2022.

[12] Yakoub Bazi, Laila Bashmal, Mohamad M. Al Rahhal, Reham Al Dayil, and Naif Al Ajlan. Vision transformers for remote sensing image classification. Remote Sensing, 13(3), 2021.

[13] Fariba Mohammadimanesh, Bahram Salehi, Masoud Mahdianpari, Brian Brisco, and Mahdi Motagh. Multi-temporal, multi-frequency, and multi-polarization coherence and SAR backscatter analysis of wetlands. ISPRS Journal of Photogrammetry and Remote Sensing, 142:78–93, 2018.

[14] Lisa Knopp, Marc Wieland, Michaela Rättich, and Sandro Martinis. A deep learning approach for burned area segmentation with Sentinel-2 data. Remote Sensing, 12(15), 2020.

[15] Cheng-Chien Liu, Yu-Cheng Zhang, Pei-Yin Chen, Chien-Chih Lai, Yi-Hsin Chen, Ji-Hong Cheng, and Ming-Hsun Ko. Clouds classification from Sentinel-2 imagery with deep residual learning and semantic image segmentation. Remote Sensing, 11(2), 2019.

[16] Thomas James, Calogero Schillaci, and Aldo Lipani. Convolutional neural networks for water segmentation using Sentinel-2 red, green, blue (RGB) composites and derived spectral indices. International Journal of Remote Sensing, 42(14):5338–5365, 2021.

[17] Ignazio Gallo, Riccardo La Grassa, Nicola Landro, and Mirco Boschetti. Sentinel 2 time series analysis with 3D feature pyramid network and time domain class activation intervals for crop mapping. ISPRS International Journal of Geo-Information, 10(7), 2021.

[18] Ali Jamali and Masoud Mahdianpari. Swin transformer and deep convolutional neural networks for coastal wetland classification using Sentinel-1, Sentinel-2, and LiDAR data. Remote Sensing, 14(2), 2022.

[19] Ali Jamali, Masoud Mahdianpari, Fariba Mohammadimanesh, and Saeid Homayouni. A deep learning framework based on generative adversarial networks and vision transformer for complex wetland classification using limited training samples. International Journal of Applied Earth Observation and Geoinformation, 115:103095, 2022.

[20] Agnès Bégué, Damien Arvor, Beatriz Bellon, Julie Betbeder, Diego De Abelleyra, Rodrigo P. D. Ferraz, Valentine Lebourgeois, Camille Lelong, Margareth Simões, and Santiago R. Verón. Remote sensing and cropping practices: A review. Remote Sensing, 10(1), 2018.

[21] Shyamal Virnodkar, V. K. Pachghare, and Sagar Murade. A technique to classify sugarcane crop from Sentinel-2 satellite imagery using U-Net architecture. In Chhabi Rani Panigrahi, Bibudhendu Pati, Prasant Mohapatra, Rajkumar Buyya, and Kuan-Ching Li, editors, Progress in Advanced Computing and Intelligent Engineering, pages 322–330, Singapore, 2021. Spri...

[22] Artur Nowakowski, John Mrziglod, Dario Spiller, Rogerio Bonifacio, Irene Ferrari, Pierre Philippe Mathieu, Manuel Garcia-Herranz, and Do-Hyung Kim. Crop type mapping by using transfer learning. International Journal of Applied Earth Observation and Geoinformation, 98:102313, 2021.

[23] Marc Rußwurm and Marco Körner. Multi-temporal land cover classification with sequential recurrent encoders. ISPRS International Journal of Geo-Information, 7(4):129, 2018.

[24] Hong Wang, Xianzhong Chen, Tianxiang Zhang, Zhiyong Xu, and Jiangyun Li. CCTNet: Coupled CNN and transformer network for crop segmentation of remote sensing images. Remote Sensing, 14(9):1956, 2022.

[25] Bowen Niu, Quanlong Feng, Boan Chen, Cong Ou, Yiming Liu, and Jianyu Yang. HSI-TransUNet: A transformer based semantic segmentation model for crop mapping from UAV hyperspectral imagery. Computers and Electronics in Agriculture, 201:107297, 2022.

[26] Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger Roth, and Daguang Xu. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images, 2022.

[27] Ignazio Gallo, Luigi Ranghetti, Nicola Landro, Riccardo La Grassa, and Mirco Boschetti. In-season and dynamic crop mapping using 3D convolution neural networks and Sentinel-2 time series. ISPRS Journal of Photogrammetry and Remote Sensing, 195:335–352, 2023.

[28] Mattia Gatti. Convolutional and transformer network for crop segmentation of Sentinel-2 images. https://github.com/mattiagatti/Sentinel-2-Crop-Mapping-Models, 2024.

[29] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation, 2017.
