On the Generalizability of Foundation Models for Crop Type Mapping
Pith reviewed 2026-05-23 20:56 UTC · model grok-4.3
The pith
Foundation models pre-trained on Sentinel-2 imagery transfer better to crop classification across continents than ImageNet weights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet when evaluated on five crop classification datasets across five continents. While only 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.
What carries the argument
Transfer performance comparison of Sentinel-2-specific self-supervised pre-training against ImageNet pre-training when fine-tuned on multispectral crop datasets from multiple continents.
If this is right
- Domain-specific pre-training on satellite imagery produces weights that transfer more reliably to new geographic areas than general image pre-training.
- Overall accuracy on crop mapping reaches high levels with as few as 100 labeled images per new region.
- Mitigating class imbalance, as measured by average accuracy, takes about 900 labeled images, nine times as many as overall accuracy alone requires.
- Using Sentinel-2 tailored models can reduce the impact of geospatial bias when applying foundation models to agriculture in data-scarce regions.
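The gap between the 100-image and 900-image thresholds is easier to see with a toy calculation. The sketch below assumes that "average accuracy" means the unweighted mean of per-class accuracy, which is the usual reading but is not defined in this summary. The crop names, class counts, and the always-majority classifier are hypothetical and are not taken from the paper.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly (dominated by large classes)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced test set: 90 maize, 8 wheat, 2 rice samples.
y_true = ["maize"] * 90 + ["wheat"] * 8 + ["rice"] * 2
# Hypothetical classifier that always predicts the majority class.
y_pred = ["maize"] * 100

oa = overall_accuracy(y_true, y_pred)
aa = average_accuracy(y_true, y_pred)
print(f"overall accuracy: {oa:.3f}")
print(f"average accuracy: {aa:.3f}")
```

Under these assumptions a model can score well on overall accuracy while missing every minority crop, which is one plausible reason the two metrics call for different label budgets.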
Where Pith is reading between the lines
- Global crop monitoring systems could rely on a single Sentinel-2 pre-trained backbone with limited local fine-tuning.
- Similar pre-training strategies might improve performance on other multispectral tasks such as yield estimation or land-cover change detection.
- Practitioners could prioritize collection of balanced labeled samples rather than simply maximizing total volume when adapting these models.
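One way to picture the balanced-sampling idea in the last bullet is a round-robin draw across classes under a fixed label budget. This is a minimal sketch, not a procedure from the paper: the function name `balanced_sample`, the crop names, and the pool sizes are all hypothetical, and it assumes class labels are already known for the candidate pool.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labels, budget, seed=0):
    """Pick up to `budget` indices, drawing round-robin across classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Shuffle within each class so the draw is random but reproducible.
    for pool in by_class.values():
        rng.shuffle(pool)
    classes = sorted(by_class)
    chosen = []
    # Take one sample per class per round until the budget or the pools run out.
    while len(chosen) < budget and any(by_class[c] for c in classes):
        for c in classes:
            if by_class[c] and len(chosen) < budget:
                chosen.append(by_class[c].pop())
    return chosen


# Hypothetical pool: 900 maize, 80 wheat, 20 rice candidates.
labels = ["maize"] * 900 + ["wheat"] * 80 + ["rice"] * 20
picked = balanced_sample(labels, budget=90)
counts = Counter(labels[i] for i in picked)
print(counts)
```

With this pool, a uniform random draw of 90 samples would be expected to contain only about two rice examples, whereas the round-robin draw uses all 20 available before filling the rest of the budget from the larger classes.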
Load-bearing premise
The five crop classification datasets drawn from five continents are representative of global agricultural conditions and free of selection or quality biases.
What would settle it
A new crop classification dataset from an additional continent or region where the ImageNet model matches or exceeds the accuracy of the Sentinel-2 pre-trained models would falsify the claimed advantage.
Original abstract
Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. The Earth observation (EO) field has produced several foundation models pre-trained directly on multispectral satellite imagery for applications like precision agriculture, wildfire and drought monitoring, and natural disaster response. However, few studies have investigated the ability of these models to generalize to new geographic locations, and potential concerns of geospatial bias -- models trained on data-rich developed nations not transferring well to data-scarce developing nations -- remain. We evaluate three popular EO foundation models, SSL4EO-S12, SatlasPretrain, and ImageNet, on five crop classification datasets across five continents. Results show that pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet. While only 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates three foundation models (SSL4EO-S12, SatlasPretrain, ImageNet) on five crop classification datasets spanning five continents. It claims that Sentinel-2-specific pre-trained weights outperform general ImageNet weights, that 100 labeled images suffice for high overall accuracy, and that 900 images are required to mitigate class imbalance and improve average accuracy.
Significance. If the results hold, the work supplies empirical evidence on the value of domain-specific pretraining for Earth-observation tasks and concrete guidance on label budgets for crop mapping, directly addressing concerns about geospatial bias in data-scarce regions.
major comments (2)
- [Abstract] Abstract: the abstract reports comparative accuracies and label thresholds but supplies no error bars, statistical tests, dataset sizes, fine-tuning protocols, or exclusion criteria; without the full methods section it is impossible to verify whether central comparisons are free of post-hoc choices.
- [Dataset description] Dataset description (implicit in the five-continent claim): the central claim of geographic generalizability (and thus that SSL4EO-S12 outperforms ImageNet) depends on the five datasets being representative and free of selection or quality bias; no explicit coverage of Köppen climates, crop calendars, field-size distributions, or cloud-cover regimes typical of data-scarce regions is supplied, so the reported 100-vs-900 image thresholds and model ranking could be conditional on untested conditions.
minor comments (1)
- The manuscript would benefit from reporting the exact number of images per dataset and per class to allow readers to assess the class-imbalance mitigation claim.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback on our manuscript. We address each major comment below. Where revisions are warranted, we have updated the manuscript accordingly.
- Referee: [Abstract] Abstract: the abstract reports comparative accuracies and label thresholds but supplies no error bars, statistical tests, dataset sizes, fine-tuning protocols, or exclusion criteria; without the full methods section it is impossible to verify whether central comparisons are free of post-hoc choices.
  Authors: We agree that the abstract would benefit from additional context on the experimental setup. In the revised manuscript we have expanded the abstract to include the number of datasets and images used, a brief description of the fine-tuning protocol (linear probing with 100/900 labeled examples), and a note that all results include standard error across multiple runs. Full statistical details, exclusion criteria, and protocols remain in the Methods section due to length constraints. We have also added error bars to all result figures.
  revision: partial
- Referee: [Dataset description] Dataset description (implicit in the five-continent claim): the central claim of geographic generalizability (and thus that SSL4EO-S12 outperforms ImageNet) depends on the five datasets being representative and free of selection or quality bias; no explicit coverage of Köppen climates, crop calendars, field-size distributions, or cloud-cover regimes typical of data-scarce regions is supplied, so the reported 100-vs-900 image thresholds and model ranking could be conditional on untested conditions.
  Authors: The five datasets were selected because they are the primary public benchmarks for multi-continent crop mapping; together they cover North America, Europe, Africa, Asia, and South America. We acknowledge that the original manuscript did not explicitly tabulate Köppen climates, crop calendars, or cloud regimes. In the revised version we have added a new subsection and supplementary table that summarizes these characteristics for each dataset, drawing on the original dataset papers and auxiliary climate data. This addition allows readers to assess the diversity of conditions tested while preserving the original experimental design.
  revision: yes
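The first response above mentions reporting standard error across multiple runs. As a rough illustration of what that involves, the sketch below computes a mean and standard error of the mean from per-run accuracies. The five scores are hypothetical placeholders, not results from the paper, and the helper name `mean_and_standard_error` is an assumption made for this example.

```python
import math
import statistics


def mean_and_standard_error(scores):
    """Return (mean, standard error of the mean) for a list of per-run scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(scores)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, sem


# Hypothetical overall accuracies from five fine-tuning runs with different seeds.
runs = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, sem = mean_and_standard_error(runs)
print(f"accuracy: {mean:.3f} +/- {sem:.3f}")
```

Error bars of this kind only separate two models when the gap between their means is large relative to the standard errors, which is why the referee asks for them alongside the headline comparison.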
Circularity Check
No circularity; purely empirical evaluation on external datasets
The paper reports direct experimental results from fine-tuning and evaluating three pre-trained models (SSL4EO-S12, SatlasPretrain, ImageNet) on five public crop classification datasets spanning five continents. No equations, parameter fits, predictions derived from the paper's own data, or self-referential derivations appear in the reported claims. All performance numbers (overall accuracy with 100 images, average accuracy requiring 900 images, model ranking) are measured outcomes on held-out test sets, not constructed from the inputs by definition. The geographic-generalizability claim rests on the external datasets themselves rather than any internal reduction or self-citation chain.