
arxiv: 2409.09451 · v5 · submitted 2024-09-14 · 💻 cs.CV · cs.LG

On the Generalizability of Foundation Models for Crop Type Mapping

Pith reviewed 2026-05-23 20:56 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords foundation models · crop type mapping · Sentinel-2 · transfer learning · generalizability · geospatial bias · self-supervised learning · Earth observation

The pith

Foundation models pre-trained on Sentinel-2 imagery transfer better to crop classification across continents than ImageNet weights.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether Earth observation foundation models can generalize across geographies for the task of mapping crop types. It compares three pre-trained models on five separate crop classification datasets drawn from five continents. The evaluation shows that models whose pre-training used Sentinel-2 multispectral data achieve higher accuracy than a general-purpose ImageNet model. The work also measures how many labeled examples are needed to reach strong performance and to correct for class imbalance.

Core claim

Pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet when evaluated on five crop classification datasets across five continents. While only 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.
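The two label budgets correspond to two different metrics, and a toy calculation shows how they can diverge. The following is a minimal sketch, assuming that "average accuracy" means the unweighted mean of per-class accuracy (a common convention; this summary does not give the paper's exact definition). The class names and counts are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy, so every class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels: 90 "maize" samples and 10 "rice" samples.
y_true = ["maize"] * 90 + ["rice"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["maize"] * 100

oa = overall_accuracy(y_true, y_pred)
aa = average_accuracy(y_true, y_pred)
print(f"overall accuracy: {oa:.2f}")  # 0.90
print(f"average accuracy: {aa:.2f}")  # 0.50
```

Under this assumed definition, a model can score well on overall accuracy by favoring dominant crops while still failing on rare ones, which is consistent with more labels being needed before average accuracy improves.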

What carries the argument

Transfer performance comparison of Sentinel-2-specific self-supervised pre-training against ImageNet pre-training when fine-tuned on multispectral crop datasets from multiple continents.

If this is right

  • Domain-specific pre-training on satellite imagery produces weights that transfer more reliably to new geographic areas than general image pre-training.
  • Overall accuracy on crop mapping reaches high levels with as few as 100 labeled examples per new region.
  • Correcting for class imbalance in crop type predictions requires roughly nine times as many labeled examples (900 versus 100) as reaching high overall accuracy alone.
  • Using Sentinel-2 tailored models can reduce the impact of geospatial bias when applying foundation models to agriculture in data-scarce regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Global crop monitoring systems could rely on a single Sentinel-2 pre-trained backbone with limited local fine-tuning.
  • Similar pre-training strategies might improve performance on other multispectral tasks such as yield estimation or land-cover change detection.
  • Practitioners could prioritize collection of balanced labeled samples rather than simply maximizing total volume when adapting these models.
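The last point, on balanced labeling, can be sketched in code. The function below is a hypothetical helper, not something described in the paper: it spends a fixed label budget by visiting classes round-robin, so rare classes are fully sampled before common classes absorb the rest. The crop names and pool sizes are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labels, budget, seed=0):
    """Pick up to `budget` indices, spreading them evenly over classes.

    Classes are visited round-robin; a class that runs out of samples
    is skipped, and the remaining budget goes to the other classes.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for idx, label in enumerate(labels):
        pools[label].append(idx)
    for pool in pools.values():
        rng.shuffle(pool)

    chosen = []
    while len(chosen) < budget and any(pools.values()):
        for label in sorted(pools):
            if pools[label] and len(chosen) < budget:
                chosen.append(pools[label].pop())
    return chosen


# Hypothetical imbalanced pool: 950 "soybean", 40 "wheat", 10 "sunflower".
labels = ["soybean"] * 950 + ["wheat"] * 40 + ["sunflower"] * 10
picked = balanced_sample(labels, budget=90)
counts = Counter(labels[i] for i in picked)
print(counts)
```

With this toy pool, a uniform random draw of 90 samples would be expected to contain about one sunflower example, whereas the round-robin draw keeps all ten.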

Load-bearing premise

The five crop classification datasets drawn from five continents are representative of global agricultural conditions and free of selection or quality biases.

What would settle it

A new crop classification dataset from an additional continent or region where the ImageNet model matches or exceeds the accuracy of the Sentinel-2 pre-trained models would falsify the claimed advantage.

Figures

Figures reproduced from arXiv: 2409.09451 by Adam J. Stewart, Arindam Banerjee, Favyen Bastani, George R. Huber, Jingtong Wang, Piper Wolters, Shreya Kannan, Yi-Chia Chang.

Figure 1: Reported metrics of ID, OOD + ID, and balanced …
Figure 2: Visualization of example input Sentinel-2 images, …
Original abstract

Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. The Earth observation (EO) field has produced several foundation models pre-trained directly on multispectral satellite imagery for applications like precision agriculture, wildfire and drought monitoring, and natural disaster response. However, few studies have investigated the ability of these models to generalize to new geographic locations, and potential concerns of geospatial bias -- models trained on data-rich developed nations not transferring well to data-scarce developing nations -- remain. We evaluate three popular EO foundation models, SSL4EO-S12, SatlasPretrain, and ImageNet, on five crop classification datasets across five continents. Results show that pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet. While only 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates three foundation models (SSL4EO-S12, SatlasPretrain, ImageNet) on five crop classification datasets spanning five continents. It claims that Sentinel-2-specific pre-trained weights outperform general ImageNet weights, that 100 labeled images suffice for high overall accuracy, and that 900 images are required to mitigate class imbalance and improve average accuracy.

Significance. If the results hold, the work supplies empirical evidence on the value of domain-specific pretraining for Earth-observation tasks and concrete guidance on label budgets for crop mapping, directly addressing concerns about geospatial bias in data-scarce regions.

major comments (2)
  1. [Abstract] Abstract: the abstract reports comparative accuracies and label thresholds but supplies no error bars, statistical tests, dataset sizes, fine-tuning protocols, or exclusion criteria; without the full methods section it is impossible to verify whether central comparisons are free of post-hoc choices.
  2. [Dataset description] Dataset description (implicit in the five-continent claim): the central claim of geographic generalizability (and thus that SSL4EO-S12 outperforms ImageNet) depends on the five datasets being representative and free of selection or quality bias; no explicit coverage of Köppen climates, crop calendars, field-size distributions, or cloud-cover regimes typical of data-scarce regions is supplied, so the reported 100-vs-900 image thresholds and model ranking could be conditional on untested conditions.
minor comments (1)
  1. The manuscript would benefit from reporting the exact number of images per dataset and per class to allow readers to assess the class-imbalance mitigation claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript. We address each major comment below. Where revisions are warranted, we have updated the manuscript accordingly.

  1. Referee: [Abstract] Abstract: the abstract reports comparative accuracies and label thresholds but supplies no error bars, statistical tests, dataset sizes, fine-tuning protocols, or exclusion criteria; without the full methods section it is impossible to verify whether central comparisons are free of post-hoc choices.

    Authors: We agree that the abstract would benefit from additional context on the experimental setup. In the revised manuscript we have expanded the abstract to include the number of datasets and images used, a brief description of the fine-tuning protocol (linear probing with 100/900 labeled examples), and a note that all results include standard error across multiple runs. Full statistical details, exclusion criteria, and protocols remain in the Methods section due to length constraints. We have also added error bars to all result figures.

    Revision: partial

  2. Referee: [Dataset description] Dataset description (implicit in the five-continent claim): the central claim of geographic generalizability (and thus that SSL4EO-S12 outperforms ImageNet) depends on the five datasets being representative and free of selection or quality bias; no explicit coverage of Köppen climates, crop calendars, field-size distributions, or cloud-cover regimes typical of data-scarce regions is supplied, so the reported 100-vs-900 image thresholds and model ranking could be conditional on untested conditions.

    Authors: The five datasets were selected because they are the primary public benchmarks for multi-continent crop mapping; together they cover North America, Europe, Africa, Asia, and South America. We acknowledge that the original manuscript did not explicitly tabulate Köppen climates, crop calendars, or cloud regimes. In the revised version we have added a new subsection and supplementary table that summarizes these characteristics for each dataset, drawing on the original dataset papers and auxiliary climate data. This addition allows readers to assess the diversity of conditions tested while preserving the original experimental design.

    Revision: yes
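The first response refers to standard error across multiple runs. As a minimal sketch of what such an error bar involves, the snippet below computes the mean and the standard error of the mean for repeated runs. The model labels and accuracy values are hypothetical placeholders, not results from the paper, and the number of runs (five) is an assumption.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for repeated runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(values)
    # statistics.stdev uses the sample (n - 1) estimator.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical accuracies from five random seeds; not values from the paper.
runs = {
    "domain_specific": [0.80, 0.82, 0.81, 0.79, 0.83],
    "general_purpose": [0.74, 0.78, 0.70, 0.76, 0.72],
}
for name, accs in runs.items():
    mean, sem = mean_and_standard_error(accs)
    print(f"{name}: {mean:.3f} +/- {sem:.3f}")
```

Reporting the spread alongside the mean lets a reader judge whether a gap between two sets of pre-trained weights exceeds run-to-run variation.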

Circularity Check

0 steps flagged

No circularity; purely empirical evaluation on external datasets


The paper reports direct experimental results from fine-tuning and evaluating three pre-trained models (SSL4EO-S12, SatlasPretrain, ImageNet) on five public crop classification datasets spanning five continents. No equations, parameter fits, predictions derived from the paper's own data, or self-referential derivations appear in the reported claims. All performance numbers (overall accuracy with 100 images, average accuracy requiring 900 images, model ranking) are measured outcomes on held-out test sets, not constructed from the inputs by definition. The geographic-generalizability claim rests on the external datasets themselves rather than any internal reduction or self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical evaluation paper; contains no mathematical derivations, free parameters, or postulated entities. Relies on standard supervised fine-tuning assumptions common to the field.


