BELDE: Building a Large-scale Earth-observation Land-cover Dataset for Europe
Pith reviewed 2026-06-26 17:47 UTC · model grok-4.3
The pith
BELDE supplies 1,088,385 curated pairs of Sentinel-2 RGB images and land-cover maps across Europe at 10 m resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BELDE contains 1,088,385 curated image-segmentation map pairs spanning Europe with 7 land-cover classes at 10 m spatial resolution, constructed from Sentinel-2 true-color images and ESA WorldCover data annotations; the authors additionally release BELDE-K (16,607 pairs) for Korea and BELDE-CA-NV (88,155 pairs) for California and Nevada, and report baseline F1 scores of 83.0 percent in-domain versus 66.4 percent and 58.3 percent on the cross-domain sets.
What carries the argument
The BELDE dataset of paired Sentinel-2 RGB images and seven-class land-cover segmentation maps derived from ESA WorldCover annotations, which supplies the scale and geographic breadth needed for training and evaluating remote-sensing segmentation models.
If this is right
- Models trained on BELDE reach 83.0 percent F1 on the European test set.
- The same models drop to 66.4 percent F1 on BELDE-CA-NV and 58.3 percent on BELDE-K, quantifying geographic domain shift.
- The dataset and its cross-region companions enable controlled studies of model generalization beyond a single continent.
- Public release of the full collection lowers the barrier to developing transferable Earth-observation segmentation systems.
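The size of the domain shift implied by these bullets can be made explicit with a little arithmetic. The sketch below uses only the three F1 values reported in the abstract; the function name `f1_drop` and the choice to report both absolute and relative drops are illustrative assumptions, not something the paper defines.

```python
# F1 scores in percent, as reported in the abstract.
REPORTED_F1 = {
    "BELDE (Europe)": 83.0,
    "BELDE-CA-NV": 66.4,
    "BELDE-K": 58.3,
}


def f1_drop(in_domain, out_domain):
    """Return (absolute drop in points, relative drop in percent)."""
    absolute = in_domain - out_domain
    relative = 100.0 * absolute / in_domain
    return round(absolute, 1), round(relative, 1)


baseline = REPORTED_F1["BELDE (Europe)"]
drops = {
    name: f1_drop(baseline, score)
    for name, score in REPORTED_F1.items()
    if name != "BELDE (Europe)"
}

for name, (absolute, relative) in drops.items():
    print(f"{name}: -{absolute} points ({relative}% relative)")
```

With these numbers the drop is 16.6 points (20.0 percent relative) on BELDE-CA-NV and 24.7 points (29.8 percent relative) on BELDE-K.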
Where Pith is reading between the lines
- If the WorldCover labels prove consistent enough, future work could combine BELDE with similar continental-scale sets to train models that require less region-specific retraining.
- The measured cross-domain drops suggest that explicit domain-adaptation layers or style-transfer preprocessing may be needed before models trained on BELDE can be deployed reliably outside Europe.
- The dataset size makes it feasible to test whether scaling laws observed in natural-image segmentation also hold for satellite RGB data.
Load-bearing premise
ESA WorldCover annotations supply sufficiently accurate and spatially consistent labels to serve as reliable ground truth across the full diversity of European landscapes.
What would settle it
An independent high-resolution ground-truth survey in multiple European regions that reveals systematic label errors in one or more of the seven WorldCover classes at a scale large enough to change reported F1 scores by more than a few points.
Original abstract
Earth observation imagery plays a critical role in environmental monitoring, urban planning, disaster assessment, and climate analysis. While multi-spectral sensors are increasingly available, true-color (RGB) imagery remains widely used due to the power, cost, and deployment constraints of many satellite and aerial platforms. However, existing land-cover segmentation datasets are often limited in geographic coverage, scale, or public accessibility. To bridge this gap, we introduce BELDE (Building a Large-scale Earth-observation Land-cover Dataset for Europe), a publicly available dataset tailored for RGB-based remote sensing semantic segmentation. Constructed from Sentinel-2 true-color images and ESA WorldCover data annotations, BELDE contains 1,088,385 curated image-segmentation map pairs spanning Europe with 7 land-cover classes at 10 m spatial resolution, making it one of the largest publicly available RGB land-cover segmentation datasets for Earth observation. To facilitate cross-region generalization studies, we additionally introduce BELDE-K (16,607 pairs) covering the Republic of Korea and BELDE-CA-NV (88,155 pairs) covering California and Nevada in the United States. We establish baseline results using multiple semantic segmentation architectures and evaluate both in-domain and cross-domain performance. Models trained on BELDE achieve an F1 score of 83.0% on the European test set, while performance decreases to 66.4% on BELDE-CA-NV and 58.3% on BELDE-K, highlighting the challenges posed by out-of-distribution geographic domain shift. By providing a continental-scale RGB segmentation and evaluation benchmark, BELDE supports the development of robust and transferable Earth observation models. The dataset and benchmark resources will be publicly released.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BELDE, a publicly available dataset of 1,088,385 curated pairs of Sentinel-2 RGB images and ESA WorldCover segmentation maps covering Europe at 10 m resolution with 7 land-cover classes. It also releases two smaller cross-domain sets, BELDE-K (16,607 pairs, Republic of Korea) and BELDE-CA-NV (88,155 pairs, California/Nevada). Baseline semantic segmentation models achieve 83.0% F1 on the European test set, dropping to 66.4% on BELDE-CA-NV and 58.3% on BELDE-K. The authors plan a public release to support generalization research in Earth observation.
Significance. If the WorldCover-derived labels can be shown to be sufficiently accurate, BELDE would constitute a meaningful contribution by supplying one of the largest public RGB land-cover segmentation resources at continental scale, together with explicit cross-domain benchmarks that directly address a recognized challenge in remote-sensing model transfer. The scale and planned public release are strengths that would facilitate reproducible work on domain shift.
major comments (2)
- [Abstract] The central claim that BELDE supplies reliable large-scale training/evaluation data rests on ESA WorldCover supplying accurate 7-class labels, yet the manuscript reports no per-class accuracy figures, confusion matrices versus CORINE or national maps, or manual audit on any validation subset; without this, the reported 83.0% in-domain F1 and the cross-domain drops cannot be unambiguously attributed to model performance rather than inherited label noise.
- [Abstract] No information is supplied on the curation criteria used to select the 1,088,385 pairs, the train-test split methodology, or the training hyperparameters for the baseline models; these omissions prevent verification that the empirical results support the utility claims made for the dataset.
minor comments (1)
- [Abstract] The abstract would be clearer if it explicitly listed the seven land-cover classes and the precise spatial extent (e.g., bounding box or country coverage) of the European portion.
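The audit requested in the first major comment amounts to comparing WorldCover-derived labels against an independent reference on the same pixels. A minimal sketch of that comparison follows, assuming both label sets are already aligned and flattened to integer class IDs. The function names, the three-class toy data, and the IDs themselves are hypothetical and do not come from the manuscript or from the WorldCover legend.

```python
from collections import Counter


def confusion_counts(reference, candidate):
    """Count (reference_class, candidate_class) pairs over aligned pixels."""
    if len(reference) != len(candidate):
        raise ValueError("label sequences must be aligned pixel by pixel")
    return Counter(zip(reference, candidate))


def per_class_f1(counts, classes):
    """Per-class F1 of `candidate`, treated as a prediction of `reference`."""
    scores = {}
    for c in classes:
        tp = counts[(c, c)]
        # Pixels labeled c by the candidate but not by the reference.
        fp = sum(n for (r, p), n in counts.items() if p == c and r != c)
        # Pixels labeled c by the reference but not by the candidate.
        fn = sum(n for (r, p), n in counts.items() if r == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else float("nan")
    return scores


# Toy data: eight pixels, three hypothetical classes.
reference = [0, 0, 1, 1, 2, 2, 2, 0]   # e.g. a manual audit or national map
candidate = [0, 1, 1, 1, 2, 2, 0, 0]   # e.g. WorldCover-derived labels

counts = confusion_counts(reference, candidate)
scores = per_class_f1(counts, classes=[0, 1, 2])
print(scores)
```

Reporting the per-class values rather than a single average is what would show whether label noise is concentrated in particular classes.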
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Abstract] The central claim that BELDE supplies reliable large-scale training/evaluation data rests on ESA WorldCover supplying accurate 7-class labels, yet the manuscript reports no per-class accuracy figures, confusion matrices versus CORINE or national maps, or manual audit on any validation subset; without this, the reported 83.0% in-domain F1 and the cross-domain drops cannot be unambiguously attributed to model performance rather than inherited label noise.
Authors: We agree that label provenance and quality are central to interpreting the reported metrics. BELDE directly adopts the 7-class labels from the ESA WorldCover product without additional re-labeling. WorldCover has published global validation results (we will cite the relevant ESA technical reports in the revision). The manuscript does not contain independent per-class comparisons to CORINE, national maps, or a manual audit of a validation subset. We will add a new subsection on label source and limitations, including a brief qualitative review of label consistency on a small random subset of tiles, and will explicitly note that the 83.0% in-domain F1 reflects agreement with WorldCover labels rather than an absolute ground-truth accuracy. A full quantitative cross-product validation lies outside the scope of this dataset release paper.
Revision: partial
- Referee: [Abstract] No information is supplied on the curation criteria used to select the 1,088,385 pairs, the train-test split methodology, or the training hyperparameters for the baseline models; these omissions prevent verification that the empirical results support the utility claims made for the dataset.
Authors: We acknowledge that the abstract is terse on these points. The full manuscript contains a Data Construction section describing curation (cloud-free Sentinel-2 tile selection, geographic tiling, and exclusion of low-quality WorldCover regions), a random tile-level train/validation/test split designed to minimize spatial leakage, and the exact training settings (optimizer, learning rate schedule, augmentation, and early stopping) used for the U-Net, DeepLabv3+, and other baselines. To improve clarity we will (i) add a concise summary of the curation and split strategy to the abstract and (ii) ensure all hyperparameter values appear in a dedicated table or appendix. The accompanying code repository will include the exact configuration files and split indices.
Revision: yes
- A comprehensive, continent-wide quantitative validation of WorldCover labels against CORINE or national maps would require a separate, resource-intensive study and is not feasible within the current manuscript.
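The tile-level split mentioned in the second response can be sketched as follows. This is not the authors' procedure: it is one common way to keep every patch cut from the same source tile in the same partition, so that near-duplicate neighbors do not straddle train and test. The 80/10/10 fractions, the function names, and the use of a hash in place of a seeded random draw are all assumptions made for illustration.

```python
import hashlib


def split_for_tile(tile_id, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a tile ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(tile_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_fraction:
        return "test"
    if u < test_fraction + val_fraction:
        return "val"
    return "train"


def assign_patches(patches):
    """Group patch IDs by split; `patches` yields (patch_id, tile_id)."""
    splits = {"train": [], "val": [], "test": []}
    for patch_id, tile_id in patches:
        # The split depends only on the tile, never on the patch.
        splits[split_for_tile(tile_id)].append(patch_id)
    return splits


# Hypothetical patch and tile identifiers.
example = [(f"patch_{i:03d}", f"tile_{i % 5}") for i in range(20)]
for name, members in assign_patches(example).items():
    print(name, len(members))
```

Because the assignment depends only on the tile ID, rerunning it reproduces the same partition without storing a random seed. Patches from adjacent tiles can still be spatially correlated, so this reduces leakage rather than eliminating it.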
Circularity Check
No circularity; dataset construction paper with no derivations or self-referential steps
The manuscript constructs BELDE by pairing Sentinel-2 RGB tiles with ESA WorldCover labels and reports empirical baseline F1 scores on in-domain and cross-domain splits. No equations, parameter fits, or predictions are present. The central claims (dataset scale, 83% in-domain F1) are direct measurements or counts, not quantities derived from prior results by the paper's own logic. The reliance on WorldCover as ground truth is an external assumption subject to independent verification, not a self-definitional or fitted-input reduction. No self-citations are invoked as load-bearing uniqueness theorems. The work is therefore self-contained against external benchmarks.
Forward citations
Cited by 1 Pith paper
- Benchmarking the Alignment of Data-Quality Metrics, Human Judgment and Land-Cover Segmentation Performance for Earth Observation
Automatic metrics such as FID are misaligned with human perception and downstream segmentation performance for Earth observation datasets and synthetic counterparts.