arxiv: 2606.20767 · v1 · pith:POIFGJPCnew · submitted 2026-06-18 · 🧬 q-bio.OT

COLD-CI: A large-scale very high-resolution label polygon dataset for cocoa and non-cocoa classification in Cote d'Ivoire

Pith reviewed 2026-06-26 14:55 UTC · model grok-4.3

classification 🧬 q-bio.OT
keywords cocoa mapping · land cover polygons · Côte d'Ivoire · high-resolution imagery · vector dataset · deforestation monitoring · validation data

The pith

COLD-CI supplies 123736 vector polygons labelling 5996 km² of cocoa and non-cocoa land in Côte d'Ivoire with 99% validation agreement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases COLD-CI, a public vector dataset of 123736 polygons that mark cocoa planted areas and background land covers across the main production zones and contrasting landscapes of Côte d'Ivoire. Polygon candidates came from automated filtering of prior West Africa data and thematic layers, then received systematic visual interpretation, manual correction and digitisation on 0.5 m satellite imagery. The final set contains 58107 cocoa polygons covering 1788 km² and 65629 background polygons covering 4208 km². Independent validation against field and expert photointerpreted reference points from the Copernicus4GEOGLAM dataset reached 99% overall agreement, with producer's and user's accuracies above 98% for both classes. The authors state that the open release removes a key barrier to transparent benchmarking, model training and validation for land-use planning, deforestation monitoring and supply-chain analysis.

Core claim

COLD-CI is a vector polygon dataset of 123736 features covering a total labelled area of 5996 km², comprising 58107 cocoa polygons (1788 km²) and 65629 background polygons (4208 km²), produced by conservative automated filtering followed by systematic visual interpretation, manual correction and digitisation from 0.5 m imagery, and shown to agree at 99% with independent field-based and expert reference data.
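
The per-class figures in the core claim can be checked against the stated totals with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the class shares and the mean polygon area are derived here for illustration and are not values reported by the paper. Because the areas are rounded to whole km², the means are approximate. The function name `summarise` is a hypothetical helper.

```python
# Figures as quoted in the summary above (areas rounded to whole km²).
COUNTS = {"cocoa": 58_107, "background": 65_629}
AREAS_KM2 = {"cocoa": 1_788, "background": 4_208}
REPORTED_TOTAL_POLYGONS = 123_736
REPORTED_TOTAL_AREA_KM2 = 5_996


def summarise(counts, areas_km2):
    """Return totals plus per-class shares and mean polygon area in hectares."""
    total_n = sum(counts.values())
    total_a = sum(areas_km2.values())
    rows = {}
    for name in counts:
        rows[name] = {
            "polygon_share": counts[name] / total_n,
            "area_share": areas_km2[name] / total_a,
            # 1 km² = 100 ha; derived quantity, not reported in the paper.
            "mean_area_ha": areas_km2[name] * 100 / counts[name],
        }
    return total_n, total_a, rows


total_n, total_a, rows = summarise(COUNTS, AREAS_KM2)

# The class figures should add up to the reported totals.
print("polygon total matches:", total_n == REPORTED_TOTAL_POLYGONS)
print("area total matches:", total_a == REPORTED_TOTAL_AREA_KM2)

for name, row in rows.items():
    print(
        f"{name}: {row['polygon_share']:.1%} of polygons, "
        f"{row['area_share']:.1%} of area, "
        f"mean about {row['mean_area_ha']:.1f} ha per polygon"
    )
```

Both totals are consistent with the class figures. The derived means suggest that cocoa polygons are on average smaller than background polygons, which fits the description of fine-scale delineation, though the paper itself would need to be consulted for the actual size distribution.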

What carries the argument

The label polygons generated by automated candidate filtering combined with systematic visual interpretation and manual digitisation on 0.5 m satellite imagery.

If this is right

  • Enables direct benchmarking of cocoa mapping methods against a transparent reference set.
  • Supports development and validation of classification models at multiple spatial resolutions.
  • Provides reference data for deforestation monitoring and land-use planning in the main cocoa regions of Côte d'Ivoire.
  • Allows supply-chain analysis that distinguishes cocoa from other land covers at fine scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The fine-scale internal heterogeneity captured in the cocoa polygons could improve training of models that detect mixed or shaded cocoa systems.
  • Public availability may reduce repeated creation of similar labels for other West African cocoa studies.
  • The dataset could serve as a test bed for assessing how label quality at 0.5 m resolution affects downstream map accuracy at coarser resolutions.
  • Extension to adjacent countries would test whether the same digitisation protocol yields comparable accuracy outside Côte d'Ivoire.

Load-bearing premise

Systematic visual interpretation and manual digitisation on 0.5 m imagery produce polygons that accurately represent true cocoa planted areas without substantial errors from canopy similarity or interpreter bias.

What would settle it

An independent field survey or expert photointerpretation campaign that measures producer's or user's accuracy below 90% for the cocoa class on the released polygons would falsify the accuracy claim.
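
The test described above reduces to computing producer's and user's accuracy from a confusion matrix of reference points against polygon labels. The following is a minimal sketch under stated assumptions: the counts are hypothetical and chosen only to illustrate a failing case, they are not taken from the paper or from Copernicus4GEOGLAM, and the estimator is the simple unweighted one (the paper may use a different sampling design or area weighting). The names `accuracy_metrics` and `falsifies` are illustrative helpers.

```python
from typing import Dict, List


def accuracy_metrics(confusion: List[List[int]], classes: List[str]) -> Dict[str, object]:
    """Compute overall, producer's and user's accuracy.

    confusion[i][j] is the number of reference points whose true class is
    classes[i] and whose polygon label is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    producers, users = {}, {}
    for i, name in enumerate(classes):
        reference_total = sum(confusion[i])                    # row sum
        labelled_total = sum(confusion[r][i] for r in range(n))  # column sum
        # Producer's accuracy: share of reference points of this class
        # that carry the correct label (complement of omission error).
        producers[name] = confusion[i][i] / reference_total
        # User's accuracy: share of points labelled as this class that the
        # reference confirms (complement of commission error).
        users[name] = confusion[i][i] / labelled_total
    return {"overall": correct / total, "producers": producers, "users": users}


def falsifies(metrics: Dict[str, object], cls: str = "cocoa", threshold: float = 0.90) -> bool:
    """True if producer's or user's accuracy for cls falls below threshold."""
    return metrics["producers"][cls] < threshold or metrics["users"][cls] < threshold


CLASSES = ["cocoa", "background"]
# HYPOTHETICAL counts for illustration only, not data from the paper.
EXAMPLE = [
    [80, 20],  # reference cocoa: 80 labelled cocoa, 20 labelled background
    [5, 95],   # reference background: 5 labelled cocoa, 95 labelled background
]

metrics = accuracy_metrics(EXAMPLE, CLASSES)
print(metrics)
print("accuracy claim falsified:", falsifies(metrics))
```

In this hypothetical case the cocoa producer's accuracy is 0.80, below the 90% threshold, so the check returns true even though the cocoa user's accuracy stays above 0.94. Reporting both measures matters because a label set can look precise while still omitting a large share of true cocoa.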

Original abstract

Spatially explicit information on cocoa cultivation is essential for land-use planning, deforestation monitoring, environmental assessment, and supply-chain analysis. Although several cocoa map products exist, their underlying reference data are often not publicly available, limiting transparency and methodological benchmarking. Here, we present a large-scale, very high-resolution cocoa and non-cocoa label polygon dataset for Cote d'Ivoire (COLD-CI), covering the main cocoa-producing regions as well as contrasting non-cocoa landscapes. COLD-CI consists of 123,736 vector polygons corresponding to a total labelled area of 5,996 km^2, including 58,107 cocoa polygons (1,788 km^2) and 65,629 background polygons (4,208 km^2). Polygon label candidates were first generated through conservative automated filtering of polygons from the West Africa Cocoa dataset and the combination of multiple external thematic datasets. These candidates were subsequently refined and complemented through systematic visual interpretation, manual correction, and digitisation using very high-resolution (0.5 m) satellite imagery. The resulting label polygons capture cocoa planted areas and associated fine-scale internal heterogeneity, as well as a wide range of non-cocoa land-cover types. Independent validation using field-based and expert photointerpreted reference data from the Copernicus4GEOGLAM validation dataset indicated an overall agreement of 99%, with producer's and user's accuracy exceeding 98% for both cocoa and background classes. COLD-CI is released as a vector dataset with associated metadata to support transparent benchmarking, model development, and validation across a wide range of spatial resolutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript presents COLD-CI, a large-scale very high-resolution vector polygon dataset for cocoa and non-cocoa classification in Côte d'Ivoire. It comprises 123,736 polygons over 5,996 km² (58,107 cocoa polygons covering 1,788 km²; 65,629 background polygons covering 4,208 km²). Polygon candidates were generated via conservative automated filtering of the West Africa Cocoa dataset combined with external thematic layers, then refined through systematic visual interpretation, manual correction, and digitization on 0.5 m satellite imagery. Independent validation against field-based and expert photointerpreted reference data from the Copernicus4GEOGLAM dataset yields 99% overall agreement, with producer's and user's accuracies exceeding 98% for both classes. The dataset is released as open vector data with metadata.

Significance. If the reported validation holds, the dataset fills a documented gap in publicly available, high-resolution reference data for cocoa mapping. Its scale, inclusion of fine-scale heterogeneity, coverage of both cocoa and contrasting non-cocoa landscapes, and independent validation against an external reference set provide a concrete resource for benchmarking, model training, and validation at multiple resolutions. The open release directly supports transparency and reproducibility in land-use and supply-chain applications.

minor comments (2)
  1. [Abstract] The phrase 'Cote d'Ivoire' appears without the diacritic; standard orthography is 'Côte d'Ivoire'.
  2. [Methods] The methods description of the automated filtering step would benefit from an explicit list or table of the exact criteria and thresholds applied to the West Africa Cocoa dataset polygons.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. The review accurately captures the dataset's scale, validation approach, and intended uses.

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a dataset creation and release paper. Polygon candidates are generated from external public datasets (West Africa Cocoa dataset and thematic layers), refined by manual digitization on independent VHR imagery, and validated against a separate external field/photointerpreted reference set (Copernicus4GEOGLAM), with which the labels reach 99% agreement. No equations, fitted parameters, predictions, or self-citation chains exist; the central claim is the dataset itself, whose quality is supported by external benchmarks. The work is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; this is an empirical remote-sensing dataset release without theoretical modeling or new postulated constructs.

pith-pipeline@v0.9.1-grok · 5853 in / 1161 out tokens · 26393 ms · 2026-06-26T14:55:45.567896+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 23 canonical work pages

  1. [1]

    Kalischek, N., Lang, N., Renier, C. et al. Cocoa plantations are associated with deforestation in Côte d’Ivoire and Ghana. Nat. Food 4, 384–393 (2023). https://doi.org/10.1038/s43016-023-00751-8

  2. [2]

    Renier, C. et al. Direct and indirect deforestation for cocoa in the tropical moist forests of Ghana. Environ. Res.: Food Syst. 2, 025006 (2025). https://doi.org/10.1088/2976-601X/add01b

  3. [3]

    Singh, C. & Persson, U. M. Global patterns of commodity-driven deforestation and associated carbon emissions. Nat. Food 7, 138–151 (2026). https://doi.org/10.1038/s43016-026-01305-4

  4. [4]

    European Union. Regulation (EU) 2023/1115 of the European Parliament and of the Council of 31 May 2023 on the making available on the Union market and the export from the Union of certain commodities and products associated with deforestation and forest degradation and repealing Regulation (EU) No 995/2010. Off. J. Eur. Union L 150, 206 (2023). https://eu...

  5. [5]

    Masolele, R. N. et al. Mapping the diversity of land uses following deforestation across Africa. Sci. Rep. 14, 1681 (2024). https://doi.org/10.1038/s41598-024-52138-9

  6. [6]

    Moraiti, N., Mullissa, A., Rahn, E., Sassen, M. & Reiche, J. Critical assessment of cocoa classification with limited reference data: a study in Côte d’Ivoire and Ghana using Sentinel-2 and random forest model. Remote Sens. 16, 598 (2024). https://doi.org/10.3390/rs16030598

  7. [7]

    BNETD-CIGN. Note méthodologique de la carte d’occupation du sol de Côte d’Ivoire 2020 (2020). https://africageoportal.maps.arcgis.com/sharing/rest/content/items/76dc18767b89472eb89e8aa54e08a6c9/data

  8. [8]

    Numbisi, F. N., Van Coillie, F. M. B. & De Wulf, R. Delineation of cocoa agroforests using multiseason Sentinel-1 SAR images: a low grey level range reduces uncertainties in GLCM texture-based mapping. ISPRS Int. J. Geo-Inf. 8, 179 (2019). https://doi.org/10.3390/ijgi8040179

  9. [9]

    Ashiagbor, G. et al. Pixel-based and object-oriented approaches in segregating cocoa from forest in the Juabeso-Bia landscape of Ghana. Remote Sens. Appl. Soc. Environ. 19, 100349 (2020). https://doi.org/10.1016/j.rsase.2020.100349

  10. [10]

    Abu, I. O., Szantoi, Z., Brink, A., Robuchon, M. & Thiel, M. Detecting cocoa plantations in Côte d’Ivoire and Ghana and their implications on protected areas. Ecol. Indic. 129, 107863 (2021). https://doi.org/10.1016/j.ecolind.2021.107863

  11. [11]

    Blaser-Hart, W. J. & Hart, S. The unrealised potential of agroforestry (climate mitigation potential analysis). University of Queensland institutional archive (2025). https://doi.org/10.48610/dda018c

  12. [12]

    Kouamé, I. K. et al. Supporting dataset for the article ‘Maximizing tree diversity in cocoa agroforestry: taking advantage of planted, spontaneous, and remnant trees’. Zenodo (2025). https://doi.org/10.5281/zenodo.15124220

  13. [13]

    Lescuyer, G. Inventaire des arbres dans 223 cacaoyères agroforestières au Cameroun. Harvard Dataverse (2024). https://doi.org/10.18167/DVN1/MGDIJU

  14. [14]

    Lammoglia, S. et al. High-resolution multispectral and RGB dataset from UAV surveys of ten cocoa agroforestry typologies in Côte d'Ivoire. Data Brief 55, 110664 (2024). https://doi.org/10.1016/j.dib.2024.110664

  15. [15]

    Centre for Remote Sensing and Geographic Information Services (CERSGIS). Reference dataset for land use change mapping in Ghana's cocoa landscape (2024–2025) (v2.0) [Data set]. Zenodo (2025). https://doi.org/10.5281/zenodo.16579443

  16. [16]

    Schneider, M., Winchester, C., Goldman, E. & Shao, Y. Mapping cocoa and assessing deforestation risk for the cocoa sector in Côte d’Ivoire and Ghana. World Resources Institute (2023). https://doi.org/10.46830/writn.21.00011

  17. [17]

    World Cocoa Foundation. Cocoa & Forests Initiative. Online resource (2026). https://worldcocoafoundation.org/programmes-and-initiatives/cocoa-and-forests-initiative

  18. [18]

    CLS & Terrasphere. COPERNICUS4GEOGLAM Validation Note LandCover v2 Ivory Coast (2024). https://africageoportal.maps.arcgis.com/sharing/rest/content/items/0072843eefe545b1b43b0c5b2ebec41a/data

  19. [19]

    Forest Data Partnership. Community models 2025a. GitHub repository (2025). https://github.com/google/forest-data-partnership/blob/main/models/model_2025a/README.md

  20. [20]

    Zanaga, D. et al. ESA WorldCover 10 m 2020 v100. Zenodo (2021). https://doi.org/10.5281/zenodo.5571936

  21. [21]

    Van Tricht, K. et al. WorldCereal: a dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth Syst. Sci. Data 15, 5491–5515 (2023). https://doi.org/10.5194/essd-15-5491-2023

  22. [22]

    Tolan, J. et al. Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on aerial lidar. Remote Sens. Environ. 300, 113888 (2024). https://doi.org/10.1016/j.rse.2023.113888

  23. [23]

    Descals, A. et al. High-resolution global map of smallholder and industrial closed-canopy oil palm plantations. Earth Syst. Sci. Data 13, 1211–1231 (2021). https://doi.org/10.5194/essd-13-1211-2021

  24. [24]

    Descals, A. et al. High-resolution global map of closed-canopy coconut palm. Earth Syst. Sci. Data 15, 3991–4010 (2023). https://doi.org/10.5194/essd-15-3991-2023

  25. [25]

    Hansen, M. C. et al. Global land use extent and dispersion within natural land cover using Landsat data. Environ. Res. Lett. 17, 034050 (2022). https://doi.org/10.1088/1748-9326/ac46ec

  26. [26]

    Brandt, J., Ertel, J., Spore, J. & Stolle, F. Wall-to-wall mapping of tree extent in the tropics with Sentinel-1 and Sentinel-2. Remote Sens. Environ. 292, 113574 (2023). https://doi.org/10.1016/j.rse.2023.113574

  27. [27]

    Vancutsem, C. et al. Long-term (1990–2019) monitoring of forest cover changes in the humid tropics. Sci. Adv. 7, eabe1603 (2021). https://doi.org/10.1126/sciadv.abe1603

  28. [28]

    Hansen, M. C. et al. High-resolution global maps of 21st-century forest cover change. Science 342, 850–853 (2013). https://doi.org/10.1126/science.1244693