arxiv: 2507.04017 · v3 · submitted 2025-07-05 · 💻 cs.CV

Habitat Classification from Ground-Level Imagery Using Deep Neural Networks

Pith reviewed 2026-05-19 06:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords habitat classification · ground-level imagery · vision transformers · supervised contrastive learning · ecological monitoring · biodiversity conservation · deep neural networks · UK Countryside Survey

The pith

Vision transformers classify 18 UK habitats from ground photos as accurately as experienced experts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep neural networks, particularly vision transformers trained with supervised contrastive learning, can classify 18 broad habitat types from ground-level images with performance matching that of human ecological experts. This matters because expert field surveys for habitat assessment are costly and time-intensive, limiting the scale of biodiversity monitoring needed for conservation. Ground-level photos capture structural details missed by remote sensing, and the models reduce errors between visually similar classes like different grasslands by learning better-separated feature embeddings. If the approach scales, it could support more frequent national-level habitat tracking to guide land-use decisions.

Core claim

Vision transformers consistently outperform convolutional neural network baselines on classification of 18 habitat types from UK Countryside Survey ground-level imagery, reaching 91% top-3 accuracy and 0.66 Matthews correlation coefficient. Supervised contrastive learning further improves discrimination among similar habitats by producing a more separable embedding space. The strongest model achieves accuracy on par with experienced ecological experts when classifying the same images.
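
The two headline metrics can be made concrete with a small sketch. The code below is illustrative only and is not the paper's evaluation code: the function names, the toy labels, and the score rows are assumptions, and the MCC uses the standard multiclass generalisation computed from label counts.

```python
import math
from collections import Counter


def top_k_accuracy(y_true, scores, k=3):
    """Fraction of samples whose true class index is among the k highest scores."""
    hits = 0
    for label, row in zip(y_true, scores):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += label in ranked[:k]
    return hits / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from paired label lists."""
    s = len(y_true)                                  # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    return numerator / denom if denom else 0.0


# Hypothetical toy data: 4 images, 4 classes, one score row per image.
y_true = [0, 1, 2, 1]
scores = [
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.30, 0.20, 0.10],
    [0.50, 0.05, 0.30, 0.15],
]
y_pred = [max(range(len(row)), key=lambda j: row[j]) for row in scores]

print("top-3 accuracy:", top_k_accuracy(y_true, scores, k=3))
print("MCC:", multiclass_mcc(y_true, y_pred))
```

Top-3 accuracy rewards a model whenever the true habitat is among its three best guesses, while MCC is computed from the single top prediction and accounts for class imbalance, which is one reason the two reported numbers (91% and 0.66) differ so much in magnitude.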

What carries the argument

Vision transformers combined with supervised contrastive learning, which builds a discriminative embedding space to separate visually similar habitats such as Improved Grassland and Neutral Grassland.
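
A minimal sketch of a supervised contrastive loss may help show why it separates similar classes. It follows the commonly used formulation in which each anchor is pulled toward same-label samples and pushed away from all others. The function names, the temperature of 0.1, and the 2-D toy embeddings are assumptions for illustration, not the authors' implementation or settings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over every non-anchor sample.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical 2-D embeddings forming two tight groups.
points = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
separated = ["Improved Grassland", "Improved Grassland",
             "Neutral Grassland", "Neutral Grassland"]
entangled = ["Improved Grassland", "Neutral Grassland",
             "Improved Grassland", "Neutral Grassland"]

print("labels match clusters:", supcon_loss(points, separated))
print("labels cut across clusters:", supcon_loss(points, entangled))
```

The loss is small when same-class embeddings already sit together and large when classes are interleaved, so minimising it pushes the encoder toward the better-separated embedding space described above.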

If this is right

  • Expert field surveys could be supplemented or partially replaced by automated image analysis for routine habitat monitoring.
  • National-scale biodiversity assessments could become more frequent and less expensive by processing large numbers of ground photos.
  • Misclassification rates between similar habitats drop when contrastive learning is used instead of standard supervised training.
  • Models that reach expert-level performance on ground imagery enable integration of AI outputs directly into conservation planning workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pipeline could be tested on ground-level images from other countries to check whether the learned distinctions transfer across different ecosystems.
  • Pairing the model's interpretable attention maps with expert review might reveal which visual cues humans and networks both rely on for habitat decisions.
  • Mobile apps that run the model locally could let land managers or volunteers collect and classify habitat data in the field without waiting for specialist input.

Load-bearing premise

The labeled UK Countryside Survey ground-level images represent the full range of real-world variation in the 18 habitat classes so that patterns learned by the model apply to new photos.

What would settle it

Applying the best trained model to a fresh collection of ground-level habitat photographs labeled independently by multiple experts and measuring whether its accuracy stays within the range of expert-to-expert agreement.
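
One way such a check could be operationalised is sketched below, using simple percent agreement. The helper names, the toy labels, and the decision rule (the model's mean agreement with the experts must reach at least the lowest expert-to-expert agreement) are assumptions chosen for illustration, not a protocol from the paper.

```python
from itertools import combinations


def agreement(a, b):
    """Fraction of images on which two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def within_expert_range(expert_labels, model_labels):
    """Compare model-vs-expert agreement with expert-vs-expert agreement.

    expert_labels maps an expert name to that expert's label list.
    Returns (lowest pair agreement, highest pair agreement,
             mean model agreement, model reaches the lowest pair).
    """
    pair_scores = [
        agreement(expert_labels[a], expert_labels[b])
        for a, b in combinations(sorted(expert_labels), 2)
    ]
    model_scores = [agreement(lab, model_labels) for lab in expert_labels.values()]
    low, high = min(pair_scores), max(pair_scores)
    model_mean = sum(model_scores) / len(model_scores)
    return low, high, model_mean, model_mean >= low


# Hypothetical labels for six images from three experts and one model.
experts = {
    "expert_1": ["Bog", "Bracken", "Acid Grassland", "Bog", "Bracken", "Acid Grassland"],
    "expert_2": ["Bog", "Bracken", "Acid Grassland", "Bog", "Acid Grassland", "Acid Grassland"],
    "expert_3": ["Bog", "Bracken", "Bracken", "Bog", "Bracken", "Acid Grassland"],
}
model = ["Bog", "Bracken", "Acid Grassland", "Bog", "Bracken", "Bracken"]

low, high, model_mean, ok = within_expert_range(experts, model)
print(f"expert range: {low:.3f} to {high:.3f}; model mean: {model_mean:.3f}; ok: {ok}")
```

Percent agreement ignores chance agreement, so a chance-corrected statistic would be a stricter choice for a real study.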

Figures

Figures reproduced from arXiv: 2507.04017 by Claire M Wood, Hongrui Shi, James M Brown, Lan Qie, Lisa Norton, Lucy Ridding, Petra Bosilj, Simon Rolph, Tom August.

Figure 1. Examples of level 3 (L3) habitats defined by UKHab in the Countryside Survey dataset, grouped by their coarse L2 categories (bold text). Some L2 habitats, such as Cropland, have only one L3 class. Note that Improved Grassland, Montane, and Bracken have different origins from the L3 habitats in UKHab but are treated as L3 habitats in the CS dataset. An explanation is provided in Section 3.1.
Figure 2. Habitat distributions (L2 and L3) in the CS dataset based on the UKHab system.
Figure 3. Contrasting locality bias in CNNs with global self-attention in ViT for ground-level habitat classification. A ground-level photograph of a mixed woodland scene is partitioned into a conceptual 3×3 grid of patches. Top row (CNN pipeline): a small convolutional filter "slides" over each patch in turn, but never directly integrates information from non-adjacent patches. The resulting GradCAM shows concentr…
Figure 4. Comparison of classical supervised learning and supervised contrastive learning. Top (classical supervised learning): an image encoder and classifier are trained end-to-end by maximising the predicted probability (prob) of the ground-truth label. Because embeddings from different classes remain entangled, the classifier struggles to carve out a decision boundary in a poorly separated embedding space, which…
Figure 5. Comparison of the GradCAMs generated by WNR-50-2 and SwinT-B. The green notation marks a correct prediction on the sample and the red notation a misclassification. Whether correct or not, SwinT-B judges the habitat from a broader and more connected background across the entire image, largely disregarding irrelevant visual features, resulting in improved interpretability ove…
Figure 6. The confusion matrix of SwinT-B on the test set. Habitats with no samples in the test set are removed from the matrix (refer to Section 3.5 for details). While SwinT-B is better suited than CNNs to habitat classification, some habitats are often misclassified by the model, e.g., Neutral Grassland is often misclassified as Improved Grassland, affecting the overall classification performance si…
Figure 7. Delta confusion matrix (CM) of challenging habitats: based on the CM produced by SupCon, this graph highlights its differences from the CM generated by supervised learning, with blue indicating an increase and red a decrease. SupCon reduces misclassifications among the three major grasslands and FMS in wetlands, which confuse the model most under supervised learning.
Figure 8. UMAPs of the embedding space produced by the image encoder on the test set. SupCon generates tight clusters that are more separable from each other than those produced by supervised learning. Habitats shown in this figure: Acid Grassland; Arable and Horticulture; Bog; Bracken; Broadleaved Mixed and Yew Woodland; Calcareous Grassland; Coniferous Woodland; Dwarf Shrub Heath; Fen, Marsh, Swamp; …
Figure 9. Confusion matrices for Expert 1 performance against the ground truth, using a subset of 158 test images.
Figure 10. Confusion matrices for Expert 2 performance against the ground truth, using a subset of 158 test images.
Figure 11. Confusion matrices for Expert 3 performance against the ground truth, using a subset of 158 test images.
Figure 12. Confusion matrices for SwinT-B (SupCon) performance against the ground truth, using a subset of 158 test images.
Original abstract

Habitat assessment at local scales -- critical for enhancing biodiversity and guiding conservation priorities -- often relies on expert field surveys that can be costly, motivating the exploration of AI-driven tools to automate and refine this process. While most AI-driven habitat mapping depends on remote sensing, it is often constrained by sensor availability, weather, and coarse resolution. In contrast, ground-level imagery captures essential structural and compositional cues invisible from above and remains underexplored for robust, fine-grained habitat classification. This study addresses this gap by applying state-of-the-art deep neural network architectures to ground-level habitat imagery. Leveraging data from the UK Countryside Survey covering 18 broad habitat types, we evaluate two families of models - convolutional neural networks (CNNs) and vision transformers (ViTs) - under both supervised and supervised contrastive learning paradigms. Our results demonstrate that ViTs consistently outperform state-of-the-art CNN baselines on key classification metrics (Top-3 accuracy = 91%, MCC = 0.66) and offer more interpretable scene understanding tailored to ground-level images. Moreover, supervised contrastive learning significantly reduces misclassification rates among visually similar habitats (e.g., Improved vs. Neutral Grassland), driven by a more discriminative embedding space. Finally, our best model performs on par with experienced ecological experts in habitat classification from images, underscoring the promise of expert-level automated assessment. By integrating advanced AI with ecological expertise, this research establishes a scalable, cost-effective framework for ground-level habitat monitoring to accelerate biodiversity conservation and inform land-use decisions at a national scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates CNN and Vision Transformer architectures, including supervised contrastive learning, for classifying 18 broad habitat types from ground-level images in the UK Countryside Survey dataset. It reports concrete metrics for the best ViT+contrastive model (Top-3 accuracy 91%, MCC 0.66), notes improved discrimination among visually similar classes, and claims that this performance is on par with experienced ecological experts, positioning the approach as a scalable tool for automated habitat monitoring.

Significance. If the expert-parity claim is substantiated, the work would demonstrate a practical advance in applying modern vision models to fine-grained ecological classification tasks that rely on cues invisible to remote sensing. The explicit demonstration that contrastive learning reduces confusion between similar habitats (e.g., Improved vs. Neutral Grassland) is a concrete strength that could be leveraged in other fine-grained ecological datasets.

major comments (2)
  1. [Abstract and Results] The central claim that the best model 'performs on par with experienced ecological experts' (Abstract and Results) is not supported by any quantitative expert baseline. No per-expert accuracy, MCC, inter-rater agreement (Fleiss' kappa or equivalent), number of experts, or evaluation protocol (isolated images vs. additional context) is reported. Because the model is trained to reproduce the same expert labels, this omission directly limits interpretation of the headline metrics as 'expert-level'.
  2. [Methods] The Methods section provides no description of train/validation/test splits, cross-validation procedure, statistical significance testing of performance differences across models, or any analysis of label noise or inter-expert disagreement. These omissions make it impossible to assess whether the reported Top-3 accuracy and MCC are robust or generalizable under real-world variation.
minor comments (2)
  1. [Abstract] The abstract states that ViTs 'offer more interpretable scene understanding' but does not specify the interpretability method (e.g., attention maps, Grad-CAM) or show supporting figures; a brief clarification or reference to a figure would help.
  2. A table or figure comparing all models on the full set of metrics (accuracy, Top-3, MCC, per-class F1) would improve readability and allow direct assessment of the contrastive-learning gains.
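
For reference, the inter-rater statistic named in the first major comment can be computed as in the sketch below. This is a plain implementation of the standard Fleiss' kappa formula, assuming every image is labeled by the same number of raters. The function name and the toy ratings are hypothetical and do not come from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels from n raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(cnt[c] for cnt in counts) / (n_items * n_raters) for c in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return 1.0  # only one category was ever used, so agreement is trivial
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings: three experts labeling four images.
toy_ratings = [
    ["Bog", "Bog", "Bog"],
    ["Bracken", "Bracken", "Bog"],
    ["Bracken", "Bracken", "Bracken"],
    ["Bog", "Bracken", "Bracken"],
]
print("Fleiss' kappa:", fleiss_kappa(toy_ratings))
```

A value near 1 indicates near-perfect agreement among raters, a value near 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.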

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments highlight important areas for improving clarity and rigor, particularly around the expert comparison claim and methodological transparency. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

point-by-point responses
  1. Referee: [Abstract and Results] The central claim that the best model 'performs on par with experienced ecological experts' (Abstract and Results) is not supported by any quantitative expert baseline. No per-expert accuracy, MCC, inter-rater agreement (Fleiss' kappa or equivalent), number of experts, or evaluation protocol (isolated images vs. additional context) is reported. Because the model is trained to reproduce the same expert labels, this omission directly limits interpretation of the headline metrics as 'expert-level'.

    Authors: We agree that the manuscript does not provide a quantitative expert baseline or inter-rater metrics to support the 'on par with experienced ecological experts' claim. The model was trained and evaluated using the same expert-provided labels from the UK Countryside Survey, but no separate blinded comparison against multiple experts (with reported agreement statistics or protocol details) was performed. To correct this, we will revise the abstract and results sections to remove the expert-parity phrasing. The updated text will instead highlight the achieved metrics (Top-3 accuracy 91%, MCC 0.66) on the expert-annotated dataset and position the work as a promising scalable complement to expert surveys without claiming direct equivalence. revision: yes

  2. Referee: [Methods] The Methods section provides no description of train/validation/test splits, cross-validation procedure, statistical significance testing of performance differences across models, or any analysis of label noise or inter-expert disagreement. These omissions make it impossible to assess whether the reported Top-3 accuracy and MCC are robust or generalizable under real-world variation.

    Authors: We acknowledge these omissions limit the ability to evaluate robustness. We will expand the Methods section to explicitly describe the train/validation/test split ratios and any stratification by habitat class or survey year. We will clarify whether a single split or cross-validation was used and, if the latter, detail the number of folds and aggregation method. Statistical significance testing (e.g., paired bootstrap or McNemar tests with p-values) for differences between CNN and ViT models will be added. For label noise and inter-expert disagreement, we will report any available dataset metadata on label provenance and include a brief discussion of potential label variability as a limitation, along with any post-hoc analysis of confusion patterns that may reflect such noise. revision: yes
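
As an illustration of the kind of significance test mentioned in the second response, the sketch below runs an exact two-sided McNemar test on paired predictions from two models over the same test images. The function name and the toy predictions are assumptions. Nothing here reflects results or code from the paper.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same samples.

    Returns (b, c, p_value), where b counts samples only model A gets right
    and c counts samples only model B gets right.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    # Under the null, discordant samples split 50/50 between the two models.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical test set of 10 images with integer class labels.
truth = [0] * 10
cnn_pred = [1] * 6 + [0] * 4   # wrong on the first six images
vit_pred = [0] * 10            # correct on all ten

b, c, p_value = mcnemar_exact(truth, cnn_pred, vit_pred)
print(f"only CNN correct: {b}, only ViT correct: {c}, p = {p_value:.5f}")
```

The test uses only the images on which exactly one model is correct, which makes it a natural fit for comparing a CNN and a ViT evaluated on an identical test split.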

Circularity Check

0 steps flagged

No circularity: standard empirical evaluation on external dataset

full rationale

The paper applies off-the-shelf CNN and ViT architectures (with optional contrastive learning) to a pre-existing UK Countryside Survey dataset of ground-level images labeled by ecological experts. Reported results consist of standard test-set metrics (Top-3 accuracy, MCC) and a direct comparison against expert performance on the same images. No equations, fitted parameters, self-referential predictions, or uniqueness theorems appear; the central claims are data-driven empirical outcomes rather than derivations that collapse to their own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claims rest on the representativeness of the Countryside Survey dataset and the assumption that ground-level images contain sufficient cues for the 18 classes; no new entities or free parameters beyond standard neural-network training are introduced.

free parameters (1)
  • model architecture and training hyperparameters
    Choice of CNN and ViT variants plus contrastive loss weighting are tuned on the data but not enumerated.
axioms (1)
  • domain assumption: Ground-level imagery contains essential structural and compositional cues invisible from above that enable robust classification of the 18 habitat types.
    Stated in the motivation contrasting ground-level imagery with remote sensing.

pith-pipeline@v0.9.0 · 5832 in / 1232 out tokens · 66710 ms · 2026-05-19T06:26:37.502532+00:00 · methodology

