Multispectral airborne laser scanning dataset for tree species classification: MS-ALS-SPECIES
Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3
The pith
An open multispectral airborne laser scanning dataset of 6326 trees is used to show that point transformer models improve species classification for small trees and minority species.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present an open multispectral ALS dataset comprising 6326 segment-level point clouds of individual trees representing nine species in Southern Finland, acquired using a helicopter-borne system with point density exceeding 1000 points per square meter and an Optech Titan system with about 35 points per square meter. They describe field data collection techniques for high-quality ground truth and provide new analyses on species classification that highlight the advantages of the point transformer model particularly for small trees and minority species.
What carries the argument
The argument rests on the multispectral, segment-level point cloud dataset of individual trees, which lets machine learning models such as the point transformer classify tree species more effectively across tree sizes and species abundances.
If this is right
- Supports improved individual-tree-level forest assessments and biodiversity mapping in boreal ecosystems.
- Facilitates development and benchmarking of machine learning and deep learning methods for tree species classification from multispectral data.
- Demonstrates that point transformer models can achieve better accuracy for small trees and less common species compared to other approaches.
- The dataset's dual acquisition systems allow study of how point density affects classification performance.
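One simple way to probe the point-density effect named in the last bullet is to thin a dense segment down to a lower density and re-run a classifier on the thinned copy. The sketch below is an illustration under stated assumptions, not the authors' procedure: the function name `thin_to_density`, the known footprint area, and the synthetic points are all hypothetical. Random thinning also does not reproduce real sensor differences such as scan pattern or beam divergence, so it only approximates a comparison between the two systems.

```python
import random


def thin_to_density(points, footprint_area_m2, target_density, seed=0):
    """Randomly subsample points so that count / area is about target_density.

    points: list of (x, y, z) tuples for one tree segment (hypothetical layout).
    footprint_area_m2: horizontal area covered by the segment, assumed known.
    target_density: desired density in points per square meter.
    """
    if footprint_area_m2 <= 0:
        raise ValueError("footprint area must be positive")
    # Never ask for more points than the segment contains.
    n_keep = min(len(points), round(target_density * footprint_area_m2))
    rng = random.Random(seed)
    return rng.sample(points, n_keep)


# Synthetic segment: 4000 points over a 2 m x 2 m footprint (1000 points/m^2).
_rng = random.Random(42)
dense = [
    (_rng.uniform(0, 2), _rng.uniform(0, 2), _rng.uniform(0, 15))
    for _ in range(4000)
]

# Thin to roughly the lower density quoted for the second system.
sparse = thin_to_density(dense, footprint_area_m2=4.0, target_density=35)
print(len(sparse) / 4.0)  # density after thinning, in points/m^2
```

Fixing the seed keeps the thinned sample reproducible, which matters when the same thinned segments are fed to several models for comparison.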
Where Pith is reading between the lines
- Researchers in other forest regions could adapt the field collection techniques to create similar local datasets for species classification.
- Combining this dataset with other remote sensing data types might further enhance classification accuracy for keystone species.
- Future work could test whether the observed advantages hold when applying the models to operational forest inventories at larger scales.
- Public availability of the data lowers barriers for testing new algorithms without needing to collect expensive field-validated samples.
Load-bearing premise
The field data collection produces accurate and unbiased ground truth labels for the tree species that are representative enough for results to apply beyond the specific study area and sensor setups in Southern Finland.
What would settle it
A replication study applying the same models to multispectral ALS data from a different geographic region or forest type that fails to show improved accuracy for small trees and minority species using the point transformer would indicate the claim does not generalize.
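A replication of this kind would need accuracy broken down by tree height and by species rather than a single overall figure. The following is a minimal sketch of both breakdowns; the function names, height bins, and toy labels are hypothetical and are not taken from the paper or the dataset.

```python
from collections import defaultdict


def accuracy_by_height(heights, y_true, y_pred, bin_edges):
    """Return {(lo, hi): accuracy} over trees with lo <= height < hi."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    for h, t, p in zip(heights, y_true, y_pred):
        for lo, hi in bins:
            if lo <= h < hi:
                counts[(lo, hi)] += 1
                hits[(lo, hi)] += int(t == p)
                break  # each tree falls into at most one bin
    return {b: hits[b] / counts[b] for b in counts}


def recall_per_class(y_true, y_pred):
    """Return {species: fraction of that species' trees predicted correctly}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / counts[c] for c in counts}


# Toy example with made-up heights (m), reference labels, and predictions.
heights = [3, 4, 8, 12, 18, 22]
y_true = ["aspen", "birch", "pine", "pine", "spruce", "pine"]
y_pred = ["aspen", "pine", "pine", "pine", "spruce", "spruce"]

by_height = accuracy_by_height(heights, y_true, y_pred, bin_edges=[0, 5, 15, 30])
by_species = recall_per_class(y_true, y_pred)
print(by_height)
print(by_species)
```

Per-class recall is used here because overall accuracy is dominated by the abundant species, so a model can look strong while missing most minority-species trees.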
Original abstract
The shift from stand-level to individual-tree-level forest assessments supports improved biodiversity mapping, particularly in boreal ecosystems where tree species like aspen (Populus tremula L.) play a keystone role. While airborne laser scanning (ALS) is the standard for such inventories, a major limitation is the small number of publicly available ALS datasets containing high-quality, field-validated reference data. Furthermore, open multispectral ALS datasets with high-quality field reference data are completely lacking despite the potential of multispectral ALS data for tree species classification. This paper presents and details an open multispectral ALS dataset used in a recent international benchmarking study of machine learning and deep learning methods for tree species classification by Taher et al. (2026). The dataset comprises 6326 segment-level point clouds of individual trees representing nine species in Southern Finland. The point cloud data has been acquired using two multispectral laser scanning systems each operating at three laser wavelengths: a helicopter-borne system (HeliALS) with a point density exceeding 1000 points/m$^2$ and an Optech Titan system with approximately 35 points/m$^2$. We provide a detailed description of field data collection techniques developed in the study to facilitate the collection of high-quality ground truth data in an efficient and scalable manner. Additionally, our article presents new analyses on species classification using multispectral data building upon the initial findings of Taher et al. (2026). Furthermore, we study the relation between classification accuracy and tree height to highlight the versatility of the open dataset and to demonstrate the advantage of the point transformer model for small trees and minority species.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an open multispectral ALS dataset (MS-ALS-SPECIES) comprising 6326 segment-level point clouds of individual trees from nine species in Southern Finland. Data were acquired with a high-density helicopter-borne HeliALS system (>1000 pts/m²) and an Optech Titan system (~35 pts/m²), both multispectral. It details scalable field collection techniques for ground-truth labels, provides new species-classification analyses using multispectral point clouds, and demonstrates the point transformer's advantage for small trees and minority species.
Significance. If the labels prove reliable, the dataset fills a documented gap in publicly available multispectral ALS reference data for individual-tree classification in boreal forests. It supports benchmarking of ML/DL methods and biodiversity applications, with explicit strengths in open release, dual-sensor coverage, and height-stratified performance analysis.
Major comments (1)
- Field data collection section and classification results: The manuscript describes field techniques intended to produce high-quality ground truth but reports no quantitative validation of species labels (e.g., inter-observer agreement, repeated measurements on a subset, or cross-checks against independent references such as destructive sampling or high-resolution imagery). This is load-bearing for the central claim that the point transformer outperforms other models on small trees and minority species, because even modest label noise in boreal settings could inflate apparent advantages without reflecting genuine discriminative power of the multispectral features.
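Inter-observer agreement, one of the validation options the referee lists, is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a generic illustration with made-up labels from two hypothetical observers; the manuscript reports no such measurement, and `cohens_kappa` is a name chosen here for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two observers labeling the same trees."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of trees given the same species by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each observer's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical class throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for six trees from two hypothetical observers.
observer_1 = ["pine", "pine", "spruce", "birch", "aspen", "birch"]
observer_2 = ["pine", "pine", "spruce", "birch", "birch", "birch"]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.76
```

In this toy case the observers agree on five of six trees (raw agreement 0.83), but kappa is lower because some of that agreement would occur by chance given how often each observer uses each label.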
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential value of the MS-ALS-SPECIES dataset. We address the single major comment below.
- Referee: Field data collection section and classification results: The manuscript describes field techniques intended to produce high-quality ground truth but reports no quantitative validation of species labels (e.g., inter-observer agreement, repeated measurements on a subset, or cross-checks against independent references such as destructive sampling or high-resolution imagery). This is load-bearing for the central claim that the point transformer outperforms other models on small trees and minority species, because even modest label noise in boreal settings could inflate apparent advantages without reflecting genuine discriminative power of the multispectral features.
Authors: We agree that the absence of quantitative label validation metrics is a limitation that should be addressed. The species labels were obtained by trained forestry professionals using visual identification supplemented by existing stand-level inventory records where available; no formal inter-observer agreement study, repeated measurements, or destructive sampling was conducted during the original field campaign. In the revised manuscript we will expand the field data collection section with an explicit discussion of the label acquisition protocol, known sources of uncertainty in boreal species identification, and the implications of potential label noise for the reported classification results. We will also add a short sensitivity analysis showing how the relative performance of the point transformer versus other models holds when restricting the test set to the most confidently labeled trees. These additions will allow readers to better evaluate whether the observed advantages for small trees and minority species reflect genuine multispectral discriminative power.
Revision: yes
Circularity Check
No circularity: dataset release and descriptive analyses are self-contained
The manuscript is a data release paper that describes an open multispectral ALS dataset of 6326 individual-tree point clouds, details field collection protocols for ground-truth labels, and reports new descriptive classification analyses using multispectral features. No equations, fitted parameters, or model outputs are defined in terms of themselves; the classification results are presented as empirical observations on the released data rather than predictions derived from internal fits. The single self-citation to Taher et al. (2026) points to a related benchmarking study that employed this dataset and does not serve as the sole justification for any central claim. All core contributions (dataset description, collection methods, and accuracy-vs-height relations) are externally verifiable through the public data release and do not reduce to inputs by construction.