Multispectral airborne laser scanning dataset for tree species classification: MS-ALS-SPECIES

Matti Hyyppä, Klaara Salolahti, Eric Hyyppä, Xiaowei Yu, Josef Taher, Leena Matikainen, Matti Lehtomäki, Paula Litkey, Teemu Hakala, Harri Kaartinen, Juha Hyyppä, Antero Kukko

arxiv: 2604.24370 · v1 · submitted 2026-04-27 · 💻 cs.CV

Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3


keywords: multispectral ALS, tree species classification, point cloud dataset, boreal forest, airborne laser scanning, deep learning, individual tree, point transformer


The pith

An open multispectral airborne laser scanning dataset of 6326 trees demonstrates improved species classification using point transformer models for small and minority species.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper releases a publicly available dataset of multispectral point clouds from individual trees in a boreal forest. The data comes from two laser scanning systems operating at multiple wavelengths, with one providing very high point density. Detailed methods for collecting reliable field reference data for nine tree species are explained to support high-quality labeling. New analyses explore how classification performance varies with tree height and show specific benefits of the point transformer approach. A sympathetic reader would care because such datasets are rare, limiting progress in using laser scanning for detailed forest biodiversity assessments.

Core claim

The authors present an open multispectral ALS dataset comprising 6326 segment-level point clouds of individual trees representing nine species in Southern Finland, acquired using a helicopter-borne system with point density exceeding 1000 points per square meter and an Optech Titan system with about 35 points per square meter. They describe field data collection techniques for high-quality ground truth and provide new analyses on species classification that highlight the advantages of the point transformer model particularly for small trees and minority species.

What carries the argument

The multispectral segment-level point cloud dataset for individual trees, which enables machine learning models like the point transformer to classify tree species more effectively across different sizes and abundances.

If this is right

Supports improved individual-tree-level forest assessments and biodiversity mapping in boreal ecosystems.
Facilitates development and benchmarking of machine learning and deep learning methods for tree species classification from multispectral data.
Demonstrates that point transformer models can achieve better accuracy for small trees and less common species compared to other approaches.
The dataset's dual acquisition systems allow study of how point density affects classification performance.
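The density comparison in the last point can be prototyped with a few lines of code. The sketch below is a minimal illustration, assuming each tree segment is available as a list of (x, y, z) tuples in meters and that density is measured as points per square meter over the convex hull of the horizontal footprint; the paper may define density differently. The helper names `point_density` and `thin_to_density` are hypothetical and not part of the dataset release.

```python
import random


def convex_hull(points_xy):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points_xy))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_density(points):
    """Points per square meter over the convex hull of the XY footprint."""
    area = polygon_area(convex_hull([(p[0], p[1]) for p in points]))
    if area == 0:
        raise ValueError("degenerate footprint: cannot compute a density")
    return len(points) / area


def thin_to_density(points, target_density, seed=0):
    """Randomly subsample so the count matches target_density on the ORIGINAL
    footprint. The realized density can differ slightly if the hull shrinks."""
    current = point_density(points)
    if target_density >= current:
        return list(points)
    keep = max(1, round(len(points) * target_density / current))
    return random.Random(seed).sample(points, keep)
```

Thinning a high-density segment toward the lower density and re-running a classifier is one way to isolate the effect of density alone; random thinning does not emulate any other differences between the two systems.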

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers in other forest regions could adapt the field collection techniques to create similar local datasets for species classification.
Combining this dataset with other remote sensing data types might further enhance classification accuracy for keystone species.
Future work could test whether the observed advantages hold when applying the models to operational forest inventories at larger scales.
Public availability of the data lowers barriers for testing new algorithms without needing to collect expensive field-validated samples.

Load-bearing premise

The field data collection produces accurate and unbiased ground truth labels for the tree species that are representative enough for results to apply beyond the specific study area and sensor setups in Southern Finland.

What would settle it

A replication study applying the same models to multispectral ALS data from a different geographic region or forest type that fails to show improved accuracy for small trees and minority species using the point transformer would indicate the claim does not generalize.

Figures

Figures reproduced from arXiv: 2604.24370 by Antero Kukko, Eric Hyyppä, Harri Kaartinen, Josef Taher, Juha Hyyppä, Klaara Salolahti, Leena Matikainen, Matti Hyyppä, Matti Lehtomäki, Paula Litkey, Teemu Hakala, Xiaowei Yu.

**Figure 1.** (a) Median overall accuracy of tree species classification as a function of the number of species for previous studies using single ...

**Figure 2.** (a) The MS-ALS-SPECIES dataset is based on a study area located in Espoonlahti, Southern Finland. (b) Orthophoto of the study ...

**Figure 3.** (a) The software architecture of the crowdsourcing application developed to enable scalable ground truth collection for tree species ...

**Figure 4.** The workflow for annotating species labels in the field using the crowdsourcing application. The platform can highlight (a) tree ...

**Figure 5.** Statistics of the open MS-ALS-SPECIES dataset based on the final state of the reference data ...

**Figure 6.** Orthographic projections of multispectral point clouds of tree segments in (a) Optech Titan and (b) HeliALS datasets. The top and ...

**Figure 7.** Scatter plots for the mean intensity of the three channels for each tree segment in the multispectral HeliALS and Optech Titan datasets.

**Figure 8.** (a, b) Histogram of segment-wise point densities in the multispectral (a) HeliALS and (b) Optech Titan datasets. (c, d) Histogram of ...

**Figure 9.** (a) Overall accuracy (filled bars) and macro-average accuracy (outlines) of tree species classification across clean tree segments in the ...

**Figure 10.** (a) Overall accuracy on the HeliALS dataset calculated across clean tree segments as a function of tree height for the best performing ...

**Figure 11.** (a) Overall accuracy per species (solid lines) and for all clean tree segments (dashed line) as a function of tree height averaged across ...

**Figure 12.** (a) Overall accuracy for each crown category (solid lines) averaged across all classification methods using the HeliALS dataset as a ...

**Figure 13.** Two examples of segments misclassified by several algorithms. (a) Tree segment A 287 with reference class aspen and misclassified ...

**Figure 14.** (a) Overall accuracy per species (solid lines) and for all clean tree segments (dashed line) as a function of tree height averaged across ...

Original abstract

The shift from stand-level to individual-tree-level forest assessments supports improved biodiversity mapping, particularly in boreal ecosystems where tree species like aspen (Populus tremula L.) play a keystone role. While airborne laser scanning (ALS) is the standard for such inventories, a major limitation is the small number of publicly available ALS datasets containing high-quality, field-validated reference data. Furthermore, open multispectral ALS datasets with high-quality field reference data are completely lacking despite the potential of multispectral ALS data for tree species classification. This paper presents and details an open multispectral ALS dataset used in a recent international benchmarking study of machine learning and deep learning methods for tree species classification by Taher et al. (2026). The dataset comprises 6326 segment-level point clouds of individual trees representing nine species in Southern Finland. The point cloud data has been acquired using two multispectral laser scanning systems each operating at three laser wavelengths: a helicopter-borne system (HeliALS) with a point density exceeding 1000 points/m$^2$ and an Optech Titan system with approximately 35 points/m$^2$. We provide a detailed description of field data collection techniques developed in the study to facilitate the collection of high-quality ground truth data in an efficient and scalable manner. Additionally, our article presents new analyses on species classification using multispectral data building upon the initial findings of Taher et al. (2026). Furthermore, we study the relation between classification accuracy and tree height to highlight the versatility of the open dataset and to demonstrate the advantage of the point transformer model for small trees and minority species.
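The accuracy-versus-height relation mentioned in the abstract amounts to binning test trees by height and computing accuracy within each bin. The following is a minimal sketch, assuming per-tree heights in meters together with reference and predicted labels. The 5 m bin width, the function name `accuracy_by_height`, and the toy labels in the usage example are illustrative assumptions, not the paper's settings or data.

```python
from collections import defaultdict


def accuracy_by_height(heights, y_true, y_pred, bin_width=5.0):
    """Return {bin_lower_edge: (accuracy, n_trees)} for fixed-width height bins."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for height, truth, pred in zip(heights, y_true, y_pred):
        lower_edge = (height // bin_width) * bin_width
        counts[lower_edge] += 1
        hits[lower_edge] += int(truth == pred)
    return {edge: (hits[edge] / counts[edge], counts[edge]) for edge in sorted(counts)}


# Toy usage with made-up values (not taken from the dataset).
heights = [3.0, 4.5, 7.0, 12.0, 13.5, 14.0]
y_true = ["aspen", "birch", "pine", "pine", "spruce", "pine"]
y_pred = ["birch", "birch", "pine", "pine", "spruce", "spruce"]
by_height = accuracy_by_height(heights, y_true, y_pred)
```

Returning the tree count alongside each accuracy makes it easier to see when a bin, typically the smallest or tallest trees, holds too few samples for its accuracy to be meaningful.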

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a useful data release of multispectral ALS point clouds with 6326 labeled trees, but the label accuracy claims rest on unverified field methods.


The main thing here is the open dataset: 6326 segment-level multispectral ALS point clouds for nine tree species in Southern Finland, collected with a high-density helicopter system and the Optech Titan. The authors make the data public and describe scalable field techniques for getting the species labels, which is the real gap they fill since open multispectral ALS resources with solid references have been missing. They also add new height-based classification breakdowns that show the point transformer performing better on small trees and minority species compared to other models from the prior Taher et al. benchmarking work. That part is straightforward and adds some practical insight into where multispectral data helps most.

The soft spot is the lack of any quantitative checks on label quality. The paper talks up the field collection as producing high-quality ground truth, but it gives no numbers on inter-observer agreement, repeated measures, or cross-checks against other references. In boreal settings, small trees and species like aspen are easy to misidentify, so even moderate label noise could inflate the apparent advantages for the point transformer without proving the data features are truly discriminative.

This paper is aimed at remote sensing and forestry researchers who need labeled multispectral point clouds for training classifiers or testing individual-tree methods. A reader working on boreal biodiversity mapping or ALS-based inventories would get direct value from downloading and using the data. It deserves peer review because the resource is new, open, and low-risk to share; reviewers can flag the label validation gap without undermining the dataset's utility. I would send it forward rather than desk reject.

Referee Report

1 major / 0 minor

Summary. The paper presents an open multispectral ALS dataset (MS-ALS-SPECIES) comprising 6326 segment-level point clouds of individual trees from nine species in Southern Finland. Data were acquired with a high-density helicopter-borne HeliALS system (>1000 pts/m²) and an Optech Titan system (~35 pts/m²), both multispectral. It details scalable field collection techniques for ground-truth labels, provides new species-classification analyses using multispectral point clouds, and demonstrates the point transformer's advantage for small trees and minority species.

Significance. If the labels prove reliable, the dataset fills a documented gap in publicly available multispectral ALS reference data for individual-tree classification in boreal forests. It supports benchmarking of ML/DL methods and biodiversity applications, with explicit strengths in open release, dual-sensor coverage, and height-stratified performance analysis.
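For benchmarking on a class-imbalanced dataset, overall accuracy and macro-average accuracy can diverge sharply. The sketch below shows the difference, assuming macro-average accuracy means the unweighted mean of per-class recall; the paper's exact definition may differ. The function name and the toy labels are illustrative only.

```python
from collections import Counter


def overall_and_macro_accuracy(y_true, y_pred):
    """Return (overall accuracy, macro-average accuracy, per-class recall)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    macro = sum(per_class.values()) / len(per_class)
    return overall, macro, per_class


# Toy usage: a dominant class hides the errors on a minority class.
y_true = ["pine"] * 8 + ["aspen"] * 2
y_pred = ["pine"] * 8 + ["pine", "aspen"]
overall, macro, per_class = overall_and_macro_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy is 0.9 while the macro-average is 0.75, because the single error falls on the two-tree minority class.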

major comments (1)

Field data collection section and classification results: The manuscript describes field techniques intended to produce high-quality ground truth but reports no quantitative validation of species labels (e.g., inter-observer agreement, repeated measurements on a subset, or cross-checks against independent references such as destructive sampling or high-resolution imagery). This is load-bearing for the central claim that the point transformer outperforms other models on small trees and minority species, because even modest label noise in boreal settings could inflate apparent advantages without reflecting genuine discriminative power of the multispectral features.
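One of the checks named here, inter-observer agreement, is commonly summarized with Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance. The sketch below is a generic illustration of that statistic for two annotators labeling the same trees; it is not something the manuscript reports, and the function name and toy labels are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: product of the two annotators' marginal label rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy usage with made-up labels for four trees.
annotator_a = ["aspen", "aspen", "pine", "pine"]
annotator_b = ["aspen", "pine", "pine", "pine"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

Here the annotators agree on three of four trees (p_o = 0.75) and chance agreement is 0.5, giving a kappa of 0.5.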

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the MS-ALS-SPECIES dataset. We address the single major comment below.


Referee: Field data collection section and classification results: The manuscript describes field techniques intended to produce high-quality ground truth but reports no quantitative validation of species labels (e.g., inter-observer agreement, repeated measurements on a subset, or cross-checks against independent references such as destructive sampling or high-resolution imagery). This is load-bearing for the central claim that the point transformer outperforms other models on small trees and minority species, because even modest label noise in boreal settings could inflate apparent advantages without reflecting genuine discriminative power of the multispectral features.

Authors: We agree that the absence of quantitative label validation metrics is a limitation that should be addressed. The species labels were obtained by trained forestry professionals using visual identification supplemented by existing stand-level inventory records where available; no formal inter-observer agreement study, repeated measurements, or destructive sampling was conducted during the original field campaign. In the revised manuscript we will expand the field data collection section with an explicit discussion of the label acquisition protocol, known sources of uncertainty in boreal species identification, and the implications of potential label noise for the reported classification results. We will also add a short sensitivity analysis showing how the relative performance of the point transformer versus other models holds when restricting the test set to the most confidently labeled trees. These additions will allow readers to better evaluate whether the observed advantages for small trees and minority species reflect genuine multispectral discriminative power.

Revision: yes
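The sensitivity analysis proposed in this response could take roughly the following shape. This is a sketch under stated assumptions: it presumes each test tree carries a label-confidence score in [0, 1], which is a hypothetical field that the source does not describe, and the model names, threshold, and toy values are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_to_label_confidence(y_true, confidence, predictions, threshold):
    """Compare each model's accuracy on all trees vs. confidently labeled trees.

    predictions maps a model name to its list of predicted labels.
    confidence is a hypothetical per-tree label-confidence score.
    """
    keep = [i for i, score in enumerate(confidence) if score >= threshold]
    if not keep:
        raise ValueError("no trees meet the confidence threshold")
    true_subset = [y_true[i] for i in keep]
    report = {}
    for name, y_pred in predictions.items():
        report[name] = {
            "all": accuracy(y_true, y_pred),
            "confident": accuracy(true_subset, [y_pred[i] for i in keep]),
            "n_confident": len(keep),
        }
    return report


# Toy usage with made-up values.
y_true = ["aspen", "pine", "pine", "aspen"]
confidence = [0.9, 0.95, 0.4, 0.8]
predictions = {
    "model_a": ["aspen", "pine", "aspen", "aspen"],
    "model_b": ["pine", "pine", "pine", "pine"],
}
report = sensitivity_to_label_confidence(y_true, confidence, predictions, threshold=0.75)
```

If the ranking of the models is the same in the "all" and "confident" columns, that is weak evidence that label noise is not driving the comparison; it does not validate the labels themselves.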

Circularity Check

0 steps flagged

No circularity: dataset release and descriptive analyses are self-contained


The manuscript is a data release paper that describes an open multispectral ALS dataset of 6326 individual-tree point clouds, details field collection protocols for ground-truth labels, and reports new descriptive classification analyses using multispectral features. No equations, fitted parameters, or model outputs are defined in terms of themselves; the classification results are presented as empirical observations on the released data rather than predictions derived from internal fits. The single self-citation to Taher et al. (2026) points to a related benchmarking study that employed this dataset and does not serve as the sole justification for any central claim. All core contributions (dataset description, collection methods, and accuracy-vs-height relations) are externally verifiable through the public data release and do not reduce to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper relies on standard remote-sensing data acquisition practices and established machine-learning classification methods; no new free parameters, axioms, or invented entities are introduced in the abstract.

