Highly Detailed and Generalizable Broadleaf Tree Crown Instance Segmentation from UAV Imagery
Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3
The pith
A Mask2Former model trained on 18,507 hand-annotated crown polygons segments individual tree crowns in complex broadleaf forests from UAV RGB imagery and generalizes to other regions and forest types.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training a Mask2Former model on 18,507 manually delineated crown polygons from UAV orthomosaics of seven Japanese forests, the authors achieve high instance segmentation performance for individual tree crowns in structurally complex broadleaf forests using only RGB imagery, with this performance generalizing to other Japanese forests and to tropical rainforests in Borneo.
What carries the argument
A Mask2Former instance segmentation model trained on a large custom dataset of hand-drawn crown outlines.
Load-bearing premise
The 18,507 manually delineated crown polygons from seven Japanese forests provide accurate ground truth and sufficient ecological diversity for the model to generalize to other forests such as Bornean tropical rainforests.
What would settle it
Evaluating the model on UAV imagery from a forest with markedly different tree species, crown structures, or imaging conditions outside the tested regions and observing a substantial drop in segmentation accuracy.
Original abstract
We present a highly detailed instance segmentation model for delineating individual tree crowns in natural broadleaf forests using aerial imagery acquired by unmanned aerial vehicles (UAVs). Tree crown delineation in broadleaf forests is more challenging than in other forest types due to diversity of crown shapes and the lack of clearly defined treetops. To address this issue, we developed a deep-learning-based crown segmentation model trained on high-quality annotated crown outlines. We manually delineated 18,507 crown polygons from orthomosaic images collected across seven forests in Japan by skilled annotators, and developed a model based on Mask2Former with multiple backbone architectures. The best model achieved high segmentation performance in structurally complex broadleaf forests using only RGB imagery. This performance was maintained when applied to geographically distinct forests within Japan, as well as to biologically distinct tropical rainforests in Borneo. These results demonstrate that using a large number of high-quality annotated datasets is critical for achieving detailed and generalizable crown segmentation across diverse forest ecosystems. The developed model has been integrated into DF Scanner Pro, a software that supports practical forest monitoring using UAVs, and this implementation is expected to enable a wide range of users to analyze tree-level information in broadleaf forest from UAVs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a Mask2Former-based instance segmentation model for delineating individual broadleaf tree crowns from UAV RGB orthomosaic imagery. It describes the manual creation of a training dataset consisting of 18,507 crown polygons from seven Japanese forests and reports that the best model achieves high segmentation performance in structurally complex broadleaf forests. This performance is stated to be maintained on geographically distinct Japanese forests and on biologically distinct tropical rainforests in Borneo. The model is integrated into DF Scanner Pro software for practical UAV-based forest monitoring.
Significance. If the reported generalization holds under quantitative scrutiny, the work would represent a meaningful advance in remote-sensing applications for ecology by enabling detailed, individual-tree crown delineation in broadleaf systems using only accessible RGB data. The scale of the annotated dataset and the explicit focus on cross-region transfer constitute clear strengths that could support scalable forest monitoring.
major comments (2)
- [Results (Borneo transfer experiments)] The central generalization claim to Bornean tropical rainforests rests on the assertion that performance is maintained, yet the provided text supplies no quantitative metrics (e.g., AP, mIoU, or boundary F-score) or details on the size and annotation status of the Borneo test set. Without these, the domain-shift robustness cannot be verified and remains a load-bearing gap for the primary empirical contribution.
- [Methods (Dataset construction)] The dataset description indicates all 18,507 polygons originate from seven Japanese forests; the manuscript should explicitly address whether species composition, canopy density, and phenological variability across these sites are sufficient to support transfer to Bornean rainforests, or whether additional controls (e.g., stratified sampling or domain-adaptation baselines) were considered.
minor comments (2)
- [Abstract] The abstract repeatedly uses the phrase 'high segmentation performance' without numerical values; including at least the primary quantitative scores for the best model on the held-out Japanese test sets would improve clarity.
- [Figures] Figure captions and axis labels should be checked for consistency with the reported metrics (e.g., ensuring IoU thresholds are stated when visual results are shown).
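The referee's request for stated IoU thresholds can be made concrete with a small sketch. The code below is not the paper's evaluation code; it is a minimal illustration, assuming masks are represented as sets of (row, col) pixels, of how predicted crowns might be matched one-to-one to annotated crowns at a chosen IoU threshold. The names `mask_iou`, `match_crowns`, and `rect` are hypothetical helpers introduced here. It reports precision, recall, and F1 at a single threshold, which is simpler than AP (AP additionally ranks predictions by confidence).

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # assumption: a crown mask is a set of (row, col) pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_crowns(pred: List[Mask], truth: List[Mask],
                 iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of predictions to annotations by IoU.

    Returns (precision, recall, F1) at the given IoU threshold.
    """
    # All candidate pairs, highest IoU first.
    pairs = sorted(
        ((mask_iou(p, t), i, j)
         for i, p in enumerate(pred)
         for j, t in enumerate(truth)),
        reverse=True,
    )
    used_pred, used_truth = set(), set()
    true_positives = 0
    for iou, i, j in pairs:
        if iou < iou_threshold:
            break  # remaining pairs are all below the threshold
        if i in used_pred or j in used_truth:
            continue  # each crown may be matched only once
        used_pred.add(i)
        used_truth.add(j)
        true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}
```

Because the same predictions can score very differently at IoU 0.5 and IoU 0.8, a figure or table that omits the threshold is hard to interpret, which is the point of the comment above.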
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment below and commit to revisions that will improve the clarity and verifiability of our generalization results and dataset description.
Point-by-point responses
- Referee: [Results (Borneo transfer experiments)] The central generalization claim to Bornean tropical rainforests rests on the assertion that performance is maintained, yet the provided text supplies no quantitative metrics (e.g., AP, mIoU, or boundary F-score) or details on the size and annotation status of the Borneo test set. Without these, the domain-shift robustness cannot be verified and remains a load-bearing gap for the primary empirical contribution.
  Authors: We agree that explicit quantitative support is required to substantiate the claim. The manuscript text states that performance was maintained but does not present the numerical values or test-set details in the main body. In the revised version we will add a dedicated paragraph and table in the Results section reporting AP, mIoU, and boundary F-score on the Borneo data, together with the number of orthomosaic images, the number of annotated crowns, and a statement that annotations followed the identical high-quality protocol used for the Japanese training set. These additions will allow direct verification of domain-shift robustness.
  Revision: yes
- Referee: [Methods (Dataset construction)] The dataset description indicates all 18,507 polygons originate from seven Japanese forests; the manuscript should explicitly address whether species composition, canopy density, and phenological variability across these sites are sufficient to support transfer to Bornean rainforests, or whether additional controls (e.g., stratified sampling or domain-adaptation baselines) were considered.
  Authors: We acknowledge the value of an explicit discussion. In the revised Methods section we will expand the dataset-construction paragraph to describe the range of species compositions (deciduous and evergreen broadleaf taxa), canopy-density gradients, and seasonal phenological conditions represented across the seven Japanese sites. We will explain how this structural and compositional diversity was selected to capture crown-shape variability relevant to tropical broadleaf systems. The original study did not apply domain-adaptation baselines or formal stratified sampling; we will note this limitation and indicate that multi-site collection served as the primary control for generalization.
  Revision: yes
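One way to probe cross-site generalization of the kind discussed in this exchange is a leave-one-site-out split, in which every tile from one site is withheld from training and used only for testing. The sketch below is illustrative and is not described in the paper; the function name `leave_one_site_out`, the site labels, and the tile identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def leave_one_site_out(
    tiles: List[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_site, train_tile_ids, test_tile_ids) for each site.

    `tiles` is a list of (site_name, tile_id) pairs. Sites are visited in
    sorted order so the splits are reproducible.
    """
    by_site: Dict[str, List[str]] = defaultdict(list)
    for site, tile_id in tiles:
        by_site[site].append(tile_id)

    for held_out in sorted(by_site):
        # Training tiles come from every site except the held-out one.
        train = [t for s in sorted(by_site) if s != held_out for t in by_site[s]]
        test = list(by_site[held_out])
        yield held_out, train, test


# Hypothetical example with three sites.
example_tiles = [
    ("site_a", "a1"), ("site_a", "a2"),
    ("site_b", "b1"),
    ("site_c", "c1"),
]
for site, train_ids, test_ids in leave_one_site_out(example_tiles):
    print(site, train_ids, test_ids)
```

Splitting by site rather than by tile keeps spatially adjacent, highly similar tiles from appearing on both sides of the split, so the resulting scores say more about transfer to an unseen forest.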
Circularity Check
No significant circularity; empirical evaluation on held-out data
The paper reports training a Mask2Former instance segmentation model on 18,507 manually delineated crown polygons from seven Japanese forests and evaluating segmentation performance on separate test imagery from other Japanese sites and Bornean tropical forests. No equations, fitted parameters, or self-referential definitions appear in the provided text; the central claims rest on standard supervised learning metrics computed on geographically and biologically distinct held-out data rather than any reduction of outputs to inputs by construction. Self-citations, if present, are not load-bearing for the generalization result, and the evaluation protocol is externally falsifiable via the reported test sets. This is a conventional empirical ML study whose performance numbers are not tautological with the training procedure.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Manual delineations by skilled annotators constitute accurate ground-truth crown boundaries.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We manually delineated 18,507 crown polygons ... developed a model based on Mask2Former with multiple backbone architectures.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The best model achieved high segmentation performance ... maintained when applied to ... tropical rainforests in Borneo.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.