Pith

arxiv: 2605.15673 · v1 · pith:ZL4FCCFP · submitted 2026-05-15 · 📡 eess.IV · cs.CV · cs.LG

Highly Detailed and Generalizable Broadleaf Tree Crown Instance Segmentation from UAV Imagery

Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3

classification: 📡 eess.IV · cs.CV · cs.LG
keywords: tree crown segmentation · UAV · broadleaf forest · instance segmentation · deep learning · generalization · forest monitoring

The pith

A Mask2Former model trained on 18,507 hand-annotated crown polygons segments individual tree crowns in complex broadleaf forests from UAV RGB imagery and generalizes to other regions and forest types.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve the problem of outlining individual tree crowns in broadleaf forests from UAV photographs, a task made hard by diverse crown shapes and treetops that are difficult to spot. The authors built a large training set by manually outlining 18,507 crowns across seven Japanese forests. They trained Mask2Former models with several backbone architectures and selected the best one. Tests showed strong results in the structurally complex forests, and performance held up in other Japanese forests and in Bornean tropical rainforests. The work demonstrates that large-scale, high-quality annotations are key to making such models generalize, and the model has been integrated into DF Scanner Pro, software for practical UAV-based forest analysis.

Core claim

By training a Mask2Former model on 18,507 manually delineated crown polygons from UAV orthomosaics of seven Japanese forests, the authors achieve high instance segmentation performance for individual tree crowns in structurally complex broadleaf forests using only RGB imagery, with this performance generalizing to other Japanese forests and to tropical rainforests in Borneo.

What carries the argument

A Mask2Former instance segmentation model trained on a large custom dataset of hand-drawn crown outlines.
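Training an instance segmentation model such as Mask2Former on hand-drawn outlines requires turning each crown polygon into a per-instance binary mask. The paper's exact conversion step is not described here, so the following is a minimal sketch under stated assumptions: polygon vertices are already in pixel coordinates of an orthomosaic tile, and a pixel belongs to a crown when its centre falls inside the outline (even-odd rule). The function names `rasterize_polygon` and `polygons_to_instance_masks` are hypothetical.

```python
import numpy as np


def rasterize_polygon(vertices, height, width):
    """Boolean (height, width) mask of pixels whose centres lie inside one polygon.

    vertices: sequence of (x, y) pixel coordinates tracing a crown outline
    (hypothetical input format). Uses the even-odd (ray-casting) rule,
    tested at pixel centres (col + 0.5, row + 0.5).
    """
    poly = np.asarray(vertices, dtype=float)
    # px[r, c] = c + 0.5 and py[r, c] = r + 0.5 (default "xy" indexing).
    px, py = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    inside = np.zeros((height, width), dtype=bool)
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does this edge cross the horizontal line through each pixel centre?
        crosses = (y1 > py) != (y2 > py)
        # x-coordinate where the edge meets that line.
        x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels whose rightward ray passes through the crossing.
        inside ^= crosses & (px < x_cross)
    return inside


def polygons_to_instance_masks(polygons, height, width):
    """Stack one boolean mask per crown polygon into an (N, height, width) array."""
    if not polygons:
        return np.zeros((0, height, width), dtype=bool)
    return np.stack([rasterize_polygon(p, height, width) for p in polygons])


# One square crown on an 8x8 tile.
crowns = [[(2, 2), (6, 2), (6, 6), (2, 6)]]
masks = polygons_to_instance_masks(crowns, 8, 8)
print(masks.shape, int(masks[0].sum()))  # (1, 8, 8) 16
```

In practice a geospatial library would typically handle the map-to-pixel transform and the rasterization itself; the hand-rolled version only makes the polygon-to-mask step explicit.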

Load-bearing premise

The 18,507 manually delineated crown polygons from seven Japanese forests provide accurate ground truth and sufficient ecological diversity for the model to generalize to other forests such as Bornean tropical rainforests.

What would settle it

Evaluating the model on UAV imagery from a forest outside the tested regions, with markedly different tree species, crown structures, or imaging conditions, and observing a substantial drop in segmentation accuracy.

Figures

Figures reproduced from arXiv: 2605.15673 by Kanehiro Kitayama (3, …), Kengo Ikebata (1), Kyaw Kyaw Htoo (3), Masanori Onishi (1, …), Mitsutaka Nakada (1), Robert Ong (6), Ryuichi Takeshige (3, …), Takahiko Ikebata (1), Yuji Mizuno (2), and Yusuke Onoda (3). Affiliations: (1) DeepForest Technologies Co., Ltd.; (2) YM Lab.; (3) Graduate School of Agriculture, Kyoto University; (4) Graduate School of Science, Osaka Metropolitan University; (5) Faculty of Tropical Forestry, Universiti Malaysia Sabah; (6) Forest Research Centre.

Figure 1. Locations of the sites used for model development (training, validation and test) and inference (Sanpoku …
Figure 2. Examples of training data annotation for tree crown delineation. (a, c) Original orthomosaic images. (b, d) …
Figure 3. Examples of tree crown segmentation results in Sanpoku Forest. (a, c) Original orthomosaic images of …
Figure 4. Examples of tree crown segmentation results in Sadayama Forest. (a, c, e) Original orthomosaic images. (b, d, …
Figure 5. Examples of tree crown segmentation results in Borneo Forest. (a, c, e) Original orthomosaic images. (b, d, f) …
Original abstract

We present a highly detailed instance segmentation model for delineating individual tree crowns in natural broadleaf forests using aerial imagery acquired by unmanned aerial vehicles (UAVs). Tree crown delineation in broadleaf forests is more challenging than in other forest types due to the diversity of crown shapes and the lack of clearly defined treetops. To address this issue, we developed a deep-learning-based crown segmentation model trained on high-quality annotated crown outlines. Skilled annotators manually delineated 18,507 crown polygons on orthomosaic images collected across seven forests in Japan, and we developed a model based on Mask2Former with multiple backbone architectures. The best model achieved high segmentation performance in structurally complex broadleaf forests using only RGB imagery. This performance was maintained when the model was applied to geographically distinct forests within Japan, as well as to biologically distinct tropical rainforests in Borneo. These results demonstrate that using large, high-quality annotated datasets is critical for achieving detailed and generalizable crown segmentation across diverse forest ecosystems. The developed model has been integrated into DF Scanner Pro, software that supports practical forest monitoring using UAVs, and this implementation is expected to enable a wide range of users to analyze tree-level information in broadleaf forests from UAV imagery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a Mask2Former-based instance segmentation model for delineating individual broadleaf tree crowns from UAV RGB orthomosaic imagery. It describes the manual creation of a training dataset consisting of 18,507 crown polygons from seven Japanese forests and reports that the best model achieves high segmentation performance in structurally complex broadleaf forests. This performance is stated to be maintained on geographically distinct Japanese forests and on biologically distinct tropical rainforests in Borneo. The model is integrated into DF Scanner Pro software for practical UAV-based forest monitoring.

Significance. If the reported generalization holds under quantitative scrutiny, the work would represent a meaningful advance in remote-sensing applications for ecology by enabling detailed, individual-tree crown delineation in broadleaf systems using only accessible RGB data. The scale of the annotated dataset and the explicit focus on cross-region transfer constitute clear strengths that could support scalable forest monitoring.

major comments (2)
  1. [Results (Borneo transfer experiments)] The central generalization claim to Bornean tropical rainforests rests on the assertion that performance is maintained, yet the provided text supplies no quantitative metrics (e.g., AP, mIoU, or boundary F-score) or details on the size and annotation status of the Borneo test set. Without these, the domain-shift robustness cannot be verified and remains a load-bearing gap for the primary empirical contribution.
  2. [Methods (Dataset construction)] The dataset description indicates all 18,507 polygons originate from seven Japanese forests; the manuscript should explicitly address whether species composition, canopy density, and phenological variability across these sites are sufficient to support transfer to Bornean rainforests, or whether additional controls (e.g., stratified sampling or domain-adaptation baselines) were considered.
minor comments (2)
  1. [Abstract] The abstract describes the results only as 'high segmentation performance', without numerical values; including at least the primary quantitative scores for the best model on the held-out Japanese test sets would improve clarity.
  2. [Figures] Figure captions and axis labels should be checked for consistency with the reported metrics (e.g., ensuring IoU thresholds are stated when visual results are shown).
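The referee's request for AP, mIoU, or boundary F-score, and for stated IoU thresholds, can be made concrete with a small matching routine. The sketch below is a simplified stand-in, not the paper's evaluation code: it matches predicted and reference crown masks one-to-one at a single IoU threshold (0.5 here, an assumption) and reports precision, recall, F1, and the mean IoU of matched pairs. COCO-style AP additionally ranks predictions by confidence and averages over IoU thresholds from 0.50 to 0.95, and a boundary F-score compares outlines rather than areas; neither is implemented here. The name `match_crowns` is hypothetical.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0


def match_crowns(pred_masks, gt_masks, iou_thr=0.5):
    """Greedy one-to-one matching of predicted and reference crown masks.

    Candidate pairs with IoU >= iou_thr are accepted in descending IoU order,
    each mask being used at most once. Returns
    (precision, recall, f1, mean IoU of matched pairs).
    """
    pairs = []
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            iou = mask_iou(p, g)
            if iou >= iou_thr:
                pairs.append((iou, i, j))
    pairs.sort(reverse=True)  # highest IoU first
    used_pred, used_gt, matched = set(), set(), []
    for iou, i, j in pairs:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            matched.append(iou)
    tp = len(matched)
    precision = tp / len(pred_masks) if len(pred_masks) else 0.0
    recall = tp / len(gt_masks) if len(gt_masks) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    mean_iou = sum(matched) / tp if tp else 0.0
    return precision, recall, f1, mean_iou
```

Reporting these numbers separately for each held-out Japanese site and for the Borneo data, with the threshold stated, would let a reader check directly whether performance is maintained under domain shift.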

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below and commit to revisions that will improve the clarity and verifiability of our generalization results and dataset description.

Point-by-point responses
  1. Referee: [Results (Borneo transfer experiments)] The central generalization claim to Bornean tropical rainforests rests on the assertion that performance is maintained, yet the provided text supplies no quantitative metrics (e.g., AP, mIoU, or boundary F-score) or details on the size and annotation status of the Borneo test set. Without these, the domain-shift robustness cannot be verified and remains a load-bearing gap for the primary empirical contribution.

    Authors: We agree that explicit quantitative support is required to substantiate the claim. The manuscript text states that performance was maintained but does not present the numerical values or test-set details in the main body. In the revised version we will add a dedicated paragraph and table in the Results section reporting AP, mIoU, and boundary F-score on the Borneo data, together with the number of orthomosaic images, the number of annotated crowns, and a statement that annotations followed the identical high-quality protocol used for the Japanese training set. These additions will allow direct verification of domain-shift robustness. revision: yes

  2. Referee: [Methods (Dataset construction)] The dataset description indicates all 18,507 polygons originate from seven Japanese forests; the manuscript should explicitly address whether species composition, canopy density, and phenological variability across these sites are sufficient to support transfer to Bornean rainforests, or whether additional controls (e.g., stratified sampling or domain-adaptation baselines) were considered.

    Authors: We acknowledge the value of an explicit discussion. In the revised Methods section we will expand the dataset-construction paragraph to describe the range of species compositions (deciduous and evergreen broadleaf taxa), canopy-density gradients, and seasonal phenological conditions represented across the seven Japanese sites. We will explain how the sites were selected to span this structural and compositional diversity and to capture crown-shape variability relevant to tropical broadleaf systems. The original study did not apply domain-adaptation baselines or formal stratified sampling; we will note this limitation and state that multi-site collection served as the primary control for generalization. revision: yes
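The rebuttal notes that no formal stratified sampling was applied and that multi-site collection served as the main control for generalization. One common way to make that control explicit is a leave-one-site-out split, in which each fold holds out every tile from one forest. The sketch below assumes tiles are tagged with a site name; the function name, tile IDs, and site labels are hypothetical, and the paper is not stated to have used this protocol. scikit-learn's `LeaveOneGroupOut` implements the same idea for array inputs.

```python
from collections import defaultdict


def leave_one_site_out(tiles):
    """Yield (held_out_site, train_ids, test_ids), one fold per site.

    tiles: iterable of (tile_id, site_name) pairs. Every tile from the
    held-out site goes to the test set, so no forest contributes to both
    training and testing within a fold.
    """
    by_site = defaultdict(list)
    for tile_id, site in tiles:
        by_site[site].append(tile_id)
    sites = sorted(by_site)
    for held_out in sites:
        test_ids = list(by_site[held_out])
        train_ids = [t for s in sites if s != held_out for t in by_site[s]]
        yield held_out, train_ids, test_ids


# Hypothetical tiles from three forests.
tiles = [("t1", "site_A"), ("t2", "site_A"), ("t3", "site_B"), ("t4", "site_C")]
for site, train, test in leave_one_site_out(tiles):
    print(site, train, test)
```

Splitting by site rather than by random tile keeps neighbouring, near-duplicate tiles of the same canopy from appearing on both sides of a fold, which would otherwise inflate apparent generalization.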

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation on held-out data

Full rationale

The paper reports training a Mask2Former instance segmentation model on 18,507 manually delineated crown polygons from seven Japanese forests and evaluating segmentation performance on separate test imagery from other Japanese sites and Bornean tropical forests. No equations, fitted parameters, or self-referential definitions appear in the provided text; the central claims rest on standard supervised learning metrics computed on geographically and biologically distinct held-out data rather than any reduction of outputs to inputs by construction. Self-citations, if present, are not load-bearing for the generalization result, and the evaluation protocol is externally falsifiable via the reported test sets. This is a conventional empirical ML study whose performance numbers are not tautological with the training procedure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the quality and representativeness of the human annotations plus standard deep-learning assumptions; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption: Manual delineations by skilled annotators constitute accurate ground-truth crown boundaries.
    All training and evaluation depend on these polygons as the reference standard.

pith-pipeline@v0.9.0 · 5885 in / 1271 out tokens · 45500 ms · 2026-05-19T19:22:56.192569+00:00 · methodology


