SelectAnyTree: A Promptable Instance Segmentation Model for 3D Forest LiDAR Point Clouds

Daniel Lusk; Duc Viet Le; Ichiro Ide; Janusch Vajna-Jehle; Julian Frey; Kilian Gerberding; Phi Le Nguyen; Takahiro Komamizu; Teja Kattenborn; Trung Thanh Nguyen

arxiv: 2606.27491 · v1 · pith:XHPDJ3POnew · submitted 2026-06-25 · 💻 cs.CV

SelectAnyTree: A Promptable Instance Segmentation Model for 3D Forest LiDAR Point Clouds

Trung Thanh Nguyen , Daniel Lusk , Kilian Gerberding , Janusch Vajna-Jehle , Tuan-Anh Vu , Duc Viet Le , Tu Vo , Phi Le Nguyen

show 5 more authors

Yasutomo Kawanishi Takahiro Komamizu Ichiro Ide Julian Frey Teja Kattenborn

This is my paper

Pith reviewed 2026-06-29 02:07 UTC · model grok-4.3

classification 💻 cs.CV

keywords instance segmentationLiDAR point cloudspromptable modelsforest monitoringinteractive segmentationtree delineation3D point clouds

0 comments

The pith

SelectAnyTree segments any tree in a 3D forest LiDAR point cloud from a single click at 78.2 IoU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a model that lets users outline individual trees in large forest point clouds by supplying only a few clicks instead of drawing full labels. It tackles the high cost of creating training data for forest monitoring, where one hectare can contain millions of points across overlapping crowns. A click-to-query encoder converts each click into a prompt that carries position and positive or negative polarity, while a Canopy Height Model supplies an automatic first prompt at likely treetops. These prompts feed a state-space decoder that produces the tree mask in linear time.

Core claim

SelectAnyTree converts user clicks into single content queries through a dedicated encoder that pools local backbone features with 3D position and polarity, supplies an initial prompt from a Canopy Height Model without user input, and decodes the resulting query into one tree mask via a state-space query decoder that handles long-range context in large scenes.

What carries the argument

Click-to-query prompt encoder that produces one content query per click together with a Canopy Height Model-guided first prompt and state-space query decoder.

If this is right

Interactive correction of automatic pre-segmentations becomes feasible at scale.
Annotation effort per hectare falls because each tree requires only the fewest clicks to reach target accuracy.
Linear-time decoding supports processing of hectare-scale scenes without quadratic cost growth.
Fewer parameters and shorter inference time allow the model to run on standard hardware for field use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same click-to-query design could be adapted to segment other point-cloud objects such as individual buildings or vehicles.
Combining the CHM-guided prompt with multi-temporal LiDAR might enable change detection of tree growth or removal.
Deployment on mobile devices could support on-site verification during forest inventories.

Load-bearing premise

The CHM-guided first prompt and click-to-query encoder will reliably produce usable queries across unseen forest structures and sensor characteristics without domain-specific retraining.

What would settle it

A measurable drop in single-click IoU below the strongest baseline when the model is tested on a new forest region recorded with a different LiDAR sensor and tree species mix.

Figures

Figures reproduced from arXiv: 2606.27491 by Daniel Lusk, Duc Viet Le, Ichiro Ide, Janusch Vajna-Jehle, Julian Frey, Kilian Gerberding, Phi Le Nguyen, Takahiro Komamizu, Teja Kattenborn, Trung Thanh Nguyen, Tuan-Anh Vu, Tu Vo, Yasutomo Kawanishi.

**Figure 1.** Figure 1: Overview of the proposed SelectAnyTree. A forest point cloud is encoded once by the 3D Scene Encoder into reusable voxel features. Each click set is converted by the Prompt Encoder into a single content query that the Query Decoder turns into one tree mask. Training and evaluation share the same Interactive Rollout, with a Canopy Height Model (CHM) treetop as a free, geometry-guided prompt. part segmentati… view at source ↗

**Figure 2.** Figure 2: Interactive multi-click refinement of the proposed SelectAnyTree on a NIBIO (Norway) [ [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Single-click tree segmentation. Comparison of the three strongest promptable baselines, AGILE3D [ [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Promptable tree segmentation on FOR-instanceV2 [ [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Single-click tree segmentation. Comparison of the three strongest promptable baselines, AGILE3D [ [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗

read the original abstract

Automated instance segmentation of forest LiDAR point clouds is increasingly critical as forest monitoring moves toward scalable, detailed, 3D measurement. Yet, progress is constrained by label scarcity for tree instances; a single hectare can hold millions of points and hundreds of overlapping, complex crowns, making manual annotation from scratch with raw data laborious and error-prone. Annotations are often corrected from automatic pre-segmentations, but remain costly as these provide no interactive or AI-assisted refinement. Inspired by the promptable paradigm of foundation segmentation models, we propose SelectAnyTree, a promptable instance segmentation model that delineates any individual tree in a 3D forest point cloud from a few clicks. It introduces two key components: Click-to-query prompt encoder and Canopy Height Model (CHM)-guided first prompt. The former turns each click into a single content query, encoding its 3D position and positive/negative polarity together with a pooled local backbone feature. The latter provides treetops as a geometry- and ecologically guided first prompt without any user input. The resulting prompt query is then decoded into one tree mask by a state-space query decoder to efficiently capture long-range context in large-scale forest scenes with linear-time complexity. We evaluate SelectAnyTree in interactive and instance-level settings across seven diverse forest regions and an independent held-out test dataset, demonstrating strong generalization beyond the training domains. It segments a target tree to 78.2 Intersection over Union (IoU) from a single click, 24.8 points above the strongest promptable baseline, and reaches every accuracy target with the fewest clicks, while using far fewer parameters and less inference time than prior promptable models. The source code is available at https://github.com/thanhhff/SelectAnyTree.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SelectAnyTree adds a click-to-query encoder and CHM-guided prompt to make promptable segmentation practical on forest LiDAR, with clear held-out gains but thin evidence on cross-sensor robustness.

read the letter

The core contribution is a domain-adapted prompt encoder that turns clicks into queries by combining 3D position, polarity, and pooled local features, paired with an automatic CHM treetop prompt to bootstrap without input. A state-space decoder then produces the mask efficiently. This setup delivers 78.2 IoU from one click on the reported data, 24.8 points above the strongest baseline, and reaches accuracy targets with fewer clicks while using fewer parameters and less inference time.

The evaluation covers seven regions plus an independent held-out set, which is a step above single-site papers. Code release is a plus for checking the numbers. The architecture choices are straightforward and tied to the forest point-cloud constraints, such as long-range context in large scenes.

The soft spot is the generalization claim. The single-click result and the statement about working beyond training domains both rest on the CHM providing reliable first prompts and the encoder transferring across sensors or canopy structures. The paper does not include explicit ablations on CHM failure modes under varying point density, closure, or noise, so that precondition remains untested in the reported experiments.

This work is aimed at remote-sensing and ecology groups that already deal with LiDAR tree mapping and want interactive refinement tools. Readers outside that niche will see incremental domain engineering rather than a broad advance. It is coherent on its own terms and reports reproducible metrics on held-out data, so it deserves a serious referee with requests for the missing robustness checks.

Referee Report

2 major / 2 minor

Summary. SelectAnyTree is a promptable instance segmentation model for 3D forest LiDAR point clouds. It converts user clicks into content queries via a Click-to-query prompt encoder (encoding 3D position, positive/negative polarity, and pooled local backbone features) and augments them with an automatic CHM-guided first prompt for treetop detection. These queries are decoded into individual tree masks using a state-space query decoder with linear-time complexity. The model is evaluated interactively and at instance level on seven diverse forest regions plus an independent held-out test set, reporting 78.2 IoU from a single click (24.8 points above the strongest promptable baseline), fewer clicks to reach all accuracy targets, and advantages in parameter count and inference time over prior models. Source code is released.

Significance. If the performance numbers and cross-domain generalization hold, the work offers a practical advance for scalable forest monitoring by enabling efficient interactive refinement of tree instances in large, complex point clouds where manual annotation is costly. The state-space decoder's linear complexity addresses a key scalability issue for hectare-scale scenes, and the public code release supports reproducibility.

major comments (2)

[§5] §5 (Experiments and held-out evaluation): The central claims of 78.2 IoU from one click and strong generalization beyond training domains rest on the CHM-guided first prompt and click-to-query encoder producing reliable queries across unseen sensors/structures; however, no explicit cross-sensor ablation, CHM peak-detection failure analysis under varying point densities or canopy closure, or sensitivity study is reported, leaving the single-click result and generalization statement insufficiently secured.
[Method] Method (Click-to-query prompt encoder description): The integration of the pooled local backbone feature with 3D position and polarity into the query is described at a high level without specifying pooling radius, feature dimension, or exact concatenation/embedding steps; this detail is load-bearing for the query quality that underpins the reported 24.8-point IoU gain and fewest-clicks result.

minor comments (2)

[Abstract] Abstract and §3: The phrase 'state-space query decoder' is introduced without a short parenthetical reference to the underlying SSM formulation or prior work on first mention, which may slow readers outside the immediate subfield.
[Figures] Figure 3/4 captions (qualitative results): Adding the specific forest region, sensor type, and point density for each example would improve interpretability of the generalization evidence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and additional analyses.

read point-by-point responses

Referee: [§5] §5 (Experiments and held-out evaluation): The central claims of 78.2 IoU from one click and strong generalization beyond training domains rest on the CHM-guided first prompt and click-to-query encoder producing reliable queries across unseen sensors/structures; however, no explicit cross-sensor ablation, CHM peak-detection failure analysis under varying point densities or canopy closure, or sensitivity study is reported, leaving the single-click result and generalization statement insufficiently secured.

Authors: We agree that the generalization claims would be more robust with explicit supporting analyses. The current held-out test set already evaluates performance on data from domains distinct from training, but we will add the requested elements to the revised §5: a cross-sensor ablation partitioning results by LiDAR sensor, a failure-mode analysis of CHM peak detection stratified by point density and canopy closure, and a sensitivity study on CHM-guided prompt parameters. These additions will directly support the single-click IoU and cross-domain results. revision: yes
Referee: [Method] Method (Click-to-query prompt encoder description): The integration of the pooled local backbone feature with 3D position and polarity into the query is described at a high level without specifying pooling radius, feature dimension, or exact concatenation/embedding steps; this detail is load-bearing for the query quality that underpins the reported 24.8-point IoU gain and fewest-clicks result.

Authors: We agree that the prompt encoder description requires additional implementation specifics to allow full reproduction of the reported gains. In the revised method section we will explicitly state the pooling radius for local backbone feature extraction, the feature dimension, and the precise concatenation and embedding operations that combine 3D position, polarity, and the pooled features into each content query. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on held-out data with independent validation

full rationale

The paper introduces architectural components (Click-to-query prompt encoder, CHM-guided first prompt, state-space query decoder) and reports performance metrics such as 78.2 IoU from a single click on seven diverse regions plus an independent held-out test set. No equations, fitted parameters, or self-citations reduce these metrics to quantities defined or fitted on the same test distributions by construction. The derivation chain consists of model design choices followed by external empirical evaluation without self-referential reductions or load-bearing self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central performance claims rest on standard supervised training of a deep network on labeled forest point clouds and the assumption that the provided evaluation datasets are representative of real-world deployment conditions. No additional free parameters, axioms, or invented entities are introduced beyond conventional deep-learning practice.

axioms (1)

domain assumption Labeled training data for tree instances is available and sufficiently representative of target deployment forests.
The model is trained on annotated forest point clouds; performance on held-out regions is reported but the abstract does not detail how labels were obtained or their quality.

pith-pipeline@v0.9.1-grok · 5913 in / 1202 out tokens · 33449 ms · 2026-06-29T02:07:26.690465+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Terrestrial laser scanning in forest ecology: Expanding the horizon.Re- mote Sensing of Environment, 251(112102):1–17, 2020

Kim Calders, Jennifer Adams, John Armston, Harm Bartholomeus, Sebastien Bauwens, Lisa Patrick Bentley, Jerome Chave, F Mark Danson, Miro Demol, Mathias Dis- ney, Rachel Gaulton, Sruthi M Krishna Moorthy, Shaun R Levick, Ninni Saarinen, Crystal Schaaf, Atticus Stovall, Louise Terryn, Phil Wilkes, and Hans Verbeeck. Terrestrial laser scanning in forest ecol...

2020
[2]

End-to- end object detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to- end object detection with Transformers. InProceedings of the 16th European Conference on Computer Vision,Part I, pages 213–229, 2020. 2

2020
[3]

Masked-attention mask Transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexan- der Kirillov, and Rohit Girdhar. Masked-attention mask Transformer for universal image segmentation. InProceed- ings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022. 2

2022
[4]

4D spatio-temporal ConvNets: Minkowski convolutional neu- ral networks

Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4D spatio-temporal ConvNets: Minkowski convolutional neu- ral networks. InProceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, pages 3075– 3084, 2019. 2

2019
[5]

Se- mantic instance segmentation for autonomous driving

Bert De Brabandere, Davy Neven, and Luc Van Gool. Se- mantic instance segmentation for autonomous driving. In Proceedings of the 2017 IEEE Conference on Computer Vi- sion and Pattern Recognition Workshops, pages 478–480,

2017
[6]

Close-range remote sensing of forest structure for biodiver- sity assessments: A systematic literature review.Current Forestry Reports, 11(1):1–18, 2025

Jan Feigl, Julian Frey, Thomas Seifert, and Barbara Koch. Close-range remote sensing of forest structure for biodiver- sity assessments: A systematic literature review.Current Forestry Reports, 11(1):1–18, 2025. 1

2025
[7]

3D semantic segmentation with submanifold sparse convolutional networks

Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3D semantic segmentation with submanifold sparse convolutional networks. InProceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, pages 9224–9232, 2018. 2

2018
[8]

Mamba: Linear-time sequence modeling with selective state spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. InProceedings of the 2024 International Conference on Learning Representations, pages 1–32, 2024. 2, 3

2024
[9]

TreeLearn: A deep learn- ing method for segmenting individual trees from ground- based LiDAR forest point clouds.Ecological Informatics, 84(102888):1–16, 2024

Jonathan Henrich, Jan van Delden, Dominik Seidel, Thomas Kneib, and Alexander S Ecker. TreeLearn: A deep learn- ing method for segmenting individual trees from ground- based LiDAR forest point clouds.Ecological Informatics, 84(102888):1–16, 2024. 1, 2, 5, 7

2024
[10]

Allometric equations for integrating remote sensing imagery into forest monitor- ing programmes.Global Change Biology, 23(1):177–190,

Tommaso Jucker, John Caspersen, J ´erˆome Chave, C ´ecile Antin, Nicolas Barbier, Frans Bongers, Michele Dalponte, Karin Y van Ewijk, David I Forrester, Matthias Haeni, Steven I Higgins, Robert J Holdaway, Yoshiko Iida, Craig Lorimer, Peter L Marshall, St ´ephane Momo, Glenn R Mon- crieff, Pierre Ploton, Lourens Poorter, Kassim Abd Rah- man, Michael Schlu...
[11]

Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.ISPRS Journal of Photogramme- try and Remote Sensing, 173:24–49, 2021

Teja Kattenborn, Jens Leitloff, Felix Schiefer, and Stefan Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.ISPRS Journal of Photogramme- try and Remote Sensing, 173:24–49, 2021. 1, 8

2021
[12]

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross Girshick. Segment anything. InProceedings of the 19th IEEE/CVF International Conference on Computer Vi- sion, pages 4015–4026, 2023. 2, 4, 11

2023
[13]

OneFormer3D: One Transformer for unified point cloud segmentation

Maxim Kolodiazhnyi, Anna V orontsova, Anton Konushin, and Danila Rukhovich. OneFormer3D: One Transformer for unified point cloud segmentation. InProceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 20943–20953, 2024. 1, 2, 5, 7, 14

2024
[14]

Probabilistic interactive 3D segmentation with hierarchical neural processes

Jie Liu, Pan Zhou, Zehao Xiao, Jiayi Shen, Wenzhe Yin, Jan- jakob Sonke, and Efstratios Gavves. Probabilistic interactive 3D segmentation with hierarchical neural processes. InPro- ceedings of the 42nd International Conference on Machine Learning, pages 1–20, 2025. 2, 5, 6, 7, 12, 13, 15

2025
[15]

VMamba: Visual state space model.Advances in Neural Information Processing Systems, 37:103031–103063, 2024

Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. VMamba: Visual state space model.Advances in Neural Information Processing Systems, 37:103031–103063, 2024. 2

2024
[16]

Decoupled weight de- cay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InProceedings of the 2019 International Conference on Learning Representations, pages 1–19, 2019. 5, 13, 14

2019
[17]

V-Net: Fully convolutional neural networks for volumetric medical image segmentation

Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. InProceedings of the 4th In- ternational Conference on 3D Vision, pages 565–571, 2016. 5, 11

2016
[18]

Individual tree crown delineation using multispectral LiDAR data.Sensors, 19(24):1–21, 2019

Faizaan Naveed, Baoxin Hu, Jianguo Wang, and G Brent Hall. Individual tree crown delineation using multispectral LiDAR data.Sensors, 19(24):1–21, 2019. 2

2019
[19]

ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation

Trung Thanh Nguyen, Tuan-Anh Vu, Duc Viet Le, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide, and Teja Katten- born. ForestMamba: Sparse Mamba with geometry-guided queries for 3D forest point cloud segmentation.Computing Research Repository, arXiv Preprints,arXiv:2606.01549, pages 1–18, 2026. 1, 2, 3, 4, 5, 7, 12, 14

work page internal anchor Pith review Pith/arXiv arXiv 2026
[20]

Estimating plot- level tree heights with LiDAR: Local filtering with a canopy- height based variable window size.Computers and Electron- ics in Agriculture, 37:71–95, 2002

Sorin C Popescu and Randolph H Wynne. Estimating plot- level tree heights with LiDAR: Local filtering with a canopy- height based variable window size.Computers and Electron- ics in Agriculture, 37:71–95, 2002. 2, 3, 4 9

2002
[21]

PointNet: Deep learning on point sets for 3D classification and segmentation

Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. InProceedings of the 2017 IEEE Con- ference on Computer Vision and Pattern Recognition, pages 652–660, 2017. 2

2017
[22]

PointNet++: Deep hierarchical feature learning on point sets in a metric space.Advances in Neural Information Processing Systems, 30:5105–5114, 2017

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space.Advances in Neural Information Processing Systems, 30:5105–5114, 2017. 2

2017
[23]

U- Net: Convolutional networks for biomedical image segmen- tation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- Net: Convolutional networks for biomedical image segmen- tation. InProceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Inter- vention, Part III, pages 234–241, 2015. 2

2015
[24]

Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas

Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas. KPConv: Flexible and deformable convolution for point clouds. InProceedings of the 17th IEEE/CVF Inter- national Conference on Computer Vision, pages 6411–6420,
[25]

LAUTx — Individual tree point clouds from Austrian forest inventory plots.Zenodo, 2022

A Tockner, C Gollob, T Ritter, and A Noth- durft. LAUTx — Individual tree point clouds from Austrian forest inventory plots.Zenodo, 2022. https://doi.org/10.5281/zenodo.6560112. 5, 6, 8, 12

work page doi:10.5281/zenodo.6560112 2022
[26]

Attention is all you need.Advances in Neural Information Processing Systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in Neural Information Processing Systems, 30, 2017. 2

2017
[27]

Test-time augmentation for 3D point cloud classification and segmentation

Tuan Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh Son Hua, and Sai Kit Yeung. Test-time augmentation for 3D point cloud classification and segmentation. InProceedings of the 2024 International Conference on 3D Vision, pages 1543–1553, 2024. 2

2024
[28]

Re- mote sensing technologies for enhancing forest inventories: A review.Canadian Journal of Remote Sensing, 42(5):619– 641, 2016

Joanne C White, Nicholas C Coops, Michael A Wulder, Mikko Vastaranta, Thomas Hilker, and Piotr Tompalski. Re- mote sensing technologies for enhancing forest inventories: A review.Canadian Journal of Remote Sensing, 42(5):619– 641, 2016. 1

2016
[29]

SegmentAnyTree: A sen- sor and platform agnostic deep learning model for tree seg- mentation using any 3D point cloud data.Remote Sensing of Environment, 313(114367):1–13, 2024

Maciej Wielgosz, Stefano Puliti, Binbin Xiang, Konrad Schindler, and Rasmus Astrup. SegmentAnyTree: A sen- sor and platform agnostic deep learning model for tree seg- mentation using any 3D point cloud data.Remote Sensing of Environment, 313(114367):1–13, 2024. 2

2024
[30]

Maciej Wielgosz, Stefano Puliti, and Rasmus Astrup. SegmentAnyTreeV2: Scaling Transformer-based tree instance segmentation across sensors, platforms, and forests.Computing Research Repository, arXiv Preprints, arXiv:2606.08206, pages 1–20, 2026. 2

work page internal anchor Pith review Pith/arXiv arXiv 2026
[31]

Point Transformer V2: Grouped vector atten- tion and partition-based pooling.Advances in Neural Infor- mation Processing Systems, 35:33330–33342, 2022

Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Heng- shuang Zhao. Point Transformer V2: Grouped vector atten- tion and partition-based pooling.Advances in Neural Infor- mation Processing Systems, 35:33330–33342, 2022. 2

2022
[32]

Point Transformer V3: Simpler, faster, stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xi- hui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point Transformer V3: Simpler, faster, stronger. In Proceedings of the 2024 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 4840–4851,

2024
[33]

Automated forest inventory: Analysis of high- density airborne LiDAR point clouds with 3D deep learning

Binbin Xiang, Maciej Wielgosz, Theodora Kontogianni, Tor- ben Peters, Stefano Puliti, Rasmus Astrup, and Konrad Schindler. Automated forest inventory: Analysis of high- density airborne LiDAR point clouds with 3D deep learning. Remote Sensing of Environment, 305(114078):1–20, 2024. 1, 2, 5, 7

2024
[34]

Forest- Former3D: A unified framework for end-to-end segmenta- tion of forest LiDAR 3D point clouds

Binbin Xiang, Maciej Wielgosz, Stefano Puliti, Kamil Kr ´al, Martin Kr˚uˇcek, Azim Missarov, and Rasmus Astrup. Forest- Former3D: A unified framework for end-to-end segmenta- tion of forest LiDAR 3D point clouds. InProceedings of the 20th IEEE/CVF International Conference on Computer Vi- sion, pages 24717–24727, 2025. 1, 2, 3, 5, 6, 7, 8, 12, 14, 15, 16

2025
[35]

Lei Yao, Yi Wang, Yawen Cui, Moyun Liu, and Lap-Pui Chau. LaSSM: Efficient semantic-spatial query decoding via Local aggregation and State Space Models for 3D instance segmentation.IEEE Transactions on Circuits and Systems for Video Technology, pages 1–13, 2026. 2, 4

2026
[36]

AGILE3D: Attention guided interactive multi- object 3D segmentation

Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult, Francis Engelmann, Bastian Leibe, Konrad Schindler, and Theodora Kontogianni. AGILE3D: Attention guided interactive multi- object 3D segmentation. InProceedings of the 12th Interna- tional Conference on Learning Representations, pages 1–21,
[37]

2, 5, 6, 7, 8, 12, 13, 15, 17
[38]

Point Transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and Vladlen Koltun. Point Transformer. InProceedings of the 18th IEEE/CVF International Conference on Computer Vi- sion, pages 16259–16268, 2021. 2

2021
[39]

Point-SAM: Promptable 3D segmentation model for point clouds

Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xi- ang, and Hao Su. Point-SAM: Promptable 3D segmentation model for point clouds. InProceedings of the 13th Interna- tional Conference on Learning Representations, pages 1–20,
[40]

2, 4, 5, 6, 7, 8, 11, 12, 13, 15, 17
[41]

PartSAM: A scalable promptable part segmentation model trained on native 3D data

Zhe Zhu, Le Wan, Rui Xu, Yiheng Zhang, Honghua Chen, Zhiyang Dou, Cheng Lin, Yuan Liu, and Mingqiang Wei. PartSAM: A scalable promptable part segmentation model trained on native 3D data. InProceedings of the 14th In- ternational Conference on Learning Representations, pages 1–25, 2026. 2, 5, 6, 7, 8, 12, 14, 15, 17 10 A. Promptable Training Loss SelectAn...

2026

[1] [1]

Terrestrial laser scanning in forest ecology: Expanding the horizon.Re- mote Sensing of Environment, 251(112102):1–17, 2020

Kim Calders, Jennifer Adams, John Armston, Harm Bartholomeus, Sebastien Bauwens, Lisa Patrick Bentley, Jerome Chave, F Mark Danson, Miro Demol, Mathias Dis- ney, Rachel Gaulton, Sruthi M Krishna Moorthy, Shaun R Levick, Ninni Saarinen, Crystal Schaaf, Atticus Stovall, Louise Terryn, Phil Wilkes, and Hans Verbeeck. Terrestrial laser scanning in forest ecol...

2020

[2] [2]

End-to- end object detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to- end object detection with Transformers. InProceedings of the 16th European Conference on Computer Vision,Part I, pages 213–229, 2020. 2

2020

[3] [3]

Masked-attention mask Transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexan- der Kirillov, and Rohit Girdhar. Masked-attention mask Transformer for universal image segmentation. InProceed- ings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1290–1299, 2022. 2

2022

[4] [4]

4D spatio-temporal ConvNets: Minkowski convolutional neu- ral networks

Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4D spatio-temporal ConvNets: Minkowski convolutional neu- ral networks. InProceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, pages 3075– 3084, 2019. 2

2019

[5] [5]

Se- mantic instance segmentation for autonomous driving

Bert De Brabandere, Davy Neven, and Luc Van Gool. Se- mantic instance segmentation for autonomous driving. In Proceedings of the 2017 IEEE Conference on Computer Vi- sion and Pattern Recognition Workshops, pages 478–480,

2017

[6] [6]

Close-range remote sensing of forest structure for biodiver- sity assessments: A systematic literature review.Current Forestry Reports, 11(1):1–18, 2025

Jan Feigl, Julian Frey, Thomas Seifert, and Barbara Koch. Close-range remote sensing of forest structure for biodiver- sity assessments: A systematic literature review.Current Forestry Reports, 11(1):1–18, 2025. 1

2025

[7] [7]

3D semantic segmentation with submanifold sparse convolutional networks

Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3D semantic segmentation with submanifold sparse convolutional networks. InProceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, pages 9224–9232, 2018. 2

2018

[8] [8]

Mamba: Linear-time sequence modeling with selective state spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. InProceedings of the 2024 International Conference on Learning Representations, pages 1–32, 2024. 2, 3

2024

[9] [9]

TreeLearn: A deep learn- ing method for segmenting individual trees from ground- based LiDAR forest point clouds.Ecological Informatics, 84(102888):1–16, 2024

Jonathan Henrich, Jan van Delden, Dominik Seidel, Thomas Kneib, and Alexander S Ecker. TreeLearn: A deep learn- ing method for segmenting individual trees from ground- based LiDAR forest point clouds.Ecological Informatics, 84(102888):1–16, 2024. 1, 2, 5, 7

2024

[10] [10]

Allometric equations for integrating remote sensing imagery into forest monitor- ing programmes.Global Change Biology, 23(1):177–190,

Tommaso Jucker, John Caspersen, J ´erˆome Chave, C ´ecile Antin, Nicolas Barbier, Frans Bongers, Michele Dalponte, Karin Y van Ewijk, David I Forrester, Matthias Haeni, Steven I Higgins, Robert J Holdaway, Yoshiko Iida, Craig Lorimer, Peter L Marshall, St ´ephane Momo, Glenn R Mon- crieff, Pierre Ploton, Lourens Poorter, Kassim Abd Rah- man, Michael Schlu...

[11] [11]

Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.ISPRS Journal of Photogramme- try and Remote Sensing, 173:24–49, 2021

Teja Kattenborn, Jens Leitloff, Felix Schiefer, and Stefan Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing.ISPRS Journal of Photogramme- try and Remote Sensing, 173:24–49, 2021. 1, 8

2021

[12] [12]

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross Girshick. Segment anything. InProceedings of the 19th IEEE/CVF International Conference on Computer Vi- sion, pages 4015–4026, 2023. 2, 4, 11

2023

[13] [13]

OneFormer3D: One Transformer for unified point cloud segmentation

Maxim Kolodiazhnyi, Anna V orontsova, Anton Konushin, and Danila Rukhovich. OneFormer3D: One Transformer for unified point cloud segmentation. InProceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 20943–20953, 2024. 1, 2, 5, 7, 14

2024

[14] [14]

Probabilistic interactive 3D segmentation with hierarchical neural processes

Jie Liu, Pan Zhou, Zehao Xiao, Jiayi Shen, Wenzhe Yin, Jan- jakob Sonke, and Efstratios Gavves. Probabilistic interactive 3D segmentation with hierarchical neural processes. InPro- ceedings of the 42nd International Conference on Machine Learning, pages 1–20, 2025. 2, 5, 6, 7, 12, 13, 15

2025

[15] [15]

VMamba: Visual state space model.Advances in Neural Information Processing Systems, 37:103031–103063, 2024

Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. VMamba: Visual state space model.Advances in Neural Information Processing Systems, 37:103031–103063, 2024. 2

2024

[16] [16]

Decoupled weight de- cay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight de- cay regularization. InProceedings of the 2019 International Conference on Learning Representations, pages 1–19, 2019. 5, 13, 14

2019

[17] [17]

V-Net: Fully convolutional neural networks for volumetric medical image segmentation

Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. InProceedings of the 4th In- ternational Conference on 3D Vision, pages 565–571, 2016. 5, 11

2016

[18] [18]

Individual tree crown delineation using multispectral LiDAR data.Sensors, 19(24):1–21, 2019

Faizaan Naveed, Baoxin Hu, Jianguo Wang, and G Brent Hall. Individual tree crown delineation using multispectral LiDAR data.Sensors, 19(24):1–21, 2019. 2

2019

[19] [19]

ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation

Trung Thanh Nguyen, Tuan-Anh Vu, Duc Viet Le, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide, and Teja Katten- born. ForestMamba: Sparse Mamba with geometry-guided queries for 3D forest point cloud segmentation.Computing Research Repository, arXiv Preprints,arXiv:2606.01549, pages 1–18, 2026. 1, 2, 3, 4, 5, 7, 12, 14

work page internal anchor Pith review Pith/arXiv arXiv 2026

[20] [20]

Estimating plot- level tree heights with LiDAR: Local filtering with a canopy- height based variable window size.Computers and Electron- ics in Agriculture, 37:71–95, 2002

Sorin C Popescu and Randolph H Wynne. Estimating plot- level tree heights with LiDAR: Local filtering with a canopy- height based variable window size.Computers and Electron- ics in Agriculture, 37:71–95, 2002. 2, 3, 4 9

2002

[21] [21]

PointNet: Deep learning on point sets for 3D classification and segmentation

Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. InProceedings of the 2017 IEEE Con- ference on Computer Vision and Pattern Recognition, pages 652–660, 2017. 2

2017

[22] [22]

PointNet++: Deep hierarchical feature learning on point sets in a metric space.Advances in Neural Information Processing Systems, 30:5105–5114, 2017

Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space.Advances in Neural Information Processing Systems, 30:5105–5114, 2017. 2

2017

[23] [23]

U- Net: Convolutional networks for biomedical image segmen- tation

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- Net: Convolutional networks for biomedical image segmen- tation. InProceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Inter- vention, Part III, pages 234–241, 2015. 2

2015

[24] [24]

Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas

Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas. KPConv: Flexible and deformable convolution for point clouds. InProceedings of the 17th IEEE/CVF Inter- national Conference on Computer Vision, pages 6411–6420,

[25] [25]

LAUTx — Individual tree point clouds from Austrian forest inventory plots.Zenodo, 2022

A Tockner, C Gollob, T Ritter, and A Noth- durft. LAUTx — Individual tree point clouds from Austrian forest inventory plots.Zenodo, 2022. https://doi.org/10.5281/zenodo.6560112. 5, 6, 8, 12

work page doi:10.5281/zenodo.6560112 2022

[26] [26]

Attention is all you need.Advances in Neural Information Processing Systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in Neural Information Processing Systems, 30, 2017. 2

2017

[27] [27]

Test-time augmentation for 3D point cloud classification and segmentation

Tuan Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh Son Hua, and Sai Kit Yeung. Test-time augmentation for 3D point cloud classification and segmentation. InProceedings of the 2024 International Conference on 3D Vision, pages 1543–1553, 2024. 2

2024

[28] [28]

Re- mote sensing technologies for enhancing forest inventories: A review.Canadian Journal of Remote Sensing, 42(5):619– 641, 2016

Joanne C White, Nicholas C Coops, Michael A Wulder, Mikko Vastaranta, Thomas Hilker, and Piotr Tompalski. Re- mote sensing technologies for enhancing forest inventories: A review.Canadian Journal of Remote Sensing, 42(5):619– 641, 2016. 1

2016

[29] [29]

SegmentAnyTree: A sen- sor and platform agnostic deep learning model for tree seg- mentation using any 3D point cloud data.Remote Sensing of Environment, 313(114367):1–13, 2024

Maciej Wielgosz, Stefano Puliti, Binbin Xiang, Konrad Schindler, and Rasmus Astrup. SegmentAnyTree: A sen- sor and platform agnostic deep learning model for tree seg- mentation using any 3D point cloud data.Remote Sensing of Environment, 313(114367):1–13, 2024. 2

2024

[30] [30]

Maciej Wielgosz, Stefano Puliti, and Rasmus Astrup. SegmentAnyTreeV2: Scaling Transformer-based tree instance segmentation across sensors, platforms, and forests.Computing Research Repository, arXiv Preprints, arXiv:2606.08206, pages 1–20, 2026. 2

work page internal anchor Pith review Pith/arXiv arXiv 2026

[31] [31]

Point Transformer V2: Grouped vector atten- tion and partition-based pooling.Advances in Neural Infor- mation Processing Systems, 35:33330–33342, 2022

Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Heng- shuang Zhao. Point Transformer V2: Grouped vector atten- tion and partition-based pooling.Advances in Neural Infor- mation Processing Systems, 35:33330–33342, 2022. 2

2022

[32] [32]

Point Transformer V3: Simpler, faster, stronger

Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xi- hui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point Transformer V3: Simpler, faster, stronger. In Proceedings of the 2024 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 4840–4851,

2024

[33] [33]

Automated forest inventory: Analysis of high- density airborne LiDAR point clouds with 3D deep learning

Binbin Xiang, Maciej Wielgosz, Theodora Kontogianni, Tor- ben Peters, Stefano Puliti, Rasmus Astrup, and Konrad Schindler. Automated forest inventory: Analysis of high- density airborne LiDAR point clouds with 3D deep learning. Remote Sensing of Environment, 305(114078):1–20, 2024. 1, 2, 5, 7

2024

[34] [34]

Forest- Former3D: A unified framework for end-to-end segmenta- tion of forest LiDAR 3D point clouds

Binbin Xiang, Maciej Wielgosz, Stefano Puliti, Kamil Kr ´al, Martin Kr˚uˇcek, Azim Missarov, and Rasmus Astrup. Forest- Former3D: A unified framework for end-to-end segmenta- tion of forest LiDAR 3D point clouds. InProceedings of the 20th IEEE/CVF International Conference on Computer Vi- sion, pages 24717–24727, 2025. 1, 2, 3, 5, 6, 7, 8, 12, 14, 15, 16

2025

[35] [35]

Lei Yao, Yi Wang, Yawen Cui, Moyun Liu, and Lap-Pui Chau. LaSSM: Efficient semantic-spatial query decoding via Local aggregation and State Space Models for 3D instance segmentation.IEEE Transactions on Circuits and Systems for Video Technology, pages 1–13, 2026. 2, 4

2026

[36] [36]

AGILE3D: Attention guided interactive multi- object 3D segmentation

Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult, Francis Engelmann, Bastian Leibe, Konrad Schindler, and Theodora Kontogianni. AGILE3D: Attention guided interactive multi- object 3D segmentation. InProceedings of the 12th Interna- tional Conference on Learning Representations, pages 1–21,

[37] [37]

2, 5, 6, 7, 8, 12, 13, 15, 17

[38] [38]

Point Transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and Vladlen Koltun. Point Transformer. InProceedings of the 18th IEEE/CVF International Conference on Computer Vi- sion, pages 16259–16268, 2021. 2

2021

[39] [39]

Point-SAM: Promptable 3D segmentation model for point clouds

Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xi- ang, and Hao Su. Point-SAM: Promptable 3D segmentation model for point clouds. InProceedings of the 13th Interna- tional Conference on Learning Representations, pages 1–20,

[40] [40]

2, 4, 5, 6, 7, 8, 11, 12, 13, 15, 17

[41] [41]

PartSAM: A scalable promptable part segmentation model trained on native 3D data

Zhe Zhu, Le Wan, Rui Xu, Yiheng Zhang, Honghua Chen, Zhiyang Dou, Cheng Lin, Yuan Liu, and Mingqiang Wei. PartSAM: A scalable promptable part segmentation model trained on native 3D data. InProceedings of the 14th In- ternational Conference on Learning Representations, pages 1–25, 2026. 2, 5, 6, 7, 8, 12, 14, 15, 17 10 A. Promptable Training Loss SelectAn...

2026