ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation

Duc Viet Le; Ichiro Ide; Takahiro Komamizu; Teja Kattenborn; Trung Thanh Nguyen; Tuan-Anh Vu; Yasutomo Kawanishi

arxiv: 2606.01549 · v1 · pith:N5V7N5ZPnew · submitted 2026-06-01 · 💻 cs.CV

Pith reviewed 2026-06-28 15:46 UTC · model grok-4.3

classification 💻 cs.CV

keywords: forest point cloud segmentation, Mamba, LiDAR, canopy height model, instance segmentation, semantic segmentation, sparse voxels, state space models

The pith

ForestMamba replaces quadratic attention with linear state-space modeling while using canopy height maxima to seed queries for forest point cloud segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to establish that forest structural priors can be injected directly into feature encoding, query generation, and refinement steps inside a Mamba architecture to handle the size and complexity of LiDAR forest scenes. Vertical slab serialization of sparse voxels, on-the-fly multi-scale canopy height models for query seeding, and a dual-path Mamba decoder are presented as the mechanisms that deliver both higher accuracy and linear scaling. A reader would care if the claim holds because forest monitoring at landscape scale currently hits hard limits on time and memory when transformer methods are applied to full scenes. Experiments on seven regions are used to argue that the gains hold across geographic variability.

Core claim

ForestMamba shows that a sparse encoder with vertical-priority slab serialization, geometry-guided query initialization from multi-scale canopy height models, and a Mamba-based query decoder with local kNN aggregation together produce better semantic and instance segmentation than existing sparse convolution or transformer baselines while running at linear complexity.
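The abstract does not say exactly how vertical-priority slab serialization orders the voxels. As a minimal sketch under one plausible reading, the Python below groups integer voxel coordinates into horizontal slabs and, inside each slab, walks every vertical column from bottom to top so that voxels of the same column sit next to each other in the sequence. The function name `serialize_vertical_slabs` and the `slab` width parameter are hypothetical and are not taken from the paper.

```python
def serialize_vertical_slabs(voxels, slab=4):
    """Return voxel indices in a slab-by-slab, column-by-column, bottom-to-top order.

    voxels: list of integer (x, y, z) voxel coordinates.
    slab:   assumed slab width in voxels along x and y (illustrative only).
    """
    def key(i):
        x, y, z = voxels[i]
        # Slab id first, then the column inside the slab, then height, so each
        # vertical column forms a contiguous, upward-ordered run in the sequence.
        return (x // slab, y // slab, x, y, z)

    return sorted(range(len(voxels)), key=key)


voxels = [(0, 0, 2), (5, 0, 0), (0, 0, 0), (0, 1, 1), (0, 0, 1)]
order = serialize_vertical_slabs(voxels, slab=4)
```

Here the three voxels of column (0, 0) come first in ascending height, followed by the neighbouring column and then the voxel in the next slab. A sequence model that reads this order sees each stem and crown column as one uninterrupted run, which is the kind of vertical coherence the core claim describes.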

What carries the argument

Geometry-guided query initialization that uses canopy maxima from an on-the-fly multi-scale canopy height model as ecologically meaningful seeds, supplemented by farthest point sampling for understory coverage.
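To make the seeding idea concrete, here is a rough Python sketch, not the authors' implementation. It rasterizes the points into canopy height grids at several cell sizes, takes cells that are strictly taller than their occupied neighbours as canopy peaks, and then tops up the seed set with greedy farthest point sampling. All function names, the cell sizes, and the `min_height` threshold are assumptions made for illustration.

```python
import math


def canopy_height_model(points, cell):
    """Grid the xy-plane with square cells of size `cell`; keep the highest point per cell."""
    chm = {}
    for p in points:
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        if key not in chm or p[2] > chm[key][2]:
            chm[key] = p
    return chm


def canopy_peaks(chm, min_height):
    """Highest points of cells that are strictly taller than all occupied 8-neighbours."""
    peaks = []
    for (i, j), p in chm.items():
        if p[2] < min_height:
            continue  # skip ground and low vegetation
        neighbours = (chm.get((i + di, j + dj))
                      for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0))
        if all(n is None or n[2] < p[2] for n in neighbours):
            peaks.append(p)
    return peaks


def farthest_point_fill(points, seeds, n_total):
    """Greedy FPS: repeatedly add the point farthest (in 3D) from all current seeds."""
    seeds = list(seeds)
    if not points:
        return seeds
    if not seeds:
        seeds.append(points[0])
    dist = [min(math.dist(p, s) for s in seeds) for p in points]
    while len(seeds) < n_total and max(dist) > 0.0:
        k = max(range(len(points)), key=dist.__getitem__)
        seeds.append(points[k])
        dist = [min(d, math.dist(p, points[k])) for p, d in zip(points, dist)]
    return seeds


def seed_queries(points, cells=(0.5, 1.0), n_queries=3, min_height=2.0):
    """CHM peaks from several grid resolutions first, then FPS up to n_queries seeds."""
    seeds = []
    for cell in cells:
        for p in canopy_peaks(canopy_height_model(points, cell), min_height):
            # Merge peaks that different resolutions report for the same crown.
            if all(math.dist(p[:2], s[:2]) > cell for s in seeds):
                seeds.append(p)
    return farthest_point_fill(points, seeds, n_queries)


# Two toy crowns (apex plus lower crown points) and one low understory point.
points = [(1.2, 1.2, 10.0), (0.7, 1.2, 7.0), (1.7, 1.2, 7.0), (1.2, 0.7, 7.0),
          (1.2, 1.7, 7.0), (5.2, 5.2, 8.0), (4.7, 5.2, 6.0), (5.7, 5.2, 6.0),
          (3.2, 3.2, 1.0)]
seeds = seed_queries(points, n_queries=3)
```

In this toy scene the two crown apexes are found by the height grids and the understory point is only reached by the FPS step, which mirrors the division of labour the paper describes. The sketch keeps every canopy peak even if there are more peaks than `n_queries`; a real system would need a policy for that case.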

If this is right

Individual tree separation improves specifically in dense overlapping canopy areas because query seeds are placed at local height maxima.
Inference on full forest scenes runs 3 times faster and needs 2.3 times less GPU memory than Transformer-based methods because attention is replaced by state-space modeling.
Both semantic labeling of points and instance separation of trees benefit from the same vertical serialization and dual-path refinement steps.
Performance remains consistent across seven regions that differ in species, density, and terrain.
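The efficiency point above rests on a scaling argument that a few lines of Python can illustrate. The sketch below is a fixed-parameter scalar state-space recurrence, deliberately much simpler than Mamba, whose parameters depend on the input. It shows only why a recurrent scan costs one constant-size update per token, while full self-attention scores every pair of tokens. The function names and the coefficients `a`, `b`, `c` are illustrative assumptions.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h, ys = 0.0, []
    for x in xs:  # one constant-cost state update per token, so O(N) in total
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pairs(n):
    """Number of query-key pairs that full self-attention scores for n tokens."""
    return n * n


impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5)
```

For 1,000 tokens the scan performs 1,000 state updates while full attention scores 1,000,000 pairs; at 100,000 tokens the gap is 100,000 updates against 10 billion pairs. The measured 3 times and 2.3 times figures in the paper depend on the whole pipeline, so this sketch explains the direction of the effect, not its size.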

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the canopy-height seeding step proves stable, analogous structural priors could be extracted for other 3D natural scenes such as urban vegetation or coral reefs.
The linear scaling could allow processing of entire drone surveys on modest hardware, opening the possibility of near-real-time updates to forest inventories.
The same query-refinement block might be reused in other point-cloud tasks where vertical structure dominates, such as building facade parsing or terrain classification.

Load-bearing premise

The canopy maxima extracted from the multi-scale canopy height model supply seeds that improve tree separation in overlapping regions beyond what generic point sampling achieves.

What would settle it

An ablation study on a new forest region in which disabling the canopy-height-model query seeding drops both semantic and instance segmentation metrics to the level of the strongest non-geometry baseline while the reported speed and memory advantages remain unchanged.

Figures

Figures reproduced from arXiv: 2606.01549 by Duc Viet Le, Ichiro Ide, Takahiro Komamizu, Teja Kattenborn, Trung Thanh Nguyen, Tuan-Anh Vu, Yasutomo Kawanishi.

**Figure 1.** Overview of the proposed ForestMamba. The input point cloud is voxelized and processed by a structure-aware sparse encoder with vertical-priority serialization. A geometry-guided module initializes instance queries using Canopy Height Model (CHM)-based canopy peaks and Farthest Point Sampling (FPS). Resulting queries are refined by a Mamba-based query decoder to jointly predict semantic labels and tree in…

**Figure 2.** Overview of the Mamba-based Query Decoder. Query tokens are iteratively refined through local kNN aggregation, spatial Mamba modeling, and feed-forward layers, followed by mask prediction.

**Figure 3.** Qualitative comparison of instance segmentation on NIBIO (top) and SCION (bottom) subsets. Top-down and side views are shown for each scene, where ground and unlabeled points are omitted in the top-down view. Red boxes mark missing or fragmented predictions. Color is randomly assigned per instance.

Original abstract

AI-based semantic and instance segmentation of terrestrial and drone LiDAR point clouds is emerging as a transformative approach for converting the complex 3D structure of forests into actionable information for forest monitoring and biodiversity assessment. However, forest LiDAR scenes remain highly challenging due to their large data volumes, irregular sampling density, overlapping and complex canopy structure, and geographic variability. Existing methods based on sparse convolutions or Transformers achieve promising results, but suffer from two key limitations: the quadratic complexity of attention scales poorly to large forest scenes, and generic context modeling does not exploit forest structural priors, limiting tree separation in complex regions. To address these challenges, we propose ForestMamba, a structure-aware method that incorporates forest-specific priors into feature encoding, query generation, and query refinement, while replacing quadratic attention with linear-time state-space modeling. First, we introduce a sparse encoder with vertical-priority slab serialization that organizes sparse voxels into vertically coherent sequences for efficient long-range context modeling. Second, we propose a geometry-guided query initialization strategy based on an on-the-fly multi-scale Canopy Height Model (CHM), where canopy maxima provide ecologically meaningful query seeds, supplemented by Farthest Point Sampling (FPS) to cover understory trees. Third, we design a Mamba-based query decoder that combines local kNN voxel aggregation with a spatial dual-path Mamba for query refinement with linear computational complexity. Extensive experiments across seven forest regions demonstrate that ForestMamba consistently outperforms existing baselines in both segmentation tasks, while achieving 3 times faster inference and 2.3 times lower GPU memory than Transformer-based methods.
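The abstract names local kNN voxel aggregation as the first step of the query decoder but gives no formula. As a hedged illustration only, the Python below lets a query gather the features of its k nearest voxels and average them. The real decoder presumably uses learned weights; the plain mean, the function name `knn_aggregate`, and the default `k` are assumptions made for this sketch.

```python
import math


def knn_aggregate(query_pos, voxel_pos, voxel_feat, k=3):
    """Average the feature vectors of the k voxels nearest to query_pos.

    voxel_pos:  list of (x, y, z) voxel centres (must be non-empty).
    voxel_feat: list of equally long feature vectors, one per voxel.
    """
    order = sorted(range(len(voxel_pos)),
                   key=lambda i: math.dist(query_pos, voxel_pos[i]))
    nearest = order[:k]
    dim = len(voxel_feat[0])
    return [sum(voxel_feat[i][d] for i in nearest) / len(nearest) for d in range(dim)]


voxel_pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
voxel_feat = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
pooled = knn_aggregate((0.4, 0.0, 0.0), voxel_pos, voxel_feat, k=2)
```

The distant voxel at x = 10 does not influence the pooled feature, so each query only summarizes its own neighbourhood. Sorting all voxels per query is fine for a toy example; a practical version would use a spatial index to find neighbours.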

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ForestMamba adapts Mamba to forest LiDAR with vertical slabs and CHM query seeds for linear scaling, but the CHM contribution lacks clear isolation from the backbone.

The main takeaway is that this paper swaps quadratic attention for Mamba in 3D forest segmentation and adds two domain tweaks: vertical-priority slab serialization in the encoder and CHM maxima to seed queries, with FPS for understory coverage. The dual-path Mamba decoder then refines queries locally and globally.

What stands out as new is the specific combination of slab ordering for vertical coherence and on-the-fly multi-scale CHM for query initialization. The paper does a solid job framing the real-world constraints—large irregular scenes, overlapping canopies, and the need for efficiency in monitoring applications—and shows how linear-complexity state-space modeling can address the scaling limit of prior Transformer work.

The soft spot is the geometry-guided query claim. The abstract states that CHM maxima supply ecologically meaningful seeds and ties the forest prior to better separation in complex regions, yet without an ablation that holds the Mamba encoder fixed and varies only the query initialization (CHM versus FPS-only), the reported gains could come mostly from the backbone rather than the forest prior. The seven-region experiments and efficiency numbers (3× faster inference, 2.3× lower GPU memory) are useful if the tables back them with error bars and dataset sizes, but those details are not visible in the abstract.

This is aimed at CV researchers working on point clouds for ecology or remote sensing. A reader already using Mamba or sparse methods on natural scenes would get concrete ideas from the serialization and query strategy. The work shows honest engagement with the limitations of existing approaches, so it deserves a serious referee. I would send it to review and ask specifically for the CHM ablation.

Referee Report

1 major / 1 minor

Summary. The paper proposes ForestMamba, a structure-aware architecture for semantic and instance segmentation of large-scale forest LiDAR point clouds. It replaces quadratic attention with linear state-space modeling via a sparse vertical-slab Mamba encoder, introduces geometry-guided query initialization from on-the-fly multi-scale Canopy Height Model (CHM) maxima (supplemented by FPS), and uses a dual-path Mamba query decoder with local kNN aggregation. The central claim is that this yields consistent outperformance over baselines across seven forest regions together with 3× faster inference and 2.3× lower GPU memory than Transformer-based methods.

Significance. If the reported gains are reproducible and the CHM-guided initialization is shown to drive separation improvements beyond the linear-complexity backbone, the work would offer a practical advance for processing voluminous, structurally complex forest scenes by injecting domain priors into an efficient sequence model, with direct relevance to ecological monitoring.

major comments (1)

[Abstract] The claim that 'canopy maxima provide ecologically meaningful query seeds', together with the implication that these seeds aid tree separation where canopies overlap, is presented as the second key contribution and is load-bearing for the performance delta; however, no ablation isolating the CHM initialization from the sparse vertical-slab Mamba encoder is described, nor is any quantitative comparison (e.g., seed quality metrics or per-region IoU deltas) between CHM+FPS queries and FPS-only queries supplied.

minor comments (1)

[Abstract] Quantitative metrics, dataset sizes, error bars, and ablation tables are referenced but not reported, making it impossible to assess the magnitude or statistical reliability of the claimed gains.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and will revise the paper accordingly to strengthen the evidence for our claims.

Referee: [Abstract] The claim that 'canopy maxima provide ecologically meaningful query seeds', together with the implication that these seeds aid tree separation where canopies overlap, is presented as the second key contribution and is load-bearing for the performance delta; however, no ablation isolating the CHM initialization from the sparse vertical-slab Mamba encoder is described, nor is any quantitative comparison (e.g., seed quality metrics or per-region IoU deltas) between CHM+FPS queries and FPS-only queries supplied.

Authors: We agree that an explicit ablation isolating the CHM-guided query initialization from the sparse vertical-slab Mamba encoder is needed to substantiate the claim. The current manuscript reports overall gains against external baselines but does not include a controlled within-architecture comparison of CHM+FPS versus FPS-only query seeding. In the revised version we will add this ablation, reporting per-region IoU/mIoU deltas, seed quality metrics (e.g., precision of canopy maxima as tree centers), and qualitative analysis of overlapping-canopy cases.

Revision: yes
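One way the promised seed quality metric could be computed is sketched below in Python. This is an assumption about what "precision of canopy maxima as tree centers" might mean, not a metric defined in the paper: each seed is greedily matched to the nearest still-unmatched ground-truth tree center and counts as a hit if it lies within a tolerance radius. The function name and the `radius` parameter are hypothetical.

```python
import math


def seed_precision_recall(seeds, tree_centers, radius=1.0):
    """Greedy one-to-one matching of seeds to ground-truth tree centres (xy positions)."""
    unmatched = list(tree_centers)
    hits = 0
    for s in seeds:
        best = min(unmatched, key=lambda c: math.dist(s, c), default=None)
        if best is not None and math.dist(s, best) <= radius:
            hits += 1
            unmatched.remove(best)  # each tree can validate at most one seed
    precision = hits / len(seeds) if seeds else 0.0
    recall = hits / len(tree_centers) if tree_centers else 0.0
    return precision, recall


seeds = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)]
tree_centers = [(0.1, 0.0), (5.5, 5.0), (9.0, 9.0)]
precision, recall = seed_precision_recall(seeds, tree_centers, radius=1.0)
```

In the toy data the second seed duplicates a tree that is already matched and the third tree receives no seed, so both precision and recall come out at two thirds. Running the same function on CHM+FPS seeds and on FPS-only seeds would give the within-architecture comparison the referee asks for.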

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical validation

The paper presents an empirical method (sparse Mamba encoder + CHM-guided query init + dual-path decoder) whose central claims are outperformance on seven forest regions plus efficiency gains versus Transformers. No equations, derivations, or fitted-parameter predictions appear in the provided text. The geometry-guided CHM initialization is an input prior, not a self-referential definition or renamed fit. No self-citation chains or uniqueness theorems are invoked to force the architecture. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; standard point-cloud and Mamba assumptions are implicit but not enumerated.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SelectAnyTree: A Promptable Instance Segmentation Model for 3D Forest LiDAR Point Clouds
cs.CV 2026-06 conditional novelty 7.0

SelectAnyTree is a promptable instance segmentation model for 3D forest LiDAR point clouds that achieves 78.2 IoU from a single click via a click-to-query prompt encoder, CHM-guided first prompt, and state-space query...


[1] [1]

Benjamin Brede, Kim Calders, Alvaro Lau, Pasi Raumonen, Harm M Bartholomeus, Martin Herold, and Lammert Kooistra. Non-destructive tree volume estimation through quantitative structure modelling: Comparing UA V laser scanning with terrestrial Li- DAR.Remote Sensing of Environment, 233(111355):1–14, 2022

2022

[2] [2]

Ter- restrial laser scanning in forest ecology: Expanding the horizon.Remote Sensing of Environment, 251(112102):1–17, 2020

Kim Calders, Jennifer Adams, John Armston, Harm Bartholomeus, Sebastien Bauwens, Lisa Patrick Bentley, Jerome Chave, F Mark Danson, Miro Demol, Mathias Disney, Rachel Gaulton, Sruthi M Krishna Moorthy, Shaun R Levick, Ninni Saarinen, Crystal Schaaf, Atticus Stovall, Louise Terryn, Phil Wilkes, and Hans Verbeeck. Ter- restrial laser scanning in forest ecol...

2020

[3] [3]

End-to-end object detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kir- illov, and Sergey Zagoruyko. End-to-end object detection with Transformers. InPro- ceedings of the 16th European Conference on Computer Vision,Part I, pages 213–229, 2020

2020

[4] [4]

Masked-attention mask Transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Gird- har. Masked-attention mask Transformer for universal image segmentation. InPro- ceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, pages 1290–1299, 2022

2022

[5] [5]

4D spatio-temporal Con- vNets: Minkowski convolutional neural networks

Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4D spatio-temporal Con- vNets: Minkowski convolutional neural networks. InProceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition, pages 3075–3084, 2019

2019

[6] [6]

Semantic instance segmentation for autonomous driving

Bert De Brabandere, Davy Neven, and Luc Van Gool. Semantic instance segmentation for autonomous driving. InProceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 478–480, 2017

2017

[7] [7]

Aboveground biomass density models for NASA’s Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission.Remote Sensing of Environment, 270(112845): 1–20, 2022

Laura Duncanson, James R Kellner, John Armston, Ralph Dubayah, David M Minor, Steven Hancock, Sean P Healey, Paul L Patterson, Svetlana Saarela, Suzanne Marselis, et al. Aboveground biomass density models for NASA’s Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission.Remote Sensing of Environment, 270(112845): 1–20, 2022

2022

[8] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages 226–231, 1996

[9] Jan Feigl, Julian Frey, Thomas Seifert, and Barbara Koch. Close-range remote sensing of forest structure for biodiversity assessments: A systematic literature review. Current Forestry Reports, 11(1):1–18, 2025

[10] Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3D semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, pages 9224–9232, 2018

[11] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the 2024 International Conference on Learning Representations, pages 1–32, 2024

[12] Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. Mamba3D: Enhancing local features for 3D point cloud analysis via state space model. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 4995–5004, 2024

[13] Jonathan Henrich and Jan van Delden. Towards general deep-learning-based tree instance segmentation models. In Proceedings of the 2024 International Conference on Learning Representations Workshop on Machine Learning for Remote Sensing, pages 1–6, 2024

[14] Jonathan Henrich, Jan van Delden, Dominik Seidel, Thomas Kneib, and Alexander S Ecker. TreeLearn: A deep learning method for segmenting individual trees from ground-based LiDAR forest point clouds. Ecological Informatics, 84(102888):1–16, 2024

[15] Tommaso Jucker, John Caspersen, Jérôme Chave, Cécile Antin, Nicolas Barbier, Frans Bongers, Michele Dalponte, Karin Y van Ewijk, David I Forrester, Matthias Haeni, Steven I Higgins, Robert J Holdaway, Yoshiko Iida, Craig Lorimer, Peter L Marshall, Stéphane Momo, Glenn R Moncrieff, Pierre Ploton, Lourens Poorter, Kassim Abd Rahman, Michael Schlund, Bonaven... Allometric equations for integrating remote sensing imagery into forest monitoring programmes. Global Change Biology, 23(1):177–190, 2017

[16] Teja Kattenborn, Jens Leitloff, Felix Schiefer, and Stefan Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173:24–49, 2021

[17] Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, and Danila Rukhovich. OneFormer3D: One Transformer for unified point cloud segmentation. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20943–20953, 2024

[18] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2):83–97, 1955

[19] Dingkang Liang, Xin Zhou, Xinyu Wang, Xingkui Zhu, Wei Xu, Zhikang Zheng, Yifei Song, and Xiang Bai. PointMamba: A simple state space model for point cloud analysis. Advances in Neural Information Processing Systems, 37:32653–32677, 2024

[20] Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. VMamba: Visual state space model. Advances in Neural Information Processing Systems, 37:103031–103063, 2024

[21] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In Proceedings of the 2019 International Conference on Learning Representations, pages 1–19, 2019

[22] Dening Lu, Linlin Xu, Jun Zhou, Kyle Gao, Zheng Gong, and Dedong Zhang. 3D-UMamba: 3D U-Net with state space model for semantic segmentation of multi-source LiDAR point clouds. International Journal of Applied Earth Observation and Geoinformation, 136(104401):1–14, 2025

[23] Faizaan Naveed, Baoxin Hu, Jianguo Wang, and G Brent Hall. Individual tree crown delineation using multispectral LiDAR data. Sensors, 19(24):1–21, 2019

[24] Alexander Neubeck and Luc Van Gool. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition, volume 3, pages 850–855, 2006

[25] Sorin C Popescu and Randolph H Wynne. Estimating plot-level tree heights with LiDAR: local filtering with a canopy-height based variable window size. Computers and Electronics in Agriculture, 37:71–95, 2002

[26] Hans Pretzsch, Cory Matthew, and Jochen Dieler. Allometry of tree crown structure: Relevance for space occupation at the individual plant level and for self-thinning at the stand level. Growth and Defence in Plants: Resource Allocation at Multiple Scales, 220:287–310, 2012

[27] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017

[28] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30:5105–5114, 2017

[29] Aldino Rizaldy, Fabian Ewald Fassnacht, Ahmed Jamal Afifi, Hua Jiang, Richard Gloaguen, and Pedram Ghamisi. Label-efficient 3D forest mapping: Self-supervised and transfer learning for individual, structural, and species analysis. Computing Research Repository, arXiv Preprints, arXiv:2503.10243, pages 1–47, 2025

[30] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Part III, pages 234–241, 2015

[31] Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas Guibas. KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the 17th IEEE/CVF International Conference on Computer Vision, pages 6411–6420, 2019

[32] Tuan Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh Son Hua, and Sai Kit Yeung. Test-time augmentation for 3D point cloud classification and segmentation. In Proceedings of the 2024 International Conference on 3D Vision, pages 1543–1553, 2024

[33] Joanne C White, Nicholas C Coops, Michael A Wulder, Mikko Vastaranta, Thomas Hilker, and Piotr Tompalski. Remote sensing technologies for enhancing forest inventories: A review. Canadian Journal of Remote Sensing, 42(5):619–641, 2016

[34] Maciej Wielgosz, Stefano Puliti, Binbin Xiang, Konrad Schindler, and Rasmus Astrup. SegmentAnyTree: A sensor and platform agnostic deep learning model for tree segmentation using any 3D point cloud data. Remote Sensing of Environment, 313(114367):1–13, 2024

[35] Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. Point Transformer V2: Grouped vector attention and partition-based pooling. Advances in Neural Information Processing Systems, 35:33330–33342, 2022

[36] Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point Transformer V3: Simpler, faster, stronger. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4840–4851, 2024

[37] Binbin Xiang, Maciej Wielgosz, Theodora Kontogianni, Torben Peters, Stefano Puliti, Rasmus Astrup, and Konrad Schindler. Automated forest inventory: Analysis of high-density airborne LiDAR point clouds with 3D deep learning. Remote Sensing of Environment, 305(114078):1–20, 2024

[38] Binbin Xiang, Maciej Wielgosz, Stefano Puliti, Kamil Král, Martin Krůček, Azim Missarov, and Rasmus Astrup. ForestFormer3D: A unified framework for end-to-end segmentation of forest LiDAR 3D point clouds. In Proceedings of the 20th IEEE/CVF International Conference on Computer Vision, pages 24717–24727, 2025

[39] Tianyu Xiu, Hanwen Qi, Jiabo Xu, and XinLian Liang. Individual tree extraction through 3D promptable segmentation networks. Methods in Ecology and Evolution, 16(8):1749–1762, 2025

[40] Lei Yao, Yi Wang, Yawen Cui, Moyun Liu, and Lap-Pui Chau. LaSSM: Efficient semantic-spatial query decoding via local aggregation and state space models for 3D instance segmentation. IEEE Transactions on Circuits and Systems for Video Technology, pages 1–13, 2026

[41] Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, and Xiangtai Li. Point cloud Mamba: Point cloud learning via state space model. In Proceedings of the 39th AAAI Conference on Artificial Intelligence, pages 10121–10130, 2025

[42] Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and Vladlen Koltun. Point Transformer. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision, pages 16259–16268, 2021