ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation
Pith reviewed 2026-06-28 15:46 UTC · model grok-4.3
The pith
ForestMamba replaces quadratic attention with linear state-space modeling while using canopy height maxima to seed queries for forest point cloud segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ForestMamba shows that a sparse encoder with vertical-priority slab serialization, geometry-guided query initialization from multi-scale canopy height models, and a Mamba-based query decoder with local kNN aggregation together produce better semantic and instance segmentation than existing sparse convolution or transformer baselines while running at linear complexity.
What carries the argument
Geometry-guided query initialization that uses canopy maxima from an on-the-fly multi-scale canopy height model as ecologically meaningful seeds, supplemented by farthest point sampling for understory coverage.
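The seeding idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 3x3 neighbourhood test, the `min_height` cutoff, the two cell sizes, and the use of plain 3D Euclidean distance in the sampling step are all hypothetical choices. The source only states that canopy maxima from a multi-scale canopy height model are supplemented by farthest point sampling.

```python
import math

def canopy_height_model(points, cell):
    """Rasterize (x, y, z) points into a grid, keeping the highest point per cell."""
    chm = {}
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        if key not in chm or z > chm[key][2]:
            chm[key] = (x, y, z)
    return chm

def local_maxima(chm, min_height=2.0):
    """Keep cell tops that are strictly higher than all occupied 8-neighbours."""
    seeds = []
    for (i, j), (x, y, z) in chm.items():
        if z < min_height:  # hypothetical cutoff to skip ground and shrubs
            continue
        neighbours = [chm.get((i + di, j + dj))
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if (di, dj) != (0, 0)]
        if all(n is None or n[2] < z for n in neighbours):
            seeds.append((x, y, z))
    return seeds

def farthest_point_sampling(points, k, start):
    """Greedy FPS: repeatedly pick the point farthest from everything chosen so far."""
    dist = [min((math.dist(p, c) for c in start), default=math.inf) for p in points]
    picked = []
    for _ in range(min(k, len(points))):
        idx = max(range(len(points)), key=dist.__getitem__)
        if dist[idx] == 0:  # every remaining point is already a seed
            break
        picked.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for p, d in zip(points, dist)]
    return picked

def init_queries(points, cells=(1.0, 2.0), n_fps=1):
    """Canopy maxima from several CHM resolutions, then FPS for uncovered areas."""
    seeds = []
    for cell in cells:
        for s in local_maxima(canopy_height_model(points, cell)):
            if s not in seeds:
                seeds.append(s)
    return seeds + farthest_point_sampling(points, n_fps, seeds)

# Toy scene: two crowns and one low understory point.
points = [
    (0.5, 0.5, 10.0), (0.6, 0.4, 6.0), (1.5, 0.5, 7.0),  # taller tree
    (5.5, 0.5, 8.0), (4.5, 0.5, 5.0),                    # shorter tree
    (3.0, 3.5, 1.0),                                     # understory
]
queries = init_queries(points, n_fps=1)
```

In this toy scene the two crown apexes become seeds through the height-model path, and the sampling step adds the low understory point that no canopy maximum covers, which mirrors the division of labour described above.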
If this is right
- Individual tree separation improves specifically in dense overlapping canopy areas because query seeds are placed at local height maxima.
- Inference on full forest scenes runs three times faster and uses 2.3 times less GPU memory than transformer equivalents because attention is replaced by state-space modeling.
- Both semantic labeling of points and instance separation of trees benefit from the same vertical serialization and dual-path refinement steps.
- Performance remains consistent across seven regions that differ in species, density, and terrain.
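The complexity argument behind the speed and memory bullet can be illustrated with a toy comparison. The sketch below is not Mamba: it uses a scalar recurrence with fixed coefficients `a` and `b` (hypothetical values), whereas a selective state-space model makes its parameters input-dependent and uses larger states. It only shows why a scan does a constant amount of work per token while full self-attention scores every token pair.

```python
def linear_scan(xs, a=0.9, b=0.1):
    """Single pass over the sequence: h_t = a * h_{t-1} + b * x_t."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x  # constant work per token
        out.append(h)
    return out

def scan_steps(n):
    """State updates needed by the scan for a sequence of n tokens."""
    return n

def attention_pairs(n):
    """Token pairs scored by one full self-attention layer over n tokens."""
    return n * n

states = linear_scan([1.0, 1.0], a=0.5, b=1.0)
ratio = attention_pairs(1000) // scan_steps(1000)
```

The ratio between the two counts grows with sequence length, so the gap widens on larger scenes. The 3 times and 2.3 times figures quoted above are the paper's own measurements and are not derived from this count.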
Where Pith is reading between the lines
- If the canopy-height seeding step proves stable, analogous structural priors could be extracted for other 3D natural scenes such as urban vegetation or coral reefs.
- The linear scaling could allow processing of entire drone surveys on modest hardware, opening the possibility of near-real-time updates to forest inventories.
- The same query-refinement block might be reused in other point-cloud tasks where vertical structure dominates, such as building facade parsing or terrain classification.
Load-bearing premise
The canopy maxima extracted from the multi-scale canopy height model supply seeds that improve tree separation in overlapping regions beyond what generic point sampling achieves.
What would settle it
An ablation study on a new forest region in which disabling the canopy-height-model query seeding drops both semantic and instance segmentation metrics to the level of the strongest non-geometry baseline while the reported speed and memory advantages remain unchanged.
Original abstract
AI-based semantic and instance segmentation of terrestrial and drone LiDAR point clouds is emerging as a transformative approach for converting the complex 3D structure of forests into actionable information for forest monitoring and biodiversity assessment. However, forest LiDAR scenes remain highly challenging due to their large data volumes, irregular sampling density, overlapping and complex canopy structure, and geographic variability. Existing methods based on sparse convolutions or Transformers achieve promising results, but suffer from two key limitations: the quadratic complexity of attention scales poorly to large forest scenes, and generic context modeling does not exploit forest structural priors, limiting tree separation in complex regions. To address these challenges, we propose ForestMamba, a structure-aware method that incorporates forest-specific priors into feature encoding, query generation, and query refinement, while replacing quadratic attention with linear-time state-space modeling. First, we introduce a sparse encoder with vertical-priority slab serialization that organizes sparse voxels into vertically coherent sequences for efficient long-range context modeling. Second, we propose a geometry-guided query initialization strategy based on an on-the-fly multi-scale Canopy Height Model (CHM), where canopy maxima provide ecologically meaningful query seeds, supplemented by Farthest Point Sampling (FPS) to cover understory trees. Third, we design a Mamba-based query decoder that combines local kNN voxel aggregation with a spatial dual-path Mamba for query refinement with linear computational complexity. Extensive experiments across seven forest regions demonstrate that ForestMamba consistently outperforms existing baselines in both segmentation tasks, while achieving 3 times faster inference and 2.3 times lower GPU memory usage than Transformer-based methods.
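One way to picture the serialization step is as a sort key over voxel coordinates. The abstract does not define "slab", so the following is an assumed reading rather than the paper's scheme: a slab is taken to be a coarse horizontal bin of width `slab_width` (a hypothetical parameter), and "vertical priority" is taken to mean that height varies fastest, so each vertical column of voxels stays contiguous in the sequence.

```python
def serialize_vertical_priority(voxels, slab_width=4):
    """Order integer voxel coordinates (x, y, z) so z varies fastest within a slab."""
    def key(v):
        x, y, z = v
        # Coarse horizontal bin first, then the column, then height (bottom to top).
        return (x // slab_width, y // slab_width, x, y, z)
    return sorted(voxels, key=key)

voxels = [(0, 0, 2), (5, 0, 0), (0, 0, 0), (1, 0, 1), (0, 0, 1)]
sequence = serialize_vertical_priority(voxels)
```

Under this reading, the three voxels of the column at (0, 0) appear consecutively from bottom to top, followed by the neighbouring column in the same slab, and only then the voxel from the next slab.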
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ForestMamba, a structure-aware architecture for semantic and instance segmentation of large-scale forest LiDAR point clouds. It replaces quadratic attention with linear state-space modeling via a sparse vertical-slab Mamba encoder, introduces geometry-guided query initialization from on-the-fly multi-scale Canopy Height Model (CHM) maxima (supplemented by FPS), and uses a dual-path Mamba query decoder with local kNN aggregation. The central claim is that this yields consistent outperformance over baselines across seven forest regions together with 3× faster inference and 2.3× lower GPU memory than Transformer-based methods.
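The local kNN aggregation named in the summary can be sketched as a query gathering features from its nearest voxels. This is an assumption-laden illustration: the function name, the choice of `k`, and the unweighted mean are hypothetical, and the paper may use learned weighting instead.

```python
import math

def knn_aggregate(query_pos, voxel_pos, voxel_feat, k=2):
    """Average the feature vectors of the k voxels nearest to the query position."""
    order = sorted(range(len(voxel_pos)),
                   key=lambda i: math.dist(query_pos, voxel_pos[i]))[:k]
    dim = len(voxel_feat[0])
    return [sum(voxel_feat[i][d] for i in order) / len(order) for d in range(dim)]

voxel_pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
voxel_feat = [[1.0, 0.0], [3.0, 2.0], [100.0, 100.0]]
pooled = knn_aggregate((0.2, 0.0, 0.0), voxel_pos, voxel_feat, k=2)
```

Restricting each query to a fixed number of neighbours keeps the cost per query independent of scene size, which is consistent with the linear-complexity goal stated in the summary.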
Significance. If the reported gains are reproducible and the CHM-guided initialization is shown to drive separation improvements beyond the linear-complexity backbone, the work would offer a practical advance for processing voluminous, structurally complex forest scenes by injecting domain priors into an efficient sequence model, with direct relevance to ecological monitoring.
major comments (1)
- [Abstract] Abstract: the claim that 'canopy maxima provide ecologically meaningful query seeds' that 'meaningfully aid tree separation where canopies overlap' is presented as the second key contribution and is load-bearing for the performance delta; however, no ablation isolating the CHM initialization from the sparse vertical-slab Mamba encoder is described, nor is any quantitative comparison (e.g., seed quality metrics or per-region IoU deltas) between CHM+FPS queries and FPS-only queries supplied.
minor comments (1)
- [Abstract] Abstract: quantitative metrics, dataset sizes, error bars, and ablation tables are referenced but not reported, making it impossible to assess the magnitude or statistical reliability of the claimed gains.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment below and will revise the paper accordingly to strengthen the evidence for our claims.
Point-by-point responses
Referee: [Abstract] Abstract: the claim that 'canopy maxima provide ecologically meaningful query seeds' that 'meaningfully aid tree separation where canopies overlap' is presented as the second key contribution and is load-bearing for the performance delta; however, no ablation isolating the CHM initialization from the sparse vertical-slab Mamba encoder is described, nor is any quantitative comparison (e.g., seed quality metrics or per-region IoU deltas) between CHM+FPS queries and FPS-only queries supplied.
Authors: We agree that an explicit ablation isolating the CHM-guided query initialization from the sparse vertical-slab Mamba encoder is needed to substantiate the claim. The current manuscript reports overall gains against external baselines but does not include a controlled within-architecture comparison of CHM+FPS versus FPS-only query seeding. In the revised version we will add this ablation, reporting per-region IoU/mIoU deltas, seed quality metrics (e.g., precision of canopy maxima as tree centers), and qualitative analysis of overlapping-canopy cases. revision: yes
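The two measurements promised in the response could be computed along the following lines. This is a sketch under assumptions: the greedy one-to-one matching, the horizontal `radius` threshold, and the function names are hypothetical, and the rebuttal does not specify how seed precision or IoU deltas will actually be defined.

```python
import math

def seed_precision(seeds, tree_centers, radius=1.0):
    """Fraction of (x, y) seeds within `radius` of a not-yet-matched tree center."""
    unmatched = list(tree_centers)
    hits = 0
    for seed in seeds:
        best = min(unmatched, key=lambda c: math.dist(seed, c), default=None)
        if best is not None and math.dist(seed, best) <= radius:
            hits += 1
            unmatched.remove(best)  # each center can confirm only one seed
    return hits / len(seeds) if seeds else 0.0

def iou(pred, truth):
    """Intersection over union of two sets of point indices."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0

def iou_delta(with_chm, fps_only, truth):
    """Per-instance IoU gain of CHM+FPS seeding over FPS-only seeding."""
    return iou(with_chm, truth) - iou(fps_only, truth)

precision = seed_precision(
    seeds=[(0.1, 0.0), (5.0, 5.0), (0.2, 0.1), (9.0, 9.0)],
    tree_centers=[(0.0, 0.0), (5.0, 5.5)],
)
delta = iou_delta(with_chm={1, 2, 3, 4}, fps_only={1, 2}, truth={1, 2, 3, 4})
```

Removing a center once it has been matched penalizes duplicate seeds on the same tree, which is the failure mode most relevant to overlapping canopies.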
Circularity Check
No significant circularity; claims rest on empirical validation
Full rationale
The paper presents an empirical method (sparse Mamba encoder + CHM-guided query init + dual-path decoder) whose central claims are outperformance on seven forest regions plus efficiency gains versus Transformers. No equations, derivations, or fitted-parameter predictions appear in the provided text. The geometry-guided CHM initialization is an input prior, not a self-referential definition or renamed fit. No self-citation chains or uniqueness theorems are invoked to force the architecture. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Forward citations
Cited by 1 Pith paper
- SelectAnyTree: A Promptable Instance Segmentation Model for 3D Forest LiDAR Point Clouds
SelectAnyTree is a promptable instance segmentation model for 3D forest LiDAR point clouds that achieves 78.2 IoU from a single click via a click-to-query prompt encoder, CHM-guided first prompt, and state-space query...