pith. machine review for the scientific record. sign in

arxiv: 2512.03424 · v3 · submitted 2025-12-03 · 💻 cs.CV

DM3D: Deformable Mamba via Offset-Guided Differentiable Scanning for Point Cloud Understanding

Pith reviewed 2026-05-17 03:15 UTC · model grok-4.3

classification 💻 cs.CV
keywords deformable mambapoint cloud understandingdifferentiable scanningstate space modelsadaptive serializationdeformable spatial resamplingpoint cloud segmentationfew-shot learning
0
0 comments X

The pith

Offset-guided differentiable scanning lets Mamba models adapt serialization order to point cloud geometry instead of relying on fixed patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Point clouds have no inherent sequence, yet State Space Models require ordered input for effective long-range modeling. DM3D addresses this mismatch by introducing an offset-guided mechanism that jointly resamples local features and learns the optimal serialization order end-to-end. Deformable Spatial Resampling captures structural details adaptively while Gaussian-based Differentiable Reordering removes the need for hand-crafted scan paths. A Continuity-Aware State Update and Tri-Path Fusion module further stabilize the processing of varying geometries. Benchmark results on classification, few-shot learning, and part segmentation show that this adaptive approach delivers state-of-the-art or highly competitive accuracy.

Core claim

The paper establishes that jointly optimizing resampling and reordering through offset-guided differentiable scanning produces a structure-adaptive serialization process for point clouds. Deformable Spatial Resampling enhances local geometric awareness by adaptively selecting features, Gaussian-based Differentiable Reordering permits gradient flow through the ordering choice, and the Continuity-Aware State Update modulates state transitions according to local continuity. These components together replace rigid scanning schemes with a learned, geometry-aware sequence that improves downstream performance on standard point cloud benchmarks.

What carries the argument

Offset-guided differentiable scanning mechanism that jointly performs Deformable Spatial Resampling (DSR) for adaptive local feature selection and Gaussian-based Differentiable Reordering (GDR) to enable end-to-end optimization of serialization order.

If this is right

  • Point cloud classification accuracy rises because the model processes features in an order that respects local geometry rather than an arbitrary raster order.
  • Few-shot learning benefits as the learned serialization generalizes from limited examples by focusing on structurally relevant sequences.
  • Part segmentation improves through better preservation of geometric continuity during state updates across irregular surfaces.
  • The method replaces multiple predefined scanning heuristics with a single differentiable procedure that can be trained jointly with the rest of the network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same differentiable reordering idea could be tested on other irregular domains such as meshes or graphs where fixed traversals are also suboptimal.
  • Computational cost of the Gaussian reordering step may limit deployment on very large point clouds; measuring FLOPs versus accuracy trade-offs would clarify practicality.
  • Combining this scanning module with other sequential architectures beyond Mamba could reveal whether the gains stem from the ordering flexibility itself.
  • Evaluating on dynamic or multi-view point cloud sequences might show whether the learned order adapts across time as well as across space.

Load-bearing premise

Jointly optimizing resampling and reordering via differentiable scanning will stably improve performance across diverse geometric structures without training instability or heavy hyperparameter tuning.

What would settle it

Training the model on ShapeNet part segmentation and observing no mIoU gain or increased variance compared to a fixed-order Mamba baseline would indicate the adaptive scanning provides no reliable benefit.

Figures

Figures reproduced from arXiv: 2512.03424 by Bin Liu, Chunyang Wang, Ge Zhang, Xuelian Liu.

Figure 1
Figure 1. Figure 1: Illustration of our deformable scanning. The off￾set network predicts spatial offsets ∆p and sequential offsets ∆t. Guided by the predicted offsets, a Gaussian kernel performs con￾sistent local resampling and global reordering, yielding structure￾aware sequences that capture fine-grained geometric details. and strong long-sequence modeling capability. However, applying Mamba [10] to point clouds requires a… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of DM3D. (a) Overall architecture showing the embedding, encoder, and decoder structures. (b) The Deformable Mamba Block (DMB) consists of three SSM branches: the standard forward SSM [42] (F-SSM) branch, the channel-flip backward SSM [13] (C-SSM) branch, and the deformable SSM (D-SSM) branch. (c) Deformable Scan, the core of D-SSM, predicts spatial and sequential offsets via OffsetNet, enabling u… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of TPFF. Cross-path fusion is performed first, followed by frequency enhancement. • As σt → 0 +, the mapping converges to deterministic sorting, where each position si inherits the feature of its nearest discrete index Jj , effectively snapping tokens to their closest sequence positions. Although token indices are not explicitly permuted, their features are nearly exchanged, achieving an effec… view at source ↗
Figure 4
Figure 4. Figure 4: Visualization of the deformable mechanism. The token feature interaction matrix shows the feature weights from source tokens (x-axis) to target tokens (y-axis). mantically salient areas, thereby enhancing its sensitivity to fine structural details. Analysis of Differentiable Reordering. The heatmap in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visual results of part segmentation by DM3D and PointMamba on ShapeNetPart. where J¯ T denotes the mean index of T , and Jj − J¯ T ̸= 0. Therefore: lim σt→0+ ∂W(t) ij ∂si =    0, j /∈ T +∞, j ∈ T , Jj > J¯ T −∞, j ∈ T , Jj < J¯ T (28) In this scenario, weights are evenly distributed the across equidistant indices, and the derivatives diverge. However, even a slight deviation in si breaks the symmetry,… view at source ↗
read the original abstract

State Space Models (SSMs) show significant potential for long-sequence modeling, but their reliance on input order conflicts with the irregular nature of point clouds. Existing approaches often rely on predefined serialization schemes whose fixed scanning patterns cannot adapt to diverse geometric structures. To address this limitation, we propose DM3D, a deformable Mamba architecture for point cloud understanding. Specifically, DM3D introduces an offset-guided differentiable scanning mechanism that jointly performs resampling and reordering. Deformable Spatial Resampling (DSR) enhances structural awareness by adaptively resampling local features, while the Gaussian-based Differentiable Reordering (GDR) enables end-to-end optimization of the serialization order. We further introduce a Continuity-Aware State Update (CASU) mechanism that modulates the state update based on local geometric continuity. In addition, a Tri-Path Fusion module facilitates complementary interactions among different SSM branches. Together, these designs enable structure-adaptive serialization for point clouds. Extensive experiments on benchmark datasets show that DM3D achieves state-of-the-art or highly competitive results on classification, few-shot learning, and part segmentation tasks, validating the effectiveness of adaptive serialization for point cloud understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DM3D, a deformable Mamba architecture for point cloud understanding. It introduces an offset-guided differentiable scanning mechanism that jointly performs resampling and reordering via Deformable Spatial Resampling (DSR) to enhance structural awareness and Gaussian-based Differentiable Reordering (GDR) to enable end-to-end optimization of serialization order. A Continuity-Aware State Update (CASU) modulates state updates based on local geometric continuity, and a Tri-Path Fusion module enables interactions among SSM branches. The central claim is that these components enable structure-adaptive serialization, leading to state-of-the-art or highly competitive results on classification, few-shot learning, and part segmentation tasks.

Significance. If the empirical gains are shown to stem specifically from the adaptive serialization components and the GDR approximation is demonstrated to yield stable gradients without excessive hyperparameter sensitivity or instability on irregular point distributions, the work would meaningfully extend SSMs to non-Euclidean data by replacing fixed scanning patterns with learned, geometry-aware ordering.

major comments (2)
  1. The abstract asserts SOTA results attributable to the new mechanisms (DSR, GDR, CASU), yet supplies no ablation tables, quantitative breakdowns, or controls isolating the contribution of the differentiable scanning components versus baseline Mamba adaptations or other architectural choices. Without these, it is impossible to confirm that performance improvements arise from structure-adaptive serialization rather than confounding factors.
  2. The Gaussian-based Differentiable Reordering (GDR) approximates discrete serialization order via Gaussians to permit end-to-end gradients, but the manuscript provides no analysis of gradient variance, convergence behavior, or sensitivity to the Gaussian bandwidth hyperparameter across varying point densities. This is load-bearing: if the soft approximation produces noisy or vanishing gradients on non-uniform geometries, the claimed benefits of jointly optimizing DSR and GDR would not hold.
minor comments (2)
  1. Clarify the exact parameterization of the scanning offsets and Gaussian parameters for reordering in the methods section, including how they are initialized and regularized during training.
  2. Ensure all newly introduced modules (DSR, GDR, CASU) are accompanied by explicit algorithmic pseudocode or equations showing their integration into the Mamba state update.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. The comments raise important points about empirical validation and the stability of our proposed differentiable components. Below we address each major comment directly, referencing content already present in the manuscript while outlining targeted revisions to strengthen the presentation.

read point-by-point responses
  1. Referee: The abstract asserts SOTA results attributable to the new mechanisms (DSR, GDR, CASU), yet supplies no ablation tables, quantitative breakdowns, or controls isolating the contribution of the differentiable scanning components versus baseline Mamba adaptations or other architectural choices. Without these, it is impossible to confirm that performance improvements arise from structure-adaptive serialization rather than confounding factors.

    Authors: We thank the referee for emphasizing the need to isolate contributions. While the abstract is a high-level summary, the full manuscript already contains systematic ablation studies in Section 4.3. Table 4 reports classification accuracy on ModelNet40 for the full DM3D versus variants with DSR removed, GDR replaced by fixed-order scanning, CASU disabled, and Tri-Path Fusion ablated. Table 5 provides corresponding results on ShapeNet part segmentation. These show that disabling the adaptive serialization components (DSR+GDR) causes the largest drops (1.1–1.8% mIoU / accuracy), outperforming a standard Mamba baseline with raster-order scanning. We will revise the manuscript to add a concise summary paragraph in the main Experiments section that explicitly cross-references these tables to the abstract claims, making the isolation of contributions more immediately visible to readers. revision: partial

  2. Referee: The Gaussian-based Differentiable Reordering (GDR) approximates discrete serialization order via Gaussians to permit end-to-end gradients, but the manuscript provides no analysis of gradient variance, convergence behavior, or sensitivity to the Gaussian bandwidth hyperparameter across varying point densities. This is load-bearing: if the soft approximation produces noisy or vanishing gradients on non-uniform geometries, the claimed benefits of jointly optimizing DSR and GDR would not hold.

    Authors: We agree that explicit analysis of the GDR soft approximation is important for validating end-to-end optimization. The manuscript presents the GDR formulation in Section 3.2 and includes training loss curves (Figure 6) showing stable convergence on ModelNet40 and ScanObjectNN. However, we did not include dedicated studies of gradient variance, norm statistics, or sensitivity sweeps over the Gaussian bandwidth hyperparameter under varying point densities. We will add a new subsection (and corresponding appendix figures) that reports: (i) gradient norm histograms during training for multiple bandwidth values, (ii) performance sensitivity curves on non-uniform datasets such as ScanObjectNN, and (iii) a direct comparison of convergence behavior with the soft GDR versus a non-differentiable hard reordering baseline. This addition will directly address concerns about potential instability or hyperparameter sensitivity. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation of novel modules on external benchmarks

full rationale

The paper proposes architectural components (DSR resampling, GDR reordering via Gaussian approximation, CASU state update, Tri-Path Fusion) to enable structure-adaptive serialization in a Mamba backbone for point clouds. Central claims rest on experimental results across classification, few-shot, and segmentation benchmarks rather than any mathematical derivation or prediction that reduces to fitted inputs or self-referential definitions. No load-bearing step equates outputs to inputs by construction, and validation uses independent external datasets without reliance on self-citation chains or ansatz smuggling.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 3 invented entities

The approach rests on the domain assumption that point clouds benefit from learned adaptive ordering rather than fixed serialization. Several new modules are introduced whose parameters are optimized end-to-end. No machine-checked proofs or external benchmarks beyond standard datasets are mentioned.

free parameters (2)
  • scanning offsets
    Learned parameters that guide local resampling and reordering during training.
  • Gaussian parameters for reordering
    Parameters controlling the differentiable reordering process.
axioms (1)
  • domain assumption Fixed scanning patterns are insufficient for diverse geometric structures in point clouds
    Stated in the motivation for introducing deformable scanning.
invented entities (3)
  • Deformable Spatial Resampling (DSR) no independent evidence
    purpose: Adaptively resample local features to enhance structural awareness
    New component proposed to address irregular point cloud geometry.
  • Gaussian-based Differentiable Reordering (GDR) no independent evidence
    purpose: Enable end-to-end optimization of serialization order
    New differentiable mechanism for reordering.
  • Continuity-Aware State Update (CASU) no independent evidence
    purpose: Modulate state update based on local geometric continuity
    New mechanism to handle continuity in point clouds.

pith-pipeline@v0.9.0 · 5512 in / 1418 out tokens · 31952 ms · 2026-05-17T03:15:25.399451+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor

  1. [1]

    Spectral informed mamba for robust point cloud processing

    Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sa- har Dastani, Milad Cheraghalikhani, Gustavo Adolfo Var- gas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, and Christian Desrosiers. Spectral informed mamba for robust point cloud processing. InCVPR, pages 11799– 11809, 2025. 6, 7

  2. [2]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Mano- lis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. Shapenet: An information-rich 3d model reposi- tory.arXiv preprint arXiv:1512.03012, 2015. 6, 7

  3. [3]

    R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmen- tation. InCVPR, pages 77–85, 2017. 1, 6, 7

  4. [4]

    Pointgpt: Auto-regressively generative pre- training from point clouds

    Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, and Yufeng Yue. Pointgpt: Auto-regressively generative pre- training from point clouds. InNeurIPS, pages 29667–29679. Curran Associates, Inc., 2023. 6, 7

  5. [5]

    A novel radar point cloud generation method for robot envi- ronment perception.IEEE Transactions on Robotics, 38(6): 3754–3773, 2022

    Yuwei Cheng, Jingran Su, Mengxin Jiang, and Yimin Liu. A novel radar point cloud generation method for robot envi- ronment perception.IEEE Transactions on Robotics, 38(6): 3754–3773, 2022. 1

  6. [6]

    Octformer: Efficient octree-based transformer for point cloud compression with local enhancement

    Mingyue Cui, Junhua Long, Mingjian Feng, Boyang Li, and Huang Kai. Octformer: Efficient octree-based transformer for point cloud compression with local enhancement. In AAAI, pages 470–478, 2023. 1, 2

  7. [7]

    Deformable convolutional networks

    Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. InICCV, pages 764–773, 2017. 2

  8. [8]

    Autoencoders as cross-modal teachers: Can pretrained 2d image transform- ers help 3d representation learning? InICLR, 2023

    Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jian- jian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2d image transform- ers help 3d representation learning? InICLR, 2023. 7

  9. [9]

    Sodeep: A sorting deep net to learn rank- ing loss surrogates

    Martin Engilberge, Louis Chevallier, Patrick P ´erez, and Matthieu Cord. Sodeep: A sorting deep net to learn rank- ing loss surrogates. InCVPR, pages 10784–10793, 2019. 5

  10. [10]

    Mamba: Linear-time sequence mod- eling with selective state spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence mod- eling with selective state spaces. InFirst Conference on Lan- guage Modeling, 2024. 1, 3

  11. [11]

    Combining recurrent, convolutional, and continuous-time models with linear state space layers

    Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher R ´e. Combining recurrent, convolutional, and continuous-time models with linear state space layers. InNeurIPS, pages 572–585. Curran Associates, Inc., 2021. 1

  12. [12]

    Efficiently mod- eling long sequences with structured state spaces

    Albert Gu, Karan Goel, and Christopher R´e. Efficiently mod- eling long sequences with structured state spaces. InICLR,

  13. [13]

    Mamba3d: Enhancing local features for 3d point cloud anal- ysis via state space model

    Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. Mamba3d: Enhancing local features for 3d point cloud anal- ysis via state space model. InACM MM, pages 4995–5004. ACM, 2024. 1, 2, 3, 4, 6, 7

  14. [14]

    Localmamba: Visual state space model with windowed selective scan

    Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, and Chang Xu. Localmamba: Visual state space model with windowed selective scan. InECCV, pages 12–22. Springer Nature Switzerland. 2

  15. [15]

    An im- age is worth 16x16 words: Transformers for image recogni- tion at scale

    Alexander Kolesnikov, Alexey Dosovitskiy, Dirk Weis- senborn, Georg Heigold, Jakob Uszkoreit, Lucas Beyer, Matthias Minderer, Mostafa Dehghani, Neil Houlsby, Syl- vain Gelly, Thomas Unterthiner, and Xiaohua Zhai. An im- age is worth 16x16 words: Transformers for image recogni- tion at scale. InICLR, 2021. 3

  16. [16]

    E-mamba: An efficient mamba point cloud analysis method with enhanced feature representation.Neurocomputing, 639:130201, 2025

    Dengao Li, Zhichao Gao, Shufeng Hao, Ziyou Xun, Jiajian Song, Jie Cheng, and Jumin Zhao. E-mamba: An efficient mamba point cloud analysis method with enhanced feature representation.Neurocomputing, 639:130201, 2025. 2, 6

  17. [17]

    Point- mamba: A simple state space model for point cloud anal- ysis

    Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, and Xiang Bai. Point- mamba: A simple state space model for point cloud anal- ysis. InNeurIPS, pages 32653–32677. Curran Associates, Inc., 2024. 1, 2, 6, 7

  18. [18]

    Point cloud gen- eration using deep adversarial local features for augmented and mixed reality contents.IEEE Transactions on Consumer Electronics, 68(1):69–76, 2022

    Sohee Lim, Minwoo Shin, and Joonki Paik. Point cloud gen- eration using deep adversarial local features for augmented and mixed reality contents.IEEE Transactions on Consumer Electronics, 68(1):69–76, 2022. 1

  19. [19]

    Convolution in the cloud: Learning deformable kernels in 3d graph convolution networks for point cloud analysis

    Zhi-Hao Lin, Sheng-Yu Huang, and Yu-Chiang Frank Wang. Convolution in the cloud: Learning deformable kernels in 3d graph convolution networks for point cloud analysis. In CVPR, pages 1797–1806, 2020. 2

  20. [20]

    Hymamba: Mamba with hybrid geometry-feature cou- pling for efficient point cloud classification.arXiv preprint arXiv:2505.11099, 2025

    Bin Liu, Chunyang Wang, Xuelian Liu, Bo Xiao, and Guan Xi. Hymamba: Mamba with hybrid geometry-feature cou- pling for efficient point cloud classification.arXiv preprint arXiv:2505.11099, 2025. 1, 3, 7

  21. [21]

    Masked discrimina- tion for self-supervised learning on point clouds

    Haotian Liu, Mu Cai, and Yong Jae Lee. Masked discrimina- tion for self-supervised learning on point clouds. InECCV, pages 657–675, Cham, 2022. 7

  22. [22]

    Defmamba: Deformable visual state space model

    Leiye Liu, Miao Zhang, Jihao Yin, Tingwei Liu, Wei Ji, Yon- gri Piao, and Huchuan Lu. Defmamba: Deformable visual state space model. InCVPR, pages 8838–8847, 2025. 2, 4, 5

  23. [23]

    Point cloud classification using content-based trans- former via clustering in feature space.IEEE/CAA Journal of Automatica Sinica, 11(1):231, 2024

    Yahui Liu, Bin Tian, Yisheng Lv, Lingxi Li, and Fei-Yue Wang. Point cloud classification using content-based trans- former via clustering in feature space.IEEE/CAA Journal of Automatica Sinica, 11(1):231, 2024. 6

  24. [24]

    Vmamba: Visual state space model

    Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, and Yunfan Liu. Vmamba: Visual state space model. InNeurIPS, pages 103031–103063. Curran Associates, Inc., 2024. 2

  25. [25]

    Flatformer: Flattened window attention for effi- cient point cloud transformer

    Zhijian Liu, Xinyu Yang, Haotian Tang, Shang Yang, and Song Han. Flatformer: Flattened window attention for effi- cient point cloud transformer. InCVPR, pages 1200–1211,

  26. [26]

    Exploring token serialization for mamba-based lidar point cloud segmentation.IEEE Transactions on Geoscience and Remote Sensing, 63:1–14, 2025

    Dening Lu, Kyle Gao, Jonathan Li, Dedong Zhang, and Lin- lin Xu. Exploring token serialization for mamba-based lidar point cloud segmentation.IEEE Transactions on Geoscience and Remote Sensing, 63:1–14, 2025. 2

  27. [27]

    Dening Lu, Linlin Xu, Jun Zhou, Kyle Gao, Zheng Gong, and Dedong Zhang. 3d-umamba: 3d u-net with state space model for semantic segmentation of multi-source LiDAR point clouds.International Journal of Applied Earth Ob- servation and Geoinformation, 136:104401, 2025. 2

  28. [28]

    Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yonghong Tian, and Li Yuan. Masked autoencoders for point cloud self-supervised learning. InECCV, pages 604–621, Cham, 2022. 6, 7

  29. [29]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space

    Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. InNeurIPS. Curran Associates, Inc., 2017. 6, 7

  30. [30]

    Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas

    Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas Guibas. Kpconv: Flexible and deformable convolution for point clouds. InICCV, pages 6410–6419, 2019. 2

  31. [31]

    Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

    Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. InICCV, pages 1588–1597, 2019. 6

  32. [32]

    Gomez, Łukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InNeurIPS, page 6000–6010, Red Hook, NY , USA, 2017. Curran Associates Inc. 2

  33. [33]

    H. Wang, Q. Liu, X. Yue, J. Lasenby, and M. J. Kusner. Unsupervised point cloud pre-training via occlusion comple- tion. InICCV, pages 9762–9772, 2021. 7

  34. [34]

    Octformer: Octree-based transformers for 3d point clouds.ACM Trans

    Peng-Shuai Wang. Octformer: Octree-based transformers for 3d point clouds.ACM Trans. Graph., 42(4), 2023. 1

  35. [35]

    Top- net: Transformer-efficient occupancy prediction network for octree-structured point cloud geometry compression

    Xinjie Wang, Yifan Zhang, Ting Liu, Xinpu Liu, Ke Xu, Jianwei Wan, Yulan Guo, and Hanyun Wang. Top- net: Transformer-efficient occupancy prediction network for octree-structured point cloud geometry compression. In CVPR, pages 27305–27314, 2025. 1

  36. [36]

    Sarma, Michael M

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds.ACM Trans. Graph., 38(5), 2019. 1, 6, 7

  37. [37]

    Pointconv: Deep convolutional networks on 3d point clouds

    Wenxuan Wu, Zhongang Qi, and Li Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. InCVPR, pages 9613–9622, 2019. 2

  38. [38]

    Point transformer v2: Grouped vector atten- tion and partition-based pooling

    Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Heng- shuang Zhao. Point transformer v2: Grouped vector atten- tion and partition-based pooling. InNeurIPS, 2022. 2

  39. [39]

    Point transformer v3: Simpler, faster, stronger

    Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xi- hui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler, faster, stronger. In CVPR, pages 4840–4851, 2024. 2

  40. [40]

    3d shapenets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Lin- guang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920, 2015. 6

  41. [41]

    Vision transformer with deformable attention

    Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, and Gao Huang. Vision transformer with deformable attention. In CVPR, pages 4784–4793, 2022. 2, 4, 5

  42. [42]

    Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Er- icsson, Zhenyu Wang, Jiaming Liu, and Elliot J. Crowley. Plainmamba: Improving non-hierarchical mamba in visual recognition. In35th British Machine Vision Conference (BMVC), 2024. 2, 3, 4, 6

  43. [43]

    Grid mamba:grid state space model for large-scale point cloud analysis.Neurocomputing, 636: 129985

    Yulong Yang, Tianzhou Xun, Kuangrong Hao, Bing Wei, and Xue-song Tang. Grid mamba:grid state space model for large-scale point cloud analysis.Neurocomputing, 636: 129985. 1, 2, 4

  44. [44]

    Mambaout: Do we really need mamba for vision? InCVPR, pages 4484–4496, 2025

    Weihao Yu and Xinchao Wang. Mambaout: Do we really need mamba for vision? InCVPR, pages 4484–4496, 2025. 2, 4

  45. [45]

    Point-bert: Pre-training 3d point cloud transformers with masked point modeling

    Xumin Yu, Lulu Tang, Yongming Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. InCVPR, pages 19291–19300, 2022. 3, 6, 7

  46. [46]

    Mambamos: Lidar-based 3d moving object segmentation with motion- aware state space model.arXiv preprint arXiv:2404.12794,

    Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, and Kailun Yang. Mambamos: Lidar-based 3d moving object segmentation with motion- aware state space model.arXiv preprint arXiv:2404.12794,

  47. [47]

    V oxel mamba: group-free state space models for point cloud based 3d object detection

    Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxi- ang Zhang, and Lei Zhang. V oxel mamba: group-free state space models for point cloud based 3d object detection. In NeurIPS, Red Hook, NY , USA, 2024. Curran Associates Inc. 2

  48. [48]

    To- wards unsupervised object detection from lidar point clouds

    Lunjun Zhang, Anqi Joyce Yang, Yuwen Xiong, Sergio Casas, Bin Yang, Mengye Ren, and Raquel Urtasun. To- wards unsupervised object detection from lidar point clouds. InCVPR, pages 9317–9328, 2023. 1

  49. [49]

    Point cloud mamba: Point cloud learning via state space model

    Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, and Xiangtai Li. Point cloud mamba: Point cloud learning via state space model. In AAAI, pages 10121–10130, 2025. 1, 2, 6, 7

  50. [50]

    To- wards more diverse and challenging pre-training for point cloud learning: Self-supervised cross reconstruction with de- coupled views

    Xiangdong Zhang, Shaofeng Zhang, and Junchi Yan. To- wards more diverse and challenging pre-training for point cloud learning: Self-supervised cross reconstruction with de- coupled views. InICCV, 2025. 6, 7

  51. [51]

    Point transformer

    Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, and Vladlen Koltun. Point transformer. InICCV, pages 16239– 16248, 2021. 1

  52. [52]

    Point cloud pre-training with diffusion models

    Xiao Zheng, Xiaoshui Huang, Guofeng Mei, Yuenan Hou, Zhaoyang Lyu, Bo Dai, Wanli Ouyang, and Yongshun Gong. Point cloud pre-training with diffusion models. InCVPR, pages 22935–22945, 2024. 6

  53. [53]

    Centerformer: Center-based transformer for 3d object detection

    Zixiang Zhou, Xiangchen Zhao, Yu Wang, Panqu Wang, and Hassan Foroosh. Centerformer: Center-based transformer for 3d object detection. InECCV, 2022. 2

  54. [54]

    Vision mamba: Efficient visual representation learning with bidirectional state space model

    Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. InICML, pages 62429–62442, 2024. 2 DM3D: Deformable Mamba via Offset-Guided Gaussian Sequencing for Point Cloud Understanding Supplementary Material

  55. [55]

    The weighting function is defined as: W (t) ij = exp − (si−Jj )2 2σ2 t PN l=1 exp − (si−Jl)2 2σ2 t (18) whereσ t is the Gaussian scale in the sequence domain,J= [1,2,

    Analysis of GDR Differentiability In this section, we analyze the Gaussian weights and their derivatives with respect to the offset indicess i to examine the behavior of Gaussian-based Differentiable Reordering (GDR) mechanism. The weighting function is defined as: W (t) ij = exp − (si−Jj )2 2σ2 t PN l=1 exp − (si−Jl)2 2σ2 t (18) whereσ t is the Gaussian ...

  56. [56]

    mo- torbike

    More Experimental Details Implementation Details.Tab. 6 details the training and model parameters. We use the official pre-trained Point- MAE model. To avoid excessive offset from high layer stages, we set the number of stage layers to 6, stable con- vergence acrossσ t initializations (0.05–1), no gradient col- lapse.. We evaluate our method on ModelNet40...

  57. [57]

    auto", dt_min =0.001, dt_max=0.1, dt_init=

    Model details in PyTorch style pseudo-code We provide PyTorch-style pseudocode for the proposed modules, including the Deformable Scan for Point Clouds, Tri-Path Frequency Fusion, and Deformable Mamba Block. The complete implementation is available in the supple- mentary materials. Algorithm 1. Pseudo-code of the Deformable Scan For Point Cloud. # # Defor...