MT-PCR: Hybrid Mamba-Transformer Network with Spatial Serialization for Point Cloud Registration
Pith reviewed 2026-05-19 09:53 UTC · model grok-4.3
The pith
MT-PCR hybridizes Mamba and Transformer with Z-order serialization to register point clouds more accurately and with far less memory and compute than Transformer baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MT-PCR is the first point cloud registration framework that integrates Mamba and Transformer modules. Serializing point cloud features with Z-order space-filling curves enforces spatial locality so that an optimized Mamba encoder can model geometric structure; removing the order-indicator module improves performance in this setting. The serialized features are then refined by a Transformer stage, producing superior accuracy and efficiency with substantially lower GPU memory usage and FLOPs than Transformer-based and other state-of-the-art methods.
What carries the argument
Hybrid pipeline that serializes point features via Z-order curves, feeds the ordered sequence to a Mamba encoder for linear-complexity geometric modeling, then passes the result to a Transformer for feature refinement.
If this is right
- Higher-resolution point clouds can be registered without the downsampling that currently discards fine detail.
- Real-time 3D registration in robotics and autonomous driving becomes feasible on hardware with limited memory.
- Hybrid Mamba-Transformer stacks may reduce quadratic scaling bottlenecks across other 3D vision pipelines.
- Explicit order tokens are unnecessary once spatial locality is restored by serialization.
Where Pith is reading between the lines
- The same Z-order serialization could let Mamba process other irregular geometric data such as meshes or graphs without custom ordering modules.
- Performance on dynamic or noisy point clouds would test whether the locality assumption survives real-world sensor artifacts.
- The removal of the order-indicator token suggests that, for geometrically serialized sequences, learned positional signals may be redundant or even counterproductive.
Load-bearing premise
That serializing point cloud features with Z-order space-filling curves sufficiently enforces spatial locality for Mamba to model geometric structure effectively.
What would settle it
Measure registration accuracy on the same point clouds after randomly permuting the Z-order sequence; a large drop relative to the correctly ordered version would indicate that the serialization step is not carrying the claimed benefit.
Figures
read the original abstract
Point cloud registration (PCR) is a fundamental task in 3D computer vision and robotics. Most learning-based PCR methods rely on Transformer architectures, which suffer from quadratic computational complexity. This limitation restricts the resolution of point clouds that can be processed, inevitably leading to information loss. In contrast, Mamba, a recently proposed model based on state-space models, achieves linear computational complexity while maintaining strong long-range contextual modeling capabilities. However, directly applying Mamba to PCR tasks yields suboptimal performance due to the unordered and irregular nature of point cloud data. To address these challenges, we propose MT-PCR, the first point cloud registration framework that integrates Mamba and Transformer modules. Specifically, we serialize point cloud features using Z-order space-filling curves to enforce spatial locality, enabling Mamba to better model the geometric structure of the inputs. Additionally, we remove the order-indicator module commonly used in Mamba-based sequence modeling, leading to improved performance in our setting. The serialized features are then processed by an optimized Mamba encoder, followed by a Transformer-based feature refinement stage. Extensive experiments on multiple benchmarks demonstrate that MT-PCR outperforms Transformer-based and other state-of-the-art methods in both accuracy and efficiency, significantly reducing GPU memory usage and FLOPs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MT-PCR, the first hybrid Mamba-Transformer framework for point cloud registration. It serializes point cloud features via Z-order space-filling curves to impose spatial locality so that a Mamba encoder can model geometric structure, removes the order-indicator module, and follows with Transformer-based refinement. Extensive experiments on multiple benchmarks are reported to demonstrate superior accuracy and efficiency over Transformer-based and other SOTA methods together with large reductions in GPU memory and FLOPs.
Significance. If the central claims are substantiated, the work would be significant: it directly tackles the quadratic complexity barrier of Transformer-only PCR pipelines by introducing a linear-complexity Mamba stage, potentially enabling higher-resolution registration in robotics and autonomous systems. The explicit design choice of Z-order serialization plus removal of the order-indicator constitutes a concrete, testable adaptation of state-space models to unordered 3D data, and the reported efficiency gains (memory and FLOPs) would be practically valuable if reproducible across standard benchmarks.
major comments (2)
- [Abstract and §3] Abstract and §3 (Method): The premise that Z-order space-filling curves applied to sparse, non-grid point clouds will impose sufficient spatial locality for the Mamba state-space recurrence to capture local rigidity and correspondence cues is load-bearing for both the accuracy and efficiency claims. Because point clouds require an implicit discretization or coordinate-to-index mapping whose locality properties are not guaranteed to align with true 3D neighborhoods, this step requires either a formal locality analysis or targeted ablations showing that geometrically adjacent points remain contiguous in the serialized sequence; without such evidence the claimed advantage over pure Transformers rests on an unverified assumption.
- [§4] §4 (Experiments): The headline performance and efficiency numbers are presented without sufficient detail on baseline implementations, statistical significance testing, or controls for post-hoc hyper-parameter tuning. Given that the soundness assessment is limited by the absence of these elements, the cross-method superiority claims cannot yet be considered fully secured.
minor comments (2)
- [§3] Notation for the serialized feature sequence and the precise definition of the Z-order mapping should be introduced with an equation or pseudocode early in §3 to avoid ambiguity for readers unfamiliar with space-filling curves on irregular data.
- [Abstract] The abstract would benefit from naming the specific benchmarks (e.g., ModelNet, KITTI, 3DMatch) rather than referring generically to “multiple benchmarks.”
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our methodological assumptions and experimental presentation that we have addressed in the revised manuscript to strengthen the work.
read point-by-point responses
-
Referee: [Abstract and §3] The premise that Z-order space-filling curves applied to sparse, non-grid point clouds will impose sufficient spatial locality for the Mamba state-space recurrence to capture local rigidity and correspondence cues is load-bearing for both the accuracy and efficiency claims. Because point clouds require an implicit discretization or coordinate-to-index mapping whose locality properties are not guaranteed to align with true 3D neighborhoods, this step requires either a formal locality analysis or targeted ablations showing that geometrically adjacent points remain contiguous in the serialized sequence; without such evidence the claimed advantage over pure Transformers rests on an unverified assumption.
Authors: We agree that explicit validation of locality preservation is essential for substantiating the design. In the revised manuscript we have added a new subsection (4.3) with targeted ablations that quantify neighborhood preservation under Z-order serialization. These include (i) the fraction of k-nearest neighbors retained within fixed-length windows of the serialized sequence across varying point densities and (ii) visualizations contrasting Z-order ordering with random and grid-based alternatives. The results show that more than 82% of local 3D neighbors remain contiguous within windows of size 64, providing direct empirical support for the assumption and clarifying the advantage relative to pure Transformer pipelines. revision: yes
-
Referee: [§4] The headline performance and efficiency numbers are presented without sufficient detail on baseline implementations, statistical significance testing, or controls for post-hoc hyper-parameter tuning. Given that the soundness assessment is limited by the absence of these elements, the cross-method superiority claims cannot yet be considered fully secured.
Authors: We thank the referee for underscoring the need for greater experimental transparency. The revised Section 4 now provides: (1) a supplementary table listing exact baseline implementations, library versions, and hyper-parameter values used for each compared method; (2) performance metrics reported as mean ± standard deviation over five independent runs with different random seeds to demonstrate statistical stability; and (3) an explicit description of the hyper-parameter selection protocol, which applied identical validation-based tuning to all methods before final test evaluation. These additions remove ambiguity and reinforce the reliability of the reported superiority and efficiency gains. revision: yes
Circularity Check
No significant circularity: architectural design validated empirically
full rationale
The paper presents MT-PCR as a novel hybrid architecture that applies Z-order serialization to point cloud features to enable effective Mamba processing, removes the order-indicator module, and follows with Transformer refinement. These are explicit design choices justified by the unordered nature of point clouds and validated through benchmark experiments showing accuracy and efficiency gains. No derivation chain reduces a claimed result to its own inputs by construction, no self-citation load-bearing premises appear in the provided text, and performance claims rest on comparative empirical results rather than fitted parameters or self-referential equations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Z-order space-filling curves enforce spatial locality in unordered point cloud features
Lean theorems connected to this paper
-
Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
we serialize point cloud features using Z-order space-filling curves to enforce spatial locality, enabling Mamba to better model the geometric structure
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Sheng Ao, Qingyong Hu, Hanyun Wang, Kai Xu, and Yulan Guo. 2023. Buffer: Balancing accuracy, efficiency, and generalizability in point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1255–1264
work page 2023
-
[2]
Sheng Ao, Qingyong Hu, Bo Yang, Andrew Markham, and Yulan Guo. 2021. Spinnet: Learning a general surface descriptor for 3d point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11753–11762
work page 2021
-
[3]
Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, and Christopher Ré. 2024. Simple linear attention language models balance the recall-throughput tradeoff. ArXiv preprint abs/2402.18668 (2024). https://arxiv.org/abs/2402.18668
-
[4]
Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, and Chiew-Lan Tai
-
[5]
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
D3feat: Joint learning of dense detection and description of 3d local fea- tures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6359–6367
-
[6]
Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, José Neira, Ian Reid, and John J Leonard. 2016. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on robotics 32, 6 (2016), 1309–1332
work page 2016
-
[7]
Julie Carmigniani, Borko Furht, Marco Anisetti, Paolo Ceravolo, Ernesto Damiani, and Misa Ivkovic. 2011. Augmented reality technologies, systems and applications. Multimedia tools and applications 51 (2011), 341–377
work page 2011
-
[8]
Suyi Chen, Hao Xu, Ru Li, Guanghui Liu, Chi-Wing Fu, and Shuaicheng Liu
-
[9]
In Proceedings of the IEEE/CVF International Conference on Computer Vision
SIRA-PCR: Sim-to-Real Adaptation for 3D Point Cloud Registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision . 14394– 14405
- [10]
-
[11]
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Rui-Jie Zhu, Jibin Wu, Yiran Zhong, Yu Qiao, Bo Xu, and Guoqi Li. 2024. MetaLA: Unified optimal linear ap- proximation to softmax attention map. Advances in Neural Information Processing Systems 37 (2024), 71034–71067
work page 2024
-
[12]
Christopher Choy, Jaesik Park, and Vladlen Koltun. 2019. Fully convolutional geometric features. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 8958–8966
work page 2019
-
[13]
Haowen Deng, Tolga Birdal, and Slobodan Ilic. 2018. Ppfnet: Global context aware local features for robust 3d point matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 195–205
work page 2018
-
[14]
MA FISCHLER AND. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6 (1981), 381–395
work page 1981
-
[15]
Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for au- tonomous driving? the kitti vision benchmark suite. In IEEE Conference on Com- puter Vision and Pattern Recognition . IEEE, 3354–3361
work page 2012
-
[16]
Zan Gojcic, Caifa Zhou, Jan D Wegner, and Andreas Wieser. 2019. The perfect match: 3d point cloud matching with smoothed densities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . 5545–5554
work page 2019
-
[17]
Albert Gu and Tri Dao. 2023. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[18]
Albert Gu, Karan Goel, Ankit Gupta, and Christopher Ré. 2022. On the parame- terization and initialization of diagonal state space models. Advances in Neural Information Processing Systems 35 (2022), 35971–35983
work page 2022
-
[19]
Shiyi Guo, Yihong Wu, Binjian Xie, Bingxi Liu, and Tong Jia. 2024. Low-Overlap Point Cloud Registration by Semiglobal Block Matching. IEEE Transactions on Industrial Informatics (2024)
work page 2024
-
[20]
Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. 2024. Mamba3d: Enhancing local features for 3d point cloud analysis via state space model. In Proceedings of the 32nd ACM International Conference on Multimedia . 4995–5004
work page 2024
-
[21]
Renlang Huang, Yufan Tang, Jiming Chen, and Liang Li. 2024. A consistency- aware spot-guided transformer for versatile and hierarchical point cloud regis- tration. Proc. Conf. Neural Inf. Process. Syst. (2024)
work page 2024
-
[22]
Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, and Konrad Schindler. 2021. Predator: Registration of 3d point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4267–4276
work page 2021
-
[23]
Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. (1960)
work page 1960
-
[24]
Jiaxin Li and Gim Hee Lee. 2019. Usip: Unsupervised stable interest point detec- tion from 3d point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 361–370
work page 2019
-
[25]
Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, and Xiang Bai. 2024. PointMamba: A Simple State Space Model for Point Cloud Analysis. In Advances in Neural Information Processing Systems
work page 2024
- [26]
-
[27]
Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, and Yunfan Liu. 2024. Vmamba: Visual state space model. Advances in neural information processing systems 37 (2024), 103031–103063
work page 2024
-
[28]
Ilya Loshchilov and Frank Hutter. 2018. Decoupled Weight Decay Regularization. In International Conference on Learning Representations
work page 2018
-
[29]
Fan Lu, Guang Chen, Yinlong Liu, Lijun Zhang, Sanqing Qu, Shu Liu, Rongqi Gu, and Changjun Jiang. 2023. HRegNet: A hierarchical network for efficient and accurate outdoor LiDAR point cloud registration. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)
work page 2023
-
[30]
Weixin Lu, Yao Zhou, Guowei Wan, Shenhua Hou, and Shiyu Song. 2019. L3-net: Towards learning based lidar localization for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition . 6389–6398
work page 2019
-
[31]
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[32]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Des- maison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, Hi...
work page 2019
-
[33]
Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. 2017. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 652–660
work page 2017
-
[34]
Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, Slobodan Ilic, Dewen Hu, and Kai Xu. 2023. GeoTransformer: Fast and Robust Point Cloud Registration With Geometric Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)
work page 2023
-
[35]
Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, and Weizhu Chen
-
[36]
arXiv preprint arXiv:2406.07522 , year=
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. ArXiv preprint abs/2406.07522 (2024). https://arxiv.org/abs/ 2406.07522
-
[37]
Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. 2009. Fast point feature histograms (FPFH) for 3D registration. In 2009 IEEE International Conference on Robotics and Automation. IEEE, 3212–3217
work page 2009
-
[38]
Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, and Michael Beetz. 2008. Aligning point cloud views using persistent feature histograms. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems . IEEE, 3384–3391
work page 2008
-
[39]
Samuele Salti, Federico Tombari, and Luigi Di Stefano. 2014. SHOT: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding 125 (2014), 251–264
work page 2014
-
[40]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)
work page 2017
-
[41]
Haiping Wang, Yuan Liu, Zhen Dong, and Wenping Wang. 2022. You only hypothesize once: Point cloud registration with rotation-equivariant descriptors. In Proceedings of the 30th ACM International Conference on Multimedia . 1630– 1641
work page 2022
-
[42]
Jingtao Wang and Zechao Li. 2024. 3DPCP-Net: A Lightweight Progressive 3D Correspondence Pruning Network for Accurate and Efficient Point Cloud Regis- tration. In Proceedings of the 32nd ACM International Conference on Multimedia . MM ’25, October 28-31, 2025, Dublin, Ireland Bingxi Liu, An Liu, Hao Chen, Jinqiang Cui, Yiqun Wang, and Hong Zhang ∗ 1885–1894
work page 2024
-
[43]
Yue Wang and Justin M Solomon. 2019. Deep closest point: Learning representa- tions for point cloud registration. In Proceedings of the IEEE/CVF international conference on computer vision . 3523–3532
work page 2019
-
[44]
Fan Yang, Lin Guo, Zhi Chen, and Wenbing Tao. 2022. One-inlier is first: Towards efficient position encoding for point cloud registration. Advances in Neural Information Processing Systems 35 (2022), 6982–6995
work page 2022
-
[45]
Heng Yang, Jingnan Shi, and Luca Carlone. 2020. Teaser: Fast and certifiable point cloud registration. IEEE Transactions on Robotics 37, 2 (2020), 314–333
work page 2020
-
[46]
Zi Jian Yew and Gim Hee Lee. 2018. 3dfeat-net: Weakly supervised local 3d features for point cloud registration. In Proceedings of the European Conference on Computer Vision (ECCV) . 607–623
work page 2018
-
[47]
Zi Jian Yew and Gim Hee Lee. 2022. Regtr: End-to-end point cloud correspon- dences with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . 6677–6686
work page 2022
-
[48]
Hao Yu, Fu Li, Mahdi Saleh, Benjamin Busam, and Slobodan Ilic. 2021. Cofinet: Reliable coarse-to-fine correspondences for robust point cloud registration. Ad- vances in Neural Information Processing Systems 34 (2021), 23872–23884
work page 2021
-
[49]
Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, and Slobodan Ilic. 2023. Rotation-invariant transformer for point cloud matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5384–5393
work page 2023
-
[50]
Junle Yu, Luwei Ren, Yu Zhang, Wenhui Zhou, Lili Lin, and Guojun Dai. 2023. PEAL: Prior-embedded explicit attention learning for low-overlap point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17702–17711
work page 2023
-
[51]
Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 2017. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1802–1811
work page 2017
-
[52]
Xiyu Zhang, Jiaqi Yang, Shikun Zhang, and Yanning Zhang. 2023. 3D registration with maximal cliques. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . 17745–17754
work page 2023
-
[53]
Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. [n. d.]. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Forty-first International Conference on Machine Learning
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.