pith · machine review for the scientific record

arxiv: 2605.03438 · v2 · submitted 2026-05-05 · 💻 cs.CV

Recognition: no theorem link

Mantis: Mamba-native Tuning is Efficient for 3D Point Cloud Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords Mamba · parameter-efficient fine-tuning · 3D point clouds · foundation models · State-Aware Adapter · Dual-Serialization Consistency Distillation · point cloud serialization · selective state-space models

The pith

Mantis introduces a Mamba-native parameter-efficient fine-tuning method for 3D point cloud foundation models that reaches competitive accuracy with only about 5% trainable parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pre-trained 3D point cloud foundation models transfer well across tasks, but full fine-tuning is expensive in compute and storage. Existing parameter-efficient methods were built for Transformer backbones and rely on token-level changes, which create a mismatch with Mamba's state-level sequence dynamics and cause large accuracy drops when applied directly. The paper proposes Mantis, the first framework designed for Mamba: a State-Aware Adapter inserts lightweight task control into the selective state-space updates while the backbone stays frozen, and Dual-Serialization Consistency Distillation reduces instability arising from different valid point orderings. Experiments on multiple benchmarks show this combination delivers performance close to full fine-tuning using roughly 5% of the parameters.
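To make the "roughly 5%" budget concrete, here is a minimal PyTorch sketch of the freeze-then-count bookkeeping that claim implies; the name-matching rule ("adapter" in the parameter name) and the "head" prefix are assumptions for illustration, not the paper's actual module layout.

```python
import torch.nn as nn

def freeze_backbone_except_adapters(model: nn.Module, adapter_key: str = "adapter") -> nn.Module:
    # Freeze everything, then re-enable only adapter parameters and the task head.
    for name, param in model.named_parameters():
        param.requires_grad = (adapter_key in name) or name.startswith("head")
    return model

def trainable_fraction(model: nn.Module) -> float:
    # Share of parameters that will receive gradients, i.e. the PEFT budget.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total
```

The paper's claim is that this ratio lands near 0.05 while accuracy stays close to full fine-tuning.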

Core claim

Mantis is the first Mamba-native PEFT framework for 3D point cloud foundation models. It introduces a State-Aware Adapter that injects lightweight task-conditioned control signals into Mamba's selective state-space updates, enabling state-level adaptation without updating the pre-trained backbone, and applies Dual-Serialization Consistency Distillation to regularize across different valid point-cloud serializations, thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks show that Mantis achieves competitive performance with only about 5% trainable parameters.

What carries the argument

The State-Aware Adapter, which injects task-specific control signals directly into Mamba's selective state-space updates to support state-level adaptation in a frozen backbone.
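A hedged reading of that mechanism as a PyTorch sketch: a low-rank bottleneck produces per-token offsets that are added to the selective state-space parameters (here the step size Δ and the input matrix B) inside an otherwise frozen Mamba block. The class name, the choice of which SSM parameters receive offsets, and the zero initialization are illustrative assumptions; the paper's actual SAA may differ.

```python
import torch
import torch.nn as nn

class StateAwareAdapterSketch(nn.Module):
    """Illustrative adapter producing task-conditioned offsets for selective SSM parameters."""

    def __init__(self, d_model: int, d_state: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank)
        self.up_dt = nn.Linear(rank, d_model)  # offset for the per-token step size (delta)
        self.up_b = nn.Linear(rank, d_state)   # offset for the input matrix B
        # Zero-init the up-projections so training starts from the frozen dynamics.
        for lin in (self.up_dt, self.up_b):
            nn.init.zeros_(lin.weight)
            nn.init.zeros_(lin.bias)

    def forward(self, x: torch.Tensor):
        h = torch.tanh(self.down(x))           # (batch, seq, rank)
        return self.up_dt(h), self.up_b(h)

# Hypothetical insertion point inside a frozen Mamba block:
#   dt_off, b_off = adapter(x)
#   dt = softplus(frozen_dt_proj(x) + dt_off)  # state-level step control
#   B  = frozen_b_proj(x) + b_off              # task-conditioned input matrix
```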

If this is right

  • Mamba-based 3D point cloud models can be adapted to new tasks without full retraining or large storage costs.
  • Token-level PEFT methods are insufficient for state-space backbones and must be replaced by state-level mechanisms.
  • Regularizing across multiple point-cloud serializations stabilizes training of Mamba models on unordered 3D data.
  • Foundation models in 3D vision become practical to deploy when only a small fraction of parameters need updating per task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar state-aware adapters could be tested on other state-space sequence models in vision or multimodal settings.
  • The 5% parameter budget may enable on-device adaptation of 3D models where full fine-tuning is impossible.
  • The dual-serialization regularization might extend to other ordering-sensitive data types such as sequences of images or meshes.
  • The framework highlights that backbone-specific PEFT design is necessary to avoid the accuracy degradation seen when transferring Transformer methods to Mamba.

Load-bearing premise

The State-Aware Adapter successfully injects task-specific control into Mamba's selective state-space updates without degrading the pre-trained dynamics, and Dual-Serialization Consistency Distillation reduces serialization instability without new accuracy trade-offs.
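Figure 3's description, consistency enforced at both feature and prediction levels, suggests what such a regularizer could look like. A minimal sketch, assuming an MSE feature term and a symmetric KL prediction term over temperature-softened logits; the actual DSCD loss, its weighting, and its temperature are not given in the text shown here.

```python
import torch
import torch.nn.functional as F

def dscd_loss_sketch(feat_a: torch.Tensor, feat_b: torch.Tensor,
                     logits_a: torch.Tensor, logits_b: torch.Tensor,
                     tau: float = 1.0, lam: float = 1.0) -> torch.Tensor:
    """Consistency between two serializations (A, B) of the same point cloud.

    feat_*   : pooled features per serialization, shape (batch, dim)
    logits_* : classifier logits per serialization, shape (batch, classes)
    """
    # Feature-level consistency.
    feat_term = F.mse_loss(feat_a, feat_b)
    # Prediction-level consistency: symmetric KL over softened distributions.
    log_p_a = F.log_softmax(logits_a / tau, dim=-1)
    log_p_b = F.log_softmax(logits_b / tau, dim=-1)
    pred_term = 0.5 * (
        F.kl_div(log_p_a, log_p_b, reduction="batchmean", log_target=True)
        + F.kl_div(log_p_b, log_p_a, reduction="batchmean", log_target=True)
    )
    return lam * feat_term + (tau ** 2) * pred_term
```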

What would settle it

If, on a held-out 3D point cloud benchmark, Mantis with roughly 5% trainable parameters showed substantially lower accuracy than full fine-tuning or existing PEFT baselines, the claim of efficient, competitive performance would be falsified.

Figures

Figures reproduced from arXiv: 2605.03438 by Ajmal Saeed Mian, Jian Liu, Jihua Zhu, Zihao Guo.

Figure 1: Comprehensive comparisons between our Mantis and several representative counterparts […]
Figure 2: Pipeline of Mantis. Raw point clouds are sampled using Farthest Point Sampling (FPS) to select representative key points, and local neighborhoods are formed via K-nearest neighbors (KNN) for each key point. The resulting patches are serialized along two complementary space-filling curves to produce dual-order patches. These patches are fused and processed by stacked SAA-Mamba blocks, where the selective SS… (a minimal sampling-and-serialization sketch follows this figure list)
Figure 3: Illustration of Dual-Serialization Consistency Distillation. During training, DSCD enforces cross-serialization consistency at both feature and prediction levels. Although serialization enables Mamba to process unordered point clouds as sequential inputs, the resulting hidden-state propagation is inherently sensitive to the specific traversal order. Consequently, different valid serializations of the same…
Figure 4: Qualitative analysis results for object classification and part segmentation. (a) The t-SNE visualization results on the PB_T50_RS variant with different fine-tuning schemes. (b) Comparison of full fine-tuning and Mantis on part segmentation with Mamba3D [12]. Ablation on different inserted layers: one straightforward way to further reduce the number of tunable parameters is to insert SAA into only a subset…
Figure 5: Comparison of computational efficiency and hyperparameter sensitivity on…
Figure 6: The classification accuracy curves of full fine-tuning, IDPT […]
Figure 7: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Airplane”, “Car”, “Chair”, “Guitar” and “Motorbike”.
Figure 8: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Table”, “Rocket”, “Pistol”, “Laptop” and “Lamp”.
Figure 9: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Bag”, “Cap”, “Earphone”, “Knife”, “Mug” and “Skateboard”.
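The Figure 2 pipeline (FPS key points, KNN neighborhoods, dual serialization along space-filling curves) can be grounded with a small sketch. The paper cites Hilbert curves [14]; here a Morton (z-order) key and an axis-permuted variant stand in for the two complementary orderings, purely as an assumption for illustration.

```python
import torch

def farthest_point_sampling(xyz: torch.Tensor, n_samples: int) -> torch.Tensor:
    # Greedy FPS over xyz of shape (N, 3); returns indices of the chosen key points.
    n = xyz.shape[0]
    chosen = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = int(torch.randint(n, (1,)))
    for i in range(n_samples):
        chosen[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)
        dist = torch.minimum(dist, d)
        farthest = int(dist.argmax())
    return chosen

def morton_key(xyz: torch.Tensor, bits: int = 10) -> torch.Tensor:
    # Quantize coordinates and interleave their bits into a z-order key; xyz: (M, 3).
    lo, hi = xyz.min(dim=0).values, xyz.max(dim=0).values
    q = ((xyz - lo) / (hi - lo + 1e-9) * (2 ** bits - 1)).long()
    key = torch.zeros(q.shape[0], dtype=torch.long)
    for b in range(bits):
        for axis in range(3):
            key |= ((q[:, axis] >> b) & 1) << (3 * b + axis)
    return key

# Dual serialization of the FPS key points along two complementary orderings:
#   idx = farthest_point_sampling(points, 128)
#   order_a = morton_key(points[idx]).argsort()                # first curve
#   order_b = morton_key(points[idx][:, [2, 0, 1]]).argsort()  # axis-permuted variant
```

Any pair of complementary orderings would serve for the sketch; which curves Mantis actually uses is only partly visible in the truncated caption.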
original abstract

Pre-trained 3D point cloud foundation models (PFMs) have demonstrated strong transferability across diverse downstream tasks. However, full fine-tuning these models is computationally expensive and storage-intensive. Parameter-efficient fine-tuning (PEFT) offers a promising alternative, but existing PEFT approaches are primarily designed for Transformer-based backbones and rely on token-level prompting or feature transformation. Mamba-based backbones introduce a granularity mismatch between token-level adaptation and state-level sequence dynamics. Consequently, straightforward transfer of existing PEFT approaches to frozen Mamba backbones leads to substantial accuracy degradation and unstable optimization. To address this issue, we propose Mantis, the first Mamba-native PEFT framework for 3D PFMs. Specifically, a State-Aware Adapter (SAA) is introduced to inject lightweight task-conditioned control signals into selective state-space updates, enabling state-level adaptation while keeping the pre-trained backbone frozen. Moreover, different valid point cloud serializations are regularized by Dual-Serialization Consistency Distillation (DSCD), thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks demonstrate that our Mantis achieves competitive performance with only about 5% trainable parameters. Our code is available at https://github.com/gzhhhhhhh/Mantis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Mantis, the first Mamba-native parameter-efficient fine-tuning (PEFT) framework for 3D point cloud foundation models. It identifies a granularity mismatch when applying token-level Transformer PEFT methods to Mamba backbones and proposes two components: a State-Aware Adapter (SAA) that injects lightweight task-conditioned control signals directly into the selective state-space updates, and Dual-Serialization Consistency Distillation (DSCD) that regularizes predictions across different valid point-cloud serializations to reduce instability. The central claim is that Mantis achieves competitive performance on multiple 3D benchmarks while training only ~5% of the parameters with the backbone frozen.

Significance. If the performance claims and component contributions are substantiated, the work would be a useful first step toward efficient adaptation of Mamba-based 3D models, extending PEFT research beyond Transformer architectures. The open-source code release is a positive factor for reproducibility.

major comments (3)
  1. [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.
  2. [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.
  3. [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.
minor comments (2)
  1. [§3] Notation for the Mamba state-space parameters (e.g., A, B, C, Δ) should be explicitly aligned with the original Mamba paper to avoid ambiguity.
  2. [Figure 2] Figure captions for the SAA and DSCD diagrams should include a brief description of the data flow and which modules are frozen.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below and indicate the revisions we will make to improve clarity and substantiation of our claims.

point-by-point responses
  1. Referee: [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.

    Authors: We agree that the experimental section requires more explicit quantitative support to allow proper evaluation of the claims. The full manuscript describes results across benchmarks, but we will expand Section 4 in the revision to include complete tables with baseline comparisons, error bars from multiple runs, and statistical significance tests where applicable. revision: yes

  2. Referee: [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.

    Authors: We will add the missing equations formalizing the SAA injection into the selective state-space updates, along with analysis demonstrating preservation of pre-trained Mamba dynamics. We will also include targeted ablations that isolate SAA's specific effects on the state-space parameters. revision: yes

  3. Referee: [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.

    Authors: We will incorporate a formal loss equation for DSCD and provide analysis showing its effect on reducing serialization instability without accuracy trade-offs. Component-wise ablations comparing performance with and without DSCD will be added to establish the contribution. revision: yes

Circularity Check

0 steps flagged

No circularity: novel adapters and distillation loss are independent proposals validated by experiments

full rationale

The paper introduces two new mechanisms (the State-Aware Adapter for state-level control injection and Dual-Serialization Consistency Distillation for serialization regularization) to address a stated granularity mismatch between token-level PEFT and Mamba state dynamics. Both are presented as original contributions, with performance claims resting on empirical results across benchmarks rather than on any derivation that reduces outputs to fitted inputs or self-cited priors by construction. No equations, self-definitional loops, or load-bearing self-citations in the provided text would make the reported ~5% parameter efficiency or competitive accuracy tautological. The chain of support runs through external benchmarks rather than back into the paper's own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on two newly introduced modules whose effectiveness is asserted but not independently evidenced outside the paper's own experiments.

axioms (1)
  • domain assumption Mamba backbones exhibit a granularity mismatch between token-level adaptation and state-level sequence dynamics that causes existing PEFT methods to degrade.
    Invoked in the abstract to justify why Transformer PEFT cannot be transferred directly.
invented entities (2)
  • State-Aware Adapter (SAA) no independent evidence
    purpose: Inject lightweight task-conditioned control signals into selective state-space updates while keeping the backbone frozen.
    New module proposed to solve the stated mismatch; no external evidence of its behavior is supplied.
  • Dual-Serialization Consistency Distillation (DSCD) no independent evidence
    purpose: Regularize different valid point cloud serializations to reduce serialization-induced instability.
    New regularization technique introduced to stabilize training; effectiveness asserted via the paper's results.

pith-pipeline@v0.9.0 · 5528 in / 1324 out tokens · 55849 ms · 2026-05-12T02:40:32.993474+00:00 · methodology


Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    GAPrompt: Geometry-aware point cloud prompt for 3D vision model

    Zixiang Ai, Zichen Liu, Yuanhang Lei, Zhenyu Cui, Xu Zou, and Jiahuan Zhou. GAPrompt: Geometry-aware point cloud prompt for 3D vision model. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  2. [2]

    3d semantic parsing of large-scale indoor spaces

    Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  3. [3]

    Spectral informed mamba for robust point cloud processing

    Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, and Christian Desrosiers. Spectral informed mamba for robust point cloud processing. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  4. [4]

    Pointgpt: Auto-regressively generative pre-training from point clouds

    Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, and Yufeng Yue. Pointgpt: Auto-regressively generative pre-training from point clouds. Advances in Neural Information Processing Systems (NeurIPS), 2023

  5. [5]

    Adaptformer: Adapting vision transformers for scalable visual recognition

    Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems (NeurIPS), 2022

  6. [6]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (NAACL), 2019

  7. [7]

    Zigzagpointmamba: Spatial-semantic mamba for point cloud understanding

    Linshuang Diao, Sensen Song, Yurong Qian, and Dayong Ren. Zigzagpointmamba: Spatial-semantic mamba for point cloud understanding. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  8. [8]

    Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning?

    Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? In The Eleventh International Conference on Learning Representations (ICLR), 2023

  9. [9]

    Mamba: Linear-time sequence modeling with selective state spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First conference on language modeling (COLM), 2024

  10. [10]

    Efficiently modeling long sequences with structured state spaces

    Albert Gu, Karan Goel, and Christopher Re. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations (ICLR), 2022

  11. [11]

    Joint-mae: 2d-3d joint masked autoencoders for 3d point cloud pre-training

    Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, and Pheng-Ann Heng. Joint-mae: 2d-3d joint masked autoencoders for 3d point cloud pre-training. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2023

  12. [12]

    Mamba3d: Enhancing local features for 3d point cloud analysis via state space model

    Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. Mamba3d: Enhancing local features for 3d point cloud analysis via state space model. In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), 2024

  13. [13]

    Most: Efficient monarch sparse tuning for 3d representation learning

    Xu Han, Yuan Tang, Jinfeng Xu, and Xianzhi Li. Most: Efficient monarch sparse tuning for 3d representation learning. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  14. [14]

    Über die stetige abbildung einer linie auf ein flächenstück

    David Hilbert. Über die stetige abbildung einer linie auf ein flächenstück. In Dritter Band: Analysis · Grundlagen der Mathematik · Physik Verschiedenes: Nebst Einer Lebensgeschichte. Springer, 1935

  15. [15]

    Parameter-efficient transfer learning for nlp

    Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International conference on machine learning (ICML), 2019

  16. [16]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022

  17. [17]

    Visual prompt tuning

    Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European conference on computer vision (ECCV). Springer, 2022

  18. [18]

    Oct-mamba: Mamba-based octree context entropy model for point cloud geometry compression

    Zhaoyi Jiang, Yi Xu, Frederick W.B. Li, Gary K.L. Tam, Chao Song, and Bailin Yang. Oct-mamba: Mamba-based octree context entropy model for point cloud geometry compression. Pattern Recognition (PR), 2026

  19. [19]

    Revisiting the parameter efficiency of adapters from the perspective of precision redundancy

    Shibo Jie, Haoqing Wang, and Zhi-Hong Deng. Revisiting the parameter efficiency of adapters from the perspective of precision redundancy. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV), 2023

  20. [20]

    Pointdico: Contrastive 3d representation learning guided by diffusion models

    Pengbo Li, Yiding Sun, and Haozhe Cheng. Pointdico: Contrastive 3d representation learning guided by diffusion models. In 2025 International Joint Conference on Neural Networks (IJCNN), 2025

  21. [21]

    Prefix-tuning: Optimizing continuous prompts for generation

    Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (ACL), 2021

  22. [22]

    Pointcnn: Convolution on x-transformed points

    Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems (NeurIPS), 2018

  23. [23]

    Pointmamba: A simple state space model for point cloud analysis

    Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, and Xiang Bai. Pointmamba: A simple state space model for point cloud analysis. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  24. [24]

    Parameter-efficient fine-tuning in spectral domain for point cloud learning

    Dingkang Liang, Tianrui Feng, Xin Zhou, Yumeng Zhang, Zhikang Zou, and Xiang Bai. Parameter-efficient fine-tuning in spectral domain for point cloud learning. IEEE transactions on pattern analysis and machine intelligence (TPAMI), 2025

  25. [25]

    Masked discrimination for self-supervised learning on point clouds

    Haotian Liu, Mu Cai, and Yong Jae Lee. Masked discrimination for self-supervised learning on point clouds. In European Conference on Computer Vision (ECCV). Springer, 2022

  26. [26]

    Relation-shape convolutional neural network for point cloud analysis

    Yongcheng Liu, Bin Fan, Shiming Xiang, and Chunhong Pan. Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2019

  27. [27]

    Masked autoencoders for point cloud self-supervised learning

    Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yijun Tian, and Li Yuan. Masked autoencoders for point cloud self-supervised learning. In European Conference on Computer Vision (ECCV), 2022

  28. [28]

    Pointnet: Deep learning on point sets for 3d classification and segmentation

    Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

  29. [29]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space

    Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017

  30. [30]

    Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining

    Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

  31. [31]

    Hydramamba: Multi-head state space model for global point cloud learning

    Kanglin Qu, Pan Gao, Qun Dai, and Yuanhao Sun. Hydramamba: Multi-head state space model for global point cloud learning. In Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM), 2025

  32. [32]

    Cloudmamba: Grouped selective state spaces for point cloud analysis

    Kanglin Qu, Pan Gao, Qun Dai, Zhanzhi Ye, Rui Ye, and Yuanhao Sun. Cloudmamba: Grouped selective state spaces for point cloud analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

  33. [33]

    Hyperpoint: Multimodal 3d foundation model in hyperbolic space

    Yiding Sun, Haozhe Cheng, Chaoyi Lu, Zhengqiao Li, Minghong Wu, Huimin Lu, and Jihua Zhu. Hyperpoint: Multimodal 3d foundation model in hyperbolic space. Pattern Recognition (PR), 2026

  34. [34]

    Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception

    Yiding Sun, Jihua Zhu, Haozhe Cheng, Chaoyi Lu, Zhichuan Yang, Lin Chen, and Yaonan Wang. Align then adapt: Rethinking parameter-efficient transfer learning in 4d perception. arXiv preprint arXiv:2602.23069, 2026

  35. [35]

    Point-peft: Parameter-efficient fine-tuning for 3d pre-trained models

    Yiwen Tang, Ray Zhang, Zoey Guo, Xianzheng Ma, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. Point-peft: Parameter-efficient fine-tuning for 3d pre-trained models. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

  36. [36]

    Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

    Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In International Conference on Computer Vision (ICCV), 2019

  37. [37]

    Visualizing data using t-sne

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research (JMLR), 2008

  38. [38]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems (NeurIPS), 2017

  39. [39]

    Strumamba3d: Exploring structural mamba for self-supervised point cloud representation learning

    Chuxin Wang, Yixin Zha, Wenfei Yang, and Tianzhu Zhang. Strumamba3d: Exploring structural mamba for self-supervised point cloud representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  40. [40]

    Pointlora: Low-rank adaptation with token selection for point cloud learning

    Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, and Xinchao Wang. Pointlora: Low-rank adaptation with token selection for point cloud learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  41. [41]

    Pointrft: Explicit reinforcement fine-tuning for point cloud few-shot learning

    Yankai Wang, Yiding Sun, Qirui Wang, Pengbo Li, Chaoyi Lu, and Dongxu Zhang. Pointrft: Explicit reinforcement fine-tuning for point cloud few-shot learning. arXiv preprint arXiv:2603.23957, 2026

  42. [42]

    Dynamic graph cnn for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 2019

  43. [43]

    Point transformer v2: Grouped vector attention and partition-based pooling

    Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. Point transformer v2: Grouped vector attention and partition-based pooling. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  44. [44]

    Point transformer v3: Simpler, faster, stronger

    Xiaoyang Wu, Yuxin Lao, Li Jiang, Xiangyu Liu, and Hengshuang Zhao. Point transformer v3: Simpler, faster, stronger. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), 2024

  45. [45]

    3d shapenets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

  46. [46]

    A scalable active framework for region annotation in 3d shape collections

    Li Yi, Vladimir G. Kim, Duygu Ceylan, I Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (TOG), 2016

  47. [47]

    Point-bert: Pre-training 3d point cloud transformers with masked point modeling

    Xiaoyang Yu, Yilun Tang, Yue Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  48. [48]

    Instance-aware dynamic prompt tuning for pre-trained point cloud models

    Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, and Shu-Tao Xia. Instance-aware dynamic prompt tuning for pre-trained point cloud models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  49. [49]

    Towards compact 3d representations via point feature enhancement masked autoencoders

    Yaohua Zha, Huizhen Ji, Jinmin Li, Rongsheng Li, Tao Dai, Bin Chen, Zhi Wang, and Shu-Tao Xia. Towards compact 3d representations via point feature enhancement masked autoencoders. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

  50. [50]

    Pma: Towards parameter-efficient point cloud understanding via point mamba adapter

    Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, and Shu-Tao Xia. Pma: Towards parameter-efficient point cloud understanding via point mamba adapter. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  51. [51]

    Exploring vision semantic prompt for efficient point cloud understanding

    Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang, and Feng Wu. Exploring vision semantic prompt for efficient point cloud understanding. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  52. [52]

    Pointcot: A multi-modal benchmark for explicit 3d geometric reasoning

    Dongxu Zhang, Yiding Sun, Pengcheng Li, Yumou Liu, Hongqiang Lin, Haoran Xu, Xiaoxuan Mu, Liang Lin, Wenbiao Yan, Ning Yang, et al. Pointcot: A multi-modal benchmark for explicit 3d geometric reasoning. arXiv preprint arXiv:2602.23945, 2026

  53. [53]

    Point-m2ae: Multi-scale masked autoencoders for hierarchical point cloud pre-training

    Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, and Hongsheng Li. Point-m2ae: Multi-scale masked autoencoders for hierarchical point cloud pre-training. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  54. [54]

    Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders

    Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, and Hongsheng Li. Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2023

  55. [55]

    Point cloud mamba: Point cloud learning via state space model

    Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, and Xiangtai Li. Point cloud mamba: Point cloud learning via state space model. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025

  56. [56]

    Point transformer

    Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip H. S. Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  57. [57]

    Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis

    Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, and Xiang Bai. Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
