pith · machine review for the scientific record

arxiv: 2605.03438 · v2 · submitted 2026-05-05 · 💻 cs.CV

Recognition: no theorem link

Mantis: Mamba-native Tuning is Efficient for 3D Point Cloud Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords Mamba · parameter-efficient fine-tuning · 3D point clouds · foundation models · State-Aware Adapter · Dual-Serialization Consistency Distillation · point cloud serialization · selective state-space models

The pith

Mantis introduces a Mamba-native parameter-efficient fine-tuning method for 3D point cloud foundation models that reaches competitive accuracy with only about 5% trainable parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pre-trained 3D point cloud foundation models transfer well across tasks, but full fine-tuning is expensive in compute and storage. Existing parameter-efficient methods were built for Transformer backbones and rely on token-level changes, which create a mismatch with Mamba's state-level sequence dynamics and cause large accuracy drops when applied directly. The paper proposes Mantis, the first framework designed for Mamba: a State-Aware Adapter inserts lightweight task control into the selective state-space updates while the backbone stays frozen, and Dual-Serialization Consistency Distillation reduces instability arising from different valid point orderings. Experiments on multiple benchmarks show this combination delivers performance close to full fine-tuning using roughly 5% of the parameters.
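To make the "roughly 5%" budget concrete, here is a minimal PyTorch sketch of the freeze-then-count bookkeeping that claim implies; the name-matching rule ("adapter" in the parameter name) and the "head" prefix are assumptions for illustration, not the paper's actual module layout.

```python
import torch.nn as nn

def freeze_backbone_except_adapters(model: nn.Module, adapter_key: str = "adapter") -> nn.Module:
    # Freeze everything, then re-enable only adapter parameters and the task head.
    for name, param in model.named_parameters():
        param.requires_grad = (adapter_key in name) or name.startswith("head")
    return model

def trainable_fraction(model: nn.Module) -> float:
    # Share of parameters that will receive gradients, i.e. the PEFT budget.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total
```

The paper's claim is that this ratio lands near 0.05 while accuracy stays close to full fine-tuning.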

Core claim

Mantis is the first Mamba-native PEFT framework for 3D point cloud foundation models. It introduces a State-Aware Adapter that injects lightweight task-conditioned control signals into Mamba's selective state-space updates, enabling state-level adaptation without updating the pre-trained backbone, and applies Dual-Serialization Consistency Distillation to regularize across different valid point-cloud serializations, thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks show that Mantis achieves competitive performance with only about 5% trainable parameters.

What carries the argument

The State-Aware Adapter, which injects task-specific control signals directly into Mamba's selective state-space updates to support state-level adaptation in a frozen backbone.
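A hedged reading of that mechanism as a PyTorch sketch: a low-rank bottleneck produces per-token offsets that are added to the selective state-space parameters (here the step size Δ and the input matrix B) inside an otherwise frozen Mamba block. The class name, the choice of which SSM parameters receive offsets, and the zero initialization are illustrative assumptions; the paper's actual SAA may differ.

```python
import torch
import torch.nn as nn

class StateAwareAdapterSketch(nn.Module):
    """Illustrative adapter producing task-conditioned offsets for selective SSM parameters."""

    def __init__(self, d_model: int, d_state: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank)
        self.up_dt = nn.Linear(rank, d_model)  # offset for the per-token step size (delta)
        self.up_b = nn.Linear(rank, d_state)   # offset for the input matrix B
        # Zero-init the up-projections so training starts from the frozen dynamics.
        for lin in (self.up_dt, self.up_b):
            nn.init.zeros_(lin.weight)
            nn.init.zeros_(lin.bias)

    def forward(self, x: torch.Tensor):
        h = torch.tanh(self.down(x))           # (batch, seq, rank)
        return self.up_dt(h), self.up_b(h)

# Hypothetical insertion point inside a frozen Mamba block:
#   dt_off, b_off = adapter(x)
#   dt = softplus(frozen_dt_proj(x) + dt_off)  # state-level step control
#   B  = frozen_b_proj(x) + b_off              # task-conditioned input matrix
```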

If this is right

  • Mamba-based 3D point cloud models can be adapted to new tasks without full retraining or large storage costs.
  • Token-level PEFT methods are insufficient for state-space backbones and must be replaced by state-level mechanisms.
  • Regularizing across multiple point-cloud serializations stabilizes training of Mamba models on unordered 3D data.
  • Foundation models in 3D vision become practical to deploy when only a small fraction of parameters need updating per task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar state-aware adapters could be tested on other state-space sequence models in vision or multimodal settings.
  • The 5% parameter budget may enable on-device adaptation of 3D models where full fine-tuning is impossible.
  • The dual-serialization regularization might extend to other ordering-sensitive data types such as sequences of images or meshes.
  • The framework highlights that backbone-specific PEFT design is necessary to avoid the accuracy degradation seen when transferring Transformer methods to Mamba.

Load-bearing premise

The State-Aware Adapter successfully injects task-specific control into Mamba's selective state-space updates without degrading the pre-trained dynamics, and Dual-Serialization Consistency Distillation reduces serialization instability without new accuracy trade-offs.
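Figure 3's description, consistency enforced at both feature and prediction levels, suggests what such a regularizer could look like. A minimal sketch, assuming an MSE feature term and a symmetric KL prediction term over temperature-softened logits; the actual DSCD loss, its weighting, and its temperature are not given in the text shown here.

```python
import torch
import torch.nn.functional as F

def dscd_loss_sketch(feat_a: torch.Tensor, feat_b: torch.Tensor,
                     logits_a: torch.Tensor, logits_b: torch.Tensor,
                     tau: float = 1.0, lam: float = 1.0) -> torch.Tensor:
    """Consistency between two serializations (A, B) of the same point cloud.

    feat_*   : pooled features per serialization, shape (batch, dim)
    logits_* : classifier logits per serialization, shape (batch, classes)
    """
    # Feature-level consistency.
    feat_term = F.mse_loss(feat_a, feat_b)
    # Prediction-level consistency: symmetric KL over softened distributions.
    log_p_a = F.log_softmax(logits_a / tau, dim=-1)
    log_p_b = F.log_softmax(logits_b / tau, dim=-1)
    pred_term = 0.5 * (
        F.kl_div(log_p_a, log_p_b, reduction="batchmean", log_target=True)
        + F.kl_div(log_p_b, log_p_a, reduction="batchmean", log_target=True)
    )
    return lam * feat_term + (tau ** 2) * pred_term
```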

What would settle it

If, on a held-out 3D point cloud benchmark, Mantis with roughly 5% trainable parameters showed substantially lower accuracy than full fine-tuning or existing PEFT baselines, the claim of efficient, competitive performance would be falsified.

Figures

Figures reproduced from arXiv: 2605.03438 by Ajmal Saeed Mian, Jian Liu, Jihua Zhu, Zihao Guo.

Figure 1: Comprehensive comparisons between our Mantis and several representative counterparts […]
Figure 2: Pipeline of Mantis. Raw point clouds are sampled using Farthest Point Sampling (FPS) to select representative key points, and local neighborhoods are formed via K-nearest neighbors (KNN) for each key point. The resulting patches are serialized along two complementary space-filling curves to produce dual-order patches. These patches are fused and processed by stacked SAA-Mamba blocks, where the selective SS… (a minimal sampling-and-serialization sketch follows this figure list)
Figure 3: Illustration of Dual-Serialization Consistency Distillation. During training, DSCD enforces cross-serialization consistency at both feature and prediction levels. Although serialization enables Mamba to process unordered point clouds as sequential inputs, the resulting hidden-state propagation is inherently sensitive to the specific traversal order. Consequently, different valid serializations of the same…
Figure 4: Qualitative analysis results for object classification and part segmentation. (a) The t-SNE visualization results on the PB_T50_RS variant with different fine-tuning schemes. (b) Comparison of full fine-tuning and Mantis on part segmentation with Mamba3D [12]. Ablation on different inserted layers: one straightforward way to further reduce the number of tunable parameters is to insert SAA into only a subset…
Figure 5: Comparison of computational efficiency and hyperparameter sensitivity on…
Figure 6: The classification accuracy curves of full fine-tuning, IDPT […]
Figure 7: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Airplane”, “Car”, “Chair”, “Guitar” and “Motorbike”.
Figure 8: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Table”, “Rocket”, “Pistol”, “Laptop” and “Lamp”.
Figure 9: Visualization results for part segmentation on ShapeNetPart [46]. Projected prediction images from Mantis are shown across four different viewpoints, including the categories “Bag”, “Cap”, “Earphone”, “Knife”, “Mug” and “Skateboard”.
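The Figure 2 pipeline (FPS key points, KNN neighborhoods, dual serialization along space-filling curves) can be grounded with a small sketch. The paper cites Hilbert curves [14]; here a Morton (z-order) key and an axis-permuted variant stand in for the two complementary orderings, purely as an assumption for illustration.

```python
import torch

def farthest_point_sampling(xyz: torch.Tensor, n_samples: int) -> torch.Tensor:
    # Greedy FPS over xyz of shape (N, 3); returns indices of the chosen key points.
    n = xyz.shape[0]
    chosen = torch.zeros(n_samples, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = int(torch.randint(n, (1,)))
    for i in range(n_samples):
        chosen[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=-1)
        dist = torch.minimum(dist, d)
        farthest = int(dist.argmax())
    return chosen

def morton_key(xyz: torch.Tensor, bits: int = 10) -> torch.Tensor:
    # Quantize coordinates and interleave their bits into a z-order key; xyz: (M, 3).
    lo, hi = xyz.min(dim=0).values, xyz.max(dim=0).values
    q = ((xyz - lo) / (hi - lo + 1e-9) * (2 ** bits - 1)).long()
    key = torch.zeros(q.shape[0], dtype=torch.long)
    for b in range(bits):
        for axis in range(3):
            key |= ((q[:, axis] >> b) & 1) << (3 * b + axis)
    return key

# Dual serialization of the FPS key points along two complementary orderings:
#   idx = farthest_point_sampling(points, 128)
#   order_a = morton_key(points[idx]).argsort()                # first curve
#   order_b = morton_key(points[idx][:, [2, 0, 1]]).argsort()  # axis-permuted variant
```

Any pair of complementary orderings would serve for the sketch; which curves Mantis actually uses is only partly visible in the truncated caption.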
original abstract

Pre-trained 3D point cloud foundation models (PFMs) have demonstrated strong transferability across diverse downstream tasks. However, full fine-tuning these models is computationally expensive and storage-intensive. Parameter-efficient fine-tuning (PEFT) offers a promising alternative, but existing PEFT approaches are primarily designed for Transformer-based backbones and rely on token-level prompting or feature transformation. Mamba-based backbones introduce a granularity mismatch between token-level adaptation and state-level sequence dynamics. Consequently, straightforward transfer of existing PEFT approaches to frozen Mamba backbones leads to substantial accuracy degradation and unstable optimization. To address this issue, we propose Mantis, the first Mamba-native PEFT framework for 3D PFMs. Specifically, a State-Aware Adapter (SAA) is introduced to inject lightweight task-conditioned control signals into selective state-space updates, enabling state-level adaptation while keeping the pre-trained backbone frozen. Moreover, different valid point cloud serializations are regularized by Dual-Serialization Consistency Distillation (DSCD), thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks demonstrate that our Mantis achieves competitive performance with only about 5% trainable parameters. Our code is available at https://github.com/gzhhhhhhh/Mantis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Mantis, the first Mamba-native parameter-efficient fine-tuning (PEFT) framework for 3D point cloud foundation models. It identifies a granularity mismatch when applying token-level Transformer PEFT methods to Mamba backbones and proposes two components: a State-Aware Adapter (SAA) that injects lightweight task-conditioned control signals directly into the selective state-space updates, and Dual-Serialization Consistency Distillation (DSCD) that regularizes predictions across different valid point-cloud serializations to reduce instability. The central claim is that Mantis achieves competitive performance on multiple 3D benchmarks while training only ~5% of the parameters with the backbone frozen.

Significance. If the performance claims and component contributions are substantiated, the work would be a useful first step toward efficient adaptation of Mamba-based 3D models, extending PEFT research beyond Transformer architectures. The open-source code release is a positive factor for reproducibility.

major comments (3)
  1. [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.
  2. [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.
  3. [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.
minor comments (2)
  1. [§3] Notation for the Mamba state-space parameters (e.g., A, B, C, Δ) should be explicitly aligned with the original Mamba paper to avoid ambiguity.
  2. [Figure 2] Figure captions for the SAA and DSCD diagrams should include a brief description of the data flow and which modules are frozen.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below and indicate the revisions we will make to improve clarity and substantiation of our claims.

point-by-point responses
  1. Referee: [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.

    Authors: We agree that the experimental section requires more explicit quantitative support to allow proper evaluation of the claims. The full manuscript describes results across benchmarks, but we will expand Section 4 in the revision to include complete tables with baseline comparisons, error bars from multiple runs, and statistical significance tests where applicable. revision: yes

  2. Referee: [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.

    Authors: We will add the missing equations formalizing the SAA injection into the selective state-space updates, along with analysis demonstrating preservation of pre-trained Mamba dynamics. We will also include targeted ablations that isolate SAA's specific effects on the state-space parameters. revision: yes

  3. Referee: [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.

    Authors: We will incorporate a formal loss equation for DSCD and provide analysis showing its effect on reducing serialization instability without accuracy trade-offs. Component-wise ablations comparing performance with and without DSCD will be added to establish the contribution. revision: yes

Circularity Check

0 steps flagged

No circularity: novel adapters and distillation loss are independent proposals validated by experiments

full rationale

The paper introduces two new mechanisms (the State-Aware Adapter for state-level control injection and Dual-Serialization Consistency Distillation for serialization regularization) to address a stated granularity mismatch between token-level PEFT and Mamba state dynamics. Both are presented as original contributions, with performance claims resting on empirical results across benchmarks rather than on any derivation that reduces outputs to fitted inputs or self-cited priors by construction. No equations, self-definitional loops, or load-bearing self-citations in the provided text would make the reported ~5% parameter efficiency or competitive accuracy tautological. The chain of support runs through external benchmarks rather than back into the paper's own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on two newly introduced modules whose effectiveness is asserted but not independently evidenced outside the paper's own experiments.

axioms (1)
  • domain assumption Mamba backbones exhibit a granularity mismatch between token-level adaptation and state-level sequence dynamics that causes existing PEFT methods to degrade.
    Invoked in the abstract to justify why Transformer PEFT cannot be transferred directly.
invented entities (2)
  • State-Aware Adapter (SAA) no independent evidence
    purpose: Inject lightweight task-conditioned control signals into selective state-space updates while keeping the backbone frozen.
    New module proposed to solve the stated mismatch; no external evidence of its behavior is supplied.
  • Dual-Serialization Consistency Distillation (DSCD) no independent evidence
    purpose: Regularize different valid point cloud serializations to reduce serialization-induced instability.
    New regularization technique introduced to stabilize training; effectiveness asserted via the paper's results.

pith-pipeline@v0.9.0 · 5528 in / 1324 out tokens · 55849 ms · 2026-05-12T02:40:32.993474+00:00 · methodology


Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    GAPrompt: Geometry-aware point cloud prompt for 3D vision model

    Zixiang Ai, Zichen Liu, Yuanhang Lei, Zhenyu Cui, Xu Zou, and Jiahuan Zhou. GAPrompt: Geometry-aware point cloud prompt for 3D vision model. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  2. [2]

    3d semantic parsing of large-scale indoor spaces

    Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

  3. [3]

    Spectral informed mamba for robust point cloud processing

    Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, and Christian Desrosiers. Spectral informed mamba for robust point cloud processing. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  4. [4]

    Pointgpt: Auto-regressively generative pre-training from point clouds

    Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, and Yufeng Yue. Pointgpt: Auto-regressively generative pre-training from point clouds. Advances in Neural Information Processing Systems (NeurIPS), 2023

  5. [5]

    Adaptformer: Adapting vision transformers for scalable visual recognition

    Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems (NeurIPS), 2022

  6. [6]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (NAACL), 2019

  7. [7]

    Zigzagpointmamba: Spatial-semantic mamba for point cloud understanding

    Linshuang Diao, Sensen Song, Yurong Qian, and Dayong Ren. Zigzagpointmamba: Spatial-semantic mamba for point cloud understanding. In Advances in Neural Information Processing Systems (NeurIPS), 2025

  8. [8]

    Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning?

    Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, and Kaisheng Ma. Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning? In The Eleventh International Conference on Learning Representations (ICLR), 2023

  9. [9]

    Mamba: Linear-time sequence modeling with selective state spaces

    Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First conference on language modeling (COLM), 2024

  10. [10]

    Efficiently modeling long sequences with structured state spaces

    Albert Gu, Karan Goel, and Christopher Re. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations (ICLR), 2022

  11. [11]

    Joint-mae: 2d-3d joint masked autoencoders for 3d point cloud pre-training

    Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, and Pheng-Ann Heng. Joint-mae: 2d-3d joint masked autoencoders for 3d point cloud pre-training. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2023

  12. [12]

    Mamba3d: Enhancing local features for 3d point cloud analysis via state space model

    Xu Han, Yuan Tang, Zhaoxuan Wang, and Xianzhi Li. Mamba3d: Enhancing local features for 3d point cloud analysis via state space model. In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), 2024

  13. [13]

    Most: Efficient monarch sparse tuning for 3d representation learning

    Xu Han, Yuan Tang, Jinfeng Xu, and Xianzhi Li. Most: Efficient monarch sparse tuning for 3d representation learning. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  14. [14]

    Über die stetige abbildung einer linie auf ein flächenstück

    David Hilbert. Über die stetige abbildung einer linie auf ein flächenstück. In Dritter Band: Analysis · Grundlagen der Mathematik · Physik Verschiedenes: Nebst Einer Lebensgeschichte. Springer, 1935

  15. [15]

    Parameter-efficient transfer learning for nlp

    Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. In International conference on machine learning (ICML), 2019

  16. [16]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022

  17. [17]

    Visual prompt tuning

    Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. Visual prompt tuning. In European conference on computer vision (ECCV). Springer, 2022

  18. [18]

    Oct-mamba: Mamba-based octree context entropy model for point cloud geometry compression

    Zhaoyi Jiang, Yi Xu, Frederick W.B. Li, Gary K.L. Tam, Chao Song, and Bailin Yang. Oct-mamba: Mamba-based octree context entropy model for point cloud geometry compression. Pattern Recognition (PR), 2026

  19. [19]

    Revisiting the parameter efficiency of adapters from the perspective of precision redundancy

    Shibo Jie, Haoqing Wang, and Zhi-Hong Deng. Revisiting the parameter efficiency of adapters from the perspective of precision redundancy. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV), 2023

  20. [20]

    Pointdico: Contrastive 3d representation learning guided by diffusion models

    Pengbo Li, Yiding Sun, and Haozhe Cheng. Pointdico: Contrastive 3d representation learning guided by diffusion models. In 2025 International Joint Conference on Neural Networks (IJCNN), 2025

  21. [21]

    Prefix-tuning: Optimizing continuous prompts for generation

    Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (ACL), 2021

  22. [22]

    Pointcnn: Convolution on x-transformed points

    Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems (NeurIPS), 2018

  23. [23]

    Pointmamba: A simple state space model for point cloud analysis

    Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, and Xiang Bai. Pointmamba: A simple state space model for point cloud analysis. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  24. [24]

    Parameter-efficient fine-tuning in spectral domain for point cloud learning

    Dingkang Liang, Tianrui Feng, Xin Zhou, Yumeng Zhang, Zhikang Zou, and Xiang Bai. Parameter-efficient fine-tuning in spectral domain for point cloud learning. IEEE transactions on pattern analysis and machine intelligence (TPAMI), 2025

  25. [25]

    Masked discrimination for self-supervised learning on point clouds

    Haotian Liu, Mu Cai, and Yong Jae Lee. Masked discrimination for self-supervised learning on point clouds. In European Conference on Computer Vision (ECCV). Springer, 2022

  26. [26]

    Relation-shape convolutional neural network for point cloud analysis

    Yongcheng Liu, Bin Fan, Shiming Xiang, and Chunhong Pan. Relation-shape convolutional neural network for point cloud analysis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2019

  27. [27]

    Masked autoencoders for point cloud self-supervised learning

    Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yijun Tian, and Li Yuan. Masked autoencoders for point cloud self-supervised learning. In European Conference on Computer Vision (ECCV), 2022

  28. [28]

    Pointnet: Deep learning on point sets for 3d classification and segmentation

    Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

  29. [29]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space

    Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems (NeurIPS), 2017

  30. [30]

    Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining

    Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, and Li Yi. Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

  31. [31]

    Hydramamba: Multi-head state space model for global point cloud learning

    Kanglin Qu, Pan Gao, Qun Dai, and Yuanhao Sun. Hydramamba: Multi-head state space model for global point cloud learning. In Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM), 2025

  32. [32]

    Cloudmamba: Grouped selective state spaces for point cloud analysis

    Kanglin Qu, Pan Gao, Qun Dai, Zhanzhi Ye, Rui Ye, and Yuanhao Sun. Cloudmamba: Grouped selective state spaces for point cloud analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026

  33. [33]

    Hyperpoint: Multimodal 3d foundation model in hyperbolic space

    Yiding Sun, Haozhe Cheng, Chaoyi Lu, Zhengqiao Li, Minghong Wu, Huimin Lu, and Jihua Zhu. Hyperpoint: Multimodal 3d foundation model in hyperbolic space. Pattern Recognition (PR), 2026

  34. [34]

    Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception

    Yiding Sun, Jihua Zhu, Haozhe Cheng, Chaoyi Lu, Zhichuan Yang, Lin Chen, and Yaonan Wang. Align then adapt: Rethinking parameter-efficient transfer learning in 4d perception. arXiv preprint arXiv:2602.23069, 2026

  35. [35]

    Point-peft: Parameter-efficient fine-tuning for 3d pre-trained models

    Yiwen Tang, Ray Zhang, Zoey Guo, Xianzheng Ma, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. Point-peft: Parameter-efficient fine-tuning for 3d pre-trained models. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

  36. [36]

    Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

    Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In International Conference on Computer Vision (ICCV), 2019

  37. [37]

    Visualizing data using t-sne

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of Machine Learning Research (JMLR), 2008

  38. [38]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems (NeurIPS), 2017

  39. [39]

    Strumamba3d: Exploring structural mamba for self-supervised point cloud representation learning

    Chuxin Wang, Yixin Zha, Wenfei Yang, and Tianzhu Zhang. Strumamba3d: Exploring structural mamba for self-supervised point cloud representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  40. [40]

    Pointlora: Low-rank adaptation with token selection for point cloud learning

    Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, and Xinchao Wang. Pointlora: Low-rank adaptation with token selection for point cloud learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  41. [41]

    Pointrft: Explicit reinforcement fine-tuning for point cloud few-shot learning

    Yankai Wang, Yiding Sun, Qirui Wang, Pengbo Li, Chaoyi Lu, and Dongxu Zhang. Pointrft: Explicit reinforcement fine-tuning for point cloud few-shot learning. arXiv preprint arXiv:2603.23957, 2026

  42. [42]

    Dynamic graph cnn for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 2019

  43. [43]

    Point transformer v2: Grouped vector attention and partition-based pooling

    Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. Point transformer v2: Grouped vector attention and partition-based pooling. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  44. [44]

    Point transformer v3: Simpler, faster, stronger

    Xiaoyang Wu, Yuxin Lao, Li Jiang, Xiangyu Liu, and Hengshuang Zhao. Point transformer v3: Simpler, faster, stronger. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition (CVPR), 2024

  45. [45]

    3d shapenets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

  46. [46]

    A scalable active framework for region annotation in 3d shape collections

    Li Yi, Vladimir G. Kim, Duygu Ceylan, I Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. A scalable active framework for region annotation in 3d shape collections. ACM Transactions on Graphics (TOG), 2016

  47. [47]

    Point-bert: Pre-training 3d point cloud transformers with masked point modeling

    Xiaoyang Yu, Yilun Tang, Yue Rao, Tiejun Huang, Jie Zhou, and Jiwen Lu. Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  48. [48]

    Instance-aware dynamic prompt tuning for pre-trained point cloud models

    Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, and Shu-Tao Xia. Instance-aware dynamic prompt tuning for pre-trained point cloud models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  49. [49]

    Towards compact 3d representations via point feature enhancement masked autoencoders

    Yaohua Zha, Huizhen Ji, Jinmin Li, Rongsheng Li, Tao Dai, Bin Chen, Zhi Wang, and Shu-Tao Xia. Towards compact 3d representations via point feature enhancement masked autoencoders. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

  50. [50]

    Pma: Towards parameter-efficient point cloud understanding via point mamba adapter

    Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, and Shu-Tao Xia. Pma: Towards parameter-efficient point cloud understanding via point mamba adapter. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025

  51. [51]

    Exploring vision semantic prompt for efficient point cloud understanding

    Yixin Zha, Chuxin Wang, Wenfei Yang, Tianzhu Zhang, and Feng Wu. Exploring vision semantic prompt for efficient point cloud understanding. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  52. [52]

    Pointcot: A multi-modal benchmark for explicit 3d geometric reasoning

    Dongxu Zhang, Yiding Sun, Pengcheng Li, Yumou Liu, Hongqiang Lin, Haoran Xu, Xiaoxuan Mu, Liang Lin, Wenbiao Yan, Ning Yang, et al. Pointcot: A multi-modal benchmark for explicit 3d geometric reasoning. arXiv preprint arXiv:2602.23945, 2026

  53. [53]

    Point-m2ae: Multi-scale masked autoencoders for hierarchical point cloud pre-training

    Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, and Hongsheng Li. Point-m2ae: Multi-scale masked autoencoders for hierarchical point cloud pre-training. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  54. [54]

    Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders

    Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, and Hongsheng Li. Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2023

  55. [55]

    Point cloud mamba: Point cloud learning via state space model

    Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, and Xiangtai Li. Point cloud mamba: Point cloud learning via state space model. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025

  56. [56]

    Point transformer

    Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip H. S. Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  57. [57]

    Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis

    Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, and Xiang Bai. Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
