Recognition: no theorem link
Mantis: Mamba-native Tuning is Efficient for 3D Point Cloud Foundation Models
Pith reviewed 2026-05-12 02:40 UTC · model grok-4.3
The pith
Mantis introduces a Mamba-native parameter-efficient fine-tuning method for 3D point cloud foundation models that reaches competitive accuracy with only about 5% trainable parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mantis is the first Mamba-native PEFT framework for 3D point cloud foundation models. It introduces a State-Aware Adapter that injects lightweight, task-conditioned control signals into Mamba's selective state-space updates, enabling state-level adaptation while the pre-trained backbone stays frozen. It further applies Dual-Serialization Consistency Distillation to regularize predictions across different valid point-cloud serializations, reducing serialization-induced instability. Extensive experiments across multiple benchmarks show that Mantis achieves competitive performance with only about 5% trainable parameters.
What carries the argument
The State-Aware Adapter, which injects task-specific control signals directly into Mamba's selective state-space updates to support state-level adaptation in a frozen backbone.
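The excerpt does not specify how the adapter is parameterized, so the following is only a minimal PyTorch sketch of the general idea: a small trainable module produces low-rank, task-conditioned offsets that are added to the input-dependent selective parameters (Δ, B, C) of a frozen Mamba-style block before the state update runs. All names, shapes, and the loop-based scan are illustrative assumptions, not the Mantis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StateAwareAdapter(nn.Module):
    """Hypothetical adapter: low-rank, task-conditioned offsets for dt, B, C."""

    def __init__(self, d_model: int, d_state: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)   # shared bottleneck
        self.up_dt = nn.Linear(rank, d_model, bias=False)  # offset for the step size dt
        self.up_B = nn.Linear(rank, d_state, bias=False)   # offset for the input matrix B
        self.up_C = nn.Linear(rank, d_state, bias=False)   # offset for the output matrix C
        for m in (self.up_dt, self.up_B, self.up_C):
            nn.init.zeros_(m.weight)                        # start as a no-op adaptation

    def forward(self, x):                                   # x: (batch, length, d_model)
        h = F.silu(self.down(x))
        return self.up_dt(h), self.up_B(h), self.up_C(h)


def selective_scan(x, A, dt, B, C):
    """Naive selective scan: h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t, y_t = C_t h_t."""
    batch, length, d_model = x.shape
    h = x.new_zeros(batch, d_model, A.shape[-1])
    ys = []
    for t in range(length):
        dA = torch.exp(dt[:, t].unsqueeze(-1) * A)                               # (b, d, n)
        dBx = dt[:, t].unsqueeze(-1) * B[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
        h = dA * h + dBx
        ys.append(torch.einsum("bdn,bn->bd", h, C[:, t]))
    return torch.stack(ys, dim=1)


# Frozen stand-ins for the pre-trained selective projections; only the adapter trains.
d_model, d_state = 64, 16
x = torch.randn(2, 32, d_model)
A = -torch.rand(d_model, d_state)
dt_proj, B_proj, C_proj = nn.Linear(d_model, d_model), nn.Linear(d_model, d_state), nn.Linear(d_model, d_state)
for p in (*dt_proj.parameters(), *B_proj.parameters(), *C_proj.parameters()):
    p.requires_grad_(False)

adapter = StateAwareAdapter(d_model, d_state)
d_dt, d_B, d_C = adapter(x)
dt = F.softplus(dt_proj(x) + d_dt)          # offsets enter the state-level parameters
y = selective_scan(x, A, dt, B_proj(x) + d_B, C_proj(x) + d_C)
```

The point of the sketch is the injection site: the trainable offsets touch the quantities that govern the recurrence itself rather than the token features, which is what would distinguish state-level from token-level adaptation.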
If this is right
- Mamba-based 3D point cloud models can be adapted to new tasks without full retraining or large storage costs.
- Token-level PEFT methods are insufficient for state-space backbones and must be replaced by state-level mechanisms.
- Regularizing across multiple point-cloud serializations stabilizes training of Mamba models on unordered 3D data.
- Foundation models in 3D vision become practical to deploy when only a small fraction of parameters need updating per task.
Where Pith is reading between the lines
- Similar state-aware adapters could be tested on other state-space sequence models in vision or multimodal settings.
- The 5% parameter budget may enable on-device adaptation of 3D models where full fine-tuning is impossible.
- The dual-serialization regularization might extend to other ordering-sensitive data types such as sequences of images or meshes.
- The framework highlights that backbone-specific PEFT design is necessary to avoid the accuracy degradation seen when transferring Transformer methods to Mamba.
Load-bearing premise
The State-Aware Adapter successfully injects task-specific control into Mamba's selective state-space updates without degrading the pre-trained dynamics, and Dual-Serialization Consistency Distillation reduces serialization instability without introducing new accuracy trade-offs.
What would settle it
On a held-out 3D point cloud benchmark, if Mantis at 5% trainable parameters shows substantially lower accuracy than full fine-tuning or existing PEFT baselines, the claim of efficient competitive performance would be falsified.
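As context for reading such a comparison, the "~5% trainable parameters" figure is conventionally the ratio of trainable to total parameters with the backbone frozen. Below is a minimal bookkeeping sketch of that convention; the module sizes are purely illustrative stand-ins, and the printed fraction is not the paper's number.

```python
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    """Ratio of trainable parameters to all parameters, the usual PEFT budget metric."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total


backbone = nn.Sequential(*[nn.Linear(384, 384) for _ in range(24)])  # stand-in backbone
for p in backbone.parameters():
    p.requires_grad_(False)                                          # frozen pre-trained weights
adapter = nn.Sequential(nn.Linear(384, 16), nn.Linear(16, 384))      # trainable adapter
print(f"trainable fraction: {trainable_fraction(nn.Sequential(backbone, adapter)):.2%}")
```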
read the original abstract
Pre-trained 3D point cloud foundation models (PFMs) have demonstrated strong transferability across diverse downstream tasks. However, full fine-tuning these models is computationally expensive and storage-intensive. Parameter-efficient fine-tuning (PEFT) offers a promising alternative, but existing PEFT approaches are primarily designed for Transformer-based backbones and rely on token-level prompting or feature transformation. Mamba-based backbones introduce a granularity mismatch between token-level adaptation and state-level sequence dynamics. Consequently, straightforward transfer of existing PEFT approaches to frozen Mamba backbones leads to substantial accuracy degradation and unstable optimization. To address this issue, we propose Mantis, the first Mamba-native PEFT framework for 3D PFMs. Specifically, a State-Aware Adapter (SAA) is introduced to inject lightweight task-conditioned control signals into selective state-space updates, enabling state-level adaptation while keeping the pre-trained backbone frozen. Moreover, different valid point cloud serializations are regularized by Dual-Serialization Consistency Distillation (DSCD), thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks demonstrate that our Mantis achieves competitive performance with only about 5% trainable parameters. Our code is available at https://github.com/gzhhhhhhh/Mantis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Mantis, the first Mamba-native parameter-efficient fine-tuning (PEFT) framework for 3D point cloud foundation models. It identifies a granularity mismatch when applying token-level Transformer PEFT methods to Mamba backbones and proposes two components: a State-Aware Adapter (SAA) that injects lightweight task-conditioned control signals directly into the selective state-space updates, and Dual-Serialization Consistency Distillation (DSCD) that regularizes predictions across different valid point-cloud serializations to reduce instability. The central claim is that Mantis achieves competitive performance on multiple 3D benchmarks while training only ~5% of the parameters with the backbone frozen.
Significance. If the performance claims and component contributions are substantiated, the work would be a useful first step toward efficient adaptation of Mamba-based 3D models, extending PEFT research beyond Transformer architectures. The open-source code release is a positive factor for reproducibility.
major comments (3)
- [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.
- [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.
- [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.
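The excerpt indeed gives no loss for DSCD. To make the objection concrete, one plausible form of a dual-serialization consistency term is a symmetric KL divergence between the model's predictions under two valid orderings of the same cloud, optionally detaching one branch so it acts as the teacher. The sketch below is a hypothetical illustration under that assumption, not the paper's loss.

```python
import torch
import torch.nn.functional as F


def serialization_consistency_loss(model, points, order_a, order_b, tau=1.0):
    """points: (B, N, 3); order_a, order_b: (B, N) index tensors giving two serializations."""
    xa = torch.gather(points, 1, order_a.unsqueeze(-1).expand_as(points))
    xb = torch.gather(points, 1, order_b.unsqueeze(-1).expand_as(points))
    log_pa = F.log_softmax(model(xa) / tau, dim=-1)
    log_pb = F.log_softmax(model(xb) / tau, dim=-1)
    # Symmetric KL between the two prediction distributions; detach one branch
    # (e.g. log_pb.detach()) to turn this into one-way distillation instead.
    return 0.5 * (F.kl_div(log_pa, log_pb, log_target=True, reduction="batchmean")
                  + F.kl_div(log_pb, log_pa, log_target=True, reduction="batchmean"))
```

The ablation requested above would compare runs with and without such a term at a fixed parameter budget.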
minor comments (2)
- [§3] Notation for the Mamba state-space parameters (e.g., A, B, C, Δ) should be explicitly aligned with the original Mamba paper to avoid ambiguity; the standard form is reproduced after this list for reference.
- [Figure 2] Figure captions for the SAA and DSCD diagrams should include a brief description of the data flow and which modules are frozen.
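For reference, the discretized selective state-space update in the notation of the original Mamba paper (Gu & Dao), which the notation comment above asks the authors to align with. This is the standard published form, not anything specific to Mantis.

```latex
% Selective SSM recurrence (Gu & Dao, Mamba). \Delta_t, B_t, C_t are input-dependent
% ("selective"); A is learned but input-independent.
\begin{aligned}
\bar{A}_t &= \exp(\Delta_t A), &
\bar{B}_t &= (\Delta_t A)^{-1}\bigl(\exp(\Delta_t A) - I\bigr)\,\Delta_t B_t \;\approx\; \Delta_t B_t, \\
h_t &= \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, &
y_t &= C_t\, h_t .
\end{aligned}
```

State-level adaptation in the paper's framing would act on the input-dependent quantities above (Δ_t, B_t, C_t), whereas token-level PEFT touches only x_t or y_t.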
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below and indicate the revisions we will make to improve clarity and substantiation of our claims.
read point-by-point responses
-
Referee: [§4] §4 (Experiments): The abstract asserts competitive results with ~5% trainable parameters, yet the provided text contains no quantitative tables, baseline comparisons, error bars, or statistical significance tests. Without these, the central performance claim cannot be evaluated.
Authors: We agree that the experimental section requires more explicit quantitative support to allow proper evaluation of the claims. The full manuscript describes results across benchmarks, but we will expand Section 4 in the revision to include complete tables with baseline comparisons, error bars from multiple runs, and statistical significance tests where applicable. revision: yes
-
Referee: [§3.2] §3.2 (State-Aware Adapter): No equations or state-update analysis show how SAA injects task-specific signals at the state level rather than token level, nor any verification that pre-trained Mamba dynamics are preserved. Targeted ablations isolating SAA's effect on selective state-space parameters are required to support the mechanism.
Authors: We will add the missing equations formalizing the SAA injection into the selective state-space updates, along with analysis demonstrating preservation of pre-trained Mamba dynamics. We will also include targeted ablations that isolate SAA's specific effects on the state-space parameters. revision: yes
-
Referee: [§3.3] §3.3 (Dual-Serialization Consistency Distillation): The description of DSCD lacks a formal loss equation or analysis demonstrating that it reduces serialization-induced instability without introducing accuracy trade-offs. Component-wise ablations comparing runs with and without DSCD are needed to establish causality.
Authors: We will incorporate a formal loss equation for DSCD and provide analysis showing its effect on reducing serialization instability without accuracy trade-offs. Component-wise ablations comparing performance with and without DSCD will be added to establish the contribution. revision: yes
Circularity Check
No circularity: novel adapters and distillation loss are independent proposals validated by experiments
full rationale
The paper introduces two new mechanisms (the State-Aware Adapter for state-level control injection and Dual-Serialization Consistency Distillation for serialization regularization) to address a stated granularity mismatch between token-level PEFT and Mamba state dynamics. Both are presented as original contributions, and the performance claims rest on empirical results across benchmarks rather than on any derivation that reduces outputs to fitted inputs or to self-cited priors by construction. No equations, self-definitional loops, or load-bearing self-citations appear in the provided text that would render the reported ~5% parameter efficiency or competitive accuracy tautological. The claims are validated against external benchmarks rather than by an internally self-referential derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Mamba backbones exhibit a granularity mismatch between token-level adaptation and state-level sequence dynamics that causes existing PEFT methods to degrade.
invented entities (2)
- State-Aware Adapter (SAA): no independent evidence
- Dual-Serialization Consistency Distillation (DSCD): no independent evidence