PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding
Pith reviewed 2026-05-10 18:40 UTC · model grok-4.3
The pith
PointTPA generates input-aware parameters for local patches in 3D point clouds, raising ScanNet mIoU to 78.4 percent with under 2 percent added parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PointTPA is a test-time parameter adaptation framework. It uses Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches from input point clouds and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights. Integrated into the PTv3 backbone, these two lightweight modules together add less than 2 percent of the backbone's parameters. They let the network adjust its behavior to scene-specific variations and reach 78.4 percent mIoU on ScanNet validation, outperforming prior parameter-efficient fine-tuning approaches on multiple benchmarks.
What carries the argument
The Dynamic Parameter Projector takes patch features from Serialization-based Neighborhood Grouping and outputs custom network weights for each patch, so the backbone can change its computation according to the current scene.
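To make patch-wise weight generation concrete, here is a minimal sketch in which a pooled patch feature is mapped to a per-channel scale and shift that modulate a frozen linear layer. The scale-and-shift form, the function names (`project_params`, `run_patch`), and all toy values are assumptions made for illustration. This page does not describe the projector's internals, so the sketch should not be read as the authors' DPP.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def mean_pool(patch):
    """Average the per-point feature vectors of one patch."""
    n = len(patch)
    return [sum(col) / n for col in zip(*patch)]


def project_params(patch_feat, P_scale, P_shift):
    """Hypothetical projector: patch descriptor -> per-channel (scale, shift)."""
    # Adding 1.0 means a projector that outputs zeros leaves the layer unchanged.
    scale = [1.0 + s for s in matvec(P_scale, patch_feat)]
    shift = matvec(P_shift, patch_feat)
    return scale, shift


def run_patch(W_frozen, P_scale, P_shift, patch):
    """Apply a frozen linear layer, modulated by parameters generated for this patch."""
    scale, shift = project_params(mean_pool(patch), P_scale, P_shift)
    out = []
    for x in patch:
        y = matvec(W_frozen, x)
        out.append([s * yi + b for yi, s, b in zip(y, scale, shift)])
    return out


# Toy values, chosen only to make the arithmetic easy to follow.
W_frozen = [[1.0, 0.0], [0.0, 2.0]]
P_scale = [[0.5, 0.0], [0.0, 0.5]]
P_zero = [[0.0, 0.0], [0.0, 0.0]]
patch_a = [[1.0, 1.0], [3.0, 1.0]]
patch_b = [[0.0, 2.0], [0.0, 4.0]]

out_a = run_patch(W_frozen, P_scale, P_zero, patch_a)   # patch-specific scale [2.0, 1.5]
out_b = run_patch(W_frozen, P_scale, P_zero, patch_b)   # patch-specific scale [1.0, 2.5]
static_a = run_patch(W_frozen, P_zero, P_zero, patch_a) # zero projector: plain frozen layer
```

The same frozen weights produce different effective computations for `patch_a` and `patch_b`, while a zero projector falls back to the static layer. That fallback is one simple way to keep such a module from disturbing a pretrained backbone at initialization.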
If this is right
- The backbone reaches 78.4 percent mIoU on ScanNet validation while the added modules stay below 2 percent of its parameter count.
- The same modules surpass existing parameter-efficient fine-tuning methods across several 3D scene benchmarks.
- The network adjusts its internal behavior to each scene's geometry and layout during inference without any additional training pass.
- Local patch grouping followed by per-patch weight generation keeps the adaptation both spatially coherent and computationally light.
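The local patch grouping mentioned in the last point can be sketched as follows: quantize points to a grid, order them along a space-filling curve, and cut the ordered sequence into fixed-size patches. The choice of a z-order (Morton) curve, the grid size, and the patch size are assumptions for illustration. The page does not say which curve or sizes the paper's SNG uses.

```python
def morton_code(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative grid indices (z-order curve)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def serialize(points, grid_size):
    """Return point indices sorted by the Morton code of their grid cell."""
    mins = [min(p[a] for p in points) for a in range(3)]

    def key(i):
        cell = [int((points[i][a] - mins[a]) / grid_size) for a in range(3)]
        return morton_code(*cell)

    return sorted(range(len(points)), key=key)


def group_patches(order, patch_size):
    """Cut the serialized order into consecutive patches (the last may be shorter)."""
    return [order[i:i + patch_size] for i in range(0, len(order), patch_size)]


# Two spatial clusters, deliberately interleaved in the input order.
points = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (0.6, 0.0, 0.0), (5.6, 5.0, 5.0)]
order = serialize(points, grid_size=0.5)
patches = group_patches(order, patch_size=2)
print(order)    # [0, 2, 1, 3]
print(patches)  # [[0, 2], [1, 3]]
```

Sorting along the curve puts spatially close points next to each other in the sequence, so a plain slice yields a locally coherent patch without a nearest-neighbor search.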
Where Pith is reading between the lines
- The same patch-wise adaptation idea could be tested on outdoor LiDAR data where scene layouts change even more abruptly than in indoor scans.
- If the projector proves stable, future models might replace heavy pre-training on mixed datasets with lightweight on-the-fly adjustment for each new environment.
- The approach hints that conditional weight generation may be more efficient than adding more layers or channels when the goal is robustness to scene diversity.
Load-bearing premise
The patch-wise parameters produced by the Dynamic Parameter Projector will improve results on diverse scenes without introducing instability or requiring scene-specific tuning that was not disclosed.
What would settle it
Running PointTPA on a new collection of indoor scenes with deliberately varied layouts and measuring whether mIoU falls below the static PTv3 baseline or fluctuates sharply when the projector is replaced by random weights of the same size.
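A minimal sketch of the measurement behind such a test: compute mIoU for three runs (learned projector, static baseline, random projector) and report the gaps. The helper names and the toy labels are hypothetical. Standard benchmark evaluation usually accumulates intersections and unions over the whole validation set rather than a single list, so treat this as an illustration of the metric, not an evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def compare_runs(miou_adapted, miou_static, miou_random):
    """Gaps between the adapted model and the two controls (positive = adapted is better)."""
    return {
        "gain_over_static": miou_adapted - miou_static,
        "gain_over_random": miou_adapted - miou_random,
    }


# Toy labels, not results from the paper.
target = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)  # class 0: 1/2, class 1: 2/3
```

A negative `gain_over_static`, or a `gain_over_random` near zero, would count against the premise that the learned projector is what drives the improvement.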
Original abstract
Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA adopts a Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights, enabling the backbone to adjust its behavior according to scene-specific variations while maintaining a low parameter overhead. Integrated into the PTv3 structure, PointTPA demonstrates strong parameter efficiency by introducing two lightweight modules of less than 2% of the backbone's parameters. Despite this minimal parameter overhead, PointTPA achieves 78.4% mIoU on ScanNet validation, surpassing existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks, highlighting the efficacy of our test-time dynamic network parameter adaptation mechanism in enhancing 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes PointTPA, a test-time parameter adaptation framework for scene-level 3D point cloud understanding. It introduces Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches from point clouds and a Dynamic Parameter Projector (DPP) to generate input-aware patch-wise weights that adapt the PTv3 backbone to scene-specific geometry and layout variations. The two modules add less than 2% parameters to the backbone; the method reports 78.4% mIoU on ScanNet validation and outperforms existing parameter-efficient fine-tuning (PEFT) approaches across multiple benchmarks.
Significance. If the dynamic adaptation mechanism proves robust, the work would offer a practical route to parameter-efficient handling of diverse 3D scenes without retraining or large overhead, addressing a real limitation of static networks in scene understanding. The low parameter count and code release are positive for reproducibility and deployment.
major comments (3)
- [Experimental results] The headline 78.4% mIoU on ScanNet validation is presented without an ablation that isolates the contribution of the Dynamic Parameter Projector (DPP) from the Serialization-based Neighborhood Grouping (SNG) alone; this is load-bearing for the central claim that input-aware dynamic weights drive the improvement.
- [Experimental results] No per-scene or per-category variance statistics or stability analysis is reported for the patch-wise parameters produced by DPP, leaving the claim of reliable adaptation across diverse geometries and layouts unverified.
- [Method] The manuscript provides no statement on whether the DPP projection weights or any adaptation step requires scene-dependent hyperparameter choices; if such tuning is present but undisclosed, the parameter-efficiency argument is weakened.
minor comments (2)
- [Method] The abstract and method sections use the term 'test-time' but the precise inference-time procedure (e.g., whether DPP runs once per scene or per patch) should be clarified with a diagram or pseudocode.
- [Experiments] Table captions and baseline descriptions should explicitly state whether all compared PEFT methods were trained with identical optimizer, schedule, and data augmentation settings.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, agreeing where revisions are needed to strengthen the presentation of our results and claims.
Point-by-point responses
- Referee: [Experimental results] The headline 78.4% mIoU on ScanNet validation is presented without an ablation that isolates the contribution of the Dynamic Parameter Projector (DPP) from the Serialization-based Neighborhood Grouping (SNG) alone; this is load-bearing for the central claim that input-aware dynamic weights drive the improvement.
  Authors: We agree that an explicit ablation isolating the DPP from SNG would more directly support the central claim regarding the benefit of input-aware dynamic weights. Our current experiments demonstrate gains of the full PointTPA (SNG + DPP) over the PTv3 baseline and PEFT methods, but do not include this specific isolation. We will add the requested ablation in the revised manuscript, evaluating SNG with static parameters versus the full dynamic adaptation.
  Revision: yes
- Referee: [Experimental results] No per-scene or per-category variance statistics or stability analysis is reported for the patch-wise parameters produced by DPP, leaving the claim of reliable adaptation across diverse geometries and layouts unverified.
  Authors: We acknowledge that variance and stability statistics would provide stronger verification of reliable adaptation. While overall benchmark improvements suggest robustness, such per-scene analysis was not included in the original submission. In the revision, we will incorporate per-scene and per-category variance statistics for the DPP-generated parameters along with basic stability metrics.
  Revision: yes
- Referee: [Method] The manuscript provides no statement on whether the DPP projection weights or any adaptation step requires scene-dependent hyperparameter choices; if such tuning is present but undisclosed, the parameter-efficiency argument is weakened.
  Authors: We confirm that the DPP projection weights and all adaptation steps use fixed hyperparameters with no scene-dependent choices or tuning. These values were selected once on a validation set and held constant across all scenes. We will add an explicit clarifying statement in the method section of the revised manuscript.
  Revision: yes
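The per-scene variance statistics discussed in the second response could be summarized along these lines. This is a sketch under stated assumptions: the function name, the scene identifiers, and the scores are hypothetical placeholders, not numbers from the paper or its rebuttal.

```python
from statistics import mean, pstdev


def scene_stability(per_scene_miou):
    """Summarize how much a per-scene metric varies across scenes."""
    scores = list(per_scene_miou.values())
    return {
        "mean": mean(scores),
        "std": pstdev(scores),  # population standard deviation across scenes
        "spread": max(scores) - min(scores),
        "worst_scene": min(per_scene_miou, key=per_scene_miou.get),
    }


# Hypothetical per-scene mIoU values, for illustration only.
per_scene = {"scene_a": 0.80, "scene_b": 0.70, "scene_c": 0.90}
summary = scene_stability(per_scene)
```

Reporting the same summary for the static baseline alongside the adapted model would show whether adaptation narrows or widens the spread across scenes.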
Circularity Check
No circularity; empirical architecture proposal with no load-bearing derivations
Full rationale
The paper proposes PointTPA as a test-time adaptation framework consisting of Serialization-based Neighborhood Grouping (SNG) and Dynamic Parameter Projector (DPP) modules inserted into PTv3. All central claims are framed as empirical outcomes: the modules add <2% parameters and yield 78.4% mIoU on ScanNet validation while outperforming PEFT baselines. No equations, uniqueness theorems, fitted-parameter predictions, or self-citation chains are invoked that would reduce the reported gains to the inputs by construction. The derivation chain is therefore self-contained as an engineering contribution whose validity rests on external benchmark results rather than internal redefinition.
Axiom & Free-Parameter Ledger
free parameters (2)
- SNG patch size and serialization parameters
- DPP output dimension and projection weights
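How these free parameters translate into parameter overhead can be checked with simple arithmetic. The sketch below assumes, purely for illustration, a two-layer bottleneck projector. The layer sizes and the backbone parameter count are hypothetical and are not taken from the paper.

```python
def bottleneck_params(in_dim, hidden_dim, out_dim, bias=True):
    """Parameter count of a two-layer projector: in -> hidden -> out."""
    n = in_dim * hidden_dim + hidden_dim * out_dim
    if bias:
        n += hidden_dim + out_dim
    return n


def overhead_ratio(added_params, backbone_params):
    """Fraction of the backbone's parameter count that the added modules contribute."""
    return added_params / backbone_params


# Hypothetical sizes: a 64 -> 8 -> 128 projector next to a 1,000,000-parameter backbone.
added = bottleneck_params(64, 8, 128)        # 512 + 1024 + 8 + 128 = 1672
ratio = overhead_ratio(added, 1_000_000)     # 0.001672
within_budget = ratio < 0.02                 # the "less than 2%" budget quoted on this page
```

Because the count grows with `hidden_dim * (in_dim + out_dim)`, the projector's output dimension and bottleneck width are the knobs that decide whether a design stays inside the budget.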
axioms (1)
- domain assumption: PTv3 backbone layers can accept and benefit from externally supplied patch-wise parameters without retraining the core weights.
invented entities (2)
- Serialization-based Neighborhood Grouping (SNG): no independent evidence
- Dynamic Parameter Projector (DPP): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "PointTPA adopts a Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights"
- IndisputableMonolith/Foundation/DimensionForcing.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "achieves 78.4% mIoU on ScanNet validation... two lightweight modules of less than 2% of the backbone's parameters"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.