P3T: Prototypical Point-level Prompt Tuning with Enhanced Generalization for 3D Vision-Language Models
Pith reviewed 2026-05-10 09:02 UTC · model grok-4.3
The pith
A prompt tuning approach for pre-trained 3D vision-language models matches full fine-tuning performance while improving generalization under data shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
P3T consists of a Point Prompter that produces instance-aware point-level prompts directly from the input point cloud and a Text Prompter that inserts learnable prompts into the text input, together with a prototypical loss that reduces intra-category variance to improve embedding alignment. This combination allows task-specific adaptation of 3D VLMs without full retraining, matching or exceeding full fine-tuning accuracy in classification and few-shot learning while demonstrating stronger robustness in cross-dataset evaluations.
What carries the argument
The Point Prompter generates instance-aware point-level prompts for each input point cloud and the Text Prompter replaces hand-crafted text with learnable prompts, with both supported by a prototypical loss that aligns embeddings by shrinking variance inside each category.
Load-bearing premise
That the combination of point-level prompts, learnable text prompts, and a prototypical loss will reduce intra-category variance and improve generalization without creating new overfitting modes or domain-specific biases missed by the experiments.
What would settle it
A controlled test on an unseen cross-dataset shift where P3T falls substantially below full fine-tuning accuracy on the target task.
Figures
read the original abstract
With the rise of pre-trained models in the 3D point cloud domain for a wide range of real-world applications, adapting them to downstream tasks has become increasingly important. However, conventional full fine-tuning methods are computationally expensive and storage-intensive. Although prompt tuning has emerged as an efficient alternative, it often suffers from overfitting, thereby compromising generalization capability. To address this issue, we propose Prototypical Point-level Prompt Tuning (P$^3$T), a parameter-efficient prompt tuning method designed for pre-trained 3D vision-language models (VLMs). P$^3$T consists of two components: 1) \textit{Point Prompter}, which generates instance-aware point-level prompts for the input point cloud, and 2) \textit{Text Prompter}, which employs learnable prompts into the input text instead of hand-crafted ones. Since both prompters operate directly on input data, P$^3$T enables task-specific adaptation of 3D VLMs without sacrificing generalizability. Furthermore, to enhance embedding space alignment, which is key to fine-tuning 3D VLMs, we introduce a prototypical loss that reduces intra-category variance. Extensive experiments demonstrate that our method matches or outperforms full fine-tuning in classification and few-shot learning, and further exhibits robust generalization under data shift in the cross-dataset setting. The code is available at \textcolor{violet}{https://github.com/gyjung975/P3T}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Prototypical Point-level Prompt Tuning (P³T) as a parameter-efficient adaptation method for pre-trained 3D vision-language models. It introduces a Point Prompter that generates instance-aware point-level prompts directly from the input point cloud, a Text Prompter that replaces hand-crafted text prompts with learnable ones, and a prototypical loss to reduce intra-category variance and improve embedding alignment. The central empirical claim is that P³T matches or outperforms full fine-tuning on classification and few-shot learning tasks while exhibiting stronger generalization under data shift in cross-dataset evaluations.
Significance. If the reported performance and generalization results hold under rigorous verification, the work would be significant for efficient adaptation of large 3D VLMs in real-world applications where full fine-tuning is prohibitive. The emphasis on reducing overfitting via point-level and prototypical components, combined with the code release, could facilitate further research in parameter-efficient 3D prompt tuning.
major comments (2)
- Abstract and §4 (Experiments): The central claim that P³T matches or outperforms full fine-tuning and shows robust cross-dataset generalization is stated without any quantitative results, baseline comparisons, error bars, or ablation tables in the abstract; the full experimental section must supply these details (including specific datasets, shot settings, and statistical significance) to substantiate the claim, as the current presentation leaves the empirical support unverifiable.
- §3.2 (Prototypical Loss): The prototypical loss is described as reducing intra-category variance to enhance embedding alignment, but without an explicit equation or derivation showing how prototypes are computed (e.g., class means in feature space) and how the loss balances intra- vs. inter-class terms, it is unclear whether the formulation is parameter-free or risks introducing domain-specific biases not captured in the reported experiments.
minor comments (2)
- §3.1 (Point Prompter): Clarify the exact architecture and parameter count of the Point Prompter relative to the frozen backbone to strengthen the parameter-efficiency argument.
- Figure 1 and §3: Ensure the diagram of the overall P³T pipeline explicitly labels the flow from point cloud through Point Prompter to the VLM and the integration of the prototypical loss during training.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment below and have made revisions to strengthen the presentation of our empirical results and the formal description of the prototypical loss.
read point-by-point responses
-
Referee: Abstract and §4 (Experiments): The central claim that P³T matches or outperforms full fine-tuning and shows robust cross-dataset generalization is stated without any quantitative results, baseline comparisons, error bars, or ablation tables in the abstract; the full experimental section must supply these details (including specific datasets, shot settings, and statistical significance) to substantiate the claim, as the current presentation leaves the empirical support unverifiable.
Authors: We appreciate the referee's emphasis on verifiability. The abstract is kept concise per standard practice, but we have revised it to include key quantitative highlights (e.g., accuracy gains on ModelNet40 classification and few-shot tasks relative to full fine-tuning). Section 4 already provides the requested details: comprehensive tables comparing P³T to full fine-tuning and other baselines across specific datasets (ModelNet40, ScanObjectNN, ShapeNet), shot settings (1-shot to 16-shot), ablation studies on each component, and cross-dataset generalization results. In the revision we have added error bars to all main tables and a brief note on statistical significance testing to further substantiate the claims. revision: yes
-
Referee: §3.2 (Prototypical Loss): The prototypical loss is described as reducing intra-category variance to enhance embedding alignment, but without an explicit equation or derivation showing how prototypes are computed (e.g., class means in feature space) and how the loss balances intra- vs. inter-class terms, it is unclear whether the formulation is parameter-free or risks introducing domain-specific biases not captured in the reported experiments.
Authors: We thank the referee for noting the need for greater formality. In the revised §3.2 we now include the explicit equation: prototypes are computed as the mean of L2-normalized embeddings per class in the batch; the loss is L_proto = L_intra + λ L_inter, where L_intra pulls samples to their class prototype and L_inter repels different prototypes. The formulation introduces no new parameters beyond the prompters themselves. Cross-dataset results already demonstrate that no harmful domain-specific bias is introduced, as performance remains stable or improves under distribution shift; we have added a short derivation paragraph explaining the variance-reduction motivation. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper introduces P3T as an empirical method for parameter-efficient adaptation of 3D VLMs via point-level and text prompters plus a prototypical loss to reduce intra-category variance. All central claims (matching or outperforming full fine-tuning in classification/few-shot settings and robust cross-dataset generalization) are presented as outcomes of experiments rather than any first-principles derivation or prediction. No equations, uniqueness theorems, self-citations as load-bearing premises, or fitted parameters renamed as predictions appear in the abstract or described structure. The approach is self-contained against external benchmarks through reported implementation details and results, with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
invented entities (3)
-
Point Prompter
no independent evidence
-
Text Prompter
no independent evidence
-
prototypical loss
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Pointnet: Deep learning on point sets for 3d classification and segmentation,
C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 652–660
work page 2017
-
[2]
Pointcnn: Con- volution on x-transformed points,
Y . Li, R. Bu, M. Sun, W. Wu, X. Di, and B. Chen, “Pointcnn: Con- volution on x-transformed points,” inAdvances in Neural Information Processing Systems, 2018
work page 2018
-
[3]
Pointconv: Deep convolutional networks on 3d point clouds,
W. Wu, Z. Qi, and L. Fuxin, “Pointconv: Deep convolutional networks on 3d point clouds,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019, pp. 9621–9630
work page 2019
-
[4]
H. Zhao, L. Jiang, J. Jia, P. H. Torr, and V . Koltun, “Point trans- former,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 16 259–16 268
work page 2021
-
[5]
Pointmixer: Mlp-mixer for point cloud understanding,
J. Choe, C. Park, F. Rameau, J. Park, and I. S. Kweon, “Pointmixer: Mlp-mixer for point cloud understanding,” inEuropean Conference on Computer Vision (ECCV), 2022, pp. 620–640
work page 2022
-
[6]
Rethinking network design and local geometry in point cloud: A simple residual MLP framework,
X. Ma, C. Qin, H. You, H. Ran, and Y . Fu, “Rethinking network design and local geometry in point cloud: A simple residual MLP framework,” inInternational Conference on Learning Representations, 2022
work page 2022
-
[7]
Pointcon- trast: Unsupervised pre-training for 3d point cloud understanding,
S. Xie, J. Gu, D. Guo, C. R. Qi, L. Guibas, and O. Litany, “Pointcon- trast: Unsupervised pre-training for 3d point cloud understanding,” in European Conference on Computer Vision (ECCV), 2020
work page 2020
-
[8]
Meshmae: Masked autoencoders for 3d mesh data analysis,
Y . Liang, S. Zhao, B. Yu, J. Zhang, and F. He, “Meshmae: Masked autoencoders for 3d mesh data analysis,” inEuropean Conference on Computer Vision (ECCV), S. Avidan, G. Brostow, M. Ciss ´e, G. M. Farinella, and T. Hassner, Eds., 2022, pp. 37–54
work page 2022
-
[9]
Point-bert: Pre-training 3d point cloud transformers with masked point modeling,
X. Yu, L. Tang, Y . Rao, T. Huang, J. Zhou, and J. Lu, “Point-bert: Pre-training 3d point cloud transformers with masked point modeling,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 19 313–19 322
work page 2022
-
[10]
Point-m2AE: Multi-scale masked autoencoders for hier- archical point cloud pre-training,
R. Zhang, Z. Guo, P. Gao, R. Fang, B. Zhao, D. Wang, Y . Qiao, and H. Li, “Point-m2AE: Multi-scale masked autoencoders for hier- archical point cloud pre-training,” inAdvances in Neural Information Processing Systems, 2022, pp. 27 061–27 074
work page 2022
-
[11]
F. Long, T. Yao, Z. Qiu, L. Li, and T. Mei, “Pointclustering: Un- supervised point cloud pre-training using transformation invariance in clustering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 21 824–21 834
work page 2023
-
[12]
Point cloud pre-training with diffusion models,
X. Zheng, X. Huang, G. Mei, Y . Hou, Z. Lyu, B. Dai, W. Ouyang, and Y . Gong, “Point cloud pre-training with diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 22 935–22 945
work page 2024
-
[13]
Groupcontrast: Semantic-aware self-supervised representation learn- ing for 3d understanding,
C. Wang, L. Jiang, X. Wu, Z. Tian, B. Peng, H. Zhao, and J. Jia, “Groupcontrast: Semantic-aware self-supervised representation learn- ing for 3d understanding,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 4917–4928
work page 2024
-
[14]
Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,
L. Xue, M. Gao, C. Xing, R. Mart ´ın-Mart´ın, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, “Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 1179–1189
work page 2023
-
[15]
Ulip-2: Towards scalable multimodal pre-training for 3d understanding,
L. Xue, N. Yu, S. Zhang, A. Panagopoulou, J. Li, R. Mart ´ın-Mart´ın, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, “Ulip-2: Towards scalable multimodal pre-training for 3d understanding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 27 091–27 101
work page 2024
-
[16]
Learning transferable visual models from natural language supervi- sion,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervi- sion,” inProceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 139. PMLR, 18–24 J...
work page 2021
-
[18]
Prefix-tuning: Optimizing continuous prompts for generation,
X. L. Li and P. Liang, “Prefix-tuning: Optimizing continuous prompts for generation,” inProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug. 2021, pp. 4582–4597
work page 2021
-
[19]
Learning to prompt for vision-language models,
K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” inInternational Journal of Computer Vision (IJCV), Sept. 2022, pp. 2337–2348
work page 2022
-
[20]
Conditional prompt learning for vision-language models,
K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Conditional prompt learning for vision-language models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 16 816–16 825
work page 2022
-
[21]
M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N. Lim, “Visual prompt tuning,” inEuropean Conference on Computer Vision (ECCV), S. Avidan, G. Brostow, M. Ciss ´e, G. M. Farinella, and T. Hassner, Eds., 2022, pp. 709–727
work page 2022
-
[22]
Maple: Multi-modal prompt learning,
M. U. Khattak, H. Rasheed, M. Maaz, S. Khan, and F. S. Khan, “Maple: Multi-modal prompt learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 19 113–19 122
work page 2023
-
[23]
Distribution-aware prompt tuning for vision-language models,
E. Cho, J. Kim, and H. J. Kim, “Distribution-aware prompt tuning for vision-language models,” inProceedings of the IEEE/CVF Inter- national Conference on Computer Vision (ICCV), October 2023, pp. 22 004–22 013
work page 2023
-
[24]
Tcp:textual-based class-aware prompt tuning for visual-language model,
H. Yao, R. Zhang, and C. Xu, “Tcp:textual-based class-aware prompt tuning for visual-language model,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 23 438–23 448
work page 2024
-
[25]
Instance- aware dynamic prompt tuning for pre-trained point cloud models,
Y . Zha, J. Wang, T. Dai, B. Chen, Z. Wang, and S.-T. Xia, “Instance- aware dynamic prompt tuning for pre-trained point cloud models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 14 161–14 170
work page 2023
-
[26]
Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis,
X. Zhou, D. Liang, W. Xu, X. Zhu, Y . Xu, Z. Zou, and X. Bai, “Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 14 707–14 717
work page 2024
-
[27]
Parameter-efficient prompt learning for 3d point cloud understanding,
H. Sun, Y . Wang, W. Chen, H. Deng, and D. Li, “Parameter-efficient prompt learning for 3d point cloud understanding,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 9478–9486
work page 2024
-
[28]
Exploring visual prompts for adapting large-scale models,
H. Bahng, A. Jahanian, S. Sankaranarayanan, and P. Isola, “Exploring visual prompts for adapting large-scale models,” 2022
work page 2022
-
[29]
Blackvip: Black-box visual prompting for robust transfer learning,
C. Oh, H. Hwang, H.-y. Lee, Y . Lim, G. Jung, J. Jung, H. Choi, and K. Song, “Blackvip: Black-box visual prompting for robust transfer learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 24 224–24 235
work page 2023
-
[30]
Visual-language prompt tuning with knowledge-guided context optimization,
H. Yao, R. Zhang, and C. Xu, “Visual-language prompt tuning with knowledge-guided context optimization,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 6757–6767
work page 2023
-
[31]
Self-regulating prompts: Foundational model adaptation without forgetting,
M. U. Khattak, S. T. Wasim, M. Naseer, S. Khan, M.-H. Yang, and F. S. Khan, “Self-regulating prompts: Foundational model adaptation without forgetting,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 15 190– 15 200
work page 2023
-
[32]
Consistency-guided prompt learning for vision-language models,
S. Roy and A. Etemad, “Consistency-guided prompt learning for vision-language models,” inInternational Conference on Learning Representations, 2024
work page 2024
-
[33]
Scaling up visual and vision-language representation learning with noisy text supervision,
C. Jia, Y . Yang, Y . Xia, Y .-T. Chen, Z. Parekh, H. Pham, Q. Le, Y .-H. Sung, Z. Li, and T. Duerig, “Scaling up visual and vision-language representation learning with noisy text supervision,” inProceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 139. PMLR, 18–24 Jul 2021, pp. 4904–4916
work page 2021
-
[34]
Slip: Self-supervision meets language-image pre-training,
N. Mu, A. Kirillov, D. Wagner, and S. Xie, “Slip: Self-supervision meets language-image pre-training,” inEuropean Conference on Com- puter Vision (ECCV), S. Avidan, G. Brostow, M. Ciss ´e, G. M. Farinella, and T. Hassner, Eds., 2022, pp. 529–544
work page 2022
-
[35]
Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,
Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu, B. Li, P. Luo, T. Lu, Y . Qiao, and J. Dai, “Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 24 185–24 198
work page 2024
-
[36]
The power of scale for parameter-efficient prompt tuning,
B. Lester, R. Al-Rfou, and N. Constant, “The power of scale for parameter-efficient prompt tuning,” inProceedings of the 2021 Con- ference on Empirical Methods in Natural Language Processing, Nov. 2021, pp. 3045–3059
work page 2021
-
[37]
Autoprompt: Eliciting knowledge from language models with auto- matically generated prompts,
T. Shin, Y . Razeghi, R. L. Logan IV , E. Wallace, and S. Singh, “Autoprompt: Eliciting knowledge from language models with auto- matically generated prompts,” inProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov. 2020, pp. 4222–4235
work page 2020
-
[38]
Z. Jiang, J. Araki, H. Ding, and G. Neubig, “How can we know when language models know? on the calibration of language models for question answering,”Transactions of the Association for Computa- tional Linguistics, vol. 9, pp. 962–977, 2021
work page 2021
-
[39]
Bitfit: Sim- ple parameter-efficient fine-tuning for transformer-based masked language-models,
E. Ben Zaken, Y . Goldberg, and S. Ravfogel, “Bitfit: Sim- ple parameter-efficient fine-tuning for transformer-based masked language-models,” inProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), May 2022, pp. 1–9
work page 2022
-
[40]
Y . Lu, J. Liu, Y . Zhang, Y . Liu, and X. Tian, “Prompt distribution learning,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 5206–5215
work page 2022
-
[41]
Prompt generation networks for input-space adaptation of frozen vision transformers,
J. Loedeman, M. C. Stol, T. Han, and Y . M. Asano, “Prompt generation networks for input-space adaptation of frozen vision transformers,” in British Machine Vision Conference (BMVC), 2024
work page 2024
-
[42]
Read- only prompt optimization for vision-language few-shot learning,
D. Lee, S. Song, J. Suh, J. Choi, S. Lee, and H. J. Kim, “Read- only prompt optimization for vision-language few-shot learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 1401–1411
work page 2023
-
[43]
Parameter-efficient transfer learning for NLP,
N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for NLP,” inProceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 97. PMLR, 09–15 Jun 2019, pp. 2790–2799
work page 2019
-
[44]
Adaptformer: Adapting vision transformers for scalable visual recog- nition,
S. Chen, C. GE, Z. Tong, J. Wang, Y . Song, J. Wang, and P. Luo, “Adaptformer: Adapting vision transformers for scalable visual recog- nition,” inAdvances in Neural Information Processing Systems, 2022
work page 2022
-
[45]
Towards a unified view of parameter-efficient transfer learning,
J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, “Towards a unified view of parameter-efficient transfer learning,” inInternational Conference on Learning Representations, 2022
work page 2022
-
[46]
Fact: Factor-tuning for lightweight adaptation on vision transformer,
S. Jie and Z.-H. Deng, “Fact: Factor-tuning for lightweight adaptation on vision transformer,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 1060–1068, Jun. 2023
work page 2023
-
[47]
Cheap and quick: Efficient vision-language instruction tuning for large language models,
G. Luo, Y . Zhou, T. Ren, S. Chen, X. Sun, and R. Ji, “Cheap and quick: Efficient vision-language instruction tuning for large language models,” inAdvances in Neural Information Processing Systems, 2023
work page 2023
-
[48]
Point-peft: Parameter-efficient fine-tuning for 3d pre- trained models,
Y . Tang, R. Zhang, Z. Guo, X. Ma, B. Zhao, Z. Wang, D. Wang, and X. Li, “Point-peft: Parameter-efficient fine-tuning for 3d pre- trained models,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, pp. 5171–5179, Mar. 2024
work page 2024
-
[49]
Point- PRC: A prompt learning based regulation framework for generalizable point cloud analysis,
H. Sun, Q. Ke, Y . Wang, W. Chen, K. Yang, D. Li, and J. Cai, “Point- PRC: A prompt learning based regulation framework for generalizable point cloud analysis,” inAdvances in Neural Information Processing Systems, 2024
work page 2024
-
[50]
Dynamic graph cnn for learning on point clouds,
Y . Wang, Y . Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,”ACM Transactions on Graphics (TOG), 2019
work page 2019
-
[51]
Prototypical networks for few- shot learning,
J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few- shot learning,” inAdvances in Neural Information Processing Systems, 2017
work page 2017
-
[52]
Pointnet: Deep learning on point sets for 3d classification and segmentation,
C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
work page 2017
-
[53]
Mvtn: Multi-view trans- formation network for 3d shape recognition,
A. Hamdi, S. Giancola, and B. Ghanem, “Mvtn: Multi-view trans- formation network for 3d shape recognition,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 1–11
work page 2021
-
[54]
Unsupervised point cloud pre-training via occlusion completion,
H. Wang, Q. Liu, X. Yue, J. Lasenby, and M. J. Kusner, “Unsupervised point cloud pre-training via occlusion completion,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 9782–9792
work page 2021
-
[55]
Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding,
M. Afham, I. Dissanayake, D. Dissanayake, A. Dharmasiri, K. Thi- lakarathna, and R. Rodrigo, “Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 9902–9912
work page 2022
-
[56]
Masked discrimination for self- supervised learning on point clouds,
H. Liu, M. Cai, and Y . J. Lee, “Masked discrimination for self- supervised learning on point clouds,” inEuropean Conference on Computer Vision (ECCV), S. Avidan, G. Brostow, M. Ciss ´e, G. M. Farinella, and T. Hassner, Eds., 2022, pp. 657–675
work page 2022
-
[57]
PointGPT: Auto-regressively generative pre-training from point clouds,
G. Chen, M. Wang, Y . Yang, K. Yu, L. Yuan, and Y . Yue, “PointGPT: Auto-regressively generative pre-training from point clouds,” inThirty- seventh Conference on Neural Information Processing Systems, 2023
work page 2023
-
[58]
R. Dong, Z. Qi, L. Zhang, J. Zhang, J. Sun, Z. Ge, L. Yi, and K. Ma, “Autoencoders as cross-modal teachers: Can pretrained 2d image transformers help 3d representation learning?” inThe Eleventh International Conference on Learning Representations, 2023
work page 2023
-
[59]
Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining,
Z. Qi, R. Dong, G. Fan, Z. Ge, X. Zhang, K. Ma, and L. Yi, “Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining,” inProceedings of the 40th International Con- ference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 202. PMLR, 23–29 Jul 2023, pp. 28 223–28 243
work page 2023
-
[60]
3d shapenets: A deep representation for volumetric shapes,
Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015
work page 2015
-
[61]
M. A. Uy, Q.-H. Pham, B.-S. Hua, T. Nguyen, and S.-K. Yeung, “Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019
work page 2019
-
[62]
Objaverse: A universe of annotated 3d objects,
M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. Vander- Bilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi, “Objaverse: A universe of annotated 3d objects,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 13 142–13 153
work page 2023
-
[63]
Shapenet: An information-rich 3d model repository,
A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “Shapenet: An information-rich 3d model repository,” 2015
work page 2015
-
[64]
Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,
X. Zhu, R. Zhang, B. He, Z. Guo, Z. Zeng, Z. Qin, S. Zhang, and P. Gao, “Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 2639– 2650
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.