pith. machine review for the scientific record.

arxiv: 2605.01759 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 3 Lean theorem links

PointCSP: Cross-Sample Semantic Propagation and Stability Preservation in Self-Supervised Point Cloud Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud · self-supervised learning · semantic consistency · state space model · distillation · 3D representation learning · cross-sample propagation

The pith

Serializing point cloud samples into a state-space model propagates semantics across a batch to build global alignment, with asymmetric distillation preserving that consistency at test time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that the usual sample-by-sample approach in self-supervised point cloud learning leaves semantic representations misaligned from one scene to the next. By turning a batch into a single serialized sequence and feeding it to a state-space model, the network can pass semantic state from one sample to the next, explicitly tying their latent representations together. An asymmetric distillation step then lets the model keep that cross-sample structure even when it is later tested on single scenes. If this works, pretrained point-cloud models would produce more stable and transferable semantic features for downstream 3D tasks.
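The serialization idea can be pictured with a toy sketch. Everything below is hypothetical: the sizes, the plain linear recurrence standing in for the paper's state-space model, and the random weights. It only illustrates how hidden state crosses sample boundaries once a batch is flattened into one continuous sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: B samples, each a sequence of T tokens with D-dim features.
B, T, D, H = 4, 16, 32, 64          # hypothetical sizes, not from the paper
batch = rng.standard_normal((B, T, D))

# Serialize the batch into one continuous sequence of B*T tokens.
sequence = batch.reshape(B * T, D)

# A minimal linear state-space recurrence: h_t = A h_{t-1} + W x_t.
# The hidden state h crosses sample boundaries, so semantic state
# accumulated on sample i can influence the representation of sample i+1.
A = 0.9 * np.eye(H)                 # stand-in state transition
W = rng.standard_normal((H, D)) * 0.1
h = np.zeros(H)
states = []
for x in sequence:
    h = A @ h + W @ x
    states.append(h)

# Re-fold the propagated states back into per-sample features.
features = np.stack(states).reshape(B, T, H)
print(features.shape)  # (4, 16, 64)
```

Because the state is never reset at sample boundaries, the first tokens of sample 1 already carry information from sample 0 — the mechanism the paper credits with cross-sample alignment, and also the source of the order-dependence concerns raised below.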

Core claim

The authors claim that cross-sample semantic propagation (CSP), achieved by serializing batch samples and processing them with a state-space model, explicitly models dynamic dependencies across scenes, thereby establishing global semantic alignment in the latent space. They further claim that an asymmetric semantic preservation distillation (SPD) module, using a heterogeneous input mechanism and a semantic feature alignment constraint, removes the batch-induced inconsistencies that would otherwise appear under single-scene testing and enables stable transfer of the pretrained semantics.
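The SPD claim can likewise be sketched. The encoder here is a stand-in single linear map, the dimensions are invented, and MSE is assumed for the alignment constraint (this summary does not specify the paper's exact loss). The point is only the asymmetry: teacher features come from a batch-serialized pass, student features from a single-scene pass.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Stand-in encoder: a single linear map plus tanh (hypothetical)."""
    return np.tanh(x @ W)

D, H = 32, 64                                    # hypothetical dimensions
W_teacher = rng.standard_normal((D, H)) * 0.1    # frozen pretrained weights
W_student = W_teacher + rng.standard_normal((D, H)) * 0.01

batch = rng.standard_normal((4, 16, D))          # teacher sees the full batch
single_scene = batch[0]                          # student sees one scene only

# Heterogeneous inputs: teacher features from the batch-serialized pass,
# student features from the single-scene pass over the same scene.
teacher_feat = encode(batch.reshape(-1, D), W_teacher).reshape(4, 16, H)[0]
student_feat = encode(single_scene, W_student)

# Alignment constraint: pull the student's single-scene features toward
# the teacher's batch-conditioned features (MSE assumed here).
align_loss = float(np.mean((student_feat - teacher_feat) ** 2))
print(align_loss)
```

Minimizing this loss is what, on the paper's account, lets the single-scene student inherit the cross-sample structure without ever seeing a batch at test time.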

What carries the argument

Cross-sample semantic propagation (CSP) that serializes a batch of point clouds into a continuous sequence and runs it through a state-space model so semantic state can flow from one sample to the next.

If this is right

  • Pretrained models produce latent features whose semantic categories remain aligned even when the scenes are presented independently at test time.
  • Downstream 3D segmentation and classification tasks receive more stable and transferable representations than those obtained from prior self-supervised point-cloud methods.
  • Batch-level serialization during pretraining becomes an explicit mechanism for building cross-sample consistency instead of a source of inconsistency.
  • The same state-space propagation idea could be applied to other 3D modalities once the finetuning stabilization step is in place.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the propagation truly creates global alignment, the method might reduce the need for large-scale labeled 3D data by making unlabeled scene collections more useful for representation learning.
  • The approach suggests that other modalities with natural batch structure, such as video or multi-view images, could adopt similar serialization-plus-state-space designs to enforce cross-sample consistency.
  • A remaining open question is whether the state-space model can be replaced by simpler recurrent or attention mechanisms while retaining the same alignment benefit.

Load-bearing premise

That turning separate point-cloud samples into one serialized sequence and running it through a state-space model actually creates meaningful global semantic alignment rather than spurious correlations, and that the later asymmetric distillation step removes all batch artifacts without creating new inconsistencies.

What would settle it

A controlled test in which two scenes that should share semantic categories are processed separately after pretraining. If their extracted features show no measurable gain in cross-scene consistency or downstream task accuracy over a standard sample-independent baseline, the core claim fails.
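One hedged way to operationalize that test: compare per-category feature centroids of two independently processed scenes. The metric and the synthetic data below are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(2)

def cross_scene_consistency(feat_a, labels_a, feat_b, labels_b):
    """Mean cosine similarity between per-category feature centroids
    of two independently processed scenes (hypothetical metric)."""
    sims = []
    for c in np.intersect1d(labels_a, labels_b):
        ca = feat_a[labels_a == c].mean(axis=0)
        cb = feat_b[labels_b == c].mean(axis=0)
        sims.append(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))
    return float(np.mean(sims))

# Synthetic scenes: 3 shared categories, features = category anchor + noise.
anchors = rng.standard_normal((3, 16))
labels = np.repeat(np.arange(3), 20)
scene_a = anchors[labels] + 0.1 * rng.standard_normal((60, 16))
scene_b = anchors[labels] + 0.1 * rng.standard_normal((60, 16))

score = cross_scene_consistency(scene_a, labels, scene_b, labels)
print(round(score, 2))  # near 1.0 when same-category features align
```

Running this metric on features from the proposed method versus a sample-independent baseline, on real scenes, is the comparison that would settle the claim.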

Figures

Figures reproduced from arXiv: 2605.01759 by Ajian Liu, Hui Ma, Liying Yang, Sunyuan Qiang, Xinxing Yu, Yanyan Liang, Yuzhong Wang, Zhi Rao.

Figure 1. A comparison of t-SNE visualizations under identi… (caption truncated in source)
Figure 2. Overview of the proposed PointSCP. (a) The CSP models cross-sample semantic dependencies within the state-space represen… (caption truncated in source)
Figure 3. Visualization of large-scale object-level semantic segmentation results on the S3DIS Area-5 dataset, showing input point clouds, … (caption truncated in source)
Figure 4. t-SNE visualizations of features across three scene… (caption truncated in source)
Figure 5. Ablation experiments on batch size during finetuning.
Original abstract

Scene-level point cloud self-supervised learning (PC-SSL) has demonstrated potential in enhancing the generalization capability of 3D vision models. Despite the advances in the field through existing methods, the sample-independent modeling paradigm still poses significant limitations in terms of maintaining consistent semantic representations across scenes. This challenge hinders the construction of a unified and transferable semantic space. To address this issue, we propose a PC-SSL framework based on cross-sample semantic propagation (CSP), in which samples within a batch are serialized into continuous input and processed by a state-space model to enable semantic state propagation. This mechanism explicitly models the dynamic dependencies across samples in the state space, allowing the network to establish cross-sample semantic consistency in the latent space and achieve global semantic alignment. Since serialization-based pretraining requires batch-level input organization, we further introduce an asymmetric semantic preservation distillation (SPD) during finetuning to achieve structural alignment of semantic transfer and eliminate inconsistencies caused by batch dependency. The proposed SPD ensures stable transfer of pretrained semantics through a heterogeneous input mechanism and a semantic feature alignment constraint. This enables the model to maintain structured semantic consistency and robustness under single-scene testing conditions. Extensive experiments on multiple benchmark datasets demonstrate that our method consistently outperforms state-of-the-art methods in both performance and semantic consistency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PointCSP, a self-supervised framework for scene-level point cloud learning. It proposes cross-sample semantic propagation (CSP) that serializes an entire batch into a continuous sequence processed by a state-space model to propagate semantic states across samples and achieve global alignment in latent space. To address batch-induced inconsistencies at single-scene test time, it adds asymmetric semantic preservation distillation (SPD) during fine-tuning via heterogeneous inputs and a semantic feature alignment constraint. The central claim is that this yields consistent outperformance over state-of-the-art methods on multiple benchmarks in both task performance and semantic consistency.

Significance. If the experimental claims are robust and the method is shown to be insensitive to serialization order, the work could meaningfully advance point-cloud SSL by moving beyond sample-independent modeling toward explicit cross-sample consistency. The use of state-space models for dynamic inter-sample propagation is a distinctive technical choice that, if validated, might influence other 3D or multimodal SSL pipelines.

major comments (2)
  1. [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.
  2. [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.
minor comments (2)
  1. The abstract states the outperformance claim without any numerical metrics, baseline names, or dataset identifiers; adding a concise quantitative summary would improve immediate readability.
  2. Notation for the state-space model transition and the asymmetric distillation loss could be clarified with an explicit equation reference in the main text rather than only in supplementary material.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify the robustness requirements for our cross-sample semantic propagation approach. We appreciate the recognition of the potential significance of using state-space models for inter-sample consistency in point cloud SSL. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.

    Authors: We agree that the arbitrary serialization order is a valid concern and that explicit controls are needed to confirm the claimed semantic alignment arises from intrinsic scene properties rather than ordering artifacts. Although the state-space model propagates semantic states sequentially with the intent of capturing dynamic cross-sample dependencies independent of specific order, we acknowledge the absence of permutation experiments in the current manuscript. In the revised version, we will add a dedicated set of experiments that apply multiple random permutations to the batch serialization order, as well as alternative groupings (e.g., by scene similarity), and report the resulting variation in semantic consistency metrics and downstream task performance. These results will quantify order sensitivity and, if it proves low, support the claim that the global alignment is robust. revision: yes

  2. Referee: [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.

    Authors: We thank the referee for this precise observation on the SPD mechanism. The asymmetric distillation is designed to enforce semantic feature alignment using heterogeneous inputs precisely to decouple the transferred representations from batch-specific (including order-dependent) artifacts learned in pre-training. Nevertheless, we recognize that a direct ablation linking different pre-training serialization orders to post-SPD performance is missing and would provide stronger evidence for the stability claim. In the revision, we will include this ablation: models pre-trained under varied serialization orders will be fine-tuned with SPD, and we will report the resulting task performance and semantic consistency scores to demonstrate that SPD successfully suppresses order-induced variations. revision: yes
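The permutation ablation promised above could be prototyped along these lines, with a toy linear recurrence standing in for the pretrained model and all sizes invented. A purely order-invariant encoder would score zero on this sensitivity measure; the state-passing design generically does not.

```python
import numpy as np

rng = np.random.default_rng(3)

B, T, D, H = 4, 16, 32, 64          # hypothetical sizes
batch = rng.standard_normal((B, T, D))
A = 0.9 * np.eye(H)                 # stand-in state transition
W = rng.standard_normal((H, D)) * 0.1

def propagate(samples):
    """Linear state-space pass over a serialized batch (stand-in model)."""
    h, out = np.zeros(H), []
    for x in samples.reshape(-1, D):
        h = A @ h + W @ x
        out.append(h)
    return np.stack(out).reshape(samples.shape[0], T, H)

# Encode sample 0's features under several random serialization orders.
runs = []
for _ in range(5):
    perm = rng.permutation(B)
    feats = propagate(batch[perm])
    runs.append(feats[np.argmax(perm == 0)])   # recover sample 0's features

# Order sensitivity: spread of sample-0 features across permutations.
sensitivity = float(np.stack(runs).std(axis=0).mean())
print(sensitivity > 0.0)  # an order-invariant encoder would give 0
```

The ablation the referee asks for would report this spread (and downstream accuracy) before and after SPD fine-tuning; SPD succeeds if the post-distillation spread collapses toward zero.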

Circularity Check

0 steps flagged

No circularity detected in claimed derivation

Full rationale

The paper proposes CSP (serializing batch samples into a state-space model for cross-sample semantic propagation) and asymmetric SPD (distillation during finetuning for single-scene stability) as novel mechanisms to address sample-independent modeling limitations in PC-SSL. No equations, predictions, or central claims reduce by construction to self-fitted parameters, prior self-citations, or renamed inputs; the abstract and described framework present these as independent architectural choices with external benchmark validation. The derivation chain remains self-contained without load-bearing self-referential reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on untested assumptions that state-space models can propagate semantics across serialized point-cloud samples to produce global alignment and that SPD can eliminate batch dependency effects; no explicit free parameters or new entities are named in the abstract.

axioms (2)
  • domain assumption State-space models can model dynamic semantic dependencies across serialized batch samples to establish cross-sample consistency in latent space
    Invoked as the core mechanism of CSP in the abstract
  • domain assumption Asymmetric semantic preservation distillation can achieve structural alignment and eliminate batch-induced inconsistencies under single-scene testing
    Invoked to justify the SPD component for stable transfer

pith-pipeline@v0.9.0 · 5550 in / 1303 out tokens · 29740 ms · 2026-05-08T19:27:36.762616+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — the paper's claim is directly supported by a theorem in the formal canon.
  • supports — the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — the paper appears to rely on the theorem as machinery.
  • contradicts — the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
