pith. machine review for the scientific record.

arxiv: 2605.01759 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 3 Lean theorem links

PointCSP: Cross-Sample Semantic Propagation and Stability Preservation in Self-Supervised Point Cloud Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords point cloud · self-supervised learning · semantic consistency · state space model · distillation · 3D representation learning · cross-sample propagation

The pith

Serializing point cloud samples into a state-space model propagates semantics across a batch to build global alignment, with asymmetric distillation preserving that consistency at test time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that the usual sample-by-sample approach in self-supervised point cloud learning leaves semantic representations misaligned from one scene to the next. By turning a batch into a single serialized sequence and feeding it to a state-space model, the network can pass semantic state from one sample to the next, explicitly tying their latent representations together. An asymmetric distillation step then lets the model keep that cross-sample structure even when it is later tested on single scenes. If this works, pretrained point-cloud models would produce more stable and transferable semantic features for downstream 3D tasks.
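The serialization idea can be pictured with a toy sketch. Everything below is hypothetical: the sizes, the plain linear recurrence standing in for the paper's state-space model, and the random weights. It only illustrates how hidden state crosses sample boundaries once a batch is flattened into one continuous sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: B samples, each a sequence of T tokens with D-dim features.
B, T, D, H = 4, 16, 32, 64          # hypothetical sizes, not from the paper
batch = rng.standard_normal((B, T, D))

# Serialize the batch into one continuous sequence of B*T tokens.
sequence = batch.reshape(B * T, D)

# A minimal linear state-space recurrence: h_t = A h_{t-1} + W x_t.
# The hidden state h crosses sample boundaries, so semantic state
# accumulated on sample i can influence the representation of sample i+1.
A = 0.9 * np.eye(H)                 # stand-in state transition
W = rng.standard_normal((H, D)) * 0.1
h = np.zeros(H)
states = []
for x in sequence:
    h = A @ h + W @ x
    states.append(h)

# Re-fold the propagated states back into per-sample features.
features = np.stack(states).reshape(B, T, H)
print(features.shape)  # (4, 16, 64)
```

Because the state is never reset at sample boundaries, the first tokens of sample 1 already carry information from sample 0 — the mechanism the paper credits with cross-sample alignment, and also the source of the order-dependence concerns raised below.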

Core claim

The authors claim that cross-sample semantic propagation (CSP), achieved by serializing batch samples and processing them with a state-space model, explicitly models dynamic dependencies across scenes, thereby establishing global semantic alignment in the latent space. They further claim that an asymmetric semantic preservation distillation (SPD) module, using a heterogeneous input mechanism and a semantic feature alignment constraint, removes the batch-induced inconsistencies that would otherwise appear under single-scene testing and enables stable transfer of the pretrained semantics.
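The SPD claim can likewise be sketched. The encoder here is a stand-in single linear map, the dimensions are invented, and MSE is assumed for the alignment constraint (this summary does not specify the paper's exact loss). The point is only the asymmetry: teacher features come from a batch-serialized pass, student features from a single-scene pass.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Stand-in encoder: a single linear map plus tanh (hypothetical)."""
    return np.tanh(x @ W)

D, H = 32, 64                                    # hypothetical dimensions
W_teacher = rng.standard_normal((D, H)) * 0.1    # frozen pretrained weights
W_student = W_teacher + rng.standard_normal((D, H)) * 0.01

batch = rng.standard_normal((4, 16, D))          # teacher sees the full batch
single_scene = batch[0]                          # student sees one scene only

# Heterogeneous inputs: teacher features from the batch-serialized pass,
# student features from the single-scene pass over the same scene.
teacher_feat = encode(batch.reshape(-1, D), W_teacher).reshape(4, 16, H)[0]
student_feat = encode(single_scene, W_student)

# Alignment constraint: pull the student's single-scene features toward
# the teacher's batch-conditioned features (MSE assumed here).
align_loss = float(np.mean((student_feat - teacher_feat) ** 2))
print(align_loss)
```

Minimizing this loss is what, on the paper's account, lets the single-scene student inherit the cross-sample structure without ever seeing a batch at test time.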

What carries the argument

Cross-sample semantic propagation (CSP) that serializes a batch of point clouds into a continuous sequence and runs it through a state-space model so semantic state can flow from one sample to the next.

If this is right

  • Pretrained models produce latent features whose semantic categories remain aligned even when the scenes are presented independently at test time.
  • Downstream 3D segmentation and classification tasks receive more stable and transferable representations than those obtained from prior self-supervised point-cloud methods.
  • Batch-level serialization during pretraining becomes an explicit mechanism for building cross-sample consistency instead of a source of inconsistency.
  • The same state-space propagation idea could be applied to other 3D modalities once the finetuning stabilization step is in place.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the propagation truly creates global alignment, the method might reduce the need for large-scale labeled 3D data by making unlabeled scene collections more useful for representation learning.
  • The approach suggests that other modalities with natural batch structure, such as video or multi-view images, could adopt similar serialization-plus-state-space designs to enforce cross-sample consistency.
  • A remaining open question is whether the state-space model can be replaced by simpler recurrent or attention mechanisms while retaining the same alignment benefit.

Load-bearing premise

That turning separate point-cloud samples into one serialized sequence and running it through a state-space model actually creates meaningful global semantic alignment rather than spurious correlations, and that the later asymmetric distillation step removes all batch artifacts without creating new inconsistencies.

What would settle it

A controlled test in which two scenes that should share semantic categories are processed separately after pretraining. If their extracted features show no measurable gain in cross-scene consistency or downstream task accuracy over a standard sample-independent baseline, the core claim fails.
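One hedged way to operationalize that test: compare per-category feature centroids of two independently processed scenes. The metric and the synthetic data below are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(2)

def cross_scene_consistency(feat_a, labels_a, feat_b, labels_b):
    """Mean cosine similarity between per-category feature centroids
    of two independently processed scenes (hypothetical metric)."""
    sims = []
    for c in np.intersect1d(labels_a, labels_b):
        ca = feat_a[labels_a == c].mean(axis=0)
        cb = feat_b[labels_b == c].mean(axis=0)
        sims.append(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))
    return float(np.mean(sims))

# Synthetic scenes: 3 shared categories, features = category anchor + noise.
anchors = rng.standard_normal((3, 16))
labels = np.repeat(np.arange(3), 20)
scene_a = anchors[labels] + 0.1 * rng.standard_normal((60, 16))
scene_b = anchors[labels] + 0.1 * rng.standard_normal((60, 16))

score = cross_scene_consistency(scene_a, labels, scene_b, labels)
print(round(score, 2))  # near 1.0 when same-category features align
```

Running this metric on features from the proposed method versus a sample-independent baseline, on real scenes, is the comparison that would settle the claim.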

Figures

Figures reproduced from arXiv: 2605.01759 by Ajian Liu, Hui Ma, Liying Yang, Sunyuan Qiang, Xinxing Yu, Yanyan Liang, Yuzhong Wang, Zhi Rao.

Figure 1. A comparison of t-SNE visualizations under identi… (caption truncated in source)
Figure 2. Overview of the proposed PointSCP. (a) The CSP models cross-sample semantic dependencies within the state-space represen… (caption truncated in source)
Figure 3. Visualization of large-scale object-level semantic segmentation results on the S3DIS Area-5 dataset, showing input point clouds, … (caption truncated in source)
Figure 4. t-SNE visualizations of features across three scene… (caption truncated in source)
Figure 5. Ablation experiments on batch size during finetuning.
Original abstract

Scene-level point cloud self-supervised learning (PC-SSL) has demonstrated potential in enhancing the generalization capability of 3D vision models. Despite the advances in the field through existing methods, the sample-independent modeling paradigm still poses significant limitations in terms of maintaining consistent semantic representations across scenes. This challenge hinders the construction of a unified and transferable semantic space. To address this issue, we propose a PC-SSL framework based on cross-sample semantic propagation (CSP), in which samples within a batch are serialized into continuous input and processed by a state-space model to enable semantic state propagation. This mechanism explicitly models the dynamic dependencies across samples in the state space, allowing the network to establish cross-sample semantic consistency in the latent space and achieve global semantic alignment. Since serialization-based pretraining requires batch-level input organization, we further introduce an asymmetric semantic preservation distillation (SPD) during finetuning to achieve structural alignment of semantic transfer and eliminate inconsistencies caused by batch dependency. The proposed SPD ensures stable transfer of pretrained semantics through a heterogeneous input mechanism and a semantic feature alignment constraint. This enables the model to maintain structured semantic consistency and robustness under single-scene testing conditions. Extensive experiments on multiple benchmark datasets demonstrate that our method consistently outperforms state-of-the-art methods in both performance and semantic consistency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PointCSP, a self-supervised framework for scene-level point cloud learning. It proposes cross-sample semantic propagation (CSP) that serializes an entire batch into a continuous sequence processed by a state-space model to propagate semantic states across samples and achieve global alignment in latent space. To address batch-induced inconsistencies at single-scene test time, it adds asymmetric semantic preservation distillation (SPD) during fine-tuning via heterogeneous inputs and a semantic feature alignment constraint. The central claim is that this yields consistent outperformance over state-of-the-art methods on multiple benchmarks in both task performance and semantic consistency.

Significance. If the experimental claims are robust and the method is shown to be insensitive to serialization order, the work could meaningfully advance point-cloud SSL by moving beyond sample-independent modeling toward explicit cross-sample consistency. The use of state-space models for dynamic inter-sample propagation is a distinctive technical choice that, if validated, might influence other 3D or multimodal SSL pipelines.

major comments (2)
  1. [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.
  2. [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.
minor comments (2)
  1. The abstract states the outperformance claim without any numerical metrics, baseline names, or dataset identifiers; adding a concise quantitative summary would improve immediate readability.
  2. Notation for the state-space model transition and the asymmetric distillation loss could be clarified with an explicit equation reference in the main text rather than only in supplementary material.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify the robustness requirements for our cross-sample semantic propagation approach. We appreciate the recognition of the potential significance of using state-space models for inter-sample consistency in point cloud SSL. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.

    Authors: We agree that the arbitrary serialization order is a valid concern and that explicit controls are needed to confirm the claimed semantic alignment arises from intrinsic scene properties rather than ordering artifacts. Although the state-space model propagates semantic states sequentially with the intent of capturing dynamic cross-sample dependencies independent of specific order, we acknowledge the absence of permutation experiments in the current manuscript. In the revised version, we will add a dedicated set of experiments that apply multiple random permutations to the batch serialization order, as well as alternative groupings (e.g., by scene similarity), and report the resulting variation in semantic consistency metrics and downstream task performance. These results will quantify order sensitivity and, if it proves low, support the claim that the global alignment is robust. revision: yes

  2. Referee: [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.

    Authors: We thank the referee for this precise observation on the SPD mechanism. The asymmetric distillation is designed to enforce semantic feature alignment using heterogeneous inputs precisely to decouple the transferred representations from batch-specific (including order-dependent) artifacts learned in pre-training. Nevertheless, we recognize that a direct ablation linking different pre-training serialization orders to post-SPD performance is missing and would provide stronger evidence for the stability claim. In the revision, we will include this ablation: models pre-trained under varied serialization orders will be fine-tuned with SPD, and we will report the resulting task performance and semantic consistency scores to demonstrate that SPD successfully suppresses order-induced variations. revision: yes
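The permutation ablation promised above could be prototyped along these lines, with a toy linear recurrence standing in for the pretrained model and all sizes invented. A purely order-invariant encoder would score zero on this sensitivity measure; the state-passing design generically does not.

```python
import numpy as np

rng = np.random.default_rng(3)

B, T, D, H = 4, 16, 32, 64          # hypothetical sizes
batch = rng.standard_normal((B, T, D))
A = 0.9 * np.eye(H)                 # stand-in state transition
W = rng.standard_normal((H, D)) * 0.1

def propagate(samples):
    """Linear state-space pass over a serialized batch (stand-in model)."""
    h, out = np.zeros(H), []
    for x in samples.reshape(-1, D):
        h = A @ h + W @ x
        out.append(h)
    return np.stack(out).reshape(samples.shape[0], T, H)

# Encode sample 0's features under several random serialization orders.
runs = []
for _ in range(5):
    perm = rng.permutation(B)
    feats = propagate(batch[perm])
    runs.append(feats[np.argmax(perm == 0)])   # recover sample 0's features

# Order sensitivity: spread of sample-0 features across permutations.
sensitivity = float(np.stack(runs).std(axis=0).mean())
print(sensitivity > 0.0)  # an order-invariant encoder would give 0
```

The ablation the referee asks for would report this spread (and downstream accuracy) before and after SPD fine-tuning; SPD succeeds if the post-distillation spread collapses toward zero.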

Circularity Check

0 steps flagged

No circularity detected in claimed derivation

Full rationale

The paper proposes CSP (serializing batch samples into a state-space model for cross-sample semantic propagation) and asymmetric SPD (distillation during finetuning for single-scene stability) as novel mechanisms to address sample-independent modeling limitations in PC-SSL. No equations, predictions, or central claims reduce by construction to self-fitted parameters, prior self-citations, or renamed inputs; the abstract and described framework present these as independent architectural choices with external benchmark validation. The derivation chain remains self-contained without load-bearing self-referential reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on untested assumptions that state-space models can propagate semantics across serialized point-cloud samples to produce global alignment and that SPD can eliminate batch dependency effects; no explicit free parameters or new entities are named in the abstract.

axioms (2)
  • domain assumption State-space models can model dynamic semantic dependencies across serialized batch samples to establish cross-sample consistency in latent space
    Invoked as the core mechanism of CSP in the abstract
  • domain assumption Asymmetric semantic preservation distillation can achieve structural alignment and eliminate batch-induced inconsistencies under single-scene testing
    Invoked to justify the SPD component for stable transfer

pith-pipeline@v0.9.0 · 5550 in / 1303 out tokens · 29740 ms · 2026-05-08T19:27:36.762616+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — the paper's claim is directly supported by a theorem in the formal canon.
  • supports — the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — the paper appears to rely on the theorem as machinery.
  • contradicts — the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
