Recognition: 3 theorem links
PointCSP: Cross-Sample Semantic Propagation and Stability Preservation in Self-Supervised Point Cloud Learning
Pith reviewed 2026-05-08 19:27 UTC · model grok-4.3
The pith
Serializing point cloud samples into a state-space model propagates semantics across a batch to build global alignment, with asymmetric distillation preserving that consistency at test time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that cross-sample semantic propagation (CSP) achieved by serializing batch samples and processing them with a state-space model explicitly models dynamic dependencies across scenes, thereby establishing global semantic alignment in the latent space. They further claim that an asymmetric semantic preservation distillation (SPD) module, using heterogeneous input and feature alignment constraints, removes the batch-induced inconsistencies that would otherwise appear under single-scene testing and enables stable transfer of the pretrained semantics.
What carries the argument
Cross-sample semantic propagation (CSP) that serializes a batch of point clouds into a continuous sequence and runs it through a state-space model so semantic state can flow from one sample to the next.
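The serialization-plus-propagation idea can be illustrated with a minimal sketch: flatten a batch of per-sample token features into one continuous sequence and run a linear state-space recurrence over it, so the hidden state crosses sample boundaries. All dimensions, parameters (A, Bm, C), and the tanh nonlinearity are placeholders for illustration, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
B_sz, T, D, H = 4, 8, 16, 32           # batch size, tokens per sample, feature dim, state dim
batch = rng.normal(size=(B_sz, T, D))  # per-sample token features

# Serialize the whole batch into one continuous sequence.
seq = batch.reshape(B_sz * T, D)

# Placeholder linear state-space parameters (not the paper's weights).
A = np.eye(H) * 0.9
Bm = rng.normal(scale=0.1, size=(H, D))
C = rng.normal(scale=0.1, size=(D, H))

h = np.zeros(H)
outputs = []
for f_t in seq:                    # state flows across sample boundaries
    h = np.tanh(A @ h + Bm @ f_t)  # h_t = f_theta(A h_{t-1} + B f'_t)
    outputs.append(C @ h)          # y_t = C h_t
y = np.stack(outputs).reshape(B_sz, T, D)
print(y.shape)  # (4, 8, 16)
```

Because the recurrence never resets at sample boundaries, each sample's outputs depend on the samples serialized before it, which is exactly the property the referee report below probes.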
If this is right
- Pretrained models produce latent features whose semantic categories remain aligned even when the scenes are presented independently at test time.
- Downstream 3D segmentation and classification tasks receive more stable and transferable representations than those obtained from prior self-supervised point-cloud methods.
- Batch-level serialization during pretraining becomes an explicit mechanism for building cross-sample consistency instead of a source of inconsistency.
- The same state-space propagation idea could be applied to other 3D modalities once the finetuning stabilization step is in place.
Where Pith is reading between the lines
- If the propagation truly creates global alignment, the method might reduce the need for large-scale labeled 3D data by making unlabeled scene collections more useful for representation learning.
- The approach suggests that other modalities with natural batch structure, such as video or multi-view images, could adopt similar serialization-plus-state-space designs to enforce cross-sample consistency.
- A remaining open question is whether the state-space model can be replaced by simpler recurrent or attention mechanisms while retaining the same alignment benefit.
Load-bearing premise
That turning separate point-cloud samples into one serialized sequence and running it through a state-space model actually creates meaningful global semantic alignment rather than spurious correlations, and that the later asymmetric distillation step removes all batch artifacts without creating new inconsistencies.
What would settle it
A controlled test in which two scenes that should share semantic categories are processed independently after pretraining. If their extracted features show no measurable gain in cross-scene consistency or downstream task accuracy over a standard sample-independent baseline, the global-alignment claim fails; a clear gain would support it.
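Such a cross-scene consistency probe could be sketched as follows: compare the cosine similarity of per-class mean features extracted from two independently processed scenes. The function name, dimensions, and synthetic features are illustrative assumptions; in practice the features would come from the pretrained encoder and the baseline.

```python
import numpy as np

def class_mean_cosine(feats_a, labels_a, feats_b, labels_b, shared_classes):
    """Mean cosine similarity between per-class mean features of two scenes."""
    sims = []
    for c in shared_classes:
        ma = feats_a[labels_a == c].mean(axis=0)
        mb = feats_b[labels_b == c].mean(axis=0)
        sims.append(ma @ mb / (np.linalg.norm(ma) * np.linalg.norm(mb)))
    return float(np.mean(sims))

rng = np.random.default_rng(1)
labels_a = rng.integers(0, 3, 200)
labels_b = rng.integers(0, 3, 200)
proto = rng.normal(size=(3, 16))  # synthetic shared class prototypes
feats_a = proto[labels_a] + 0.1 * rng.normal(size=(200, 16))
feats_b = proto[labels_b] + 0.1 * rng.normal(size=(200, 16))

score = class_mean_cosine(feats_a, labels_a, feats_b, labels_b, [0, 1, 2])
print(round(score, 3))  # close to 1.0 for well-aligned features
```

A higher score for the pretrained model than for the sample-independent baseline would be the positive outcome; no difference would be the refutation described above.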
Original abstract
Scene-level point cloud self-supervised learning (PC-SSL) has demonstrated potential in enhancing the generalization capability of 3D vision models. Despite the advances in the field through existing methods, the sample-independent modeling paradigm still poses significant limitations in terms of maintaining consistent semantic representations across scenes. This challenge hinders the construction of a unified and transferable semantic space. To address this issue, we propose a PC-SSL framework based on cross-sample semantic propagation (CSP), in which samples within a batch are serialized into continuous input and processed by a state-space model to enable semantic state propagation. This mechanism explicitly models the dynamic dependencies across samples in the state space, allowing the network to establish cross-sample semantic consistency in the latent space and achieve global semantic alignment. Since serialization-based pretraining requires batch-level input organization, we further introduce an asymmetric semantic preservation distillation (SPD) during finetuning to achieve structural alignment of semantic transfer and eliminate inconsistencies caused by batch dependency. The proposed SPD ensures stable transfer of pretrained semantics through a heterogeneous input mechanism and a semantic feature alignment constraint. This enables the model to maintain structured semantic consistency and robustness under single-scene testing conditions. Extensive experiments on multiple benchmark datasets demonstrate that our method consistently outperforms state-of-the-art methods in both performance and semantic consistency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PointCSP, a self-supervised framework for scene-level point cloud learning. It proposes cross-sample semantic propagation (CSP) that serializes an entire batch into a continuous sequence processed by a state-space model to propagate semantic states across samples and achieve global alignment in latent space. To address batch-induced inconsistencies at single-scene test time, it adds asymmetric semantic preservation distillation (SPD) during fine-tuning via heterogeneous inputs and a semantic feature alignment constraint. The central claim is that this yields consistent outperformance over state-of-the-art methods on multiple benchmarks in both task performance and semantic consistency.
Significance. If the experimental claims are robust and the method is shown to be insensitive to serialization order, the work could meaningfully advance point-cloud SSL by moving beyond sample-independent modeling toward explicit cross-sample consistency. The use of state-space models for dynamic inter-sample propagation is a distinctive technical choice that, if validated, might influence other 3D or multimodal SSL pipelines.
major comments (2)
- [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.
- [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.
minor comments (2)
- The abstract states the outperformance claim without any numerical metrics, baseline names, or dataset identifiers; adding a concise quantitative summary would improve immediate readability.
- Notation for the state-space model transition and the asymmetric distillation loss could be clarified with an explicit equation reference in the main text rather than only in supplementary material.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which help clarify the robustness requirements for our cross-sample semantic propagation approach. We appreciate the recognition of the potential significance of using state-space models for inter-sample consistency in point cloud SSL. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [§3.2] §3.2 (CSP mechanism): Serialization of the batch into a single continuous sequence for the state-space model imposes an arbitrary ordering with no canonical basis in unordered point-cloud scenes. The manuscript provides no experiments that permute serialization order or measure resulting variation in semantic alignment metrics; without such controls, the claimed global semantic alignment risks encoding batch-specific ordering artifacts rather than intrinsic scene-invariant semantics, which is load-bearing for the core contribution.
Authors: We agree that the arbitrary serialization order is a valid concern and that explicit controls are needed to confirm the claimed semantic alignment arises from intrinsic scene properties rather than ordering artifacts. Although the state-space model propagates semantic states sequentially with the intent of capturing dynamic cross-sample dependencies independent of specific order, we acknowledge the absence of permutation experiments in the current manuscript. In the revised version, we will add a dedicated set of experiments that apply multiple random permutations to the batch serialization order, as well as alternative groupings (e.g., by scene similarity), and report the resulting variation in semantic consistency metrics and downstream task performance. These results will be used to quantify sensitivity and, if low, to support that the global alignment is robust. revision: yes
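The permutation experiment the authors propose can be sketched concretely: re-serialize the same batch under several random orders, undo each permutation, and measure how much each sample's pooled feature moves across runs. The model, dimensions, and pooling are placeholder assumptions; zero variation would indicate order invariance.

```python
import numpy as np

rng = np.random.default_rng(0)
B_sz, T, D, H = 4, 8, 16, 32
batch = rng.normal(size=(B_sz, T, D))
A = np.eye(H) * 0.9
Bm = rng.normal(scale=0.1, size=(H, D))
C = rng.normal(scale=0.1, size=(D, H))

def encode(ordered):
    """Run the serialized recurrence, then mean-pool each sample's outputs."""
    h, outs = np.zeros(H), []
    for f_t in ordered.reshape(-1, D):
        h = np.tanh(A @ h + Bm @ f_t)
        outs.append(C @ h)
    return np.stack(outs).reshape(ordered.shape[0], T, D).mean(axis=1)

pooled = []
for _ in range(5):
    perm = rng.permutation(B_sz)
    inv = np.argsort(perm)
    pooled.append(encode(batch[perm])[inv])  # undo the permutation
pooled = np.stack(pooled)                     # (runs, B, D)
sensitivity = float(pooled.std(axis=0).mean())  # 0 would mean order-invariant
print(sensitivity >= 0.0)
```

Reporting this sensitivity alongside the semantic consistency metrics would directly address the referee's ordering-artifact concern.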
Referee: [§4.1] §4.1 (SPD design): The asymmetric distillation is presented as fully removing batch-dependency inconsistencies, yet the description does not demonstrate that order-dependent features learned during CSP pre-training are prevented from being transferred or distorted by the heterogeneous-input and alignment-constraint mechanism. A direct ablation comparing SPD performance under different pre-training serialization orders is required to substantiate the stability claim.
Authors: We thank the referee for this precise observation on the SPD mechanism. The asymmetric distillation is designed to enforce semantic feature alignment using heterogeneous inputs precisely to decouple the transferred representations from batch-specific (including order-dependent) artifacts learned in pre-training. Nevertheless, we recognize that a direct ablation linking different pre-training serialization orders to post-SPD performance is missing and would provide stronger evidence for the stability claim. In the revision, we will include this ablation: models pre-trained under varied serialization orders will be fine-tuned with SPD, and we will report the resulting task performance and semantic consistency scores to demonstrate that SPD successfully suppresses order-induced variations. revision: yes
Circularity Check
No circularity detected in claimed derivation
Full rationale
The paper proposes CSP (serializing batch samples into a state-space model for cross-sample semantic propagation) and asymmetric SPD (distillation during finetuning for single-scene stability) as novel mechanisms to address sample-independent modeling limitations in PC-SSL. No equations, predictions, or central claims reduce by construction to self-fitted parameters, prior self-citations, or renamed inputs; the abstract and described framework present these as independent architectural choices with external benchmark validation. The derivation chain remains self-contained without load-bearing self-referential reductions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption State-space models can model dynamic semantic dependencies across serialized batch samples to establish cross-sample consistency in latent space
- domain assumption Asymmetric semantic preservation distillation can achieve structural alignment and eliminate batch-induced inconsistencies under single-scene testing
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean (J-cost uniqueness) · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
"samples within a batch are serialized into continuous input and processed by a state-space model to enable semantic state propagation"; h_t = f_θ(A h_{t−1} + B f′_t), y_t = C h_t
-
IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean (derivedCost, J = ½(x + x⁻¹) − 1) · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
L_SPD = (1/N) Σᵢ ‖f_i^s − f_i^t‖₂²; L_finetune = L_task + λ_SPD·L_SPD + λ_geo·L_geo
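The distillation objective quoted above can be sketched in a few lines. The λ values, L_task, and L_geo below are placeholders, not the paper's settings; only the L_SPD term follows the quoted formula.

```python
import numpy as np

def spd_loss(f_student, f_teacher):
    # L_SPD = (1/N) * sum_i ||f_i^s - f_i^t||_2^2
    return float(np.mean(np.sum((f_student - f_teacher) ** 2, axis=1)))

rng = np.random.default_rng(0)
f_t = rng.normal(size=(32, 64))               # teacher features
f_s = f_t + 0.05 * rng.normal(size=(32, 64))  # student features, slightly perturbed

l_spd = spd_loss(f_s, f_t)
l_task, l_geo = 1.0, 0.5        # placeholder task / geometry losses
lam_spd, lam_geo = 0.1, 0.1     # placeholder weights
l_finetune = l_task + lam_spd * l_spd + lam_geo * l_geo
print(l_spd > 0 and l_finetune > l_task)  # True
```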
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.