FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
Pith reviewed 2026-06-29 18:38 UTC · model grok-4.3
The pith
Self-supervised foundation models provide rewards that let an agent segment 3D objects without any scene labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FoundObj introduces a superpoint-based agent that incrementally merges neighboring superpoints into objects, using reward modules that extract semantic priors from 2D foundation models and geometric priors from 3D foundation models. These rewards train the agent via reinforcement learning to identify multi-class objects in point clouds without scene-level annotations. The method achieves superior performance on benchmarks and demonstrates strong zero-shot and long-tail generalization.
What carries the argument
The superpoint-based object discovery agent guided by semantic and geometric reward modules derived from self-supervised foundation models.
Load-bearing premise
The semantic and geometric priors from the self-supervised foundation models are complementary and accurate enough to guide the superpoint merging without any human-provided scene labels.
What would settle it
A controlled test in which the foundation-model reward signals are replaced by random or constant values and the agent then matches or falls below non-RL baselines on standard benchmarks would falsify the claim.
Figures
read the original abstract
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules. These modules synergistically leverage semantic and geometric priors from self-supervised 2D/3D foundation models, providing complementary feedback to the object discovery agent and enabling robust identification of multi-class objects through reinforcement learning. Extensive experiments on diverse benchmarks demonstrate that our approach consistently outperforms existing baselines. Notably, our method exhibits strong generalization in zero-shot and long-tail scenarios, underscoring its potential for scalable, label-free 3D object segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FoundObj, a framework for label-free 3D object segmentation on point clouds. A superpoint-based agent incrementally merges neighboring superpoints via reinforcement learning, with the reward signal supplied by semantic and geometric modules that extract priors from self-supervised 2D/3D foundation models. The central claims are consistent outperformance over baselines together with strong zero-shot and long-tail generalization without any scene-level human annotations.
Significance. If the reward modules indeed supply accurate complementary feedback, the approach would be significant for scalable 3D segmentation by removing the need for scene-level labels and by repurposing existing foundation models as reward sources. The RL formulation for object discovery is a reasonable direction, but the manuscript supplies no machine-checked proofs, reproducible code, or parameter-free derivations that would strengthen the assessment.
major comments (2)
- [Abstract and §3] Abstract and §3 (method overview): the headline claim that the combined semantic and geometric reward modules supply 'complementary feedback' sufficient to discover multi-class object boundaries is load-bearing, yet no ablation isolating each module, no correlation analysis between the reward signal and ground-truth object geometry, and no failure-mode study on noisy foundation-model predictions in cluttered scenes are supplied.
- [§4] §4 (experiments): the assertion of 'consistent outperformance' and 'strong generalization in zero-shot and long-tail scenarios' is presented without reference to specific tables, quantitative metrics, or statistical significance tests that would demonstrate the reward signal quality exceeds that of existing baselines.
minor comments (1)
- [§3] Notation for the reward function and the RL policy update is introduced without an explicit equation or pseudocode block, making the training procedure difficult to reconstruct.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the empirical support for our claims regarding complementary feedback and quantitative results.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (method overview): the headline claim that the combined semantic and geometric reward modules supply 'complementary feedback' sufficient to discover multi-class object boundaries is load-bearing, yet no ablation isolating each module, no correlation analysis between the reward signal and ground-truth object geometry, and no failure-mode study on noisy foundation-model predictions in cluttered scenes are supplied.
Authors: We agree that the manuscript would be strengthened by explicit evidence for the complementary nature of the modules. The current version demonstrates combined performance but does not isolate contributions. In revision we will add: (i) an ablation study removing each module in turn, (ii) correlation analysis between per-superpoint reward values and ground-truth object boundaries, and (iii) a failure-mode study on scenes with noisy foundation-model outputs. These will appear in a new subsection of §3 and expanded experiments. revision: yes
-
Referee: [§4] §4 (experiments): the assertion of 'consistent outperformance' and 'strong generalization in zero-shot and long-tail scenarios' is presented without reference to specific tables, quantitative metrics, or statistical significance tests that would demonstrate the reward signal quality exceeds that of existing baselines.
Authors: Section 4 already contains quantitative comparisons on multiple benchmarks using metrics such as mIoU, precision-recall, and zero-shot/long-tail transfer scores, reported in Tables 1–4 against the listed baselines. We will revise the abstract and §3 to include explicit forward references to these tables and metrics. We will also add statistical significance tests (paired t-tests with p-values) in the revised experimental section to quantify improvement over baselines. revision: yes
Circularity Check
No circularity; method uses external foundation-model priors as independent reward sources
full rationale
The provided abstract and description show a method that trains an RL superpoint-merging agent using rewards derived from pre-existing self-supervised 2D/3D foundation models. No equations, fitted parameters, or self-citations are quoted that would make the claimed outperformance or generalization reduce by construction to quantities defined within the paper itself. The central premise treats the foundation models as external, complementary inputs rather than outputs of the current work, rendering the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
write newline
" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...
-
[2]
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
Armeni, I., Sax, S., Zamir, A. R., and Savarese, S. Joint 2D-3D-Semantic Data for Indoor Scene Understanding . arXiv:1702.01105, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[3]
A., Emmerichs, D
Baur, S. A., Emmerichs, D. J., Moosmann, F., Pinggera, P., Ommer, B., and Geiger, A. SLIM: Self-Supervised LiDAR Scene Flow and Motion Segmentation . ICCV, 2021
2021
-
[4]
Recognition-by-Components: A Theory of Human Image Understanding
Biederman, I. Recognition-by-Components: A Theory of Human Image Understanding . Psychological Review, 1987
1987
-
[5]
Boudjoghra, M. E. A., Dai, A., Lahoud, J., Cholakkal, H., Anwer, R. M., Khan, S., and Khan, F. S. Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation . ICLR, 2025
2025
-
[6]
Coda: Collaborative novel box discovery and cross-modal alignment for open-vocabulary 3d object detection
Cao, Y., Yihan, Z., Xu, H., and Xu, D. Coda: Collaborative novel box discovery and cross-modal alignment for open-vocabulary 3d object detection. NeurIPS, 2023
2023
-
[7]
Carion, N., Gustafson, L., Hu, Y.-T., Debnath, S., Hu, R., Suris, D., Ryali, C., Alwala, K. V., Khedr, H., Huang, A., Lei, J., Ma, T., Guo, B., Kalla, A., Marks, M., Greer, J., Wang, M., Sun, P., R \" a dle, R., Afouras, T., Mavroudi, E., Xu, K., Wu, T.-H., Zhou, Y., Momeni, L., Hazra, R., Ding, S., Vaze, S., Porcher, F., Li, F., Li, S., Kamath, A., Cheng...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[8]
Emerging Properties in Self-Supervised Vision Transformers
Caron, M., Touvron, H., Misra, I., J \' e gou, H., Mairal, J., Bojanowski, P., and Joulin, A. Emerging Properties in Self-Supervised Vision Transformers . ICCV, 2021
2021
-
[9]
ShapeNet: An Information-Rich 3D Model Repository
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., and Yu, F. ShapeNet: An Information-Rich 3D Model Repository . arXiv:1512.03012, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[10]
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
Chen, J., Zhang, Z., Yang, Y., Li, J., Wei, S., Sun, Z., and Yang, B. EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision . CVPR, 2026 a
2026
-
[11]
Evobj: Learning evolving object-centric representations for 3d instance segmentation without scene supervision
Chen, J., Zhang, Z., Yang, Y., Li, J., Wei, S., Sun, Z., and Yang, B. Evobj: Learning evolving object-centric representations for 3d instance segmentation without scene supervision. CVPR, 2026 b
2026
-
[12]
Hierarchical Aggregation for 3D Instance Segmentation
Chen, S., Fang, J., Zhang, Q., Liu, W., and Wang, X. Hierarchical Aggregation for 3D Instance Segmentation . ICCV, 2021
2021
-
[13]
A., and Pons-Moll, G
Chibane, J., Engelmann, F., Tran, T. A., and Pons-Moll, G. Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes . ECCV, 2022
2022
-
[14]
and Ralph, M
Chiou, R. and Ralph, M. A. L. The anterior temporal cortex is a primary semantic source of top-down influences on object recognition . Cortex, 2016
2016
-
[15]
Collins, J., Goel, S., Luthra, A., Xu, L., Deng, K., Zhang, X., Vicente, T. F. Y., Arora, H., Dideriksen, T., Guillaumin, M., and Malik, J. ABO: Dataset and Benchmarks for Real-World 3D Object Understanding . CVPR, 2022
2022
-
[16]
Spconv: Spatially sparse convolution library
Contributors, S. Spconv: Spatially sparse convolution library. 2022
2022
-
[17]
X., Savva, M., Halber, M., Funkhouser, T., and Nie ner, M
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., and Nie ner, M. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes . CVPR, 2017
2017
-
[18]
Y., VanderBilt, E., Kembhavi, A., Vondrick, C., Gkioxari, G., Ehsani, K., Schmidt, L., and Farhadi, A
Deitke, M., Liu, R., Wallingford, M., Ngo, H., Michel, O., Kusupati, A., Fan, A., Laforte, C., Voleti, V., Gadre, S. Y., VanderBilt, E., Kembhavi, A., Vondrick, C., Gkioxari, G., Ehsani, K., Schmidt, L., and Farhadi, A. Objaverse-XL: A Universe of 10M+ 3D Objects . NeurIPS, 2023
2023
-
[19]
Sketchy Bounding-box Supervision for 3D Instance Segmentation
Deng, Q., Hui, L., Xie, J., and Yang, J. Sketchy Bounding-box Supervision for 3D Instance Segmentation . CVPR, 2025
2025
-
[20]
A density-based algorithm for discovering clusters in large spatial databases with noise
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise . KDD, 1996
1996
-
[21]
M., Loy, C
Fang, Z., Li, X., Li, X., Buhmann, J. M., Loy, C. C., and Liu, M. Explore In-Context Learning for 3D Point Cloud Understanding . NeurIPS, 2023
2023
-
[22]
Felzenszwalb, P. F. and Huttenlocher, D. P. Efficient Graph-Based Image Segmentation . IJCV, 2004
2004
-
[23]
3D-FUTURE: 3D Furniture shape with TextURE
Fu, H., Jia, R., Gao, L., Gong, M., Zhao, B., Maybank, S., and Tao, D. 3D-FUTURE: 3D Furniture shape with TextURE . IJCV, 2021
2021
-
[24]
Scaling open-vocabulary image segmentation with image-level labels
Ghiasi, G., Gu, X., Cui, Y., and Lin, T.-Y. Scaling open-vocabulary image segmentation with image-level labels. ECCV, 2022
2022
-
[25]
3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
Graham, B., Engelcke, M., and van der Maaten, L. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks . CVPR, 2018
2018
-
[26]
Finding Your (3D) Center: 3D Object Detection Using a Learned Loss
Griffiths, D., Boehm, J., and Ritschel, T. Finding Your (3D) Center: 3D Object Detection Using a Learned Loss . ECCV, 2020
2020
-
[27]
A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends
Gui, J., Chen, T., Zhang, J., Cao, Q., Sun, Z., Luo, H., and Tao, D. A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends . TPAMI, 2024
2024
-
[28]
SAM-guided Graph Cut for 3D Instance Segmentation
Guo, H., Zhu, H., Peng, S., Wang, Y., Shen, Y., Hu, R., and Zhou, X. SAM-guided Graph Cut for 3D Instance Segmentation . ECCV, 2024
2024
-
[29]
and Song, S
Ha, H. and Song, S. Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models . CoRL, 2022
2022
-
[30]
OccuSeg: Occupancy-aware 3D Instance Segmentation
Han, L., Zheng, T., Xu, L., and Fang, L. OccuSeg: Occupancy-aware 3D Instance Segmentation . CVPR, 2020
2020
-
[31]
Han, Z., Boudjoghra, M. E. A., Dong, J., Wang, J., and Anwer, R. M. All in One: Visual-Description-Guided Unified Point Cloud Segmentation . ICCV, 2025
2025
-
[32]
DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution
He, T., Shen, C., and van den Hengel, A. DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution . CVPR, 2021
2021
-
[33]
3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
Hou, J., Dai, A., and Nie ner, M. 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans . CVPR, 2019
2019
-
[34]
Huang, S.-Y., Choe, J., Wang, Y.-C. F., and Sun, C. OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding . arXiv:2601.09575, 2026
-
[35]
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Huang, Z., Wu, X., Chen, X., Zhao, H., Zhu, L., and Lasenby, J. OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation . ECCV, 2024
2024
-
[36]
Jung, S., Zheng, J., Zhang, K., Qiao, N., Chen, A. Y. C., Xia, L., Liu, C., Sun, Y., Zeng, X., Huang, H.-W., Boots, B., Sun, M., and Kuo, C.-H. Details Matter for Indoor Open-vocabulary 3D Instance Segmentation . ICCV, 2025
2025
-
[37]
C., Lo, W.-Y., Doll \' a r, P., and Girshick, R
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Doll \' a r, P., and Girshick, R. Segment Anything . ICCV, 2023
2023
-
[38]
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Kolodiazhnyi, M., Vorontsova, A., Konushin, A., and Rukhovich, D. OneFormer3D: One Transformer for Unified Point Cloud Segmentation . CVPR, 2024
2024
-
[39]
Mask-Attention-Free Transformer for 3D Instance Segmentation
Lai, X., Yuan, Y., Chu, R., Chen, Y., Hu, H., and Jia, J. Mask-Attention-Free Transformer for 3D Instance Segmentation . ICCV, 2023
2023
-
[40]
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details
Lai, Z., Zhao, Y., Liu, H., Zhao, Z., Lin, Q., Shi, H., Yang, X., Yang, M., Yang, S., Feng, Y., Zhang, S., Huang, X., Luo, D., Yang, F., Yang, F., Wang, L., Liu, S., Tang, Y., Cai, Y., He, Z., Liu, T., Liu, Y., Jiang, J., Linus, Huang, J., and Guo, C. Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details . arXiv:2506.16504, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[41]
F., Kautz, J., Cho, M., and Choy, C
Lee, J., Park, C., Choe, J., Wang, Y.-C. F., Kautz, J., Cho, M., and Choy, C. Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation . CVPR, 2025
2025
-
[42]
EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
Lei, J., Deng, C., Schmeckpeper, K., Guibas, L., and Daniilidis, K. EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision . CVPR, 2023
2023
-
[43]
Advances in 3d generation: A survey.arXiv preprint arXiv:2401.17807, 2024
Li, X., Zhang, Q., Kang, D., Cheng, W., Gao, Y., Zhang, J., Liang, Z., Liao, J., Cao, Y.-P., and Shan, Y. Advances in 3D Generation: A Survey . arXiv:2401.17807, 2024
-
[44]
Liu, H., Li, C., Wu, Q., and Lee, Y. J. Visual Instruction Tuning . NeurIPS, 2023 a
2023
-
[45]
Towards 3D Objectness Learning in an Open World
Liu, T., Wang, Z., Liu, R., Wang, G., and Zhang, D. Towards 3D Objectness Learning in an Open World . NeurIPS, 2025
2025
-
[46]
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Liu, Y., Kong, L., Cen, J., Chen, R., Zhang, W., Pan, L., Chen, K., and Liu, Z. Segment Any Point Cloud Sequences by Distilling Vision Foundation Models . NeurIPS, 2023 b
2023
-
[47]
Query Refinement Transformer for 3D Instance Segmentation
Lu, J., Deng, J., Wang, C., He, J., and Zhang, T. Query Refinement Transformer for 3D Instance Segmentation . ICCV, 2023 a
2023
-
[48]
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
Lu, Y., Xu, C., Wei, X., Xie, X., Tomizuka, M., Keutzer, K., and Zhang, S. Open-Vocabulary Point-Cloud Object Detection without 3D Annotation . CVPR, 2023 b
2023
-
[49]
Vocabulary-Free 3D Instance Segmentation with Vision-Language Assistant
Mei, G., Riz, L., Wang, Y., and Poiesi, F. Vocabulary-Free 3D Instance Segmentation with Vision-Language Assistant . 3DV, 2025
2025
-
[50]
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
Nguyen, P., Luu, M., Tran, A., Pham, C., and Nguyen, K. Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking . CVPR, 2025
2025
-
[51]
Nguyen, P. D. A., Ngo, T. D., Kalogerakis, E., Gan, C., Tran, A., Pham, C., and Nguyen, K. Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance . CVPR, 2024
2024
-
[52]
Oquab, M., Darcet, T., Moutakanni, T., Vo, H. V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.-y., Li, S.-w., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., and Mairal, J. DINOv2: Learning Robust Visual Features without Supervision . TMLR, 2024
2024
-
[53]
Openscene: 3d scene understanding with open vocabularies
Peng, S., Genova, K., Jiang, C., Tagliasacchi, A., Pollefeys, M., Funkhouser, T., et al. Openscene: 3d scene understanding with open vocabularies. CVPR, 2023
2023
-
[54]
W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. Learning Transferable Visual Models From Natural Language Supervision . ICML, 2021
2021
-
[55]
UCFSeg: Unsupervised 3D point cloud segmentation via multi-scale contextual feature learning
Ren, S., Zhang, C., Wang, S., Zhu, L., and Zhang, M. UCFSeg: Unsupervised 3D point cloud segmentation via multi-scale contextual feature learning . Digital Signal Processing, 2026
2026
-
[56]
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Roh, W., Jung, H., Nam, G., Yeom, J., Park, H., Ho, S., and Sangpil, Y. Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior . CVPR, 2024
2024
-
[57]
Language-Grounded Indoor 3D Semantic Segmentation in the Wild
Rozenberszki, D., Litany, O., and Dai, A. Language-Grounded Indoor 3D Semantic Segmentation in the Wild . ECCV, 2022
2022
-
[58]
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
Rozenberszki, D., Litany, O., and Dai, A. UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes . CVPR, 2024
2024
-
[59]
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
Schult, J., Engelmann, F., Hermans, A., Litany, O., Tang, S., and Leibe, B. Mask3D: Mask Transformer for 3D Semantic Instance Segmentation . ICRA, 2023
2023
-
[60]
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
Shi, C., Zhang, Y., Yang, B., Tang, J., and Yang, S. Part2Object: Hierarchical Unsupervised 3D Instance Segmentation . ECCV, 2024
2024
-
[61]
and Malik, J
Shi, J. and Malik, J. Normalized cuts and image segmentation . TPAMI, 2000
2000
-
[62]
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Shin, S., Zhou, K., Vankadari, M., Markham, A., and Trigoni, N. Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation . CVPR, 2024
2024
-
[63]
and Yang, B
Song, Z. and Yang, B. OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds . NeurIPS, 2022
2022
-
[64]
and Yang, B
Song, Z. and Yang, B. Unsupervised 3D Object Segmentation of Point Clouds by Geometry Consistency . TPAMI, 2024
2024
-
[65]
Superpoint Transformer for 3D Scene Instance Segmentation
Sun, J., Qing, C., Tan, J., and Xu, X. Superpoint Transformer for 3D Scene Instance Segmentation . AAAI, 2023
2023
-
[66]
W., Pollefeys, M., Tombari, F., and Engelmann, F
Takmaz, A., Fedele, E., Sumner, R. W., Pollefeys, M., Tombari, F., and Engelmann, F. OpenMask3D: Open-Vocabulary 3D Instance Segmentation . NeurIPS, 2023
2023
-
[67]
Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation
Tang, L., Hui, L., and Xie, J. Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation . ACCV, 2022
2022
-
[68]
M., Nguyen, X
Vu, T., Kim, K., Luu, T. M., Nguyen, X. T., and Yoo, C. D. SoftGroup for 3D Instance Segmentation on Point Clouds . CVPR, 2022
2022
-
[69]
SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation
Wang, W., Yu, R., Huang, Q., and Neumann, U. SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation . CVPR, 2018
2018
-
[70]
Autorecon: Automated 3d object discovery and reconstruction
Wang, Y., He, X., Peng, S., Lin, H., Bao, H., and Zhou, X. Autorecon: Automated 3d object discovery and reconstruction. CVPR, 2023
2023
-
[71]
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Wang, Y., Jia, B., Zhu, Z., and Huang, S. Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding . CVPR, 2025
2025
-
[72]
RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians
Wei, S., Li, J., Yang, Y., Zhou, S., and Yang, B. RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians . ICCV, 2025
2025
-
[73]
Direct3d: Scalable image-to-3d generation via 3d latent diffusion transformer
Wu, S., Lin, Y., Zhang, F., Zeng, Y., Xu, J., Torr, P., Cao, X., and Yao, Y. Direct3d: Scalable image-to-3d generation via 3d latent diffusion transformer. NeurIPS, 2024
2024
-
[74]
Structured 3D Latents for Scalable and Versatile 3D Generation
Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., and Yang, J. Structured 3D Latents for Scalable and Versatile 3D Generation . CVPR, 2025
2025
-
[75]
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Yan, M., Zhang, J., Zhu, Y., and Wang, H. MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation . CVPR, 2024
2024
-
[76]
Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds
Yang, B., Wang, J., Clark, R., Hu, Q., Wang, S., Markham, A., and Trigoni, N. Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds . NeurIPS, 2019
2019
-
[77]
unMORE: Unsupervised Multi-Object Segmentation via Center-Boundary Reasoning
Yang, Y., Zhang, Z., and Yang, B. unMORE: Unsupervised Multi-Object Segmentation via Center-Boundary Reasoning . ICML, 2025
2025
-
[78]
GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud
Yi, L., Zhao, W., Wang, H., Sung, M., and Guibas, L. GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud . CVPR, 2019
2019
-
[79]
SAI3D: Segment Any Instance in 3D Scenes
Yin, Y., Liu, Y., Xiao, Y., Cohen-Or, D., Huang, J., and Chen, B. SAI3D: Segment Any Instance in 3D Scenes . CVPR, 2024
2024
-
[80]
BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation
Yoo, Y., Kim, S., and Kim, C. BEEP3D: Box-Supervised End-to-End Pseudo-Mask Generation for 3D Instance Segmentation . arXiv:2510.12182, 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.