C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion
Pith reviewed 2026-05-10 08:31 UTC · model grok-4.3
The pith
C-GenReg achieves training-free 3D point cloud registration by generating multi-view-consistent images from geometry, extracting VFM correspondences, and probabilistically fusing them with raw geometric matches for zero-shot performance on indoor and outdoor benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization.
Load-bearing premise
The World Foundation Model can synthesize multi-view-consistent RGB representations from the input geometry that preserve spatial coherence across source and target views without any fine-tuning; if the generated images lack sufficient realism or consistency, the subsequent VFM-based matches will be unreliable.
Original abstract
We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for finding dense correspondences extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available.
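The lifting step described in the abstract, where pixel correspondences are mapped back to 3D via the original depth maps, can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics fx, fy, cx, cy and depth maps stored as row-major nested lists. The function names, the intrinsics tuple, and the (u, v) = (column, row) pixel convention are hypothetical choices made for illustration and are not taken from the paper, which may rely on a different camera model.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (u, v) = (column, row); an assumed convention
DepthMap = Sequence[Sequence[float]]  # depth[v][u], in metres


def backproject(pixel: Pixel, depth: DepthMap,
                fx: float, fy: float, cx: float, cy: float) -> Optional[Point3D]:
    """Lift one pixel to a 3D point in the camera frame (pinhole model)."""
    u, v = pixel
    z = depth[v][u]
    if z <= 0.0:  # no valid depth measurement at this pixel
        return None
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def lift_matches(matches: Sequence[Tuple[Pixel, Pixel]],
                 depth_src: DepthMap, depth_tgt: DepthMap,
                 intrinsics: Tuple[float, float, float, float]
                 ) -> List[Tuple[Point3D, Point3D]]:
    """Turn 2D pixel matches into 3D point pairs, dropping invalid depths."""
    fx, fy, cx, cy = intrinsics
    lifted = []
    for px_src, px_tgt in matches:
        p = backproject(px_src, depth_src, fx, fy, cx, cy)
        q = backproject(px_tgt, depth_tgt, fx, fy, cx, cy)
        if p is not None and q is not None:
            lifted.append((p, q))
    return lifted
```

Matches whose source or target pixel has no valid depth are discarded rather than interpolated, which keeps the sketch simple; a real pipeline would need to decide how to treat sparse LiDAR depth.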
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces C-GenReg, a training-free 3D point cloud registration framework that augments a geometric registration branch with a generated-image branch. A pretrained World Foundation Model synthesizes multi-view-consistent RGB images from the source and target point clouds, using no real imagery and no fine-tuning. A Vision Foundation Model then extracts dense correspondences from these images, and the pixel matches are lifted back to 3D using the original depth maps. The resulting posterior is fused with that of the geometric branch via a probabilistic "Match-then-Fuse" scheme that preserves each modality's inductive bias. The paper reports strong zero-shot performance on indoor benchmarks (3DMatch, ScanNet) and outdoor LiDAR (Waymo), and claims the first successful generative registration on real outdoor data without imagery.
Significance. If the central claims hold, this would be a notable contribution to zero-shot cross-domain 3D registration by demonstrating that pretrained generative and correspondence models can transfer matching problems into the image domain while maintaining spatial coherence, enabling robust performance on challenging outdoor LiDAR without any task-specific training or imagery.
major comments (1)
- [Abstract] Abstract and experimental claims on Waymo: the central assertion is that the World Foundation Model produces multi-view-consistent RGB representations from sparse outdoor LiDAR geometry, preserving spatial coherence without fine-tuning. This assertion is load-bearing for the zero-shot outdoor success and cross-domain generalization, yet the manuscript provides no quantitative validation, such as cross-view reprojection error, consistency metrics, or an ablation that removes the generative branch on Waymo. Without these, it is unclear whether the reported performance relies on the generative transfer or defaults to the geometric branch alone.
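One way the requested consistency check could be computed is sketched below. It assumes that lifted 3D match pairs from the generated-RGB branch and a ground-truth rigid transform (R, t) are available, and it reports the mean residual and the inlier ratio under that transform. The function names, the dictionary keys, and the inlier threshold are hypothetical; this is an illustration of the kind of metric the comment asks for, not a procedure from the manuscript.

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform q = R p + t to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def consistency_report(pairs: Sequence[Tuple[Vec3, Vec3]],
                       R: Mat3, t: Vec3,
                       inlier_thresh: float) -> Dict[str, float]:
    """Residual statistics of matched pairs under a ground-truth transform."""
    if not pairs:
        raise ValueError("need at least one correspondence")
    # Distance between each transformed source point and its matched target.
    residuals = [math.dist(transform(R, t, p), q) for p, q in pairs]
    inliers = sum(1 for r in residuals if r <= inlier_thresh)
    return {
        "mean_residual": sum(residuals) / len(residuals),
        "inlier_ratio": inliers / len(residuals),
    }
```

Running the same report on matches from the geometric branch alone would give a rough, ablation-style comparison of how much each branch contributes.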
minor comments (1)
- [Method] The description of the 'Match-then-Fuse' probabilistic cold-fusion scheme would benefit from an explicit equation or pseudocode showing how the two independent posteriors are combined and how calibration is achieved.
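As an illustration of the kind of pseudocode this comment asks for, the sketch below shows one plausible reading of "combining two independent correspondence posteriors": a normalized product of the two per-candidate distributions, which follows from conditional independence under a uniform prior over candidates. This is an assumption and not the paper's actual equation. The function name, the eps smoothing term, and the list-based representation are hypothetical.

```python
from typing import List, Sequence


def fuse_posteriors(p_rgb: Sequence[float], p_geo: Sequence[float],
                    eps: float = 1e-12) -> List[float]:
    """Fuse two posteriors over the same candidate target points.

    Assumes the two branches are conditionally independent and that the
    prior over candidates is uniform, so the fused posterior is
    proportional to the element-wise product of the inputs.
    """
    if len(p_rgb) != len(p_geo):
        raise ValueError("posteriors must cover the same candidates")
    # eps keeps a candidate alive when one branch assigns it exactly zero.
    joint = [(a + eps) * (b + eps) for a, b in zip(p_rgb, p_geo)]
    total = sum(joint)
    return [w / total for w in joint]
```

Under this reading, a branch that is uninformative (a flat distribution) leaves the other branch's posterior essentially unchanged, while agreement between branches sharpens the fused confidence. Whether the paper's scheme behaves this way, and how it achieves calibration, is exactly what the comment asks the authors to make explicit.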
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A pre-trained World Foundation Model can generate multi-view-consistent RGB images from point cloud geometry that preserve spatial coherence without fine-tuning.
- domain assumption: Pre-trained Vision Foundation Models can reliably extract dense correspondences from the synthesized images.