
arxiv: 2604.16680 · v1 · submitted 2026-04-17 · 💻 cs.CV

C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Pith reviewed 2026-05-10 08:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords registration · c-genreg · cloud · point · branch · generative · without · across

The pith

C-GenReg achieves training-free 3D point cloud registration by generating multi-view-consistent images from geometry, extracting VFM correspondences, and probabilistically fusing them with raw geometric matches for zero-shot performance on indoor and outdoor benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The approach starts with two 3D point clouds that need to be aligned. Instead of working only in 3D space, it converts the geometry into several realistic RGB images that look consistent from different angles using a large pre-trained generative model. Pre-trained vision foundation models, which are already good at spotting matching points in photos, then find dense correspondences between these generated images. Those 2D matches are projected back into 3D using the original depth information from the point clouds. To improve reliability, the system runs a separate geometric matching process directly on the point clouds and combines the two sets of matches using a probabilistic fusion rule that weights them based on confidence without any extra training. This whole pipeline uses only existing pre-trained components and works on both indoor scenes and outdoor LiDAR scans where no original images exist. The result is a method that can handle differences in sensors and environments better than methods trained on specific datasets.
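The step that projects 2D matches back into 3D can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics `K` and a depth map pixel-aligned with each generated image; the paper's actual camera model and data layout may differ. The names `backproject` and `lift_matches` are hypothetical and do not come from the paper.

```python
import numpy as np

def backproject(pixels, depth, K):
    """Lift (u, v) pixels to 3D camera-frame points using a depth map.

    pixels: (N, 2) integer array of (u, v) = (column, row).
    depth:  (H, W) depth along the optical axis; 0 marks a missing value.
    K:      (3, 3) pinhole intrinsics (an assumption of this sketch).
    Returns (points, valid): (N, 3) points and an (N,) boolean mask.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v, u]                      # row index is v, column index is u
    valid = z > 0
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1), valid

def lift_matches(src_px, tgt_px, src_depth, tgt_depth, K):
    """Turn 2D-2D pixel matches into 3D-3D point matches.

    A match is kept only if both of its pixels have valid depth.
    """
    p_src, ok_src = backproject(src_px, src_depth, K)
    p_tgt, ok_tgt = backproject(tgt_px, tgt_depth, K)
    keep = ok_src & ok_tgt
    return p_src[keep], p_tgt[keep]
```

The returned point pairs are expressed in each view's own camera frame, so a rigid transform estimated from them relates the source frame to the target frame.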

Core claim

For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization.

Load-bearing premise

The World Foundation Model can synthesize multi-view-consistent RGB representations from the input geometry that preserve spatial coherence across source and target views without any fine-tuning; if the generated images lack sufficient realism or consistency, the subsequent VFM-based matches will be unreliable.

Figures

Figures reproduced from arXiv: 2604.16680 by Amit Efraim, Joseph M. Francos, Yuval Haitman.

Figure 1: C-GenReg: A training-free point cloud registration framework. The pipeline operates in two parallel branches: (1) Generated-RGB Branch - a World Foundation Model generates RGB views that are geometrically aligned with the input source and target point clouds and visually consistent across the two viewpoints; a task-specific Vision Foundation Model extracts dense image features and estimates RGB-based corr…
Figure 2: C-GenReg Overview: A training-free, zero-shot point cloud registration framework with two parallel branches. (1) Generated-RGB Branch - source and target point clouds are each represented as depth-frame sequences, temporally concatenated and processed by a frozen World Foundation Model to generate RGB views that are geometrically aligned and appearance-consistent across views. A subset of K frames per doma…
Figure 3: C-GenReg qualitative example on 3DMatch. Generated source and target images with a subset of matched points (color-coded correspondences), and the corresponding matches visualized on the input point clouds. The resulting rotation (RRE) and translation (RTE) errors are reported.
Figure 4: Prompt robustness on 3DMatch. Relative rotation (RRE, °) and translation (RTE, cm) errors under different prompt types. geometric coherence across viewpoints. A task-specific VFM pretrained for dense geometric matching then extracts 2D features from these synthesized views, which are lifted back to 3D using the original depth to obtain per-point descriptors. In parallel, the geometric branch encodes the …
Figure 6: Effect of View Selection (K). Registration performance measured by Relative Rotation Error (RRE) and Relative Translation Error (RTE) as a function of the number of selected views K. Performance saturates for K ≥ 4, indicating that only a few representative views are sufficient for stable registration. maps. To exploit this property, we sample K views uniformly from the L frames of the generated source an…
Figure 5: WFM Input Formatting. (a) Input depth maps of the source and target views. (b) Feeding the pretrained WFM with horizontally concatenated depth inputs causes cross-view inconsistencies, e.g., the sofa is mistakenly replaced in the generated source image. (c) Using temporal concatenation produces RGB outputs that are geometrically coherent and appearance-consistent between the two views. This temporal concat…
Figure 7: C-GenReg LiDAR Input Pipeline: (a) A virtual camera is configured into the LiDAR scan. (b) The LiDAR points are projected into a depth image. (c) The resulting depth map is fed into the generative model to produce an aligned RGB image.
Figure 8: Matching Performance Comparison of Noisy-AND vs. Noisy-OR. Precision–recall curves comparing the two probabilistic fusion operators on the point-matching task (a match is correct if within 5 cm under the ground-truth transformation). Noisy-AND consistently achieves higher precision at similar recall rates. Tab. 5, this stage accounts for almost the entire runtime (507 s), while the remaining components are l…
Figure 9: Qualitative registration example from C-GenReg on the 3DMatch dataset. Generated source and target images with a subset of matched keypoints (same color indicates correspondence), and the same correspondences visualized on the source and target 3D point clouds. The resulting rotation error (RRE, °) and translation error (RTE, cm) are reported as well.
Figure 10: Qualitative registration examples of C-GenReg on the Waymo dataset. Row (a) shows generated source and target images with a subset of matched keypoints (same color indicates correspondence). Row (b) shows the same correspondences visualized on the source and target 3D point clouds. The resulting rotation error (RRE, °) and translation error (RTE, m) are also reported.
Figure 11: Multi-view consistent RGB generation from depth on 3DMatch. Three representative synthetic RGB examples generated from depth. The paired views remain geometrically and visually consistent. (a) Example 1 (b) Example 2 (c) Example 3
Figure 12: Multi-view consistent RGB generation from depth on ScanNet. Three representative synthetic RGB examples from indoor depth scans. The synthesized frames preserve layout and structure across viewpoints. (a) Example 1 (b) Example 2 (c) Example 3 (d) Example 4
Figure 13: Multi-view consistent RGB generation from depth on Waymo. Four representative synthetic RGB examples generated from LiDAR-projected depth. The synthesized frames preserve scene geometry across viewpoints.
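Several captions above report RRE and RTE. The sketch below shows how these two errors are commonly computed from an estimated and a ground-truth rigid transform. It uses the widely used definitions (geodesic rotation angle and Euclidean translation distance); this page does not state the paper's exact convention, so the formulas and the function name `registration_errors` are assumptions.

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Relative Rotation Error (degrees) and Relative Translation Error.

    R_est, R_gt: (3, 3) rotation matrices.
    t_est, t_gt: (3,) translation vectors; RTE is returned in their unit.
    """
    # Residual rotation between the estimate and the ground truth.
    R_delta = R_est.T @ R_gt
    # Rotation angle from the trace; clip guards against round-off
    # pushing the cosine slightly outside [-1, 1].
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    rre_deg = np.degrees(np.arccos(cos_angle))
    rte = np.linalg.norm(t_est - t_gt)
    return rre_deg, rte
```

With translations expressed in meters, the RTE would need to be multiplied by 100 to match the centimeter values quoted for the indoor benchmarks.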
Abstract

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for finding dense correspondences extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available.
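For LiDAR input, where no imagery exists, Figure 7 describes placing a virtual camera in the scan and projecting the points into a depth image before generation. A minimal sketch of that projection follows. It assumes a pinhole virtual camera and points already expressed in the camera frame (x right, y down, z forward); the paper may use a different camera model, and the function name `lidar_to_depth` is hypothetical.

```python
import numpy as np

def lidar_to_depth(points, K, height, width):
    """Project 3D points into a depth image, keeping the nearest point per pixel.

    points: (N, 3) points in the virtual camera frame.
    K:      (3, 3) pinhole intrinsics (an assumption of this sketch).
    Returns an (height, width) depth map; 0 marks pixels with no return.
    """
    z = points[:, 2]
    front = z > 0                        # discard points behind the camera
    pts, z = points[front], z[front]

    # Perspective projection, rounded to the nearest pixel.
    u = np.round(K[0, 0] * pts[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts[:, 1] / z + K[1, 2]).astype(int)

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]

    # Z-buffer: an unbuffered minimum handles several points per pixel.
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth
```

The resulting map is sparse for real LiDAR scans; any densification or hole filling applied before the generative model is outside the scope of this sketch.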

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces C-GenReg, a training-free 3D point cloud registration framework that augments a geometric registration branch by using a pretrained World Foundation Model to synthesize multi-view-consistent RGB images from source and target point clouds (no real imagery or fine-tuning), extracts dense correspondences via a Vision Foundation Model, lifts pixel matches back to 3D using original depth maps, and fuses the resulting posterior with the geometric branch via a probabilistic 'Match-then-Fuse' scheme that preserves each modality's inductive bias. It reports strong zero-shot performance on indoor benchmarks (3DMatch, ScanNet) and outdoor LiDAR (Waymo), claiming the first successful generative registration on real outdoor data without imagery.

Significance. If the central claims hold, this would be a notable contribution to zero-shot cross-domain 3D registration by demonstrating that pretrained generative and correspondence models can transfer matching problems into the image domain while maintaining spatial coherence, enabling robust performance on challenging outdoor LiDAR without any task-specific training or imagery.

major comments (1)
  1. [Abstract] Abstract and experimental claims on Waymo: the central assertion that the World Foundation Model produces multi-view-consistent RGB representations from sparse outdoor LiDAR geometry (preserving spatial coherence without fine-tuning) is load-bearing for the zero-shot outdoor success and cross-domain generalization, yet the manuscript provides no quantitative validation such as cross-view reprojection error, consistency metrics, or ablation removing the generative branch on Waymo; without these, it is unclear whether the reported performance relies on the generative transfer or defaults to the geometric branch alone.
minor comments (1)
  1. [Method] The description of the 'Match-then-Fuse' probabilistic cold-fusion scheme would benefit from an explicit equation or pseudocode showing how the two independent posteriors are combined and how calibration is achieved.
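To make the request in the minor comment concrete, here is a minimal sketch of how two per-candidate match probabilities could be combined. Figure 8 names Noisy-AND and Noisy-OR as the fusion operators; the leak-free textbook forms used below, the thresholded selection step, and all function names are assumptions, since this page does not give the paper's exact equations or calibration procedure.

```python
import numpy as np

def fuse_noisy_and(p_rgb, p_geo):
    """Leak-free Noisy-AND: a match is likely only if both branches agree."""
    return p_rgb * p_geo

def fuse_noisy_or(p_rgb, p_geo):
    """Leak-free Noisy-OR: a match is likely if either branch supports it."""
    return 1.0 - (1.0 - p_rgb) * (1.0 - p_geo)

def select_matches(fused, threshold):
    """Pick the best target candidate per source point above a threshold.

    fused: (N, M) fused probability that source i matches target j.
    Returns (source indices, target indices, confidences) of kept matches.
    """
    best = fused.argmax(axis=1)
    conf = fused[np.arange(fused.shape[0]), best]
    keep = conf >= threshold
    return np.flatnonzero(keep), best[keep], conf[keep]
```

Under these forms, Noisy-AND can only lower a probability and Noisy-OR can only raise it, which is consistent with the caption's observation that Noisy-AND trades toward higher precision at similar recall.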

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on assumptions about the capabilities of external pre-trained models rather than new derivations or fitted parameters.

axioms (2)
  • domain assumption: A pre-trained World Foundation Model can generate multi-view-consistent RGB images from point cloud geometry that preserve spatial coherence without fine-tuning.
    Invoked in the generative transfer step of the framework.
  • domain assumption: Pre-trained Vision Foundation Models can reliably extract dense correspondences from the synthesized images.
    Central to lifting pixel matches back to 3D.

pith-pipeline@v0.9.0 · 5590 in / 1476 out tokens · 30628 ms · 2026-05-10T08:31:17.909406+00:00 · methodology

