C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion

Amit Efraim; Joseph M. Francos; Yuval Haitman

arxiv: 2604.16680 · v1 · submitted 2026-04-17 · 💻 cs.CV

Pith reviewed 2026-05-10 08:31 UTC · model grok-4.3

keywords: registration, c-genreg, cloud, point, branch, generative, without, across

The pith

C-GenReg achieves training-free 3D point cloud registration by generating multi-view-consistent images from geometry, extracting correspondences from them with a Vision Foundation Model (VFM), and probabilistically fusing those correspondences with raw geometric matches, yielding zero-shot performance on indoor and outdoor benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The approach starts with two 3D point clouds that need to be aligned. Instead of working only in 3D space, it converts the geometry into several realistic RGB images that look consistent from different angles using a large pre-trained generative model. Pre-trained vision foundation models, which are already good at spotting matching points in photos, then find dense correspondences between these generated images. Those 2D matches are projected back into 3D using the original depth information from the point clouds. To improve reliability, the system runs a separate geometric matching process directly on the point clouds and combines the two sets of matches using a probabilistic fusion rule that weights them based on confidence without any extra training. This whole pipeline uses only existing pre-trained components and works on both indoor scenes and outdoor LiDAR scans where no original images exist. The result is a method that can handle differences in sensors and environments better than methods trained on specific datasets.
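The two geometric steps in the paragraph above, rendering a point cloud into a depth image and lifting matched pixels back to 3D with that depth, can be sketched with a plain pinhole camera model. This is a minimal illustration, not the paper's implementation: the function names `project_to_depth` and `lift_pixels`, the zero-skew intrinsics matrix `K`, and the nearest-pixel rounding are all assumptions made for the example.

```python
import numpy as np

def project_to_depth(points, K, height, width):
    """Render an (N, 3) camera-frame point cloud into a z-buffered depth map.

    Hypothetical helper: assumes a pinhole camera with intrinsics K and
    points already expressed in the camera frame (z pointing forward).
    """
    depth = np.full((height, width), np.inf)
    front = points[:, 2] > 0              # keep only points in front of the camera
    pts = points[front]
    uvw = (K @ pts.T).T                   # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # When several points land on one pixel, keep the nearest surface.
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = 0.0          # 0 marks pixels with no measurement
    return depth

def lift_pixels(pixels, depth, K):
    """Back-project (M, 2) integer pixel coordinates (u, v) to (M, 3) points.

    Assumes zero skew, so only fx, fy, cx, cy of K are used.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)
```

Because the lift reads depth from the input geometry rather than from the generated image, the 3D position of a match does not depend on the synthesized appearance; the generated RGB only decides which pixels are paired.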

Core claim

For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization.

Load-bearing premise

The World Foundation Model can synthesize multi-view-consistent RGB representations from the input geometry that preserve spatial coherence across source and target views without any fine-tuning; if the generated images lack sufficient realism or consistency, the subsequent VFM-based matches will be unreliable.

Figures

Figures reproduced from arXiv: 2604.16680 by Amit Efraim, Joseph M. Francos, Yuval Haitman.

**Figure 1.** C-GenReg: A training-free point cloud registration framework. The pipeline operates in two parallel branches: (1) Generated-RGB Branch - a World Foundation Model generates RGB views that are geometrically aligned with the input source and target point clouds and visually consistent across the two viewpoints; a task-specific Vision Foundation Model extracts dense image features and estimates RGB-based corr…

**Figure 2.** C-GenReg Overview: A training-free, zero-shot point cloud registration framework with two parallel branches. (1) Generated-RGB Branch - source and target point clouds are each represented as depth-frame sequences, temporally concatenated and processed by a frozen World Foundation Model to generate RGB views that are geometrically aligned and appearance-consistent across views. A subset of K frames per doma…

**Figure 3.** C-GenReg qualitative example on 3DMatch. Generated source and target images with a subset of matched points (color-coded correspondences), and the corresponding matches visualized on the input point clouds. The resulting rotation (RRE) and translation (RTE) errors are reported.

**Figure 4.** Prompt robustness on 3DMatch. Relative rotation (RRE, °) and translation (RTE, cm) errors under different prompt types. … geometric coherence across viewpoints. A task-specific VFM pretrained for dense geometric matching then extracts 2D features from these synthesized views, which are lifted back to 3D using the original depth to obtain per-point descriptors. In parallel, the geometric branch encodes the …

**Figure 5.** WFM Input Formatting. (a) Input depth maps of the source and target views. (b) Feeding the pretrained WFM with horizontally concatenated depth inputs causes cross-view inconsistencies, e.g., the sofa is mistakenly replaced in the generated source image. (c) Using temporal concatenation produces RGB outputs that are geometrically coherent and appearance-consistent between the two views. This temporal concat…

**Figure 6.** Effect of View Selection (K). Registration performance measured by Relative Rotation Error (RRE) and Relative Translation Error (RTE) as a function of the number of selected views K. Performance saturates for K ≥ 4, indicating that only a few representative views are sufficient for stable registration. … maps. To exploit this property, we sample K views uniformly from the L frames of the generated source an…

**Figure 7.** C-GenReg LiDAR Input Pipeline: (a) A virtual camera is configured into the LiDAR scan. (b) The LiDAR points are projected into a depth image. (c) The resulting depth map is fed into the generative model to produce an aligned RGB image.

**Figure 8.** Matching Performance Comparison of Noisy-AND vs. Noisy-OR. Precision–recall curves comparing the two probabilistic fusion operators on the point-matching task (a match is correct if within 5 cm under the ground-truth transformation). Noisy-AND consistently achieves higher precision at similar recall rates. … Tab. 5, this stage accounts for almost the entire runtime (507s), while the remaining components are l…

**Figure 9.** Qualitative registration example from C-GenReg on the 3DMatch dataset. Generated source and target images with a subset of matched keypoints (same color indicates correspondence), and the same correspondences visualized on the source and target 3D point clouds. The resulting rotation error (RRE, °) and translation error (RTE, cm) are reported as well.

**Figure 10.** Qualitative registration examples of C-GenReg on the Waymo dataset. Row (a) shows generated source and target images with a subset of matched keypoints (same color indicates correspondence). Row (b) shows the same correspondences visualized on the source and target 3D point clouds. The resulting rotation error (RRE, °) and translation error (RTE, m) are also reported.

**Figure 11.** Multi-view consistent RGB generation from depth on 3DMatch. Three representative synthetic RGB examples generated from depth. The paired views remain geometrically and visually consistent. (a) Example 1 (b) Example 2 (c) Example 3

**Figure 12.** Multi-view consistent RGB generation from depth on ScanNet. Three representative synthetic RGB examples from indoor depth scans. The synthesized frames preserve layout and structure across viewpoints. (a) Example 1 (b) Example 2 (c) Example 3 (d) Example 4

**Figure 13.** Multi-view consistent RGB generation from depth on Waymo. Four representative synthetic RGB examples generated from LiDAR-projected depth. The synthesized frames preserve scene geometry across viewpoints.
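Several captions above report a relative rotation error (RRE) and a relative translation error (RTE). The page does not spell out how the paper computes them, so the sketch below uses the definitions common in the registration literature: the angle of the residual rotation and the Euclidean distance between translations. The function name `registration_errors` and the 4x4 homogeneous-matrix convention are assumptions for illustration.

```python
import numpy as np

def registration_errors(T_est, T_gt):
    """Return (RRE in degrees, RTE in the translation's units).

    T_est and T_gt are assumed to be 4x4 rigid transforms [R | t; 0 0 0 1].
    """
    R_est, t_est = T_est[:3, :3], T_est[:3, 3]
    R_gt, t_gt = T_gt[:3, :3], T_gt[:3, 3]
    # Angle of the residual rotation R_gt^T R_est, recovered from its trace.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rre = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # Euclidean distance between estimated and ground-truth translations.
    rte = np.linalg.norm(t_est - t_gt)
    return rre, rte
```

The clip guards against values marginally outside [-1, 1] caused by floating-point error, which would otherwise make `arccos` return NaN for near-perfect estimates.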

Abstract

We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. Hence, C-GenReg augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for finding dense correspondences extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no imagery data is available.
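The abstract describes combining two independent correspondence posteriors, and the Figure 8 caption names two candidate operators, Noisy-AND and Noisy-OR. The paper's exact formulation is not reproduced on this page, so the following is only a sketch using the textbook forms of the two operators under an independence assumption. The helper names `row_softmax`, `fuse_posteriors`, and `best_matches`, and the choice to turn similarity scores into posteriors with a row-wise softmax, are assumptions for illustration.

```python
import numpy as np

def row_softmax(scores, temperature=1.0):
    """Turn an (N, M) similarity matrix into per-source-point posteriors."""
    z = scores / temperature
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fuse_posteriors(p_rgb, p_geo, mode="and"):
    """Fuse two (N, M) match posteriors without any learned weights.

    'and': a match is kept only if both branches support it (product).
    'or' : a match is kept if either branch supports it.
    """
    if mode == "and":
        return p_rgb * p_geo
    if mode == "or":
        return 1.0 - (1.0 - p_rgb) * (1.0 - p_geo)
    raise ValueError(f"unknown fusion mode: {mode}")

def best_matches(fused):
    """For each source point, return the top target index and its fused score."""
    return fused.argmax(axis=1), fused.max(axis=1)
```

In this form the Noisy-AND score is never larger than either input and the Noisy-OR score is never smaller, which is one way to read the Figure 8 observation that Noisy-AND trades toward precision: a correspondence needs agreement from both branches to score highly.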

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces C-GenReg, a training-free 3D point cloud registration framework that augments a geometric registration branch by using a pretrained World Foundation Model to synthesize multi-view-consistent RGB images from source and target point clouds (no real imagery or fine-tuning), extracts dense correspondences via a Vision Foundation Model, lifts pixel matches back to 3D using original depth maps, and fuses the resulting posterior with the geometric branch via a probabilistic 'Match-then-Fuse' scheme that preserves each modality's inductive bias. It reports strong zero-shot performance on indoor benchmarks (3DMatch, ScanNet) and outdoor LiDAR (Waymo), claiming the first successful generative registration on real outdoor data without imagery.

Significance. If the central claims hold, this would be a notable contribution to zero-shot cross-domain 3D registration by demonstrating that pretrained generative and correspondence models can transfer matching problems into the image domain while maintaining spatial coherence, enabling robust performance on challenging outdoor LiDAR without any task-specific training or imagery.

major comments (1)

[Abstract] Abstract and experimental claims on Waymo: the zero-shot outdoor results and the cross-domain generalization claim both depend on the assertion that the World Foundation Model produces multi-view-consistent RGB representations from sparse outdoor LiDAR geometry, preserving spatial coherence without fine-tuning. The manuscript provides no quantitative validation of this assertion, such as cross-view reprojection error, consistency metrics, or an ablation that removes the generative branch on Waymo. Without these, it is unclear whether the reported performance relies on the generative transfer or defaults to the geometric branch alone.

minor comments (1)

[Method] The description of the 'Match-then-Fuse' probabilistic cold-fusion scheme would benefit from an explicit equation or pseudocode showing how the two independent posteriors are combined and how calibration is achieved.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on assumptions about the capabilities of external pre-trained models rather than new derivations or fitted parameters.

axioms (2)

domain assumption: A pre-trained World Foundation Model can generate multi-view-consistent RGB images from point cloud geometry that preserve spatial coherence without fine-tuning. (Invoked in the generative transfer step of the framework.)
domain assumption: Pre-trained Vision Foundation Models can reliably extract dense correspondences from the synthesized images. (Central to lifting pixel matches back to 3D.)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1] Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[2] Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. In Sensor fusion IV: control paradigms and data structures, pages 586–606. SPIE, 1992.

[3] Zhi Chen, Kun Sun, Fan Yang, and Wenbing Tao. Sc2-pcr: A second order spatial compatibility for efficient and robust point cloud registration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13221–13231, 2022.

[4] Christopher Choy, Jaesik Park, and Vladlen Koltun. Fully convolutional geometric features. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8958–8966, 2019.

[5] Jonathan Courbon, Youcef Mezouar, Laurent Eckt, and Philippe Martinet. A generic fisheye camera model for robotic applications. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1683–1688. IEEE, 2007.

[6] Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 303–312, 1996.

[7] Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017.

[8] Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. Roma: Robust dense feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19790–19800, 2024.

[9] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[10] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset. International Journal of Robotics Research (IJRR), 2013.

[11] Berthold KP Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the optical society of America A, 4(4):629–642, 1987.

[12] Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, and Konrad Schindler. Predator: Registration of 3d point clouds with low overlap. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 4267–4276, 2021.

[13] Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, and Jianmin Zheng. Generative point cloud registration. In Forty-second International Conference on Machine Learning, 2025.

[14] Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, and Jianmin Zheng. Zero-shot rgb-d point cloud registration with pre-trained large vision model. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16943–16952, 2025.

[15] Grace Lam. Distilling cosmos transfer 1 models. https://nvidia-cosmos.github.io/cosmos-cookbook/core_concepts/distillation/distilling_transfer1.html, 2025. NVIDIA Cosmos Cookbook.

[16] Vincent Leroy, Yohann Cabon, and Jerome Revaud. Grounding image matching in 3d with mast3r, 2024.

[17] Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc Van Gool, and Qiang Wu. Unsupervised deep probabilistic approach for partial point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13611–13620, 2023.

[18] Juncheng Mu, Lin Bie, Shaoyi Du, and Yue Gao. Colorpcr: Color point cloud registration with multi-stage geometric-color fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21061–21070, 2024.

[19] NVIDIA, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannat... Cosmos world foundation model platform for physical ai, 2025.

[20] NVIDIA, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren,... Cosmos-transfer1: Conditional world generation with adaptive multimodal control, 2025.

[21] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patr... 2024.

[22] Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, and Kai Xu. Geometric transformer for fast and robust point cloud registration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11143–11152.

[23] Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, and Huan Ling. Cosmos-drive-dreams: Scalable synthetic driving data generation with world foundation models, 2025.

[24] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.

[25] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE international conference on robotics and automation, pages 3212–3217. IEEE, 2009.

[26] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.

[27] Davide Scaramuzza, Agostino Martinelli, and Roland Siegwart. A flexible technique for accurate omnidirectional camera calibration and structure from motion. In Fourth IEEE International Conference on Computer Vision Systems (ICVS'06), pages 45–45. IEEE, 2006.

[28] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception for autonomous driving: Waymo open dataset, 2020.

[29] Vishal Thengane, Xiatian Zhu, Salim Bouzerdoum, Son Lam Phung, and Yunpeng Li. Foundational models for 3d point clouds: A survey and outlook. arXiv preprint arXiv:2501.18594, 2025.

[30] Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In European conference on computer vision, pages 356–369. Springer, 2010.

[31] Haiping Wang, Yuan Liu, Bing Wang, Yujing Sun, Zhen Dong, Wenping Wang, and Bisheng Yang. Freereg: Image-to-point cloud registration leveraging pretrained diffusion models and monocular depth estimators. In The Twelfth International Conference on Learning Representations, 2024.

[32] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In CVPR, 2024.

[33] Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, and Slobodan Ilic. Rotation-invariant transformer for point cloud matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5384–5393, 2023.

[34] Mingzhi Yuan, Kexue Fu, Zhihao Li, Yucong Meng, and Manning Wang. Pointmbf: A multi-scale bidirectional fusion network for unsupervised rgb-d point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17694–17705, 2023.

[35] Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1802–1811, 2017.

[36] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023.

[1] [1]

Foundation models defining a new era in vision: a survey and outlook.IEEE Trans- actions on P attern Analysis and Machine Intelligence, 2025

Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Y ang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook.IEEE Trans- actions on P attern Analysis and Machine Intelligence, 2025. 1

work page 2025

[2] [2]

Method for registration of 3-d shapes

Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. InSensor fusion IV: control paradigms and data structures, pages 586–606. Spie, 1992. 2

work page 1992

[3] [3]

Sc2-pcr: A sec- ond order spatial compatibility for efficient and robust point cloud registration

Zhi Chen, Kun Sun, Fan Y ang, and Wenbing Tao. Sc2-pcr: A sec- ond order spatial compatibility for efficient and robust point cloud registration. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 13221–13231, 2022. 6

work page 2022

[4] [4]

Fully convolutional geometric features

Christopher Choy, Jaesik Park, and Vladlen Koltun. Fully convolutional geometric features. InProceedings of the IEEE/CVF international conference on computer vision, pages 8958–8966, 2019. 1, 2, 6, 7

work page 2019

[5] [5]

A generic fisheye camera model for robotic applica- tions

Jonathan Courbon, Y oucef Mezouar, Laurent Eckt, and Philippe Martinet. A generic fisheye camera model for robotic applica- tions. In2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1683–1688. IEEE, 2007. 12

work page 2007

[6] [6]

A volumetric method for building complex models from range images

Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. InProceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 303–312, 1996. 4

work page 1996

[7] [7]

Scannet: Richly- annotated 3d reconstructions of indoor scenes

Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly- annotated 3d reconstructions of indoor scenes. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017. 4, 6

work page 2017

[8] [8]

Roma: Robust dense feature matching

Johan Edstedt, Qiyu Sun, Georg B¨okman, M˚arten Wadenb¨ack, and Michael Felsberg. Roma: Robust dense feature matching. In Proceedings of the IEEE/CVF Conference on Computer V ision and P attern Recognition (CVPR), pages 19790–19800, 2024. 2, 8

work page 2024

[9] [9]

Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981

Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Communications of the ACM, 24(6):381–395, 1981. 1, 2

work page 1981

[10] [10]

Vision meets robotics: The kitti dataset.International Journal of Robotics Research (IJRR), 2013

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset.International Journal of Robotics Research (IJRR), 2013. 7

work page 2013

[11] [11]

Closed-form solution of absolute orientation using unit quaternions.Journal of the optical society of America A, 4(4):629–642, 1987

Berthold KP Horn. Closed-form solution of absolute orientation using unit quaternions.Journal of the optical society of America A, 4(4):629–642, 1987. 3

work page 1987

[12] [12]

Predator: Registration of 3d point clouds with low overlap

Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, and Konrad Schindler. Predator: Registration of 3d point clouds with low overlap. InProceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 4267–4276, 2021. 1, 2, 6, 7

work page 2021

[13] [13]

Generative point cloud registration

Haobo Jiang, Jin Xie, Jian Y ang, Liang Y u, and Jianmin Zheng. Generative point cloud registration. InF orty-second International Conference on Machine Learning, 2025. 2, 3, 6, 7, 8

work page 2025

[14] [14]

Zero-shot rgb-d point cloud registration with pre-trained large vision model

Haobo Jiang, Jin Xie, Jian Y ang, Liang Y u, and Jianmin Zheng. Zero-shot rgb-d point cloud registration with pre-trained large vision model. InProceedings of the Computer V ision and P attern Recognition Conference, pages 16943–16952, 2025. 2, 3, 6, 7, 8

work page 2025

[15] [15]

Distilling cosmos transfer 1 models

Grace Lam. Distilling cosmos transfer 1 models. https: //nvidia-cosmos.github.io/cosmos-cookbook/ core _ concepts / distillation / distilling _ transfer1.html, 2025. NVIDIA Cosmos Cookbook. 14

work page 2025

[16] [16]

Grounding image matching in 3d with mast3r, 2024

Vincent Leroy, Y ohann Cabon, and Jerome Revaud. Grounding image matching in 3d with mast3r, 2024. 2, 4, 6, 7, 8

work page 2024

[17] [17]

Unsupervised deep probabilistic approach for partial point cloud registration

Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc V an Gool, and Qiang Wu. Unsupervised deep probabilistic approach for partial point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer V ision and P attern Recognition, pages 13611–13620, 2023. 2, 6

work page 2023

[18] [18]

Colorpcr: Color point cloud registration with multi-stage geometric-color fusion

Juncheng Mu, Lin Bie, Shaoyi Du, and Y ue Gao. Colorpcr: Color point cloud registration with multi-stage geometric-color fusion. InProceedings of the IEEE/CVF Conference on Computer V ision and P attern Recognition, pages 21061–21070, 2024. 3

work page 2024

[19] [19]

Cosmos world foundation model platform for physical ai, 2025

NVIDIA, :, Niket Agarwal, Arslan Ali, Maciej Bala, Y ogesh Bal- aji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Y ongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Y unhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannat...

work page 2025

[20] NVIDIA: Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren,... Cosmos-transfer1: Conditional world generation with adaptive multimodal control, 2025.

[21] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patr... 2024.

[22] Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, and Kai Xu. Geometric transformer for fast and robust point cloud registration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11143–11152,

[23] Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, Amirmojtaba Sabour, Tianchang Shen, Tobias Pfaff, Jay Zhangjie Wu, Runjian Chen, Seung Wook Kim, Jun Gao, Laura Leal-Taixe, Mike Chen, Sanja Fidler, and Huan Ling. Cosmos-drive-dreams: Scalable synthetic driving data generation with world foundation models, 2025. 12

[24] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 3

[25] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast point feature histograms (fpfh) for 3d registration. In 2009 IEEE international conference on robotics and automation, pages 3212–3217. IEEE, 2009. 1, 2, 6

[26] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020. 7

[27] Davide Scaramuzza, Agostino Martinelli, and Roland Siegwart. A flexible technique for accurate omnidirectional camera calibration and structure from motion. In Fourth IEEE International Conference on Computer Vision Systems (ICVS’06), pages 45–45. IEEE, 2006. 12

[28] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception for autonomous driving: Waymo open dataset, 2020.

[29] Vishal Thengane, Xiatian Zhu, Salim Bouzerdoum, Son Lam Phung, and Yunpeng Li. Foundational models for 3d point clouds: A survey and outlook. arXiv preprint arXiv:2501.18594, 2025. 1

[30] Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In European conference on computer vision, pages 356–369. Springer, 2010. 1, 2

[31] Haiping Wang, Yuan Liu, Bing WANG, YUJING SUN, Zhen Dong, Wenping Wang, and Bisheng Yang. Freereg: Image-to-point cloud registration leveraging pretrained diffusion models and monocular depth estimators. In The Twelfth International Conference on Learning Representations, 2024. 2, 3

[32] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In CVPR, 2024. 2

[33] Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, and Slobodan Ilic. Rotation-invariant transformer for point cloud matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5384–5393, 2023. 2, 6, 7

[34] Mingzhi Yuan, Kexue Fu, Zhihao Li, Yucong Meng, and Manning Wang. Pointmbf: A multi-scale bidirectional fusion network for unsupervised rgb-d point cloud registration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17694–17705, 2023. 2, 6, 7

[35] Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1802–1811, 2017. 1, 4, 6

[36] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023. 3

C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion Sup...