FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)^N Diffusion Refinement
Pith reviewed 2026-05-16 23:39 UTC · model grok-4.3
The pith
A feed-forward transformer registers multiple 3D point clouds by jointly processing them in a unified latent space to predict global poses directly.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module to directly predict global poses without any pairwise estimation. Building on this, FUSER-DF introduces an SE(3)^N diffusion refinement framework where FUSER serves as a surrogate model to denoise poses in the joint space.
What carries the argument
Geometric Alternating Attention module, which alternates attention within and across scans on the superpoint features to capture geometric consistency in the unified latent space.
If this is right
- Registration accuracy improves because holistic geometric constraints are applied directly instead of relying on potentially inconsistent pairwise alignments.
- Computational cost decreases significantly for large numbers of views since pairwise computations are eliminated.
- The method enables end-to-end training of multiview registration as a single feed-forward network.
- Diffusion refinement provides a way to model uncertainty and correct errors in the initial pose estimates.
Where Pith is reading between the lines
- If the joint latent space approach holds, it could extend to other tasks like simultaneous localization and mapping with multiple sensors.
- Transferring 2D priors to 3D attention suggests similar benefits in other cross-modal 3D tasks.
- Real-time performance in dynamic environments becomes feasible due to the feed-forward nature.
Load-bearing premise
Low-resolution superpoint features extracted by a sparse 3D CNN preserve enough absolute translation information to support accurate global pose prediction across all views simultaneously.
What would settle it
Measuring registration error on benchmark datasets after ablating the translation-preserving aspects of the superpoint encoding; a large increase in error would falsify the claim that these features suffice for direct global prediction.
Figures
read the original abstract
Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. Particularly, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further introduce FUSER-DF, an SE(3)$^N$ diffusion refinement framework to correct FUSER's estimates via denoising in the joint SE(3)$^N$ space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3)$^N$ variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet and ArkitScenes demonstrate that our approach achieves the superior registration accuracy and outstanding computational efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FUSER, the first feed-forward multiview 3D registration transformer that jointly encodes all input scans into a unified compact latent space of low-resolution superpoint features via per-scan sparse 3D CNNs, applies Geometric Alternating Attention (with transferred 2D foundation-model priors) for intra- and inter-scan reasoning, and directly regresses globally consistent SE(3) poses without any pairwise matching or pose-graph optimization. It further introduces FUSER-DF, an SE(3)^N diffusion refinement stage that treats the feed-forward output as a surrogate model and optimizes a prior-conditioned variational lower bound. Experiments on 3DMatch, ScanNet, and ArkitScenes are reported to demonstrate superior registration accuracy and computational efficiency.
Significance. If the central architectural claim holds and the low-resolution superpoint features indeed suffice for direct global regression, the work would constitute a meaningful advance by replacing expensive pairwise graph synchronization with a single forward pass, offering substantial gains in speed and scalability for large-scale 3D reconstruction. The combination of attention transfer from 2D models and joint SE(3)^N diffusion refinement is technically novel and could influence subsequent feed-forward 3D vision pipelines.
major comments (2)
- [Abstract and §3] Abstract and §3 (superpoint encoding): the central claim that independent per-scan sparse 3D CNNs on low-resolution superpoints preserve sufficient absolute translation cues for direct joint SE(3) regression is load-bearing for the 'no pairwise estimation' contribution, yet the manuscript provides no ablation on voxel resolution, no analysis of translation-signal retention after downsampling, and no comparison against a pairwise baseline that isolates this assumption.
- [§4] §4 (Geometric Alternating Attention): the description of intra- and inter-scan attention does not specify how the module enforces global consistency across all views in a single pass; without an explicit consistency loss or proof that the attention mechanism cannot produce inconsistent poses, the 'globally consistent' claim remains under-supported.
minor comments (2)
- [Abstract] Abstract: quantitative metrics (recall, RMSE, runtime) and ablation tables are referenced but not shown; adding at least the headline numbers would strengthen the summary.
- [§5] Notation: the SE(3)^N diffusion formulation uses an ad-hoc prior-conditioned variational bound; a brief derivation or reference to the exact ELBO terms would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (superpoint encoding): the central claim that independent per-scan sparse 3D CNNs on low-resolution superpoints preserve sufficient absolute translation cues for direct joint SE(3) regression is load-bearing for the 'no pairwise estimation' contribution, yet the manuscript provides no ablation on voxel resolution, no analysis of translation-signal retention after downsampling, and no comparison against a pairwise baseline that isolates this assumption.
Authors: We agree that empirical validation of this assumption is important. In the revised version, we will add an ablation study varying the voxel resolution and superpoint downsampling factor, measuring the impact on absolute translation accuracy. We will also include a comparison with a pairwise registration baseline that uses the same superpoint features but performs separate pairwise estimations followed by graph optimization. This will isolate the benefit of the joint feed-forward approach. The sparse 3D CNN preserves translation cues because it operates directly on the input coordinates without relative normalization, as the features retain positional encoding from the original scan positions. revision: yes
-
Referee: [§4] §4 (Geometric Alternating Attention): the description of intra- and inter-scan attention does not specify how the module enforces global consistency across all views in a single pass; without an explicit consistency loss or proof that the attention mechanism cannot produce inconsistent poses, the 'globally consistent' claim remains under-supported.
Authors: The joint processing in a single unified latent space allows the inter-scan attention to exchange information across all views simultaneously, leading to globally consistent pose predictions through the shared feature interactions. To better support this, we will revise §4 to include a detailed description of the attention flow and how it promotes consistency. Additionally, we will add empirical measurements of pose consistency (such as the maximum deviation in relative poses) in the experiments. We do not introduce an explicit consistency loss because the model is trained end-to-end with direct supervision on the global SE(3) poses from ground truth, which implicitly enforces consistency. revision: partial
Circularity Check
No significant circularity; architectural and empirical contribution with independent validation
full rationale
The paper introduces FUSER as a feed-forward transformer architecture that encodes multiview scans via sparse 3D CNN into superpoint features, applies Geometric Alternating Attention for joint pose prediction, and optionally refines via SE(3)^N diffusion. No equations or derivations reduce the claimed global registration performance to quantities defined by the model's own fitted parameters or self-referential inputs. The low-resolution feature preservation assumption is presented as an empirical design choice rather than a definitional tautology, and the diffusion stage is explicitly corrective. Validation on 3DMatch, ScanNet, and ArkitScenes provides external benchmarks. This is a standard non-circular finding for an architectural paper whose central claims rest on implementation and testing rather than closed-form reduction to inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
4-points congruent sets for robust pairwise surface registration
Dror Aiger, Niloy J Mitra, and Daniel Cohen-Or. 4-points congruent sets for robust pairwise surface registration. In SIGGRAPH. 2008. 2
work page 2008
-
[2]
Buffer: Balancing accuracy, efficiency, and generaliz- ability in point cloud registration
Sheng Ao, Qingyong Hu, Hanyun Wang, Kai Xu, and Yulan Guo. Buffer: Balancing accuracy, efficiency, and generaliz- ability in point cloud registration. InCVPR, 2023. 2
work page 2023
-
[3]
Spectral synchronization of multiple views in se (3).SIAM Journal on Imaging Sciences, 2016
Federica Arrigoni, Beatrice Rossi, and Andrea Fusiello. Spectral synchronization of multiple views in se (3).SIAM Journal on Imaging Sciences, 2016. 1, 2, 3, 6, 7
work page 2016
-
[4]
A survey of augmented reality.Presence: teleoperators & virtual environments, 1997
Ronald T Azuma. A survey of augmented reality.Presence: teleoperators & virtual environments, 1997. 1
work page 1997
-
[5]
D3feat: Joint learning of dense detec- tion and description of 3d local features
Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, and Chiew-Lan Tai. D3feat: Joint learning of dense detec- tion and description of 3d local features. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6359–6367, 2020. 2
work page 2020
-
[6]
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data
Gilad Baruch, Zhuoyuan Chen, Afshin Dehghan, Tal Dimry, Yuri Feigin, Peter Fu, Thomas Gebauer, Brandon Joffe, Daniel Kurz, Arik Schwartz, et al. Arkitscenes: A diverse real-world dataset for 3d indoor scene understanding using mobile rgb-d data.arXiv preprint arXiv:2111.08897, 2021. 6, 7, 8
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[7]
Method for registration of 3-D shapes
Paul J Besl and Neil D McKay. Method for registration of 3-D shapes. InSensor fusion IV: control paradigms and data structures, pages 586–606, 1992. 2
work page 1992
-
[8]
Bayesian pose graph optimization via bingham distributions and tempered geodesic mcmc.NeurIPS, 2018
Tolga Birdal, Umut Simsekli, Mustafa Onur Eken, and Slo- bodan Ilic. Bayesian pose graph optimization via bingham distributions and tempered geodesic mcmc.NeurIPS, 2018. 2
work page 2018
-
[9]
Avishek Chatterjee and Venu Madhav Govindu. Robust rel- ative rotation averaging.IEEE transactions on pattern anal- ysis and machine intelligence, 2017. 1, 3, 6, 7
work page 2017
-
[10]
Sira-pcr: Sim-to-real adaptation for 3d point cloud registration
Suyi Chen, Hao Xu, Ru Li, Guanghui Liu, Chi-Wing Fu, and Shuaicheng Liu. Sira-pcr: Sim-to-real adaptation for 3d point cloud registration. InICCV, 2023. 2
work page 2023
-
[11]
In- cremental multiview point cloud registration.arXiv preprint arXiv:2407.05021, 2024
Xiaoya Cheng, Yu Liu, Maojun Zhang, and Shen Yan. In- cremental multiview point cloud registration.arXiv preprint arXiv:2407.05021, 2024. 2, 3, 6, 7
-
[12]
The trimmed iterative closest point algorithm
Dmitry Chetverikov, Dmitry Svirko, Dmitry Stepanov, and Pavel Krsek. The trimmed iterative closest point algorithm. InObject recognition supported by user interaction for ser- vice robots, 2002. 2
work page 2002
-
[13]
Robust reconstruction of indoor scenes
Sungjoon Choi, Qian-Yi Zhou, and Vladlen Koltun. Robust reconstruction of indoor scenes. InCVPR, 2015. 2, 3
work page 2015
-
[14]
4d spatio-temporal convnets: Minkowski convolutional neural networks
Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4d spatio-temporal convnets: Minkowski convolutional neural networks. InCVPR, 2019. 2, 4
work page 2019
-
[15]
Fully convolutional geometric features
Christopher Choy, Jaesik Park, and Vladlen Koltun. Fully convolutional geometric features. InICCV, 2019. 2, 4, 7
work page 2019
-
[16]
Christopher Choy, Wei Dong, and Vladlen Koltun. Deep global registration. InCVPR, 2020. 2, 6, 12
work page 2020
-
[17]
Parallel, real-time visual slam
Brian Clipp, Jongwoo Lim, Jan-Michael Frahm, and Marc Pollefeys. Parallel, real-time visual slam. InIROS, 2010. 2
work page 2010
-
[18]
Scannet: Richly-annotated 3d reconstructions of indoor scenes
Angela Dai, Angel X Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In CVPR, 2017. 6, 7, 8, 12
work page 2017
-
[19]
Angela Dai, Matthias Nießner, Michael Zollh ¨ofer, Shahram Izadi, and Christian Theobalt. Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration.ACM Transactions on Graphics, 2017. 1
work page 2017
-
[20]
FlashAttention-2: Faster attention with better par- allelism and work partitioning
Tri Dao. FlashAttention-2: Faster attention with better par- allelism and work partitioning. InInternational Conference on Learning Representations (ICLR), 2024. 8
work page 2024
-
[21]
Fu, Stefano Ermon, Atri Rudra, and Christopher R´e
Tri Dao, Daniel Y . Fu, Stefano Ermon, Atri Rudra, and Christopher R´e. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. InAdvances in Neural Information Processing Systems (NeurIPS), 2022. 8
work page 2022
-
[22]
Zhen Dong, Bisheng Yang, Fuxun Liang, Ronggang Huang, and Sebastian Scherer. Hierarchical registration of unordered tls point clouds based on binary shape context descriptor.IS- PRS Journal of Photogrammetry and Remote Sensing, 2018. 2
work page 2018
-
[23]
Model globally, match locally: Efficient and robust 3D object recognition
Bertram Drost, Markus Ulrich, Nassir Navab, and Slobodan Ilic. Model globally, match locally: Efficient and robust 3D object recognition. InCVPR, 2010. 2
work page 2010
-
[24]
Robust point cloud registration framework based on deep graph matching
Kexue Fu, Shaolei Liu, Xiaoyuan Luo, and Manning Wang. Robust point cloud registration framework based on deep graph matching. InCVPR, 2021. 2
work page 2021
-
[25]
Learning multiview 3d point cloud regis- tration
Zan Gojcic, Caifa Zhou, Jan D Wegner, Leonidas J Guibas, and Tolga Birdal. Learning multiview 3d point cloud regis- tration. InCVPR, 2020. 1, 2, 3, 6, 7, 12
work page 2020
-
[26]
Lie-algebraic averaging for globally consistent motion estimation
Venu Madhav Govindu. Lie-algebraic averaging for globally consistent motion estimation. InCVPR, 2004. 2, 3, 6
work page 2004
-
[27]
A tutorial on graph-based slam.IEEE Intelligent Transportation Systems Magazine, 2011
Giorgio Grisetti, Rainer K ¨ummerle, Cyrill Stachniss, and Wolfram Burgard. A tutorial on graph-based slam.IEEE Intelligent Transportation Systems Magazine, 2011. 2
work page 2011
-
[28]
Card: Classification and regression diffusion models.NeurIPS,
Xizewen Han, Huangjie Zheng, and Mingyuan Zhou. Card: Classification and regression diffusion models.NeurIPS,
-
[29]
L1 rotation averaging using the weiszfeld algorithm
Richard Hartley, Khurrum Aftab, and Jochen Trumpf. L1 rotation averaging using the weiszfeld algorithm. InCVPR,
-
[30]
Yiheng Hu, Binghao Li, Chengpei Xu, Sarp Saydam, and Wenjie Zhang. Featsync: 3d point cloud multiview registra- tion with attention feature-based refinement.Neurocomput- ing, 2024. 2, 3
work page 2024
-
[31]
Predator: Registration of 3d point clouds with low overlap
Shengyu Huang, Zan Gojcic, Mikhail Usvyatsov, Andreas Wieser, and Konrad Schindler. Predator: Registration of 3d point clouds with low overlap. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4267–4276, 2021. 1, 2, 3, 7
work page 2021
-
[32]
Translation synchronization via truncated least squares.NeurIPS, 2017
Xiangru Huang, Zhenxiao Liang, Chandrajit Bajaj, and Qix- ing Huang. Translation synchronization via truncated least squares.NeurIPS, 2017. 2, 3
work page 2017
-
[33]
Learning transfor- mation synchronization
Xiangru Huang, Zhenxiao Liang, Xiaowei Zhou, Yao Xie, Leonidas J Guibas, and Qixing Huang. Learning transfor- mation synchronization. InCVPR, 2019. 2, 3
work page 2019
-
[34]
Haobo Jiang, Mathieu Salzmann, Zheng Dang, Jin Xie, and Jian Yang. Se (3) diffusion model-based point cloud regis- tration for robust 6d object pose estimation.NeurIPS, 2023. 3, 5, 6
work page 2023
-
[35]
Multiway point cloud mosaicking with diffusion and global optimization
Shengze Jin, Iro Armeni, Marc Pollefeys, and Daniel Barath. Multiway point cloud mosaicking with diffusion and global optimization. InCVPR, 2024. 2, 3, 6
work page 2024
-
[36]
Andrew E Johnson and Martial Hebert. Using spin images for efficient object recognition in cluttered 3d scenes.IEEE Transactions on pattern analysis and machine intelligence,
-
[37]
Pyojin Kim, Jungha Kim, Minkyeong Song, Yeoeun Lee, Moonkyeong Jung, and Hyeong-Geun Kim. A benchmark comparison of four off-the-shelf proprietary visual–inertial odometry systems.Sensors, 2022. 1
work page 2022
-
[38]
g2o: A general framework for graph optimization
Rainer K ¨ummerle, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. g2o: A general framework for graph optimization. InICRA, 2011. 2
work page 2011
-
[39]
Hara: A hierarchical ap- proach for robust rotation averaging
Seong Hun Lee and Javier Civera. Hara: A hierarchical ap- proach for robust rotation averaging. InCVPR, 2022. 2, 3, 6, 7
work page 2022
-
[40]
Jiahao Li, Changhao Zhang, Ziyao Xu, Hangning Zhou, and Chi Zhang. Iterative distance-aware similarity matrix convo- lution with mutual-supervised point elimination for efficient point cloud registration. InECCV, 2020. 2
work page 2020
-
[41]
Shiqi Li, Jihua Zhu, Yifan Xie, Naiwen Hu, and Di Wang. Matching distance and geometric distribution aided learning multiview point cloud registration.IEEE Robotics and Au- tomation Letters, 2024. 2, 3, 6, 7
work page 2024
-
[42]
Lepard: Learning partial point cloud matching in rigid and deformable scenes
Yang Li and Tatsuya Harada. Lepard: Learning partial point cloud matching in rigid and deformable scenes. InCVPR,
-
[43]
Kinectfusion: Real-time dense surface mapping and track- ing
Richard A Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J Davison, Pushmeet Kohi, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. Kinectfusion: Real-time dense surface mapping and track- ing. InIEEE international symposium on mixed and aug- mented reality, 2011. 1
work page 2011
-
[44]
Improved denoising diffusion probabilistic models
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. InICML, 2021. 3
work page 2021
-
[45]
Geometric transformer for fast and robust point cloud registration
Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, and Kai Xu. Geometric transformer for fast and robust point cloud registration. InCVPR, 2022. 1, 2, 3, 4, 7, 8
work page 2022
-
[46]
Aligning point cloud views using persistent feature histograms
Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, and Michael Beetz. Aligning point cloud views using persistent feature histograms. InIROS, 2008. 2
work page 2008
-
[47]
Fast point feature histograms (fpfh) for 3d registration
Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast point feature histograms (fpfh) for 3d registration. InICRA,
-
[48]
Samuele Salti, Federico Tombari, and Luigi Di Stefano. Shot: Unique signatures of histograms for surface and tex- ture description.Computer Vision and Image Understand- ing, 2014. 2
work page 2014
-
[49]
Habitat: A plat- form for embodied ai research
Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. Habitat: A plat- form for embodied ai research. InICCV, 2019. 1
work page 2019
-
[50]
Habitat 2.0: Training home assistants to rearrange their habitat.NeurIPS, 2021
Andrew Szot, Alexander Clegg, Eric Undersander, Erik Wi- jmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Singh Chaplot, Oleksandr Maksymets, et al. Habitat 2.0: Training home assistants to rearrange their habitat.NeurIPS, 2021. 1
work page 2021
-
[51]
Kpconv: Flexible and deformable convolution for point clouds
Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Franc ¸ois Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. InICCV, 2019. 2, 4
work page 2019
-
[52]
Unique shape context for 3d data description
Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique shape context for 3d data description. InProceed- ings of the ACM workshop on 3D object retrieval, 2010. 2
work page 2010
-
[53]
Haiping Wang, Yuan Liu, Zhen Dong, Wenping Wang, and Bisheng Yang. You only hypothesize once: Point cloud reg- istration with rotation-equivariant descriptors.arXiv preprint arXiv:2109.00182, 2021. 6, 7
-
[54]
Haiping Wang, Yuan Liu, Zhen Dong, Yulan Guo, Yu-Shen Liu, Wenping Wang, and Bisheng Yang. Robust multiview point cloud registration with reliable pose graph initialization and history reweighting. InCVPR, 2023. 2, 3, 6, 7, 8, 12
work page 2023
-
[55]
Vggt: Vi- sual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Vi- sual geometry grounded transformer. InCVPR, 2025. 4
work page 2025
-
[56]
Lanhui Wang and Amit Singer. Exact and stable recovery of rotations for robust synchronization.Information and Infer- ence: A Journal of the IMA, 2013. 2, 3
work page 2013
-
[57]
Zero-shot point cloud registration.arXiv preprint arXiv:2312.03032, 2023
Weijie Wang, Guofeng Mei, Bin Ren, Xiaoshui Huang, Fabio Poiesi, Luc Van Gool, Nicu Sebe, and Bruno Lepri. Zero-shot point cloud registration.arXiv preprint arXiv:2312.03032, 2023. 2
-
[58]
Deep closest point: Learn- ing representations for point cloud registration
Yue Wang and Justin M Solomon. Deep closest point: Learn- ing representations for point cloud registration. InICCV,
-
[59]
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chun- hua Shen, and Tong He. pi3: Permutation-equivariant visual geometry learning.arXiv preprint arXiv:2507.13347, 2025. 2, 4, 8
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[60]
Pare-net: Position-aware rotation- equivariant networks for robust point cloud registration
Runzhao Yao, Shaoyi Du, Wenting Cui, Canhui Tang, and Chengwu Yang. Pare-net: Position-aware rotation- equivariant networks for robust point cloud registration. In ECCV, 2024. 7, 8
work page 2024
-
[61]
Scannet++: A high-fidelity dataset of 3d indoor scenes
Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. InICCV, 2023. 6, 8
work page 2023
-
[62]
Rpm-net: Robust point matching using learned features
Zi Jian Yew and Gim Hee Lee. Rpm-net: Robust point matching using learned features. InCVPR, 2020. 3
work page 2020
-
[63]
Learning iterative robust transformation synchronization
Zi Jian Yew and Gim Hee Lee. Learning iterative robust transformation synchronization. In3DV, 2021. 2, 3, 6, 7, 12
work page 2021
-
[64]
Regtr: End-to-end point cloud correspondences with transformers
Zi Jian Yew and Gim Hee Lee. Regtr: End-to-end point cloud correspondences with transformers. InCVPR, 2022. 3
work page 2022
-
[65]
Rotation-invariant transformer for point cloud matching
Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, and Slobodan Ilic. Rotation-invariant transformer for point cloud matching. InCVPR, 2023. 1, 2, 3, 4
work page 2023
-
[66]
3dmatch: Learning local geometric descriptors from rgb-d reconstruc- tions
Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 3dmatch: Learning local geometric descriptors from rgb-d reconstruc- tions. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1802–1811, 2017. 1, 2, 3, 6, 8, 12
work page 2017
-
[67]
Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Fast global registration. InECCV, 2016. 2 Supplementary Material
work page 2016
-
[68]
Evaluation Metrics We evaluate multiview registration accuracy by comparing predicted relative poses ˆRij, ˆtij with ground truthR ij,t ij. For ScanNet [18], following [25, 54, 63], we report the empirical cumulative distribution function (ECDF) of rotation/translation errors: REij = arccos Tr ˆR⊤ ijRij −1 2 ,TE ij =∥ ˆtij −t ij∥2 (14) For 3DMatch [66], f...
-
[69]
pθ(T0:T 1:N | S, ˆT1:N) q(T1:T 1:N |T 0 1:N , ˆT1:N) # (I) ≥E T1:T 1:N ∼q
Variational Lower Bound Derivation for Prior-aware SE(3)N Diffusion Refinement Model The objective is to find a tractable lower bound on the marginal log-likelihood of the ground-truth transformationsT 0 1:N given the dataS={S 1,S 2, ...,S N }and the prior transformations ˆT1:N = ( ˆT1, ˆT2, ..., ˆTN)predicted by the FUSER. We introduce the set of latent ...
-
[70]
Network architecture of absolute geometric encoder
Model Architecture of Absolute Geometric Encoder 3D Coordinate Input 3D Conv 5×5×5,1,32 3D Conv 5x5x5, 2, 32 ResBlock, 32 3D Conv 3x3x3, 2, 64 ResBlock, 64 3D Conv 3x3x3, 2, 128 ResBlock, 128 3D Conv 3x3x3, 2, 256 ResBlock, 256 3D Conv 3x3x3, 2, 1024 ResBlock, 1024 Superpoint Features Figure 6. Network architecture of absolute geometric encoder
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.