Joint Multi-Camera LiDAR Extrinsic Calibration via Learned Pairwise Initialization and Geometric Refinement
Pith reviewed 2026-06-28 22:30 UTC · model grok-4.3
The pith
Pairwise camera-LiDAR predictions refine into a globally consistent multi-camera calibration via joint bundle adjustment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a two-stage process converts separate pairwise extrinsic estimates into one globally consistent multi-camera calibration. A learned initialization is first run independently for each camera-LiDAR pair; a joint multi-frame bundle adjustment with reprojection-error, per-camera prior, and relative-pose prior terms then refines all extrinsics together.
What carries the argument
The multi-frame bundle adjustment that jointly optimizes all camera-LiDAR extrinsics from independent pairwise initializations by combining reprojection, per-camera prior, and relative-pose prior terms.
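The abstract names the three terms but does not give the objective. As a rough sketch only, a cost of this kind is often written in the following generic form. Every symbol here is an assumption rather than the paper's notation: T_c is the LiDAR-to-camera extrinsic of camera c, \hat{T}_c its pairwise initial estimate, X_{c,f,k} and u_{c,f,k} a matched LiDAR point and pixel in frame f, \pi_c the camera projection, \rho a robust loss, \bar{T}_{ab} a reference relative pose between cameras a and b, and \lambda_p, \lambda_r the prior weights.

```latex
\begin{aligned}
E(\{T_c\}) ={}& \sum_{c}\sum_{f}\sum_{k}
  \rho\!\left(\left\lVert \pi_c\!\left(T_c X_{c,f,k}\right) - u_{c,f,k} \right\rVert^2\right) \\
&+ \lambda_p \sum_{c} \left\lVert \operatorname{Log}\!\left(\hat{T}_c^{-1} T_c\right) \right\rVert^2
 + \lambda_r \sum_{(a,b)} \left\lVert \operatorname{Log}\!\left(\bar{T}_{ab}^{-1}\, T_a T_b^{-1}\right) \right\rVert^2
\end{aligned}
```

In this form the first two sums separate per camera; only the last sum ties different cameras together, so setting \lambda_r = 0 would reduce the problem to independent per-camera refinements.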
If this is right
- Per-camera translation error reaches 0.89 cm and rotation error reaches 0.038 (unit not stated in the abstract) on KITTI data.
- Translation error drops from 108.6 cm to 3.1 cm on the Walkley dataset.
- Inter-camera consistency improves beyond what independent pairwise processing achieves.
- The joint step supplies robustness when individual camera predictions are less reliable.
Where Pith is reading between the lines
- The same refinement pattern could extend to additional rigid sensors such as radar units on the same platform.
- Downstream multi-view fusion tasks may benefit from the enforced geometric consistency even if they do not rerun the calibration.
- The separation of learned initialization from geometric refinement suggests that similar two-stage pipelines could improve other sensor-calibration problems.
- Real-world deployment may gain from running the refinement only when pairwise predictions show high mutual disagreement.
Load-bearing premise
The bundle adjustment can find a globally consistent solution even when it starts from independent pairwise predictions that may contain errors.
What would settle it
A multi-camera dataset in which the final inter-camera relative poses after joint refinement deviate farther from ground truth than the original pairwise estimates.
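A test of this kind needs an inter-camera pose error that can be computed before and after refinement. The following is a minimal sketch in Python with NumPy, assuming each extrinsic is a 4x4 LiDAR-to-camera transform. The function names and the error convention (translation norm and rotation angle of the residual transform) are illustrative choices, not the paper's evaluation code.

```python
import numpy as np

def make_pose(rotvec, t):
    """Build a 4x4 rigid transform from an axis-angle vector (rad) and a translation (m)."""
    rotvec = np.asarray(rotvec, dtype=float)
    angle = np.linalg.norm(rotvec)
    T = np.eye(4)
    if angle > 0:
        k = rotvec / angle
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        # Rodrigues' rotation formula
        T[:3, :3] = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)
    T[:3, 3] = t
    return T

def relative_pose(T_a, T_b):
    """Camera-b-to-camera-a transform implied by two LiDAR-to-camera extrinsics."""
    return T_a @ np.linalg.inv(T_b)

def pose_error(T_est, T_gt):
    """Return (translation error in m, rotation error in deg) of T_est against T_gt."""
    D = np.linalg.inv(T_gt) @ T_est
    t_err = np.linalg.norm(D[:3, 3])
    cos = np.clip((np.trace(D[:3, :3]) - 1) / 2, -1.0, 1.0)
    return t_err, np.degrees(np.arccos(cos))

# Hypothetical two-camera rig (values are made up for illustration).
gt_a = make_pose([0, 0, 0], [0.30, 0, 0])
gt_b = make_pose([0, 0, 0], [-0.30, 0, 0])
est_a = make_pose([0, 0, np.radians(2.0)], [0.30, 0, 0])  # camera a off by 2 deg
est_b = gt_b

t_err, r_err = pose_error(relative_pose(est_a, est_b), relative_pose(gt_a, gt_b))
print(f"inter-camera error: {t_err * 100:.2f} cm, {r_err:.2f} deg")
```

Running the same comparison on the pairwise estimates and on the jointly refined ones, against the same ground truth, would show whether refinement moved the relative poses closer or farther.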
Original abstract
Most learning-based camera-LiDAR calibration methods treat each camera-LiDAR pair independently, ignoring the rigid geometric coupling in multi-camera platforms. As a result, per-camera estimates may be individually accurate yet inconsistent at the system level. We present a two-stage framework for joint multi-camera LiDAR extrinsic calibration that combines learned pairwise matching with geometric refinement. First, CMRNext is applied independently to each camera to produce initial extrinsic estimates and dense 2D-3D correspondences. These predictions are then jointly refined through a multi-frame bundle adjustment with reprojection, per-camera prior, and relative-pose prior terms. This approach converts pairwise predictions into a globally consistent multi-camera calibration. Experiments on KITTI (in-domain for CMRNext) and Walkley (out-of-domain) datasets show improved per-camera accuracy and inter-camera consistency. On KITTI, the method achieves 0.89 cm translation error and 0.038 rotation error. On Walkley, it reduces translation error from 108.6 cm to 3.1 cm, highlighting the benefit of explicit multi-camera coupling when single-camera predictions are less reliable.
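The reprojection term mentioned in the abstract can be illustrated with a small sketch. It assumes a pinhole camera with intrinsic matrix K, a 4x4 LiDAR-to-camera extrinsic, and points that lie in front of the camera; the function name and all numbers are hypothetical, and the paper's actual residual (for example its robust loss or distortion handling) is not described in the abstract.

```python
import numpy as np

def reprojection_residuals(T_cam_lidar, K, pts_lidar, pixels):
    """Pixel residuals (N x 2) for 2D-3D correspondences under a pinhole model.

    T_cam_lidar: 4x4 LiDAR-to-camera transform
    K:           3x3 camera intrinsics
    pts_lidar:   N x 3 LiDAR points (must project with positive depth)
    pixels:      N x 2 matched pixel coordinates
    """
    pts_h = np.hstack([pts_lidar, np.ones((len(pts_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # points in the camera frame
    uvw = (K @ pts_cam.T).T                      # homogeneous pixel coordinates
    proj = uvw[:, :2] / uvw[:, 2:3]              # perspective division
    return proj - pixels

# Hypothetical intrinsics and correspondences.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
pix = np.array([[320.0, 240.0], [420.0, 240.0]])

T_true = np.eye(4)
T_off = np.eye(4)
T_off[0, 3] = 0.1                                # 10 cm lateral extrinsic error

res_true = reprojection_residuals(T_true, K, pts, pix)
res_off = reprojection_residuals(T_off, K, pts, pix)
print("mean error, true extrinsic:", np.linalg.norm(res_true, axis=1).mean())
print("mean error, offset extrinsic:", np.linalg.norm(res_off, axis=1).mean())
```

A bundle adjustment of the kind described would stack such residuals over all cameras and frames and minimize them jointly with the prior terms.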
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a two-stage framework for joint multi-camera LiDAR extrinsic calibration. The first stage applies the CMRNext model independently to each camera-LiDAR pair to obtain initial extrinsics and dense 2D-3D correspondences. The second stage performs a multi-frame bundle adjustment incorporating reprojection error, per-camera prior, and relative-pose prior terms to refine the estimates into a globally consistent calibration. Experiments on the KITTI (in-domain) and Walkley (out-of-domain) datasets report improved per-camera accuracy and inter-camera consistency, with translation errors of 0.89 cm on KITTI and a reduction from 108.6 cm to 3.1 cm on Walkley.
Significance. If the central claim holds, the work is significant for addressing inconsistency across independent pairwise camera-LiDAR calibrations in multi-camera rigs, a practical issue in robotics and autonomous driving. The combination of learned pairwise initialization with geometric refinement, plus explicit evaluation on out-of-domain data where single-camera predictions degrade, is a strength. The approach demonstrates that explicit multi-camera coupling can yield large error reductions when independent estimates are unreliable.
major comments (2)
- [Abstract] Abstract: The central claim that the multi-frame bundle adjustment converts independent CMRNext pairwise predictions into globally consistent extrinsics rests on the relative-pose prior term, yet the abstract provides neither its equation, weighting schedule, nor an ablation removing this term. Without these, it is unclear whether the reported drop from 108.6 cm to 3.1 cm on Walkley reflects geometric coupling or prior regularization, which is load-bearing for the consistency claim.
- [Method] Method section (bundle adjustment formulation): The assumption that reprojection + per-camera prior + relative-pose prior terms can reliably resolve inconsistencies from independent initializations is untested in the reported experiments; an ablation on the relative-pose prior weight (or its removal) is needed to confirm that the joint optimization enforces consistency rather than averaging under strong priors, especially given the large initial errors on out-of-domain data.
minor comments (2)
- [Abstract] Abstract: The rotation error of 0.038 is reported without units (radians or degrees), which should be clarified for reproducibility.
- [Experiments] Experiments: Baseline comparisons and implementation details (e.g., exact CMRNext usage, optimization hyperparameters) are referenced but not fully detailed in the provided abstract; ensure these are expanded in the full manuscript for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the presentation of the relative-pose prior and its empirical validation.
- Referee: [Abstract] Abstract: The central claim that the multi-frame bundle adjustment converts independent CMRNext pairwise predictions into globally consistent extrinsics rests on the relative-pose prior term, yet the abstract provides neither its equation, weighting schedule, nor an ablation removing this term. Without these, it is unclear whether the reported drop from 108.6 cm to 3.1 cm on Walkley reflects geometric coupling or prior regularization, which is load-bearing for the consistency claim.
Authors: We agree that the abstract should more clearly signal the role of the relative-pose prior. In the revised manuscript we will expand the abstract sentence describing the bundle adjustment to explicitly name the relative-pose prior term and its purpose. Because abstracts have strict length limits, the full equation and weighting schedule will be added to Section 3.2 (Bundle Adjustment Formulation) with a forward reference from the abstract. We will also add the requested ablation (see response to the second comment). revision: yes
- Referee: [Method] Method section (bundle adjustment formulation): The assumption that reprojection + per-camera prior + relative-pose prior terms can reliably resolve inconsistencies from independent initializations is untested in the reported experiments; an ablation on the relative-pose prior weight (or its removal) is needed to confirm that the joint optimization enforces consistency rather than averaging under strong priors, especially given the large initial errors on out-of-domain data.
Authors: We acknowledge that the current experiments do not isolate the contribution of the relative-pose prior. In the revised manuscript we will add a new ablation table (Table X) that reports per-camera and inter-camera translation/rotation errors on both KITTI and Walkley when (i) the relative-pose prior weight is set to zero and (ii) the weight is varied over a range of values. This will directly test whether the observed consistency gains arise from geometric coupling rather than prior regularization alone. revision: yes
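The shape of such an ablation can be illustrated on a toy problem. The sketch below is not the paper's optimizer: it assumes a translation-only, two-camera setting in which each camera has one data-driven estimate (standing in for the reprojection and per-camera prior terms) and a single relative-offset prior with weight `w_rel`. All names and numbers are hypothetical.

```python
import numpy as np

def solve_coupled(m_a, m_b, d_ab, w_rel):
    """Least-squares translations for two cameras with a relative-offset prior.

    Minimizes |t_a - m_a|^2 + |t_b - m_b|^2 + w_rel * |(t_a - t_b) - d_ab|^2.
    """
    I, Z = np.eye(3), np.zeros((3, 3))
    s = np.sqrt(w_rel)
    A = np.vstack([
        np.hstack([I, Z]),           # data term for camera a
        np.hstack([Z, I]),           # data term for camera b
        np.hstack([s * I, -s * I]),  # relative-offset prior
    ])
    b = np.concatenate([m_a, m_b, s * d_ab])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]

# Hypothetical values in metres (not taken from the paper).
m_a = np.array([0.10, 0.0, 0.0])    # independent estimate, camera a
m_b = np.array([-0.50, 0.0, 0.0])   # independent estimate, camera b
d_ab = np.array([0.54, 0.0, 0.0])   # assumed relative-offset prior

def consistency_gap(w_rel):
    """Distance between the solved relative offset and the prior."""
    t_a, t_b = solve_coupled(m_a, m_b, d_ab, w_rel)
    return np.linalg.norm((t_a - t_b) - d_ab)

for w_rel in [0.0, 1.0, 10.0, 100.0]:
    print(f"w_rel={w_rel:6.1f}  gap={consistency_gap(w_rel) * 100:.3f} cm")
```

In this toy model the gap shrinks as 1 / (1 + 2 * w_rel) whether or not `d_ab` is correct, which is the referee's concern in miniature: a strong prior produces consistency by construction, so only a comparison against ground truth across weights can separate geometric coupling from regularization.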
Circularity Check
No circularity: pairwise learned initialization followed by independent geometric bundle adjustment.
The derivation chain consists of two distinct stages with no reduction to inputs by construction. CMRNext supplies independent per-pair initial extrinsics and correspondences; these are then refined by a standard multi-frame bundle adjustment whose objective (reprojection error + per-camera priors + relative-pose priors) is defined externally to the learned predictions and does not embed the target global consistency as a fitted parameter. No self-citation is invoked to justify uniqueness of the refinement step, no fitted quantity is relabeled as a prediction, and no ansatz is smuggled through prior work. The reported error reductions therefore arise from the interaction of the geometric terms with the data rather than from algebraic identity with the initialization.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The rigid geometric coupling between cameras in a multi-camera platform can be enforced through prior terms in bundle adjustment.