Joint Target-Less Intrinsic and Extrinsic Camera-LiDAR Calibration using Deep Point Correspondences
Pith reviewed 2026-05-25 04:52 UTC · model grok-4.3
The pith
A target-less method jointly recovers camera intrinsics and camera-LiDAR extrinsics from raw images via deep point correspondences.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that extending deep correspondence calibration to unknown intrinsics—via SfM initialization, a matcher tolerant of raw distorted images, and tight coupling inside joint nonlinear optimization—yields the first fully target-less pipeline for simultaneous intrinsic and extrinsic camera-LiDAR calibration, with measurable gains in extrinsic accuracy on unseen pairs.
What carries the argument
Deep pixel-point correspondences extended to raw images and coupled inside joint nonlinear optimization over intrinsics and extrinsics.
If this is right
- Extrinsic accuracy improves when intrinsics are optimized at the same time rather than held fixed.
- Accurate pinhole intrinsics with radial-tangential distortion are recovered as a direct output.
- Calibration succeeds on unrectified raw images without a separate intrinsic step.
- The pipeline works for previously unseen camera-LiDAR pairings after SfM initialization.
Where Pith is reading between the lines
- If the matcher remains stable across lighting changes, the same pipeline could support periodic recalibration during long-term robot operation.
- The approach might transfer to other sensor combinations provided a comparable deep correspondence model exists for them.
- Failure of the SfM stage on texture-poor scenes would block the entire joint optimization regardless of later steps.
Load-bearing premise
The deep correspondence network must produce usable matches on images whose focal length and distortion parameters are still unknown.
What would settle it
Running the full pipeline on KITTI sequences whose initial intrinsics are deliberately offset by 20 percent focal length and noticeable distortion, then checking whether the final extrinsic error fails to beat the extrinsic-only baseline, would falsify the joint approach.
Figures
read the original abstract
Accurate camera-LiDAR calibration is a prerequisite for robust multi-modal perception in robotics. Recent target-less approaches based on deep point correspondences achieve remarkable performance for extrinsic calibration but assume rectified images with known intrinsics. In this work, we overcome this limitation and present the first fully target-less pipeline that jointly estimates camera intrinsics (pinhole model with radial-tangential distortion) and camera-LiDAR extrinsics with deep pixel-point correspondences. Our approach extends deep correspondence-based calibration by (i) automatic intrinsic initialization via structure-from-motion, (ii) generalizing camera-LiDAR matching to raw images with unknown intrinsics including distortion, and (iii) tightly coupling correspondence estimation with joint nonlinear optimization over both intrinsics and extrinsics. We evaluate our method on the KITTI dataset with unseen camera-LiDAR pairs and demonstrate that joint calibration achieves improved extrinsic accuracy while additionally recovering accurate intrinsics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first fully target-less pipeline for jointly estimating camera intrinsics (pinhole model with radial-tangential distortion) and camera-LiDAR extrinsics using deep pixel-point correspondences. It uses structure-from-motion for automatic intrinsic initialization, generalizes the matching to raw images with unknown intrinsics, and tightly couples correspondence estimation with joint nonlinear optimization. Evaluation on the KITTI dataset with unseen camera-LiDAR pairs shows improved extrinsic accuracy and recovery of accurate intrinsics.
Significance. If the results hold, this approach would be significant for multi-modal perception in robotics by enabling calibration without targets or pre-known intrinsics. The joint optimization and use of deep correspondences represent a practical advancement over previous methods that assume rectified images with known intrinsics. The ability to handle raw data could facilitate easier deployment in real-world scenarios.
major comments (2)
- [Abstract] Abstract: The abstract claims 'improved extrinsic accuracy' on KITTI but provides no quantitative metrics, baselines, or error analysis. This makes it impossible to assess whether the joint method delivers a load-bearing improvement over prior extrinsic-only approaches.
- [Method] Method section: The central claim requires that the deep correspondence network (developed for rectified images) produces reliable matches on raw images whose distortion parameters are unknown and jointly estimated. No ablation or diagnostic is shown for correspondence accuracy or bias under radial-tangential distortion; if the feature extractor is sensitive, the subsequent nonlinear optimization can converge to a local minimum whose extrinsic error exceeds the claimed gain.
minor comments (2)
- [Experiments] Experiments: Define the exact KITTI sequences and the protocol for 'unseen camera-LiDAR pairs' to support reproducibility.
- Notation: Ensure all variables in the joint optimization objective (intrinsics, extrinsics, correspondence weights) are explicitly defined before first use.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to strengthen the presentation of results and add requested diagnostics.
read point-by-point responses
-
Referee: [Abstract] Abstract: The abstract claims 'improved extrinsic accuracy' on KITTI but provides no quantitative metrics, baselines, or error analysis. This makes it impossible to assess whether the joint method delivers a load-bearing improvement over prior extrinsic-only approaches.
Authors: We agree that the abstract should include quantitative support for the claim of improved extrinsic accuracy. In the revised version we will expand the abstract to report specific translation/rotation errors, the baselines used, and the magnitude of improvement on the KITTI unseen-pair experiments. revision: yes
-
Referee: [Method] Method section: The central claim requires that the deep correspondence network (developed for rectified images) produces reliable matches on raw images whose distortion parameters are unknown and jointly estimated. No ablation or diagnostic is shown for correspondence accuracy or bias under radial-tangential distortion; if the feature extractor is sensitive, the subsequent nonlinear optimization can converge to a local minimum whose extrinsic error exceeds the claimed gain.
Authors: The referee correctly notes the absence of a direct ablation on correspondence accuracy under unknown radial-tangential distortion. While the end-to-end KITTI results and the SfM initialization plus joint optimization provide indirect evidence of robustness, we acknowledge that an explicit diagnostic (e.g., matching precision/recall before and after distortion correction, or failure-case analysis) is missing. We will add this ablation study to the revised manuscript. revision: yes
Circularity Check
No circularity: joint optimization and SfM initialization are independent extensions
full rationale
The paper extends prior deep correspondence methods with SfM-based intrinsic initialization and joint nonlinear optimization over intrinsics plus extrinsics. No equation or claim reduces a derived quantity to a fitted parameter by construction, nor does any load-bearing premise rest on a self-citation chain that itself lacks external verification. The evaluation on unseen KITTI pairs supplies an external benchmark, keeping the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Pinhole camera model with radial-tangential distortion accurately represents the imaging process for the evaluated cameras.
- domain assumption Structure-from-motion provides a reliable initial estimate of intrinsics for subsequent joint optimization.
Reference graph
Works this paper leans on
-
[1]
Bevcar: Camera-radar fusion for bev map and object segmentation,
J. Schramm, N. V ¨odisch, K. Petek, B. R. Kiran, S. Yogamani, W. Burgard, and A. Valada, “Bevcar: Camera-radar fusion for bev map and object segmentation,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024, pp. 1435–1442
work page 2024
-
[2]
Convoluted mixture of deep experts for robust semantic segmentation,
A. Valada, A. Dhall, and W. Burgard, “Convoluted mixture of deep experts for robust semantic segmentation,” inIEEE/RSJ International conference on intelligent robots and systems (IROS) workshop, state estimation and terrain perception for all terrain mobile robots, vol. 2, 2016, p. 1
work page 2016
-
[3]
Towards robust semantic segmentation using deep fusion,
A. Valada, G. Oliveira, T. Brox, and W. Burgard, “Towards robust semantic segmentation using deep fusion,” inRobotics: Science and systems (RSS 2016) workshop, are the sceptics right? Limits and potentials of deep learning in robotics, vol. 114, 2016
work page 2016
-
[4]
Real-time multi-modal semantic fusion on unmanned aerial vehicles,
S. Bultmann, J. Quenzel, and S. Behnke, “Real-time multi-modal semantic fusion on unmanned aerial vehicles,” inEuropean Conference on Mobile Robots (ECMR), 2021
work page 2021
-
[5]
Up-fuse: Uncertainty-guided lidar-camera fusion for 3d panoptic segmentation,
R. Mohan, F. Drews, Y . Miron, D. Cattaneo, and A. Valada, “Up-fuse: Uncertainty-guided lidar-camera fusion for 3d panoptic segmentation,” Robotics: Science and Systems (RSS), 2026
work page 2026
-
[6]
Extending Kalibr: Calibrating the extrinsics of multiple IMUs and of individual axes,
J. Rehder, J. Nikolic, T. Schneider, T. Hinzmann, and R. Siegwart, “Extending Kalibr: Calibrating the extrinsics of multiple IMUs and of individual axes,” inIEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 4304–4311
work page 2016
-
[7]
Joint camera intrinsic and LiDAR-camera extrinsic calibration,
G. Yan, F. He, C. Shi, P. Wei, X. Cai, and Y . Li, “Joint camera intrinsic and LiDAR-camera extrinsic calibration,” in2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 11 446– 11 452
work page 2023
-
[8]
Joint intrinsic and extrinsic calibration of perception systems utilizing a calibration environment,
L. Wiesmann, T. L ¨abe, L. Nunes, J. Behley, and C. Stachniss, “Joint intrinsic and extrinsic calibration of perception systems utilizing a calibration environment,”IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 9103–9110, 2024
work page 2024
-
[9]
Online marker-free extrinsic camera calibration using person keypoint detections,
B. P ¨atzold, S. Bultmann, and S. Behnke, “Online marker-free extrinsic camera calibration using person keypoint detections,” in44th DAGM German Conference on Pattern Recognition (GCPR), 2022, pp. 300– 316
work page 2022
-
[10]
L. Li, H. Li, X. Liu, D. He, Z. Miao, F. Kong, R. Li, Z. Liu, and F. Zhang, “Joint intrinsic and extrinsic LiDAR-camera calibration in targetless environments using plane-constrained bundle adjustment,” arXiv preprint arXiv: 2308.12629, 2023
-
[11]
Cmrnet++: Map and camera agnostic monocular visual localization in lidar maps,
D. Cattaneo, D. G. Sorrenti, and A. Valada, “Cmrnet++: Map and camera agnostic monocular visual localization in lidar maps,”IEEE International Conference on Robotics and Automation (ICRA) Workshop on Emerging Learning and Algorithmic Methods for Data Association in Robotics, 2020
work page 2020
-
[12]
CMRNext: Camera to LiDAR matching in the wild for localization and extrinsic calibration,
D. Cattaneo and A. Valada, “CMRNext: Camera to LiDAR matching in the wild for localization and extrinsic calibration,”IEEE Transactions on Robotics, vol. 41, pp. 1995–2013, 2025
work page 1995
-
[13]
Ralf: Flow-based global and metric radar localization in lidar maps,
A. Nayak, D. Cattaneo, and A. Valada, “Ralf: Flow-based global and metric radar localization in lidar maps,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 5097–5103
work page 2024
-
[14]
MINIMA: Modality invariant image matching,
J. Ren, X. Jiang, Z. Li, D. Liang, X. Zhou, and X. Bai, “MINIMA: Modality invariant image matching,” in2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 23 059– 23 068
work page 2025
-
[15]
MATCHA: Towards matching anything,
F. Xue, S. Elflein, L. Leal-Taix ´e, and Q. Zhou, “MATCHA: Towards matching anything,” in2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 27 081–27 091
work page 2025
-
[16]
I2D-Loc++: Camera pose tracking in LiDAR maps with multi-view motion flows,
H. Yu, K. Chen, W. Yang, S. Scherer, and G.-S. Xia, “I2D-Loc++: Camera pose tracking in LiDAR maps with multi-view motion flows,” IEEE Robotics and Automation Letters, vol. 9, no. 9, pp. 8162–8169, 2024
work page 2024
-
[17]
Automatic target-less camera-LiDAR calibration from motion and deep point correspondences,
K. Petek, N. V¨odisch, J. Meyer, D. Cattaneo, A. Valada, and W. Burgard, “Automatic target-less camera-LiDAR calibration from motion and deep point correspondences,”IEEE Robotics and Automation Letters, vol. 9, no. 11, pp. 9978–9985, 2024
work page 2024
-
[18]
S. Agarwal, K. Mierle, and T. C. S. Team, “Ceres Solver,” 10 2023. [Online]. Available: https://github.com/ceres-solver/ceres-solver
work page 2023
-
[19]
Structure-from-motion revisited,
J. L. Sch ¨onberger and J.-M. Frahm, “Structure-from-motion revisited,” inConference on Computer Vision and Pattern Recognition (CVPR), 2016
work page 2016
-
[20]
On the maximum radius of polynomial lens distortion,
M. J. Leotta, D. Russell, and A. Matrai, “On the maximum radius of polynomial lens distortion,” in2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 2374–2382
work page 2022
-
[21]
Are we ready for autonomous driving? the kitti vision benchmark suite,
A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3354–3361
work page 2012
-
[22]
Argoverse: 3d tracking and forecasting with rich maps,
M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, “Argoverse: 3d tracking and forecasting with rich maps,” in2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8740–8749
work page 2019
-
[23]
Pandaset: Advanced sensor suite dataset for autonomous driving,
P. Xiao, Z. Shao, S. Hao, Z. Zhang, X. Chai, J. Jiao, Z. Li, J. Wu, K. Sun, K. Jiang, Y . Wang, and D. Yang, “Pandaset: Advanced sensor suite dataset for autonomous driving,” in2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 3095– 3101
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.