arxiv: 1907.00930 · v1 · pith:IYKBD4LZnew · submitted 2019-07-01 · 💻 cs.RO

A Joint Optimization Approach of LiDAR-Camera Fusion for Accurate Dense 3D Reconstructions

Pith reviewed 2026-05-25 11:37 UTC · model grok-4.3

classification 💻 cs.RO
keywords LiDAR-camera fusion · bundle adjustment · point cloud registration · 3D reconstruction · sensor calibration · dense mapping · extrinsic calibration

The pith

Jointly solving bundle adjustment and cloud registration fuses LiDAR and camera data into dense 3D models at 2.7 mm accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an offline fusion method can jointly optimize camera poses through bundle adjustment and sensor extrinsic calibration through cloud registration. This produces dense reconstructions by exploiting complementary properties of accurate sparse LiDAR ranges and high-resolution camera textures. A sympathetic reader would care because the approach yields models with 2.7 mm average accuracy and 70 points per square centimeter resolution when measured against survey scanner ground truth, while also improving calibration over prior techniques.

Core claim

The method jointly solves a bundle adjustment problem and a cloud registration problem to compute camera poses and the sensor extrinsic calibration, enabling the construction of dense, accurate 3D models from LiDAR and camera data that reach an averaged accuracy of 2.7 mm and a resolution of 70 points per square cm against survey scanner ground truth, with calibration results that outperform the state-of-the-art.

What carries the argument

Joint bundle adjustment and cloud registration optimization that locates correlations between LiDAR geometry and camera texture to refine poses and calibration simultaneously.
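
One way to picture the joint objective, using the symbols that appear in the paper's pipeline figure (landmark set L, camera observations O_c, LiDAR observations O_l, camera poses T_c, extrinsic transform T_e), is the sketch below. The residual forms, the robust loss \rho, the weight \lambda, the projection function \pi, the pixel measurement u_{ik}, and the frame conventions (T_{c,i} maps camera i to the world, T_e maps LiDAR to camera, n is a plane normal expressed in the world frame) are assumptions made for illustration and are not taken from the paper, whose exact formulation may differ.

```latex
\begin{aligned}
(T_c^{*},\, T_e^{*},\, L^{*}) &= \arg\min_{T_c,\, T_e,\, L}\; E_{\mathrm{BA}} + \lambda\, E_{\mathrm{reg}}, \\
E_{\mathrm{BA}} &= \sum_{(i,k) \in O_c} \rho\!\left( \left\| \pi\!\left(T_{c,i}^{-1}\, l_k\right) - u_{ik} \right\|^2 \right), \\
E_{\mathrm{reg}} &= \sum_{(i,j,p,q,n) \in O_l} \rho\!\left( \left( n^{\top} \left( T_{c,i}\, T_e\, p - T_{c,j}\, T_e\, q \right) \right)^2 \right).
\end{aligned}
```

The first term is a standard reprojection error over landmarks l_k; the second is a point-to-plane error between LiDAR points p and q observed from stations i and j. Because T_e appears only in the second term while T_c appears in both, the camera poses couple the two problems, which is what lets a single solve refine poses and calibration together.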

If this is right

  • The resulting models reach an averaged accuracy of 2.7 mm when compared to survey scanner ground truth.
  • Reconstructions achieve a resolution of 70 points per square cm.
  • The computed extrinsic calibration outperforms the state-of-the-art method.
  • Dense 3D models can be built offline without separate calibration or post-processing stages.
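
To give the resolution figure some physical intuition, the sketch below converts a surface density into an approximate point spacing. It assumes points are laid out on a uniform square grid, which is an idealisation introduced here and not something the paper states; the function name `mean_spacing_mm` is hypothetical.

```python
import math


def mean_spacing_mm(points_per_cm2: float) -> float:
    """Approximate point spacing in mm for a surface density in points/cm^2.

    Assumes a uniform square grid, so each point occupies a square cell
    of area 1 / density.
    """
    if points_per_cm2 <= 0:
        raise ValueError("density must be positive")
    spacing_cm = 1.0 / math.sqrt(points_per_cm2)
    return spacing_cm * 10.0  # convert cm to mm


if __name__ == "__main__":
    # Density reported in the abstract: 70 points per square cm.
    print(f"{mean_spacing_mm(70.0):.2f} mm")
```

Under that assumption, 70 points per square cm corresponds to a spacing of roughly 1.2 mm, which is finer than the reported 2.7 mm average accuracy.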

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The joint formulation could reduce reliance on separate calibration hardware or procedures in multi-sensor platforms.
  • If the optimization remains stable across varied environments, the method might support real-time extensions for robotic mapping.
  • Similar joint optimization could be tested on other sensor pairs where one provides sparse geometry and the other dense appearance.

Load-bearing premise

Reliable correlations between sparse geometric LiDAR data and dense textural camera data can be found and exploited inside the joint optimization without divergence.

What would settle it

The claim would be undermined if applying the method to fresh datasets, with reconstruction error measured against independent survey-scanner ground truth, yielded average deviations well above 2.7 mm.
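
A check of this kind can be sketched as a mean nearest-neighbour distance from the reconstructed cloud to the ground-truth cloud. This is a minimal brute-force illustration, not the paper's evaluation tool: the function names and the toy coordinates are hypothetical, both clouds are assumed to be already aligned in the same frame, and a real comparison over millions of points would use a spatial index such as a k-d tree.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distance(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its closest point in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def mean_cloud_error(reconstruction: Sequence[Point],
                     ground_truth: Sequence[Point]) -> float:
    """Average nearest-neighbour distance from reconstruction to ground truth."""
    if not reconstruction or not ground_truth:
        raise ValueError("both clouds must be non-empty")
    total = sum(nearest_distance(p, ground_truth) for p in reconstruction)
    return total / len(reconstruction)


# Toy data in millimetres (made up for illustration only).
ground_truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
reconstruction = [(0.0, 0.0, 3.0), (10.0, 0.0, -2.0), (0.0, 10.0, 1.0)]

error_mm = mean_cloud_error(reconstruction, ground_truth)
print(f"mean error: {error_mm:.1f} mm")
```

The toy clouds deviate by 3, 2 and 1 mm, so the mean error is 2.0 mm. On real data the same statistic would be compared with the 2.7 mm figure.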

Figures

Figures reproduced from arXiv: 1907.00930 by Jingfeng Liu, Sebastian Scherer, Weikun Zhen, Yaoyu Hu.

Figure 1: A customized LiDAR-stereo system is used to collect…
Figure 2: An illustration of inaccurate edge extraction.
Figure 3: Given the stereo images and LiDAR point clouds, we first extract and match features to prepare three sets of observations, namely the landmark set L, the camera observation set O_c and the LiDAR observation set O_l. The observations are then fed to the joint optimization block to estimate optimal camera poses T_c* and sensor extrinsic transform T_e*. Based on the latest estimation, the LiDAR observations…
Figure 3: A diagram of the proposed pipeline. In the observation extraction phase (front-end), SURF features are extracted and matched across all datasets to build the landmark set L and the camera observations O_c. On the other hand, point clouds are abstracted with BSC features, and roughly registered to find cloud transforms T_l. Then point-plane pairs are found to build the LiDAR observation set O_l. In the pose…
Figure 4: Left: An example of extracted BSC features (red) from a point cloud (grey). Middle: Registered point cloud map based on matched features. Right: Comparison of rough registration (top-right) and refined registration (bottom-right) in a zoomed-in window.
Figure 5: An example of refining the stereo depth. The outliers…
Figure 7: Built point cloud model of the T-shaped specimen.
Figure 8: Top: Estimated camera poses (numbered in the order of capture) and visual landmarks (blue points). We follow the convention to define camera frame z (blue) forward, y (green) downward. Bottom: Pose graph connections from images (blue) and point clouds (gray).
Figure 9: From top to bottom, the results of three tests are visualized: a squared pillar (top), a cylinder pillar (middle) and a bridge pillar (bottom). From left to right, we visualize the camera poses and landmarks (blue points), a sample of the image data, complete LiDAR point cloud, overlaid LiDAR and stereo point cloud, dense stereo point cloud.
Figure 11: Cutaway view of the overlaid LiDAR clouds (white)…
Figure 12: Changes of cost values w.r.t. perturbed extrinsic transform. From left to right, the three columns show the cost changes in three tests: with rotation fixed, with rotation about one axis, and with rotation about two axes. Within each test, translation (top plots) and rotation (bottom plots) perturbations are visualized separately.
Figure 13: Comparing the reconstructed models with the ground truth model built by the FARO scanner. On the left are visualizations of the ground truth model and the distance map of reconstructed models, where the color encodes the distance error between two point clouds. On the right are the distance histograms corresponding to each comparison and the averaged errors are marked by the red vertical bar.
Original abstract

Fusing data from LiDAR and camera is conceptually attractive because of their complementary properties. For instance, camera images are higher resolution and have colors, while LiDAR data provide more accurate range measurements and have a wider Field Of View (FOV). However, the sensor fusion problem remains challenging since it is difficult to find reliable correlations between data of very different characteristics (geometry vs. texture, sparse vs. dense). This paper proposes an offline LiDAR-camera fusion method to build dense, accurate 3D models. Specifically, our method jointly solves a bundle adjustment (BA) problem and a cloud registration problem to compute camera poses and the sensor extrinsic calibration. In experiments, we show that our method can achieve an averaged accuracy of 2.7mm and resolution of 70 points per square cm by comparing to the ground truth data from a survey scanner. Furthermore, the extrinsic calibration result is discussed and shown to outperform the state-of-the-art method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces an offline fusion method for LiDAR and camera data to construct dense and accurate 3D models. The approach jointly optimizes a bundle adjustment (BA) problem for estimating camera poses and a cloud registration problem to determine the extrinsic calibration between the LiDAR and camera. Experimental results show an average accuracy of 2.7 mm and a resolution of 70 points per square cm when validated against ground truth from a survey scanner, and the extrinsic calibration outperforms state-of-the-art methods.

Significance. If the reported accuracy and resolution hold under the joint optimization, the work would advance multi-sensor 3D reconstruction in robotics by providing a principled way to exploit complementary LiDAR geometry and camera texture without separate calibration pipelines. The independent survey-scanner validation and explicit formulation of cost terms linking projected points to image features add credibility to the quantitative claims.

minor comments (2)
  1. [Method] The balancing weights between the BA and registration terms are listed as free parameters; a brief sensitivity study or default values in the experiments section would clarify robustness.
  2. [Abstract] The abstract states the 2.7 mm / 70 pts/cm² figures but omits the number of scenes or total points compared; adding this would help readers assess the scope of the validation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on joint LiDAR-camera optimization for dense 3D reconstruction and for recommending minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper presents an empirical method for joint bundle adjustment and point-cloud registration whose accuracy (2.7 mm, 70 pts/cm²) is measured against an independent external survey-scanner ground truth. No equation, cost term, or claimed prediction is shown to be definitionally equivalent to its own fitted inputs or to a self-citation chain; the optimization formulation and convergence are described explicitly and the quantitative claims rest on external validation rather than internal re-labeling of the same data.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard domain assumptions from bundle adjustment and registration literature plus likely tunable weights in the joint cost function; no new entities are introduced.

free parameters (1)
  • balancing weights between BA and registration terms
    Joint optimization cost functions typically require hand-tuned or fitted scalar weights to combine the two objectives; these are not reported in the abstract.
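
The role of such weights can be illustrated with a small sketch. It assumes a plain weighted sum of squared residuals, with reprojection residuals in pixels and point-to-plane residuals in metres; the function `joint_cost`, the parameter names `w_ba` and `w_reg`, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def joint_cost(reproj_residuals_px: Sequence[float],
               plane_residuals_m: Sequence[float],
               w_ba: float = 1.0,
               w_reg: float = 1.0) -> float:
    """Weighted sum of squared BA residuals and squared registration residuals."""
    ba_term = sum(r * r for r in reproj_residuals_px)
    reg_term = sum(r * r for r in plane_residuals_m)
    return w_ba * ba_term + w_reg * reg_term


# Made-up residuals: sub-pixel reprojection errors and millimetre-level
# point-to-plane errors expressed in metres.
reproj = [0.5, -1.0, 0.25]
plane = [0.002, -0.001]

for w_reg in (1.0, 1e3, 1e6):
    print(w_reg, joint_cost(reproj, plane, w_reg=w_reg))
```

Because the two residual types carry different units, an unweighted sum lets the pixel term dominate almost entirely in this toy case; only a large `w_reg` gives the registration term comparable influence. This is the kind of sensitivity the referee's first minor comment asks the authors to document.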
axioms (2)
  • domain assumption: Reliable feature correspondences exist between camera images and LiDAR point clouds that can be used inside a single optimization.
    Invoked by the decision to jointly solve the two problems.
  • domain assumption: The LiDAR-camera extrinsic transform is constant and can be recovered by registration.
    Core premise of the calibration component.
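
The constant-extrinsic assumption can be made concrete with a short sketch: one fixed transform `T_e` is reused at every station, and only the camera pose changes. The convention used here (`T_e` maps LiDAR to camera, each camera pose maps camera to world), the pure-translation transforms, and all helper names are assumptions chosen for illustration, not the paper's notation or values.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(t: Matrix, p: Sequence[float]) -> Tuple[float, ...]:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * h[k] for k in range(4)) for i in range(3))


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Homogeneous transform with identity rotation (keeps the example short)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Hypothetical values: a 10 cm LiDAR-to-camera offset and two camera stations.
T_e = translation(0.1, 0.0, 0.0)
camera_poses = [translation(0.0, 0.0, 0.0), translation(1.0, 0.0, 0.0)]

lidar_point = (0.0, 0.0, 2.0)  # same point coordinates in each LiDAR frame
world_points = [transform_point(matmul4(T_c, T_e), lidar_point)
                for T_c in camera_poses]
print(world_points)
```

Since the same `T_e` enters every station's transform chain, an error in it shifts all LiDAR points consistently relative to the camera data, which is why it can be estimated from registration residuals across stations.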


