On RGB-TIR Stereo Calibration under Extreme Resolution Asymmetry
Pith reviewed 2026-05-20 19:37 UTC · model grok-4.3
The pith
Baseline-constrained bundle adjustment recovers accurate rig geometry for an RGB camera paired with a TIR camera that has roughly 625 times fewer pixels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework recovers a stereo baseline of 32.7 mm against a nominal 30 mm, with an overall reprojection error of 0.382 pixels. It does so by combining modality-specific patterns on an active OLED screen, a dedicated low-resolution corner detector, and a baseline-constrained bundle adjustment that enforces physically consistent rig geometry despite the degeneracy introduced by the planar calibration object.
What carries the argument
Baseline-constrained bundle adjustment that enforces physically consistent rig geometry under the planar-calibration-object degeneracy.
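One common way to impose such a constraint is a soft penalty appended to the reprojection residuals. The sketch below assumes that form; this page does not say whether the paper uses a soft penalty or a hard constraint, and the names `baseline_residual`, `stack_residuals`, the `weight` parameter, and the sample residuals are all hypothetical.

```python
import math

def baseline_residual(t_mm, nominal_mm, weight):
    """Penalty term: weighted gap between ||t|| and the nominal baseline."""
    length = math.sqrt(sum(c * c for c in t_mm))
    return weight * (length - nominal_mm)

def stack_residuals(reproj_residuals_px, t_mm, nominal_mm=30.0, weight=1.0):
    """Append the baseline penalty to the flat list of reprojection residuals.

    A least-squares solver minimising the sum of squares of this vector
    trades pixel error against deviation from the nominal baseline.
    The weight also absorbs the unit mismatch between pixels and millimetres.
    """
    penalty = baseline_residual(t_mm, nominal_mm, weight)
    return list(reproj_residuals_px) + [penalty]

# Hypothetical example: flattened (dx, dy) residuals for two corners and a
# translation whose length equals the 32.7 mm baseline reported on this page.
residuals = stack_residuals([0.2, -0.1, 0.3, 0.05], t_mm=(32.7, 0.0, 0.0))
cost = sum(r * r for r in residuals)
```

A larger `weight` pulls the solution toward the nominal 30 mm; a weight of zero reduces the problem to ordinary bundle adjustment.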
If this is right
- The calibrated system produces consistent TIR-to-RGB projections that support both constant-depth and per-pixel depth estimation.
- Validation on a building mock-up demonstrates suitability for multimodal energy performance assessment.
- The overall reprojection error stays at 0.382 pixels across the extreme resolution pair.
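The two projection modes named above differ only in where the depth value comes from. The following is a minimal sketch under strong assumptions: distortion-free pinhole cameras, parallel optical axes (identity rotation), and made-up intrinsics. The function name `tir_pixel_to_rgb` and every numeric value except the 30 mm nominal baseline are hypothetical.

```python
def tir_pixel_to_rgb(u_tir, v_tir, depth_mm, K_tir, K_rgb, t_mm):
    """Map one TIR pixel to RGB pixel coordinates for a given depth.

    K = (fx, fy, cx, cy) in pixels; t_mm is the shift applied when moving
    a 3D point from the TIR camera frame to the RGB camera frame.
    """
    fx_t, fy_t, cx_t, cy_t = K_tir
    fx_r, fy_r, cx_r, cy_r = K_rgb
    # Back-project the TIR pixel to a 3D point at the given depth.
    x = (u_tir - cx_t) / fx_t * depth_mm
    y = (v_tir - cy_t) / fy_t * depth_mm
    z = depth_mm
    # Move into the RGB camera frame (translation only in this sketch).
    x, y, z = x + t_mm[0], y + t_mm[1], z + t_mm[2]
    # Project with the RGB intrinsics.
    return fx_r * x / z + cx_r, fy_r * y / z + cy_r

K_TIR = (70.0, 70.0, 40.0, 31.0)          # hypothetical intrinsics
K_RGB = (1500.0, 1500.0, 1014.0, 760.0)   # hypothetical intrinsics
T_MM = (30.0, 0.0, 0.0)                   # nominal baseline along x

# Constant-depth mode: one depth for the whole frame.
u_c, v_c = tir_pixel_to_rgb(40.0, 31.0, 1500.0, K_TIR, K_RGB, T_MM)

# Per-pixel mode: a depth value looked up for each TIR pixel.
depth_map = {(40, 31): 3000.0}
u_p, v_p = tir_pixel_to_rgb(40.0, 31.0, depth_map[(40, 31)], K_TIR, K_RGB, T_MM)
```

With these placeholder numbers the same TIR pixel lands 30 px from the RGB principal point at 1.5 m and 15 px at 3 m, which is why a single constant depth only aligns the images well near that depth.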
Where Pith is reading between the lines
- The active single-surface pattern approach could reduce setup time for calibration in field conditions where separate targets are impractical.
- The same constrained adjustment principle may stabilize calibration for other sensor pairs that suffer from large resolution differences.
Load-bearing premise
The dedicated corner detection algorithm reliably locates checkerboard corners in 80 by 62 pixel thermal images through perspective rectification, Hessian analysis, and mean-shift refinement without requiring per-frame parameter tuning.
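To make the Hessian step concrete: a checkerboard corner is a saddle of the intensity surface, so the determinant of the Hessian is negative there. The sketch below is an illustration of that idea only, not the paper's detector; it uses plain finite differences on a synthetic patch, and the function names are hypothetical.

```python
import math

def saddle_response(img):
    """Return a map of -det(Hessian); large positive values mark saddles."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2.0 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2.0 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            out[y][x] = ixy * ixy - ixx * iyy
    return out

def strongest_saddle(img):
    """Integer (x, y) of the pixel with the strongest saddle response."""
    resp = saddle_response(img)
    best = max((resp[y][x], x, y)
               for y in range(len(resp)) for x in range(len(resp[0])))
    return best[1], best[2]

# Synthetic blurred checkerboard corner centred on pixel (4, 4) of a 9x9 patch.
patch = [[math.tanh((x - 4) / 1.5) * math.tanh((y - 4) / 1.5)
          for x in range(9)] for y in range(9)]
corner_x, corner_y = strongest_saddle(patch)
```

This yields an integer-pixel candidate; sub-pixel refinement is a separate step.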
What would settle it
A validation run on the thermally active building mock-up that produces either a measured baseline differing by more than a few millimeters from 30 mm or a reprojection error substantially above 0.382 pixels would show the constrained adjustment fails to maintain consistent geometry.
Original abstract
Accurate geometric calibration of RGB-thermal infrared (TIR) stereo camera systems is essential for multimodal building envelope analysis, yet remains challenging when low-cost thermal sensors with very low spatial resolution are employed. This paper presents a practical stereo calibration framework for an RGB camera (2028 x 1520 px) paired with a TIR camera operating at only 80 x 62 px - a pixel-count ratio of approximately 1:625. An active OLED screen dynamically switches modality-specific patterns (checkerboard for TIR, ChArUco for RGB) on a single physical surface, providing controlled and repeatable thermal contrast. A dedicated corner detection algorithm combining perspective rectification, Hessian saddle-point analysis, and Mean Shift localisation achieves reliable checkerboard detection at 80 x 62 px without per-frame parameter tuning. A baseline-constrained bundle adjustment enforces physically consistent rig geometry under the planar-calibration-object degeneracy, yielding a stereo baseline of 32.7 mm (nominal 30 mm) with an overall reprojection error of 0.382 px. The system is validated on a thermally active building mock-up using constant-depth and per-pixel depth estimation, demonstrating consistent TIR-to-RGB projection suitable for building energy performance assessment.
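The Mean Shift localisation mentioned in the abstract can be pictured as repeatedly moving an estimate to the kernel-weighted centroid of a response map. The sketch below is a generic Gaussian-kernel mean shift on a synthetic blob, written under the assumption that the refinement operates on a per-pixel corner response; the paper's actual formulation may differ, and `mean_shift_peak`, `bandwidth`, and `iters` are hypothetical names.

```python
import math

def mean_shift_peak(weights, start_xy, bandwidth=1.0, iters=30):
    """Refine a peak location by Gaussian-kernel mean shift over pixel weights."""
    mx, my = start_xy
    for _ in range(iters):
        sw = sx = sy = 0.0
        for y, row in enumerate(weights):
            for x, w in enumerate(row):
                d2 = (x - mx) ** 2 + (y - my) ** 2
                k = math.exp(-d2 / (2.0 * bandwidth ** 2))
                sw += k * w
                sx += k * w * x
                sy += k * w * y
        # Move the estimate to the kernel-weighted centroid.
        mx, my = sx / sw, sy / sw
    return mx, my

# Synthetic corner-response blob whose true peak sits at (4.3, 3.6).
response = [[math.exp(-((x - 4.3) ** 2 + (y - 3.6) ** 2) / 2.0)
             for x in range(9)] for y in range(8)]
refined_x, refined_y = mean_shift_peak(response, start_xy=(4.0, 4.0))
```

Starting from the integer pixel (4, 4), the estimate converges toward the sub-pixel peak, which is the property that matters when each TIR pixel covers a large part of a checkerboard square.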
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a stereo calibration framework for RGB-TIR camera pairs under extreme resolution asymmetry (RGB 2028×1520 px paired with TIR 80×62 px). It uses an active OLED screen to display modality-specific patterns (checkerboard for TIR, ChArUco for RGB), a dedicated corner detector combining perspective rectification, Hessian saddle-point analysis and Mean Shift localisation for reliable TIR corner finding without per-frame tuning, and a baseline-constrained bundle adjustment to enforce physically consistent rig geometry despite planar-calibration-object degeneracy. The method reports a recovered stereo baseline of 32.7 mm (nominal 30 mm) and overall reprojection error of 0.382 px, with validation via constant-depth and per-pixel depth estimation on a thermally active building mock-up.
Significance. If the central results hold, the work offers a practical solution for multimodal calibration in applications such as building energy performance assessment, where low-cost low-resolution TIR sensors are paired with high-resolution RGB cameras. The active display approach for controlled thermal contrast and the explicit baseline constraint in bundle adjustment directly address the degeneracy that arises with planar targets at such extreme pixel-count ratios (~1:625). The reported baseline accuracy and sub-pixel reprojection error indicate potential for consistent TIR-to-RGB projection, which is a load-bearing requirement for downstream depth and thermal analysis tasks.
major comments (2)
- [Abstract / corner detection section] Abstract and methods description of corner detection: the claim that the dedicated detector (perspective rectification + Hessian saddle-point + Mean Shift) achieves reliable checkerboard detection at 80×62 px without per-frame parameter tuning is load-bearing for the entire pipeline, yet the manuscript supplies only the aggregate reprojection error of 0.382 px. No independent localisation error metrics, per-modality residual statistics, detection success rates, or comparison against manual annotations are reported; at this resolution even a 0.3–0.5 px systematic bias corresponds to a non-negligible angular error that would directly affect the bundle-adjustment solution yielding the 32.7 mm baseline.
- [Validation / experimental results] Validation section: the demonstration on the thermally active building mock-up reports consistent TIR-to-RGB projection via constant-depth and per-pixel depth estimation, but lacks quantitative controls such as comparison to independent ground-truth depth measurements, per-pixel error maps, or ablation of the baseline constraint's contribution to the final accuracy.
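The angular size of a given pixel bias depends on the TIR field of view, which this page does not state. As a rough aid for the first major comment, the sketch below converts a pixel bias to an angle near the image centre using the pinhole relation; the 60 degree horizontal field of view is a placeholder assumption, and `angular_error_deg` is a hypothetical helper.

```python
import math

def angular_error_deg(bias_px, width_px, hfov_deg):
    """Angle subtended near the image centre by a localisation bias.

    Uses the pinhole relation f = (W / 2) / tan(HFOV / 2), with f in pixels.
    """
    focal_px = (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan(bias_px / focal_px))

# 80 px is the TIR width from the report; the 60 degree HFOV is a placeholder.
for bias in (0.3, 0.5):
    print(f"{bias} px -> {angular_error_deg(bias, 80, 60.0):.3f} deg")
```

Substituting the actual field of view of the sensor gives the figure relevant to the rig under review.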
minor comments (2)
- [Abstract] The stated pixel-count ratio of approximately 1:625 is slightly rounded; an exact calculation (2028×1520)/(80×62) yields ~621.5, which should be reported precisely for reproducibility.
- [Results] Notation for the recovered baseline (32.7 mm) and nominal value (30 mm) would benefit from explicit uncertainty intervals or covariance from the bundle adjustment to allow readers to assess how close the result truly is to the physical rig.
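The arithmetic behind the first minor comment is short enough to spell out. The snippet below only uses the two resolutions quoted in the abstract; the variable names are illustrative.

```python
rgb_w, rgb_h = 2028, 1520   # RGB resolution from the abstract
tir_w, tir_h = 80, 62       # TIR resolution from the abstract

pixel_ratio = (rgb_w * rgb_h) / (tir_w * tir_h)   # ratio of total pixel counts
width_ratio = rgb_w / tir_w                        # linear ratio along x
height_ratio = rgb_h / tir_h                       # linear ratio along y

print(f"pixel-count ratio ~ {pixel_ratio:.1f}")
print(f"linear ratios ~ {width_ratio:.2f} (x), {height_ratio:.2f} (y)")
```

The pixel-count ratio is the product of the two linear ratios, so a figure near 620 corresponds to roughly 25 RGB pixels per TIR pixel along each axis.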
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the changes we will make in the revised version.
- Referee: [Abstract / corner detection section] Abstract and methods description of corner detection: the claim that the dedicated detector (perspective rectification + Hessian saddle-point + Mean Shift) achieves reliable checkerboard detection at 80×62 px without per-frame parameter tuning is load-bearing for the entire pipeline, yet the manuscript supplies only the aggregate reprojection error of 0.382 px. No independent localisation error metrics, per-modality residual statistics, detection success rates, or comparison against manual annotations are reported; at this resolution even a 0.3–0.5 px systematic bias corresponds to a non-negligible angular error that would directly affect the bundle-adjustment solution yielding the 32.7 mm baseline.
Authors: We agree that additional metrics specific to the corner detection would provide stronger evidence for the reliability of the detector at such low resolution. The reported 0.382 px reprojection error is an end-to-end metric after bundle adjustment, and the close match of the estimated baseline (32.7 mm) to the nominal value (30 mm) serves as supporting evidence for the overall calibration quality. Nevertheless, to address the referee's concern directly, we will revise the manuscript to include the detection success rate over the sequence of frames, average per-modality residual statistics from the bundle adjustment, and a limited comparison of detected corners against manual annotations for a representative subset of images. These additions will allow for an independent assessment of the localisation accuracy.
Revision: yes
- Referee: [Validation / experimental results] Validation section: the demonstration on the thermally active building mock-up reports consistent TIR-to-RGB projection via constant-depth and per-pixel depth estimation, but lacks quantitative controls such as comparison to independent ground-truth depth measurements, per-pixel error maps, or ablation of the baseline constraint's contribution to the final accuracy.
Authors: We appreciate the suggestion for more rigorous quantitative validation. In the revised manuscript, we will add per-pixel error maps for the depth estimation results and an ablation study demonstrating the effect of including the baseline constraint in the bundle adjustment on both the reprojection error and the recovered baseline length. However, our experimental validation on the thermally active building mock-up was designed around the consistency of projections and depth estimates derived from the calibrated system itself; we did not collect independent ground-truth depth data using additional sensors. We will explicitly discuss this aspect as a limitation of the current validation approach.
Revision: partial
- Not addressed by the planned revision: provision of independent ground-truth depth measurements for the building mock-up, since no such external depth reference data was acquired in the experiments.
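The detection success rate and per-modality residual statistics promised in the rebuttal are simple to compute once the per-corner residuals are available. The following is a minimal sketch with made-up numbers; `rms_px`, `detection_rate`, and the sample residuals are hypothetical and are not results from the paper.

```python
import math

def rms_px(residuals):
    """Root-mean-square length of (dx, dy) reprojection residuals, in pixels."""
    total = sum(dx * dx + dy * dy for dx, dy in residuals)
    return math.sqrt(total / len(residuals))

def detection_rate(frames_detected, frames_total):
    """Fraction of frames in which the full corner grid was found."""
    return frames_detected / frames_total

# Hypothetical residuals; real values would come from the bundle adjustment.
residuals_by_modality = {
    "rgb": [(0.3, -0.4), (0.0, 0.5)],
    "tir": [(0.1, 0.0), (0.0, -0.1)],
}
stats = {name: rms_px(res) for name, res in residuals_by_modality.items()}
rate = detection_rate(frames_detected=45, frames_total=50)
```

Reporting the RGB and TIR figures separately matters here because one pixel means a very different angular error in each camera, so a single pooled value can hide a weak TIR fit.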
Circularity Check
No circularity: derivation relies on standard models plus independent physical constraint
The paper applies established camera models, a custom but explicitly algorithmic corner detector (perspective rectification + Hessian + Mean Shift), and bundle adjustment augmented by a physical baseline constraint drawn from rig geometry. The reported 32.7 mm baseline is an optimization output that differs from the nominal 30 mm input, showing the result is not forced by construction. No equations or claims reduce the final calibration parameters to fitted inputs or self-citations; the method is presented as a practical pipeline whose correctness is checked against external physical expectations rather than internal redefinitions. This is a self-contained engineering contribution against standard benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Standard pinhole camera model with radial and tangential distortion applies to both RGB and TIR sensors.
- domain assumption: Planar calibration target introduces degeneracy that can be resolved by enforcing known baseline distance.
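For readers unfamiliar with the first axiom, the sketch below writes out a pinhole projection with radial and tangential terms in the widely used Brown-Conrady form. How many coefficients the paper estimates for each camera is not stated on this page, so the choice of two radial and two tangential terms, the function name `project_point`, and the sample intrinsics are assumptions.

```python
def project_point(X, Y, Z, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Pinhole projection with radial (k1, k2) and tangential (p1, p2) terms."""
    x, y = X / Z, Y / Z                      # normalised image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return fx * x_d + cx, fy * y_d + cy

# Hypothetical TIR-like intrinsics; the point lies 2 units in front of the camera.
u0, v0 = project_point(1.0, 0.0, 2.0, 100.0, 100.0, 40.0, 31.0)
u1, v1 = project_point(1.0, 0.0, 2.0, 100.0, 100.0, 40.0, 31.0, k1=0.1)
```

With all distortion coefficients at zero the function reduces to the ideal pinhole model, which makes it easy to check an implementation before enabling the extra terms.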