Lights Out: A Nighttime UAV Localization Framework Using Thermal Imagery and Semantic 3D Maps
Pith reviewed 2026-05-07 13:30 UTC · model grok-4.3
The pith
Semantic reprojection aligns nighttime thermal images with daytime 3D maps to localize UAVs without GNSS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion-aware weighting to improve robustness under segmentation uncertainty. Evaluated on real nighttime UAV trajectories, the system achieves a bias-corrected RMSE2D of 2.18 m and a median RMSE2D of 1.52 m, with large errors concentrated in semantically ambiguous regions.
What carries the argument
The symmetric bidirectional reprojection objective with confusion-aware weighting, which aligns semantic labels from thermal segmentation to the 3D map while downweighting uncertain labels.
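To make the idea concrete, here is a minimal sketch of a symmetric, class-weighted alignment cost. It is not the paper's objective, whose equations are not reproduced on this page. As an illustrative stand-in it uses a chamfer-style distance over labelled 2D points, a translation-only pose, and a hypothetical per-class weight (`CLASS_WEIGHT`, imagined as the diagonal of a segmentation confusion matrix). All names and values are assumptions.

```python
import math

# Hypothetical per-class reliabilities (assumed, not from the paper):
# classes that the thermal segmenter confuses more often get a lower weight.
CLASS_WEIGHT = {"building": 0.9, "road": 0.6}

def directed_cost(src, dst, class_weight, cap=10.0):
    """Weighted mean distance from each src point to the nearest dst point
    carrying the same label; unmatched points pay the truncation cap."""
    total, norm = 0.0, 0.0
    for sx, sy, label in src:
        dists = [math.hypot(sx - dx, sy - dy)
                 for dx, dy, d_label in dst if d_label == label]
        residual = min(min(dists), cap) if dists else cap
        weight = class_weight[label]
        total += weight * residual
        norm += weight
    return total / norm if norm else 0.0

def symmetric_cost(observed, rendered, class_weight):
    """Average of the observation-to-map and map-to-observation terms."""
    return 0.5 * (directed_cost(observed, rendered, class_weight)
                  + directed_cost(rendered, observed, class_weight))

def shift(points, tx, ty):
    """Apply a candidate translation to a list of (x, y, label) points."""
    return [(x + tx, y + ty, label) for x, y, label in points]

# Toy map and an observation displaced by a known offset of (2, -1).
map_points = [(0, 0, "building"), (4, 0, "building"),
              (4, 3, "road"), (0, 5, "road")]
observed = shift(map_points, -2, 1)

# Exhaustive search over integer translations, standing in for an optimizer.
candidates = [(tx, ty) for tx in range(-3, 4) for ty in range(-3, 4)]
best = min(candidates,
           key=lambda t: symmetric_cost(shift(observed, *t), map_points,
                                        CLASS_WEIGHT))
print(best)
```

Scoring both directions penalizes observed structure that the map cannot explain as well as map structure that the observation fails to show. The class weights reduce the influence of labels assumed to be less reliable.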
If this is right
- Localization performance reaches 2.18 m bias-corrected RMSE2D on 6.5 km of real nighttime flights.
- Accuracy correlates strongly with availability of semantic edge evidence in the scene.
- Large localization errors occur only in spatially localized ambiguous areas rather than randomly.
- Semantic reprojection provides a viable path for globally referenced nighttime UAV localization using thermal imagery alone.
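The headline metrics in the first bullet can be illustrated with a short sketch. The page does not define "bias-corrected", so the code assumes it means subtracting the mean 2D error vector before computing the RMSE, and it reads the median as the median per-frame horizontal error. Both readings are assumptions, and the sample positions are made up.

```python
import math
from statistics import median

def horizontal_errors(estimates, ground_truth):
    """Per-frame 2D error vectors (estimate minus ground truth), in metres."""
    return [(ex - gx, ey - gy)
            for (ex, ey), (gx, gy) in zip(estimates, ground_truth)]

def remove_bias(errors):
    """Subtract the mean error vector (assumed meaning of 'bias-corrected')."""
    mean_x = sum(dx for dx, _ in errors) / len(errors)
    mean_y = sum(dy for _, dy in errors) / len(errors)
    return [(dx - mean_x, dy - mean_y) for dx, dy in errors]

def rmse_2d(errors):
    """Root mean square of the horizontal error magnitudes."""
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in errors) / len(errors))

def median_error_2d(errors):
    """Median of the per-frame horizontal error magnitudes."""
    return median(math.hypot(dx, dy) for dx, dy in errors)

# Made-up estimates against a stationary ground-truth position.
estimates = [(1.0, 2.0), (3.0, 2.0)]
ground_truth = [(0.0, 0.0), (0.0, 0.0)]

errors = horizontal_errors(estimates, ground_truth)
corrected = remove_bias(errors)
print(rmse_2d(errors), rmse_2d(corrected), median_error_2d(corrected))
```

Under this reading, a constant offset between the map frame and the RTK frame is removed before scoring, so the corrected figure reflects scatter around the mean error rather than absolute accuracy.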
Where Pith is reading between the lines
- Such a system could serve as a backup for GNSS in urban or contested environments where night operations are common.
- Extending the approach to dynamic maps or online map updates might handle changing environments better.
- Testing in more varied terrains could reveal how semantic density affects reliability across different settings.
Load-bearing premise
That thermal image segmentation produces semantic categories that match those in the daytime RGB map reliably enough for reprojection to disambiguate position.
What would settle it
If the system were run on a nighttime trajectory through an area with abundant semantic edges and still produced RMSE2D errors significantly above 2 m, the claim of robust localization would be falsified.
Original abstract
Reliable backup localization for unmanned aerial vehicles (UAVs) operating in GNSS-denied nighttime conditions remains an open challenge due to the severe modality gap between daytime RGB maps and nighttime thermal imagery. This work presents a semantic reprojection framework for map-relative nighttime UAV localization by aligning segmented thermal observations with a globally referenced, semantically labeled 3D map constructed from daytime RGB data. Rather than relying on appearance-based correspondence, localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion-aware weighting to improve robustness under segmentation uncertainty. The approach is evaluated offline across 6.5 km of nighttime, real-world UAV flight trajectories in urban and semi-structured environments. Relative to RTK GNSS ground truth, the system achieves a bias-corrected RMSE2D of 2.18 m and a median RMSE2D of 1.52 m. Results show that localization performance is strongly correlated with the availability of semantic edge evidence and that large-error events are spatially localized to semantically ambiguous areas rather than uniformly distributed. These findings indicate that semantic reprojection offers a promising pathway toward globally referenced nighttime UAV localization using thermal imagery alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a semantic reprojection framework for map-relative nighttime UAV localization. Thermal imagery is segmented and aligned to a globally referenced, semantically labeled 3D map built from daytime RGB data via a symmetric bidirectional reprojection objective that incorporates confusion-aware weighting to handle segmentation uncertainty. The method is evaluated offline on 6.5 km of real-world nighttime UAV trajectories in urban and semi-structured environments, reporting a bias-corrected RMSE2D of 2.18 m and median RMSE2D of 1.52 m relative to RTK GNSS ground truth, with performance shown to correlate with semantic edge availability.
Significance. If the cross-modal semantic consistency assumption holds, the work offers a promising direction for GNSS-denied nighttime UAV localization that avoids direct appearance matching. The use of a real-world dataset spanning 6.5 km of trajectories and the explicit correlation of errors with semantic ambiguity are strengths that support the central claim of robustness under segmentation uncertainty.
major comments (3)
- [Abstract and Evaluation] The reported bias-corrected RMSE2D of 2.18 m and median of 1.52 m are presented without any quantitative cross-modal label agreement metric (e.g., per-class IoU between thermal segmentation predictions and projected 3D map labels) or details on the segmentation model architecture, training data, or domain adaptation steps. This is load-bearing because the bidirectional reprojection objective and confusion-aware weighting presuppose reliable semantic class consistency between modalities.
- [Method] No ablation experiments isolate the contribution of the symmetric bidirectional reprojection or the confusion-aware weighting factors relative to simpler unidirectional or unweighted baselines. Without these, it is unclear whether the achieved RMSE is attributable to the proposed objective or to favorable segmentation performance on the selected trajectories.
- [Experiments] The claim that large-error events are spatially localized to semantically ambiguous areas is stated but not supported by any per-trajectory or per-region breakdown of segmentation agreement or edge density statistics that would allow readers to verify the correlation between semantic evidence availability and localization accuracy.
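The label agreement metric requested in the first major comment is standard intersection over union, computed per class. A minimal sketch follows, assuming the thermal prediction and the projected map labels are available as equally sized 2D grids of class names. The grids and class names are illustrative only.

```python
def per_class_iou(predicted, reference, classes):
    """Intersection over union for each class between two label grids.
    Returns None for a class that appears in neither grid."""
    result = {}
    for cls in classes:
        intersection = 0
        union = 0
        for pred_row, ref_row in zip(predicted, reference):
            for pred, ref in zip(pred_row, ref_row):
                intersection += (pred == cls) and (ref == cls)
                union += (pred == cls) or (ref == cls)
        result[cls] = intersection / union if union else None
    return result

# Illustrative 2x2 grids: thermal segmentation vs. labels projected from the map.
thermal_prediction = [["road", "road"],
                      ["building", "road"]]
projected_map = [["road", "building"],
                 ["building", "road"]]

iou = per_class_iou(thermal_prediction, projected_map, ["road", "building"])
print(iou)
```

Reporting this per class, rather than as a single mean, would show which semantic categories survive the day-to-night modality change and which do not.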
minor comments (2)
- [Method] Notation for the confusion-aware weights and the bidirectional reprojection loss could be introduced with explicit equations earlier in the Method section to improve readability.
- [Abstract] The abstract mentions 'offline' evaluation; clarifying whether the optimization is solved in batch or sequentially would help assess real-time applicability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully addressed each major comment by expanding the relevant sections with additional details, experiments, and supporting analysis in the revised version. Our point-by-point responses follow.
- Referee: [Abstract and Evaluation] The reported bias-corrected RMSE2D of 2.18 m and median of 1.52 m are presented without any quantitative cross-modal label agreement metric (e.g., per-class IoU between thermal segmentation predictions and projected 3D map labels) or details on the segmentation model architecture, training data, or domain adaptation steps. This is load-bearing because the bidirectional reprojection objective and confusion-aware weighting presuppose reliable semantic class consistency between modalities.
Authors: We agree that quantifying cross-modal semantic consistency is important to substantiate the core assumptions of the method. In the revised manuscript, we have expanded the Method section to include the segmentation model architecture, training data sources, and domain adaptation steps. We have also added a quantitative cross-modal label agreement analysis in the Evaluation section, reporting per-class IoU between thermal segmentation predictions and projected 3D map labels across the evaluation trajectories. This provides direct evidence of the semantic consistency achieved. revision: yes
- Referee: [Method] No ablation experiments isolate the contribution of the symmetric bidirectional reprojection or the confusion-aware weighting factors relative to simpler unidirectional or unweighted baselines. Without these, it is unclear whether the achieved RMSE is attributable to the proposed objective or to favorable segmentation performance on the selected trajectories.
Authors: We concur that ablation studies are needed to demonstrate the specific contributions of the proposed components. We have added these experiments to the Experiments section of the revised manuscript. The ablations compare the full symmetric bidirectional reprojection objective with confusion-aware weighting against unidirectional reprojection and unweighted baselines. The results confirm that both elements contribute to the final localization accuracy beyond what is provided by segmentation performance alone. revision: yes
- Referee: [Experiments] The claim that large-error events are spatially localized to semantically ambiguous areas is stated but not supported by any per-trajectory or per-region breakdown of segmentation agreement or edge density statistics that would allow readers to verify the correlation between semantic evidence availability and localization accuracy.
Authors: We acknowledge that the original manuscript states the correlation without the detailed per-trajectory and per-region breakdowns. In the revised manuscript, we have added these analyses to the Experiments section, including per-trajectory plots of RMSE2D versus semantic edge density and segmentation agreement, as well as region-specific statistics. These additions provide the quantitative verification that large-error events are localized to areas with limited semantic evidence rather than being uniformly distributed. revision: yes
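The edge-density analysis discussed in the third exchange could be sketched as follows. Neither the paper's definition of semantic edge density nor its correlation statistic is given on this page. The code assumes edge density is the fraction of pixels whose right or lower neighbour carries a different label, and it uses a plain Pearson coefficient. The per-region numbers are hypothetical.

```python
import math

def edge_density(labels):
    """Fraction of pixels that differ from their right or lower neighbour
    (an assumed proxy for 'semantic edge evidence')."""
    rows, cols = len(labels), len(labels[0])
    edge_pixels = 0
    for r in range(rows):
        for c in range(cols):
            right = c + 1 < cols and labels[r][c] != labels[r][c + 1]
            down = r + 1 < rows and labels[r][c] != labels[r + 1][c]
            edge_pixels += right or down
    return edge_pixels / (rows * cols)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# A featureless frame versus one containing a road/building boundary.
uniform = [["road"] * 3 for _ in range(3)]
split = [["road", "road", "building"] for _ in range(3)]
print(edge_density(uniform), edge_density(split))

# Hypothetical per-region edge densities and localization errors (metres).
densities = [0.1, 0.2, 0.3, 0.4]
errors_m = [4.0, 3.0, 2.0, 1.0]
print(pearson(densities, errors_m))
```

A strongly negative coefficient on real per-region data would support the claim that errors shrink as semantic edge evidence increases. A value near zero would undercut it.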
Circularity Check
No significant circularity; optimization objective evaluated against independent ground truth
The paper formulates nighttime UAV localization as an optimization problem in a shared semantic domain using a symmetric bidirectional reprojection objective with confusion-aware weighting. Performance is measured via bias-corrected RMSE2D (2.18 m) and median RMSE2D (1.52 m) against external RTK GNSS ground truth over 6.5 km trajectories. No equations or steps reduce the objective or reported metrics to fitted parameters from the evaluation data, self-definitions, or self-citation chains. The derivation remains independent of the test results, with performance correlation to semantic edge availability presented as an empirical observation rather than a constructed equivalence.
Axiom & Free-Parameter Ledger
free parameters (1)
- confusion-aware weighting factors
axioms (1)
- domain assumption: Semantic labels remain consistent across daytime RGB and nighttime thermal modalities in the target environments