arxiv: 2604.26201 · v1 · submitted 2026-04-29 · 💻 cs.RO

Lights Out: A Nighttime UAV Localization Framework Using Thermal Imagery and Semantic 3D Maps

Pith reviewed 2026-05-07 13:30 UTC · model grok-4.3

classification 💻 cs.RO
keywords UAV localization · thermal imagery · semantic mapping · nighttime navigation · reprojection · GNSS-denied · 3D maps

The pith

Semantic reprojection aligns nighttime thermal images with daytime 3D maps to localize UAVs without GNSS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a framework for localizing UAVs at night by projecting segmented thermal imagery onto a semantic 3D map built from daytime RGB data. Instead of matching visual appearances across modalities, it operates entirely in the semantic label space using a symmetric bidirectional reprojection cost that accounts for segmentation uncertainties. This matters because GNSS can be jammed or unavailable, and thermal cameras work in darkness where RGB fails. The method is tested on real 6.5 km flights and shows meter-level accuracy that depends on the presence of distinct semantic boundaries.

Core claim

Localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion-aware weighting to improve robustness under segmentation uncertainty. Evaluated on real nighttime UAV trajectories, it achieves a bias-corrected RMSE2D of 2.18 m and median of 1.52 m, with errors concentrated in semantically ambiguous regions.

What carries the argument

The symmetric bidirectional reprojection objective with confusion-aware weighting, which aligns semantic labels from thermal segmentation to the 3D map while downweighting uncertain labels.
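
The page does not reproduce the paper's equations, so the following is only a minimal sketch of what a symmetric bidirectional cost with per-class weighting could look like. It swaps in a chamfer-style nearest-neighbour distance over labeled edge points and reduces the pose to a 2D pixel shift; the function names, the `weights` dictionary, and the toy points are all hypothetical and not taken from the paper.

```python
import math

def directed_cost(src, dst, weights, max_dist=10.0):
    """Weighted mean distance from each labeled point in src to the
    nearest point in dst that carries the same label (assumed form)."""
    total, norm = 0.0, 0.0
    for x, y, label in src:
        w = weights.get(label, 1.0)  # hypothetical per-class reliability
        same = [(u, v) for (u, v, l) in dst if l == label]
        d = min((math.hypot(x - u, y - v) for (u, v) in same), default=max_dist)
        total += w * min(d, max_dist)  # clamp so missing classes do not dominate
        norm += w
    return total / norm if norm > 0 else 0.0

def symmetric_cost(map_pts, img_pts, weights, shift):
    """Sum of map-to-image and image-to-map costs for one candidate shift."""
    dx, dy = shift
    projected = [(x + dx, y + dy, l) for (x, y, l) in map_pts]
    return (directed_cost(projected, img_pts, weights)
            + directed_cost(img_pts, projected, weights))

def localize(map_pts, img_pts, weights, search=range(-3, 4)):
    """Brute-force search over integer shifts; a stand-in for a real optimizer."""
    candidates = [(dx, dy) for dx in search for dy in search]
    return min(candidates, key=lambda s: symmetric_cost(map_pts, img_pts, weights, s))

# Toy data: image edge points are the map edge points shifted by (2, -1).
map_pts = [(0, 0, "road"), (1, 0, "road"), (5, 5, "building"), (5, 6, "building")]
img_pts = [(2, -1, "road"), (3, -1, "road"), (7, 4, "building"), (7, 5, "building")]
weights = {"road": 1.0, "building": 0.5}
best_shift = localize(map_pts, img_pts, weights)
```

Summing both directions penalizes map edges that find no support in the image as well as image edges the map cannot explain, which is the usual motivation for a symmetric cost of this kind.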

If this is right

  • Localization performance reaches 2.18 m bias-corrected RMSE2D on 6.5 km of real nighttime flights.
  • Accuracy correlates strongly with availability of semantic edge evidence in the scene.
  • Large localization errors occur only in spatially localized ambiguous areas rather than randomly.
  • Semantic reprojection provides a viable path for globally referenced nighttime UAV localization using thermal imagery alone.
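
To make the headline metrics concrete, here is a small sketch of how a bias-corrected RMSE2D and a median 2D error might be computed against ground truth. It assumes that "bias-corrected" means subtracting the mean 2D error vector before taking the RMSE; the page does not state the paper's exact definition, and the function names and sample positions are hypothetical.

```python
import math
import statistics

def errors_2d(est, gt):
    """Per-frame horizontal error vectors (estimate minus ground truth)."""
    return [(ex - gx, ey - gy) for (ex, ey), (gx, gy) in zip(est, gt)]

def bias_corrected(errs):
    """Remove the mean error vector (assumed meaning of 'bias-corrected')."""
    bx = sum(dx for dx, _ in errs) / len(errs)
    by = sum(dy for _, dy in errs) / len(errs)
    return [(dx - bx, dy - by) for dx, dy in errs]

def rmse_2d(errs):
    """Root-mean-square of the 2D error norms."""
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in errs) / len(errs))

def median_error(errs):
    """Median of the per-frame 2D error norms."""
    return statistics.median(math.hypot(dx, dy) for dx, dy in errs)

# Illustrative positions in metres, not data from the paper.
gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
est = [(1.0, 1.0), (12.0, 1.0), (21.0, 1.0), (30.0, 1.0)]
errs = errors_2d(est, gt)
raw = rmse_2d(errs)
corrected = rmse_2d(bias_corrected(errs))
```

In this toy case the constant (1, 1) m offset dominates the raw RMSE, and removing it leaves only the scatter around the mean, which is why the two figures can differ substantially.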

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such a system could serve as a backup for GNSS in urban or contested environments where night operations are common.
  • Extending the approach to dynamic maps or online map updates might handle changing environments better.
  • Testing in more varied terrains could reveal how semantic density affects reliability across different settings.

Load-bearing premise

That thermal image segmentation produces semantic categories that match those in the daytime RGB map reliably enough for reprojection to disambiguate position.

What would settle it

If the system were run on a nighttime trajectory through an area with abundant semantic edges and still produced RMSE2D errors significantly above 2 m, the claim of robust localization would be falsified.

Figures

Figures reproduced from arXiv: 2604.26201 by Melissa Greeff, Ryan Allen.

Figure 1. Overview of the proposed nighttime localization pipeline with the Vic dataset. A semantically labeled 3D map is constructed from daytime RGB imagery during pre-flight mapping (right). At night, segmented thermal imagery (left) is aligned to this map via edge-aware reprojection, enabling globally referenced localization without appearance-based matching.

Figure 2. Overview of the proposed thermal semantic localization pipeline with the City dataset. Daytime preprocessing (top): synchronized RGB–thermal data with RTK ground truth are used to train a thermal segmentation model and construct a georeferenced point cloud. Semantic labels are reprojected into the 3D reconstruction to produce a semantically labeled map, which is refined through pruning and edge extraction. …

Figure 3. Pier flight trajectory overlaid on the semantically labeled global map. Localization error relative to RTK ground truth is color-coded along the path: white indicates high-accuracy estimates (0–2 m), grey denotes moderate error (2–5 m), and black highlights failure regions exceeding 5 m. Regions P1–P3 mark regions where large errors are concentrated. The figure illustrates that localization failures are co…

Figure 4. Qualitative localization examples shown as row-wise triplets (Left: thermal image. Middle: thermal semantic segmentation. Right: reprojected RGB semantic map) from Pier failure regions. Reprojections are computed using RTK ground-truth poses to isolate segmentation effects from localization error.
Original abstract

Reliable backup localization for unmanned aerial vehicles (UAVs) operating in GNSS-denied nighttime conditions remains an open challenge due to the severe modality gap between daytime RGB maps and nighttime thermal imagery. This work presents a semantic reprojection framework for map-relative nighttime UAV localization by aligning segmented thermal observations with a globally referenced, semantically labeled 3D map constructed from daytime RGB data. Rather than relying on appearance-based correspondence, localization is formulated in a shared semantic domain and solved via a symmetric bidirectional reprojection objective with confusion-aware weighting to improve robustness under segmentation uncertainty. The approach is evaluated offline across 6.5 km of nighttime, real-world UAV flight trajectories in urban and semi-structured environments. Relative to RTK GNSS ground truth, the system achieves a bias-corrected RMSE2D of 2.18 m and a median RMSE2D of 1.52 m. Results show that localization performance is strongly correlated with the availability of semantic edge evidence and that large-error events are spatially localized to semantically ambiguous areas rather than uniformly distributed. These findings indicate that semantic reprojection offers a promising pathway toward globally referenced nighttime UAV localization using thermal imagery alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a semantic reprojection framework for map-relative nighttime UAV localization. Thermal imagery is segmented and aligned to a globally referenced, semantically labeled 3D map built from daytime RGB data via a symmetric bidirectional reprojection objective that incorporates confusion-aware weighting to handle segmentation uncertainty. The method is evaluated offline on 6.5 km of real-world nighttime UAV trajectories in urban and semi-structured environments, reporting a bias-corrected RMSE2D of 2.18 m and median RMSE2D of 1.52 m relative to RTK GNSS ground truth, with performance shown to correlate with semantic edge availability.

Significance. If the cross-modal semantic consistency assumption holds, the work offers a promising direction for GNSS-denied nighttime UAV localization that avoids direct appearance matching. The use of a real-world dataset spanning 6.5 km of trajectories and the explicit correlation of errors with semantic ambiguity are strengths that support the central claim of robustness under segmentation uncertainty.

major comments (3)
  1. [Abstract and Evaluation] Abstract and Evaluation section: The reported bias-corrected RMSE2D of 2.18 m and median of 1.52 m are presented without any quantitative cross-modal label agreement metric (e.g., per-class IoU between thermal segmentation predictions and projected 3D map labels) or details on the segmentation model architecture, training data, or domain adaptation steps. This is load-bearing because the bidirectional reprojection objective and confusion-aware weighting presuppose reliable semantic class consistency between modalities.
  2. [Method] Method section: No ablation experiments isolate the contribution of the symmetric bidirectional reprojection or the confusion-aware weighting factors relative to simpler unidirectional or unweighted baselines. Without these, it is unclear whether the achieved RMSE is attributable to the proposed objective or to favorable segmentation performance on the selected trajectories.
  3. [Experiments] Experiments section: The claim that large-error events are spatially localized to semantically ambiguous areas is stated but not supported by any per-trajectory or per-region breakdown of segmentation agreement or edge density statistics that would allow readers to verify the correlation between semantic evidence availability and localization accuracy.
minor comments (2)
  1. [Method] Notation for the confusion-aware weights and the bidirectional reprojection loss could be introduced with explicit equations earlier in the Method section to improve readability.
  2. [Abstract] The abstract mentions 'offline' evaluation; clarifying whether the optimization is solved in batch or sequentially would help assess real-time applicability.
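
The cross-modal label agreement metric requested in major comment 1 could be computed roughly as follows. This is a sketch under stated assumptions: labels are given as equally sized 2D grids, pixels with no projected map label are marked `None` and skipped, and the function name and class names are hypothetical.

```python
def per_class_iou(pred, ref, classes, ignore=None):
    """Per-class intersection-over-union between a thermal segmentation
    (pred) and map labels projected into the same view (ref)."""
    inter = {c: 0 for c in classes}
    union = {c: 0 for c in classes}
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if r == ignore:  # no map label projected onto this pixel
                continue
            for c in classes:
                if p == c and r == c:
                    inter[c] += 1
                if p == c or r == c:
                    union[c] += 1
    # None marks classes that never appear in either grid.
    return {c: (inter[c] / union[c] if union[c] else None) for c in classes}

# Illustrative 2x3 label grids, not data from the paper.
pred = [["road", "road", "tree"],
        ["road", "building", "tree"]]
ref = [["road", "road", "tree"],
       ["building", "building", None]]
iou = per_class_iou(pred, ref, ["road", "building", "tree"])
```

Reporting the per-class values rather than a single mean would show which categories survive the RGB-to-thermal gap and which do not.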

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully addressed each major comment by expanding the relevant sections with additional details, experiments, and supporting analysis in the revised version. Our point-by-point responses follow.

point-by-point responses
  1. Referee: [Abstract and Evaluation] Abstract and Evaluation section: The reported bias-corrected RMSE2D of 2.18 m and median of 1.52 m are presented without any quantitative cross-modal label agreement metric (e.g., per-class IoU between thermal segmentation predictions and projected 3D map labels) or details on the segmentation model architecture, training data, or domain adaptation steps. This is load-bearing because the bidirectional reprojection objective and confusion-aware weighting presuppose reliable semantic class consistency between modalities.

    Authors: We agree that quantifying cross-modal semantic consistency is important to substantiate the core assumptions of the method. In the revised manuscript, we have expanded the Method section to include the segmentation model architecture, training data sources, and domain adaptation steps. We have also added a quantitative cross-modal label agreement analysis in the Evaluation section, reporting per-class IoU between thermal segmentation predictions and projected 3D map labels across the evaluation trajectories. This provides direct evidence of the semantic consistency achieved. revision: yes

  2. Referee: [Method] Method section: No ablation experiments isolate the contribution of the symmetric bidirectional reprojection or the confusion-aware weighting factors relative to simpler unidirectional or unweighted baselines. Without these, it is unclear whether the achieved RMSE is attributable to the proposed objective or to favorable segmentation performance on the selected trajectories.

    Authors: We concur that ablation studies are needed to demonstrate the specific contributions of the proposed components. We have added these experiments to the Experiments section of the revised manuscript. The ablations compare the full symmetric bidirectional reprojection objective with confusion-aware weighting against unidirectional reprojection and unweighted baselines. The results confirm that both elements contribute to the final localization accuracy beyond what is provided by segmentation performance alone. revision: yes

  3. Referee: [Experiments] Experiments section: The claim that large-error events are spatially localized to semantically ambiguous areas is stated but not supported by any per-trajectory or per-region breakdown of segmentation agreement or edge density statistics that would allow readers to verify the correlation between semantic evidence availability and localization accuracy.

    Authors: We acknowledge that the original manuscript states the correlation without the detailed per-trajectory and per-region breakdowns. In the revised manuscript, we have added these analyses to the Experiments section, including per-trajectory plots of RMSE2D versus semantic edge density and segmentation agreement, as well as region-specific statistics. These additions provide the quantitative verification that large-error events are localized to areas with limited semantic evidence rather than being uniformly distributed. revision: yes
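
One simple way to check the claimed relationship between semantic edge availability and localization error is a rank correlation over per-region statistics. The sketch below implements Spearman's coefficient with average ranks for ties; the choice of Spearman, the function names, and the sample numbers are assumptions for illustration and do not come from the paper.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Made-up per-region values: edge density (fraction of edge pixels)
# and localization error in metres.
edge_density = [0.05, 0.10, 0.20, 0.40, 0.80]
error_m = [6.0, 4.5, 2.0, 1.5, 0.9]
rho = spearman(edge_density, error_m)
```

A strongly negative coefficient on real per-region data would support the claim that errors shrink where semantic edges are plentiful; a rank-based measure is used here because the relationship need not be linear.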

Circularity Check

0 steps flagged

No significant circularity; optimization objective evaluated against independent ground truth

full rationale

The paper formulates nighttime UAV localization as an optimization problem in a shared semantic domain using a symmetric bidirectional reprojection objective with confusion-aware weighting. Performance is measured via bias-corrected RMSE2D (2.18 m) and median RMSE2D (1.52 m) against external RTK GNSS ground truth over 6.5 km trajectories. No equations or steps reduce the objective or reported metrics to fitted parameters from the evaluation data, self-definitions, or self-citation chains. The derivation remains independent of the test results, with performance correlation to semantic edge availability presented as an empirical observation rather than a constructed equivalence.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that semantic categories extracted from thermal imagery correspond sufficiently to those in the daytime RGB map and that semantic edges provide disambiguating information; no explicit free parameters or new entities are named in the abstract.

free parameters (1)
  • confusion-aware weighting factors
    Used to modulate the influence of uncertain semantic segments in the reprojection objective; values are not stated but implied to be part of the method.
axioms (1)
  • domain assumption Semantic labels remain consistent across daytime RGB and nighttime thermal modalities in the target environments
    The entire alignment procedure presupposes that the same object categories can be recognized in both sensor types.
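
Since the values of the confusion-aware weighting factors are not stated, the sketch below shows only one plausible way such weights could be derived: row-normalizing a confusion matrix so that each entry estimates how often a predicted thermal label corresponds to a given map label. The function name, the class names, and the counts are hypothetical.

```python
def confusion_weights(counts):
    """Row-normalize confusion counts.

    counts[pred][true] is how often the thermal segmenter predicted
    `pred` where the reference label was `true`. The result estimates
    P(true | pred), usable as a soft compatibility weight (assumed form).
    """
    weights = {}
    for pred_label, row in counts.items():
        total = sum(row.values())
        weights[pred_label] = {
            true_label: (n / total if total else 0.0)
            for true_label, n in row.items()
        }
    return weights

# Illustrative counts, not data from the paper.
counts = {
    "road": {"road": 90, "building": 10},
    "building": {"road": 20, "building": 80},
}
w = confusion_weights(counts)
# w["road"]["road"] is 0.9: a predicted "road" pixel is trusted more
# than a predicted "building" pixel, whose self-agreement is 0.8.
```

Weights of this form would let a label mismatch between frequently confused classes cost less than a mismatch between classes the segmenter rarely mixes up.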

pith-pipeline@v0.9.0 · 5500 in / 1371 out tokens · 73339 ms · 2026-05-07T13:30:54.471218+00:00 · methodology
