pith.

arxiv: 2511.09170 · v2 · submitted 2025-11-12 · 💻 cs.CV · cs.RO

HOTFLoc++: End-to-End Hierarchical LiDAR Place Recognition, Re-Ranking, and 6-DoF Metric Localisation in Forests

Pith reviewed 2026-05-17 23:34 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords LiDAR place recognition · forest localisation · octree hierarchy · 6-DoF pose estimation · multi-scale geometric verification · end-to-end training · point cloud registration

The pith

An octree transformer that jointly optimises place recognition, re-ranking and localisation improves LiDAR place recognition and 6-DoF localisation in forests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops HOTFLoc++, an end-to-end system that builds an octree hierarchy over LiDAR point clouds to recognise places and estimate precise 6-DoF poses in forests. Features are extracted at multiple scales so the method can tolerate heavy clutter, repeated tree patterns, and viewpoint shifts, including between ground and aerial views. A learnable multi-scale geometric verification step and joint training across the three tasks enforce geometric consistency. The authors report registration almost two orders of magnitude faster than RANSAC-based registration on dense point clouds, with comparable or lower localisation error, and show that the multi-scale re-ranking module on its own roughly halves average localisation error.
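To make "an octree hierarchy over LiDAR point clouds" concrete, here is a minimal sketch that buckets points into axis-aligned octree cells at several depths, so coarse levels merge nearby points and finer levels separate them. It assumes every point lies inside a known cubic bounding box, and the helper `octree_cells` is hypothetical. It shows only the spatial bucketing, not HOTFLoc++'s octree construction or its transformer.

```python
from collections import defaultdict

def octree_cells(points, bounds_min, size, max_depth):
    """Hypothetical helper: bucket 3D points into octree cells at depths 1..max_depth.

    Assumes every point lies inside the cube [bounds_min, bounds_min + size]^3.
    Returns {depth: {(ix, iy, iz): [point indices]}}; a depth-d cell has edge size / 2**d.
    """
    levels = {}
    for depth in range(1, max_depth + 1):
        n = 2 ** depth                 # cells per axis at this depth
        edge = size / n
        cells = defaultdict(list)
        for idx, p in enumerate(points):
            # Integer cell coordinates; clamp so points on the far boundary stay in range.
            key = tuple(min(int((p[a] - bounds_min[a]) // edge), n - 1) for a in range(3))
            cells[key].append(idx)
        levels[depth] = dict(cells)
    return levels

pts = [(0.1, 0.1, 0.1), (3.0, 0.1, 0.1), (9.9, 9.9, 9.9)]
levels = octree_cells(pts, (0.0, 0.0, 0.0), 10.0, 2)
print({d: len(c) for d, c in levels.items()})  # {1: 2, 2: 3}
```

At depth 1 the first two points share a cell. At depth 2 they are separated. A multi-granularity feature extractor can exploit exactly this coarse-to-fine behaviour.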

Core claim

Leveraging an octree-based transformer to extract features at multiple granularities, together with learnable multi-scale geometric verification and joint optimisation of place recognition with re-ranking and localisation, enforces multi-scale geometric consistency and thereby improves place recognition convergence and reduces re-ranking failures in forest environments with high clutter and self-similarity.

What carries the argument

Octree-based transformer that produces multi-granularity features, paired with learnable multi-scale geometric verification and a joint training protocol that ties place recognition, re-ranking and 6-DoF localisation together.
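The verification idea can be illustrated with the classic spectral matching technique of Leordeanu and Hebert [23], the family of methods that spectral re-ranking approaches such as [1] draw on. Correspondences that preserve pairwise distances reinforce each other, so the leading eigenvalue of a pairwise-consistency matrix indicates how large a mutually consistent set exists. The sketch below computes that score at each of several octree levels and blends the per-level scores with softmax weights. Several details are assumptions for illustration: the kernel max(0, 1 - d^2 / sigma^2), the normalisation by correspondence count, and the `logits` standing in for learnable parameters. This is not HOTFLoc++'s learnable module.

```python
import math

def pairwise_consistency(src, tgt, sigma):
    """M[i][j] = max(0, 1 - (|s_i - s_j| - |t_i - t_j|)^2 / sigma^2), zero on the diagonal.
    A rigid motion preserves pairwise distances, so consistent inlier pairs score near 1."""
    n = len(src)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = math.dist(src[i], src[j]) - math.dist(tgt[i], tgt[j])
                m[i][j] = max(0.0, 1.0 - diff * diff / (sigma * sigma))
    return m

def spectral_radius(m, iters=100):
    """Power iteration: for a non-negative symmetric matrix, |M v| for unit v tends to
    the largest eigenvalue (Perron-Frobenius)."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def multi_scale_score(levels, logits, sigma):
    """Blend per-level spectral scores with softmax weights.
    levels: list of (src, tgt) matched keypoint lists, one per octree level.
    logits: stand-in for learnable per-level weights (hypothetical)."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    score = 0.0
    for (src, tgt), e in zip(levels, exps):
        lam = spectral_radius(pairwise_consistency(src, tgt, sigma))
        score += (e / total) * lam / max(len(src), 1)  # normalise by correspondence count
    return score

src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
tgt = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in src]  # rigid translation: all matches consistent
tgt_bad = [tgt[0], tgt[3], tgt[1], tgt[2]]             # scrambled matches
logits = [0.0, 0.0]                                    # equal weights for two levels
good = multi_scale_score([(src, tgt), (src, tgt)], logits, sigma=1.0)
bad = multi_scale_score([(src, tgt_bad), (src, tgt_bad)], logits, sigma=1.0)
print(good, bad < good)  # 0.75 True
```

The softmax keeps the level weights positive and summing to one. Training such weights would let a model lean on whichever octree level gives the most reliable correspondences. Here they are fixed for illustration.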

Load-bearing premise

That the octree hierarchy plus joint optimisation of place recognition, re-ranking and localisation will enforce multi-scale geometric consistency and thereby improve convergence and reduce re-ranking failures specifically in forest environments with high clutter and self-similarity.

What would settle it

If ablations on CS-Wild-Places show that adding the multi-scale re-ranking module does not roughly halve average localisation error, or that the full system does not raise Recall@1 by about 30 percentage points over the baselines, the benefit claimed for the joint hierarchical optimisation would be refuted.
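Recall@1 gains stated in percentage points are easy to confuse with relative gains. The sketch below computes Recall@1 on toy descriptors under several assumptions: Euclidean nearest-neighbour retrieval, a hypothetical true-positive radius `radius`, and at least one true match in the database for every query. Real benchmarks define their own radius and query filtering. The sketch then converts the abstract's 90.7% and 29.6 pp figures into the implied baseline and relative gain.

```python
import math

def recall_at_1(query_desc, db_desc, query_pos, db_pos, radius):
    """Percentage of queries whose top-1 database match (Euclidean distance in
    descriptor space) lies within `radius` metres of the query's true position.
    Assumes every query has at least one true match in the database."""
    hits = 0
    for qd, qp in zip(query_desc, query_pos):
        top1 = min(range(len(db_desc)), key=lambda k: math.dist(qd, db_desc[k]))
        if math.dist(qp, db_pos[top1]) <= radius:
            hits += 1
    return 100.0 * hits / len(query_desc)

# Toy example: 2 queries, 3 database entries, hypothetical 3 m radius.
db_desc = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
db_pos = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
query_desc = [(0.1, 0.0), (4.9, 5.0)]
query_pos = [(1.0, 0.0), (50.0, 0.0)]
r1 = recall_at_1(query_desc, db_desc, query_pos, db_pos, radius=3.0)  # 50.0

# Percentage points are absolute differences between two percentages.
full, gain_pp = 90.7, 29.6
baseline = full - gain_pp           # implied baseline Recall@1, about 61.1%
relative_gain = gain_pp / baseline  # about 0.48, i.e. roughly a 48% relative improvement
```

The second query retrieves the right descriptor neighbourhood but the wrong place, so it counts as a miss. Recall@1 scores geographic correctness of the top match, not descriptor similarity.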

Original abstract

This article presents HOTFLoc++, an end-to-end hierarchical framework for LiDAR place recognition, re-ranking, and 6-DoF metric localisation in forests. Leveraging an octree-based transformer, our approach extracts features at multiple granularities to increase robustness to clutter, self-similarity, and viewpoint changes in challenging scenarios, including ground-to-ground and ground-to-aerial in forest and urban environments. We propose learnable multi-scale geometric verification to reduce re-ranking failures due to degraded single-scale correspondences. Our joint training protocol enforces multi-scale geometric consistency of the octree hierarchy via joint optimisation of place recognition with re-ranking and localisation, improving place recognition convergence. Our system achieves comparable or lower localisation errors to baselines, with runtime improvements of almost two orders of magnitude over RANSAC-based registration for dense point clouds. Experimental results on public datasets show the superiority of our approach compared to state-of-the-art methods, achieving an average Recall@1 of 90.7% on CS-Wild-Places: an improvement of 29.6 percentage points over baselines, while maintaining high performance on single-source benchmarks with an average Recall@1 of 91.7% and 97.9% on Wild-Places and MulRan, respectively. Our method achieves under 2 m and 5° error for 97.2% of 6-DoF registration attempts, with our multi-scale re-ranking module reducing localisation errors by ~2x on average. The code is available at https://github.com/csiro-robotics/HOTFLoc.
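The "under 2 m and 5°" criterion can be read as a per-attempt success test on translation and rotation error. The abstract does not spell out its error definitions. A common convention, assumed here, is the Euclidean distance between translations plus the geodesic angle between rotations, arccos((trace(R_est^T R_gt) - 1) / 2). The helper names and toy poses are hypothetical.

```python
import math

def rotation_error_deg(r_est, r_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices given as nested lists."""
    # trace(R_est^T R_gt) equals the element-wise sum of R_est[i][j] * R_gt[i][j]
    tr = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))

def is_success(r_est, t_est, r_gt, t_gt, max_trans_m=2.0, max_rot_deg=5.0):
    """True if translation error < max_trans_m and rotation error < max_rot_deg."""
    return (math.dist(t_est, t_gt) < max_trans_m
            and rotation_error_deg(r_est, r_gt) < max_rot_deg)

def rot_z(deg):
    """Rotation about the z axis, used here only to build toy poses."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

# Toy registration attempts against an identity ground-truth pose.
r_gt, t_gt = rot_z(0.0), (0.0, 0.0, 0.0)
attempts = [
    (rot_z(3.0), (1.0, 0.0, 0.0)),   # 3 deg, 1 m: success
    (rot_z(10.0), (0.0, 0.0, 0.0)),  # 10 deg, 0 m: rotation too large
    (rot_z(0.0), (2.5, 0.0, 0.0)),   # 0 deg, 2.5 m: translation too large
]
success_rate = 100.0 * sum(is_success(r, t, r_gt, t_gt) for r, t in attempts) / len(attempts)
```

Both thresholds must hold at once. A reported 97.2% therefore counts only attempts that are simultaneously within 2 m and within 5°.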

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents HOTFLoc++, an end-to-end hierarchical framework for LiDAR place recognition, re-ranking, and 6-DoF metric localisation in forests. It employs an octree-based transformer to extract multi-granularity features for robustness to clutter and self-similarity, introduces learnable multi-scale geometric verification to reduce re-ranking failures, and uses joint optimisation of place recognition with re-ranking and localisation to enforce multi-scale geometric consistency. On public datasets the authors report the following results:

- an average Recall@1 of 90.7% on CS-Wild-Places, a 29.6 pp improvement over baselines;
- Recall@1 of 91.7% on Wild-Places and 97.9% on MulRan;
- under 2 m / 5° error for 97.2% of registration attempts;
- a ~2x localisation error reduction from the multi-scale re-ranking module;
- a runtime improvement of nearly two orders of magnitude over RANSAC-based registration on dense point clouds.
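The runtime comparison is against RANSAC-based registration [18]. The standard trial-count formula N = ceil(log(1 - p) / log(1 - w^s)) shows why RANSAC can be expensive. Here p is the desired confidence, w the inlier ratio and s the minimal sample size (s = 3 for a 3D rigid transform). Each trial also scores its hypothesis against the full correspondence set, which grows with point density. The sketch evaluates the formula for a few illustrative inlier ratios (assumed values, not figures from the paper). It is background on RANSAC, not an account of how HOTFLoc++ obtains its speed-up.

```python
import math

def ransac_trials(inlier_ratio, sample_size=3, confidence=0.99):
    """Number of RANSAC trials so that, with probability `confidence`,
    at least one minimal sample contains only inliers."""
    p_clean = inlier_ratio ** sample_size  # chance that one minimal sample is all inliers
    if p_clean >= 1.0:
        return 1
    if p_clean <= 0.0:
        return math.inf                    # no inliers: no finite number of trials suffices
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))

for w in (0.5, 0.1, 0.05):
    print(w, ransac_trials(w))
```

Going from a 50% to a 10% inlier ratio raises the required trials from a few dozen to several thousand. Degraded correspondences in clutter are therefore costly for sampling-based registration.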

Significance. If the empirical results hold under fair baselines and proper ablations, the work would represent a meaningful advance for LiDAR localisation in cluttered, self-similar forest environments where single-scale methods often fail. Code release supports reproducibility. The hierarchical octree design and joint training protocol address a practically relevant gap, though how much of the reported gain is attributable to each proposed component remains to be isolated.

major comments (3)
  1. [Experimental Results / §5] The central attribution of the 29.6 pp Recall@1 lift on CS-Wild-Places and the ~2x localisation error reduction to the joint optimisation enforcing multi-scale geometric consistency is not supported by ablations that isolate this term from the backbone or dataset-specific tuning; no such forest-specific ablation isolating the joint loss is described.
  2. [Abstract and §4 (Joint Training Protocol)] The claim that joint training 'improves place recognition convergence' via multi-scale consistency lacks supporting evidence such as training curves, convergence metrics, or direct comparison of joint vs. staged optimisation on the forest datasets.
  3. [§5 (Experiments)] Without the full experimental section it is impossible to verify baseline fairness, data splits, statistical significance, or whether any post-hoc exclusions affect the reported 90.7% Recall@1, 97.2% success rate, and runtime claims.
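Major comments 1 and 2 turn on the difference between joint and staged optimisation. The page does not give the paper's actual loss terms, so everything in the formulation below is an assumption: the generic losses L_PR, L_rr and L_loc, the weights lambda_rr and lambda_loc, the shared backbone parameters theta, and the re-ranking/localisation head parameters phi. With those assumptions, the two protocols can be written as:

```latex
\begin{aligned}
\text{joint:}\quad & (\theta^{*}, \phi^{*}) = \arg\min_{\theta,\phi}\;
  \mathcal{L}_{\text{PR}}(\theta) + \lambda_{\text{rr}}\,\mathcal{L}_{\text{rr}}(\theta,\phi)
  + \lambda_{\text{loc}}\,\mathcal{L}_{\text{loc}}(\theta,\phi) \\
\text{staged:}\quad & \theta^{*} = \arg\min_{\theta}\; \mathcal{L}_{\text{PR}}(\theta), \qquad
  \phi^{*} = \arg\min_{\phi}\; \lambda_{\text{rr}}\,\mathcal{L}_{\text{rr}}(\theta^{*},\phi)
  + \lambda_{\text{loc}}\,\mathcal{L}_{\text{loc}}(\theta^{*},\phi)
\end{aligned}
```

Only in the joint form do re-ranking and localisation gradients reach theta. That is the route by which joint training could affect place recognition convergence. The requested ablation would compare Recall@1 and training curves under both protocols with backbone, data and hyperparameters held fixed.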
minor comments (2)
  1. [§3.3] Clarify the exact learnable parameters in the multi-scale geometric verification module and whether they remain fixed or are re-optimised at inference.
  2. [§3.1] Add explicit discussion of how the octree hierarchy interacts with viewpoint changes in ground-to-aerial scenarios.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights opportunities to strengthen the experimental validation of our contributions. We address each major comment below with clarifications and proposed revisions. Where the manuscript lacks explicit isolation of components, we commit to adding the necessary ablations and evidence in the revised version.

Point-by-point responses
  1. Referee: [Experimental Results / §5] The central attribution of the 29.6 pp Recall@1 lift on CS-Wild-Places and the ~2x localisation error reduction to the joint optimisation enforcing multi-scale geometric consistency is not supported by ablations that isolate this term from the backbone or dataset-specific tuning; no such forest-specific ablation isolating the joint loss is described.

    Authors: We acknowledge that the current ablations in §5.3 focus on the hierarchical octree backbone and multi-scale re-ranking modules but do not isolate the joint loss term specifically on forest data. To directly address this, we will add a new ablation table in the revised §5 comparing joint optimisation against staged training (place recognition first, then re-ranking/localisation) on CS-Wild-Places. This will quantify the incremental Recall@1 gain and error reduction attributable to the joint multi-scale consistency loss, separate from backbone or hyperparameter effects. revision: yes

  2. Referee: [Abstract and §4 (Joint Training Protocol)] The claim that joint training 'improves place recognition convergence' via multi-scale consistency lacks supporting evidence such as training curves, convergence metrics, or direct comparison of joint vs. staged optimisation on the forest datasets.

    Authors: The manuscript describes the joint training protocol in §4 but does not include training curves or quantitative convergence comparisons. We agree this evidence would better support the claim. In revision we will add training loss and Recall@1 curves (joint vs. staged) for the CS-Wild-Places and Wild-Places datasets in §5 or the supplementary material, showing faster convergence and improved final metrics under joint optimisation. revision: yes

  3. Referee: [§5 (Experiments)] Without the full experimental section it is impossible to verify baseline fairness, data splits, statistical significance, or whether any post-hoc exclusions affect the reported 90.7% Recall@1, 97.2% success rate, and runtime claims.

    Authors: Section 5 of the manuscript details the datasets, standard splits (following Wild-Places and MulRan protocols, with CS-Wild-Places using the provided cross-source ground-aerial partitions), baseline implementations (official code or re-implementations with matched hyperparameters), and evaluation metrics. All reported queries are included with no post-hoc exclusions. Retrieval results are reported as means over the full test set, without formal significance tests; registration results are averaged over 5 random seeds where stochasticity is present. To improve clarity we will insert a concise experimental setup summary table at the start of §5 in the revision. revision: partial
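The rebuttal reports means but no significance test for retrieval. One common option, not described by the paper or the rebuttal, is a paired bootstrap over queries. It resamples the two methods' per-query top-1 outcomes together and counts how often the new method fails to beat the baseline. The helper name and the toy outcomes below are hypothetical. The returned fraction is an approximate one-sided p-value, not an exact test.

```python
import random

def paired_bootstrap_pvalue(hits_a, hits_b, n_resamples=10000, seed=0):
    """Approximate one-sided p-value for 'method A has higher Recall@1 than method B'.

    hits_a, hits_b: per-query top-1 outcomes (1 = correct, 0 = wrong) on the SAME queries.
    Queries are resampled with replacement, keeping each query's pair of outcomes together;
    the returned value is the fraction of resamples in which A does not beat B.
    """
    if len(hits_a) != len(hits_b) or not hits_a:
        raise ValueError("need two non-empty outcome lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(hits_a, hits_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if total <= 0:
            not_better += 1
    return not_better / n_resamples

# Toy outcomes for 100 shared queries (made-up numbers, not results from the paper).
hits_new = [1] * 90 + [0] * 10
hits_old = [1] * 60 + [0] * 40
p = paired_bootstrap_pvalue(hits_new, hits_old, n_resamples=2000)
p_same = paired_bootstrap_pvalue(hits_new, hits_new, n_resamples=200)
```

Pairing matters because both methods see the same queries. Resampling query indices jointly preserves the per-query correlation that an unpaired comparison would ignore.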

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks

Full rationale

The paper describes an octree-based transformer architecture, learnable multi-scale geometric verification, and a joint training protocol that optimises place recognition together with re-ranking and localisation. These are presented as design choices whose benefits are measured via Recall@1 and 6-DoF error metrics on external datasets (CS-Wild-Places, Wild-Places, MulRan). No equations, fitted parameters, or self-citations are shown that reduce any central result to its own inputs by construction; the performance numbers are reported directly from experiments rather than derived tautologically from the method definition itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the domain assumption that an octree hierarchy supplies useful multi-granularity features for forest scenes and on standard deep-learning training assumptions; no new physical entities are postulated.

free parameters (1)
  • multi-scale geometric verification parameters
    Learnable parameters that control correspondence checking across octree levels; their values are determined during joint training.
axioms (1)
  • domain assumption: Octree-based multi-granularity features increase robustness to clutter, self-similarity and viewpoint changes in forests
    Invoked to justify the hierarchical transformer design.

pith-pipeline@v0.9.0 · 5613 in / 1461 out tokens · 38767 ms · 2026-05-17T23:34:28.631820+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Paired-CSLiDAR: Height-Stratified Registration for Cross-Source Aerial-Ground LiDAR Pose Refinement

    cs.RO 2026-05 conditional novelty 7.0

    Paired-CSLiDAR benchmark and Residual-Guided Stratified Registration achieve 86% success at 0.75 m RMSE on 9,012 cross-source pairs by height-stratified ICP and confidence-gated selection.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Spectral Geometric Verification: Re-Ranking Point Cloud Retrieval for Metric Localization

    K. Vidanapathirana, P. Moghadam, S. Sridharan, and C. Fookes, “Spectral Geometric Verification: Re-Ranking Point Cloud Retrieval for Metric Localization,” IEEE Robot. Automat. Lett., vol. 8, no. 5, pp. 2494–2501, May 2023

  2. [2]

    CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

    T. Guan, A. Muthuselvam, M. Hoover, X. Wang, J. Liang, A. J. Sathyamoorthy, D. Conover, and D. Manocha, “CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2023, pp. 11301–11310

  3. [3]

    Online 6DoF Global Localisation in Forests using Semantically-Guided Re-Localisation and Cross-View Factor-Graph Optimisation

    L. Carvalho de Lima, E. Griffiths, M. Haghighat, S. Denman, C. Fookes, P. Borges, M. Brunig, and M. Ramezani, “Online 6DoF Global Localisation in Forests using Semantically-Guided Re-Localisation and Cross-View Factor-Graph Optimisation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2025

  4. [4]

    HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views

    E. Griffiths, M. Haghighat, S. Denman, C. Fookes, and M. Ramezani, “HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2025, pp. 6648–6658

  5. [5]

    Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map

    G. Kim and A. Kim, “Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2018, pp. 4802–4809

  6. [6]

    RING++: Roto-Translation Invariant Gram for Global Localization on a Sparse Scan Map

    X. Xu, S. Lu, J. Wu, H. Lu, Q. Zhu, Y. Liao, R. Xiong, and Y. Wang, “RING++: Roto-Translation Invariant Gram for Global Localization on a Sparse Scan Map,” IEEE Trans. Robot., vol. 39, no. 6, pp. 4616–4635, Dec. 2023

  7. [7]

    PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition

    M. A. Uy and G. H. Lee, “PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2018, pp. 4470–4479

  8. [8]

    DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization

    J. Du, R. Wang, and D. Cremers, “DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 744–762

  9. [9]

    MinkLoc3D: Point Cloud Based Large-Scale Place Recognition

    J. Komorowski, “MinkLoc3D: Point Cloud Based Large-Scale Place Recognition,” in Proc. IEEE Winter Conf. Appl. Comput. Vis., Jan. 2021, pp. 1789–1798

  10. [10]

    LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM

    D. Cattaneo, M. Vaghi, and A. Valada, “LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM,” IEEE Trans. Robot., vol. 38, no. 4, pp. 2074–2093, Aug. 2022

  11. [11]

    LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition

    K. Vidanapathirana, M. Ramezani, P. Moghadam, S. Sridharan, and C. Fookes, “LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition,” in Proc. IEEE Int. Conf. Robot. Automat., 2022, pp. 2215–2221

  12. [12]

    Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training

    J. Komorowski, “Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training,” in 26th Int. Conf. Pattern Recognit., IEEE, 2022, pp. 3699–3705

  13. [13]

    EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale

    J. Komorowski, M. Wysoczanska, and T. Trzcinski, “EgoNN: Egocentric Neural Network for Point Cloud Based 6DoF Relocalization at the City Scale,” IEEE Robot. Automat. Lett., vol. 7, no. 2, pp. 722–729, Apr. 2022

  14. [14]

    Pyramid Point Cloud Transformer for Large-Scale Place Recognition

    L. Hui, H. Yang, M. Cheng, J. Xie, and J. Yang, “Pyramid Point Cloud Transformer for Large-Scale Place Recognition,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 6098–6107

  15. [15]

    TransLoc3D: Point cloud based large-scale place recognition using adaptive receptive fields

    T.-X. Xu, Y.-C. Guo, Z. Li, G. Yu, Y.-K. Lai, and S.-H. Zhang, “TransLoc3D: Point cloud based large-scale place recognition using adaptive receptive fields,” Commun. Inf. Syst., vol. 23, no. 1, pp. 57–83, 2023

  16. [16]

    SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition

    R. G. Goswami, N. Patel, P. Krishnamurthy, and F. Khorrami, “SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition,” IEEE Robot. Autom. Lett., vol. 9, no. 10, pp. 8242–8249, Oct. 2024

  17. [17]

    Wild-Places: A Large-Scale Dataset for Lidar Place Recognition in Unstructured Natural Environments

    J. Knights, K. Vidanapathirana, M. Ramezani, S. Sridharan, C. Fookes, and P. Moghadam, “Wild-Places: A Large-Scale Dataset for Lidar Place Recognition in Unstructured Natural Environments,” in Proc. IEEE Int. Conf. Robot. Automat., 2023, pp. 11322–11328

  18. [18]

    Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography

    M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, June 1981

  19. [19]

    PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

    X. Bai, Z. Luo, L. Zhou, H. Chen, L. Li, Z. Hu, H. Fu, and C.-L. Tai, “PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2021, pp. 15854–15864

  20. [20]

    CoFiNet: Reliable Coarse-to-fine Correspondences for Robust PointCloud Registration

    H. Yu, F. Li, M. Saleh, B. Busam, and S. Ilic, “CoFiNet: Reliable Coarse-to-fine Correspondences for Robust PointCloud Registration,” in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 23872–23884

  21. [21]

    GeoTransformer: Fast and Robust Point Cloud Registration With Geometric Transformer

    Z. Qin, H. Yu, C. Wang, Y. Guo, Y. Peng, S. Ilic, D. Hu, and K. Xu, “GeoTransformer: Fast and Robust Point Cloud Registration With Geometric Transformer,” IEEE Trans. Pattern Anal. Machine Intell., vol. 45, no. 8, pp. 9806–9821, Aug. 2023

  22. [22]

    GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors

    J. Knights, S. Hausler, S. Sridharan, C. Fookes, and P. Moghadam, “GeoAdapt: Self-Supervised Test-Time Adaptation in LiDAR Place Recognition Using Geometric Priors,” IEEE Robot. Automat. Lett., vol. 9, no. 1, pp. 915–922, Jan. 2024

  23. [23]

    A spectral technique for correspondence problems using pairwise constraints

    M. Leordeanu and M. Hebert, “A spectral technique for correspondence problems using pairwise constraints,” in Proc. 10th IEEE Int. Conf. Comput. Vis., vol. 2, Oct. 2005, pp. 1482–1489

  24. [24]

    SuperGlue: Learning Feature Matching With Graph Neural Networks

    P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperGlue: Learning Feature Matching With Graph Neural Networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., June 2020, pp. 4937–4946

  25. [25]

    In Defense of the Triplet Loss for Person Re-Identification

    A. Hermans, L. Beyer, and B. Leibe, “In Defense of the Triplet Loss for Person Re-Identification,” Nov. 2017, arXiv:1703.07737 [cs]

  26. [26]

    MulRan: Multimodal Range Dataset for Urban Place Recognition

    G. Kim, Y. S. Park, Y. Cho, J. Jeong, and A. Kim, “MulRan: Multimodal Range Dataset for Urban Place Recognition,” in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 6246–6253

  27. [27]

    Sharpness-Aware Training for Free

    J. Du, D. Zhou, J. Feng, V. Tan, and J. T. Zhou, “Sharpness-Aware Training for Free,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2022