Beyond First-Order: Learning Riemannian Geometries for Invariant Visual Place Recognition

Chi Man Vong; Jintao Cheng; Jin Wu; Weibin Li; Wei Zhang; Zhijian He

arxiv: 2602.00841 · v4 · pith:TCOVQJ3Tnew · submitted 2026-01-31 · 💻 cs.CV

Pith reviewed 2026-05-21 13:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords: visual place recognition, Riemannian geometry, SPD manifold, invariant aggregation, second-order pooling, covariance descriptors, zero-shot adaptation, congruence transformations

The pith

RIA models second-order scene structure on the SPD manifold to deliver invariant visual place recognition that matches supervised methods in zero-shot settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Riemannian Invariant Aggregation (RIA) as a geometric approach to visual place recognition. It addresses the shortcomings of first-order pooling and heavy supervised training by representing scenes through covariance descriptors on the Symmetric Positive Definite manifold. Perturbations are handled as congruence transformations that Riemannian mappings can linearize while keeping structural invariants intact and reducing noise. This produces representations that perform comparably to supervised baselines without training and reach state-of-the-art results after light fine-tuning, with particular gains in unstructured scenes. A reader would care because robust place recognition under viewpoint and environmental shifts remains essential for robotics and navigation systems where labeled data is costly to obtain.

Core claim

By explicitly modeling second-order scene structure on the Symmetric Positive Definite (SPD) manifold and leveraging geometry-aware Riemannian mappings to project covariance descriptors into a linearized Euclidean space, perturbations can be treated as tractable congruence transformations that preserve invariant structural components while suppressing noise.

What carries the argument

Riemannian Invariant Aggregation (RIA), which represents scene structure via covariance matrices on the Symmetric Positive Definite (SPD) manifold and applies geometry-aware mappings to enforce invariance under congruence transformations.
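The general idea of a covariance descriptor followed by a log-Euclidean mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `eps` regularizer, and the toy feature sizes are hypothetical, and the paper's own stages (subspace projection, ReCov) are not reproduced here.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-5):
    """Sample covariance of local features (rows = locations), regularized to be SPD."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eps * I keeps the matrix strictly positive definite (hypothetical regularizer).
    return cov + eps * np.eye(features.shape[1])


def log_euclidean_vector(spd):
    """Matrix logarithm via eigendecomposition, flattened to a Euclidean vector."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_spd = (eigvecs * np.log(eigvals)) @ eigvecs.T
    rows, cols = np.triu_indices(spd.shape[0])
    # Off-diagonal entries are scaled by sqrt(2) so that the vector's L2 norm
    # equals the Frobenius norm of log_spd.
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_spd[rows, cols]


rng = np.random.default_rng(0)
local_features = rng.standard_normal((200, 8))  # 200 locations, 8 channels (toy sizes)
descriptor = log_euclidean_vector(covariance_descriptor(local_features))
print(descriptor.shape)  # (36,)
```

Once descriptors are in this flattened form, ordinary Euclidean nearest-neighbour search can stand in for distance computations on the manifold, which is the practical appeal of a linearizing map.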

If this is right

RIA achieves zero-shot performance comparable to supervised methods on visual place recognition tasks.
Simple fine-tuning on top of RIA yields state-of-the-art accuracy.
Gains are largest in unstructured environments where first-order methods lose structural correlations.
The approach avoids the high adaptation costs of purely supervised aggregation pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same manifold projection idea could be tested on related tasks such as object tracking or scene understanding under motion blur.
Combining RIA descriptors with modern transformer backbones might further improve invariance without increasing training data needs.
Measuring how the method scales when the number of covariance dimensions grows would clarify practical deployment limits.

Load-bearing premise

Visual scene perturbations can be treated as tractable congruence transformations on the SPD manifold such that Riemannian mappings preserve the important structural parts while removing noise.
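What the premise buys, if it holds, can be checked numerically: the affine-invariant Riemannian distance between two SPD matrices is unchanged when both undergo the same congruence, whereas the plain Frobenius distance generally is not. The sketch below assumes a hypothetical invertible perturbation `p` built for illustration; real perturbations need not be of this form, which is exactly the point the premise leaves open.

```python
import numpy as np


def spd_power(spd, power):
    """Raise an SPD matrix to a real power through its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * eigvals**power) @ eigvecs.T


def airm_distance(a, b):
    """Affine-invariant distance: ||log(A^{-1/2} B A^{-1/2})||_F."""
    a_inv_sqrt = spd_power(a, -0.5)
    middle = a_inv_sqrt @ b @ a_inv_sqrt
    middle = (middle + middle.T) / 2  # symmetrize against round-off
    eigvals = np.linalg.eigvalsh(middle)
    return np.sqrt(np.sum(np.log(eigvals) ** 2))


def random_spd(rng, dim):
    m = rng.standard_normal((dim, dim))
    return m @ m.T + dim * np.eye(dim)


rng = np.random.default_rng(1)
c1, c2 = random_spd(rng, 5), random_spd(rng, 5)
# Hypothetical perturbation: lower triangular with a positive diagonal,
# so it is invertible by construction and not orthogonal.
p = np.tril(rng.standard_normal((5, 5)), -1) + np.diag(rng.uniform(0.5, 2.0, 5))

airm_before = airm_distance(c1, c2)
airm_after = airm_distance(p @ c1 @ p.T, p @ c2 @ p.T)
frob_before = np.linalg.norm(c1 - c2)
frob_after = np.linalg.norm(p @ c1 @ p.T - p @ c2 @ p.T)

print("AIRM unchanged:     ", np.isclose(airm_before, airm_after))
print("Frobenius unchanged:", np.isclose(frob_before, frob_after))
```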

What would settle it

A head-to-head test on standard VPR benchmarks with large viewpoint and lighting changes in which RIA shows no accuracy advantage over ordinary first-order pooling in the zero-shot case would disprove the central claim.

Figures

Figures reproduced from arXiv: 2602.00841 by Chi Man Vong, Jintao Cheng, Jin Wu, Weibin Li, Wei Zhang, Zhijian He.

**Figure 2.** Schematic overview of the proposed Riemannian Invariant Aggregation (RIA) framework. The pipeline transforms local features from a frozen backbone into a robust global descriptor through four geometric phases. Stage 1: High-dimensional features are projected onto a lower-dimensional subspace to ensure a full-rank covariance estimation. Stage 2: We compute the sample covariance and apply ReCov to suppress s…

**Figure 3.** Feature distance drift under illumination and viewpoint

**Figure 4.** Qualitative comparison of Top-3 retrieval results. Comparison between our RIA (top row) and the VLAD baseline (bottom row). (a) Under drastic illumination changes (Night-to-Day on Tokyo24/7), RIA successfully retrieves the correct scene. (b) Under viewpoint and scale variations (Pitts30k), RIA accurately identifies the landmark. Green and red borders indicate correct and incorrect matches, respectively. No…

**Figure 5.** Qualitative retrieval results under challenging conditions. Each row represents a different challenge scenario: (1) Seasonal Variation, (2) Occlusion, (3) Illumination Change, and (4) Perspective Change. For each scenario, we show the query image (blue border), the ground truth match, and the top-3 retrieved results. Green borders indicate correct matches, while red borders denote incorrect retrievals. Our…

Original abstract

Visual Place Recognition (VPR) demands representations robust to drastic environmental and viewpoint shifts. Existing aggregation paradigms either depend on extensive supervised training or rely on first-order pooling, often struggling to preserve structural correlations under extreme shifts or incurring high adaptation costs. In this work, we propose Riemannian Invariant Aggregation (RIA), a unified geometric framework that explicitly models second-order scene structure on the Symmetric Positive Definite (SPD) manifold. By treating perturbations as tractable congruence transformations, RIA leverages geometry-aware Riemannian mappings to project covariance descriptors into a linearized Euclidean space, effectively preserving invariant structural components while suppressing noise. Extensive evaluations demonstrate that RIA achieves zero-shot performance comparable to supervised methods, and establishes state-of-the-art accuracy with simple fine-tuning, particularly in unstructured environments. The source code will be released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RIA frames VPR invariance via SPD covariances and Riemannian mappings under a congruence model, but that modeling choice is the part that needs checking.

The main takeaway is that this paper pushes second-order pooling for visual place recognition by representing scenes as covariance descriptors on the SPD manifold and using Riemannian operations to handle invariance. The authors treat scene changes as congruence transformations and project the descriptors into a Euclidean space to keep structural correlations while damping noise. They claim zero-shot results near supervised baselines and SOTA after light fine-tuning, especially in unstructured settings.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Riemannian Invariant Aggregation (RIA) for Visual Place Recognition. It represents second-order scene structure via covariance descriptors on the Symmetric Positive Definite (SPD) manifold, models perturbations as congruence transformations, and applies geometry-aware Riemannian mappings (Log-Euclidean or affine-invariant) to project into a Euclidean space that preserves invariant components while attenuating noise. The central claims are that RIA attains zero-shot performance comparable to supervised baselines and reaches state-of-the-art accuracy after simple fine-tuning, especially in unstructured environments.

Significance. If the modeling assumption and empirical results hold, the work offers a principled geometric alternative to first-order pooling and heavy supervision in VPR. By extending standard SPD-manifold techniques to handle drastic viewpoint and environmental shifts, it could lower adaptation costs in robotics and navigation applications. The stated intention to release source code would support reproducibility and further testing of the Riemannian mappings.

major comments (2)

[§3.2] Modeling of perturbations: The assertion that real VPR perturbations act as tractable congruence transformations A ↦ P A P^T on SPD covariances is load-bearing for the invariance claim, yet the manuscript provides neither a formal justification nor an ablation isolating non-congruence effects (illumination gradients, seasonal texture shifts, partial occlusions). If these effects alter local descriptor distributions outside the congruence model, the subsequent Riemannian projection cannot be guaranteed to deliver the stated noise suppression.
[§4] Experimental validation: The abstract and results section assert zero-shot parity with supervised methods and SOTA after fine-tuning, but supply no concrete metrics, error bars, dataset statistics, or baseline implementations. Without these, the performance claims cannot be independently verified and the cross-environment superiority remains unquantified.

minor comments (2)

[§3.1] Notation for the two Riemannian mappings (Log-Euclidean vs. affine-invariant) should be written out explicitly with the corresponding matrix equations to avoid ambiguity in the projection step.
[Figure 4] Figure captions and axis labels in the qualitative results could be expanded to indicate which environmental factors (viewpoint, illumination, season) are being visualized.
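
For orientation on the [§3.1] comment, the two mappings named in the report have standard textbook forms, written out below. These are the usual definitions from the SPD-manifold literature, given as an assumption about what the manuscript intends; the symbols C, U, λ_i and P are illustrative and the paper's own notation may differ.

```latex
% Log-Euclidean mapping and distance (C = U diag(lambda_1, ..., lambda_d) U^T, lambda_i > 0)
\log(C) = U \,\mathrm{diag}(\log \lambda_1, \dots, \log \lambda_d)\, U^{\top},
\qquad
d_{\mathrm{LE}}(C_1, C_2) = \lVert \log(C_1) - \log(C_2) \rVert_F .

% Affine-invariant Riemannian distance and its invariance under congruence
d_{\mathrm{AI}}(C_1, C_2) = \bigl\lVert \log\bigl(C_1^{-1/2}\, C_2\, C_1^{-1/2}\bigr) \bigr\rVert_F ,
\qquad
d_{\mathrm{AI}}(P C_1 P^{\top},\, P C_2 P^{\top}) = d_{\mathrm{AI}}(C_1, C_2)
\quad \text{for any invertible } P .
```

The log-Euclidean distance is invariant under orthogonal P but not under general invertible P, so which of the two mappings is used matters for how much of the congruence model is actually absorbed.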

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of our work on Riemannian Invariant Aggregation for Visual Place Recognition. We address each major comment point by point below, clarifying our modeling choices and experimental reporting while committing to revisions that strengthen the manuscript without altering its core contributions.

Referee: [§3.2] Modeling of perturbations: The assertion that real VPR perturbations act as tractable congruence transformations A ↦ P A P^T on SPD covariances is load-bearing for the invariance claim, yet the manuscript provides neither a formal justification nor an ablation isolating non-congruence effects (illumination gradients, seasonal texture shifts, partial occlusions). If these effects alter local descriptor distributions outside the congruence model, the subsequent Riemannian projection cannot be guaranteed to deliver the stated noise suppression.

Authors: We agree that the congruence transformation model is central to the invariance properties claimed for RIA. Section 3.2 motivates this choice by showing that common VPR perturbations (viewpoint changes, affine warps) induce linear transformations on local descriptors, which translate to congruence on the resulting covariance matrices; this is consistent with prior SPD descriptor work. We do not claim the model covers every possible perturbation, and we acknowledge that effects such as strong illumination gradients or seasonal changes may deviate from pure congruence. To address this, we will expand Section 3.2 with a clearer discussion of the modeling assumptions and their limitations, and add a targeted ablation that introduces controlled non-congruence perturbations (synthetic illumination and occlusion) to quantify any degradation in the Riemannian projection's noise suppression.

revision: yes

Referee: [§4] Experimental validation: The abstract and results section assert zero-shot parity with supervised methods and SOTA after fine-tuning, but supply no concrete metrics, error bars, dataset statistics, or baseline implementations. Without these, the performance claims cannot be independently verified and the cross-environment superiority remains unquantified.

Authors: We thank the referee for noting the need for greater transparency in the experimental section. While the full manuscript contains tables with accuracy figures, standard deviations, dataset statistics, and baseline details (including implementation references), we recognize that these were not sufficiently highlighted in the abstract or summarized for quick verification. We will revise the abstract to include key quantitative results (e.g., zero-shot and fine-tuned accuracies on standard VPR benchmarks with error bars) and expand the results section with an explicit summary table of all metrics, dataset characteristics, and baseline configurations to facilitate independent verification and better quantify cross-environment gains.

revision: yes
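
The step that the first response relies on, from a linear map on local descriptors to a congruence on their covariance, can be written out as a short derivation. The symbols x_i, μ, N, P and b are illustrative notation and are not taken from the manuscript.

```latex
% Local descriptors x_1, ..., x_N with mean \mu and sample covariance C
C = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\top},
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i .

% Perturbation acting as an affine map on every descriptor
x_i' = P x_i + b
\;\Longrightarrow\;
\mu' = P \mu + b,
\qquad
x_i' - \mu' = P (x_i - \mu) .

% Resulting covariance: a congruence transformation of C
C' = \frac{1}{N-1} \sum_{i=1}^{N} P (x_i - \mu)(x_i - \mu)^{\top} P^{\top} = P\, C\, P^{\top} .
```

The offset b cancels, so only the linear part P reaches the covariance. A perturbation that acts differently on different descriptors (for example an occlusion that replaces a subset of them) does not factor this way, which is the non-congruence case the referee asks the authors to test.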

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard manifold geometry and empirical validation

The paper introduces RIA by adopting established SPD manifold properties and congruence transformations A ↦ P A P^T as modeling assumptions, then applies known Riemannian mappings (Log-Euclidean or affine-invariant) to linearize descriptors. Performance results are presented as outcomes of extensive evaluations rather than any fitted parameter renamed as a prediction or any self-referential definition. No load-bearing step reduces by construction to its own inputs, and the central claims remain independent of the provided abstract and described framework.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that visual perturbations admit tractable congruence transformations on the SPD manifold and that Riemannian mappings can isolate invariant components. The abstract introduces no free parameters explicitly; the one new entity is the RIA framework itself.

axioms (1)

domain assumption: Perturbations in visual scenes can be modeled as tractable congruence transformations on the SPD manifold.
Directly stated in the abstract as the basis for applying geometry-aware Riemannian mappings.

invented entities (1)

Riemannian Invariant Aggregation (RIA): no independent evidence
purpose: Project covariance descriptors into linearized Euclidean space while preserving invariant structural components
New named framework introduced to unify the geometric treatment of second-order scene structure.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "By treating perturbations as tractable congruence transformations, RIA leverages geometry-aware Riemannian mappings to project covariance descriptors into a linearized Euclidean space"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_fourth_deriv_at_zero
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "PEM distance with power α=0.5 … matrix square root … Newton-Schulz iterations"
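
The ingredients named in the second passage (a power-Euclidean distance at α = 0.5, a matrix square root, Newton-Schulz iterations) can be combined in a small sketch. It assumes the distance takes the power-Euclidean form 2‖C1^{1/2} − C2^{1/2}‖_F and uses the coupled Newton-Schulz iteration with Frobenius-norm scaling; the function names, the iteration count, and the scaling choice are assumptions for illustration, and the paper's implementation may differ (trace scaling is another common choice).

```python
import numpy as np


def newton_schulz_sqrt(spd, num_iters=20):
    """Approximate the SPD square root with the coupled Newton-Schulz iteration."""
    identity = np.eye(spd.shape[0])
    scale = np.linalg.norm(spd)  # Frobenius norm; scaled eigenvalues lie in (0, 1]
    y = spd / scale              # converges to (spd / scale)^{1/2}
    z = identity.copy()          # converges to (spd / scale)^{-1/2}
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y = y @ t
        z = t @ z
    return np.sqrt(scale) * y


def pem_distance(c1, c2):
    """Power-Euclidean distance with alpha = 0.5: 2 * ||C1^{1/2} - C2^{1/2}||_F."""
    return 2.0 * np.linalg.norm(newton_schulz_sqrt(c1) - newton_schulz_sqrt(c2))


def random_spd(rng, dim):
    m = rng.standard_normal((dim, dim))
    return m @ m.T + dim * np.eye(dim)  # well-conditioned toy SPD matrix


rng = np.random.default_rng(2)
c1, c2 = random_spd(rng, 4), random_spd(rng, 4)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal matrix

original = pem_distance(c1, c2)
rotated = pem_distance(q @ c1 @ q.T, q @ c2 @ q.T)
print("distance unchanged under rotation:", np.isclose(original, rotated))
```

The iteration uses only matrix multiplications, which is why it is popular on GPUs where eigendecompositions are comparatively slow. The rotation check works because (Q C Q^T)^{1/2} = Q C^{1/2} Q^T for orthogonal Q and the Frobenius norm is unchanged by orthogonal factors.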

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Visual place recognition: A survey,

S. Lowry, N. S ¨underhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford, “Visual place recognition: A survey,”ieee transactions on robotics, vol. 32, no. 1, pp. 1–19, 2015

work page 2015

[2] [2]

Visual place recognition: A survey from deep learning perspective,

X. Zhang, L. Wang, and Y . Su, “Visual place recognition: A survey from deep learning perspective,”Pattern Recognition, vol. 113, p. 107760, 2021

work page 2021

[3] [3]

Seqslam: Visual route-based navigation for sunny summer days and stormy winter nights,

M. J. Milford and G. F. Wyeth, “Seqslam: Visual route-based navigation for sunny summer days and stormy winter nights,” in2012 IEEE international conference on robotics and automation, pp. 1643–1649, IEEE, 2012

work page 2012

[4] [4]

Benchmarking 6dof outdoor visual localization in changing conditions,

T. Sattler, W. Maddern, C. Toft, A. Torii, L. Hammarstrand, E. Stenborg, D. Safari, M. Okutomi, M. Pollefeys, J. Sivic,et al., “Benchmarking 6dof outdoor visual localization in changing conditions,” inProceedings of the IEEE conference on computer vision and pattern recognition, pp. 8601–8610, 2018

work page 2018

[5] [5]

Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition,

S. Hausler, S. Garg, M. Xu, M. Milford, and T. Fischer, “Patch-netvlad: Multi-scale fusion of locally-global descriptors for place recognition,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 14141–14152, 2021

work page 2021

[6] [6]

Rethinking visual geo-localization for large-scale applications,

G. Berton, C. Masone, and B. Caputo, “Rethinking visual geo-localization for large-scale applications,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4878–4888, 2022

work page 2022

[7] [7]

Vpr-bench: An open-source visual place recognition evaluation framework with quantifiable viewpoint and appearance change,

M. Zaffar, S. Garg, M. Milford, J. Kooij, D. Flynn, K. McDonald-Maier, and S. Ehsan, “Vpr-bench: An open-source visual place recognition evaluation framework with quantifiable viewpoint and appearance change,” International Journal of Computer Vision, vol. 129, no. 7, pp. 2136–2174, 2021

work page 2021

[8] [8]

Mapillary street-level sequences: A dataset for lifelong place recognition,

F. Warburg, S. Hauberg, M. Lopez-Antequera, P. Gargallo, Y . Kuang, and J. Civera, “Mapillary street-level sequences: A dataset for lifelong place recognition,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2626–2635, 2020

work page 2020

[9] [9]

Netvlad: Cnn architecture for weakly supervised place recognition,

R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “Netvlad: Cnn architecture for weakly supervised place recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, pp. 5297–5307, 2016

work page 2016

[10] S. Zhu, L. Yang, C. Chen, M. Shah, X. Shen, and H. Wang, “R2Former: Unified retrieval and reranking transformer for place recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19370–19380, 2023

[11] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum, “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022

[12] A. Ali-Bey, B. Chaib-Draa, and P. Giguere, “MixVPR: Feature mixing for visual place recognition,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2998–3007, 2023

[13] F. Lu, X. Lan, L. Zhang, D. Jiang, Y. Wang, and C. Yuan, “CricaVPR: Cross-image correlation-aware representation learning for visual place recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16772–16782, 2024

[14] N. Keetha, A. Mishra, J. Karhade, K. M. Jatavallabhula, S. Scherer, M. Krishna, and S. Garg, “AnyLoc: Towards universal visual place recognition,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1286–1293, 2023

[15] C. Malone, S. Hussaini, T. Fischer, and M. Milford, “A hyperdimensional one place signature to represent them all: Stackable descriptors for visual place recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9822–9833, 2025

[16] F. Radenović, G. Tolias, and O. Chum, “Fine-tuning CNN image retrieval with no human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1655–1668, 2018

[17] Z. Huang and L. Van Gool, “A Riemannian network for SPD matrix learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017

[18] G. Berton, G. Trivigno, B. Caputo, and C. Masone, “EigenPlaces: Training viewpoint robust models for visual place recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11080–11090, 2023

[19] R. Wang, Y. Shen, W. Zuo, S. Zhou, and N. Zheng, “TransVPR: Transformer-based place recognition with multi-level attention aggregation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13648–13657, 2022

[20] F. Lu, L. Zhang, X. Lan, S. Dong, Y. Wang, and C. Yuan, “Towards seamless adaptation of pre-trained models for visual place recognition,” arXiv preprint arXiv:2402.14505, 2024

[21] F. Lu, X. Zhang, C. Ye, S. Dong, L. Zhang, X. Lan, and C. Yuan, “SuperVLAD: Compact and robust image descriptors for visual place recognition,” Advances in Neural Information Processing Systems, vol. 37, pp. 5789–5816, 2024

[22] S. Izquierdo and J. Civera, “Optimal transport aggregation for visual place recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17658–17668, 2024

[23] R. Wang, X.-J. Wu, Z. Chen, T. Xu, and J. Kittler, “DreamNet: A deep Riemannian manifold network for SPD matrix learning,” in Proceedings of the Asian Conference on Computer Vision, pp. 3241–3257, 2022

[24] Z. Chen, T. Xu, X.-J. Wu, R. Wang, Z. Huang, and J. Kittler, “Riemannian local mechanism for SPD neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, pp. 7104–7112, 2023

[25] Z. Gao, Y. Wu, Y. Jia, and M. Harandi, “Learning to optimize on SPD manifolds,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7700–7709, 2020

[26] F. Tang, M. Fan, and P. Tiňo, “Generalized learning Riemannian space quantization: A case study on Riemannian manifold of SPD matrices,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 281–292, 2020

[27] Z. Huang, R. Wang, X. Li, W. Liu, S. Shan, L. Van Gool, and X. Chen, “Geometry-aware similarity learning on SPD manifolds for visual recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2513–2523, 2017

[28] R. Wang, X.-J. Wu, T. Xu, C. Hu, and J. Kittler, “Deep metric learning on the SPD manifold for image set classification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 2, pp. 663–680, 2022

[29] I. L. Dryden, X. Pennec, and J.-M. Peyrat, “Power Euclidean metrics for covariance matrices with application to diffusion tensor imaging,” arXiv preprint arXiv:1009.3045, 2010

[30] P. Li, J. Xie, Q. Wang, and Z. Gao, “Towards faster training of global covariance pooling networks by iterative matrix square root normalization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 947–955, 2018

[31] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al., “DINOv2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023

[32] R. Sahdev and J. K. Tsotsos, “Indoor place recognition system for localization of mobile robots,” in 2016 13th Conference on Computer and Robot Vision (CRV), pp. 53–60, IEEE, 2016

[33] A. Torii, R. Arandjelovic, J. Sivic, M. Okutomi, and T. Pajdla, “24/7 place recognition by view synthesis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1808–1817, 2015

[34] X. Sun, Y. Xie, P. Luo, and L. Wang, “A dataset for benchmarking image-based localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7436–7444, 2017

[35] A. Torii, J. Sivic, T. Pajdla, and M. Okutomi, “Visual place recognition with repetitive structures,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 883–890, 2013

[36] A. Glover, “Gardens Point day and night, left and right,” Zenodo DOI, vol. 10, p. 3, 2014

[37] N. Sünderhauf, S. Shirazi, F. Dayoub, B. Upcroft, and M. Milford, “On the performance of ConvNet features for place recognition,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4297–4304, IEEE, 2015

[38] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 year, 1000 km: The Oxford RobotCar dataset,” The International Journal of Robotics Research, vol. 36, no. 1, pp. 3–15, 2017

[39] M. Warren, D. McKinnon, H. He, and B. Upcroft, “Unaided stereo vision based pose estimation,” in Proceedings of the 2010 Australasian Conference on Robotics and Automation, pp. 1–8, Australian Robotics & Automation Association, 2010

APPENDIX CONTENTS

A Notations and abbreviations ...

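[40]

\begin{align}
&= 4\,\lVert Q C_1^{1/2} Q^\top - Q C_2^{1/2} Q^\top \rVert_F^2 \tag{15} \\
&= 4\,\lVert Q (C_1^{1/2} - C_2^{1/2}) Q^\top \rVert_F^2. \tag{16}
\end{align}

Applying the unitary invariance property of the Frobenius norm ($\lVert U A V \rVert_F = \lVert A \rVert_F$ for orthogonal $U, V$), the rotation matrices $Q$ and $Q^\top$ are eliminated:

\begin{equation}
4\,\lVert Q (C_1^{1/2} - C_2^{1/2}) Q^\top \rVert_F^2 = 4\,\lVert C_1^{1/2} - C_2^{1/2} \rVert_F^2 = d_{\mathrm{PEM}}^2(C_1, C_2). \tag{17}
\end{equation}

Thus, the distance remains invariant.

Remark B.3. This the...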
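The rotation invariance of the squared Power Euclidean distance derived in the appendix excerpt above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the exponent 1/2 implied by the square roots and the factor 4 in the derivation, and the helper names (`spd_power`, `pem_distance_sq`, `random_spd`, `random_orthogonal`) are hypothetical, introduced only for illustration.

```python
import numpy as np


def spd_power(C, alpha):
    """Matrix power of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(C)          # C = V diag(w) V^T with w > 0
    return (V * w**alpha) @ V.T       # V diag(w^alpha) V^T


def pem_distance_sq(C1, C2, alpha=0.5):
    """Squared Power Euclidean distance; alpha = 1/2 gives the factor 4."""
    diff = spd_power(C1, alpha) - spd_power(C2, alpha)
    return np.linalg.norm(diff, "fro") ** 2 / alpha**2


def random_spd(rng, d):
    """Random SPD matrix (assumption: A A^T plus a diagonal shift)."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)


def random_orthogonal(rng, d):
    """Random orthogonal matrix taken from a QR factorization."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q


rng = np.random.default_rng(0)
d = 6
C1, C2 = random_spd(rng, d), random_spd(rng, d)
Q = random_orthogonal(rng, d)

# Distance before and after the congruence C -> Q C Q^T.
before = pem_distance_sq(C1, C2)
after = pem_distance_sq(Q @ C1 @ Q.T, Q @ C2 @ Q.T)
print(before, after, np.isclose(before, after))
```

The two values should agree up to floating-point error, because $(Q C Q^\top)^{1/2} = Q C^{1/2} Q^\top$ for orthogonal $Q$ and the Frobenius norm is unchanged by orthogonal factors on either side.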