arxiv: 2605.07099 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization

Hongyang Zhang , Maonnan Wang , Ziyao Wang , Hongrui Yin , Man OnPun

Authors on Pith no claims yet

Pith reviewed 2026-05-11 01:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords cross-view geo-localizationUAV localizationobject-centric learninginformation bottleneckview-invariant featuresdomain generalizationsatellite matching

0 comments

The pith

Reformulating cross-view geo-localization as an information bottleneck that aligns object-centric structural relations across views improves robustness to domain shifts and clutter in UAV scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes InfoGeo to solve cross-view geo-localization for UAVs by moving beyond global feature matching, which breaks down under changing textures, weather, and dense object clutter from wide UAV perspectives. It draws on object-centric learning to extract structural relations among objects that stay consistent when the same scene is viewed from ground, air, or satellite angles. The method casts training as an information bottleneck that keeps the invariant relations while discarding view-specific noise via cross-view constraints. A sympathetic reader would care because reliable image-based localization without GPS enables safer navigation in GPS-denied or contested environments. If the approach holds, matching across extreme viewpoint changes becomes more reliable on standard benchmarks.

Core claim

InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: maximizing view-invariant information by aligning the object-centric structural relations across views, and minimizing view-specific noisy signals through cross-view knowledge constraints.

What carries the argument

An information bottleneck that aligns object-centric structural relations across views while applying cross-view knowledge constraints to suppress noise.

If this is right

Matching accuracy rises on benchmarks that include UAV, ground, and satellite imagery under varying conditions.
The framework reduces the impact of regional textures and weather-induced domain shifts compared with global feature baselines.
Localization succeeds in cluttered UAV views by focusing on structural object relations rather than raw appearance.
Generalization improves when cross-view knowledge constraints are used to filter view-specific signals.
The method supports precise navigation in GPS-denied settings by producing view-invariant representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bottleneck could be applied to other viewpoint-invariant tasks such as cross-season image retrieval.
If object relations prove key, integrating lightweight object detectors might further reduce the need for heavy global descriptors.
Deployment on resource-limited UAV hardware would require checking whether the constraint terms add acceptable latency.
Environments with few distinct objects, such as open water or desert, could test whether the structural signal remains informative.

Load-bearing premise

Object-centric structural relations extracted from images remain sufficiently consistent across different viewpoints to serve as the primary matching signal.

What would settle it

A controlled ablation on standard CVGL benchmarks that removes the object-centric alignment term and measures whether matching accuracy drops or stays the same.

Figures

Figures reproduced from arXiv: 2605.07099 by Hongrui Yin, Hongyang Zhang, Man OnPun, Maonnan Wang, Ziyao Wang.

**Figure 1.** Figure 1: (a) The illustration of our motivation. Cross-view images can be decomposed into the view-invariant information and viewspecific noise, while paired data can be matched through key visual clues. (b) The overview of cross-view object-centric learning process, the main target is to extract view-invariant tokens by compressing the view-specific noise. (c) Comparison with recent state-of-the-art methods on th… view at source ↗

**Figure 2.** Figure 2: The overview of our proposed framework InfoGeo. (Section 4.1). In Section 4.2, the OCVA module is proposed to incorporate object-centric representations into the scenelevel descriptors, enabling fine-grained discrimination with view-invariant semantics. The detailed information of them is illustrated in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: The training pipeline of Cross-view Visual Concept Reasoner, which pioneers an IB theory based framework for cross-view OCL through two synergistic components: 1) Cross-view Adaptive Concept Selection, and 2) Concept Structural Relational Reasoning. Object-Centric Visual Augmentation is further proposed to integrate object-centric representations into the global scene-level descriptors. where ∥·∥2 2 denote… view at source ↗

**Figure 4.** Figure 4: The PCA visualization and concept affinities of Object-Centric Representations Zˆ. RGB values correspond to principal components. Circles are the view-shared landmarks. Concept affinities are calculated by the spatial-level cosine distance across viewpoints (darker colors indicate higher spatial similarity), while dashed circles highlight the feature-space regions that exhibit robustness. View-Specific Noi… view at source ↗

**Figure 5.** Figure 5: The cross-view spatial-level concept affinities of different components produced by decoding attention maps. Ablation of Main Components. We perform an ablation study on individual components to verify our design, as shown in [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: The sensitivity analysis of the hyperparameters in OCVA. effectively avoiding slot collapse issue where excessively similar concepts. In contrast, K = 32 degrades the generalization performance, as excessive slots cause redundant discrete concepts, introducing noise that weakens discriminative cues. Thus, K = 16 provides an optimal value in the module. Meanwhile, the model achieves its best performance … view at source ↗

**Figure 7.** Figure 7: The detailed structures of the feature aggregation layer [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Comparsion on the network structures of our proposed work in the inference stage. (a) InfoGeo w/o Relational Distillation, (b) InfoGeo w/ Relational Distillation. query-gallery pairs while simultaneously pushing apart non-matching pairs, effectively approximating the mutual information between views in a tractable manner (Van den Oord et al., 2018). Formally, the loss is defined as: Lalign( ˜f q , ˜f g ) =… view at source ↗

**Figure 9.** Figure 9: The ablations on the three hyper-parameters across different scenarios. C.5.2. SENSITIVITY ANALYSIS ON HYPER-PARAMETER λ1, λ2 AND λ3 To explore the influence of the different components in our overall objective, we conduct a sensitivity analysis on the three hyperparameters: λ1, λ2, and λ3, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: The PCA visualization of feature maps between different UAV benchmarks. D. Qualitive Results D.1. Visualization on Object-Centric Representations We further present additional PCA visualizations of object-centric representations on the three benchmarks in [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: Comparison of the visualization on feature maps between InfoGeo and Baseline under Multi-Weather settings. The bounding boxes denote the view-shared objects across different viewpoints. more challenging fog-snow scenario, global representations learned by baseline models fail to effectively distinguish two distant buildings and lack discriminative capability for the green building in the foreground. In co… view at source ↗

**Figure 12.** Figure 12: The retrieval results of the failure case in University-1652→SUES-200 (150m and 300m). The predicted result is the top-1 retrieval results. Dash circles denote the key visual clues (Red lines denote the wrong-matched patterns, while Yellow lines are the discriminative objects). efficiency degradation caused by the integration of object-centric learning modules during inference. Extensive experiments acros… view at source ↗

read the original abstract

Cross-view geo-localization (CVGL) is fundamental for precise localization and navigation in GPS-denied environments, aiming to match ground or UAV imagery with satellite views. While existing approaches rely on global feature alignment, they often suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably introduces dense, fine-grained objects, creating significant visual clutter. To address this, we draw inspiration from Object-Centric Learning (OCL) and propose InfoGeo, an information-theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view-invariant information by aligning the object-centric structural relations across views, and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

InfoGeo applies an information bottleneck to object-centric relations for UAV cross-view geo-localization, but the abstract leaves the extraction and invariance steps without equations or validation.

read the letter

The paper's core move is to treat cross-view UAV geo-localization as an information-bottleneck task. It keeps object-centric structural relations that are supposed to stay stable across views while using cross-view constraints to suppress texture and weather noise that global features pick up. That framing directly targets the clutter problem in UAV imagery, which standard CVGL methods struggle with under domain shift.

Referee Report

2 major / 2 minor

Summary. The paper proposes InfoGeo, an information-theoretic object-centric learning framework for cross-view geo-localization (CVGL) in UAV scenarios. It reformulates the optimization as an information bottleneck process with two objectives: (i) maximizing view-invariant information by aligning object-centric structural relations across views and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. The authors claim this addresses domain shifts and visual clutter better than global feature alignment methods, with extensive evaluations on diverse benchmarks showing significant outperformance over state-of-the-art approaches.

Significance. If the central claims hold, the work could meaningfully advance UAV geo-localization in GPS-denied settings by introducing a principled disentanglement of invariant structural cues from domain-specific noise via object-centric representations. This bridges object-centric learning with cross-view matching and offers a template for handling clutter in aerial imagery that global methods struggle with.

major comments (2)

[§3.2] §3.2 (Information Bottleneck Formulation): The reformulation into the two core objectives is described at a high level, but no explicit equations or estimation procedure is provided for computing the mutual information terms that align object-centric structural relations or enforce the cross-view constraints. This is load-bearing for the central claim, as it leaves open whether the objectives are implementable without reducing to standard contrastive losses or introducing view-specific artifacts during relation extraction.
[§5] §5 (Experiments): No ablation or direct validation is reported on the robustness of extracting object-centric structural relations from cluttered UAV imagery or on their view-invariance under extreme viewpoint/domain shifts. This undermines the weakest assumption that these relations serve as a reliable primary signal, since any extraction artifacts would propagate through the bottleneck without being selectively suppressed.

minor comments (2)

[Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., mAP or recall gains on a primary benchmark) to support the outperformance claim.
[§3] Notation for object-centric relations and knowledge constraints should be introduced earlier and used consistently to improve readability of the method section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying the technical details and committing to revisions that strengthen the presentation and validation of our claims.

read point-by-point responses

Referee: [§3.2] §3.2 (Information Bottleneck Formulation): The reformulation into the two core objectives is described at a high level, but no explicit equations or estimation procedure is provided for computing the mutual information terms that align object-centric structural relations or enforce the cross-view constraints. This is load-bearing for the central claim, as it leaves open whether the objectives are implementable without reducing to standard contrastive losses or introducing view-specific artifacts during relation extraction.

Authors: We appreciate the referee highlighting the need for greater rigor here. Section 3.2 introduces the information bottleneck objectives at a conceptual level to motivate the framework, but we agree that explicit equations are necessary for reproducibility. In the revision we will add the precise mathematical formulation: the primary objective maximizes I(R; Y) where R denotes the extracted object-centric structural relations and Y the view-invariant target, subject to a cross-view constraint minimizing I(R; V) for view-specific variables V. Estimation will be detailed via a variational lower bound combined with a relation-aware contrastive estimator (distinct from standard global contrastive losses by operating on pairwise object graphs rather than image embeddings). We will also specify the object relation extraction pipeline (using slot attention followed by graph construction) and show it avoids view-specific artifacts through the bottleneck regularization. These additions will make the implementation fully transparent. revision: yes
Referee: [§5] §5 (Experiments): No ablation or direct validation is reported on the robustness of extracting object-centric structural relations from cluttered UAV imagery or on their view-invariance under extreme viewpoint/domain shifts. This undermines the weakest assumption that these relations serve as a reliable primary signal, since any extraction artifacts would propagate through the bottleneck without being selectively suppressed.

Authors: We acknowledge that direct validation of the core assumption would strengthen the paper. While the consistent outperformance over global-feature baselines on multiple benchmarks (including cluttered UAV and extreme-shift scenarios) provides supporting evidence that the relations are robust and invariant, we agree this is indirect. In the revised manuscript we will add targeted experiments: (i) ablations replacing the object-centric relation extractor with global or alternative local features, (ii) quantitative invariance metrics (e.g., relation consistency scores across view pairs), and (iii) qualitative and quantitative analysis on subsets with heavy clutter and large domain shifts to demonstrate that the information bottleneck selectively suppresses noise while preserving structural cues. These results will be reported in an expanded Section 5. revision: yes

Circularity Check

0 steps flagged

No circularity: proposed framework is a methodological reformulation without self-referential reduction.

full rationale

The abstract describes InfoGeo as a new information-theoretic framework that reformulates CVGL optimization via an information bottleneck with two explicit objectives (maximizing view-invariant object-centric relations and minimizing view-specific noise). No equations, fitted parameters, or self-citations are presented that would make these objectives equivalent to their inputs by construction. The derivation chain is a proposal drawing inspiration from OCL rather than a tautological renaming or load-bearing self-citation; the central claim remains an independent modeling choice whose validity rests on empirical benchmarks, not internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only; ledger entries are inferred from stated assumptions in the abstract and marked as such.

axioms (2)

domain assumption Object-centric structural relations can be extracted and aligned to capture view-invariant information for localization.
Central to the first objective; appears in the abstract description of maximizing view-invariant information.
domain assumption Cross-view knowledge constraints can isolate and minimize view-specific noisy signals without harming localization performance.
Central to the second objective; stated as part of the information bottleneck reformulation.

pith-pipeline@v0.9.0 · 5475 in / 1405 out tokens · 28638 ms · 2026-05-11T01:24:32.515349+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Theorem 3.1 (View-Invariant Information Bottleneck Principle). The representations Ẑ(v) are optimal for CVGL when they maximize mutual information (MI) with the geographic identity Y while minimizing task-irrelevant information inherited from the original features Z(v): max L(v)_IB = I(Ẑ(v);Y) − β I(Ẑ(v);Z(v)|Y)
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We construct concept graphs G(v) ∈ R^{K×K} through Â(v)_d … Laplacian Eigenmaps … spectral decomposition … Lstruct = min_{Q∈O(r)} ∥Uq − Ug Q∥₂²

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 2 internal anchors

[1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

work page 2000
[2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

work page 1980
[3]

M. J. Kearns , title =

work page
[4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

work page 1983
[5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

work page 2000
[6]

Suppressed for Anonymity , author=

work page
[7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

work page 1981
[8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

work page 1959
[9]

Proceedings of the 37th annual Allerton Conference on Communication, Control, and Computing , pages=

The information bottleneck method , author=. Proceedings of the 37th annual Allerton Conference on Communication, Control, and Computing , pages=

work page
[10]

Advances in neural information processing systems , volume=

Object-centric learning with slot attention , author=. Advances in neural information processing systems , volume=

work page
[11]

Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

Mixvpr: Feature mixing for visual place recognition , author=. Proceedings of the IEEE/CVF winter conference on applications of computer vision , pages=

work page
[12]

Statistics and computing , volume=

A tutorial on spectral clustering , author=. Statistics and computing , volume=. 2007 , publisher=

work page 2007
[13]

Proceedings of the AAAI conference on artificial intelligence , volume=

Film: Visual reasoning with a general conditioning layer , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page
[14]

Proceedings of the 33rd ACM International Conference on Multimedia , pages=

Slot attention with re-initialization and self-distillation , author=. Proceedings of the 33rd ACM International Conference on Multimedia , pages=

work page
[15]

Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, and Klaus Greff

Conditional object-centric learning from video , author=. arXiv preprint arXiv:2111.12594 , year=

work page arXiv
[16]

arXiv preprint arXiv:2209.14860 , year=

Bridging the gap to real-world object-centric learning , author=. arXiv preprint arXiv:2209.14860 , year=

work page arXiv
[17]

IEEE transactions on pattern analysis and machine intelligence , volume=

An eigendecomposition approach to weighted graph matching problems , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2002 , publisher=

work page 2002
[18]

Proceedings of the 28th ACM international conference on Multimedia , pages=

University-1652: A multi-view multi-source benchmark for drone-based geo-localization , author=. Proceedings of the 28th ACM international conference on Multimedia , pages=

work page
[19]

IEEE Transactions on Circuits and Systems for Video Technology , volume=

SUES-200: A multi-height multi-scene cross-view image benchmark across drone and satellite , author=. IEEE Transactions on Circuits and Systems for Video Technology , volume=. 2023 , publisher=

work page 2023
[20]

IEEE Transactions on Image Processing , volume=

Vision-based UAV self-positioning in low-altitude urban environments , author=. IEEE Transactions on Image Processing , volume=. 2023 , publisher=

work page 2023
[21]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Game4loc: A uav geo-localization benchmark from game data , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[22]

An intuitive proof of the data processing inequality

An intuitive proof of the data processing inequality , author=. arXiv preprint arXiv:1107.0740 , year=

work page Pith review arXiv
[23]

International conference on machine learning , pages=

On variational bounds of mutual information , author=. International conference on machine learning , pages=. 2019 , organization=

work page 2019
[24]

Proceedings of the International Conference on Learning Representations (ICLR) , year =

Representation Learning with Contrastive Predictive Coding , author =. Proceedings of the International Conference on Learning Representations (ICLR) , year =

work page
[25]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Unleashing unlabeled data: A paradigm for cross-view geo-localization , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[26]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Relational knowledge distillation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[27]

Neural computation , volume=

Laplacian eigenmaps for dimensionality reduction and data representation , author=. Neural computation , volume=. 2003 , publisher=

work page 2003
[28]

Nature Machine Intelligence , volume=

Challenges, evaluation and opportunities for open-world learning , author=. Nature Machine Intelligence , volume=. 2024 , publisher=

work page 2024
[29]

Advances in neural information processing systems , volume=

Learning object-centric representations of multi-object scenes from multiple views , author=. Advances in neural information processing systems , volume=

work page
[30]

Proceedings of the IEEE International Conference on Computer Vision , pages=

Wide-area image geolocalization with aerial reference imagery , author=. Proceedings of the IEEE International Conference on Computer Vision , pages=

work page
[31]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

From coarse to fine: Robust hierarchical localization at large scale , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[32]

Anindya Sarkar and Srikumar Sastry and Aleksis Pirinen and Chongjie Zhang and Nathan Jacobs and Yevgeniy Vorobeychik , booktitle=

work page
[33]

Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach

Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach , author=. arXiv preprint arXiv:2512.20056 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Adaptive slot attention: Object discovery with dynamic slot number , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[35]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Sample4geo: Hard negative sampling for cross-view geo-localisation , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[36]

Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective , pages=

SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment , author=. Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective , pages=

work page
[37]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Differentiable information bottleneck for deterministic multi-view clustering , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page
[38]

IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

Contrastive Learning via Variational Information Bottleneck , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=

work page
[39]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Object-centric multiple object tracking , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

work page
[40]

Robust cross-view geo-localization via content-viewpoint disentangle- ment,

Robust Cross-View Geo-Localization via Content-Viewpoint Disentanglement , author=. arXiv preprint arXiv:2505.11822 , year=

work page arXiv
[41]

Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective , pages=

The 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective , author=. Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective , pages=

work page
[42]

DINOv2: Learning Robust Visual Features without Supervision

Dinov2: Learning robust visual features without supervision , author=. arXiv preprint arXiv:2304.07193 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[43]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Shepherding slots to objects: Towards stable and robust object-centric learning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[44]

European conference on computer vision , pages=

Unifying deep local and global features for image search , author=. European conference on computer vision , pages=. 2020 , organization=

work page 2020
[45]

Yang, Min and He, Dongliang and Fan, Miao and Shi, Baorong and Xue, Xuetong and Li, Fu and Ding, Errui and Huang, Jizhou , booktitle=

work page
[46]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Rethinking visual geo-localization for large-scale applications , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[47]

Zhu, Sijie and Yang, Taojiannan and Chen, Chen , booktitle=

work page
[48]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Coming down to earth: Satellite-to-street view synthesis for geo-localization , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[49]

IEEE transactions on pattern analysis and machine intelligence , volume=

Accurate 3-DoF camera geo-localization via ground-to-satellite image matching , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2022 , publisher=

work page 2022
[50]

Simple, effective and general: A new backbone for cross-view image geo-localization,

Simple, effective and general: A new backbone for cross-view image geo-localization , author=. arXiv preprint arXiv:2302.01572 , year=

work page arXiv
[51]

IEEE Transactions on Circuits and Systems for Video Technology , volume=

Patch similarity self-knowledge distillation for cross-view geo-localization , author=. IEEE Transactions on Circuits and Systems for Video Technology , volume=. 2023 , publisher=

work page 2023
[52]

European Conference on Computer Vision , pages=

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network , author=. European Conference on Computer Vision , pages=. 2024 , organization=

work page 2024
[53]

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , year=

Cv-cities: Advancing cross-view geo-localization in global cities , author=. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing , year=

work page
[54]

Advances in neural information processing systems , volume=

Object-centric representation learning with generative spatial-temporal factorization , author=. Advances in neural information processing systems , volume=

work page
[55]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Unsupervised object-centric learning from multiple unspecified viewpoints , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=. 2024 , publisher=

work page 2024
[56]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Objectrelator: Enabling cross-view object relation understanding across ego-centric and exo-centric perspectives , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page
[57]

Advances in Neural Information Processing Systems , volume=

Improving viewpoint-independent object-centric representations through active viewpoint selection , author=. Advances in Neural Information Processing Systems , volume=

work page
[58]

IEEE transactions on pattern analysis and machine intelligence , volume=

Learning representations for neural network-based classification using the information bottleneck principle , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2019 , publisher=

work page 2019
[59]

IEEE Transactions on Circuits and Systems for Video Technology , volume=

MCCG: A ConvNeXt-based multiple-classifier method for cross-view geo-localization , author=. IEEE Transactions on Circuits and Systems for Video Technology , volume=. 2023 , publisher=

work page 2023
[60]

IEEE Transactions on Circuits and Systems for Video Technology , year=

Enhancing cross-view geo-localization with domain alignment and scene consistency , author=. IEEE Transactions on Circuits and Systems for Video Technology , year=

work page
[61]

IEEE Transactions on Geoscience and Remote Sensing , year=

Camp: A cross-view geo-localization method using contrastive attributes mining and position-aware partitioning , author=. IEEE Transactions on Geoscience and Remote Sensing , year=

work page
[62]

Proceedings of the 32nd ACM International Conference on Multimedia , pages=

Mfrgn: Multi-scale feature representation generalization network for ground-to-aerial geo-localization , author=. Proceedings of the 32nd ACM International Conference on Multimedia , pages=

work page
[63]

2025 IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

MMGeo: Multimodal Compositional Geo-Localization for UAVs , author=. 2025 IEEE/CVF International Conference on Computer Vision (ICCV) , pages=

work page 2025
[64]

IEEE Transactions on Geoscience and Remote Sensing , year=

Learning cross-view visual geo-localization without ground truth , author=. IEEE Transactions on Geoscience and Remote Sensing , year=

work page
[65]

Learning deep representations by mutual information estimation and maximization

Learning deep representations by mutual information estimation and maximization , author=. arXiv preprint arXiv:1808.06670 , year=

work page Pith review arXiv
[66]

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

On differentiating parameterized argmin and argmax problems with application to bi-level optimization , author=. arXiv preprint arXiv:1607.05447 , year=

work page Pith review arXiv
[67]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=

work page