pith. machine review for the scientific record.

arxiv: 2605.07099 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: 2 theorem links


InfoGeo: Information-Theoretic Object-Centric Learning for Cross-View Generalizable UAV Geo-Localization


Pith reviewed 2026-05-11 01:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords cross-view geo-localization · UAV localization · object-centric learning · information bottleneck · view-invariant features · domain generalization · satellite matching

The pith

Reformulating cross-view geo-localization as an information bottleneck that aligns object-centric structural relations across views improves robustness to domain shifts and clutter in UAV scenarios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes InfoGeo to solve cross-view geo-localization for UAVs by moving beyond global feature matching, which breaks down under changing textures, weather, and dense object clutter from wide UAV perspectives. It draws on object-centric learning to extract structural relations among objects that stay consistent when the same scene is viewed from ground, air, or satellite angles. The method casts training as an information bottleneck that keeps the invariant relations while discarding view-specific noise via cross-view constraints. A sympathetic reader would care because reliable image-based localization without GPS enables safer navigation in GPS-denied or contested environments. If the approach holds, matching across extreme viewpoint changes becomes more reliable on standard benchmarks.

Core claim

InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: maximizing view-invariant information by aligning the object-centric structural relations across views, and minimizing view-specific noisy signals through cross-view knowledge constraints.
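The trade-off between the two objectives can be made concrete with the form quoted later on this page as Theorem 3.1, max I(Ẑ;Y) − β I(Ẑ;Z|Y). The sketch below computes it exactly for tiny discrete distributions. The setup is a toy assumption of ours, not the paper's: Y is a binary geographic identity, a binary view-specific nuisance N is independent of Y, the feature is Z = 2Y + N, and two hypothetical encoders either keep only Y or copy Z. The helper names (`mutual_information`, `conditional_mi`, `ib_objective`, `build_joint`) are illustrative.

```python
import numpy as np


def mutual_information(p_xy: np.ndarray) -> float:
    """I(X;Y) in nats from a 2-D joint probability table indexed [x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # shape (n_x, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # shape (1, n_y)
    mask = p_xy > 0                         # 0 * log 0 is treated as 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


def conditional_mi(p_abc: np.ndarray) -> float:
    """I(A;B|C) = sum_c p(c) * I(A;B | C=c) from a 3-D table indexed [a, b, c]."""
    total = 0.0
    for c in range(p_abc.shape[2]):
        p_c = p_abc[:, :, c].sum()
        if p_c > 0:
            total += p_c * mutual_information(p_abc[:, :, c] / p_c)
    return total


def ib_objective(p_zhat_z_y: np.ndarray, beta: float) -> float:
    """I(Zhat;Y) - beta * I(Zhat;Z|Y) for a joint table indexed [zhat, z, y]."""
    p_zhat_y = p_zhat_z_y.sum(axis=1)       # marginalize out Z
    return mutual_information(p_zhat_y) - beta * conditional_mi(p_zhat_z_y)


def build_joint(encoder, n_zhat: int) -> np.ndarray:
    """Toy setup: Y and nuisance N are independent fair bits, Z = 2*Y + N, Zhat = encoder(Z)."""
    p = np.zeros((n_zhat, 4, 2))
    for y in (0, 1):
        for n in (0, 1):
            z = 2 * y + n
            p[encoder(z), z, y] += 0.25
    return p


compact = ib_objective(build_joint(lambda z: z // 2, 2), beta=0.5)  # Zhat = Y
copying = ib_objective(build_joint(lambda z: z, 4), beta=0.5)       # Zhat = Z
print(f"compact={compact:.3f}, copying={copying:.3f}")  # compact=0.693, copying=0.347
```

Both encoders carry log 2 nats about Y, but the copying encoder also retains log 2 nats of nuisance, so with β = 0.5 it scores lower. Real image features are continuous and high-dimensional, so a practical system would need variational or contrastive estimators rather than exact tables.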

What carries the argument

An information bottleneck that aligns object-centric structural relations across views while applying cross-view knowledge constraints to suppress noise.

If this is right

  • Matching accuracy rises on benchmarks that include UAV, ground, and satellite imagery under varying conditions.
  • The framework reduces the impact of regional textures and weather-induced domain shifts compared with global feature baselines.
  • Localization succeeds in cluttered UAV views by focusing on structural object relations rather than raw appearance.
  • Generalization improves when cross-view knowledge constraints are used to filter view-specific signals.
  • The method supports precise navigation in GPS-denied settings by producing view-invariant representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bottleneck could be applied to other viewpoint-invariant tasks such as cross-season image retrieval.
  • If object relations prove key, integrating lightweight object detectors might further reduce the need for heavy global descriptors.
  • Deployment on resource-limited UAV hardware would require checking whether the constraint terms add acceptable latency.
  • Environments with few distinct objects, such as open water or desert, could test whether the structural signal remains informative.

Load-bearing premise

Object-centric structural relations extracted from images remain sufficiently consistent across different viewpoints to serve as the primary matching signal.

What would settle it

A controlled ablation on standard CVGL benchmarks that removes the object-centric alignment term and measures whether matching accuracy drops or stays the same.
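Such an ablation reduces to comparing a retrieval metric with and without the alignment term. Below is a minimal sketch of Recall@K, the metric family typically reported on University-1652 and SUES-200. It assumes cosine similarity between embeddings and exactly one correct gallery item per query. The arrays are toy placeholders standing in for a full and an ablated model, not results from the paper.

```python
import numpy as np


def recall_at_k(query: np.ndarray, gallery: np.ndarray, gt: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose true gallery index is among the top-k cosine matches."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                               # (num_queries, num_gallery)
    topk = np.argsort(-sims, axis=1)[:, :k]      # gallery indices of the k best matches
    return float(np.mean(np.any(topk == gt[:, None], axis=1)))


# Toy embeddings only: 4 satellite tiles, one drone query per tile.
gallery = np.eye(4)
gt = np.arange(4)
full_model = np.eye(4) + 0.1                     # each query is closest to its own tile
ablated = full_model[[1, 0, 2, 3]]               # queries 0 and 1 now match the wrong tile
print(recall_at_k(full_model, gallery, gt), recall_at_k(ablated, gallery, gt))  # 1.0 0.5
```

Reporting this alongside AP over several seeds and benchmarks would show whether removing the term changes accuracy beyond run-to-run noise.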

Figures

Figures reproduced from arXiv: 2605.07099 by Hongrui Yin, Hongyang Zhang, Man OnPun, Maonnan Wang, Ziyao Wang.

Figure 1: (a) Illustration of our motivation. Cross-view images can be decomposed into view-invariant information and view-specific noise, while paired data can be matched through key visual clues. (b) Overview of the cross-view object-centric learning process, whose main target is to extract view-invariant tokens by compressing the view-specific noise. (c) Comparison with recent state-of-the-art methods on th…
Figure 2: Overview of our proposed framework InfoGeo. (Section 4.1). In Section 4.2, the OCVA module is proposed to incorporate object-centric representations into the scene-level descriptors, enabling fine-grained discrimination with view-invariant semantics. Their details are illustrated in …
Figure 3: The training pipeline of the Cross-view Visual Concept Reasoner, which pioneers an IB-theory-based framework for cross-view OCL through two synergistic components: 1) Cross-view Adaptive Concept Selection and 2) Concept Structural Relational Reasoning. Object-Centric Visual Augmentation is further proposed to integrate object-centric representations into the global scene-level descriptors. where $\|\cdot\|_2^2$ denote…
Figure 4: The PCA visualization and concept affinities of object-centric representations $\hat{Z}$. RGB values correspond to principal components. Circles mark the view-shared landmarks. Concept affinities are calculated by the spatial-level cosine distance across viewpoints (darker colors indicate higher spatial similarity), while dashed circles highlight the feature-space regions that exhibit robustness. View-Specific Noi…
Figure 5: The cross-view spatial-level concept affinities of different components produced by decoding attention maps. Ablation of Main Components. We perform an ablation study on individual components to verify our design, as shown in …
Figure 6: The sensitivity analysis of the hyperparameters in OCVA. effectively avoiding the slot collapse issue, where concepts become excessively similar. In contrast, K = 32 degrades the generalization performance, as excessive slots cause redundant discrete concepts, introducing noise that weakens discriminative cues. Thus, K = 16 is the optimal value for the module. Meanwhile, the model achieves its best performance …
Figure 7: The detailed structures of the feature aggregation layer.
Figure 8: Comparison of the network structures of our proposed work in the inference stage: (a) InfoGeo w/o Relational Distillation, (b) InfoGeo w/ Relational Distillation. query-gallery pairs while simultaneously pushing apart non-matching pairs, effectively approximating the mutual information between views in a tractable manner (Van den Oord et al., 2018). Formally, the loss is defined as: $\mathcal{L}_{\mathrm{align}}(\tilde{f}^{q}, \tilde{f}^{g}) =$ …
Figure 9: The ablations on the three hyper-parameters across different scenarios. C.5.2. Sensitivity analysis on hyper-parameters λ1, λ2 and λ3. To explore the influence of the different components in our overall objective, we conduct a sensitivity analysis on the three hyperparameters λ1, λ2, and λ3, as illustrated in …
Figure 10: The PCA visualization of feature maps between different UAV benchmarks. D. Qualitative Results. D.1. Visualization on Object-Centric Representations. We further present additional PCA visualizations of object-centric representations on the three benchmarks in …
Figure 11: Comparison of the visualization on feature maps between InfoGeo and Baseline under Multi-Weather settings. The bounding boxes denote the view-shared objects across different viewpoints. more challenging fog-snow scenario, global representations learned by baseline models fail to effectively distinguish two distant buildings and lack discriminative capability for the green building in the foreground. In co…
Figure 12: The retrieval results of the failure case in University-1652→SUES-200 (150m and 300m). The predicted result is the top-1 retrieval result. Dashed circles denote the key visual clues (red lines denote the wrong-matched patterns, while yellow lines mark the discriminative objects). efficiency degradation caused by the integration of object-centric learning modules during inference. Extensive experiments acros…
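The Figure 8 excerpt describes the alignment loss as pulling matched query-gallery pairs together and pushing non-matching pairs apart, approximating cross-view mutual information (Van den Oord et al., 2018). The exact definition of L_align is truncated in that excerpt, so the following is only a generic InfoNCE sketch under assumed choices (cosine scores, temperature 0.1, one positive per row). It is paired with the standard bound I(A;B) ≥ log N − L_InfoNCE. The function names are hypothetical.

```python
import numpy as np


def info_nce(za: np.ndarray, zb: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE: row i of zb is the positive for row i of za."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                       # (N, N) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # cross-entropy on the diagonal


def mi_lower_bound(za: np.ndarray, zb: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE estimate of the bound I(A;B) >= log N - L_InfoNCE."""
    return float(np.log(len(za)) - info_nce(za, zb, temperature))


queries = np.eye(8)                       # toy drone-view embeddings
matched = np.eye(8)                       # satellite views in the same order
shuffled = matched[::-1]                  # every pair mismatched
print(mi_lower_bound(queries, matched) > mi_lower_bound(queries, shuffled))  # True
```

Because the estimate can never exceed log N, the batch size N caps how much mutual information it can certify.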
Abstract

Cross-view geo-localization (CVGL) is fundamental for precise localization and navigation in GPS-denied environments, aiming to match ground or UAV imagery with satellite views. While existing approaches rely on global feature alignment, they often suffer from substantial domain shifts induced by varying regional textures and weather conditions. This issue becomes even more pronounced in UAV-based scenarios, where the broader perspective inevitably introduces dense, fine-grained objects, creating significant visual clutter. To address this, we draw inspiration from Object-Centric Learning (OCL) and propose InfoGeo, an information-theoretic framework designed to enhance robustness and generalization. InfoGeo reformulates the optimization as an information bottleneck process with two core objectives: (i) maximizing view-invariant information by aligning the object-centric structural relations across views, and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. Extensive evaluations across diverse benchmarks and challenging scenarios demonstrate that InfoGeo significantly outperforms state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes InfoGeo, an information-theoretic object-centric learning framework for cross-view geo-localization (CVGL) in UAV scenarios. It reformulates the optimization as an information bottleneck process with two objectives: (i) maximizing view-invariant information by aligning object-centric structural relations across views and (ii) minimizing view-specific noisy signals through cross-view knowledge constraints. The authors claim this addresses domain shifts and visual clutter better than global feature alignment methods, with extensive evaluations on diverse benchmarks showing significant outperformance over state-of-the-art approaches.

Significance. If the central claims hold, the work could meaningfully advance UAV geo-localization in GPS-denied settings by introducing a principled disentanglement of invariant structural cues from domain-specific noise via object-centric representations. This bridges object-centric learning with cross-view matching and offers a template for handling clutter in aerial imagery that global methods struggle with.

major comments (2)
  1. [§3.2] Information Bottleneck Formulation: The reformulation into the two core objectives is described only at a high level; no explicit equations or estimation procedure are given for the mutual information terms that align object-centric structural relations or enforce the cross-view constraints. This gap is load-bearing for the central claim, because it leaves open whether the objectives can be implemented without reducing to standard contrastive losses or introducing view-specific artifacts during relation extraction.
  2. [§5] Experiments: No ablation or direct validation is reported on the robustness of extracting object-centric structural relations from cluttered UAV imagery, or on their view-invariance under extreme viewpoint and domain shifts. This leaves the weakest assumption, that these relations serve as a reliable primary signal, untested, which matters because any extraction artifacts would propagate through the bottleneck without being selectively suppressed.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including one or two key quantitative results (e.g., mAP or recall gains on a primary benchmark) to support the outperformance claim.
  2. [§3] Notation for object-centric relations and knowledge constraints should be introduced earlier and used consistently to improve readability of the method section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying the technical details and committing to revisions that strengthen the presentation and validation of our claims.

  1. Referee: [§3.2] Information Bottleneck Formulation: The reformulation into the two core objectives is described only at a high level; no explicit equations or estimation procedure are given for the mutual information terms that align object-centric structural relations or enforce the cross-view constraints. This gap is load-bearing for the central claim, because it leaves open whether the objectives can be implemented without reducing to standard contrastive losses or introducing view-specific artifacts during relation extraction.

    Authors: We appreciate the referee highlighting the need for greater rigor here. Section 3.2 introduces the information bottleneck objectives at a conceptual level to motivate the framework, but we agree that explicit equations are necessary for reproducibility. In the revision we will add the precise mathematical formulation: the primary objective maximizes I(R; Y) where R denotes the extracted object-centric structural relations and Y the view-invariant target, subject to a cross-view constraint minimizing I(R; V) for view-specific variables V. Estimation will be detailed via a variational lower bound combined with a relation-aware contrastive estimator (distinct from standard global contrastive losses by operating on pairwise object graphs rather than image embeddings). We will also specify the object relation extraction pipeline (using slot attention followed by graph construction) and show it avoids view-specific artifacts through the bottleneck regularization. These additions will make the implementation fully transparent. revision: yes

  2. Referee: [§5] Experiments: No ablation or direct validation is reported on the robustness of extracting object-centric structural relations from cluttered UAV imagery, or on their view-invariance under extreme viewpoint and domain shifts. This leaves the weakest assumption, that these relations serve as a reliable primary signal, untested, which matters because any extraction artifacts would propagate through the bottleneck without being selectively suppressed.

    Authors: We acknowledge that direct validation of the core assumption would strengthen the paper. While the consistent outperformance over global-feature baselines on multiple benchmarks (including cluttered UAV and extreme-shift scenarios) provides supporting evidence that the relations are robust and invariant, we agree this is indirect. In the revised manuscript we will add targeted experiments: (i) ablations replacing the object-centric relation extractor with global or alternative local features, (ii) quantitative invariance metrics (e.g., relation consistency scores across view pairs), and (iii) qualitative and quantitative analysis on subsets with heavy clutter and large domain shifts to demonstrate that the information bottleneck selectively suppresses noise while preserving structural cues. These results will be reported in an expanded Section 5. revision: yes
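The rebuttal does not define its proposed relation consistency scores. One hypothetical form assumes each view yields K slot vectors (K = 16 matches the setting reported in the Figure 6 excerpt). It builds non-negative cosine affinities among slots, takes the eigenvalues of the normalized graph Laplacian, and compares the spectra of the two views. Sorted eigenvalues do not depend on slot order, which matters because slot indices are not guaranteed to correspond across views. The score and the function names are illustrative, not from the paper.

```python
import numpy as np


def laplacian_spectrum(slots: np.ndarray) -> np.ndarray:
    """Sorted eigenvalues of the normalized Laplacian of a slot-affinity graph."""
    unit = slots / np.linalg.norm(slots, axis=1, keepdims=True)
    affinity = np.clip(unit @ unit.T, 0.0, None)            # non-negative cosine affinities
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))        # degree >= 1 (self-affinity)
    lap = np.eye(len(slots)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)                          # ascending, order-independent


def relation_consistency(slots_a: np.ndarray, slots_b: np.ndarray) -> float:
    """Hypothetical score in (0, 1]; 1.0 means identical relation spectra."""
    gap = np.linalg.norm(laplacian_spectrum(slots_a) - laplacian_spectrum(slots_b))
    return float(1.0 / (1.0 + gap))


rng = np.random.default_rng(0)
uav_slots = rng.normal(size=(16, 32))                       # K = 16 toy slot vectors
sat_slots = uav_slots[rng.permutation(16)]                  # same relations, slots reordered
print(round(relation_consistency(uav_slots, sat_slots), 6))  # 1.0
```

A spectrum match is necessary but not sufficient for matching relations, since distinct graphs can share a spectrum. A score like this would therefore complement, not replace, the retrieval ablations.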

Circularity Check

0 steps flagged

No circularity: proposed framework is a methodological reformulation without self-referential reduction.


The abstract describes InfoGeo as a new information-theoretic framework that reformulates CVGL optimization via an information bottleneck with two explicit objectives (maximizing view-invariant object-centric relations and minimizing view-specific noise). No equations, fitted parameters, or self-citations are presented that would make these objectives equivalent to their inputs by construction. The derivation chain is a proposal drawing inspiration from OCL rather than a tautological renaming or load-bearing self-citation; the central claim remains an independent modeling choice whose validity rests on empirical benchmarks, not internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only; ledger entries are inferred from stated assumptions in the abstract and marked as such.

axioms (2)
  • domain assumption: Object-centric structural relations can be extracted and aligned to capture view-invariant information for localization.
    Central to the first objective; appears in the abstract description of maximizing view-invariant information.
  • domain assumption: Cross-view knowledge constraints can isolate and minimize view-specific noisy signals without harming localization performance.
    Central to the second objective; stated as part of the information bottleneck reformulation.

pith-pipeline@v0.9.0 · 5475 in / 1405 out tokens · 28638 ms · 2026-05-11T01:24:32.515349+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Theorem 3.1 (View-Invariant Information Bottleneck Principle). The representations $\hat{Z}^{(v)}$ are optimal for CVGL when they maximize mutual information (MI) with the geographic identity $Y$ while minimizing task-irrelevant information inherited from the original features $Z^{(v)}$: $\max \mathcal{L}^{(v)}_{\mathrm{IB}} = I(\hat{Z}^{(v)}; Y) - \beta\, I(\hat{Z}^{(v)}; Z^{(v)} \mid Y)$

  • IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

    Relation between the paper passage and the cited Recognition theorem.

    We construct concept graphs $G^{(v)} \in \mathbb{R}^{K \times K}$ through $\hat{A}^{(v)}_{d}$ … Laplacian Eigenmaps … spectral decomposition … $\mathcal{L}_{\mathrm{struct}} = \min_{Q \in O(r)} \|U_q - U_g Q\|_2^2$
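The second passage builds K×K concept graphs, embeds them with Laplacian eigenmaps, and minimizes a structural loss over orthogonal matrices Q ∈ O(r). The sketch below makes several assumptions. The affinity matrix is a toy positive matrix rather than the paper's construction, which is elided in the quote. Rows of U_q and U_g are assumed to refer to corresponding concepts. The squared norm is taken as Frobenius. With those assumptions the inner minimization is the orthogonal Procrustes problem, which has a closed-form SVD solution. Function names are hypothetical.

```python
import numpy as np


def laplacian_eigenmap(affinity: np.ndarray, r: int) -> np.ndarray:
    """Rows embed graph nodes using the r smallest non-trivial normalized-Laplacian eigenvectors."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)            # eigenvalues in ascending order
    return vecs[:, 1:r + 1]                  # drop the trivial first eigenvector


def structural_loss(u_q: np.ndarray, u_g: np.ndarray) -> float:
    """min over orthogonal Q of ||U_q - U_g Q||_F^2, via the orthogonal Procrustes solution."""
    u, _, vt = np.linalg.svd(u_g.T @ u_q)   # SVD of the r x r cross-covariance
    q = u @ vt                               # optimal orthogonal alignment
    return float(np.linalg.norm(u_q - u_g @ q) ** 2)


rng = np.random.default_rng(0)
slots = rng.normal(size=(16, 8))             # K = 16 toy concept slots
unit = slots / np.linalg.norm(slots, axis=1, keepdims=True)
affinity = np.exp(unit @ unit.T)             # strictly positive toy affinity matrix
u_q = laplacian_eigenmap(affinity, r=4)
rot, _ = np.linalg.qr(rng.normal(size=(4, 4)))
u_g = u_q @ rot                              # same structure, rotated/sign-flipped basis
print(structural_loss(u_q, u_g) < 1e-10)     # True: the rotation is absorbed by Q
```

Minimizing over O(r) makes the loss insensitive to the sign flips and basis rotations that eigendecomposition leaves arbitrary, so only genuine differences in graph structure remain.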

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 2 internal anchors

  1. [1] P. Langley. Proceedings of the 17th International Conference on Machine Learning (ICML 2000), 2000.

  2. [2] T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980.

  3. [3] M. J. Kearns.

  4. [4] Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983.

  5. [5] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2000.

  6. [6] Suppressed for Anonymity.

  7. [7] A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. In Cognitive Skills and Their Acquisition, 1981.

  8. [8] A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, 1959.

  9. [9] The information bottleneck method. Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing.

  10. [10] Object-centric learning with slot attention. Advances in Neural Information Processing Systems.

  11. [11] MixVPR: Feature mixing for visual place recognition. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.

  12. [12] A tutorial on spectral clustering. Statistics and Computing, 2007.

  13. [13] FiLM: Visual reasoning with a general conditioning layer. Proceedings of the AAAI Conference on Artificial Intelligence.

  14. [14] Slot attention with re-initialization and self-distillation. Proceedings of the 33rd ACM International Conference on Multimedia.

  15. [15] Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, and Klaus Greff. Conditional object-centric learning from video. arXiv preprint arXiv:2111.12594.

  16. [16] Bridging the gap to real-world object-centric learning. arXiv preprint arXiv:2209.14860.

  17. [17] An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.

  18. [18] University-1652: A multi-view multi-source benchmark for drone-based geo-localization. Proceedings of the 28th ACM International Conference on Multimedia.

  19. [19] SUES-200: A multi-height multi-scene cross-view image benchmark across drone and satellite. IEEE Transactions on Circuits and Systems for Video Technology, 2023.

  20. [20] Vision-based UAV self-positioning in low-altitude urban environments. IEEE Transactions on Image Processing, 2023.

  21. [21] Game4Loc: A UAV geo-localization benchmark from game data. Proceedings of the AAAI Conference on Artificial Intelligence.

  22. [22] An intuitive proof of the data processing inequality. arXiv preprint arXiv:1107.0740.

  23. [23] On variational bounds of mutual information. International Conference on Machine Learning, 2019.

  24. [24] Representation Learning with Contrastive Predictive Coding. Proceedings of the International Conference on Learning Representations (ICLR).

  25. [25] Unleashing unlabeled data: A paradigm for cross-view geo-localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  26. [26] Relational knowledge distillation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  27. [27] Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003.

  28. [28] Challenges, evaluation and opportunities for open-world learning. Nature Machine Intelligence, 2024.

  29. [29] Learning object-centric representations of multi-object scenes from multiple views. Advances in Neural Information Processing Systems.

  30. [30] Wide-area image geolocalization with aerial reference imagery. Proceedings of the IEEE International Conference on Computer Vision.

  31. [31] From coarse to fine: Robust hierarchical localization at large scale. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  32. [32] Anindya Sarkar, Srikumar Sastry, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, and Yevgeniy Vorobeychik.

  33. [33] Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach. arXiv preprint arXiv:2512.20056.

  34. [34] Adaptive slot attention: Object discovery with dynamic slot number. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  35. [35] Sample4Geo: Hard negative sampling for cross-view geo-localisation. Proceedings of the IEEE/CVF International Conference on Computer Vision.

  36. [36] SkyLink: Unifying Street-Satellite Geo-Localization via UAV-Mediated 3D Scene Alignment. Proceedings of the 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective.

  37. [37] Differentiable information bottleneck for deterministic multi-view clustering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  38. [38] Contrastive Learning via Variational Information Bottleneck. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  39. [39] Object-centric multiple object tracking. Proceedings of the IEEE/CVF International Conference on Computer Vision.

  40. [40] Robust Cross-View Geo-Localization via Content-Viewpoint Disentanglement. arXiv preprint arXiv:2505.11822.

  41. [41] The 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective. Proceedings of the 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective.

  42. [42] DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.

  43. [43] Shepherding slots to objects: Towards stable and robust object-centric learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  44. [44] Unifying deep local and global features for image search. European Conference on Computer Vision, 2020.

  45. [45] Min Yang, Dongliang He, Miao Fan, Baorong Shi, Xuetong Xue, Fu Li, Errui Ding, and Jizhou Huang.

  46. [46] Rethinking visual geo-localization for large-scale applications. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  47. [47] Sijie Zhu, Taojiannan Yang, and Chen Chen.

  48. [48] Coming down to earth: Satellite-to-street view synthesis for geo-localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  49. [49] Accurate 3-DoF camera geo-localization via ground-to-satellite image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

  50. [50] Simple, effective and general: A new backbone for cross-view image geo-localization. arXiv preprint arXiv:2302.01572.

  51. [51] Patch similarity self-knowledge distillation for cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 2023.

  52. [52] Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network. European Conference on Computer Vision, 2024.

  53. [53] CV-Cities: Advancing cross-view geo-localization in global cities. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

  54. [54] Object-centric representation learning with generative spatial-temporal factorization. Advances in Neural Information Processing Systems.

  55. [55] Unsupervised object-centric learning from multiple unspecified viewpoints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

  56. [56] ObjectRelator: Enabling cross-view object relation understanding across ego-centric and exo-centric perspectives. Proceedings of the IEEE/CVF International Conference on Computer Vision.

  57. [57] Improving viewpoint-independent object-centric representations through active viewpoint selection. Advances in Neural Information Processing Systems.

  58. [58] Learning representations for neural network-based classification using the information bottleneck principle. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.

  59. [59] MCCG: A ConvNeXt-based multiple-classifier method for cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 2023.

  60. [60] Enhancing cross-view geo-localization with domain alignment and scene consistency. IEEE Transactions on Circuits and Systems for Video Technology.

  61. [61] CAMP: A cross-view geo-localization method using contrastive attributes mining and position-aware partitioning. IEEE Transactions on Geoscience and Remote Sensing.

  62. [62] MFRGN: Multi-scale feature representation generalization network for ground-to-aerial geo-localization. Proceedings of the 32nd ACM International Conference on Multimedia.

  63. [63] MMGeo: Multimodal Compositional Geo-Localization for UAVs. 2025 IEEE/CVF International Conference on Computer Vision (ICCV).

  64. [64] Learning cross-view visual geo-localization without ground truth. IEEE Transactions on Geoscience and Remote Sensing.

  65. [65] Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670.

  66. [66] On differentiating parameterized argmin and argmax problems with application to bi-level optimization. arXiv preprint arXiv:1607.05447.

  67. [67] Attention is all you need. Advances in Neural Information Processing Systems.