egenioussBench: A New Dataset for Geospatial Visual Localisation
Pith reviewed 2026-05-08 16:49 UTC · model grok-4.3
The pith
A new benchmark pairs airborne city 3D meshes with smartphone images to create realistic tests for large-scale visual localisation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By using deployable geospatial reference assets instead of structure-from-motion reconstructions and by enforcing a non-co-visible test split through maximum-independent-set selection on a rendered-depth co-visibility graph, egenioussBench supplies evaluation data that expose cross-view and cross-domain localisation difficulties while remaining extensible to city-scale problems.
What carries the argument
The maximum independent set extracted from the co-visibility matrix, which is itself estimated from depth renders of the reference models. It isolates a compact set of mutually non-overlapping query images intended to simulate realistic deployment conditions.
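As a rough illustration of the selection step, the sketch below treats a boolean co-visibility matrix as a graph and extracts an independent set with a greedy minimum-degree heuristic. This is a stand-in under stated assumptions, not the paper's procedure: the heuristic returns a maximal set that is not guaranteed to be maximum, and the function name `greedy_independent_set` and the input layout are hypothetical.

```python
import numpy as np

def greedy_independent_set(covis: np.ndarray) -> list[int]:
    """Return indices of images no two of which are co-visible.

    covis is a symmetric boolean matrix; covis[i, j] is True when
    images i and j are estimated to see overlapping scene content.
    Greedy min-degree heuristic: maximal, not guaranteed maximum.
    """
    adj = covis.astype(bool).copy()
    np.fill_diagonal(adj, False)            # ignore self-co-visibility
    alive = np.ones(len(adj), dtype=bool)   # vertices still selectable
    chosen = []
    while alive.any():
        degree = (adj & alive).sum(axis=1)  # neighbours that are still alive
        degree[~alive] = len(adj) + 1       # never pick a removed vertex
        v = int(np.argmin(degree))
        chosen.append(v)
        alive[v] = False
        alive[adj[v]] = False               # drop every neighbour of v
    return sorted(chosen)
```

On a chain of five images where only consecutive images overlap, this picks images 0, 2 and 4. An exact solver would be needed to guarantee the largest possible set on a real co-visibility graph.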
If this is right
- Localisation methods can be compared fairly on a public leaderboard using consistent binning metrics at several pose-error thresholds plus median, RMSE and outlier-ratio statistics.
- The 412-image validation split with known poses supports supervised training of pose regressors and self-validation experiments.
- Algorithms can be developed and tested against standard city mapping products rather than requiring per-scene structure-from-motion reconstructions.
- The design isolates the specific difficulties of airborne-to-ground view changes and mesh-to-image domain shifts for targeted method improvement.
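The leaderboard statistics named in the first point can be sketched as follows. The abstract does not state the thresholds or the exact definitions, so the bin limits, the outlier rule and the function name `summarise_errors` are all assumptions made for illustration; the official evaluation code may define them differently.

```python
import numpy as np

def summarise_errors(pos_err_m, rot_err_deg, bins, outlier_bin):
    """Summarise per-image pose errors.

    bins: list of (max_metres, max_degrees) thresholds; an image counts
    towards a bin when both errors are within that bin's limits.
    outlier_bin: (max_metres, max_degrees); anything outside is an outlier.
    All threshold values are hypothetical, not taken from the benchmark.
    """
    pos = np.asarray(pos_err_m, dtype=float)
    rot = np.asarray(rot_err_deg, dtype=float)
    recall = {b: float(np.mean((pos <= b[0]) & (rot <= b[1]))) for b in bins}
    outlier = (pos > outlier_bin[0]) | (rot > outlier_bin[1])
    return {
        "recall": recall,
        "median_pos_m": float(np.median(pos)),
        "median_rot_deg": float(np.median(rot)),
        "rmse_pos_m": float(np.sqrt(np.mean(pos ** 2))),
        "outlier_ratio": float(np.mean(outlier)),
    }
```

Reporting the median next to the RMSE is useful because a single gross failure inflates the RMSE while leaving the median almost unchanged.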
Where Pith is reading between the lines
- Adoption of the benchmark could encourage the community to evaluate localisation systems against official geospatial assets instead of reconstructed models.
- The same co-visibility-matrix and maximum-independent-set procedure could be reused to construct test sets for other cities or sensor combinations.
- Large performance differences between mesh-based and LoD2-based entries on the leaderboard would indicate which reference representation is more suitable for particular operational settings.
Load-bearing premise
The maximum independent set taken from the estimated co-visibility matrix produces a test split whose difficulty and distribution genuinely reflect real-world deployment conditions without hidden selection bias.
What would settle it
Release the withheld ground truth for the 42-image test split, then check whether the methods that rank highest on the leaderboard keep their reported accuracy when those images are replaced by a random sample of equal size drawn from the full 2,709-image collection. That comparison would directly test whether the non-co-visibility criterion creates the claimed increase in difficulty.
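One way such a check could be run is a simple resampling test, sketched below. It assumes per-image errors are available for the whole collection, which would require ground truth that is currently withheld for the test split; the function name `split_difficulty_pvalue` and the choice of the median as the summary statistic are hypothetical.

```python
import numpy as np

def split_difficulty_pvalue(errors, test_idx, n_draws=10_000, seed=0):
    """Compare a chosen test split against random splits of equal size.

    Returns (p, observed): p is the fraction of random subsets whose
    median error is at least as large as the median error on the
    chosen split; observed is that split's median error.
    """
    errors = np.asarray(errors, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.median(errors[test_idx])
    k = len(test_idx)
    draws = np.array([
        np.median(rng.choice(errors, size=k, replace=False))
        for _ in range(n_draws)
    ])
    return float(np.mean(draws >= observed)), float(observed)
```

A small p would indicate that the chosen split is harder than a typical random sample for that method; a p near 0.5 would suggest the selection criterion adds little difficulty.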
Original abstract
We present egenioussBench, a visual localisation benchmark built on geospatial reference data: a city-scale airborne 3D mesh and a CityGML LoD2 model. This pairing reflects deployable mapping assets and supports true scalability beyond traditional SfM-based approaches. The query data comprise smartphone images with centimetre-accurate, map-independent ground truth obtained via PPK and GCP/CP-aided adjustment. From 2,709 images, we derive a non-co-visible subset by estimating the full co-visibility matrix from rendered depth and selecting a maximum independent set; the released data include a test split of 42 non-co-visible images with withheld ground truth and a validation split of 412 sequential images with poses, e.g. for training of pose regressors and self-validation. The benchmark features a public leaderboard evaluated with binning metrics at multiple pose-error thresholds alongside global statistics (median, RMSE, outlier ratio), ensuring fair, like-for-like comparison across mesh- and LoD2-based methods. Together, these design choices expose realistic cross-view and cross-domain challenges while providing a rigorous, scalable path for advancing large-scale visual localisation. We make the evaluation code and data availeable at https://github.com/fratopa/egenioussBench and https://www.egeniouss.eu/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents egenioussBench, a benchmark for geospatial visual localisation built on a city-scale airborne 3D mesh and CityGML LoD2 model. Smartphone images with centimetre-accurate ground truth (via PPK and GCP/CP-aided adjustment) are used to derive a non-co-visible test split of 42 images by estimating a full co-visibility matrix from rendered depth and extracting a maximum independent set; a validation split of 412 sequential images is also provided. The benchmark includes a public leaderboard using binning metrics at multiple pose-error thresholds plus global statistics (median, RMSE, outlier ratio) to support fair comparisons of mesh- and LoD2-based methods.
Significance. If the test-split construction is shown to avoid selection bias, this work provides a valuable contribution by releasing a scalable benchmark that leverages deployable geospatial assets rather than SfM reconstructions, exposing realistic cross-view and cross-domain challenges. The withheld test ground truth, evaluation code, and public data availability at the cited GitHub and website are clear strengths that enhance reproducibility and utility for the community.
major comments (1)
- [Abstract (and corresponding methods section on co-visibility and split construction)] The derivation of the 42-image test split via maximum independent set from the estimated co-visibility matrix (described in the abstract) may systematically bias the selection toward low-density regions, atypical viewpoints, or unique geometric features, since the solver prioritizes global non-overlap rather than matching clustered or sequential real-world query statistics. No quantitative validation—such as spatial histograms, depth distributions, or comparison against random non-co-visible sampling—is provided to confirm the split reflects realistic deployment conditions, which is load-bearing for the central claim that the benchmark exposes realistic challenges.
minor comments (2)
- [Abstract] Typo: 'availeable' should be corrected to 'available'.
- [Abstract] The abstract refers to 'binning metrics at multiple pose-error thresholds' without specifying the exact thresholds or bin definitions; adding these details would improve clarity for readers evaluating the leaderboard protocol.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address the single major comment below, providing clarification on our design choices while acknowledging the need for additional supporting analysis. We commit to incorporating the suggested validation in the revised manuscript.
Referee: [Abstract (and corresponding methods section on co-visibility and split construction)] The derivation of the 42-image test split via maximum independent set from the estimated co-visibility matrix (described in the abstract) may systematically bias the selection toward low-density regions, atypical viewpoints, or unique geometric features, since the solver prioritizes global non-overlap rather than matching clustered or sequential real-world query statistics. No quantitative validation—such as spatial histograms, depth distributions, or comparison against random non-co-visible sampling—is provided to confirm the split reflects realistic deployment conditions, which is load-bearing for the central claim that the benchmark exposes realistic challenges.
Authors: We thank the referee for this important observation. The maximum independent set (MIS) was deliberately selected to guarantee a mutually non-co-visible test set of 42 images, which is central to the benchmark's goal of evaluating localisation performance under realistic cross-view and cross-domain conditions without trivial overlaps. This construction avoids the common pitfall of clustered queries that could artificially boost metrics through repeated similar viewpoints. While the global optimisation of MIS could in principle favour certain distributions, the underlying image set of 2,709 densely sampled smartphone images across the city-scale area provides a broad base from which the MIS is drawn, mitigating extreme bias. Nevertheless, we agree that the manuscript does not include quantitative checks (spatial histograms, depth distributions, or random-sampling baselines) to empirically confirm the split's representativeness. In the revised version we will add these analyses, including (i) spatial density maps of the selected test points versus the full set, (ii) depth and viewpoint histograms, and (iii) a direct comparison against randomly sampled non-co-visible subsets of the same size. These additions will strengthen the claim that the benchmark reflects realistic deployment conditions.
Revision: yes
Circularity Check
No circularity in dataset construction or claims
The paper is a data release describing procedural construction of a benchmark split (maximum independent set on an estimated co-visibility matrix derived from rendered depth and GT poses). This is an explicit, one-way definition of the test set properties rather than any derivation, prediction, or fitted parameter that reduces back to its own inputs. No equations, self-citations, or ansatzes are invoked to justify a result; the central claims about exposing cross-view challenges rest on the released data, code, and external verifiability of the construction steps. The skeptic concern about possible selection bias in the MIS procedure is a question of external validity, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: depth maps rendered from the 3D mesh and CityGML model accurately reflect real scene geometry for co-visibility computation.
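To make the role of this assumption concrete, the sketch below shows one plausible way to score co-visibility between two images from rendered depth: back-project the pixels of image i, reproject them into image j, and count those whose depth agrees with j's render. The source only says that co-visibility is estimated from rendered depth, so the procedure, the function name `covisible_fraction`, the shared-intrinsics simplification and the tolerance value are all assumptions.

```python
import numpy as np

def covisible_fraction(depth_i, depth_j, K, T_i, T_j, tol=0.5):
    """Fraction of valid pixels of image i that are also seen in image j.

    depth_i, depth_j: (H, W) rendered z-depth in metres, 0 where empty;
                      both renders are assumed to share one resolution.
    K: (3, 3) pinhole intrinsics, assumed identical for both images.
    T_i, T_j: (4, 4) camera-to-world poses.
    tol: depth agreement tolerance in metres (hypothetical value).
    """
    h, w = depth_i.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth_i > 0
    z = depth_i[valid]
    pix = np.stack([u[valid], v[valid], np.ones_like(z)], axis=0)  # 3 x N
    cam_i = (np.linalg.inv(K) @ pix) * z               # points in camera i
    world = T_i[:3, :3] @ cam_i + T_i[:3, 3:4]
    cam_j = T_j[:3, :3].T @ (world - T_j[:3, 3:4])     # points in camera j
    front = cam_j[2] > 1e-6                            # in front of camera j
    proj = K @ cam_j[:, front]
    uj = np.round(proj[0] / proj[2]).astype(int)
    vj = np.round(proj[1] / proj[2]).astype(int)
    inside = (uj >= 0) & (uj < w) & (vj >= 0) & (vj < h)
    # A point counts as seen when j's rendered depth matches its own depth,
    # i.e. no other geometry occludes it from camera j.
    z_j = cam_j[2, front][inside]
    seen = np.abs(depth_j[vj[inside], uj[inside]] - z_j) < tol
    return float(seen.sum() / max(valid.sum(), 1))
```

Thresholding this fraction for every image pair would give a boolean co-visibility matrix. If the rendered depth departs from the real scene, for example where the mesh omits vegetation or street furniture, the resulting graph and any split derived from it inherit that error.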