
arxiv: 2604.12068 · v1 · submitted 2026-04-13 · 💻 cs.CV

Privacy-Preserving Structureless Visual Localization via Image Obfuscation

Pith reviewed 2026-05-10 15:38 UTC · model grok-4.3

classification 💻 cs.CV
keywords visual localization · privacy preservation · image obfuscation · structureless methods · feature matching · semantic segmentation · camera pose estimation · structure from motion

The pith

Simple image obfuscation lets structureless visual localization preserve privacy without changing pipelines or losing accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that privacy-preserving visual localization does not require complex new representations or algorithms. Instead, replacing both query and reference images with standard obfuscations such as semantic segmentations is enough. Modern feature matchers already succeed on these abstracted images, so existing structureless pipelines work unchanged. The result is a practical system that hides private details in sent images and stored scene data while delivering competitive pose accuracy across datasets.

Core claim

Structureless localization pipelines need no modifications to become privacy-preserving because off-the-shelf feature matchers can directly match obfuscated images produced by common operations like semantic segmentation, yielding state-of-the-art accuracy for privacy-preserving methods.

What carries the argument

Image obfuscation via everyday operations such as converting RGB images to semantic segmentations, which removes private visual details while retaining sufficient features for matching by existing detectors and descriptors.
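
The obfuscation step can be illustrated with a minimal sketch, assuming a per-pixel integer label map has already been produced by some segmentation model (the model call itself is out of scope here). The three variants mirror those named in the paper's Figure 2 caption (semantic colors, random colors, segment borders). The function names, the toy palette, and the right/lower-neighbor border rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def colorize_semantic(labels: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Replace every pixel by the fixed color assigned to its class label."""
    return palette[labels]


def colorize_random(labels: np.ndarray, seed: int = 0) -> np.ndarray:
    """Give every distinct segment id its own randomly drawn color."""
    rng = np.random.default_rng(seed)
    ids, inverse = np.unique(labels, return_inverse=True)
    colors = rng.integers(0, 256, size=(ids.size, 3), dtype=np.uint8)
    # reshape covers NumPy versions that return a flattened inverse
    return colors[inverse.reshape(labels.shape)]


def segment_borders(labels: np.ndarray) -> np.ndarray:
    """White (255) where a pixel's label differs from its right or lower neighbor."""
    border = np.zeros(labels.shape, dtype=bool)
    border[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    border[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return border.astype(np.uint8) * 255


# Toy 2x3 label map standing in for a real segmentation output (assumption).
labels = np.array([[0, 0, 1],
                   [0, 2, 2]])
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]], dtype=np.uint8)

semantic = colorize_semantic(labels, palette)
random_img = colorize_random(labels)
borders = segment_borders(labels)
```

In a structureless pipeline, the same function would be applied to both query and reference images before they reach the feature matcher; nothing else in the pipeline would need to know the images were obfuscated.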

If this is right

  • No custom code or retraining is required to add privacy to any existing structureless localization system.
  • Both the query images sent to a server and the reference images stored on the server can be obfuscated for protection.
  • The approach reaches the best reported pose accuracy among privacy-preserving visual localization techniques on standard benchmarks.
  • The same obfuscated representations can be used for both localization queries and map storage without scene-specific tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be tested on video streams or live camera feeds to check whether frame-to-frame consistency helps or hurts privacy protection.
  • If feature matchers continue to improve on abstracted inputs, even lighter forms of obfuscation might suffice for many applications.
  • This suggests a broader design pattern where privacy is added at the image level rather than through cryptographic or federated changes to the backend.
  • Similar obfuscation steps might transfer to other structureless tasks such as visual odometry or image-based rendering.

Load-bearing premise

That common obfuscations like semantic segmentation remove enough private information from images and scene representations yet leave distinctive enough features for unmodified modern feature matchers to succeed across different scenes and conditions.

What would settle it

A controlled test on a dataset containing clearly private elements where matching accuracy between obfuscated images drops below the level achieved by non-obfuscated baselines or by other privacy methods.
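
Such a comparison needs a concrete accuracy measure. A minimal sketch, assuming world-to-camera poses (x_cam = R x_world + t) and the common convention of reporting a rotation angle error together with a camera position error, is given below. The function names and the thresholds in the usage lines are placeholders, not values taken from the paper.

```python
import numpy as np


def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (rotation error in degrees, distance between camera centers).

    Assumes world-to-camera poses: x_cam = R @ x_world + t.
    """
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_err = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    c_est = -R_est.T @ t_est  # camera center in world coordinates
    c_gt = -R_gt.T @ t_gt
    return rot_err, float(np.linalg.norm(c_est - c_gt))


def recall(errors, max_rot_deg, max_trans):
    """Fraction of queries whose rotation AND position errors are within the thresholds."""
    errors = np.asarray(errors, dtype=float).reshape(-1, 2)
    ok = (errors[:, 0] <= max_rot_deg) & (errors[:, 1] <= max_trans)
    return float(ok.mean())


# Usage with made-up error pairs (degrees, scene units); thresholds are placeholders.
rgb_errors = [(0.5, 0.05), (1.0, 0.10)]
obfuscated_errors = [(0.8, 0.07), (12.0, 1.50)]
recall_rgb = recall(rgb_errors, max_rot_deg=5.0, max_trans=0.25)
recall_obf = recall(obfuscated_errors, max_rot_deg=5.0, max_trans=0.25)
```

Computing the same recall for RGB inputs, obfuscated inputs, and competing privacy-preserving methods on identical queries would make the gap described above directly measurable.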

Figures

Figures reproduced from arXiv: 2604.12068 by Patrik Beliansky, Torsten Sattler, Vojtech Panek, Zuzana Kukelova.

Figure 1: In this paper, we investigate privacy-preservation through image obfuscation in the context of visual …
Figure 2: Example segmentation masks. The top row shows the original image, SAM1 - fine masks, and SAM1 - fine borders. The bottom row shows Mask2Former segmentation masks: colored by semantic labels (semantic), colored randomly (random), segment borders (borders).
Figure 4: Example of text masking for SAM sampling.
Figure 5: Examples of the blur and pixelization obfuscations.
Figure 6: Selective anonymization example. From top …
Figure 7: Comparison of segmentation masks on the Indoor-6 dataset. Left to right: original image, SAM1 - fine masks segmentation, and Mask2Former - semantic segmentation (using ADE20k classes).
Figure 8: Examples of the used edge extraction meth…
Figure 9: Comparison of Canny and SAM1 - fine borders on the Cambridge Landmarks dataset. Top row: original image and Canny edge map. The bottom row contains two maps generated with SAM1 - fine borders. The right one was generated with text filtration.
Figure 10: Reconstructions of the St Mary’s Church scene from Cambridge Landmarks [44] from original images using SuperPoint [23] features (left) and ALIKED [111, 112] features (right). We can see that in the right image, one side of the cathedral collapsed onto the other.
Original abstract

Visual localization is the task of estimating the camera pose of an image relative to a scene representation. In practice, visual localization systems are often cloud-based. Naturally, this raises privacy concerns in terms of revealing private details through the images sent to the server or through the representations stored on the server. Privacy-preserving localization aims to avoid such leakage of private details. However, the resulting localization approaches are significantly more complex, slower, and less accurate than their non-privacy-preserving counterparts. In this paper, we consider structureless localization methods in the context of privacy preservation. Structureless methods represent the scene through a set of reference images with known camera poses and intrinsics. In contrast to existing methods proposing representations that are as privacy-preserving as possible, we study a simple image obfuscation approach based on common image operations, e.g., replacing RGB images with (semantic) segmentations. We show that existing structureless pipelines do not need any special adjustments, as modern feature matchers can match obfuscated images out of the box. The results are easy-to-implement pipelines that can ensure both the privacy of the query images and the scene representations. Detailed experiments on multiple datasets show that the resulting methods achieve state-of-the-art pose accuracy for privacy-preserving approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that structureless visual localization pipelines can be made privacy-preserving via simple image obfuscation operations (e.g., replacing RGB images with semantic segmentations) applied to both query and reference images. It asserts that modern feature matchers succeed on these obfuscated inputs without any pipeline modifications or retraining, yielding easy-to-implement methods that protect privacy for queries and scene representations while achieving state-of-the-art pose accuracy among privacy-preserving approaches, supported by experiments on multiple datasets.

Significance. If the empirical results hold under scrutiny, the work is significant for demonstrating that privacy preservation need not require complex custom representations or matcher adaptations in structureless localization. It leverages off-the-shelf components to balance privacy and accuracy, potentially making such systems more practical for cloud-based applications.

major comments (2)
  1. The central claim that 'modern feature matchers can match obfuscated images out of the box' (Abstract) is load-bearing for the 'no special adjustments' assertion. Given the domain shift from natural RGB statistics to class-label maps, the experiments must include keypoint repeatability metrics, correspondence accuracy breakdowns, and comparisons against RGB baselines or retrained matchers to confirm robustness across datasets without adaptation.
  2. Privacy claims for scene representations (Abstract and §3) rely on obfuscation removing private details. However, semantic segmentations can still convey structural layout; the paper should report quantitative privacy evaluations (e.g., success rates of reconstruction or attribute inference attacks) rather than assuming sufficient protection.
minor comments (2)
  1. Figure captions and the experimental section could include example visualizations of matched keypoints on obfuscated vs. original images to illustrate the 'out of the box' matching.
  2. Ensure the related work section explicitly contrasts the proposed method against prior privacy-preserving localization techniques that modify matchers or representations.
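
The keypoint repeatability requested in the first major comment can be sketched as follows. This is a simplified, one-directional variant that assumes an image pair related by a known homography (for example, a synthetic warp of one obfuscated image); it ignores visibility and image-boundary handling. The function names and the 3-pixel tolerance are assumptions for illustration and do not come from the paper or the report.

```python
import numpy as np


def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    warped = pts_h @ H.T
    return warped[:, :2] / warped[:, 2:3]


def repeatability(kpts_a, kpts_b, H_ab, eps: float = 3.0) -> float:
    """Fraction of keypoints from image A that, once warped into image B,
    lie within eps pixels of some keypoint detected in image B."""
    kpts_a = np.asarray(kpts_a, dtype=float).reshape(-1, 2)
    kpts_b = np.asarray(kpts_b, dtype=float).reshape(-1, 2)
    if len(kpts_a) == 0 or len(kpts_b) == 0:
        return 0.0
    warped = warp_points(np.asarray(H_ab, dtype=float), kpts_a)
    # Pairwise distances between warped A keypoints and B keypoints.
    dists = np.linalg.norm(warped[:, None, :] - kpts_b[None, :, :], axis=2)
    return float((dists.min(axis=1) <= eps).mean())


# Toy example: image B is image A shifted 10 pixels to the right.
H_shift = np.array([[1.0, 0.0, 10.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
kpts_a = [[0, 0], [5, 5], [20, 20]]
kpts_b = [[10, 0], [15, 6], [100, 100]]
score = repeatability(kpts_a, kpts_b, H_shift)
```

Running the same detector on RGB and on obfuscated versions of the pair, then comparing the two scores, would give the side-by-side breakdown the comment asks for.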

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, providing clarifications on our approach and indicating where we will revise the paper to incorporate additional analysis.

  1. Referee: The central claim that 'modern feature matchers can match obfuscated images out of the box' (Abstract) is load-bearing for the 'no special adjustments' assertion. Given the domain shift from natural RGB statistics to class-label maps, the experiments must include keypoint repeatability metrics, correspondence accuracy breakdowns, and comparisons against RGB baselines or retrained matchers to confirm robustness across datasets without adaptation.

    Authors: We appreciate the referee's emphasis on this point. Our primary evaluation metric is end-to-end pose accuracy on multiple datasets, which already demonstrates that off-the-shelf matchers produce usable correspondences on obfuscated inputs without retraining or pipeline changes. To strengthen the evidence for matcher robustness under domain shift, we will add keypoint repeatability metrics and correspondence accuracy breakdowns in the revised experiments section. We will also include explicit side-by-side comparisons of matching statistics and localization performance against the corresponding RGB baselines. These additions will be reported for all evaluated obfuscation methods and datasets. revision: yes

  2. Referee: Privacy claims for scene representations (Abstract and §3) rely on obfuscation removing private details. However, semantic segmentations can still convey structural layout; the paper should report quantitative privacy evaluations (e.g., success rates of reconstruction or attribute inference attacks) rather than assuming sufficient protection.

    Authors: We agree that semantic segmentations preserve coarse structural layout and that this could in principle enable certain inference attacks. Our manuscript focuses on removing fine-grained private information (textures, colors, identities) via standard obfuscation operations while preserving localization utility. We did not include quantitative attack simulations, as these would require defining specific threat models and attack implementations that fall outside the paper's core scope. In the revision we will expand §3 with a more detailed qualitative analysis of what information is removed versus retained, explicitly acknowledge the structural leakage concern, and discuss potential attack vectors as a limitation. We believe this provides a balanced treatment without overclaiming perfect privacy. revision: partial

Circularity Check

0 steps flagged

No significant circularity: purely empirical validation of obfuscation for structureless localization


The paper advances an empirical claim that modern feature matchers succeed on obfuscated images (e.g., semantic segmentations) without pipeline modifications, supported by experiments across datasets. No derivation chain, equations, or first-principles results are present that could reduce a prediction to a fitted input or self-citation by construction. The central result is tested directly against external matchers and benchmarks rather than being defined or forced by the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that obfuscated images retain sufficient matching information. No free parameters, invented entities, or additional axioms are described in the abstract.

axioms (1)
  • Domain assumption: modern feature matchers can effectively match obfuscated images, such as semantic segmentations, without modification.
    This assumption underpins the claim that no special adjustments to existing pipelines are needed.



Reference graph

Works this paper leans on

118 extracted references · 118 canonical work pages · 1 internal anchor

  1. [1]

    Luca Savant Aira, Diego Valsesia, Andrea Bordone Molini, Giulia Fracastoro, Enrico Magli, and Andrea Mirabile. Deep 3D World Models for Multi-Image Super-Resolution Beyond Optical Flow. IEEE Access,

  2. [2]

    Hanadi Al-Mekhlafi and Shiguang Liu. Single Image Super-Resolution: A Comprehensive Review and Recent Insight. Frontiers of Computer Science, 18(1):181702, 2024.

  3. [3]

    Anonymous. Vulnerability of privacy-preserving visual localization against diffusion-based attacks. In Submitted to The Fourteenth International Conference on Learning Representations, 2025. Under review: https://openreview.net/forum?id=NmWf0gLufZ.

  4. [4]

    Relja Arandjelović, Petr Gronát, Akihiko Torii, Tomás Pajdla, and Josef Sivic. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5297–5307, 2015.

  5. [5]

    C. Arth, D. Wagner, M. Klopschitz, A. Irschara, and D. Schmalstieg. Wide area localization on mobile phones. In ISMAR, 2009.

  6. [6]

    Vassileios Balntas, Shuda Li, and Victor Prisacariu. RelocNet: Continuous Metric Learning Relocalisation using Neural Nets. In The European Conference on Computer Vision (ECCV), September 2018.

  7. [7]

    Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking Visual Geo-Localization for Large-Scale Applications. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

  8. [8]

    Gabriele Berton, Gabriele Trivigno, Barbara Caputo, and Carlo Masone. EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11080–11090, October 2023.

  9. [9]

    Snehal Bhayani, Torsten Sattler, Dániel Baráth, Patrik Beliansky, J. Heikkila, and Zuzana Kukelova. Calibrated and Partially Calibrated Semi-Generalized Homographies. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5916–5925,

  10. [10]

    Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, and Alessio Del Bue. 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model. In ECCV, 2024.

  11. [11]

    Eric Brachmann, Tommaso Cavallari, and Victor Adrian Prisacariu. Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5044–5053, 2023.

  12. [12]

    Eric Brachmann, Alexander Krull, Sebastian Nowozin, Jamie Shotton, Frank Michel, Stefan Gumhold, and Carsten Rother. DSAC - Differentiable RANSAC for Camera Localization. In CVPR, 2017.

  13. [13]

    Eric Brachmann and Carsten Rother. Visual Camera Re-Localization From RGB and RGB-D Images Using DSAC. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:5847–5865, 2020.

  14. [14]

    John F. Canny. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8:679–698, 1986.

  15. [15]

    Robert Castle, Georg Klein, and David W. Murray. Video-rate localization in multiple maps for wearable augmented reality. In ISWC, 2008.

  16. [16]

    Ludovico Cavedon, Luca Foschini, and Giovanni Vigna. Getting the Face Behind the Squares: Reconstructing Pixelized Video Streams. In 5th USENIX Workshop on Offensive Technologies (WOOT 11),

  17. [17]

    Kunal Chelani, Assia Benbihi, Fredrik Kahl, Torsten Sattler, and Zuzana Kukelova. Obfuscation Based Privacy Preserving Representations are Recoverable Using Neighborhood Information. In 2025 International Conference on 3D Vision (3DV), pages 189–199. IEEE,

  18. [18]

    Kunal Chelani, Fredrik Kahl, and Torsten Sattler. How privacy-preserving are line clouds? recovering scene details from 3d lines. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

  19. [19]

    Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong. Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22367–22377, 2023.

  20. [20]

    Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention Mask Transformer for Universal Image Segmentation. CVPR, 2022.

  21. [21]

    Bowen Cheng, Alexander G. Schwing, and Alexander Kirillov. Per-Pixel Classification is Not All You Need for Semantic Segmentation. NeurIPS, 2021.

  22. [22]

    Ondřej Chum, Jiří Matas, and Josef Kittler. Locally Optimized RANSAC. In Joint pattern recognition symposium, pages 236–243. Springer, 2003.

  23. [23]

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. SuperPoint: Self-Supervised Interest Point Detection and Description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 337–33712, 2017.

  24. [24]

    Yaqing Ding, Jian Yang, Viktor Larsson, Carl Olsson, and Kalle Åström. Revisiting the P3P problem. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4872–4880, 2023.

  25. [25]

    Tien Do, Ondrej Miksik, Joseph DeGol, Hyun Soo Park, and Sudipta N. Sinha. Learning to Detect Scene Landmarks for Camera Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.

  26. [26]

    Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image Super-Resolution Using Deep Convolutional Networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307,

  27. [27]

    Siyan Dong, Shaohui Liu, Hengkai Guo, Baoquan Chen, and Marc Pollefeys. Lazy Visual Localization via Motion Averaging. arXiv:2307.09981, 2023.

  28. [28]

    Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, and Yanchao Yang. Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization. arXiv preprint arXiv:2412.08376,

  29. [29]

    Alexey Dosovitskiy and Thomas Brox. Inverting visual representations with convolutional networks. In CVPR, pages 4829–4837, 2016.

  30. [30]

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library. 2024.

  31. [31]

    Mihai Dusmanu, Johannes L. Schonberger, Sudipta N. Sinha, and Marc Pollefeys. Privacy-preserving image features via adversarial affine subspace embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14267–14277, June 2021.

  32. [32]

    Johan Edstedt, David Nordström, Yushan Zhang, Georg Bökman, Jonathan Astermark, Viktor Larsson, Anders Heyden, Fredrik Kahl, Mårten Wadenbäck, and Michael Felsberg. RoMa v2: Harder Better Faster Denser Feature Matching. arXiv preprint arXiv:2511.15706, 2025.

  33. [33]

    Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. RoMa: Robust Dense Feature Matching. IEEE Conference on Computer Vision and Pattern Recognition, 2024.

  34. [34]

    Martin A. Fischler and Robert C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24:381–395, 1981.

  35. [35]

    Marcel Geppert, Viktor Larsson, Johannes L. Schönberger, and Marc Pollefeys. Privacy preserving partial localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17337–17347, June 2022.

  36. [36]

    Marcel Geppert, Viktor Larsson, Pablo Speciale, Johannes L Schönberger, and Marc Pollefeys. Privacy Preserving Structure-from-Motion. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 333–350. Springer, 2020.

  37. [37]

    Marcel Geppert, Viktor Larsson, Pablo Speciale, Johannes L. Schonberger, and Marc Pollefeys. Privacy preserving localization and mapping from uncalibrated cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1809–1819, June 2021.

  38. [38]

    Bert M Haralick, Chung-Nan Lee, Karsten Ottenberg, and Michael Nölle. Review and analysis of solutions of the three point perspective pose estimation problem. International journal of computer vision (IJCV), 13:331–356, 1994.

  39. [39]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.

  40. [40]

    Steven Hill, Zhimin Zhou, Lawrence Saul, and Hovav Shacham. On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction. Proceedings on Privacy Enhancing Technologies, 2016.

  41. [41]

    Chih-Chung Hsu, Chia-Ming Lee, and Yi-Shiuan Chou. DRCT: Saving Image Super-Resolution Away from Information Bottleneck. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6133–6142, 2024.

  42. [42]

    Mu Hu, Wei Yin, China. Xiaoyan Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE transactions on pattern analysis and machine intelligence, PP, 2024.

  43. [43]

    ImageMagick Studio LLC. Imagemagick. https://imagemagick.org.

  44. [44]

    Alex Kendall, Matthew Koichi Grimes, and Roberto Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. 2015 IEEE International Conference on Computer Vision (ICCV), pages 2938–2946, 2015.

  45. [45]

    Junho Kim, Changwoon Choi, Hojun Jang, and Young Min Kim. LDL: Line Distance Functions for Panoramic Localization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17882–17892, October 2023.

  46. [46]

    Junho Kim, Jiwon Jeong, and Young Min Kim. Fully Geometric Panoramic Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20827–20837, June 2024.

  47. [47]

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment Anything. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023.

  48. [48]

    Simona Kocour, Assia Benbihi, and Torsten Sattler. Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting. arXiv, abs/2508.11431, 2025.

  49. [49]

    Simona Kocour, Assia Benbihi, and Torsten Sattler. Remove360 Dataset, 2025. https://huggingface.co/simkoc/Remove360.

  50. [50]

    Viktor Larsson and contributors. PoseLib - Minimal Solvers for Camera Pose Estimation, 2020. https://github.com/vlarsson/PoseLib.

  51. [51]

    Zakaria Laskar, Iaroslav Melekhov, Surya Kalia, and Juho Kannala. Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network. In ICCV Workshops, 2017.

  52. [52]

    Karel Lebeda, Juan E. Sala Matas, and Ondřej Chum. Fixing the Locally Optimized RANSAC. In BMVC,

  53. [53]

    Chunghwan Lee, Jaihoon Kim, Chanhyuk Yun, and Je Hyeong Hong. Paired-point lifting for enhanced privacy-preserving visual localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17266–17275, June 2023.

  54. [54]

    Dawa Chyophel Lepcha, Bhawna Goyal, Ayush Dogra, and Vishal Goyal. Image Super-Resolution: A Comprehensive Review, Recent Trends, Challenges and Applications. Information Fusion, 91:230–260, 2023.

  55. [55]

    Vincent Leroy, Yohann Cabon, and Jerome Revaud. Grounding Image Matching in 3D with MASt3R, 2024.

  56. [56]

    Yunpeng Li, Noah Snavely, Dan Huttenlocher, and Pascal Fua. Worldwide pose estimation using 3d point clouds. In ECCV, pages 15–29. Springer, 2012.

  57. [57]

    Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, and Xiang Bai. Real-time Scene Text Detection with Differentiable Binarization. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 11474–11481, 2020.

  58. [58]

    Hyon Lim, Sudipta N. Sinha, Michael F. Cohen, and Matthew Uyttendaele. Real-time image-based 6-DOF localization in large-scale environments. 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 1043–1050, 2012.

  59. [59]

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV, 2023.

  60. [60]

    Changkun Liu, Shuai Chen, Yash Sanjay Bhalgat, Siyan HU, Ming Cheng, Zirui Wang, Victor Adrian Prisacariu, and Tristan Braud. GS-CPR: Efficient camera pose refinement via 3d gaussian splatting. In The Thirteenth International Conference on Learning Representations, 2025. https://openreview.net/forum?id=mP7uV59iJM.

  61. [61]

    Get Out of My Lab: Large-scale, Real-Time Visual-Inertial Localization

    Simon Lynen, Torsten Sattler, Michael Bosse, Joel A. Hesch, Marc Pollefeys, and Roland Y. Siegwart. Get Out of My Lab: Large-scale, Real-Time Visual-Inertial Localization. In Robotics: Science and Systems, 2015. 1

  62. [62]

    Leveraging camera triplets for efficient and accurate structure-from-motion

    Lalit Manam and Venu Madhav Govindu. Leveraging camera triplets for efficient and accurate structure-from-motion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4959–4968, June 2024. 13

  63. [63]

    Defeating Image Obfuscation with Deep Learning

    Richard McPherson, Reza Shokri, and Vitaly Shmatikov. Defeating Image Obfuscation with Deep Learning. arXiv preprint arXiv:1609.00408, 2016. 5, 9

  64. [64]

    Sven Middelberg, Torsten Sattler, Ole Untzelmann, and Leif P. Kobbelt. Scalable 6-DOF Localization on Mobile Devices. In European Conference on Computer Vision, 2014. 1

  65. [65]

    Efficient privacy-preserving visual localization using 3d ray clouds

    Heejoon Moon, Chunghwan Lee, and Je Hyeong Hong. Efficient privacy-preserving visual localization using 3d ray clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9773–9783, June 2024. 1, 3, 9, 14

  66. [66]

    The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes

    Gerhard Neuhold, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. In Proceedings of the IEEE International Conference on Computer Vision, pages 4990–4999, 2017. 11

  67. [67]

    NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning

    Tony Ng, Hyo Jin Kim, Vincent T. Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk, and Chris Sweeney. NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12797–12807, June 2022. 1, 3, 9

  68. [68]

    OoD-Pose: Camera Pose Regression From Out-of-Distribution Synthetic Views

    Tony Ng, Adrian Lopez-Rodriguez, Vassileios Balntas, and Krystian Mikolajczyk. OoD-Pose: Camera Pose Regression From Out-of-Distribution Synthetic Views. In 2022 International Conference on 3D Vision (3DV),

  69. [69]

    D. Nistér. An Efficient Solution to the Five-Point Relative Pose Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):756–770, June 2004. 3

  70. [70]

    Privacy preserving localization via coordinate permutations

    Linfei Pan, Johannes L. Schönberger, Viktor Larsson, and Marc Pollefeys. Privacy preserving localization via coordinate permutations. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18174–18183, October 2023. 1, 3, 6, 9

  71. [71]

    easy-anon - An Easy-to-Use Image Masking and Anonymization Tool, 2025

    Vojtech Panek and contributors. easy-anon - An Easy-to-Use Image Masking and Anonymization Tool, 2025. https://github.com/spatial-intelligence-group/easy_anon. 4

  72. [72]

    Meshloc: Mesh-based visual localization

    Vojtech Panek, Zuzana Kukelova, and Torsten Sattler. Meshloc: Mesh-based visual localization. In ECCV,

  73. [73]

    Visual Localization Using Imperfect 3D Models From the Internet

    Vojtech Panek, Zuzana Kukelova, and Torsten Sattler. Visual Localization Using Imperfect 3D Models From the Internet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13175–13186, June 2023. 2

  74. [74]

    A Guide to Structureless Visual Localization

    Vojtech Panek, Qunjie Zhou, Yaqing Ding, Sérgio Agostinho, Zuzana Kukelova, Torsten Sattler, and Laura Leal-Taixé. A Guide to Structureless Visual Localization. arXiv preprint arXiv:2504.17636, 2025. 2, 4

  75. [75]

    Lambda twist: An accurate fast robust perspective three point (p3p) solver

    Mikael Persson and Klas Nordberg. Lambda twist: An accurate fast robust perspective three point (p3p) solver. In European Conference on Computer Vision (ECCV), 2018. 4

  76. [76]

    Learning Scene Geometry for Visual Localization in Challenging Conditions

    N. Piasco, D. Sidibé, V. Gouet-Brunet, and C. Demonceaux. Learning Scene Geometry for Visual Localization in Challenging Conditions. In 2019 International Conference on Robotics and Automation (ICRA), 2019. 2

  77. [77]

    Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization

    Maxime Pietrantoni, Gabriela Csurka, and Torsten Sattler. Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1082–1092, June 2025. 1, 2, 3, 4, 6, 7, 9, 10

  78. [78]

    SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization

    Maxime Pietrantoni, Martin Humenberger, Torsten Sattler, and Gabriela Csurka. SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15380–15391, June 2023. 1, 2, 3, 4, 6, 7, 9, 10

  80. [80]

    Revealing Scenes by Inverting Structure From Motion Reconstructions

    Francesco Pittaluga, Sanjeev J. Koppal, Sing Bing Kang, and Sudipta N. Sinha. Revealing Scenes by Inverting Structure From Motion Reconstructions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 1, 3, 9

Showing first 80 references.