pith. sign in

arxiv: 2604.23670 · v1 · submitted 2026-04-26 · 💻 cs.CV

Deploy DINO with Many-to-Many Association

Pith reviewed 2026-05-08 06:29 UTC · model grok-4.3

classification 💻 cs.CV
keywords DINOmany-to-many matchingimage matchingzero-shotcamera pose estimationHarmonic Consensus Maximizationout-of-distribution generalizationrobust estimation
0
0 comments X

The pith

General DINO features compete with specialized matching models on out-of-distribution datasets using many-to-many association and HCM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that DINO's general visual features can be deployed for image matching without adaptation. They address the ambiguity in matching similar objects by using many-to-many associations rather than strict one-to-one. To handle robustness efficiently under this paradigm, they develop Harmonic Consensus Maximization as a faster approximation to finding maximum-cardinality matchings. This setup lets the out-of-the-box features perform competitively with specialized models on unseen domains for camera pose estimation.

Core claim

Adopting many-to-many association for DINO features to manage inherent ambiguity on semantically similar instances, and introducing Harmonic Consensus Maximization as a likelihood-based efficient robust mechanism, allows these general-purpose features to achieve performance comparable to specialized matching models on out-of-distribution datasets in downstream tasks such as camera pose estimation.

What carries the argument

Harmonic Consensus Maximization (HCM), which provides a faster and finer-grained robust estimation by interpreting the problem from a likelihood perspective instead of computing maximum-cardinality matchings for each parameter hypothesis.

Load-bearing premise

The assumption that DINO features require a many-to-many paradigm because of ambiguity on similar instances and that HCM's likelihood approximation delivers equivalent robustness for tasks like camera pose estimation.

What would settle it

Running camera pose estimation experiments on multiple out-of-distribution datasets comparing accuracy and runtime of DINO with m-to-m plus HCM against specialized matching models; superior or equal performance on accuracy with better efficiency would confirm the claim.

Figures

Figures reproduced from arXiv: 2604.23670 by Haodong Jiang, Junfeng Wu, Mingzhe Li.

Figure 1
Figure 1. Figure 1: This figure illustrates the inherent ambiguity in establishing geometric correspondence using semantic-rich DINOv3 view at source ↗
Figure 2
Figure 2. Figure 2: An inlier under the ground-truth parameter is not view at source ↗
Figure 3
Figure 3. Figure 3: A toy example for likelihood calculation. Orange view at source ↗
Figure 4
Figure 4. Figure 4: Group precision for MKNN test with K = 3, 5, 8 and an error threshold of 5 pixels. The Easy, Average, and Hard subsets feature camera perspective difference of [0◦ , 40◦ ), [40◦ , 80◦ ) and [80◦ , 120◦ ) as detailed in Section V view at source ↗
Figure 5
Figure 5. Figure 5: Sensitivity of the HCM mechanism with respect to view at source ↗
Figure 6
Figure 6. Figure 6: Sensitivity of the HCM mechanism with respect to hyper-parameters view at source ↗
Figure 7
Figure 7. Figure 7: Histograms of discretization error in 106 runs view at source ↗
read the original abstract

Motivated by the limited generalization of supervised image matching models to unseen image domains, we explore the zero-shot deployment of DINO features for this task. The generalist visual representation extracted from DINO has inherent ambiguity when used to match feature points among semantically similar instances, prompting us to adopt a many-to-many (m-to-m) matching paradigm. However, the existing robust mechanism under m-to-m data association is computationally heavy, which requires finding a maximum-cardinality matching in the inlier association graph for each parameter evaluation. To address this inefficiency, we introduce a novel likelihood perspective, which interprets the existing method as a zeroth-order approximation of otherwise intractable likelihood calculation,and inspires us to propose a faster and finer-grained robust mechanism, termed as Harmonic Consensus Maximization (HCM). Take camera pose estimation as an exemplifying downstream task, we demonstrate that general-purpose visual features, used out of the box without any adaptation, can compete with specialized matching models on out-of-distribution datasets when mated with m-to-m association and the HCM mechanism.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript explores the zero-shot deployment of unmodified DINO features for image matching tasks by adopting a many-to-many (m-to-m) association paradigm to address inherent ambiguities among semantically similar instances. It reinterprets the existing robust mechanism under m-to-m matching as a zeroth-order approximation to an otherwise intractable likelihood calculation and introduces Harmonic Consensus Maximization (HCM) as a faster, finer-grained robust alternative. Using camera pose estimation as the downstream task, the paper claims that general-purpose DINO features combined with m-to-m association and HCM can compete with specialized matching models on out-of-distribution datasets.

Significance. If the result holds and HCM is shown to preserve the robustness properties of maximum-cardinality matching, this would be significant for computer vision as it demonstrates that off-the-shelf generalist visual representations can handle challenging OOD geometric tasks without any adaptation or retraining. The likelihood-based reinterpretation provides a principled foundation for deriving new mechanisms, and the work highlights the potential of m-to-m paradigms for ambiguous feature matching scenarios.

major comments (1)
  1. [Abstract and HCM derivation] Abstract and HCM proposal: The central claim that HCM is a valid faster approximation to the intractable likelihood (reinterpreting the existing max-cardinality method as zeroth-order) is load-bearing for the competitiveness result on OOD pose estimation. However, no explicit derivation, assumptions (e.g., independence or cardinality conditions on DINO descriptor graphs), or closed-form equivalence is provided to show that the harmonic consensus step rigorously approximates or preserves the robustness of maximum-cardinality matching. If the inlier graph induced by DINO features violates these implicit assumptions, HCM may not retain the necessary properties, undermining the switch from one-to-one matching.
minor comments (2)
  1. [Abstract] The abstract states the demonstration on camera pose estimation but provides no equations, quantitative results, error bars, or dataset details, making immediate assessment of the 'compete with specialized models' claim difficult.
  2. [Method introduction] The introduction of 'Harmonic Consensus Maximization (HCM)' would benefit from an early, self-contained definition of the 'harmonic' component and how it differs operationally from standard consensus maximization before the likelihood reinterpretation is presented.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and for identifying the need for greater rigor in the theoretical justification of HCM. We address the major comment below and commit to a revision that strengthens the manuscript without altering its core claims or experimental results.

read point-by-point responses
  1. Referee: [Abstract and HCM derivation] Abstract and HCM proposal: The central claim that HCM is a valid faster approximation to the intractable likelihood (reinterpreting the existing max-cardinality method as zeroth-order) is load-bearing for the competitiveness result on OOD pose estimation. However, no explicit derivation, assumptions (e.g., independence or cardinality conditions on DINO descriptor graphs), or closed-form equivalence is provided to show that the harmonic consensus step rigorously approximates or preserves the robustness of maximum-cardinality matching. If the inlier graph induced by DINO features violates these implicit assumptions, HCM may not retain the necessary properties, undermining the switch from one-to-one matching.

    Authors: We agree that the current presentation would benefit from an explicit derivation and statement of assumptions. Section 3 of the manuscript introduces the likelihood view by modeling associations as an inlier graph and treats maximum-cardinality matching as selecting the largest consistent set (a zeroth-order count-based approximation to the mode of the joint likelihood). HCM is motivated as replacing the cardinality objective with a harmonic-mean consensus score over pairwise consistencies, which is computationally lighter and incorporates descriptor similarity magnitudes. However, we acknowledge that the text does not list the required assumptions (e.g., conditional independence of edge weights given the pose hypothesis, or bounded cardinality of the true inlier set) nor supply a closed-form proof that the harmonic step preserves the same robustness guarantees. We will add a dedicated subsection with the full derivation, the explicit assumptions on DINO-induced graphs, a sketch showing equivalence under those conditions, and a short discussion of potential violations together with empirical checks confirming that HCM retains competitive robustness on the reported OOD benchmarks. This revision directly addresses the concern that the m-to-m + HCM pipeline might lose necessary properties. revision: yes

Circularity Check

0 steps flagged

No significant circularity: HCM introduced via independent likelihood reinterpretation

full rationale

The paper's derivation chain reinterprets an existing m-to-m robust mechanism as a zeroth-order likelihood approximation and proposes HCM as a faster successor. No equations, self-citations, or fitted parameters are exhibited that reduce the central claims (DINO + m-to-m + HCM for OOD pose estimation) to tautologies or inputs by construction. The likelihood view and HCM mechanism are presented as novel contributions with independent grounding, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Review conducted from abstract only; full derivations, parameter choices, and experimental protocols are unavailable. The ledger reflects only assumptions explicitly stated or implied in the abstract.

axioms (2)
  • domain assumption DINO features exhibit inherent ambiguity when matching points among semantically similar instances
    Directly stated as the motivation prompting the m-to-m paradigm
  • ad hoc to paper The existing robust mechanism under m-to-m association is a zeroth-order approximation of an otherwise intractable likelihood
    Invoked to justify the proposal of HCM as a faster alternative
invented entities (1)
  • Harmonic Consensus Maximization (HCM) no independent evidence
    purpose: Faster and finer-grained robust mechanism for m-to-m data association in feature matching
    Newly introduced in the paper; no independent evidence or external validation provided in the abstract

pith-pipeline@v0.9.0 · 5477 in / 1543 out tokens · 33063 ms · 2026-05-08T06:29:23.348197+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 2 internal anchors

  1. [1]

    Deep vit features as dense visual descriptors.arXiv preprint arXiv:2112.05814, 2(3):4, 2021

    Shir Amir, Yossi Gandelsman, Shai Bagon, and Tali Dekel. Deep vit features as dense visual descriptors. arXiv preprint arXiv:2112.05814, 2(3):4, 2021

  2. [2]

    Outlier-robust estimation: Hardness, minimally tuned algorithms, and applications.IEEE Transactions on Robotics, 38(1):281–301, 2021

    Pasquale Antonante, Vasileios Tzoumas, Heng Yang, and Luca Carlone. Outlier-robust estimation: Hardness, minimally tuned algorithms, and applications.IEEE Transactions on Robotics, 38(1):281–301, 2021

  3. [3]

    Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors

    Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors. InCVPR, 2017

  4. [4]

    Surf: Speeded up robust features

    Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. InEuropean conference on computer vision, pages 404–417. Springer, 2006

  5. [5]

    Mismatched: Evaluating the limits of image matching approaches and benchmarks

    Sierra Bonilla, Chiara Di Vece, Rema Daher, Xinwei Ju, Danail Stoyanov, Francisco Vasconcelos, and Sophia Bano. Mismatched: Evaluating the limits of image matching approaches and benchmarks. InEuropean Con- ference on Computer Vision, pages 120–137. Springer, 2024

  6. [6]

    Globally-optimal inlier set maximisation for camera pose and correspondence estimation.IEEE transactions on pattern analysis and machine intelli- gence, 42(2):328–342, 2018

    Dylan Campbell, Lars Petersson, Laurent Kneip, and Hongdong Li. Globally-optimal inlier set maximisation for camera pose and correspondence estimation.IEEE transactions on pattern analysis and machine intelli- gence, 42(2):328–342, 2018

  7. [7]

    Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam.IEEE transactions on robotics, 37(6):1874–1890, 2021

    Carlos Campos, Richard Elvira, Juan J G ´omez Rodr´ıguez, Jos ´e MM Montiel, and Juan D Tard ´os. Orb-slam3: An accurate open-source library for visual, visual–inertial, and multimap slam.IEEE transactions on robotics, 37(6):1874–1890, 2021

  8. [8]

    Hybrid scene compression for visual localization

    Federico Camposeco, Andrea Cohen, Marc Pollefeys, and Torsten Sattler. Hybrid scene compression for visual localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7653–7662, 2019

  9. [9]

    Sinkhorn distances: Lightspeed computa- tion of optimal transport.Advances in neural information processing systems, 26, 2013

    Marco Cuturi. Sinkhorn distances: Lightspeed computa- tion of optimal transport.Advances in neural information processing systems, 26, 2013

  10. [10]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes

    Angela Dai, Angel X Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017

  11. [11]

    Superpoint: Self-supervised interest point de- tection and description

    Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabi- novich. Superpoint: Self-supervised interest point de- tection and description. InProceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018

  12. [12]

    Roma: Robust dense feature matching

    Johan Edstedt, Qiyu Sun, Georg B ¨okman, M ˚arten Wadenb¨ack, and Michael Felsberg. Roma: Robust dense feature matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19790–19800, 2024

  13. [13]

    A brute-force algorithm for reconstructing a scene from two projections

    Olof Enqvist, Fangyuan Jiang, and Fredrik Kahl. A brute-force algorithm for reconstructing a scene from two projections. InCVPR 2011, pages 2961–2968. IEEE, 2011

  14. [14]

    Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Commu- nications of the ACM, 24(6):381–395, 1981

    Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography.Commu- nications of the ACM, 24(6):381–395, 1981

  15. [15]

    Optimal relative pose with unknown correspondences

    Johan Fredriksson, Viktor Larsson, Carl Olsson, and Fredrik Kahl. Optimal relative pose with unknown correspondences. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 1728– 1736, 2016

  16. [16]

    Rotation averaging.International journal of computer vision, 103(3):267–305, 2013

    Richard Hartley, Jochen Trumpf, Yuchao Dai, and Hong- dong Li. Rotation averaging.International journal of computer vision, 103(3):267–305, 2013

  17. [17]

    An nˆ5/2 algorithm for maximum matchings in bipartite graphs

    John E Hopcroft and Richard M Karp. An nˆ5/2 algorithm for maximum matchings in bipartite graphs. SIAM Journal on computing, 2(4):225–231, 1973

  18. [18]

    Navi: Category-agnostic image collections with high-quality 3d shape and pose annotations.Ad- vances in Neural Information Processing Systems, 36: 76061–76084, 2023

    Varun Jampani, Kevis-Kokitsi Maninis, Andreas Engel- hardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, Andr ´e Araujo, Ricardo Martin Brualla, Kaushal Patel, et al. Navi: Category-agnostic image collections with high-quality 3d shape and pose annotations.Ad- vances in Neural Information Processing Systems, 36: 76061–76084, 2023

  19. [19]

    Omniglue: Generalizable feature matching with foundation model guidance

    Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and Andr ´e Araujo. Omniglue: Generalizable feature matching with foundation model guidance. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19865–19875, 2024

  20. [20]

    Score: Saturated consensus relocalization in semantic line maps

    Haodong Jiang, Xiang Zheng, Yanglin Zhang, Qingcheng Zeng, Yiqian Li, Ziyang Hong, and Junfeng Wu. Score: Saturated consensus relocalization in semantic line maps. arXiv preprint arXiv:2503.03254, 2025

  21. [21]

    Anyloc: Towards universal visual place recognition.IEEE Robotics and Automation Letters, 9(2):1286–1293, 2023

    Nikhil Keetha, Avneesh Mishra, Jay Karhade, Kr- ishna Murthy Jatavallabhula, Sebastian Scherer, Madhava Krishna, and Sourav Garg. Anyloc: Towards universal visual place recognition.IEEE Robotics and Automation Letters, 9(2):1286–1293, 2023

  22. [22]

    Approximation algorithms for rectangle stabbing and interval stabbing problems.SIAM Journal on Discrete Mathematics, 20 (3):748–768, 2006

    Sofia Kovaleva and Frits CR Spieksma. Approximation algorithms for rectangle stabbing and interval stabbing problems.SIAM Journal on Discrete Mathematics, 20 (3):748–768, 2006

  23. [23]

    Grounding image matching in 3d with mast3r

    Vincent Leroy, Yohann Cabon, and J ´erˆome Revaud. Grounding image matching in 3d with mast3r. In European conference on computer vision, pages 71–91. Springer, 2024

  24. [24]

    Megadepth: Learning single-view depth prediction from internet photos

    Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2041–2050, 2018

  25. [25]

    Lightglue: Local feature matching at light speed

    Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. InProceedings of the IEEE/CVF international conference on computer vision, pages 17627–17638, 2023

  26. [26]

    Mind the gap: Aligning vision foundation models to image feature matching

    Yuhan Liu, Jingwen Fu, Yang Wu, Kangyi Wu, Pengna Li, Jiayi Wu, Sanping Zhou, and Jingmin Xin. Mind the gap: Aligning vision foundation models to image feature matching. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 20313–20323, 2025

  27. [27]

    Distinctive image features from scale- invariant keypoints.International journal of computer vision, 60:91–110, 2004

    David G Lowe. Distinctive image features from scale- invariant keypoints.International journal of computer vision, 60:91–110, 2004

  28. [28]

    Deter- ministic sample consensus with multiple match hypothe- ses

    Paul McIlroy, Simon Taylor, and Tom Drummond. Deter- ministic sample consensus with multiple match hypothe- ses

  29. [29]

    DINOv2: Learning Robust Visual Features without Supervision

    Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fer- nandez, Daniel Haziza, Francisco Massa, Alaaeldin El- Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023

  30. [30]

    Fast registration based on noisy planes with unknown correspondences for 3-d mapping

    Kaustubh Pathak, Andreas Birk, Narunas Va ˇskeviˇcius, and Jann Poppinga. Fast registration based on noisy planes with unknown correspondences for 3-d mapping. IEEE Transactions on Robotics, 26(3):424–441, 2010

  31. [31]

    Learning transferable visual models from natural lan- guage supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural lan- guage supervision. InInternational conference on ma- chine learning, pages 8748–8763. PmLR, 2021

  32. [32]

    Neighbourhood consensus networks.Advances in neural information processing systems, 31, 2018

    Ignacio Rocco, Mircea Cimpoi, Relja Arandjelovi ´c, Aki- hiko Torii, Tomas Pajdla, and Josef Sivic. Neighbourhood consensus networks.Advances in neural information processing systems, 31, 2018

  33. [33]

    From coarse to fine: Robust hierarchical localization at large scale

    Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12716–12725, 2019

  34. [34]

    Superglue: Learning feature matching with graph neural networks

    Paul-Edouard Sarlin, Daniel DeTone, Tomasz Mal- isiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020

  35. [35]

    DINOv3

    Oriane Sim ´eoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Micha ¨el Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025

  36. [36]

    Diagonal equivalence to matrices with prescribed row and column sums.The American Mathematical Monthly, 74(4):402–405, 1967

    Richard Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums.The American Mathematical Monthly, 74(4):402–405, 1967

  37. [37]

    Concerning nonneg- ative matrices and doubly stochastic matrices.Pacific Journal of Mathematics, 21(2):343–348, 1967

    Richard Sinkhorn and Paul Knopp. Concerning nonneg- ative matrices and doubly stochastic matrices.Pacific Journal of Mathematics, 21(2):343–348, 1967

  38. [38]

    Loftr: Detector-free local feature matching with transformers

    Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8922–8931, 2021

  39. [39]

    Emergent correspondence from image diffusion.Advances in Neural Information Processing Systems, 36:1363–1389, 2023

    Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion.Advances in Neural Information Processing Systems, 36:1363–1389, 2023

  40. [40]

    Attention is all you need.Advances in neural information processing systems, 30, 2017

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

  41. [41]

    Efficient loftr: Semi-dense local feature matching with sparse-like speed

    Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, and Xiaowei Zhou. Efficient loftr: Semi-dense local feature matching with sparse-like speed. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21666–21675, 2024

  42. [42]

    Certifiably optimal mutual lo- calization with anonymous bearing measurements.IEEE Robotics and Automation Letters, 7(4):9374–9381, 2022

    Yingjian Wang, Xiangyong Wen, Longji Yin, Chao Xu, Yanjun Cao, and Fei Gao. Certifiably optimal mutual lo- calization with anonymous bearing measurements.IEEE Robotics and Automation Letters, 7(4):9374–9381, 2022

  43. [43]

    Teaser: Fast and certifiable point cloud registration.IEEE Transac- tions on Robotics, 37(2):314–333, 2020

    Heng Yang, Jingnan Shi, and Luca Carlone. Teaser: Fast and certifiable point cloud registration.IEEE Transac- tions on Robotics, 37(2):314–333, 2020

  44. [44]

    Optimal essential matrix estimation via inlier-set maximization

    Jiaolong Yang, Hongdong Li, and Yunde Jia. Optimal essential matrix estimation via inlier-set maximization. InEuropean Conference on Computer Vision, pages 111–

  45. [45]

    Lift: Learned invariant feature transform

    Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. Lift: Learned invariant feature transform. InEuropean conference on computer vision, pages 467–

  46. [46]

    A tale of two features: Stable diffu- sion complements dino for zero-shot semantic corre- spondence.Advances in Neural Information Processing Systems, 36:45533–45547, 2023

    Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, and Ming- Hsuan Yang. A tale of two features: Stable diffu- sion complements dino for zero-shot semantic corre- spondence.Advances in Neural Information Processing Systems, 36:45533–45547, 2023

  47. [47]

    Telling left from right: Identifying geometry-aware se- mantic correspondence

    Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, and Ming-Hsuan Yang. Telling left from right: Identifying geometry-aware se- mantic correspondence. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3076–3085, 2024. APPENDIXA HYPER-PARAMETERSENSITIVITYANALYSIS Recall thatHarmonic Con...