SAMatcher: Co-Visibility Modeling with Segment Anything for Robust Feature Matching

He Chen; Mingyue Dong; Qiyuan Ma; Wei Ji; Xianwei Zheng; Xu Pan

arxiv: 2606.03406 · v1 · pith:2CQROFLFnew · submitted 2026-06-02 · 💻 cs.CV

Pith reviewed 2026-06-28 11:00 UTC · model grok-4.3

classification 💻 cs.CV

keywords: feature matching · co-visibility modeling · Segment Anything Model · image correspondence · structured priors · multi-view reasoning · foundation models · robust matching

The pith

SAMatcher adapts the Segment Anything Model to predict co-visible regions across image pairs as priors for more reliable feature matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that correspondence between images can be improved by first identifying regions visible in both views rather than matching local features directly. It extends the Segment Anything Model with a symmetric interaction that lets the two images exchange information to produce consistent masks and boxes. These outputs then guide the matching process, with a joint training scheme that enforces consistency between the masks and boxes. A reader would care because many vision tasks rely on accurate point matches when camera angles or distances change sharply, and the work suggests existing single-image models can be repurposed for this without new architectures from scratch.

Core claim

SAMatcher formulates correspondence estimation through co-visibility modeling by first predicting co-visible region masks and bounding boxes as structured priors, using a symmetric cross-view interaction mechanism on the Segment Anything Model that enables bidirectional feature exchange and cross-view semantic alignment, together with a unified supervision scheme that jointly optimizes mask prediction, box regression, and mask-box consistency constraints.

What carries the argument

Symmetric cross-view interaction mechanism on the Segment Anything Model that performs bidirectional feature exchange and cross-view semantic alignment to produce co-visible masks and boxes.
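A minimal sketch of what "bidirectional feature exchange" can look like, assuming plain cross-attention with one set of projection weights shared by both directions. This is a simplification swapped in for illustration, not the paper's module (the Figure 3 caption describes interleaved tokens and window-based attention); the names `cross_attend`, `symmetric_interaction`, and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query_tokens, context_tokens, w_q, w_k, w_v):
    """Tokens of one view attend to the tokens of the other view."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return query_tokens + attn @ v  # residual update

def symmetric_interaction(feat_a, feat_b, w_q, w_k, w_v):
    """Use the same weights in both directions (an assumption of this sketch),
    so swapping the two inputs simply swaps the two outputs."""
    new_a = cross_attend(feat_a, feat_b, w_q, w_k, w_v)
    new_b = cross_attend(feat_b, feat_a, w_q, w_k, w_v)
    return new_a, new_b

# Hypothetical token features: 5 tokens in view A, 7 in view B, 8 channels.
rng = np.random.default_rng(0)
dim = 8
feat_a = rng.normal(size=(5, dim))
feat_b = rng.normal(size=(7, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

out_a, out_b = symmetric_interaction(feat_a, feat_b, w_q, w_k, w_v)
swapped_b, swapped_a = symmetric_interaction(feat_b, feat_a, w_q, w_k, w_v)
```

Sharing the weights across both directions is one simple way to make the result independent of which image is called the source and which the target.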

If this is right

Matching performance improves substantially on benchmarks that contain large viewpoint and scale variations.
Structured region-level priors outperform methods that operate only at the pixel or patch level.
Joint optimization through mask learning, box regression, and mask-box consistency yields better priors than separate training.
Foundation models trained on monocular segmentation can be extended to multi-view correspondence tasks.
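To make the mask-box consistency idea concrete, here is one possible instantiation, assuming the constraint compares the tightest box around the predicted mask with the predicted box via IoU. The source does not give the actual formulation, so the functions `mask_to_box`, `box_iou`, and `mask_box_consistency` are hypothetical.

```python
import numpy as np

def mask_to_box(mask):
    """Tightest axis-aligned box (x0, y0, x1, y1) around a binary mask.
    The max corner is exclusive. Returns None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

def box_iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mask_box_consistency(mask, pred_box):
    """Assumed penalty: 1 - IoU between the mask's tight box and the predicted box."""
    tight = mask_to_box(mask)
    if tight is None:
        return 1.0
    return 1.0 - box_iou(tight, pred_box)

# Synthetic example: a 4x4 mask region inside an 8x8 image.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3:7] = True
consistent = mask_box_consistency(mask, (3, 2, 7, 6))  # box matches the mask extent
shifted = mask_box_consistency(mask, (5, 2, 9, 6))     # box shifted two pixels right
```

In a training setup this hard-thresholded version would need a differentiable counterpart; the sketch only shows which quantities the penalty ties together.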

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same co-visibility idea could be tested on other pre-trained segmentation or detection models to see if the benefit is specific to SAM.
In Structure-from-Motion pipelines the region masks might be used to filter outliers before bundle adjustment.
The approach may help matching when parts of the scene are occluded in one view but not the other.
If the masks alone prove sufficient, the method could simplify pipelines by reducing reliance on dense local descriptors.

Load-bearing premise

Accurate prediction of co-visible region masks and bounding boxes through the SAM-based symmetric interaction will supply effective structured priors that improve correspondence estimation over direct local feature matching.

What would settle it

On standard matching benchmarks, replace the predicted co-visible masks and boxes with random or empty regions and check whether overall matching accuracy falls back to the level of direct local-feature baselines; if it does not, the value of the co-visibility priors is not supported.
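The control described above could be set up roughly as follows. This is a sketch on synthetic toy data, assuming that the priors are applied by discarding matches whose endpoints fall outside the masks; the helper names and the keypoints are hypothetical and carry no experimental meaning.

```python
import numpy as np

def keep_matches_in_masks(kpts_a, kpts_b, mask_a, mask_b):
    """Flag matches whose two endpoints both lie inside the given masks.
    Keypoints are (N, 2) integer arrays of (x, y) pixel coordinates."""
    in_a = mask_a[kpts_a[:, 1], kpts_a[:, 0]]
    in_b = mask_b[kpts_b[:, 1], kpts_b[:, 0]]
    return in_a & in_b

def random_mask_like(mask, rng):
    """Control mask with the same number of foreground pixels, placed at random."""
    flat = np.zeros(mask.size, dtype=bool)
    chosen = rng.choice(mask.size, size=int(mask.sum()), replace=False)
    flat[chosen] = True
    return flat.reshape(mask.shape)

def match_precision(keep, is_correct):
    """Fraction of kept matches that are correct (0.0 if none are kept)."""
    return float(is_correct[keep].mean()) if keep.any() else 0.0

# Synthetic toy data: the "predicted" co-visible region is the left half of each image.
mask_a = np.zeros((10, 10), dtype=bool)
mask_a[:, :5] = True
mask_b = mask_a.copy()
kpts_a = np.array([[1, 1], [2, 3], [8, 8], [7, 2]])
kpts_b = np.array([[1, 2], [3, 3], [9, 1], [6, 6]])
is_correct = np.array([True, True, False, False])

rng = np.random.default_rng(0)
full = np.ones_like(mask_a)  # no region prior at all
rand_a, rand_b = random_mask_like(mask_a, rng), random_mask_like(mask_b, rng)

p_pred = match_precision(keep_matches_in_masks(kpts_a, kpts_b, mask_a, mask_b), is_correct)
p_none = match_precision(keep_matches_in_masks(kpts_a, kpts_b, full, full), is_correct)
p_rand = match_precision(keep_matches_in_masks(kpts_a, kpts_b, rand_a, rand_b), is_correct)
```

On a real benchmark the comparison of interest is between the predicted masks and the controls (`p_pred` versus `p_none` and `p_rand` here) using the benchmark's own accuracy metric rather than this toy precision.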

Figures

Figures reproduced from arXiv: 2606.03406 by He Chen, Mingyue Dong, Qiyuan Ma, Wei Ji, Xianwei Zheng, Xu Pan.

**Figure 1.** Motivation of SAMatcher. (a) An image pair with large scale variation in co-visible regions, where corresponding structures occupy significantly different spatial extents across views. (b) Co-visible region segmentation in SAMatcher, highlighting regions jointly visible across views while suppressing non-overlapping areas. (c) Pixel confusion caused by scale inconsistency in local matching and its mitigat…

**Figure 2.** Overview of the proposed SAMatcher framework. Given an image pair, SAMatcher extracts high-level visual representations using a shared encoder. A cross-view symmetric fusion module aligns semantic information across views and highlights potentially co-visible content. Based on the fused features, a prompt-driven mask decoder predicts co-visible region masks, while a dedicated box decoder estimates correspo…

**Figure 3.** Architecture of the proposed symmetric cross-view feature interaction module. Features from the source and target views are interleaved and processed by a stack of symmetric interaction blocks, enabling bidirectional token-level communication across views. Window-based attention with positional encoding facilitates efficient local interaction while preserving view identity. Subsequent single-view refinemen…

**Figure 4.** [figure: figures/full_fig_p009_4.png]

**Figure 5.** Qualitative comparison of co-visible region detection. For each image pair, we show OETR [33] box-only predictions and SAMatcher mask predictions, with masks overlaid as semi-transparent purple regions. While OETR provides coarse bounding boxes, SAMatcher produces accurate and consistent co-visible regions across views, even under large viewpoint changes and partial overlap.

**Figure 6.** Region-guided correspondence comparison. SP+SG, +OETR, and +SAMatcher. Green lines denote correct matches, red lines incorrect ones. Under large scale variation, OETR often predicts inaccurate or missing regions, while SAMatcher identifies valid co-visible regions and yields more reliable correspondences. Overall, the results in Table I demonstrate that SAMatcher consistently improves correspondence qualit…

**Figure 7.** Complementarity of mask and box predictions. Masks (magenta) provide high recall but coarse coverage, while boxes (red) offer precise localization. Constraining masks with boxes yields refined co-visible regions (green), improving correspondence reliability.

**Figure 8.** Zero-shot generalization on unseen datasets. Top: GL3D (outdoor aerial scenes). Bottom: ScanNet (indoor environments). Predicted co-visible regions are overlaid as semi-transparent magenta masks. SAMatcher consistently captures mutually observable regions while suppressing non-overlapping content under domain shifts.
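The Figure 7 caption speaks of constraining masks with boxes. One simple reading, offered here as an assumption rather than the paper's procedure, is to keep only the mask pixels that fall inside the predicted box. The function name `constrain_mask_with_box` and the toy shapes are hypothetical.

```python
import numpy as np

def constrain_mask_with_box(mask, box):
    """Keep only the mask pixels inside box = (x0, y0, x1, y1), max corner exclusive."""
    x0, y0, x1, y1 = box
    refined = np.zeros_like(mask)
    refined[y0:y1, x0:x1] = mask[y0:y1, x0:x1]
    return refined

# Synthetic example: a wide, coarse mask trimmed by a tighter box.
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 0:6] = True                      # rows 1..4, all six columns
refined = constrain_mask_with_box(mask, (2, 0, 5, 4))
```

The refined region is always a subset of the original mask, which matches the caption's description of boxes sharpening a high-recall but coarse mask.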

Original abstract

Reliable correspondence estimation is a fundamental problem in image processing, underpinning applications such as Structure from Motion, visual localization, and image registration. Existing learning-based methods have significantly improved local feature representations, yet most still operate at the pixel or patch level and lack explicit modeling of regions that are jointly visible across views. We propose SAMatcher, a feature matching framework that formulates correspondence estimation through co-visibility modeling. Instead of directly matching local features, SAMatcher first predicts co-visible region masks and bounding boxes as structured priors for correspondence estimation. Built upon the Segment Anything Model (SAM), it introduces a symmetric cross-view interaction mechanism that enables bidirectional feature exchange and cross-view semantic alignment. We further develop a unified supervision scheme that jointly optimizes mask prediction and box localization through mask learning, box regression, and mask-box consistency constraints. Extensive experiments on challenging benchmarks demonstrate substantial improvements over existing matching pipelines, particularly under large viewpoint and scale variations. Our results show that foundation models originally designed for monocular segmentation can be effectively extended to multi-view correspondence reasoning through explicit co-visibility modeling, offering a new perspective on structured representation learning for image matching. Code and project page: https://xupan.top/Projects/samatcher

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SAMatcher adds symmetric cross-view interaction on SAM to predict co-visible masks and boxes as priors for matching, but the gains may not be isolated to those priors.

The one thing to take away is that this work adds an explicit co-visibility modeling step on top of SAM to guide feature matching, using symmetric cross-view interaction to predict masks and boxes. It claims this helps more than direct matching, especially with viewpoint changes.

What the paper does is take the Segment Anything Model and extend it with bidirectional feature exchange between views, then supervise the mask prediction, box regression, and their consistency together. That unified scheme is a clean way to train the outputs. The experiments reportedly show gains on challenging benchmarks, which is the kind of result that matters for applications like SfM and localization.

The soft spot is the evidence for the priors themselves. The stress-test concern holds: there is no mention of an ablation that keeps the SAM backbone and cross-view interaction but removes the co-visibility prediction to see whether matching still improves. Without that control, we cannot be sure the structured regions are the effective part rather than the richer cross-view features. If the full paper includes such controls, the case would be considerably stronger.

This paper is for computer vision researchers working on robust correspondence estimation who want to incorporate region-level visibility into their pipelines. A reader interested in foundation model adaptations will get a concrete example of how to add multi-view reasoning.

I recommend sending it for peer review. The architecture is described clearly enough in the abstract to evaluate, and the idea has enough substance that referees should see it.

Referee Report

2 major / 2 minor

Summary. The paper proposes SAMatcher, a feature matching framework built on the Segment Anything Model (SAM) that predicts co-visible region masks and bounding boxes via a symmetric cross-view interaction mechanism. These predictions serve as structured priors for correspondence estimation rather than direct local feature matching. A unified supervision scheme jointly optimizes mask prediction, box regression, and mask-box consistency. The work claims substantial improvements over existing pipelines on benchmarks with large viewpoint and scale variations, arguing that monocular segmentation foundation models can be extended to multi-view correspondence through explicit co-visibility modeling.

Significance. If the co-visibility predictions can be shown to function as effective structured priors that improve matching beyond the SAM backbone and interaction module alone, the approach would provide a concrete demonstration of transferring monocular foundation models to multi-view geometric tasks. This could influence future work on structured representation learning for SfM, localization, and registration by offering an alternative to purely pixel- or patch-level matching.

major comments (2)

[Experiments] The central claim that predicted co-visible masks and boxes act as effective structured priors for correspondence estimation (as opposed to gains arising from the SAM features or symmetric interaction alone) is load-bearing but not isolated. No ablation is described that (a) disables the co-visibility branch while retaining the SAM backbone and cross-view interaction or (b) substitutes oracle co-visible regions; without this, benchmark gains cannot be attributed specifically to the co-visibility modeling.
[Method] The integration step that uses the predicted masks/boxes as priors in the final correspondence estimation is not detailed with sufficient specificity (e.g., how mask and box outputs modulate feature matching or are combined with local descriptors). This leaves the precise mechanism by which co-visibility modeling improves matching unclear.

minor comments (2)

[Abstract] The abstract states 'substantial improvements' but does not report concrete metrics (e.g., AUC@5° or matching score deltas on specific datasets); adding these numbers would strengthen the summary of results.
[Method] Notation for the symmetric interaction and the three supervision terms (mask learning, box regression, mask-box consistency) should be introduced with explicit equations or pseudocode to improve reproducibility.
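As an illustration of the kind of explicit notation requested here, the three supervision terms named in the abstract could be combined as a weighted sum. The additive form and the weights $\lambda_m$, $\lambda_b$, $\lambda_c$ are assumptions made for this sketch; the source does not state how the terms are balanced.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{m}\,\mathcal{L}_{\text{mask}}
  + \lambda_{b}\,\mathcal{L}_{\text{box}}
  + \lambda_{c}\,\mathcal{L}_{\text{cons}}
```

Here $\mathcal{L}_{\text{mask}}$ stands for the mask learning term, $\mathcal{L}_{\text{box}}$ for box regression, and $\mathcal{L}_{\text{cons}}$ for the mask-box consistency constraint.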

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on isolating the contribution of co-visibility modeling and clarifying the integration mechanism. We address each major comment below and will revise the manuscript to strengthen these aspects.

Referee: [Experiments] The central claim that predicted co-visible masks and boxes act as effective structured priors for correspondence estimation (as opposed to gains arising from the SAM features or symmetric interaction alone) is load-bearing but not isolated. No ablation is described that (a) disables the co-visibility branch while retaining the SAM backbone and cross-view interaction or (b) substitutes oracle co-visible regions; without this, benchmark gains cannot be attributed specifically to the co-visibility modeling.

Authors: We agree that an explicit ablation isolating the co-visibility branch is needed to attribute gains specifically to the structured priors rather than the SAM backbone or interaction module alone. The current experiments compare the full model against external baselines but do not include an internal ablation that removes the mask/box heads while retaining the backbone and cross-view interaction. We will add this ablation (and, if space permits, an oracle co-visible region experiment) in the revised manuscript to directly address the concern.

revision: yes

Referee: [Method] The integration step that uses the predicted masks/boxes as priors in the final correspondence estimation is not detailed with sufficient specificity (e.g., how mask and box outputs modulate feature matching or are combined with local descriptors). This leaves the precise mechanism by which co-visibility modeling improves matching unclear.

Authors: We acknowledge that the description of how the predicted masks and boxes are integrated into correspondence estimation could be more precise. Section 3.3 states that the outputs serve as structured priors by restricting matching to co-visible regions and using boxes for region-level guidance, but the exact modulation of the similarity matrix or combination with local descriptors is only sketched. In the revision we will expand this section with additional equations, a pseudocode listing, and a figure that explicitly shows the masking operation on the feature correlation volume and the box-guided descriptor aggregation step.

revision: yes
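A minimal sketch of what restricting matching to co-visible regions could mean at the similarity-matrix level, assuming sparse keypoints with a per-keypoint co-visibility flag and mutual nearest-neighbour matching. None of this is confirmed by the source; `masked_mutual_nn` and the toy descriptors are hypothetical.

```python
import numpy as np

def masked_mutual_nn(desc_a, desc_b, covis_a, covis_b):
    """Mutual nearest-neighbour matching restricted to co-visible keypoints.
    desc_*: (N, D) L2-normalized descriptors; covis_*: (N,) boolean flags."""
    sim = desc_a @ desc_b.T
    allowed = covis_a[:, None] & covis_b[None, :]
    sim = np.where(allowed, sim, -np.inf)   # suppress pairs outside the priors
    best_b = sim.argmax(axis=1)             # best partner in B for each A
    best_a = sim.argmax(axis=0)             # best partner in A for each B
    idx_a = np.arange(sim.shape[0])
    mutual = best_a[best_b] == idx_a
    valid = mutual & np.isfinite(sim[idx_a, best_b])
    return np.stack([idx_a[valid], best_b[valid]], axis=1)

# Synthetic descriptors: view B holds the same three descriptors in a different order.
desc_a = np.eye(3)
desc_b = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
covis_a = np.array([True, True, False])     # third keypoint of A lies outside the mask
covis_b = np.array([True, True, True])

matches = masked_mutual_nn(desc_a, desc_b, covis_a, covis_b)
```

Without the flags the third keypoint pair would also match; with them it is excluded, which is the behaviour a region prior is meant to produce.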

Circularity Check

0 steps flagged

No significant circularity; empirical extension of SAM with independent experimental validation

The paper presents an architectural extension of the external Segment Anything Model (SAM) via added symmetric cross-view interaction, mask/box prediction heads, and a joint supervision scheme. No equations, predictions, or uniqueness claims reduce by construction to fitted inputs or self-citations. Performance claims rest on benchmark experiments rather than tautological derivations. The weakest assumption noted above concerns empirical effectiveness (addressable by ablation) and does not constitute circularity under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

Reference graph

Works this paper leans on

58 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Structure-from-motion revisited,

J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113

2016
[2]

Orb-slam: A versatile and accurate monocular slam system,

R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “Orb-slam: A versatile and accurate monocular slam system,”IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015

2015
[3]

Direct sparse odometry,

J. Engel, V . Koltun, and D. Cremers, “Direct sparse odometry,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2017

2017
[4]

A comparison and evaluation of multi-view stereo reconstruction al- gorithms,

S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction al- gorithms,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE, 2006, pp. 519–528

2006
[5]

Building rome in a day,

S. Agarwal, Y . Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski, “Building rome in a day,”Communications of the ACM, vol. 54, no. 10, pp. 105–112, 2011

2011
[6]

Reconstructing the world* in six days*(as captured by the yahoo 100 million image dataset),

J. Heinly, J. L. Schonberger, E. Dunn, and J.-M. Frahm, “Reconstructing the world* in six days*(as captured by the yahoo 100 million image dataset),” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015, pp. 3287–3295

2015
[7]

Distinctive image features from scale-invariant keypoints,

D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004

2004
[8]

Surf: Speeded up robust features,

H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,” inProceedings of the European Conference on Computer Vision. Springer, 2006, pp. 404–417

2006
[9]

Orb: An efficient alternative to sift or surf,

E. Rublee, V . Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” inProceedings of the IEEE/CVF International Conference on Computer Vision. Ieee, 2011, pp. 2564–2571

2011
[10]

Hartley and A

R. Hartley and A. Zisserman,Multiple view geometry in computer vision. Cambridge university press, 2003

2003
[11]

Comparative evaluation of binary features,

J. Heinly, E. Dunn, and J.-M. Frahm, “Comparative evaluation of binary features,” inProceedings of the European Conference on Computer Vision. Springer, 2012, pp. 759–773

2012
[12]

Co- matcher: Multi-view collaborative feature matching,

J. Zhang, Z. Xia, M. Dong, S. Shen, L. Yue, and X. Zheng, “Co- matcher: Multi-view collaborative feature matching,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, pp. 21 970–21 980

2025
[13]

Superpoint: Self- supervised interest point detection and description,

D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self- supervised interest point detection and description,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 224–236

2018
[14]

Superglue: Learning feature matching with graph neural networks,

P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, 2020, pp. 4938–4947

2020
[15]

Loftr: Detector- free local feature matching with transformers,

J. Sun, Z. Shen, Y . Wang, H. Bao, and X. Zhou, “Loftr: Detector- free local feature matching with transformers,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8922–8931

2021
[16]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026

2023
[17]

Sa-vla: Spatially-aware flow-matching for vision-language- action reinforcement learning,

X. Pan, Z. Wan, X. Yu, X. Zheng, Y . Ke, M. Sun, R. Wang, Z. Wang, and I. Tsang, “Sa-vla: Spatially-aware flow-matching for vision-language- action reinforcement learning,”arXiv preprint arXiv:2602.00743, 2026

work page arXiv 2026
[18]

Lift: Learned invariant feature transform,

K. M. Yi, E. Trulls, V . Lepetit, and P. Fua, “Lift: Learned invariant feature transform,” inProceedings of the European Conference on Computer Vision. Springer, 2016, pp. 467–483

2016
[19]

Match- former: Interleaving attention in transformers for feature matching,

Q. Wang, J. Zhang, K. Yang, K. Peng, and R. Stiefelhagen, “Match- former: Interleaving attention in transformers for feature matching,” in Proceedings of the Asian Conference on Computer Vision, 2022, pp. 2746–2762

2022
[20]

Lightglue: Local feature matching at light speed,

P. Lindenberger, P.-E. Sarlin, and M. Pollefeys, “Lightglue: Local feature matching at light speed,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17 627–17 638

2023
[21]

Local feature matching using deep learning: A survey,

S. Xu, S. Chen, R. Xu, C. Wang, P. Lu, and L. Guo, “Local feature matching using deep learning: A survey,”Information Fusion, vol. 107, p. 102344, 2024

2024
[22]

Fully convolutional networks for semantic segmentation,

J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440

2015
[23]

Mask r-cnn,

K. He, G. Gkioxari, P. Doll ´ar, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017, pp. 2961–2969

2017
[24]

Per-pixel classification is not all you need for semantic segmentation,

B. Cheng, A. Schwing, and A. Kirillov, “Per-pixel classification is not all you need for semantic segmentation,”Advances in Neural Information Processing Systems, vol. 34, pp. 17 864–17 875, 2021

2021
[25]

Segformer: Simple and efficient design for semantic segmentation with transformers,

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,”Advances in Neural Information Processing Systems, vol. 34, pp. 12 077–12 090, 2021

2021
[26]

Masked-attention mask transformer for universal image segmentation,

B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmentation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290–1299

2022
[27]

Panoptic segformer: Delving deeper into panoptic segmen- tation with transformers,

Z. Li, W. Wang, E. Xie, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, and T. Lu, “Panoptic segformer: Delving deeper into panoptic segmen- tation with transformers,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1280–1289

2022
[28]

Gsva: Generalized segmentation via multimodal large language models,

Z. Xia, D. Han, Y . Han, X. Pan, S. Song, and G. Huang, “Gsva: Generalized segmentation via multimodal large language models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 3858–3869

2024
[29]

Image matching across wide baselines: From paper to practice,

Y . Jin, D. Mishkin, A. Mishchuk, J. Matas, P. Fua, K. M. Yi, and E. Trulls, “Image matching across wide baselines: From paper to practice,”International Journal of Computer Vision, vol. 129, no. 2, pp. 517–547, 2021

2021
[30]

Eto: Ef- ficient transformer-based local feature matching by organizing multiple homography hypotheses,

J. Ni, G. Zhang, G. Li, Y . Li, X. Liu, Z. Huang, and H. Bao, “Eto: Ef- ficient transformer-based local feature matching by organizing multiple homography hypotheses,”Advances in Neural Information Processing Systems, vol. 37, pp. 60 260–60 274, 2024

2024
[31]

Cotr: Correspondence transformer for matching across images,

W. Jiang, E. Trulls, J. Hosang, A. Tagliasacchi, and K. M. Yi, “Cotr: Correspondence transformer for matching across images,” inProceed- ings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 6207–6217

2021
[32]

Back to the feature: Learning robust camera localization from pixels to pose,

P.-E. Sarlin, A. Unagar, M. Larsson, H. Germain, C. Toft, V . Larsson, M. Pollefeys, V . Lepetit, L. Hammarstrand, F. Kahlet al., “Back to the feature: Learning robust camera localization from pixels to pose,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3247–3257

2021
[33]

Guide local feature matching by overlap estimation,

Y . Chen, D. Huang, S. Xu, J. Liu, and Y . Liu, “Guide local feature matching by overlap estimation,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 1, 2022, pp. 365–373

2022
[34]

Is-mvsnet: Importance sampling-based mvsnet,

L. Wang, Y . Gong, X. Ma, Q. Wang, K. Zhou, and L. Chen, “Is-mvsnet: Importance sampling-based mvsnet,” inProceedings of the European Conference on Computer Vision. Springer, 2022, pp. 668–683

2022
[35]

Learning intra- view and cross-view geometric knowledge for stereo matching,

R. Gong, W. Liu, Z. Gu, X. Yang, and J. Cheng, “Learning intra- view and cross-view geometric knowledge for stereo matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 752–20 762

2024
[36]

Telling left from right: Identifying geometry-aware semantic correspondence,

J. Zhang, C. Herrmann, J. Hur, E. Chen, V . Jampani, D. Sun, and M.- H. Yang, “Telling left from right: Identifying geometry-aware semantic correspondence,” inProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, 2024, pp. 3076–3085

2024
[37]

Joint semantic segmentation using representations of lidar point clouds and camera images,

Y . Wu, J. Liu, M. Gong, Q. Miao, W. Ma, and C. Xu, “Joint semantic segmentation using representations of lidar point clouds and camera images,”Information Fusion, vol. 108, p. 102370, 2024. 14

2024
[38]

Mvg-net: Lidar point cloud semantic segmentation network integrating multi-view images,

Y . Liu, Y . Liu, and Y . Duan, “Mvg-net: Lidar point cloud semantic segmentation network integrating multi-view images,”Remote Sensing, vol. 16, no. 15, p. 2821, 2024

2024
[39]

Segment anything in high quality,

L. Ke, M. Ye, M. Danelljan, Y .-W. Tai, C.-K. Tang, F. Yuet al., “Segment anything in high quality,”Advances in Neural Information Processing Systems, vol. 36, pp. 29 914–29 934, 2023

2023
[40]

SAM 2: Segment Anything in Images and Videos

N. Ravi, V . Gabeur, Y .-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. R¨adle, C. Rolland, L. Gustafsonet al., “Sam 2: Segment anything in images and videos,”arXiv preprint arXiv:2408.00714, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[41]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4195–4205

2023
[42]

Imagebind: One embedding space to bind them all,

R. Girdhar, A. El-Nouby, Z. Liu, M. Singh, K. V . Alwala, A. Joulin, and I. Misra, “Imagebind: One embedding space to bind them all,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 15 180–15 190

2023
[43]

Swin transformer v2: Scaling up capacity and resolution,

Z. Liu, H. Hu, Y . Lin, Z. Yao, Z. Xie, Y . Wei, J. Ning, Y . Cao, Z. Zhang, L. Donget al., “Swin transformer v2: Scaling up capacity and resolution,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12 009–12 019

2022
[44]

Flashattention: Fast and memory-efficient exact attention with io-awareness,

T. Dao, D. Fu, S. Ermon, A. Rudra, and C. R ´e, “Flashattention: Fast and memory-efficient exact attention with io-awareness,”Advances in Neural Information Processing Systems, vol. 35, pp. 16 344–16 359, 2022

2022
[45]

Pointrend: Image segmen- tation as rendering,

A. Kirillov, Y . Wu, K. He, and R. Girshick, “Pointrend: Image segmen- tation as rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9799–9808

2020
[46]

Megadepth: Learning single-view depth predic- tion from internet photos,

Z. Li and N. Snavely, “Megadepth: Learning single-view depth predic- tion from internet photos,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018

2018
[47]

Structure-from-motion revisited,

J. L. Sch ¨onberger and J.-M. Frahm, “Structure-from-motion revisited,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016

2016
[48]

Pixelwise view selection for unstructured multi-view stereo,

J. L. Sch ¨onberger, E. Zheng, M. Pollefeys, and J.-M. Frahm, “Pixelwise view selection for unstructured multi-view stereo,” inProceedings of the European Conference on Computer Vision, 2016

2016
[49]

Scale-aware co-visible region detection for image matching,

X. Pan, Z. Xia, and X. Zheng, “Scale-aware co-visible region detection for image matching,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 229, pp. 122–137, 2025

2025
[50]

Scannet: Richly-annotated 3d reconstructions of indoor scenes,

A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017

2017
[51]

Matchable image retrieval by learning from surface reconstruction,

T. Shen, Z. Luo, L. Zhou, R. Zhang, S. Zhu, T. Fang, and L. Quan, “Matchable image retrieval by learning from surface reconstruction,” in Proceedings of the Asian Conference on Computer Vision, 2018

2018
[52]

Geodesc: Learning local descriptors by integrating geometry constraints,

Z. Luo, T. Shen, L. Zhou, S. Zhu, R. Zhang, Y . Yao, T. Fang, and L. Quan, “Geodesc: Learning local descriptors by integrating geometry constraints,” inProceedings of the European Conference on Computer Vision, 2018

2018
[53]

Object recognition from local scale-invariant features,

D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, vol. 2. Ieee, 1999, pp. 1150–1157

1999
[54] M. Tyszkiewicz, P. Fua, and E. Trulls, “Disk: Learning local features with policy gradient,” Advances in Neural Information Processing Systems, vol. 33, pp. 14 254–14 265, 2020
[55] M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-net: A trainable CNN for joint description and detection of local features,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8092–8101
[56] Z. Luo, T. Shen, L. Zhou, J. Zhang, Y. Yao, S. Li, T. Fang, and L. Quan, “Contextdesc: Local descriptor augmentation with cross-modality context,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2527–2536
[57] J. Revaud, C. De Souza, M. Humenberger, and P. Weinzaepfel, “R2d2: Reliable and repeatable detector and descriptor,” Advances in Neural Information Processing Systems, vol. 32, 2019
[58] G. Gutin, A. Yeo, and A. Zverovich, “Traveling salesman should not be greedy: domination analysis of greedy-type heuristics for the tsp,” Discrete Applied Mathematics, vol. 117, no. 1-3, pp. 81–86, 2002
