RoMa: Robust Dense Feature Matching

Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, Michael Felsberg

classification cs.CV

keywords features, model, robust, feature, propose, roma, challenging, correspondences

Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene, and dense methods estimate all such correspondences. The aim is to learn a robust model, i.e., a model able to match under challenging real-world changes. In this work, we propose such a model, leveraging frozen pretrained features from the foundation model DINOv2. Although these features are significantly more robust than local features trained from scratch, they are inherently coarse. We therefore combine them with specialized ConvNet fine features, creating a precisely localizable feature pyramid. To further improve robustness, we propose a tailored transformer match decoder that predicts anchor probabilities, which enables it to express multimodality. Finally, we propose an improved loss formulation through regression-by-classification with subsequent robust regression. We conduct a comprehensive set of experiments that show that our method, RoMa, achieves significant gains, setting a new state-of-the-art. In particular, we achieve a 36% improvement on the extremely challenging WxBS benchmark. Code is provided at https://github.com/Parskatt/RoMa
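
To illustrate why predicting anchor probabilities lets a matcher express multimodality, here is a toy sketch of regression-by-classification for a single pixel. It is not taken from the RoMa repository: the anchor grid size, the function names (`anchor_centers`, `decode_mode`, `decode_mean`), and the logits are all hypothetical, and the subsequent robust-regression refinement mentioned in the abstract is omitted. The example assumes a match location is classified over a small grid of anchors in normalized coordinates, then decoded either by taking the most probable anchor or by averaging.

```python
import math

def anchor_centers(k):
    """Centers of a k x k anchor grid in normalized coordinates [-1, 1]."""
    step = 2.0 / k
    coords = [-1.0 + step * (i + 0.5) for i in range(k)]
    # Row-major order: index = row * k + col, each entry is (x, y).
    return [(x, y) for y in coords for x in coords]

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_mode(probs, centers):
    """Classification-style decoding: return the most probable anchor."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return centers[best]

def decode_mean(probs, centers):
    """Regression-style decoding: return the probability-weighted mean."""
    x = sum(p * c[0] for p, c in zip(probs, centers))
    y = sum(p * c[1] for p, c in zip(probs, centers))
    return (x, y)

# Hypothetical ambiguous pixel: two plausible matches in opposite corners
# (for example, a repeated structure), with the first slightly more likely.
K = 4
centers = anchor_centers(K)
logits = [-10.0] * (K * K)
logits[0] = 5.1       # anchor in one corner
logits[K * K - 1] = 5.0  # anchor in the opposite corner

probs = softmax(logits)
mode = decode_mode(probs, centers)   # lands on one of the two hypotheses
mean = decode_mean(probs, centers)   # lands between them, near the center

print("mode:", mode)
print("mean:", mean)
```

In this toy setting the averaged estimate falls near the image center, where neither hypothesis lies, while the anchor distribution keeps both modes and the decoded location commits to one of them. A fine-level regression step could then refine that coarse location.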

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Improving Local Feature Matching by Entropy-inspired Scale Adaptability and Flow-endowed Local Consistency
cs.CV 2026-04 unverdicted novelty 4.0

A semi-dense image matching pipeline adds scale adaptability via score-matrix hints at the coarse stage and local flow consistency via gradient loss at the fine stage.