Auto-regressive transformation for image alignment

Kanggeon Lee; Kyoung Mu Lee; Soochahn Lee

arxiv: 2505.04864 · v2 · submitted 2025-05-08 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-22 16:24 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: image alignment, auto-regressive transformation, cross-attention, multi-scale features, transformation estimation, iterative refinement, computer vision, feature-sparse alignment


The pith

Auto-regressive transformation iteratively refines image alignments by focusing cross-attention on critical regions at multiple scales.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing image alignment methods struggle with feature-sparse regions, extreme scale differences, and large deformations. This paper introduces Auto-Regressive Transformation to address these issues through an iterative pipeline that refines transformations from coarse to fine. The approach uses hierarchical multi-scale features and randomly samples points at each scale while cross-attention directs focus to important areas. This design aims to maintain accuracy even when traditional feature matching would fail. Experiments indicate stronger results on planar images and similar performance on 3D scenes compared with current techniques.

Core claim

The paper claims that an auto-regressive pipeline can estimate image-alignment transformations iteratively, from coarse to fine. At each scale, the pipeline refines the transform field parameters using randomly sampled points drawn from hierarchical multi-scale features, while a cross-attention layer directs attention to critical regions. The claimed result is accurate alignment under sparse features, large scale changes, and substantial deformations.

What carries the argument

The auto-regressive pipeline that refines the transform field iteratively at each scale using randomly sampled points and cross-attention guidance from multi-scale features.
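The loop described above can be illustrated with a toy, one-axis simulation. This is a minimal sketch under stated assumptions, not ART itself:

- A single global affine model x' = m·x + a stands in for the per-point transform field.
- The hypothetical function `observe` stands in for the learned network. At each scale it resolves the remaining misalignment of a randomly sampled point only to that scale's cell size, and only within a bounded search window.

Each step re-estimates the parameters from points warped by the previous step's estimate. That dependence is what makes the refinement auto-regressive.

```python
import random


def fit_affine_1d(src, dst):
    """Least-squares fit of dst ~ m * src + a along one axis."""
    n = len(src)
    mean_s, mean_d = sum(src) / n, sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    m = cov / var if var > 0 else 1.0
    return m, mean_d - m * mean_s


def observe(residual, cell, radius):
    """Hypothetical stand-in for the learned refinement network.

    The remaining displacement is resolved only to the current scale's
    resolution (rounded to a multiple of `cell`) and only inside a
    +/- radius * cell search window.
    """
    quantized = cell * round(residual / cell)
    limit = radius * cell
    return max(-limit, min(limit, quantized))


def refine(true_m, true_a, cells, n_points, radius=4, size=256.0, seed=0):
    """Coarse-to-fine, auto-regressive refinement of (m, a); one step per scale."""
    rng = random.Random(seed)
    m, a = 1.0, 0.0  # identity initialization
    history = []
    for cell in cells:  # cell sizes ordered from coarse to fine
        # Randomly sample points for this step.
        src = [rng.uniform(0.0, size) for _ in range(n_points)]
        # Warp them with the estimate produced by the previous step.
        warped = [m * p + a for p in src]
        # Observe the remaining misalignment at this scale's resolution.
        targets = [w + observe(true_m * p + true_a - w, cell, radius)
                   for p, w in zip(src, warped)]
        # Re-estimate the parameters, conditioned on the previous step.
        m, a = fit_affine_1d(src, targets)
        history.append((m, a))
    return history


coarse_to_fine = refine(1.2, 20.0, cells=[16, 8, 4, 2, 1], n_points=32)
fine_only = refine(1.2, 20.0, cells=[1], n_points=32)
print("coarse-to-fine:", coarse_to_fine[-1])
print("fine only:     ", fine_only[-1])
```

With true parameters (m, a) = (1.2, 20), the displacement reaches about 71 pixels. A single fine step can see at most 4 pixels of it and returns roughly (1.0, 4.0), so a fine-only schedule would need many more steps. The coarse-to-fine schedule recovers both parameters closely in five steps. The toy ignores features and attention entirely. It only illustrates why conditioning each step on the previous estimate lets a coarse step absorb large motion that a fine step cannot.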

If this is right

Outperforms existing methods on planar image alignment tasks
Achieves performance comparable to state-of-the-art on 3D scene images
Handles feature-sparse regions, extreme scale and field-of-view differences, and large deformations more reliably
Provides a versatile pipeline for precise alignment across varied imaging conditions

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The sampling-plus-attention strategy could be adapted to other dense correspondence tasks such as optical flow estimation in low-texture video.
If the iterative refinement proves stable, the method might reduce reliance on hand-crafted feature detectors in downstream applications like panoramic stitching.
Extending the pipeline to include temporal auto-regression could support alignment across video sequences without separate tracking modules.

Load-bearing premise

Randomly sampling points at each scale combined with cross-attention guidance is sufficient to accurately refine the transform field even in feature-sparse regions.

What would settle it

A dataset of images containing large feature-sparse areas where cross-attention maps consistently miss the true correspondence regions and produce alignment errors exceeding those of current state-of-the-art methods.

Figures

Figures reproduced from arXiv: 2505.04864 by Kanggeon Lee, Kyoung Mu Lee, Soochahn Lee.

**Figure 1.** Alignment Results in Challenging Scenarios. For image pairs with sparse features, scale differences, deformations, degradations, and domain shifts, our method performs coarse-to-fine auto-regressive transformation refinement, achieving accurate alignment even in challenging scenarios where state-of-the-art methods struggle. The zoomed-in boxes show the local alignment results, and the highlighted vessel…

**Figure 2.** Method Overview. Auto-Regressive Transformation (ART) iteratively refines the transformation D for image pairs I in a coarse-to-fine manner. Its sampling strategy enables effective operation across diverse domains and datasets. …ance cues from the entire image pair as conditioning signals, ART achieves robustness to initialization. Extensive evaluations demonstrate that ART significantly outperforms existi…

**Figure 3.** Overall Framework. ART first extracts multi-scale features F_s and F_d from the input image pair I_s and I_d. At each sampling step k, the corresponding features, F_s^k and F_d^k, are passed through the Cross-Attention Layer (CAL) to identify the correlated features that guide the network's focus on regions requiring refinement. The attentive feature map F̃_{s→d}^k is then used to refine the transform field para…
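Figure 3 names a Cross-Attention Layer (CAL), but this page does not specify its exact form. The sketch below shows generic single-head scaled dot-product cross-attention. It assumes identity projections and no positional encoding. Source features act as queries over destination features, and the resulting attention map is the quantity that would concentrate the network's focus on correlated regions. The names `cross_attention` and `softmax` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(f_src, f_dst):
    """Single-head scaled dot-product cross-attention.

    Queries come from source features; keys and values come from destination
    features. Learned query/key/value projections are omitted (identity),
    which is an assumption of this sketch.
    Returns the attended features and the attention map (one row per query).
    """
    dim = len(f_src[0])
    scale = 1.0 / math.sqrt(dim)
    attended, attn_map = [], []
    for q in f_src:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in f_dst]
        weights = softmax(scores)
        attn_map.append(weights)
        attended.append([sum(w * v[c] for w, v in zip(weights, f_dst))
                         for c in range(dim)])
    return attended, attn_map


# Toy features: each source feature strongly resembles one destination feature.
f_src = [[10.0, 0.0], [0.0, 10.0]]
f_dst = [[0.0, 10.0], [10.0, 0.0], [0.0, 0.0]]
attended, attn_map = cross_attention(f_src, f_dst)
for row in attn_map:
    print([round(w, 3) for w in row])
```

In a learned layer, the queries, keys, and values would each pass through trained linear projections, typically across several heads. Dropping those projections here keeps the matching mechanism itself visible.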

**Figure 4.** Point-based Image Warping. At sampling step k, the extracted source point set P_s^k is warped to P̃_{s→d}^k by sequentially multiplying with the corresponding values of the transform field parameter D_M^k and adding D_A^k for each point. These point pairs are then used to compute the warped image Ĩ_{s→d}. …where k is the current and K is the maximum transform field parameters sampling step. Each output's spatia…
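Read literally, the Figure 4 caption describes a per-point affine update, P̃_{s→d}^k = P_s^k ⊙ D_M^k + D_A^k. The sketch below makes two assumptions the caption does not spell out:

- D_M^k and D_A^k hold one two-component value per sampled point.
- The multiplication is element-wise.

Computing the warped image Ĩ_{s→d} from the resulting point pairs (for example, by fitting a transform and resampling) is not shown.

```python
def warp_points(points, d_mult, d_add):
    """Warp 2D source points with per-point multiplicative and additive parameters.

    Each point (x, y) is multiplied element-wise by its (m_x, m_y) entry of D_M
    and then shifted by its (a_x, a_y) entry of D_A.
    """
    if not (len(points) == len(d_mult) == len(d_add)):
        raise ValueError("need one D_M entry and one D_A entry per point")
    return [(x * mx + ax, y * my + ay)
            for (x, y), (mx, my), (ax, ay) in zip(points, d_mult, d_add)]


source = [(1.0, 2.0), (3.0, 4.0)]
d_m = [(2.0, 2.0), (1.0, 1.0)]   # hypothetical multiplicative parameters
d_a = [(0.0, 1.0), (5.0, -1.0)]  # hypothetical additive parameters
print(warp_points(source, d_m, d_a))  # [(2.0, 5.0), (8.0, 3.0)]
```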

**Figure 5.** Qualitative Evaluation on Retinal Datasets. Across various domains, ART robustly identifies sufficient matches compared to SuperRetina [4], GeoFormer [1], and RetinaRegNet [26]. Correct and incorrect matches are shown in green and red, respectively. The zoomed-in boxes highlight overlaid local regions.

**Figure 7.** Qualitative Evaluation on Scene-LR Datasets. On the GoogleEarth [16], GoogleMap [16], and MSCOCO [61] datasets, ART successfully finds the correct transformation between input image pairs, even with sparse features from low resolution, domain gaps, and scale differences.

**Figure 8.** Ablation Study on Sampling. ART performance varies with (a) the number of sampling steps and (b) different initialization strategies, across HR (left) and LR (right) datasets. Section 4.5, Understanding ART: Here, we present ablation studies to gain a deeper understanding of the key components that constitute ART. Sampling Efficiency: The aforementioned number of iteration steps, 6 for HR images and 4 for LR image…

Original abstract

Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations, often resulting in suboptimal accuracy. Robustness to these challenges can be improved through iterative refinement of the transform field while focusing on critical regions in multi-scale image representations. We thus propose Auto-Regressive Transformation (ART), a novel method that iteratively estimates the coarse-to-fine transformations through an auto-regressive pipeline. Leveraging hierarchical multi-scale features, our network refines the transform field parameters using randomly sampled points at each scale. By incorporating guidance from the cross-attention layer, the model focuses on critical regions, ensuring accurate alignment even in challenging, feature-limited conditions. Extensive experiments demonstrate that ART significantly outperforms state-of-the-art methods on planar images and achieves comparable performance on 3D scene images, establishing it as a powerful and versatile solution for precise image alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes Auto-Regressive Transformation (ART), a novel image alignment method that iteratively estimates coarse-to-fine transformations via an auto-regressive pipeline. It extracts hierarchical multi-scale features, refines transform parameters from randomly sampled points at each scale, and uses cross-attention to focus on critical regions, with the goal of improving robustness to feature-sparse areas, extreme scale/FOV differences, and large deformations. Experiments are claimed to show significant outperformance versus state-of-the-art on planar images and comparable results on 3D scenes.

Significance. If the performance claims are substantiated, the auto-regressive coarse-to-fine refinement with cross-attention guidance offers a plausible route to better handling of challenging alignment cases that defeat current methods. The approach is internally consistent and does not reduce to prior fitted parameters; the novelty lies in the pipeline design rather than circular reuse of earlier results.

major comments (1)

[Method] Method section (pipeline description): the central robustness claim for feature-sparse regions rests on the combination of random point sampling at each scale plus cross-attention guidance. No ablation isolating the sampling strategy or error stratified by local feature density is described, leaving open whether the sampled set remains informative when repeatable features are absent; this directly threatens the headline outperformance result on planar images.

minor comments (2)

[Abstract] Abstract: the statement that 'extensive experiments demonstrate' superiority would be strengthened by naming the primary datasets, key metrics (e.g., mean endpoint error or success rate), and at least one quantitative delta versus the strongest baseline.
[Experiments] Experiments: absence of detailed quantitative tables, error analysis, or ablation studies in the visible text makes the superiority claim only moderately verifiable; adding these would allow readers to assess effect sizes and failure modes.
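The sketch below illustrates the kind of figures the referee asks for. It computes a mean endpoint error over control points and a success rate at a pixel threshold. Both definitions are common conventions, not ones confirmed for this paper. The 3-pixel threshold is an arbitrary choice for the example.

```python
import math


def mean_endpoint_error(pred_points, gt_points):
    """Mean Euclidean distance between predicted and ground-truth point positions."""
    if len(pred_points) != len(gt_points) or not pred_points:
        raise ValueError("need equally sized, non-empty point lists")
    return sum(math.dist(p, g) for p, g in zip(pred_points, gt_points)) / len(pred_points)


def success_rate(pair_errors, threshold):
    """Fraction of image pairs whose alignment error falls below `threshold` pixels."""
    return sum(err < threshold for err in pair_errors) / len(pair_errors)


# One image pair: two control points, one exact and one off by a 3-4-5 triangle.
epe = mean_endpoint_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
print(epe)  # 2.5
# Four image pairs with hypothetical per-pair errors, scored at a 3-pixel threshold.
print(success_rate([1.0, epe, 5.0, 10.0], threshold=3.0))  # 0.5
```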

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses

Referee: [Method] Method section (pipeline description): the central robustness claim for feature-sparse regions rests on the combination of random point sampling at each scale plus cross-attention guidance. No ablation isolating the sampling strategy or error stratified by local feature density is described, leaving open whether the sampled set remains informative when repeatable features are absent; this directly threatens the headline outperformance result on planar images.

Authors: We acknowledge the value of an explicit ablation isolating the random sampling strategy and an error analysis stratified by local feature density. The design rationale is that random sampling at each scale deliberately avoids dependence on repeatable keypoints, allowing the network to draw from any image locations while cross-attention modulates focus toward regions that contribute most to alignment. In the revised manuscript we will add an ablation that replaces random sampling with feature-based point selection (e.g., using SIFT or SuperPoint) and report both overall alignment error and error binned by local feature density computed via keypoint counts in image patches. These results will be placed in the experiments section to directly support the robustness claim on planar images. revision: yes
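The density-stratified analysis promised in the response could be computed along the lines below. This is a minimal sketch under several assumptions:

- Keypoints have already been detected, for example with SIFT or SuperPoint as the response suggests.
- Per-point alignment errors are available.
- The patch size and bin edges are hypothetical choices.

`error_by_feature_density` is an illustrative name, not part of any existing library.

```python
from collections import Counter


def patch_of(x, y, patch_size):
    """Index of the square patch that contains pixel (x, y)."""
    return int(x // patch_size), int(y // patch_size)


def error_by_feature_density(eval_points, errors, keypoints, patch_size, bin_edges):
    """Mean alignment error per local-feature-density bin.

    Density is the number of detected keypoints in the patch containing each
    evaluation point. Bin i collects counts c with bin_edges[i] <= c < bin_edges[i + 1];
    the last bin is open-ended. bin_edges must start at 0 and be increasing.
    Returns None for empty bins.
    """
    counts = Counter(patch_of(x, y, patch_size) for x, y in keypoints)
    sums = [0.0] * len(bin_edges)
    sizes = [0] * len(bin_edges)
    for (x, y), err in zip(eval_points, errors):
        c = counts[patch_of(x, y, patch_size)]  # Counter returns 0 for unseen patches
        b = max(i for i, lo in enumerate(bin_edges) if c >= lo)
        sums[b] += err
        sizes[b] += 1
    return [s / n if n else None for s, n in zip(sums, sizes)]


# Hypothetical data: five keypoints crowd the first 10x10 patch, none fall in the second.
keypoints = [(1, 1), (2, 5), (4, 4), (7, 2), (8, 8)]
eval_points = [(3, 3), (15, 4), (5, 6)]
errors = [1.0, 4.0, 2.0]
print(error_by_feature_density(eval_points, errors, keypoints, 10, [0, 1, 4]))
# [4.0, None, 1.5]
```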

Circularity Check

0 steps flagged

No significant circularity in ART derivation or claims

Full rationale

The paper introduces a new neural architecture (ART) that performs iterative coarse-to-fine transform refinement via an auto-regressive pipeline on hierarchical features, random point sampling per scale, and cross-attention guidance. All load-bearing elements are presented as design choices in a novel network, with performance claims resting on experimental comparisons rather than any mathematical reduction, fitted-parameter renaming, or self-citation chain. No equations or steps in the abstract or described method reduce to inputs by construction; the derivation is self-contained as an empirical architecture proposal.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method relies on standard supervised training of a neural network for regression of transformation parameters; design choices such as number of scales and sampling density function as hyperparameters.

free parameters (2)

number of hierarchical scales
Choice of multi-scale levels is a design decision that affects the coarse-to-fine refinement schedule.
number of randomly sampled points per scale
The count of points used to estimate transform parameters at each level is a tunable hyperparameter.
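To make the two free parameters concrete, the sketch below draws a fixed number of random points on each level of a feature pyramid. It assumes a dyadic pyramid (each finer level doubles the resolution) and uniform sampling with replacement. This page specifies neither the pyramid layout nor the sampling distribution, and `sample_points_per_scale` is a hypothetical helper.

```python
import random


def sample_points_per_scale(height, width, num_scales, points_per_scale, seed=None):
    """Randomly sample feature-grid coordinates at each scale, coarsest first.

    Each finer scale doubles the grid resolution; the finest grid matches the
    input size. Points are drawn uniformly with replacement, and each entry
    also reports the sampled points mapped back to input-pixel coordinates.
    """
    rng = random.Random(seed)
    schedule = []
    for level in range(num_scales):
        stride = 2 ** (num_scales - 1 - level)
        grid_h, grid_w = max(1, height // stride), max(1, width // stride)
        points = [(rng.randrange(grid_w), rng.randrange(grid_h))
                  for _ in range(points_per_scale)]
        schedule.append({
            "stride": stride,
            "grid": (grid_h, grid_w),
            "points": points,
            "pixels": [(x * stride, y * stride) for x, y in points],
        })
    return schedule


for entry in sample_points_per_scale(256, 128, num_scales=4, points_per_scale=3, seed=0):
    print(entry["stride"], entry["grid"], entry["points"])
```

In this sketch, raising `points_per_scale` buys a denser estimate at each level for more compute. Raising `num_scales` lengthens the coarse-to-fine schedule.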

axioms (1)

Domain assumption: A neural network trained on image pairs can learn to predict accurate transformation parameters from multi-scale features and attention signals.
The entire pipeline presupposes that end-to-end learning from data will produce reliable iterative refinements.

pith-pipeline@v0.9.0 · 5676 in / 1398 out tokens · 108213 ms · 2026-05-22T16:24:23.895596+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

ART employs an auto-regressive approach, iteratively sampling and refining local transform parameters by joint estimation for a set of points in a coarse-to-fine manner guided by multi-scale representations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages

[1] Jiazhen Liu and Xirong Li. Geometrized transformer for self-supervised homography estimation. In ICCV, 2023.
[2] Carlos Hernandez-Matas, Xenophon Zabulis, and Antonis A Argyros. Rempe: Registration of retinal images through eye modelling and pose estimation. IEEE Journal of Biomedical and Health Informatics, 24, 2020.
[3] Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. In CVPR, 2021.
[4] Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu, and Dayong Ding. Semi-supervised keypoint detector and descriptor for retinal image matching. In ECCV, 2022.
[5] Yu Wang, Xiaoye Wang, Zaiwang Gu, Weide Liu, Wee Siong Ng, Weimin Huang, and Jun Cheng. Superjunction: Learning-based junction detection for retinal image registration. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1):292–300, Mar. 2024.
[6] Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. Voxelmorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging, 38, 2019.
[7] Boah Kim, Dong Hwan Kim, Seong Ho Park, Jieun Kim, June-Goo Lee, and Jong Chul Ye. Cyclemorph: cycle consistent unsupervised deformable image registration. Medical Image Analysis, 71, 2021.
[8] Boah Kim, Inhwa Han, and Jong Chul Ye. Diffusemorph: unsupervised deformable image registration using diffusion model. In ECCV, 2022.
[9] Junyu Chen, Eric C. Frey, Yufan He, William P. Segars, Ye Li, and Yong Du. Transmorph: Transformer for unsupervised medical image registration. Medical Image Analysis, 82:102615, November 2022.
[10] Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, and Jinman Kim. Non-iterative Coarse-to-Fine Transformer Networks for Joint Affine and Deformable Image Registration, pages 750–760. Springer Nature Switzerland, 2023.
[11] Morteza Ghahremani, Mohammad Khateri, Bailiang Jian, Benedikt Wiestler, Ehsan Adeli, and Christian Wachinger. H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11513–11523, Los Alamitos, CA, USA, June 2024. IEEE Computer Society.
[12] P.J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1992.
[13] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. Active shape models-their training and application. Computer Vision and Image Understanding, 61, 1995.
[14] Matthew C.H. Lee, Ozan Oktay, Andreas Schuh, Michiel Schaap, and Ben Glocker. Image-and-spatial transformer networks for structure-guided image registration. In MICCAI,
[15] Bob D De Vos, Floris F Berendsen, Max A Viergever, Hessam Sokooti, Marius Staring, and Ivana Išgum. A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis, 52, 2019.
[16] Yiming Zhao, Xinming Huang, and Ziming Zhang. Deep lucas-kanade homography for multimodal image alignment. In CVPR,
[17] Haokai Zhu, Si-Yuan Cao, Jianxin Hu, Sitong Zuo, Beinan Yu, Jiacheng Ying, Junwei Li, and Hui-Liang Shen. Mcnet: Rethinking the core ingredients for accurate and efficient homography estimation. In CVPR, 2024.
[18] Mingyuan Meng, Dagan Feng, Lei Bi, and Jinman Kim. Correlation-aware coarse-to-fine mlps for deformable medical image registration, 2024.
[19] Xiaoran Zhang, John C. Stendahl, Lawrence Staib, Albert J. Sinusas, Alex Wong, and James S. Duncan. Adaptive correspondence scoring for unsupervised medical image registration,
[20] Tai Ma, Suwei Zhang, Jiafeng Li, and Ying Wen. Iirp-net: Iterative inference residual pyramid network for enhanced image registration. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11546–11555,
[21] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In CVPRW, 2018.
[22] Prune Truong, Stefanos Apostolopoulos, Agata Mosinska, Samuel Stucky, Carlos Ciller, and Sandro De Zanet. Glampoints: Greedily learned accurate match points. In ICCV, 2019.
[23] Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Ncnet: Neighbourhood consensus networks for estimating image correspondences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 2020.
[24] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In CVPR, 2020.
[25] Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer. In ECCV, 2022.
[26] Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, and Wei Shao. Retinaregnet: A zero-shot approach for retinal image registration, 2024.
[27] Si-Yuan Cao, Jianxin Hu, Zehua Sheng, and Hui-Liang Shen. Iterative deep homography estimation. In CVPR, 2022.
[28] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 2004.
[29] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In ECCV, 2006.
[30] Edward Rosten, Reid Porter, and Tom Drummond. Faster and better: A machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32,
[31] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. Brief: Binary robust independent elementary features. In ECCV, 2010.
[32] Jerome Revaud, Cesar De Souza, Martin Humenberger, and Philippe Weinzaepfel. R2d2: Repeatable and reliable detector and descriptor. arXiv preprint, 2019.
[33] Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV,
[34] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Deep image homography estimation. arXiv preprint, 2016.
[35] Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, and Dacheng Tao. Gmflow: Learning optical flow via global matching. In CVPR, 2022.
[36] Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, and Hongsheng Li. Flowformer: A transformer architecture for optical flow. In ECCV, 2022.
[37] Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. RoMa: Robust Dense Feature Matching. IEEE Conference on Computer Vision and Pattern Recognition,
[38] Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[39] Paul Viola and William M. Wells III. Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137–154, 1997.
[40] Barbara Zitová and Jan Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000, 2003.
[41] Frederik Maes, André Collignon, Dirk Vandermeulen, Guy Marchal, and Paul Suetens. Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging, 16(2):187–198, 1997.
[42] Jean-Philippe Thirion. Image matching as a diffusion process: an analogy with maxwell's demons. Medical Image Analysis, 2(3):243–260, 1998.
[43] Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. arXiv preprint,
[44] Feihu Zhang, Oliver J Woodford, Victor Adrian Prisacariu, and Philip HS Torr. Separable flow: Learning motion cost volumes for optical flow estimation. In ICCV, 2021.
[45] Xiaohuan Cao, Jianhua Yang, Jun Zhang, Dong Nie, Minjeong Kim, Qian Wang, and Dinggang Shen. Deformable image registration based on similarity-steered cnn regression. In MICCAI,
[46] Yipeng Hu, Marc Modat, Eli Gibson, Wenqi Li, Nooshin Ghavami, Ester Bonmati, Guotai Wang, Steven Bandula, Caroline M Moore, Mark Emberton, et al. Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis, 49, 2018.
[47] Zhenlin Xu and Marc Niethammer. Deepatlas: Joint semi-supervised learning of image registration and segmentation. In MICCAI, 2019.
[48] Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, and Jinman Kim. Non-iterative Coarse-to-Fine Transformer Networks for Joint Affine and Deformable Image Registration, pages 750–760. Springer Nature Switzerland, 2023.
[49] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[50] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), pages 674–679, 1981.
[51] Jianyuan Wang, Christian Rupprecht, and David Novotny. Posediffusion: Solving pose estimation via diffusion-aided bundle adjustment. In ICCV, 2023.
[52] Jason Y Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, and Shubham Tulsiani. Cameras as rays: Pose estimation via ray diffusion. In ICLR, 2024.
[53] Yechao Bai, Ziyuan Huang, Lyuyu Shen, Hongliang Guo, Marcelo H. Ang Jr, and Daniela Rus. Multi-scale feature aggregation by cross-scale pixel-to-region relation operation for semantic segmentation. IEEE Robotics and Automation Letters, 6(3):5889–5896, July 2021.
[54] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge, 2003.
[55] Brachmann et al. Dsac* - differentiable ransac for camera localization. In CVPR, 2019.
[56] Carlos Hernandez-Matas, Xenophon Zabulis, Areti Triantafyllou, Panagiota Anyfanti, Stella Douma, and Antonis A Argyros. Fire: Fundus image registration dataset. Journal for Modeling in Ophthalmology, 1, 2017.
[57] Li Ding, Tony Kang, Ajay Kuriyan, Rajeev Ramchandran, Charles Wykoff, and Gaurav Sharma. Flori21: Fluorescein angiography longitudinal retinal image registration dataset, 2021.
[58] Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches: A benchmark and evaluation of hand-crafted and learned local descriptors, 2017.
[59] Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Computer Vision and Pattern Recognition (CVPR), 2018.
[60] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[61] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2015.
[62] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019.
[63] Charles Stewart, Chia-Ling Tsai, and Badrinath Roysam. The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Transactions on Medical Imaging, 22, 2003.
[64] Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, and Rainer Stiefelhagen. Matchformer: Interleaving attention in transformers for feature matching. In Asian Conference on Computer Vision, 2022.
[65] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
[66] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
[67] Si-Yuan Cao, Runmin Zhang, Lun Luo, Beinan Yu, Zehua Sheng, Junwei Li, and Hui-Liang Shen. Recurrent homography estimation using homography-guided image warping and focus transformer. In CVPR, 2023.

[1] Jiazhen Liu and Xirong Li. Geometrized transformer for self-supervised homography estimation. In ICCV, 2023.

[2] Carlos Hernandez-Matas, Xenophon Zabulis, and Antonis A Argyros. Rempe: Registration of retinal images through eye modelling and pose estimation. IEEE Journal of Biomedical and Health Informatics, 24, 2020.

[3] Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. Loftr: Detector-free local feature matching with transformers. In CVPR, 2021.

[4] Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu, and Dayong Ding. Semi-supervised keypoint detector and descriptor for retinal image matching. In ECCV, 2022.

[5] Yu Wang, Xiaoye Wang, Zaiwang Gu, Weide Liu, Wee Siong Ng, Weimin Huang, and Jun Cheng. Superjunction: Learning-based junction detection for retinal image registration. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1):292–300, Mar. 2024.

[6] Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. Voxelmorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging, 38, 2019.

[7] Boah Kim, Dong Hwan Kim, Seong Ho Park, Jieun Kim, June-Goo Lee, and Jong Chul Ye. Cyclemorph: cycle consistent unsupervised deformable image registration. Medical Image Analysis, 71, 2021.

[8] Boah Kim, Inhwa Han, and Jong Chul Ye. Diffusemorph: unsupervised deformable image registration using diffusion model. In ECCV, 2022.

[9] Junyu Chen, Eric C. Frey, Yufan He, William P. Segars, Ye Li, and Yong Du. Transmorph: Transformer for unsupervised medical image registration. Medical Image Analysis, 82:102615, November 2022.

[10] Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, and Jinman Kim. Non-iterative Coarse-to-Fine Transformer Networks for Joint Affine and Deformable Image Registration, pages 750–760. Springer Nature Switzerland, 2023.

[11] Morteza Ghahremani, Mohammad Khateri, Bailiang Jian, Benedikt Wiestler, Ehsan Adeli, and Christian Wachinger. H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11513–11523, Los Alamitos, CA, USA, June 2024. IEEE Computer Society.

[12] P.J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 1992.

[13] Timothy F Cootes, Christopher J Taylor, David H Cooper, and Jim Graham. Active shape models-their training and application. Computer Vision and Image Understanding, 61, 1995.

[14] Matthew C.H. Lee, Ozan Oktay, Andreas Schuh, Michiel Schaap, and Ben Glocker. Image-and-spatial transformer networks for structure-guided image registration. In MICCAI,

[15] Bob D De Vos, Floris F Berendsen, Max A Viergever, Hessam Sokooti, Marius Staring, and Ivana Išgum. A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis, 52, 2019.

[16] Yiming Zhao, Xinming Huang, and Ziming Zhang. Deep lucas-kanade homography for multimodal image alignment. In CVPR,

[17] Haokai Zhu, Si-Yuan Cao, Jianxin Hu, Sitong Zuo, Beinan Yu, Jiacheng Ying, Junwei Li, and Hui-Liang Shen. Mcnet: Rethinking the core ingredients for accurate and efficient homography estimation. In CVPR, 2024.

[18] Mingyuan Meng, Dagan Feng, Lei Bi, and Jinman Kim. Correlation-aware coarse-to-fine mlps for deformable medical image registration, 2024.

[19] Xiaoran Zhang, John C. Stendahl, Lawrence Staib, Albert J. Sinusas, Alex Wong, and James S. Duncan. Adaptive correspondence scoring for unsupervised medical image registration,

[20] Tai Ma, Suwei Zhang, Jiafeng Li, and Ying Wen. Iirp-net: Iterative inference residual pyramid network for enhanced image registration. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11546–11555,

[21] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In CVPRW, 2018.

[22] Prune Truong, Stefanos Apostolopoulos, Agata Mosinska, Samuel Stucky, Carlos Ciller, and Sandro De Zanet. Glampoints: Greedily learned accurate match points. In ICCV, 2019.

[23] Ignacio Rocco, Mircea Cimpoi, Relja Arandjelović, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Ncnet: Neighbourhood consensus networks for estimating image correspondences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 2020.

[24] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In CVPR, 2020.

[25] Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer. In ECCV, 2022.

[26] Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, and Wei Shao. Retinaregnet: A zero-shot approach for retinal image registration, 2024.

[27] Si-Yuan Cao, Jianxin Hu, Zehua Sheng, and Hui-Liang Shen. Iterative deep homography estimation. In CVPR, 2022.

[28] David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 2004.

[29] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In ECCV, 2006.

[30] Edward Rosten, Reid Porter, and Tom Drummond. Faster and better: A machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32,

[31] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. Brief: Binary robust independent elementary features. In ECCV, 2010.

[32] Jerome Revaud, Cesar De Souza, Martin Humenberger, and Philippe Weinzaepfel. R2d2: Repeatable and reliable detector and descriptor. arXiv preprint, 2019.

[33] Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. LightGlue: Local Feature Matching at Light Speed. In ICCV,

[34] Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Deep image homography estimation. arXiv preprint, 2016.

[35] Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, and Dacheng Tao. Gmflow: Learning optical flow via global matching. In CVPR, 2022.

[36] Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, and Hongsheng Li. Flowformer: A transformer architecture for optical flow. In ECCV, 2022.

[37] Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. RoMa: Robust Dense Feature Matching. IEEE Conference on Computer Vision and Pattern Recognition,

[38] Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[39] Paul Viola and William M. Wells III. Alignment by maximization of mutual information. International Journal of Computer Vision, 24(2):137–154, 1997.

[40] Barbara Zitová and Jan Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000, 2003.

[41] Frederik Maes, André Collignon, Dirk Vandermeulen, Guy Marchal, and Paul Suetens. Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging, 16(2):187–198, 1997.

[42] Jean-Philippe Thirion. Image matching as a diffusion process: an analogy with maxwell’s demons. Medical Image Analysis, 2(3):243–260, 1998.

[43] Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. arXiv preprint,

[44] Feihu Zhang, Oliver J Woodford, Victor Adrian Prisacariu, and Philip HS Torr. Separable flow: Learning motion cost volumes for optical flow estimation. In ICCV, 2021.

[45] Xiaohuan Cao, Jianhua Yang, Jun Zhang, Dong Nie, Minjeong Kim, Qian Wang, and Dinggang Shen. Deformable image registration based on similarity-steered cnn regression. In MICCAI,

[46] Yipeng Hu, Marc Modat, Eli Gibson, Wenqi Li, Nooshin Ghavami, Ester Bonmati, Guotai Wang, Steven Bandula, Caroline M Moore, Mark Emberton, et al. Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis, 49, 2018.

[47] Zhenlin Xu and Marc Niethammer. Deepatlas: Joint semi-supervised learning of image registration and segmentation. In MICCAI, 2019.

[48] Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, and Jinman Kim. Non-iterative Coarse-to-Fine Transformer Networks for Joint Affine and Deformable Image Registration, pages 750–760. Springer Nature Switzerland, 2023.

[49] Martin A Fischler and Robert C Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[50] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), pages 674–679, 1981.

[51] Jianyuan Wang, Christian Rupprecht, and David Novotny. Posediffusion: Solving pose estimation via diffusion-aided bundle adjustment. In ICCV, 2023.

[52] Jason Y Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, and Shubham Tulsiani. Cameras as rays: Pose estimation via ray diffusion. In ICLR, 2024.

[53] Yechao Bai, Ziyuan Huang, Lyuyu Shen, Hongliang Guo, Marcelo H. Ang Jr, and Daniela Rus. Multi-scale feature aggregation by cross-scale pixel-to-region relation operation for semantic segmentation. IEEE Robotics and Automation Letters, 6(3):5889–5896, July 2021.

[54] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge, 2003.

[55] Brachmann et al. Dsac* - differentiable ransac for camera localization. In CVPR, 2019.

[56] Carlos Hernandez-Matas, Xenophon Zabulis, Areti Triantafyllou, Panagiota Anyfanti, Stella Douma, and Antonis A Argyros. Fire: Fundus image registration dataset. Journal for Modeling in Ophthalmology, 1, 2017.

[57] Li Ding, Tony Kang, Ajay Kuriyan, Rajeev Ramchandran, Charles Wykoff, and Gaurav Sharma. Flori21: Fluorescein angiography longitudinal retinal image registration dataset, 2021.

[58] Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches: A benchmark and evaluation of hand-crafted and learned local descriptors, 2017.

[59] Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Computer Vision and Pattern Recognition (CVPR), 2018.

[60] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.

[61] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2015.

[62] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019.

[63] Charles Stewart, Chia-Ling Tsai, and Badrinath Roysam. The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Transactions on Medical Imaging, 22, 2003.

[64] Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, and Rainer Stiefelhagen. Matchformer: Interleaving attention in transformers for feature matching. In Asian Conference on Computer Vision, 2022.

[65] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.

[66] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.

[67] Si-Yuan Cao, Runmin Zhang, Lun Luo, Beinan Yu, Zehua Sheng, Junwei Li, and Hui-Liang Shen. Recurrent homography estimation using homography-guided image warping and focus transformer. In CVPR, 2023.