Establishing Robust Retinal Eye Tracking: A Weakly Supervised Algorithmic Framework

Alexander Fix; Bo Wen; Catherine A. Fromm; Dillon Lohr; Francesco La Rocca; Mohamed El-Haddad; Pushkar Anand; Ruobing Qian; Truong Nguyen; Yatong An

arXiv: 2605.09181 · v1 · submitted 2026-05-09 · cs.CV · cs.ET · eess.IV

Bo Wen, Dillon Lohr, Yatong An, Pushkar Anand, Alexander Fix, Ruobing Qian, Catherine A. Fromm, Yimin Ding, Truong Nguyen, Mohamed El-Haddad, Francesco La Rocca

Pith reviewed 2026-05-12 02:32 UTC · model grok-4.3

classification: cs.CV · cs.ET · eess.IV

keywords: retinal eye tracking, weakly supervised learning, gaze estimation, template matching, ophthalmic imaging, eye tracking robustness

The pith

A weakly supervised learning framework for retinal eye tracking reports a 95th-percentile gaze error below 0.45 degrees across six participants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Retinal image-based eye tracking offers higher precision than standard pupil and cornea methods used in AR/VR devices, yet existing algorithms depend on classical template matching that breaks down with feature changes and real imaging conditions. The paper presents a new learning-based approach trained under weak supervision to register and track retinal features more reliably. Early tests across six participants reach a 95th-percentile gaze error under 0.45 degrees. If the method holds, it opens practical use of retinal tracking for ophthalmic imaging and higher-accuracy gaze systems. The design avoids the need for dense manual annotations during training.

Core claim

The authors propose a novel weakly-supervised, learning-based framework for robust retinal eye tracking that improves upon classical template-matching registration by handling retinal feature variability and real-world imaging conditions, demonstrated through initial studies achieving a 95th-percentile gaze error below 0.45 degrees across six participants.

What carries the argument

The weakly-supervised learning-based framework that learns to register retinal images for eye position without full supervision or dense labels.

If this is right

Retinal eye tracking becomes reliable enough for routine use in ophthalmic imaging systems.
AR/VR devices can achieve higher gaze accuracy by switching to retinal methods.
Training eye trackers requires far less labeled data than fully supervised alternatives.
Tracking stability improves across variable retinal features and capture conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same weak-supervision strategy might transfer to other medical image registration tasks with limited annotations.
Integration with hardware sensors in consumer devices would test whether the accuracy persists outside controlled studies.
Larger-scale validation on diverse age groups and eye pathologies would clarify the method's practical limits.

Load-bearing premise

The accuracy measured in the small group of six participants will hold for larger populations and under the full range of real-world retinal imaging variations.

What would settle it

Applying the trained framework to a new cohort of participants or under previously untested lighting and eye conditions and measuring 95th-percentile gaze error above 0.45 degrees.
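The test described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the abstract does not say how gaze error is defined, so the sketch assumes a Euclidean error over (pitch, yaw) in degrees and a linear-interpolation percentile. The function names `percentile`, `gaze_errors_deg`, and `p95_within_threshold` are hypothetical.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def gaze_errors_deg(predicted, truth):
    """Per-sample error between (pitch, yaw) pairs, in degrees.

    Assumption: Euclidean distance in the (pitch, yaw) plane, which is a
    reasonable small-angle approximation but is not confirmed by the source.
    """
    return [math.hypot(p[0] - t[0], p[1] - t[1]) for p, t in zip(predicted, truth)]


def p95_within_threshold(predicted, truth, threshold_deg=0.45, q=95.0):
    """Return (percentile error, whether it is strictly below the threshold)."""
    p95 = percentile(gaze_errors_deg(predicted, truth), q)
    return p95, p95 < threshold_deg
```

Running `p95_within_threshold` on a held-out cohort and getting `False` would be the outcome that counts against the claim.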

Original abstract

Retinal image-based eye tracking is widely used in ophthalmic imaging and vision science, and is a promising path to deliver higher gaze accuracy than the pupil- and cornea-based approaches commonly used in modern AR/VR devices. Nevertheless, existing retinal tracking algorithms still primarily rely on classical template-matching registration, which can be insufficiently robust to retinal feature variability and real-world imaging conditions. In this work, we propose a novel weakly-supervised, learning-based framework for robust retinal eye tracking. Initial studies demonstrate high accuracy, achieving the 95th-percentile gaze error < 0.45 deg across a cohort of 6 participants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper proposes a weakly supervised retinal eye tracker with promising accuracy numbers, but evaluation on only 6 participants undercuts the robustness story.

The main thing to know is that this work replaces classical template matching with a weakly supervised learning framework for retinal eye tracking. It reports 95th percentile gaze error below 0.45 degrees on six participants, which sounds good for AR/VR and ophthalmic uses if it generalizes. The paper does a decent job identifying why template matching fails under variability and suggesting a data-driven fix. That's a reasonable direction, and the initial results show the potential for higher accuracy. Where it falls short is the experimental support. A sample of six people is not enough to claim robustness, especially without details on validation methods, diversity of the cohort, or direct comparisons to other approaches. The abstract leaves out error bars, data splits, and any discussion of how the weak supervision is set up, making it difficult to assess if this is a substantive improvement. Readers in computer vision applied to eye tracking or medical devices might find the framework description useful once they see the full methods. It could spark ideas for similar problems in registration tasks. Overall, this deserves to go to peer review. The topic is relevant and the approach is worth exploring further with more rigorous testing.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a novel weakly-supervised learning-based framework for retinal eye tracking intended to improve robustness over classical template-matching registration methods under retinal feature variability and real-world imaging conditions. It reports initial results achieving a 95th-percentile gaze error below 0.45 degrees on a cohort of 6 participants.

Significance. If the framework can be shown to generalize reliably, it would offer a useful advance for high-accuracy gaze estimation in ophthalmic imaging and AR/VR systems. The weakly-supervised design could lower annotation costs in medical imaging domains. At present, however, the extremely limited evaluation prevents any confident judgment of practical significance or robustness.

major comments (1)

Abstract: the central accuracy and robustness claims rest on quantitative results from only 6 participants with no reported details on validation protocol, participant diversity, data splits, baselines, error bars, or external test sets. This sample size is insufficient to support generalization to the real-world variability highlighted in the introduction as the failure mode of prior methods.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the abstract and evaluation details below, and will revise the manuscript to provide greater clarity while appropriately scoping our claims.

Referee: Abstract: the central accuracy and robustness claims rest on quantitative results from only 6 participants with no reported details on validation protocol, participant diversity, data splits, baselines, error bars, or external test sets. This sample size is insufficient to support generalization to the real-world variability highlighted in the introduction as the failure mode of prior methods.

Authors: We agree that the abstract requires additional detail on the evaluation. In the revision we will expand it to specify the validation protocol (participant-wise cross-validation on the 6-person cohort), participant characteristics, data splits, direct quantitative comparison against classical template-matching baselines, and error statistics with appropriate context. We will also ensure error bars or intervals appear in the results. However, we will revise the abstract and introduction language to present these as initial feasibility results on a small cohort rather than evidence of broad generalization or robustness to all real-world variability. Larger-scale validation with external test sets remains future work.

revision: partial
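The simulated rebuttal names participant-wise cross-validation without giving the fold scheme. One plausible reading, shown here only as a sketch, is leave-one-participant-out: every sample from one participant is held out per fold so that no subject appears in both training and test data. The helper name `leave_one_participant_out` and the ID format are assumptions for illustration.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant.

    participant_ids[i] is the participant that sample i belongs to.
    Assumption: one fold per participant; the source does not specify this.
    """
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test
```

With a six-person cohort this produces six folds, and per-fold error statistics would give the intervals the referee asks for.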

standing simulated objections not resolved

The evaluation remains limited to a 6-participant cohort, which we cannot expand in the current revision and which inherently restricts strong claims of generalization to real-world retinal feature variability.

Circularity Check

0 steps flagged

No circularity detected; empirical evaluation on small cohort with no self-referential derivation chain

The manuscript proposes a weakly-supervised learning framework for retinal eye tracking and supports its accuracy claims solely through empirical testing on a cohort of 6 participants. No mathematical derivations, equations, fitted parameters presented as predictions, or load-bearing self-citations appear in the abstract or described structure that would reduce any result to its own inputs by construction. The central contribution is an algorithmic framework whose performance is reported experimentally rather than derived in a closed loop, rendering the work self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or assumptions; ledger is empty by default.

pith-pipeline@v0.9.0 · 5437 in / 1000 out tokens · 29922 ms · 2026-05-12T02:32:43.922197+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We propose a novel weakly-supervised, learning-based framework for robust retinal eye tracking... joint image enhancement and keypoint description... canonical feature space registration

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Experiments... 95th-percentile gaze error < 0.45 deg across a cohort of 6 participants

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 1 internal anchor

[1]

Establishing Robust Retinal Eye Tracking: A Weakly Supervised Algorithmic Framework

INTRODUCTION Retinal image-based eye tracking has the potential to deliver substantially higher gaze accuracy than traditional pupil- or cornea-based approaches. This is because it measures gaze more directly, by observing where light falls on the retina—particularly relative to the fovea, which defines the center of vision. The core idea is that each gaz...

[2]

RELATED WORK 2.1. Retinal Image-based Eye Tracking. With the advent of scanning laser ophthalmoscopy (SLO) and adaptive optics SLO (AOSLO), strip-based cross-correlation has become the primary algorithmic paradigm for retinal eye tracking. In this technique, narrow image strips are cross-correlated against a reference retinal image to estimate eye mot...
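The strip-based technique in this excerpt can be illustrated with a brute-force search. The sketch below is a generic zero-mean normalized cross-correlation over every placement of a strip inside a reference image; it is not the implementation used by any cited system, which would typically rely on FFT-based correlation for speed. Images are plain nested lists, and `ncc` and `locate_strip` are hypothetical names.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def locate_strip(strip, reference):
    """Return (score, top, left) of the best placement of strip in reference."""
    h, w = len(strip), len(strip[0])
    ref_h, ref_w = len(reference), len(reference[0])
    flat_strip = [v for row in strip for v in row]
    best = (-2.0, 0, 0)  # NCC lies in [-1, 1], so -2 is below any real score
    for top in range(ref_h - h + 1):
        for left in range(ref_w - w + 1):
            patch = [reference[top + r][left + c] for r in range(h) for c in range(w)]
            score = ncc(flat_strip, patch)
            if score > best[0]:
                best = (score, top, left)
    return best
```

The offset between where a strip is found and where it would sit with no eye movement is the motion estimate for that strip.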

[3]

PROPOSED METHOD 3.1. Overview. The core principle of retinal eye tracking is that gaze direction can be inferred from how retinal features shift in the captured image as the eye rotates. Specifically, the gaze (pitch, yaw) is related to the translation of a source retinal image relative to a reference (foveal) retinal image acquired when the user looks...

[4]

and freeze the shared encoder and detector decoder. We then fine-tune the descriptor decoder using a triplet loss:

$L_{desc} = \sum_{i \in K} \max\left(0,\; m + \phi_{pos} - \tfrac{1}{2}\left(\phi_{neg\text{-}rand} + \phi_{neg\text{-}hard}\right)\right)$ (2)

We refer readers to [8] for details of this loss. Furthermore, we propose a keypoint-preserving and boosting loss: L_kp = max(0, h − [Σ_{i∈P} σ((D^i_enhanced − γ)/t) − stopgrad(Σ_{i∈P} σ(...
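Equation (2) can be evaluated directly once the three distance terms are known per keypoint. The sketch below assumes that φ_pos, φ_neg-rand, and φ_neg-hard are descriptor distances (smaller means more similar) to the positive match, a random negative, and the hardest negative; the excerpt does not define them, so that reading and the name `descriptor_triplet_loss` are assumptions.

```python
def descriptor_triplet_loss(phi_pos, phi_neg_rand, phi_neg_hard, margin):
    """Sum over keypoints of max(0, m + pos - 0.5 * (neg_rand + neg_hard)).

    Each argument except margin is a sequence with one value per keypoint in K.
    """
    return sum(
        max(0.0, margin + pos - 0.5 * (neg_rand + neg_hard))
        for pos, neg_rand, neg_hard in zip(phi_pos, phi_neg_rand, phi_neg_hard)
    )
```

A keypoint contributes nothing once its positive distance is smaller than the mean negative distance by at least the margin, so the loss concentrates on keypoints whose descriptors are still confusable.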

[5]

EXPERIMENT 4.1. Dataset. Experiments were conducted on both phantom-eye and real-eye images over a ±5° gaze range. For the phantom-eye experiments, we used a dataset collected with a custom retinal eye tracking system [10]. Ground truth gaze direction (pitch and yaw) was provided by the motorized goniometer stages holding the phantom eye. The dataset ...

[6]

CONCLUSION AND FUTURE WORK. In this paper, we propose a robust, accurate and practical algorithmic framework for retinal image-based eye tracking. The proposed approach includes multiple methodological contributions, including a task-specialized image registration model and a complementary feature space registration strategy designed to improve robus...

[7]

Ruixue Liu, Xiaolin Wang, Sujin Hoshi, and Yuhua Zhang, “Substrip-based registration and automatic montaging of adaptive optics retinal images,” Biomed. Opt. Express, vol. 15, no. 2, pp. 1311–1330, 2024

[8]

Phillip Bedggood and Andrew Metha, “De-warping of images and improved eye tracking for the scanning laser ophthalmoscope,” PLoS One, 2017

[9]

Scott Stevenson, Christy Sheehy, and Austin Roorda, “Binocular eye tracking with the tracking scanning laser ophthalmoscope,” Vision Res., vol. 118, pp. 98–104, 2016

[10]

Christy Sheehy, Pavan Tiruveedhula, Ramkumar Sabesan, and Austin Roorda, “Active eye-tracking for an adaptive optics scanning laser ophthalmoscope,” Biomed. Opt. Express, vol. 6, no. 7, pp. 2412–2423, 2015

[11]

Wang Yu, Xiaoye Wang, Zaiwang Gu, Weide Liu, Wee Siong Ng, Weimin Huang, and Jun Cheng, “Superjunction: Learning-based junction detection for retinal image registration,” in AAAI Conference on Artificial Intelligence, 2024, pp. 292–300

[12]

Yiqian Wang, Junkang Zhang, Melina Cavichini, Dirk Bartsch, William Freeman, Truong Nguyen, and Cheolhong An, “Robust content-adaptive global registration for multimodal retinal images using weakly supervised deep-learning framework,” IEEE Transactions on Image Processing, vol. 30, pp. 3167–3178, 2021

[13]

Junkang Zhang, Yiqian Wang, Ji Dai, Melina Cavichini, Dirk Bartsch, William Freeman, Truong Nguyen, and Cheolhong An, “Two-step registration on multi-modal retinal images via deep neural networks,” IEEE Transactions on Image Processing, vol. 31, pp. 823–838, 2022

[14]

Jiazhen Liu, Xirong Li, Qijie Wei, Jie Xu, and Dayong Ding, “Semi-supervised keypoint detector and descriptor for retinal image matching,” in 2022 European Conference on Computer Vision (ECCV), 2022, pp. 593–609

[15]

Junkang Zhang, Bo Wen, Fritz Gerald P. Kalaw, Melina Cavichini, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, and Cheolhong An, “Accurate registration between ultra-wide-field and narrow angle retina images with 3d eyeball shape optimization,” in 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 2750–2754

[16]

Francesco LaRocca, Michael Tilleman, Carmen Wang, Bartlomiej Kowalski, David Li, Youmin Wang, Qiang Yang, Alfredo Dubra, and Mohamed El-Haddad, “Gaze-matched, pupil-steered retinal imaging for arcmin precision eye tracking over a 50° gaze range at 200hz,” in Ophthalmic Technologies XXXV, 2025, p. 15

[17]

Christy Sheehy, Qiang Yang, David W. Arathorn, Pavan Tiruveedhula, Johannes F. de Boer, and Austin Roorda, “High-speed, image-based eye tracking with a scanning laser ophthalmoscope,” Biomed. Opt. Express, vol. 3, no. 10, pp. 2611–2622, 2012

[18]

Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R. Tamplin, Isabella M. Grumbach, Randy H. Kardon, Jui-Kai Wang, Yuyin Zhou, and Wei Shao, “Retinaregnet: A zero-shot approach for retinal image registration,” Computers in Biology and Medicine, vol. 186, pp. 109645, 2025

[19]

David G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1999, pp. 1150–1157

[20]

Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan, “Emergent correspondence from image diffusion,” in Thirty-seventh Conference on Neural Information Processing Systems, 2023

[21]

Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, and Runmin Cong, “Zero-reference deep curve estimation for low-light image enhancement,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1777–1786

[22]

Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2018, pp. 224–236

[23]

Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 4938–4947

[24]

Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski, “Orb: An efficient alternative to sift or surf,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2564–2571

[25]

Peter Burt and Edward Adelson, “A multiresolution spline with application to image mosaics,” ACM Transactions on Graphics, vol. 2, pp. 217–236, 1983

[26]

SUPPLEMENTARY MATERIALS 7.1. Runtime Analysis. The inference time of the proposed algorithm is presented in Table 4, which yields an approximate 14.5 FPS. The experiment is run on one NVIDIA RTX 3080 GPU, with test image size 253×207 and batch size of 1. The canonical feature space is constructed once per subject in approximately 3.8 seconds (on the s...
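The runtime figures in this excerpt are easier to compare once converted to per-frame latency. The short sketch below only does that arithmetic on the two quoted numbers (14.5 FPS and 3.8 seconds); the helper names are hypothetical and nothing here models the actual pipeline.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


def setup_cost_in_frames(setup_seconds, fps):
    """How many frames could have been processed during a one-off setup step."""
    return setup_seconds * fps


print(round(frame_time_ms(14.5), 1))              # 69.0 ms per frame
print(round(setup_cost_in_frames(3.8, 14.5), 1))  # 55.1 frames
```

So the reported rate corresponds to roughly 69 ms per frame, and the one-time canonical feature space construction costs about as much as 55 frames of inference.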
