Active Diffusion Matching: Score-based Iterative Alignment of Cross-Modal Retinal Images
Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3
The pith
Two interdependent diffusion models jointly estimate global and local alignments between standard and ultra-widefield fundus images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ADM integrates two interdependent score-based diffusion models to jointly estimate global transformations and local deformations via an iterative Langevin Markov chain, with custom sampling strategies to adapt to input pairs, producing higher alignment accuracy than prior methods on both private SFI-UWFI pairs and public SFI-SFI pairs.
What carries the argument
The Active Diffusion Matching procedure, which couples two score-based diffusion models through an iterative Langevin Markov chain to perform stochastic progressive search for optimal global and local alignment parameters.
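To make the coupling idea concrete, here is a minimal toy sketch, not the paper's implementation. It alternates Langevin updates on two scalar variables, where `g` stands in for a global parameter and `d` for a local one, and each score function sees the other variable's current value. The target density, the names `coupled_langevin`, `score_g`, `score_d`, and the constants `Y`, `S`, `TAU` are all assumptions made for illustration; in ADM the scores would come from learned networks.

```python
import math
import random


def coupled_langevin(score_g, score_d, g0, d0, step, n_iters, seed=0):
    """Alternate Langevin updates on two coupled scalar variables.

    Each update follows x <- x + step * score + sqrt(2 * step) * z,
    and each score function receives the other variable's latest value.
    """
    rng = random.Random(seed)
    g, d = g0, d0
    trajectory = []
    for _ in range(n_iters):
        g = g + step * score_g(g, d) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        d = d + step * score_d(g, d) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        trajectory.append((g, d))
    return trajectory


# Hypothetical toy target: an observed offset Y is explained by g + d,
# with a Gaussian penalty keeping the local term d small.
Y, S, TAU = 2.0, 0.5, 0.5


def score_g(g, d):
    # Gradient of the toy log-density with respect to g.
    return (Y - g - d) / S**2


def score_d(g, d):
    # Gradient of the toy log-density with respect to d.
    return (Y - g - d) / S**2 - d / TAU**2


traj = coupled_langevin(score_g, score_d, g0=0.0, d0=0.0, step=0.01, n_iters=40000)
tail = traj[5000:]  # discard early iterations as burn-in
mean_g = sum(g for g, _ in tail) / len(tail)
mean_d = sum(d for _, d in tail) / len(tail)
print(f"mean global = {mean_g:.2f}, mean local = {mean_d:.2f}")
```

In this toy the joint density is Gaussian with mean at `g = Y`, `d = 0`, so the tail averages should settle near those values. Whether the real coupled chain behaves as well is exactly what the load-bearing premise below asks.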
If this is right
- Joint optimization of global transformation and local deformation becomes feasible for cross-modal retinal pairs where viewing ranges differ sharply.
- Alignment accuracy improves enough to support downstream tasks such as integrated diagnostic review of standard and widefield images.
- The stochastic iterative search reduces the need for hand-crafted initialization or separate coarse-to-fine pipelines.
- Custom sampling within the diffusion process increases robustness to the amorphous texture of retinal data.
Where Pith is reading between the lines
- The same coupled-diffusion structure could be tested on other cross-modal medical registration problems such as MRI-CT brain alignment.
- If the iterative chain scales well, it might reduce reliance on supervised landmark detectors for retinal registration.
- The approach suggests a general template for using score-based models to handle both rigid and non-rigid components in one optimization loop.
Load-bearing premise
For any image pair, the two diffusion models converge reliably to good global and local alignments, without getting stuck in poor solutions or depending heavily on the initial guess.
What would settle it
On a held-out set of SFI-UWFI pairs, ADM produces lower mAUC than the previous best method or shows no improvement over simple affine registration.
Original abstract
Objective: The study aims to address the challenge of aligning Standard Fundus Images (SFIs) and Ultra-Widefield Fundus Images (UWFIs), which is difficult due to their substantial differences in viewing range and the amorphous appearance of the retina. Currently, no specialized method exists for this task, and existing image alignment techniques lack accuracy.
Methods: We propose Active Diffusion Matching (ADM), a novel cross-modal alignment method. ADM integrates two interdependent score-based diffusion models to jointly estimate global transformations and local deformations via an iterative Langevin Markov chain. This approach facilitates a stochastic, progressive search for optimal alignment. Additionally, custom sampling strategies are introduced to enhance the adaptability of ADM to given input image pairs.
Results: Comparative experimental evaluations demonstrate that ADM achieves state-of-the-art alignment accuracy. This was validated on two datasets: a private dataset of SFI-UWFI pairs and a public dataset of SFI-SFI pairs, with mAUC improvements of 5.2 and 0.4 points on the private and public datasets, respectively, compared to existing state-of-the-art methods.
Conclusion: ADM effectively bridges the gap in aligning SFIs and UWFIs, providing an innovative solution to a previously unaddressed challenge. The method's ability to jointly optimize global and local alignment makes it highly effective for cross-modal image alignment tasks.
Significance: ADM has the potential to transform the integrated analysis of SFIs and UWFIs, enabling better clinical utility and supporting learning-based image enhancements. This advancement could significantly improve diagnostic accuracy and patient outcomes in ophthalmology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Active Diffusion Matching (ADM), a novel method for aligning Standard Fundus Images (SFIs) with Ultra-Widefield Fundus Images (UWFIs) that integrates two interdependent score-based diffusion models to jointly estimate global transformations and local deformations. These are optimized via an iterative Langevin Markov chain with custom sampling strategies to perform stochastic progressive search. The central claim is that ADM achieves state-of-the-art alignment accuracy, with reported mAUC gains of 5.2 points on a private SFI-UWFI dataset and 0.4 points on a public SFI-SFI dataset relative to existing methods.
Significance. If the performance claims hold under rigorous validation, ADM would address a genuine gap in cross-modal retinal image registration where large field-of-view differences and amorphous retinal structure make standard techniques unreliable. The joint global-local optimization via coupled diffusion models represents a technically interesting extension of score-based generative approaches to alignment tasks, with potential downstream benefits for clinical analysis and learning-based enhancement in ophthalmology.
major comments (4)
- [Results] Results: The reported mAUC improvements (5.2 and 0.4 points) are presented without error bars, standard deviations across runs, or statistical significance tests. This directly weakens the SOTA claim, as it is impossible to determine whether the gains are robust or could arise from variance in the stochastic Langevin process.
- [Methods] Methods: No ablation studies are provided on the custom sampling strategies or the free parameters of the iterative Langevin Markov chain (chain length, step size). Given that the method relies on these interdependent components for convergence, the absence of such controls leaves open whether the gains reflect the core architecture or favorable hyperparameter tuning on the private data.
- [Methods] Methods: The private dataset is described only at a high level with no information on collection protocol, patient demographics, acquisition parameters, or the train/validation/test split. This is load-bearing for the 5.2-point gain claim, as it prevents assessment of selection bias, data leakage, or generalization.
- [Methods] Methods: The paper asserts that the coupled diffusion models and stochastic progressive search reliably reach optimal alignments despite large FOV differences, yet provides no analysis of convergence behavior, sensitivity to initialization, or failure modes such as mode collapse. This is central to the weakest assumption identified in the skeptic note.
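The first major comment asks for run-to-run variability and a significance test. The following is a rough sketch of such a check, using only the Python standard library. The per-seed scores are made-up placeholders, not results from the paper, and the exact sign-flip permutation test is a simple stand-in for whichever paired test is eventually chosen. The function names `summarize` and `paired_sign_flip_pvalue` are hypothetical.

```python
import itertools
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_sign_flip_pvalue(a, b):
    """Two-sided exact sign-flip permutation test on paired differences.

    Under the null hypothesis that both methods perform the same, each
    paired difference is equally likely to have either sign, so every sign
    pattern is enumerated and the extreme ones are counted.
    Cost is 2**n, so this is only practical for a handful of seeds.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = sum(
        abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
        for signs in patterns
    )
    return extreme / len(patterns)


# Placeholder mAUC-like scores for five seeds (hypothetical numbers).
method_scores = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline_scores = [0.65, 0.66, 0.67, 0.66, 0.64]

print("method   mean/std:", summarize(method_scores))
print("baseline mean/std:", summarize(baseline_scores))
print("p-value:", paired_sign_flip_pvalue(method_scores, baseline_scores))
```

With only five paired runs, the smallest two-sided p-value this test can produce is 2/32, which shows why the number of independent runs matters as much as the choice of test.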
minor comments (2)
- [Abstract] Abstract: The acronym mAUC is introduced without expansion; it should be defined on first use (e.g., mean area under the curve) for clarity.
- [Methods] The description of the two diffusion models as 'interdependent' is repeated but never formalized with an explicit coupling equation or loss term; adding this would improve reproducibility.
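On the first minor comment, one plausible reading of mAUC is sketched below. It assumes the convention common in retinal registration benchmarks: a success-rate curve over pixel-error thresholds, its normalized area, and the mean of that area over image categories. The paper's own definition is not given in the material reviewed here, so the threshold range, the grouping, and the function names `auc_success` and `mean_auc` are assumptions.

```python
def auc_success(errors, max_threshold=25):
    """Normalized area under the success-rate curve.

    For each integer threshold t in 1..max_threshold, the success rate is
    the fraction of image pairs whose registration error is at most t.
    The area is approximated by the mean of those rates.
    """
    rates = [
        sum(e <= t for e in errors) / len(errors)
        for t in range(1, max_threshold + 1)
    ]
    return sum(rates) / len(rates)


def mean_auc(errors_by_group, max_threshold=25):
    """Average the per-group AUC over all groups (for example, image categories)."""
    aucs = [auc_success(errs, max_threshold) for errs in errors_by_group.values()]
    return sum(aucs) / len(aucs)


# Hypothetical per-pair errors in pixels, grouped by category.
example = {"easy": [0.5, 1.2, 3.0], "hard": [4.0, 12.0, 40.0]}
print(mean_auc(example))
```

Under this reading, a pair whose error exceeds the largest threshold contributes nothing to the curve, so a few gross failures can move the score by more than the 0.4-point gain reported on the public dataset.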
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the work.
Point-by-point responses
- Referee: [Results] Results: The reported mAUC improvements (5.2 and 0.4 points) are presented without error bars, standard deviations across runs, or statistical significance tests. This directly weakens the SOTA claim, as it is impossible to determine whether the gains are robust or could arise from variance in the stochastic Langevin process.
Authors: We agree that the lack of error bars and statistical tests limits the ability to assess robustness given the stochastic sampling. In the revised manuscript, we will report means and standard deviations from multiple independent runs (with different random seeds) for both datasets. We will also add statistical significance tests (e.g., paired t-tests or Wilcoxon tests) comparing ADM against baselines, with results incorporated into the Results section and tables.
Revision: yes
- Referee: [Methods] Methods: No ablation studies are provided on the custom sampling strategies or the free parameters of the iterative Langevin Markov chain (chain length, step size). Given that the method relies on these interdependent components for convergence, the absence of such controls leaves open whether the gains reflect the core architecture or favorable hyperparameter tuning on the private data.
Authors: We concur that ablations are important to isolate the contributions of the sampling strategies and hyperparameters. We will add a new ablation subsection in the Experiments section of the revised manuscript, including targeted experiments on the custom sampling strategies and sweeps over chain length and step size, with quantitative results on alignment performance.
Revision: yes
- Referee: [Methods] Methods: The private dataset is described only at a high level with no information on collection protocol, patient demographics, acquisition parameters, or the train/validation/test split. This is load-bearing for the 5.2-point gain claim, as it prevents assessment of selection bias, data leakage, or generalization.
Authors: We recognize the need for greater transparency. Due to privacy regulations, we cannot release patient-level demographics or the dataset itself. However, we will expand the dataset description in the revised manuscript to include acquisition protocol details (imaging devices, resolutions, FOV specifications), number of patients and pairs, and explicit train/validation/test split ratios, while preserving anonymity. The public dataset results will be highlighted as supporting evidence of generalization.
Revision: partial
- Referee: [Methods] Methods: The paper asserts that the coupled diffusion models and stochastic progressive search reliably reach optimal alignments despite large FOV differences, yet provides no analysis of convergence behavior, sensitivity to initialization, or failure modes such as mode collapse. This is central to the weakest assumption identified in the skeptic note.
Authors: This concern is well-founded, as convergence analysis is essential for validating the iterative process. In the revision, we will add convergence plots (alignment error vs. iterations), sensitivity experiments to varied initializations, and a discussion of failure modes (including potential mode collapse under extreme FOV mismatches) with mitigation via the custom sampling. These will be placed in the Methods and Experiments sections.
Revision: yes
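The initialization-sensitivity experiment promised in the last response can be illustrated on a toy problem. The sketch below is not ADM. It runs a plain Langevin chain on an assumed one-dimensional mixture of two unit-variance Gaussians centred at -5 and +5, and checks which mode each chain ends in. The target and the names `bimodal_score`, `run_chain`, and `fraction_in_right_mode` are hypothetical.

```python
import math
import random


def bimodal_score(x, m=5.0):
    """Score (gradient of log-density) of an equal mixture of N(-m, 1) and N(+m, 1)."""
    return m * math.tanh(m * x) - x


def run_chain(x0, step=0.01, n_iters=2000, seed=0):
    """Run one Langevin chain from x0 and return the full path."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_iters):
        x = x + step * bimodal_score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def fraction_in_right_mode(x0, seeds=range(5)):
    """Fraction of chains started at x0 whose final state is positive."""
    finals = [run_chain(x0, seed=s)[-1] for s in seeds]
    return sum(f > 0 for f in finals) / len(finals)


for start in (-3.0, 3.0):
    print(start, fraction_in_right_mode(start))
```

When the modes are this well separated, a chain of this length is very unlikely to cross between them, so the final mode is fixed by the starting point. The `path` returned by `run_chain` is the kind of per-iteration trace a convergence plot would be built from.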
Circularity Check
No circularity in derivation chain
Full rationale
The paper proposes ADM as a novel method combining two interdependent score-based diffusion models with iterative Langevin sampling and custom strategies for cross-modal retinal image alignment. The SOTA claim rests on comparative mAUC results from external private and public datasets, not on any equation or parameter that reduces the reported accuracy to a fitted input or self-defined quantity by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked in a load-bearing way that collapses the central result to prior author work or input data. The derivation is self-contained against the experimental benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Langevin chain length and step size
- Custom sampling strategy parameters
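To show why step size counts as a free parameter, the following toy sweep compares the variance a discretized Langevin chain actually samples against the target variance. It assumes a standard normal target, for which the stationary variance of the unadjusted chain has the closed form 1 / (1 - step / 2) when 0 < step < 2. Nothing here is taken from the paper's settings, and the function names are hypothetical.

```python
import math
import random


def ula_stationary_variance(step):
    """Closed-form stationary variance of the unadjusted chain for a N(0, 1) target.

    Follows from v = (1 - step)**2 * v + 2 * step; valid for 0 < step < 2.
    """
    return 1.0 / (1.0 - step / 2.0)


def empirical_variance(step, n_iters=50000, burn_in=1000, seed=0):
    """Estimate the sampled variance by running the chain."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for i in range(n_iters + burn_in):
        # The score of N(0, 1) is -x.
        x = x + step * (-x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        if i >= burn_in:
            samples.append(x)
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


for step in (0.01, 0.1, 0.5):
    print(step, round(ula_stationary_variance(step), 4), round(empirical_variance(step), 4))
```

The target variance is 1, so the gap in the closed-form column is pure discretization bias: small steps are accurate but mix slowly, and large steps mix quickly but sample the wrong distribution. A sweep over chain length and step size would measure this trade-off on the real alignment task.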
axioms (1)
- domain assumption: Score functions of the two diffusion models can be jointly optimized to estimate both global and local transformations
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  ADM integrates two interdependent score-based diffusion models to jointly estimate global transformations and local deformations via an iterative Langevin Markov chain.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  The Langevin dynamics ... $x_{t+1} = x_t + \epsilon_t \nabla_x \log p(x_t) + \sqrt{2\epsilon_t}\, z_t$
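As a worked instance of the Langevin update quoted in the passage above, assume a one-dimensional Gaussian target $p(x) = \mathcal{N}(\mu, \sigma^2)$, a constant step size $\epsilon$, and standard normal noise $z_t$. This target is chosen purely for illustration and is not taken from the paper.

```latex
\begin{align*}
\nabla_x \log p(x) &= -\frac{x - \mu}{\sigma^2}, \\
x_{t+1} &= x_t - \frac{\epsilon}{\sigma^2}\,(x_t - \mu) + \sqrt{2\epsilon}\, z_t,
  \qquad z_t \sim \mathcal{N}(0, 1), \\
\mathbb{E}[x_{t+1}] - \mu &= \Bigl(1 - \frac{\epsilon}{\sigma^2}\Bigr)\bigl(\mathbb{E}[x_t] - \mu\bigr).
\end{align*}
```

The last line shows that the chain's mean contracts geometrically toward $\mu$ whenever $0 < \epsilon < 2\sigma^2$. Even in this simplest case, convergence depends on the step size relative to the scale of the target.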
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.