EFIQA: Explainable Fundus Image Quality Assessment via Anatomical Priors

Hrvoje Bogunović; José Morano; Pengwei Wang; Qian Wan

arxiv: 2606.20108 · v1 · pith:3UNCI3AUnew · submitted 2026-06-18 · 💻 cs.CV · cs.LG

Pith reviewed 2026-06-26 17:47 UTC · model grok-4.3

keywords fundus image quality assessment · explainable quality maps · unsupervised anomaly detection · anatomical priors · masked inpainting · label-free learning · spatial quality feedback

The pith

Fundus image quality can be assessed without labels by learning expected vasculature through unsupervised inpainting and mapping the differences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that quality assessment for fundus photographs does not need supervised training on human quality labels. Instead it learns what anatomical structures should be present by using masked inpainting to detect regions where vasculature is missing, then transfers that knowledge to produce spatial quality maps. A sympathetic reader would care because the approach is meant to generalize across datasets that use different quality criteria and to supply built-in explanations of where quality fails. The method freezes a foundation model and adds only a shallow adapter, keeping adaptation minimal.

Core claim

EFIQA first trains an unsupervised anomaly detector that reconstructs masked fundus images to reveal regions of missing vasculature, then distills this anatomical prior into a shallow adapter that maps features from a frozen foundation model onto precise spatial quality maps. Because no quality-related labels are used at any stage, the resulting maps and scalar scores are claimed to generalize better than supervised classifiers when tested on external datasets that employ different quality criteria, while also supplying explicit spatial feedback on degradation locations.
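To make the first stage concrete, here is a minimal sketch of masked inpainting used as an anomaly detector. It is an illustration under loud assumptions, not the paper's network: the neighbour-mean `inpaint_cell` stands in for the trained reconstruction model, and the grid of per-patch vessel densities, `missing_vessel_map`, and all values are hypothetical.

```python
# Toy illustration only: a neighbour-mean "inpainter" stands in for the
# trained reconstruction network, which the text above does not specify.

def inpaint_cell(density, r, c):
    """Predict the vessel density of cell (r, c) from its visible 4-neighbours."""
    rows, cols = len(density), len(density[0])
    neighbours = [
        density[rr][cc]
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= rr < rows and 0 <= cc < cols
    ]
    return sum(neighbours) / len(neighbours)


def missing_vessel_map(density):
    """Mask each cell in turn, inpaint it, and keep only the positive residual.

    A large value means the context predicts vessels that are not observed.
    """
    return [
        [max(0.0, inpaint_cell(density, r, c) - density[r][c])
         for c in range(len(density[0]))]
        for r in range(len(density))
    ]


# Hypothetical 3x3 grid of per-patch vessel density; the centre patch is blank,
# as if an artefact hid the vessels there.
density = [
    [0.5, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.5],
]
anomaly = missing_vessel_map(density)
for row in anomaly:
    print(row)
```

Only the centre patch receives a non-zero anomaly value here, because it is the only place where the surrounding context predicts more vasculature than is observed. Clipping at zero is a design choice of this sketch: surplus vessels are not treated as a quality problem.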

What carries the argument

Two-stage pipeline of unsupervised masked anatomical inpainting for anomaly detection followed by distillation into a shallow adapter on a frozen foundation model to generate quality maps.
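The distillation step can be sketched in a few lines. This is a toy under stated assumptions: `frozen_features` is a hypothetical stand-in for the frozen foundation model, the adapter is a single linear layer, the loss is mean squared error, and the teacher scores are invented so that the example has something to fit. None of these choices is confirmed by the source.

```python
# Sketch under assumptions: the frozen "foundation model" is a fixed function,
# the adapter is one linear layer, and the loss is mean squared error.

def frozen_features(patch):
    """Stand-in for a frozen backbone: fixed, never updated during training."""
    mean = sum(patch) / len(patch)
    spread = max(patch) - min(patch)
    return [mean, spread, 1.0]  # the last entry acts as a bias feature


def adapter(weights, feats):
    """Shallow student: one linear map from features to a patch score."""
    return sum(w * f for w, f in zip(weights, feats))


def distill(patches, teacher_scores, steps=2000, lr=0.1):
    """Fit only the adapter weights so the student matches the teacher."""
    feats = [frozen_features(p) for p in patches]
    weights = [0.0, 0.0, 0.0]
    n = len(feats)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for f, t in zip(feats, teacher_scores):
            err = adapter(weights, f) - t
            for i in range(3):
                grads[i] += 2.0 * err * f[i] / n  # gradient of the mean squared error
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical training data: two-pixel patches and the teacher's score for each.
# In this toy the teacher gives low scores to high-contrast patches.
patches = [[0.2, 0.8], [0.5, 0.5], [0.1, 0.9], [0.4, 0.6]]
teacher_scores = [0.4, 1.0, 0.2, 0.8]

weights = distill(patches, teacher_scores)
print(adapter(weights, frozen_features([0.3, 0.7])))
```

The point of the sketch is the division of labour: the feature extractor is never touched, and the teacher is only needed while fitting `weights`. At inference the student runs on frozen features alone, which matches the description of the pipeline.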

If this is right

Quality assessment becomes possible on any new fundus dataset without collecting or matching quality labels.
Spatial quality maps are generated automatically, showing the exact image regions responsible for low scores.
Performance remains high across benchmarks that define quality differently from the original training distribution.
Only a small adapter needs training after the foundation model is frozen, lowering the cost of deployment.
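As a rough illustration of how a spatial map could yield both a scalar score and a list of offending regions, consider the sketch below. The mean aggregation, the 0.5 threshold, and the names `image_score` and `degraded_regions` are assumptions made for the example; the source does not say how EFIQA aggregates its maps.

```python
def image_score(quality_map):
    """Scalar score as the mean patch quality (aggregation rule assumed)."""
    cells = [q for row in quality_map for q in row]
    return sum(cells) / len(cells)


def degraded_regions(quality_map, threshold=0.5):
    """Return (row, col) of every patch whose quality is below the threshold."""
    return [
        (r, c)
        for r, row in enumerate(quality_map)
        for c, q in enumerate(row)
        if q < threshold
    ]


# Hypothetical 2x3 patch-level quality map, 1.0 = good and 0.0 = unusable.
quality_map = [
    [1.0, 1.0, 1.0],
    [1.0, 0.25, 0.25],
]
score = image_score(quality_map)
regions = degraded_regions(quality_map)
print(score, regions)
```

With a map like this, an operator would see not only that the image scores 0.75 but also that the two patches at the lower right are responsible.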

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same inpainting-based prior could be tested on other retinal imaging modalities where vessel visibility is a dominant quality factor.
If vasculature is not the main driver of quality in some clinical settings, the maps would systematically miss those degradations.
Real-time screening pipelines could insert the adapter after any compatible foundation model without retraining the entire system per site.

Load-bearing premise

Detecting missing vasculature through inpainting is enough to capture and explain most image quality problems without any quality supervision.

What would settle it

A dataset in which quality is degraded by factors unrelated to vessel visibility, such as uniform color shifts or global blur that leave vessels intact, and on which the method still assigns high quality scores.
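One way to organize such a test is a perturbation probe: apply a degradation that leaves vessel contrast intact and check whether the score moves. The sketch below uses a deliberately crude vessel-only scorer to show the shape of the experiment. `vessel_mask`, `vessel_only_score`, the 0.2 margin, and the pixel values are all hypothetical and are not EFIQA.

```python
def vessel_mask(row, margin=0.2):
    """Mark pixels darker than the row mean by more than `margin`."""
    mean = sum(row) / len(row)
    return [1 if mean - v > margin else 0 for v in row]


def vessel_only_score(image):
    """Toy quality score: fraction of rows in which a vessel pixel is found."""
    hits = sum(1 for row in image if any(vessel_mask(row)))
    return hits / len(image)


def shift_brightness(image, delta):
    """Uniform intensity shift: degrades appearance, keeps vessel contrast."""
    return [[v + delta for v in row] for row in image]


# Hypothetical grey-level rows; the dark pixel in each row plays the vessel.
image = [
    [0.8, 0.2, 0.8, 0.8],
    [0.8, 0.8, 0.2, 0.8],
]
# Same layout with the vessel contrast washed out, as in a hazy capture.
hazy = [
    [0.8, 0.7, 0.8, 0.8],
    [0.8, 0.8, 0.7, 0.8],
]

clean = vessel_only_score(image)
shifted = vessel_only_score(shift_brightness(image, -0.15))
washed_out = vessel_only_score(hazy)
print(clean, shifted, washed_out)
```

The toy scorer reacts when vessels disappear but is blind to the uniform shift. If the real method behaved the same way on images that graders reject for colour or exposure, that would be the evidence described above.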

Figures

Figures reproduced from arXiv: 2606.20108 by Hrvoje Bogunović, José Morano, Pengwei Wang, Qian Wan.

**Figure 1.** Overview and example results. EFIQA discovers bad-quality regions (in magenta) by locating missing anatomical structures; this yields precise local-level quality scores in a purely unsupervised manner.

**Figure 2.** The proposed pipeline. In stage one, we train a network (VUAD) to reconstruct vessel segmentation maps from partial observations. Then, the VUAD network serves as a teacher to distill its knowledge to an adapter student in the feature space of a foundation model. During inference, only the foundation model and the adapter are used.

**Figure 3.** Visualization of local-level quality maps by different methods on the MSHF dataset from both tabletop and portable devices. Good and bad quality labels are included. EFIQA provides accurate maps, correctly identifying the degraded areas across a wide range of bad quality cases.

**Figure 4.** (image only; no caption in the source)

**Figure 5.** Mild false activations occur in the macular region, where vessels are naturally sparse or absent.

Original abstract

Image quality control is vital for a wide range of downstream applications. Deep learning-based image quality assessment methods typically train classifiers on dataset-specific quality labels, inheriting two limitations: (1) generalization is tied to the labeling criteria of the training set and (2) these methods cannot provide spatial feedback on where the quality is degraded, lacking explainability. In this work, we propose EFIQA, a framework that requires no quality-related supervision and produces spatial quality maps by design. Rather than learning "what is degradation" from human-annotated labels, EFIQA learns "what should be there" by leveraging anatomical priors. For fundus photography, we instantiate this as a two-stage approach, by first training an unsupervised anomaly detector via masked anatomical inpainting to identify regions of missing vasculature, and then distilling this prior knowledge into a shallow adapter mapping features of a frozen foundation model to precise quality maps. External-dataset evaluation demonstrates that this label-free approach with minimal adaptation achieves better performance and explainability compared with supervised methods across benchmarks with different quality criteria, highlighting its potential for real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

EFIQA's core move is a label-free pipeline that learns anatomical norms via masked inpainting then distills them into spatial quality maps on a frozen model, claiming better external generalization than label-tied classifiers.

EFIQA's main contribution is the shift to learning what the anatomy should look like instead of what degradation looks like. It trains an unsupervised anomaly detector with masked anatomical inpainting to flag missing vasculature, then distills that knowledge into a shallow adapter that produces quality maps from a frozen foundation model. This setup is not a routine extension of the supervised fundus QA work cited in the abstract.

The paper handles the explainability requirement cleanly by design, since the maps come directly from the inpainting prior rather than post-hoc attribution. The external-dataset evaluation across benchmarks with mismatched quality criteria is the right test for the generalization claim, and the minimal-adaptation design is practical for real pipelines.

The soft spots are mostly in the execution details that the abstract leaves out. The full paper needs to show clear baselines, ablations on the inpainting and distillation stages, and statistical support for the performance edge; without those the superiority claim stays hard to weigh. The central assumption that vasculature anomalies are sufficient to explain quality degradation is reasonable for fundus photography but could miss other common issues like uneven illumination or sensor artifacts, and any dependence on the foundation model's pretraining data should be checked.

This is for researchers working on ophthalmic imaging or clinical quality control pipelines who need label-efficient, spatially aware assessment. A reader building diagnostic workflows could extract the method and test it directly.

I would send it to peer review. The framing and pipeline are coherent on their own terms and the evaluation direction is appropriate.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes EFIQA, a label-free two-stage framework for fundus image quality assessment. It first trains an unsupervised anomaly detector using masked anatomical inpainting to identify regions of missing vasculature as a proxy for quality issues, then distills this prior into a shallow adapter that maps features from a frozen foundation model to spatial quality maps. The central claim is that this approach, requiring no quality-related supervision, achieves superior performance and built-in explainability compared to supervised methods when evaluated on external datasets with differing quality criteria.

Significance. If the empirical results hold, the work offers a meaningful advance by decoupling quality assessment from dataset-specific labels and providing spatial explainability by design. The reliance on anatomical priors rather than learned degradation patterns, combined with minimal adaptation of a foundation model, addresses generalization challenges common in medical imaging QA. The external-dataset evaluation across benchmarks is a strength, as is the potential for real-world deployment where labeling criteria vary.

minor comments (3)

The abstract and introduction would benefit from explicit numerical results (e.g., AUC or correlation values) and baseline comparisons rather than qualitative statements of superiority; this would strengthen the central claim without requiring new experiments.
Clarify the precise definition of the quality map output (e.g., how inpainting residuals are normalized and thresholded) in the methods section to ensure reproducibility of the spatial feedback.
Add a brief discussion of failure cases, such as images where quality degradation is not primarily due to missing vasculature (e.g., severe illumination artifacts), to bound the scope of the anatomical-prior assumption.
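For the first comment, the kind of number the referee asks for can be computed directly from image-level scores and binary quality labels. The sketch below computes the area under the ROC curve through its pairwise-ranking definition. The labels and scores are made-up placeholders, not results from the paper, and `roc_auc` is a name chosen for this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random good image outscores a random bad one.

    Ties between a good and a bad image count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one good and one bad image")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Placeholder data: 1 = graded good, 0 = graded bad; scores are hypothetical.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Because the measure depends only on the ranking of scores, it needs no threshold, which makes it a natural fit when external datasets grade quality on different scales.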

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript, the recognition of its potential impact on generalization and explainability in medical image QA, and the recommendation for minor revision. As no major comments were raised, we will incorporate any minor suggestions during the revision process.

Circularity Check

0 steps flagged

No significant circularity

The paper describes a label-free pipeline that trains an unsupervised anomaly detector on anatomical inpainting and distills it into an adapter on a frozen external foundation model, then evaluates on external datasets. No equations, derivations, or self-citations are present in the provided text that reduce any claimed prediction or quality map to a parameter fitted from the target quality labels. The central performance claim rests on external evaluation rather than internal self-definition or fitted-input renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that anatomical completeness (vasculature) serves as a sufficient proxy for overall image quality; no free parameters or invented entities are named in the abstract.

axioms (1)

Domain assumption: anatomical priors from vasculature are sufficient to determine image quality in fundus photography.
The method replaces quality supervision entirely with this premise about what 'should be there'.

pith-pipeline@v0.9.1-grok · 5731 in / 1226 out tokens · 23564 ms · 2026-06-26T17:47:21.914311+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 2 linked inside Pith

[1] Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020.
[2] Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA, 2017.
[3] A survey on deep learning in medical image analysis. Medical Image Analysis, 2017.
[4] Review of medical image quality assessment. Biomedical Signal Processing and Control, 2016.
[5] Automated assessment of diabetic retinal image quality based on clarity and field definition. Investigative Ophthalmology & Visual Science, 2006.
[6] Human visual system-based fundus image quality assessment of portable fundus camera photographs. IEEE Transactions on Medical Imaging, 2015.
[7] Image structure clustering for image quality verification of color retina images in diabetic retinopathy screening. Medical Image Analysis, 2006.
[8] Evaluation of retinal image quality assessment networks in different color-spaces. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019.
[9] A dark and bright channel prior guided deep network for retinal image quality assessment. Biocybernetics and Biomedical Engineering, 2022.
[10] Domain-invariant interpretable fundus image quality assessment. Medical Image Analysis, 2020.
[11] Quality assessment of colour fundus and fluorescein angiography images using deep learning. British Journal of Ophthalmology, 2024.
[12] Pyramid network with quality-aware contrastive loss for retinal image quality assessment. 2024.
[13] Jin, Kai; Gao, Zhiyuan; Jiang, Xiaoyu; Wang, Yaqi; Ma, Xiaoyu; Li, Yunxiang; Ye, Juan. 2023.
[14] Automated analysis of retinal images for detection of referable diabetic retinopathy. JAMA Ophthalmology, 2013.
[15] Zhou, Yukun; Wagner, Siegfried K; Chia, Mark A; Zhao, An; Xu, Moucheng; Struyven, Robbert; Alexander, Daniel C; Keane, Pearse A; and others. 2022.
[16] Fundus Image Toolbox: A Python package for fundus image processing. Journal of Open Source Software.
[17] Yang, Sidi; Wu, Tianhe; Shi, Shuwei; Lao, Shanshan; Gong, Yuan; Cao, Mingdeng; Wang, Jiahao; Yang, Yujiu.
[18] Chen, Chaofeng; Mo, Jiadi; Hou, Jingwen; Wu, Haoning; Liao, Liang; Sun, Wenxiu; Yan, Qiong; Lin, Weisi. IEEE Transactions on Image Processing.
[19] Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 2012.
[20] A feature-enriched completely blind image quality evaluator. IEEE Transactions on Image Processing, 2015.
[21] Sim. arXiv preprint arXiv:2508.10104.
[22] Selvaraju, Ramprasaath R; Cogswell, Michael; Das, Abhishek; Vedantam, Ramakrishna; Parikh, Devi; Batra, Dhruv.
[23] Image Quality Assessment using Contrastive Learning. arXiv:2110.13266.
[24] Saha, Avinab; Mishra, Sandeep; Bovik, Alan C.
[25] Khalid, Saif; Rashwan, Hatem A; Abdulwahab, Saddam; Abdel-Nasser, Mohamed; Quiroga, Facundo Manuel; Puig, Domenec. 2024.
[26] Learning for retinal image quality assessment with label regularization. Computer Methods and Programs in Biomedicine, 2023.
[27] Chicco, Davide; Jurman, Giuseppe. The advantages of the Matthews correlation coefficient. 2020.
[28] Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
[29] U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.
[30] Emma Dugas; Jared; Jorge; Will Cukierski. 2015.
[31] A portable retina fundus photos dataset for clinical, demographic, and diabetic retinopathy prediction. Scientific Data, 2025.
[32] Identification of suitable fundus images using automated quality assessment methods. Journal of Biomedical Optics, 2014.
[33] Morano, Jos. Expert Systems with Applications, 2024.
[34] Morano, Jos. 2025.
[35] Lamport, Leslie. 1986.
[36] Distilling the knowledge in a neural network.
[37] Arniqa: Learning distortion manifold for image quality assessment. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.
