FunPiQ: A New Benchmark for Pixel-Level Quality Assessment in Fundus Images
Pith reviewed 2026-06-25 20:28 UTC · model grok-4.3
The pith
Pixel-level annotations of anatomical visibility provide a task-agnostic criterion for fundus image quality assessment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FunPiQ supplies the first pixel-level quality annotations for fundus images, defined by anatomical visibility. EFIQA-CP, a CNN trained via Non-Negative Positive-Unlabeled (NNPU) learning on visibility-derived pseudo-labels, outperforms classification methods with post-hoc explanations, anomaly detection methods, and other explainable-by-design methods in the paper's evaluations.
What carries the argument
EFIQA-CP, an explainable-by-design CNN trained on anatomical-visibility pseudo-labels using Non-Negative Positive-Unlabeled learning.
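To make the training objective concrete, here is a minimal sketch of the non-negative PU risk estimator of Kiryo et al. (2017), the standard formulation of NNPU learning. It is not EFIQA-CP's code: this summary does not say which pixels the paper treats as positive, which surrogate loss it uses, or how the class prior is set, so the sigmoid loss and the `prior` argument below are assumptions. The key idea is that the negative-class risk is estimated from unlabeled data minus the positive contribution hidden inside it, and that estimate is clipped at zero.

```python
import numpy as np


def sigmoid_loss(scores: np.ndarray, label: int) -> np.ndarray:
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nnpu_risk(scores_pos: np.ndarray, scores_unl: np.ndarray, prior: float) -> float:
    """Non-negative PU risk (Kiryo et al., 2017); a sketch, not the paper's code.

    scores_pos: classifier outputs on labeled-positive samples (e.g. pixels).
    scores_unl: classifier outputs on unlabeled samples.
    prior:      assumed class prior pi = P(y = +1) within the unlabeled data.
    """
    # Risk of scoring labeled positives as positive, weighted by the prior.
    risk_pos = prior * sigmoid_loss(scores_pos, +1).mean()
    # Negative-class risk estimated from unlabeled data, minus the share that
    # actually comes from positives mixed into the unlabeled set.
    risk_neg = sigmoid_loss(scores_unl, -1).mean() - prior * sigmoid_loss(scores_pos, -1).mean()
    # Non-negative correction: a true risk cannot be negative, so clip at zero.
    # This is what keeps flexible models such as CNNs from overfitting.
    return float(risk_pos + max(0.0, risk_neg))
```

Without the clipping, the unbiased PU estimator can go negative on training data once a deep network starts memorizing. Kiryo et al. additionally take a gradient step on the negated negative term whenever it falls below a small threshold; that training-loop detail is omitted here.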
If this is right
- Pixel-level maps enable quantitative scoring of localized degradations instead of whole-image pass/fail decisions.
- EFIQA-CP supplies built-in explanations without separate post-hoc attribution steps.
- The visibility-based criterion reduces reliance on task-specific expert definitions of acceptable quality.
- The benchmark supports direct comparison of methods on the same pixel-level ground truth.
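The first and last points above amount to two computations on a pixel map: a localized degradation score and an overlap metric against pixel-level ground truth. The sketch below assumes a per-pixel quality probability (higher means better), a boolean field-of-view mask, and a 0.5 threshold; all three are illustrative assumptions. Dice is a common overlap metric for such maps, but which metrics FunPiQ actually reports is not stated in this summary.

```python
import numpy as np


def degraded_fraction(quality_prob: np.ndarray, fov_mask: np.ndarray, threshold: float = 0.5) -> float:
    """Share of field-of-view pixels whose predicted quality falls below threshold."""
    degraded = (quality_prob < threshold) & fov_mask
    return float(degraded.sum() / fov_mask.sum())


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice overlap between two boolean masks; defined as 1.0 when both are empty."""
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 1.0 if total == 0 else float(2.0 * intersection / total)
```

For a 4x4 map whose bottom two rows are predicted degraded, `degraded_fraction` returns 0.5; scored against an annotation that marks the bottom three rows as degraded, `dice` returns 0.8. An image-level pass/fail label can express neither number.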
Where Pith is reading between the lines
- Pixel-level visibility maps could serve as a filter before automated disease grading pipelines to reduce false positives from poor images.
- The same pseudo-labeling approach might transfer to other retinal imaging modalities where anatomical structures are similarly defined.
- Standardized pixel annotations could eventually support regulatory requirements for explainable quality control in screening programs.
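The first speculation, using pixel maps as a gate before disease grading, could look like the hypothetical sketch below. The region-of-interest mask (for example, a macula or optic-disc region relevant to the downstream task), the 0.5 pixel threshold, and the 20% tolerance are all illustrative assumptions, not values from the paper. The point is that a pixel map lets the gate ignore degradation outside the region a given task depends on.

```python
import numpy as np


def quality_gate(quality_prob: np.ndarray, roi_mask: np.ndarray,
                 threshold: float = 0.5, max_degraded: float = 0.2) -> tuple[bool, float]:
    """Hypothetical gate: accept an image if little of the task ROI is degraded.

    quality_prob: per-pixel quality probability (higher = better), assumed.
    roi_mask:     boolean mask of the region the downstream task relies on.
    Returns (accepted, degraded share of the ROI).
    """
    roi_pixels = quality_prob[roi_mask]
    if roi_pixels.size == 0:
        raise ValueError("roi_mask selects no pixels")
    degraded = float(np.mean(roi_pixels < threshold))
    return degraded <= max_degraded, degraded
```

An image with a blurred peripheral band but a clean central ROI is accepted with a degraded share of 0.0, whereas a whole-image pass/fail label might reject it outright.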
Load-bearing premise
Pixel-level annotations based on anatomical visibility constitute a more task-agnostic and explainable quality criterion than existing image-level labels.
What would settle it
A head-to-head test on a held-out clinical dataset in which image-level quality labels predict downstream diagnostic accuracy better than EFIQA-CP pixel maps do, or in which a method that is not explainable by design (non-EBD) matches or exceeds EFIQA-CP.
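One way to run the first test fairly is to compare quality scores at a matched retention rate: keep the same fraction of images under each quality signal and measure how often the downstream grader is right on what remains. The sketch below is a hypothetical protocol, not one described in the paper; it assumes each method yields one scalar per image (for pixel maps, for instance, one minus the degraded fraction) and that per-image correctness of a downstream grader is known.

```python
import numpy as np


def accuracy_at_retention(quality_scores, downstream_correct, keep_fraction: float) -> float:
    """Downstream accuracy on the images a quality filter keeps.

    quality_scores:     one scalar per image, higher = better quality (assumed).
    downstream_correct: bool per image, whether the disease grader was right.
    keep_fraction:      share of images retained, highest-scoring first.
    """
    scores = np.asarray(quality_scores, dtype=float)
    correct = np.asarray(downstream_correct, dtype=bool)
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    # Stable sort so ties are broken deterministically by input order.
    kept = np.argsort(-scores, kind="stable")[:n_keep]
    return float(correct[kept].mean())
```

Evaluating both signals over a sweep of `keep_fraction` values gives two accuracy-retention curves; the signal whose curve sits higher is the better predictor of downstream usability, independent of how either was annotated.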
Original abstract
Color fundus photography (CFP) is the most common ophthalmic imaging modality for large-scale screening. However, it is highly susceptible to degradations, making robust fundus image quality assessment (FIQA) crucial. The criteria for what constitutes high-quality at the image level vary across clinical tasks, making FIQA dependent on expert knowledge. This motivated the development of automated methods and datasets. While existing datasets aim to standardize image-level quality, their criteria often differ. Furthermore, image-level labels preclude the quantitative evaluation of localized degradations, which is essential for trustworthy FIQA. We argue that pixel-level FIQA based on anatomical visibility represents a more task-agnostic, explainable approach. In this work, we introduce FunPiQ, the first FIQA benchmark to provide pixel-level quality annotations. In addition, we propose EFIQA-CP, an explainable-by-design (EBD) method that uses quality pseudo-labels based on anatomical visibility to train a CNN via Non-Negative Positive-Unlabeled learning. Extensive evaluations of classification methods with post-hoc explanations, anomaly detection methods, and EBD methods demonstrate the superior performance of the last and, particularly, of EFIQA-CP.
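The abstract does not explain how visibility-based pseudo-labels lead to a positive-unlabeled problem rather than an ordinary supervised one. The hypothetical sketch below shows one construction under which PU learning would be the natural fit; it is an illustration, not the paper's procedure. Pixels where some anatomical-structure detector (say, a vessel segmenter) fires confidently are labeled positive, since visible anatomy implies adequate local quality. Every other field-of-view pixel stays unlabeled, because a missing structure may reflect degradation or simply a region with no such structure. The detector, the 0.8 confidence cut-off, and the function name are all assumptions.

```python
import numpy as np


def pu_pseudo_labels(structure_prob: np.ndarray, fov_mask: np.ndarray,
                     confidence: float = 0.8) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical PU pseudo-labels from an anatomical-structure probability map.

    Positive:  confidently visible anatomy inside the field of view.
    Unlabeled: every other field-of-view pixel (quality unknown).
    Pixels outside the field of view belong to neither set.
    """
    positive = (structure_prob >= confidence) & fov_mask
    unlabeled = fov_mask & ~positive
    return positive, unlabeled
```

Treating the unlabeled pixels as negatives would penalize the model on clean but vessel-free regions, which is the bias that a PU objective is designed to avoid.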
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FunPiQ, the first benchmark providing pixel-level quality annotations for color fundus photography images based on anatomical visibility criteria, and proposes EFIQA-CP, an explainable-by-design CNN trained via Non-Negative Positive-Unlabeled (NNPU) learning on visibility-derived pseudo-labels. It reports extensive evaluations comparing classification methods with post-hoc explanations, anomaly detection methods, and other EBD methods, claiming superior performance for the EBD category and particularly for EFIQA-CP.
Significance. If the pixel-level formulation and NNPU training prove robust, the benchmark and method could advance explainable, localized FIQA for ophthalmic screening by moving beyond image-level labels whose criteria vary by task. The pseudo-label approach and EBD design are technically interesting contributions for handling unlabeled data in medical imaging.
major comments (1)
- [Evaluation / Experiments (abstract and implied results sections)] The central claim that pixel-level anatomical visibility yields a more task-agnostic criterion (motivation section of abstract) is evaluated entirely within the FunPiQ benchmark whose labels are defined by that same visibility criterion. No downstream-task experiments (e.g., improvement in diabetic-retinopathy grading or glaucoma detection accuracy when low-quality images are filtered by the pixel maps versus image-level scores) are reported, so the reported superiority of EFIQA-CP could be an artifact of the annotation scheme rather than a general advantage.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on our evaluation approach.
Point-by-point responses
Referee: [Evaluation / Experiments (abstract and implied results sections)] The central claim that pixel-level anatomical visibility yields a more task-agnostic criterion (motivation section of abstract) is evaluated entirely within the FunPiQ benchmark whose labels are defined by that same visibility criterion. No downstream-task experiments (e.g., improvement in diabetic-retinopathy grading or glaucoma detection accuracy when low-quality images are filtered by the pixel maps versus image-level scores) are reported, so the reported superiority of EFIQA-CP could be an artifact of the annotation scheme rather than a general advantage.
Authors: We acknowledge that the evaluation is performed exclusively on FunPiQ, whose labels are derived from the anatomical visibility criterion. This setup is intentional to provide the first quantitative benchmark for pixel-level FIQA. The motivation for task-agnosticism stems from the documented variability in image-level quality criteria across different clinical applications, as stated in the abstract and introduction. Pixel-level visibility offers a consistent alternative that does not depend on specific downstream tasks. Our results show EFIQA-CP's superiority over classification and anomaly detection methods in this setting, particularly in explainability. While downstream experiments (e.g., on DR grading) would be a valuable addition to demonstrate broader utility, they fall outside the primary scope of establishing the benchmark and the EBD method. We maintain that the reported advantages are not merely artifacts but reflect the method's ability to leverage the pseudo-labels effectively. revision: no
Circularity Check
No circularity: benchmark labels and training criterion are external and independent of model outputs
The paper defines pixel-level quality annotations via an external anatomical-visibility criterion and trains EFIQA-CP on pseudo-labels derived from that same criterion. Evaluation of classification, anomaly-detection, and EBD methods occurs on the resulting FunPiQ benchmark. This is a standard supervised-learning setup with no self-definitional reduction (no quantity is fitted and then re-predicted as a new result), no fitted-input-called-prediction, and no load-bearing self-citation. The claim that the visibility criterion is more task-agnostic is an untested motivation rather than a derivation step that collapses to its inputs. The central performance comparison therefore remains empirically grounded in the provided external annotations.