Exploring Deep Learning and Ultra-Widefield Imaging for Diabetic Retinopathy and Macular Edema
Pith reviewed 2026-05-21 11:54 UTC · model grok-4.3
The pith
Deep learning models applied to ultra-widefield images detect referable diabetic retinopathy and diabetic macular edema, performing consistently well across all tested architectures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the UWF4DR Challenge dataset, state-of-the-art deep learning models achieve consistently strong performance across all tested architectures on image quality assessment, referable diabetic retinopathy identification, and diabetic macular edema identification. This performance highlights the competitiveness of vision transformers and foundation models as well as the value of feature-level fusion and frequency-domain representations for ultra-widefield image analysis.
What carries the argument
Benchmarking CNNs, vision transformers, and foundation models in spatial RGB and frequency domains with feature-level fusion and Grad-CAM analysis on the UWF4DR Challenge dataset for the three clinical tasks.
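Feature-level fusion usually means concatenating the embeddings produced by several backbones and training one classifier on the joined vector. The backbones might be a CNN and a ViT, or an RGB branch and a frequency-domain branch. This differs from averaging the models' output scores. The review does not describe the paper's exact fusion design, so the NumPy sketch below rests on assumptions. The embedding sizes, the per-backbone L2 normalization, and the `fuse_features` and `logistic_head` helpers are all hypothetical.

```python
import numpy as np

def fuse_features(feature_sets):
    """Concatenate per-backbone embeddings along the feature axis.

    feature_sets: list of arrays, each of shape (n_images, d_i), e.g. one from
    a CNN and one from a ViT (hypothetical setup, not the paper's code).
    """
    n = feature_sets[0].shape[0]
    if any(f.shape[0] != n for f in feature_sets):
        raise ValueError("all feature sets must cover the same images")
    # L2-normalize each embedding so no single backbone dominates by scale
    normed = [f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12) for f in feature_sets]
    return np.concatenate(normed, axis=1)

def logistic_head(fused, weights, bias):
    """Hypothetical linear head mapping fused features to a probability."""
    logits = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
cnn_feats = rng.normal(size=(4, 512))   # stand-in for CNN embeddings
vit_feats = rng.normal(size=(4, 768))   # stand-in for ViT embeddings
fused = fuse_features([cnn_feats, vit_feats])
probs = logistic_head(fused, rng.normal(size=fused.shape[1]), 0.0)
print(fused.shape, probs.shape)  # (4, 1280) (4,)
```

The normalization step is a design choice, not a requirement. Without it, a backbone with larger-magnitude features could dominate the fused representation.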
If this is right
- Deep learning models deliver strong results on ultra-widefield image quality assessment.
- Referable diabetic retinopathy identification works reliably with the tested approaches.
- Diabetic macular edema detection improves with the wider field and advanced models.
- Feature-level fusion adds robustness across different model types.
- Frequency-domain representations complement spatial processing for ultra-widefield analysis.
Where Pith is reading between the lines
- Eye clinics might switch to ultra-widefield cameras paired with these models to capture more peripheral retina in routine visits.
- Frequency-domain processing could help when image quality varies due to different camera hardware.
- Adding the Grad-CAM maps to doctor review workflows might increase trust and speed up screening.
- Testing the same pipeline on images from broader age and ethnic groups would check how well results hold in diverse populations.
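Grad-CAM maps, like those suggested above for clinician review, are built in three steps. First, each feature map of a chosen convolutional layer is weighted by the spatially averaged gradient of the class score with respect to that map. The weighted maps are then summed, and a ReLU is applied. The NumPy sketch below covers only that weighting step. It assumes the activations and gradients were already extracted from a trained network, for example with framework hooks. The `grad_cam` helper and the toy arrays are illustrative, not the paper's code.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one layer's activations and gradients.

    activations, gradients: arrays of shape (K, H, W) for K feature maps,
    where gradients = d(class score) / d(activations).
    """
    # Channel weights: global-average-pool the gradients over space
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of feature maps; ReLU keeps only positive evidence
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    # Scale to [0, 1] for overlaying on the image
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

acts = np.zeros((2, 3, 3))
acts[0, 0, 0] = 1.0  # map 0 fires in the top-left corner
acts[1, 2, 2] = 1.0  # map 1 fires in the bottom-right corner
grads = np.stack([np.full((3, 3), 0.5), np.full((3, 3), -0.5)])
heatmap = grad_cam(acts, grads)
print(heatmap[0, 0], heatmap[2, 2])  # 1.0 0.0
```

In practice, the low-resolution map is upsampled to the input size before being overlaid on the UWF image. That step is omitted here.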
Load-bearing premise
The labels and imaging conditions in the UWF4DR Challenge dataset are representative of real-world clinical ultra-widefield images so that reported performance will translate to practical screening.
What would settle it
A new collection of ultra-widefield images from different clinics and cameras, labeled independently, on which the trained models show substantially lower accuracy than reported on the challenge set.
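This falsification test can be made concrete. Score the same trained model on the challenge test split and on an independently labeled external set, then compare a threshold-free metric such as AUC. The plain-Python sketch below computes AUC as the Mann-Whitney probability that a random positive outranks a random negative, with ties counting one half. The labels, the scores, and the 0.05 tolerance standing in for "substantially lower" are hypothetical placeholders, not values from the paper.

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1; a tie counts 0.5
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def generalization_gap(internal, external, tolerance=0.05):
    """Return (internal AUC, external AUC, whether the drop exceeds tolerance)."""
    a_int = auc(*internal)
    a_ext = auc(*external)
    return a_int, a_ext, (a_int - a_ext) > tolerance

# Hypothetical (labels, model scores) pairs, for illustration only
internal = ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
external = ([1, 1, 1, 0, 0, 0], [0.9, 0.4, 0.35, 0.6, 0.2, 0.1])
a_int, a_ext, flagged = generalization_gap(internal, external)
print(a_int, a_ext, flagged)  # internal AUC 1.0, external AUC 7/9 (about 0.78), drop flagged
```

On real external cohorts, which are often small, the AUC difference should also carry a confidence interval before a drop is called substantial.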
Original abstract
Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness among working-age adults. Traditional approaches in the literature focus on standard color fundus photography (CFP) for the detection of these conditions. Nevertheless, recent ultra-widefield imaging (UWF) offers a significantly wider field of view in comparison to CFP. Motivated by this, the present study explores state-of-the-art deep learning (DL) methods and UWF imaging on three clinically relevant tasks: i) image quality assessment for UWF, ii) identification of referable diabetic retinopathy (RDR), and iii) identification of DME. Using the publicly available UWF4DR Challenge dataset, released as part of the MICCAI 2024 conference, we benchmark DL models in the spatial (RGB) and frequency domains, including popular convolutional neural networks (CNNs) as well as recent vision transformers (ViTs) and foundation models. In addition, we explore a final feature-level fusion to increase robustness. Finally, we also analyze the decisions of the DL models using Grad-CAM, increasing the explainability. Our proposal achieves consistently strong performance across all architectures, underscoring the competitiveness of emerging ViTs and foundation models and the promise of feature-level fusion and frequency-domain representations for UWF analysis.
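The abstract does not specify which frequency-domain transform the paper uses. A common choice is the centered log-magnitude spectrum of the 2D discrete Fourier transform, computed per color channel. The NumPy sketch below assumes that choice. `to_frequency_domain` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def to_frequency_domain(image):
    """Convert an (H, W, C) RGB image into a per-channel log-magnitude spectrum.

    Assumes a centered 2D DFT magnitude representation; the paper may use a
    different transform (e.g. DCT or wavelets).
    """
    img = np.asarray(image, dtype=np.float64)
    # 2D FFT over the spatial axes only, one spectrum per channel
    spectrum = np.fft.fft2(img, axes=(0, 1))
    # Move the zero-frequency (DC) component to the center of the map
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))
    # log1p compresses the large dynamic range of Fourier magnitudes
    log_mag = np.log1p(np.abs(spectrum))
    # Min-max scale each channel to [0, 1] so it can feed a standard CNN or ViT
    mins = log_mag.min(axis=(0, 1), keepdims=True)
    maxs = log_mag.max(axis=(0, 1), keepdims=True)
    return (log_mag - mins) / (maxs - mins + 1e-12)

rng = np.random.default_rng(0)
fake_uwf = rng.integers(0, 256, size=(64, 64, 3))  # stand-in for a resized UWF image
freq = to_frequency_domain(fake_uwf)
print(freq.shape)  # (64, 64, 3)
```

Centering the DC term is a display convention. It makes the spectrum easier to inspect but does not change the information the network receives.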
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript benchmarks deep learning models including CNNs, Vision Transformers, and foundation models for three tasks on ultra-widefield images from the public UWF4DR Challenge dataset (MICCAI 2024): image quality assessment, referable diabetic retinopathy detection, and diabetic macular edema detection. It compares spatial and frequency-domain representations, applies feature-level fusion, and uses Grad-CAM for explainability, claiming consistently strong performance that demonstrates the competitiveness of ViTs/foundation models and the value of fusion and frequency-domain methods for UWF analysis.
Significance. If the performance holds under broader testing, the work could advance automated DR/DME screening by leveraging UWF's wider field of view and modern architectures like ViTs. Use of a public challenge dataset aids reproducibility, and inclusion of explainability is a positive step toward clinical utility. However, significance is constrained by single-dataset evaluation without external validation or domain-shift testing, reducing confidence that results generalize beyond the challenge data.
Major comments (2)
- Abstract: The claim of 'consistently strong performance across all architectures' is presented without any quantitative metrics (e.g., AUC, sensitivity, specificity), confidence intervals, data-split details, or statistical tests. This absence directly undermines evaluation of the central claim regarding ViT competitiveness and the promise of fusion/frequency-domain methods.
- Experiments section: All benchmarking occurs exclusively on the single UWF4DR Challenge dataset with no external validation cohort, multi-center data, or domain-shift experiments described. This is load-bearing for the claim that results underscore the 'promise ... for UWF analysis,' as performance consistency may reflect dataset-specific properties rather than methodological advantages.
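The first comment asks for sensitivity, specificity, and confidence intervals. One standard way to report both is to threshold the model's scores, count the confusion-matrix cells, and attach a Wilson score interval to each proportion. The plain-Python sketch below does this. The 0.5 threshold and the example data are assumptions for illustration, and the review does not state which interval method, if any, the paper uses.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity, each with a 95% Wilson interval."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }

# Hypothetical labels and scores: 50 referable, 50 non-referable images
labels = [1] * 50 + [0] * 50
scores = [0.9] * 45 + [0.2] * 5 + [0.1] * 48 + [0.7] * 2
result = sens_spec(labels, scores)
print(result)  # sensitivity 0.9 (45/50), specificity 0.96 (48/50), plus intervals
```

The Wilson interval stays inside [0, 1] and behaves better than the simple normal approximation when proportions are near 0 or 1. This matters for screening models with high sensitivity.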
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below, with revisions made where they strengthen the manuscript without misrepresenting our results.
Point-by-point responses
- Referee: Abstract: The claim of 'consistently strong performance across all architectures' is presented without any quantitative metrics (e.g., AUC, sensitivity, specificity), confidence intervals, data-split details, or statistical tests. This absence directly undermines evaluation of the central claim regarding ViT competitiveness and the promise of fusion/frequency-domain methods.
  Authors: We agree that the abstract would be strengthened by including quantitative support for the claim. The manuscript already reports detailed AUC, sensitivity, specificity, and statistical comparisons in the Experiments section and Tables 2–4. We have revised the abstract to incorporate representative metrics (e.g., peak AUC values per task) while respecting length constraints.
  Revision: yes.
- Referee: Experiments section: All benchmarking occurs exclusively on the single UWF4DR Challenge dataset with no external validation cohort, multi-center data, or domain-shift experiments described. This is load-bearing for the claim that results underscore the 'promise ... for UWF analysis,' as performance consistency may reflect dataset-specific properties rather than methodological advantages.
  Authors: We acknowledge that single-dataset evaluation limits strong generalization claims. The UWF4DR dataset is the official public benchmark from the MICCAI 2024 challenge, chosen specifically to enable reproducible comparisons. We have added an expanded limitations paragraph in the Discussion that explicitly addresses domain-shift risks and calls for future multi-center validation. The observed consistency across CNNs, ViTs, and foundation models on this standardized data still supports the methodological points regarding fusion and frequency-domain processing.
  Revision: partial.
Circularity Check
No circularity: empirical benchmarking on external public dataset
The paper is a standard empirical benchmarking study that evaluates off-the-shelf CNNs, ViTs, and foundation models on the publicly released UWF4DR Challenge dataset for three classification tasks. It reports experimental results in spatial and frequency domains, applies feature-level fusion, and uses Grad-CAM for visualization. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains are present that could reduce any claim to its own inputs by construction. All performance numbers derive from direct evaluation on an external challenge dataset rather than from internal definitions or prior author results.
Axiom & Free-Parameter Ledger
Free parameters (1)
- Model selection and training hyperparameters
Axioms (1)
- Domain assumption: the UWF4DR Challenge dataset provides accurate ground-truth labels for the three tasks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Using the publicly available UWF4DR Challenge dataset... three binary tasks"
Tag definitions:
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.