Towards Interpretable Foundation Models for Retinal Fundus Images
Pith reviewed 2026-05-15 08:47 UTC · model grok-4.3
The pith
A foundation model for retinal fundus images achieves performance comparable to much larger state-of-the-art models while offering built-in interpretability through class evidence maps and 2D projections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Dual-IFM is a foundation model interpretable-by-design: it generates class evidence maps faithful to the decision-making process for individual images and includes a 2D projection layer for direct visualization of the representation space across datasets. Pretrained on over 800,000 color fundus photographs, the model attains performance levels similar to state-of-the-art foundation models that use up to 16 times more parameters while maintaining interpretable predictions even on out-of-distribution data. This demonstrates that large-scale self-supervised pretraining combined with inherent interpretability mechanisms produces robust representations suitable for retinal imaging applications.
What carries the argument
The Dual-IFM architecture, featuring self-supervised pretraining combined with a mechanism to produce class evidence maps for local interpretability and a dedicated 2D projection layer that visualizes the learned representation space without loss of structure.
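To make the local mechanism concrete, here is a minimal sketch of a BagNet-style read-out, in which every image patch contributes one evidence score per class and the image-level logit is the average of those scores. This page names the BagNet architecture only in passing, so the averaging rule, the function names `class_logits` and `predict`, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a BagNet-style read-out, not the Dual-IFM code.
# Assumption: each image patch yields one evidence score per class, and the
# image-level logit is the plain average of those scores over all patches.


def class_logits(evidence_maps):
    """Average each class's H x W evidence map into a single logit."""
    logits = []
    for class_map in evidence_maps:
        values = [v for row in class_map for v in row]
        logits.append(sum(values) / len(values))
    return logits


def predict(evidence_maps):
    """Return the index of the class with the largest logit."""
    logits = class_logits(evidence_maps)
    return max(range(len(logits)), key=lambda c: logits[c])


# Hypothetical 2 x 2 patch grid with two classes (0 = healthy, 1 = disease).
toy_maps = [
    [[0.5, 0.5], [0.5, 0.5]],  # class 0: weak, uniform evidence
    [[0.0, 0.0], [0.0, 4.0]],  # class 1: one strongly positive patch
]
logits = class_logits(toy_maps)
predicted = predict(toy_maps)
```

Under this assumption the map is the decision by construction: the logit is a sum over the very values that are displayed, so a single strongly positive patch can be seen to carry the class-1 prediction.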
If this is right
- The model supports multiple downstream tasks in retinal analysis with explanations that align with its internal decisions.
- Interpretability holds for data from sources not seen during training.
- Comparable performance to much larger models suggests that built-in interpretability need not cost accuracy, even at a far smaller parameter count.
- Such models can provide both local and global views of how representations organize medical image data.
Where Pith is reading between the lines
- Applying the same dual interpretability design to other medical imaging domains like chest X-rays could yield similar benefits.
- The 2D projections might help identify biases in training data distributions across different patient populations.
- Clinicians could use the evidence maps to quickly verify model focus on relevant anatomical features like the optic disc or lesions.
- Further scaling the pretraining data beyond 800,000 images may enhance both performance and the clarity of the visualizations.
Load-bearing premise
The class evidence maps and 2D projection layer faithfully capture the model's actual decision process and representation structure without introducing artifacts or distortions.
What would settle it
Finding cases where the class evidence maps highlight image regions unrelated to the predicted retinal condition, while the model still achieves high accuracy, would show the maps are not faithful.
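One common way to run such a check is a deletion test: remove input features in order of decreasing attributed evidence and watch how fast the model score falls. The sketch below uses a toy linear scorer rather than any retinal model; `deletion_curve`, `curve_auc`, the weights, and the zero baseline are hypothetical choices made for illustration. A faithful map should produce a lower area under the deletion curve than a map that ranks features incorrectly.

```python
# Illustrative deletion test on a toy linear model (not the paper's protocol).
# Assumption: "deleting" a feature means replacing it with a baseline of 0.


def deletion_curve(score_fn, x, attribution, baseline=0.0):
    """Scores after deleting features from most to least attributed."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    current = list(x)
    scores = [score_fn(current)]
    for i in order:
        current[i] = baseline
        scores.append(score_fn(current))
    return scores


def curve_auc(scores):
    """Trapezoidal area under the curve, x-axis scaled to [0, 1]."""
    steps = len(scores) - 1
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(steps))
    return area / steps


weights = [3.0, 1.0, 0.0, 2.0]  # hypothetical model weights
x = [1.0, 1.0, 1.0, 1.0]        # hypothetical input


def score(features):
    return sum(w * f for w, f in zip(weights, features))


faithful_map = [w * f for w, f in zip(weights, x)]  # true contributions
unfaithful_map = [0.0, 2.0, 3.0, 1.0]               # deliberately wrong ranking

faithful_auc = curve_auc(deletion_curve(score, x, faithful_map))
unfaithful_auc = curve_auc(deletion_curve(score, x, unfaithful_map))
```

Here the faithful ordering drives the score down quickly while the wrong ordering leaves it high for longer, which is the gap the deletion AUC is meant to expose.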
Original abstract
Foundation models are used to extract transferable representations from large amounts of unlabeled data, typically via self-supervised learning (SSL). However, many of these models rely on architectures that offer limited interpretability, which is a critical issue in high-stakes domains such as medical imaging. We propose Dual-IFM, a foundation model that is interpretable-by-design in two ways: First, it provides local interpretability for individual images through class evidence maps that are faithful to the decision-making process. Second, it provides global interpretability for entire datasets through a 2D projection layer that allows for direct visualization of the model's representation space. We trained our model on over 800,000 color fundus photographs from various sources to learn generalizable, interpretable representations for different downstream tasks. Our results show that our model reaches a performance range similar to that of state-of-the-art foundation models with up to $16\times$ the number of parameters, while providing interpretable predictions on out-of-distribution data. Our results suggest that large-scale SSL pretraining paired with inherent interpretability can lead to robust representations for retinal imaging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Dual-IFM, a foundation model for retinal fundus images trained via self-supervised learning on over 800,000 images from multiple sources. It claims two forms of inherent interpretability: class evidence maps that are faithful to the decision process for local explanations, and a 2D projection layer for global visualization of the representation space. The central empirical claim is that the model achieves performance comparable to state-of-the-art foundation models with up to 16 times more parameters while delivering interpretable predictions on out-of-distribution data.
Significance. If the interpretability claims receive rigorous quantitative support, the work would be significant for medical computer vision. It attempts to combine large-scale SSL pretraining with built-in interpretability mechanisms, addressing a key barrier to deploying foundation models in high-stakes clinical settings such as retinal disease screening.
Major comments (3)
- [Abstract and Methods] The assertion that class evidence maps are 'faithful to the decision-making process' is not accompanied by quantitative faithfulness evaluations. No deletion/insertion AUC curves, no comparison against occlusion-based ground truth, and no controlled ablations that isolate whether the maps change when the model is forced to rely on different features are reported.
- [Methods, 2D projection layer description] The claim that the 2D projection preserves the structure of the learned representation space without introducing misleading artifacts lacks supporting metrics. No nearest-neighbor preservation scores, no comparison of clustering quality (e.g., silhouette score or k-NN accuracy) before versus after projection, and no sensitivity analysis to projection hyperparameters are provided.
- [Results] The headline performance claim of reaching a 'performance range similar' to models with up to 16× the parameters is stated without tabulated quantitative comparisons, exact accuracy/F1/AUC numbers on the downstream tasks, or ablation studies isolating the contribution of the interpretability components versus standard SSL backbones.
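As an illustration of the nearest-neighbor preservation score named in the second major comment, the sketch below measures the fraction of each point's k nearest neighbors in the high-dimensional space that remain among its k nearest neighbors after projection to 2D. The function names `knn_indices` and `knn_preservation`, the brute-force search, and the toy coordinates are assumptions chosen for clarity; the report does not prescribe a specific formula.

```python
# Illustrative k-NN preservation score between two embeddings of the same
# points. Brute-force search; intended for small examples only.
import math


def knn_indices(points, k):
    """For each point, the set of indices of its k nearest other points."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def knn_preservation(high_dim, low_dim, k):
    """Mean fraction of high-dimensional k-NN kept in the low-dim space."""
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    overlaps = [len(h & l) / k for h, l in zip(high, low)]
    return sum(overlaps) / len(overlaps)


# Hypothetical data: two well-separated pairs of points in 3D.
features = [(0, 0, 0), (0, 0, 1), (10, 0, 0), (10, 0, 1)]
good_2d = [(0, 0), (0, 1), (10, 0), (10, 1)]   # keeps each pair together
bad_2d = [(0, 0), (5, 0), (0.5, 0), (5.5, 0)]  # splits and re-mixes the pairs

good_score = knn_preservation(features, good_2d, k=1)
bad_score = knn_preservation(features, bad_2d, k=1)
```

A score near 1 indicates that local neighborhoods survive the projection, while a score near 0 indicates that the 2D picture rearranges them; reporting the score for several values of k would show how far the preservation extends beyond immediate neighbors.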
Minor comments (2)
- [Abstract] The abstract references 'up to 16× the number of parameters' but does not state the parameter count of Dual-IFM or the exact counts of the compared foundation models.
- Figure captions for the class evidence maps and 2D projections should explicitly state the visualization technique, color mapping, and how the displayed maps relate to the model's output logits or probabilities.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to provide the requested quantitative support.
- Referee: [Abstract and Methods] The assertion that class evidence maps are 'faithful to the decision-making process' is not accompanied by quantitative faithfulness evaluations. No deletion/insertion AUC curves, no comparison against occlusion-based ground truth, and no controlled ablations that isolate whether the maps change when the model is forced to rely on different features are reported.
  Authors: We agree that quantitative faithfulness evaluations are necessary to rigorously support the claim. The current manuscript emphasizes the architectural design that ensures the evidence maps are derived directly from the model's decision process, but we acknowledge the absence of empirical metrics. In the revised version, we will add deletion/insertion AUC curves, comparisons against occlusion-based ground truth, and controlled ablations that demonstrate how the maps respond when the model is forced to rely on different features.
  Revision: yes
- Referee: [Methods, 2D projection layer description] The claim that the 2D projection preserves the structure of the learned representation space without introducing misleading artifacts lacks supporting metrics. No nearest-neighbor preservation scores, no comparison of clustering quality (e.g., silhouette score or k-NN accuracy) before versus after projection, and no sensitivity analysis to projection hyperparameters are provided.
  Authors: We concur that additional quantitative metrics are required to validate the 2D projection layer. While the layer is intended to preserve representation structure, the manuscript currently lacks explicit supporting analyses. In the revision, we will include nearest-neighbor preservation scores, comparisons of clustering quality (silhouette score and k-NN accuracy) before versus after projection, and a sensitivity analysis to projection hyperparameters.
  Revision: yes
- Referee: [Results] The headline performance claim of reaching a 'performance range similar' to models with up to 16× the parameters is stated without tabulated quantitative comparisons, exact accuracy/F1/AUC numbers on the downstream tasks, or ablation studies isolating the contribution of the interpretability components versus standard SSL backbones.
  Authors: We recognize that the performance claims require more detailed quantitative backing. The manuscript reports that performance reaches a similar range to larger models, but does not provide tabulated exact metrics or isolating ablations. In the revised Results section, we will add tabulated comparisons with exact accuracy, F1, and AUC numbers on the downstream tasks, along with ablation studies that isolate the contribution of the interpretability components versus standard SSL backbones.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
The paper's central claims rest on large-scale empirical SSL pretraining of Dual-IFM on 800k fundus images followed by downstream task evaluation, with performance compared to larger models and interpretability asserted via class evidence maps and 2D projections. No equations, fitted parameters renamed as predictions, or self-citation chains are presented that reduce any result to its own inputs by construction. The methodology is self-contained against external benchmarks through reported training and testing protocols rather than definitional or self-referential reductions.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "Dual-IFM... BagNet architecture... class evidence maps... 2D projection layer... t-SimCNE algorithm"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "class evidence maps that are faithful to the decision-making process"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.