pith. sign in

arxiv: 2603.18846 · v2 · submitted 2026-03-19 · 💻 cs.CV · cs.LG· stat.CO

Towards Interpretable Foundation Models for Retinal Fundus Images

Pith reviewed 2026-05-15 08:47 UTC · model grok-4.3

classification 💻 cs.CV cs.LGstat.CO
keywords foundation modelsretinal fundus imagesinterpretabilityself-supervised learningclass evidence maps2D projection layerout-of-distributionmedical imaging
0
0 comments X

The pith

A foundation model for retinal fundus images achieves performance comparable to much larger state-of-the-art models while offering built-in interpretability through class evidence maps and 2D projections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Dual-IFM, a foundation model designed for retinal fundus images that prioritizes interpretability from the ground up. Trained via self-supervised learning on more than 800,000 images from various sources, it learns transferable representations for multiple downstream tasks. The model delivers local explanations by generating class evidence maps that directly reflect its decision process for each image. It also enables global understanding by projecting the entire representation space into 2D for visualization. Results indicate it performs on par with foundation models up to 16 times larger, even on out-of-distribution data, suggesting that interpretability can be integrated without major performance trade-offs in medical imaging.

Core claim

Dual-IFM is a foundation model interpretable-by-design: it generates class evidence maps faithful to the decision-making process for individual images and includes a 2D projection layer for direct visualization of the representation space across datasets. Pretrained on over 800,000 color fundus photographs, the model attains performance levels similar to state-of-the-art foundation models that use up to 16 times more parameters while maintaining interpretable predictions even on out-of-distribution data. This demonstrates that large-scale self-supervised pretraining combined with inherent interpretability mechanisms produces robust representations suitable for retinal imaging applications.

What carries the argument

The Dual-IFM architecture, featuring self-supervised pretraining combined with a mechanism to produce class evidence maps for local interpretability and a dedicated 2D projection layer that visualizes the learned representation space without loss of structure.

If this is right

  • The model supports multiple downstream tasks in retinal analysis with explanations that align with its internal decisions.
  • Interpretability holds for data from sources not seen during training.
  • Performance parity with larger models shows that added interpretability does not require sacrificing scale or accuracy.
  • Such models can provide both local and global views of how representations organize medical image data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying the same dual interpretability design to other medical imaging domains like chest X-rays could yield similar benefits.
  • The 2D projections might help identify biases in training data distributions across different patient populations.
  • Clinicians could use the evidence maps to quickly verify model focus on relevant anatomical features like the optic disc or lesions.
  • Further scaling the pretraining data beyond 800,000 images may enhance both performance and the clarity of the visualizations.

Load-bearing premise

The class evidence maps and 2D projection layer faithfully capture the model's actual decision process and representation structure without introducing artifacts or distortions.

What would settle it

Finding cases where the class evidence maps highlight image regions unrelated to the predicted retinal condition, while the model still achieves high accuracy, would show the maps are not faithful.

Figures

Figures reproduced from arXiv: 2603.18846 by Camila Roa, Kerol Djoumessi, Philipp Berens, Samuel Ofosu Mensah.

Figure 1
Figure 1. Figure 1: Overview of our inherently-interpretable foundation model [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Global interpretability via representation space visualization [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Local interpretability through class evidence maps [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
read the original abstract

Foundation models are used to extract transferable representations from large amounts of unlabeled data, typically via self-supervised learning (SSL). However, many of these models rely on architectures that offer limited interpretability, which is a critical issue in high-stakes domains such as medical imaging. We propose Dual-IFM, a foundation model that is interpretable-by-design in two ways: First, it provides local interpretability for individual images through class evidence maps that are faithful to the decision-making process. Second, it provides global interpretability for entire datasets through a 2D projection layer that allows for direct visualization of the model's representation space. We trained our model on over 800,000 color fundus photography from various sources to learn generalizable, interpretable representations for different downstream tasks. Our results show that our model reaches a performance range similar to that of state-of-the-art foundation models with up to $16\times$ the number of parameters, while providing interpretable predictions on out-of-distribution data. Our results suggest that large-scale SSL pretraining paired with inherent interpretability can lead to robust representations for retinal imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Dual-IFM, a foundation model for retinal fundus images trained via self-supervised learning on over 800,000 images from multiple sources. It claims two forms of inherent interpretability: class evidence maps that are faithful to the decision process for local explanations, and a 2D projection layer for global visualization of the representation space. The central empirical claim is that the model achieves performance comparable to state-of-the-art foundation models with up to 16 times more parameters while delivering interpretable predictions on out-of-distribution data.

Significance. If the interpretability claims receive rigorous quantitative support, the work would be significant for medical computer vision. It attempts to combine large-scale SSL pretraining with built-in interpretability mechanisms, addressing a key barrier to deploying foundation models in high-stakes clinical settings such as retinal disease screening.

major comments (3)
  1. [Abstract and Methods] Abstract and Methods: The assertion that class evidence maps are 'faithful to the decision-making process' is not accompanied by quantitative faithfulness evaluations. No deletion/insertion AUC curves, no comparison against occlusion-based ground truth, and no controlled ablations that isolate whether the maps change when the model is forced to rely on different features are reported.
  2. [Methods] Methods (2D projection layer description): The claim that the 2D projection preserves the structure of the learned representation space without introducing misleading artifacts lacks supporting metrics. No nearest-neighbor preservation scores, no comparison of clustering quality (e.g., silhouette score or k-NN accuracy) before versus after projection, and no sensitivity analysis to projection hyperparameters are provided.
  3. [Results] Results: The headline performance claim of reaching a 'performance range similar' to models with up to 16× the parameters is stated without tabulated quantitative comparisons, exact accuracy/F1/AUC numbers on the downstream tasks, or ablation studies isolating the contribution of the interpretability components versus standard SSL backbones.
minor comments (2)
  1. [Abstract] The abstract references 'up to 16× the number of parameters' but does not state the parameter count of Dual-IFM or the exact counts of the compared foundation models.
  2. Figure captions for the class evidence maps and 2D projections should explicitly state the visualization technique, color mapping, and how the displayed maps relate to the model's output logits or probabilities.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and describe the revisions we will make to provide the requested quantitative support.

read point-by-point responses
  1. Referee: [Abstract and Methods] Abstract and Methods: The assertion that class evidence maps are 'faithful to the decision-making process' is not accompanied by quantitative faithfulness evaluations. No deletion/insertion AUC curves, no comparison against occlusion-based ground truth, and no controlled ablations that isolate whether the maps change when the model is forced to rely on different features are reported.

    Authors: We agree that quantitative faithfulness evaluations are necessary to rigorously support the claim. The current manuscript emphasizes the architectural design that ensures the evidence maps are derived directly from the model's decision process, but we acknowledge the absence of empirical metrics. In the revised version, we will add deletion/insertion AUC curves, comparisons against occlusion-based ground truth, and controlled ablations that demonstrate how the maps respond when the model is forced to rely on different features. revision: yes

  2. Referee: [Methods] Methods (2D projection layer description): The claim that the 2D projection preserves the structure of the learned representation space without introducing misleading artifacts lacks supporting metrics. No nearest-neighbor preservation scores, no comparison of clustering quality (e.g., silhouette score or k-NN accuracy) before versus after projection, and no sensitivity analysis to projection hyperparameters are provided.

    Authors: We concur that additional quantitative metrics are required to validate the 2D projection layer. While the layer is intended to preserve representation structure, the manuscript currently lacks explicit supporting analyses. In the revision, we will include nearest-neighbor preservation scores, comparisons of clustering quality (silhouette score and k-NN accuracy) before versus after projection, and a sensitivity analysis to projection hyperparameters. revision: yes

  3. Referee: [Results] Results: The headline performance claim of reaching a 'performance range similar' to models with up to 16× the parameters is stated without tabulated quantitative comparisons, exact accuracy/F1/AUC numbers on the downstream tasks, or ablation studies isolating the contribution of the interpretability components versus standard SSL backbones.

    Authors: We recognize that the performance claims require more detailed quantitative backing. The manuscript reports that performance reaches a similar range to larger models, but does not provide tabulated exact metrics or isolating ablations. In the revised Results section, we will add tabulated comparisons with exact accuracy, F1, and AUC numbers on the downstream tasks, along with ablation studies that isolate the contribution of the interpretability components versus standard SSL backbones. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims rest on large-scale empirical SSL pretraining of Dual-IFM on 800k fundus images followed by downstream task evaluation, with performance compared to larger models and interpretability asserted via class evidence maps and 2D projections. No equations, fitted parameters renamed as predictions, or self-citation chains are presented that reduce any result to its own inputs by construction. The methodology is self-contained against external benchmarks through reported training and testing protocols rather than definitional or self-referential reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no specific free parameters, axioms, or invented entities can be extracted or verified.

pith-pipeline@v0.9.0 · 5502 in / 1069 out tokens · 43118 ms · 2026-05-15T08:47:40.034485+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    Nature Machine Intelligence5(9), 1006– 1019 (2023)

    Achtibat, R., Dreyer, M., Eisenbraun, I., Bosse, S., Wiegand, T., Samek, W., Lapuschkin, S.: From attribution maps to human-understandable explanations through concept relevance propagation. Nature Machine Intelligence5(9), 1006– 1019 (2023)

  2. [2]

    Advances in neural information processing systems31 (2018)

    Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps. Advances in neural information processing systems31 (2018)

  3. [3]

    PloS one13(11), e0207982 (2018)

    Ahn, J.M., Kim, S., Ahn, K.S., Cho, S.H., Lee, K.B., Kim, U.S.: A deep learn- ing model for the detection of both advanced and early glaucoma using fundus photography. PloS one13(11), e0207982 (2018)

  4. [4]

    Scientific Reports14(1), 8484 (2024)

    Ayhan, M.S., Neubauer, J., Uzel, M.M., Gelisken, F., Berens, P.: Interpretable detection of Epiretinal Membrane from optical coherence tomography with deep neural networks. Scientific Reports14(1), 8484 (2024)

  5. [5]

    IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013)

    Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013)

  6. [6]

    In: International Conference on Learning Representa- tions (2023)

    Böhm, J.N., Berens, P., Kobak, D.: Unsupervised visualization of image datasets using contrastive learning. In: International Conference on Learning Representa- tions (2023)

  7. [7]

    Bommasani, R., et al.: On the opportunities and risks of foundation models (2022)

  8. [8]

    International Conference on Learning Rep- resentations (2019)

    Brendel, W., Bethge, M.: Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. International Conference on Learning Rep- resentations (2019)

  9. [9]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visu- alization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 782–791 (June 2021)

  10. [10]

    In: International Conference on Machine Learning

    Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: International Conference on Machine Learning. pp. 1597–1607. PmLR (2020)

  11. [11]

    Journal of diabetes science and technology3(3), 509–516 (2009)

    Cuadros, J., Bresnick, G.: EyePACS: an adaptable telemedicine system for diabetic retinopathy screening. Journal of diabetes science and technology3(3), 509–516 (2009)

  12. [12]

    Image Analysis & Stereology pp

    Decencière, E., Zhang, X., Cazuguel, G., Lay, B., Cochener, B., Trone, C., Gain, P., Ordóñez-Varela, J.R., Massin, P., Erginay, A., et al.: Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology pp. 231–234 (2014) 10 S.O. Mensah et al

  13. [13]

    Djoumessi, K., Huang, Z., Kühlewein, L., Rickmann, A., Simon, N., Koch, L.M., Berens, P.: An inherently interpretable AI model improves screening speed and accuracyforearlydiabeticretinopathy.PLOSDigitalHealth4(5),e0000831(2025)

  14. [14]

    In: Medical Imaging with Deep Learning (2023)

    Donteu,K.R.D., Ilanchezian,I.,Kühlewein,L., Faber, H.,Baumgartner,C.F.,Bah, B., Berens, P., Koch, L.M.: Sparse activations for interpretable disease grading. In: Medical Imaging with Deep Learning (2023)

  15. [15]

    In: Interna- tional conference on medical image computing and computer-assisted intervention

    Du, J., Guo, J., Zhang, W., Yang, S., Liu, H., Li, H., Wang, N.: Ret-clip: A retinal image foundation model pre-trained with clinical diagnostic reports. In: Interna- tional conference on medical image computing and computer-assisted intervention. pp. 709–719. Springer (2024)

  16. [16]

    Nature Communications 16(1), 6862 (Jul 2025)

    Engelmann, J., Bernabeu, M.O.: Training a high-performance retinal foundation model with half-the-data and 400 times less compute. Nature Communications 16(1), 6862 (Jul 2025)

  17. [17]

    Journal of Open Source Software10(108), 7101 (Apr 2025)

    Gervelmeyer, J., Müller, S., Huang, Z., Berens, P.: Fundus image toolbox: A python package for fundus image processing. Journal of Open Source Software10(108), 7101 (Apr 2025)

  18. [18]

    Goha, E.F., Chen, Z., Lima, W.X.: APTOS 2019 blindness detection competition dataset (Dec 2024)

  19. [19]

    Philosophy of Medicine4(1) (2023)

    Grote, T.: The allure of simplicity: On interpretable machine learning models in healthcare. Philosophy of Medicine4(1) (2023)

  20. [20]

    Scientific data9(1), 475 (2022)

    Jin, K., Huang, X., Zhou, J., Li, Y., Yan, Y., Sun, Y., Zhang, Q., Wang, Y., Ye, J.: FIVES: A fundus image dataset for artificial intelligence based vessel segmentation. Scientific data9(1), 475 (2022)

  21. [21]

    Information Fusion122, 103184 (2025)

    Kazmierczak, R., Berthier, E., Frehse, G., Franchi, G.: Explainability and vision foundation models: A survey. Information Fusion122, 103184 (2025)

  22. [22]

    Na- ture communications10(1), 5416 (2019)

    Kobak, D., Berens, P.: The art of using t-SNE for single-cell transcriptomics. Na- ture communications10(1), 5416 (2019)

  23. [23]

    Scientific Data 9(1), 291 (2022)

    Kovalyk, O., Morales-Sánchez, J., Verdú-Monedero, R., Sellés-Navarro, I., Palazón- Cabanes, A., Sancho-Gómez, J.L.: PAPILA: Dataset with fundus images and clin- ical data of both eyes of the same patient for glaucoma assessment. Scientific Data 9(1), 291 (2022)

  24. [24]

    Patterns3(6) (2022)

    Liu, R., Wang, X., Wu, Q., Dai, L., Fang, X., Yan, T., Son, J., Tang, S., Li, J., Gao, Z., et al.: Deepdrid: Diabetic retinopathy—grading and image quality estimation challenge. Patterns3(6) (2022)

  25. [25]

    In: International Conference on Learning Representations (2019)

    Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019)

  26. [26]

    Data3(3), 25 (2018)

    Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sahasrabuddhe, V., Meriaudeau, F.: Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data3(3), 25 (2018)

  27. [27]

    Nature machine intelligence1(5), 206–215 (2019)

    Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence1(5), 206–215 (2019)

  28. [28]

    NPJ digital medicine8(1), 381 (2025)

    Shi, D., Zhang, W., Yang, J., Huang, S., Chen, X., Xu, P., Jin, K., Lin, S., Wei, J., Yusufu, M., et al.: A multimodal visual–language foundation model for compu- tational ophthalmology. NPJ digital medicine8(1), 381 (2025)

  29. [29]

    Nature Biomed- ical Engineering pp

    Sun, Y., Tan, W., Gu, Z., He, R., Chen, S., Pang, M., Yan, B.: A data-efficient strategy for building high-performing medical foundation models. Nature Biomed- ical Engineering pp. 1–13 (2025)

  30. [30]

    The Age-Related Eye Disease Study Research Group: The age-related eye disease study (AREDS): design implications AREDS report no. 1. Controlled clinical trials 20(6), 573 (1999) Towards Interpretable Foundation Models for Retinal Fundus Images 11

  31. [31]

    Eye37(10), 2109–2116 (2023)

    Warwick, A.N., Curran, K., Hamill, B., Stuart, K., Khawaja, A.P., Foster, P.J., Lotery, A.J., Quinn, M., Madhusudhan, S., Balaskas, K., et al.: UK Biobank retinal imaging grading: methodology, baseline characteristics and findings for common ocular diseases. Eye37(10), 2109–2116 (2023)

  32. [32]

    You, Y., Gitman, I., Ginsburg, B.: Large batch training of convolutional networks (2017)

  33. [33]

    Nature622(7981), 156–163 (2023)

    Zhou, Y., Chia, M.A., Wagner, S.K., Ayhan, M.S., Williamson, D.J., Struyven, R.R., Liu, T., Xu, M., Lozano, M.G., Woodward-Court, P., et al.: A foundation model for generalizable disease detection from retinal images. Nature622(7981), 156–163 (2023)