arxiv: 2405.11338 · v2 · pith:XAWXIYZSnew · submitted 2024-05-18 · 💻 cs.CV · cs.AI

EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

classification 💻 cs.CV cs.AI
keywords eyefound multimodal foundation model models images imaging ophthalmic

Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separate weights for each imaging modality, preventing a comprehensive representation of multi-modal features. This highlights the need for versatile foundation models capable of handling various tasks and modalities in ophthalmology. To address this gap, we present EyeFound, a multimodal foundation model for ophthalmic images. Unlike existing models, EyeFound learns generalizable representations from unlabeled multimodal retinal images, enabling efficient model adaptation across multiple applications. Trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities, EyeFound facilitates generalist representations and diverse multimodal downstream tasks, even for detecting challenging rare diseases. It outperforms previous work RETFound in diagnosing eye diseases, predicting systemic disease incidents, and zero-shot multimodal VQA. EyeFound provides a generalizable solution to improve model performance and lessen the annotation burden on experts, facilitating widespread clinical AI applications for retinal imaging.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. EyeMVP: OCT-Informed Fundus Representation Learning via Paired CFP--OCT Pretraining

    cs.CV 2026-06 unverdicted novelty 6.0

    EyeMVP learns OCT-informed CFP representations via cross-modal masked reconstruction on 674k paired triples and reports competitive or superior performance on 15 retinal classification and segmentation tasks.

  2. Representation learning from OCT images

    cs.CV 2026-05 unverdicted novelty 3.0

    A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.