CapCLIP uses pathology-aware text captions to align WCE images in a vision-language space, outperforming standard models in zero-shot classification and retrieval on unseen data.
Eyeclip: A visual-language foundation model for multi-modal ophthalmic image analysis
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.
citing papers explorer
-
CapCLIP: A Vision-Language Representation Alignment Approach for Wireless Capsule Endoscopy Analysis
CapCLIP uses pathology-aware text captions to align WCE images in a vision-language space, outperforming standard models in zero-shot classification and retrieval on unseen data.
-
Representation learning from OCT images
A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.