CapCLIP: A Vision-Language Representation Alignment Approach for Wireless Capsule Endoscopy Analysis
CapCLIP uses pathology-aware text captions to align WCE images in a vision-language space, outperforming standard models in zero-shot classification and retrieval on unseen data.
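The zero-shot classification described above follows the usual CLIP-style recipe: embed the image and one caption per class into a shared space, then rank classes by cosine similarity. The paper's actual encoders and captions are not reproduced here; the sketch below uses random toy embeddings and hypothetical class names ("polyp", "ulcer", "normal") purely to illustrate the scoring step.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, class_text_embs, temperature=0.07):
    """Score one image embedding against one text embedding per class
    (CLIP-style): cosine similarity, temperature scaling, softmax."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(class_text_embs)
    logits = txt @ img / temperature          # (num_classes,)
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    return probs / probs.sum()

# Toy embeddings; a real pipeline would use the trained image/text encoders.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=8)
class_text_embs = rng.normal(size=(3, 8))     # e.g. "polyp", "ulcer", "normal"
probs = zero_shot_classify(image_emb, class_text_embs)
```

The predicted class is simply `probs.argmax()`; no task-specific training is needed, which is what makes the approach zero-shot on unseen data.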