CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

Arif Mahmood; Fayaz Ali Dharejo; Iyyakutti Iyappan Ganapathi; Mohammed Bennamoun; Naoufel Werghi; Sajid Javed

arXiv: 2406.05205 · v1 · pith:CSFKXUB3new · submitted 2024-06-07 · cs.CV · cs.CL · cs.LG · cs.MM · eess.IV

classification: cs.CV · cs.CL · cs.LG · cs.MM · eess.IV

keywords: cplip, histopathology, images, learning, models, vision-language, across, alignment

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
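
The many-to-many contrastive step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic multi-positive InfoNCE-style objective in which each image may match several texts and each text may match several images. The function names (`cosine`, `multi_positive_nce`, `many_to_many_loss`), the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_nce(sim_rows, pos_rows, temperature):
    """Mean over rows of -log(positive softmax mass / total softmax mass).

    Assumes every row has at least one positive; otherwise log(0) is hit.
    """
    total = 0.0
    for sims, pos in zip(sim_rows, pos_rows):
        logits = [s / temperature for s in sims]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        pos_mass = sum(e for e, p in zip(exps, pos) if p)
        total += -math.log(pos_mass / sum(exps))
    return total / len(sim_rows)


def many_to_many_loss(image_embs, text_embs, positives, temperature=0.07):
    """Symmetric loss: image-to-text plus text-to-image, averaged.

    positives[i][j] is 1 when image i and text j are considered a match
    (an assumed pairing mask, not something taken from the paper).
    """
    sim = [[cosine(img, txt) for txt in text_embs] for img in image_embs]
    sim_t = [list(col) for col in zip(*sim)]
    pos_t = [list(col) for col in zip(*positives)]
    image_side = multi_positive_nce(sim, positives, temperature)
    text_side = multi_positive_nce(sim_t, pos_t, temperature)
    return 0.5 * (image_side + text_side)


# Toy data: images 0 and 1 both match text 0; image 2 matches text 1.
images = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
mask = [[1, 0], [1, 0], [0, 1]]

loss_aligned = many_to_many_loss(images, texts, mask)
loss_swapped = many_to_many_loss(images, texts[::-1], mask)
```

In this toy setup the loss is lower when the text embeddings agree with the pairing mask than when the two texts are swapped, which is the behavior a contrastive alignment objective is meant to reward.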
