Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

Clare B. Poynton; Kayhan Batmanghelich; Shantanu Ghosh; Shyam Visweswaran

arxiv: 2405.12255 · v2 · pith:VZBHO2L2new · submitted 2024-05-20 · 📡 eess.IV · cs.CV

Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich

keywords data efficiency · mammo-clip · robustness · breast cancer · clip · datasets

The lack of large and diverse training data for Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns impeding the adoption of such systems. Recently, pre-training on large-scale image-text datasets with vision-language models (VLMs), e.g., CLIP, has partially addressed the issues of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial number of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to those of CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of the representation with sentence-level granularity within mammography reports. Code is publicly available: https://github.com/batmanlab/Mammo-CLIP.
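
For readers unfamiliar with CLIP-style pre-training on image-text pairs, the following is a minimal sketch of the generic symmetric contrastive objective, written in plain Python for illustration. It is not taken from the Mammo-CLIP repository and is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the temperature value of 0.07 are all assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-report pair; every other combination is treated as a negative.
    The temperature default is a hypothetical choice for this sketch.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = diagonal_cross_entropy(logits)
    text_to_image = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: two pairs whose embeddings either match or are swapped.
aligned_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is near zero when each image embedding matches its paired text embedding and large when the pairs are swapped, which is the behavior that pulls matched image-report pairs together during pre-training.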

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Boosting Ultrasound Image Classification via Attribute-Guided Dual-Branch Framework
cs.CV · 2026-07 · conditional · novelty 5.0

An attribute-guided dual-branch framework fuses a standard classifier with an interpretable attribute-prior branch to boost ultrasound classification accuracy and explainability.