Holistic Evaluation of GPT-4V for Biomedical Imaging

Bao Ge; Chao Cao; Chong Ma; Dajiang Zhu; Dinggang Shen; Gang Li; Haixing Dai; Hanqi Jiang; Hao He; Huan Zhao

arxiv: 2312.05256 · v1 · pith:HHLQQOECnew · submitted 2023-11-10 · 📡 eess.IV · cs.AI

Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, Xinyu Wang, Xu Zhang, Lin Zhao, Yiheng Liu, Kai Zhang, Liheng Yan, Lichao Sun, Jun Liu, Ning Qiang, Bao Ge, Xiaoyan Cai, Shijie Zhao, Xintao Hu, Yixuan Yuan, Gang Li, Shu Zhang, Xin Zhang, Xi Jiang, Tuo Zhang, Dinggang Shen, Quanzheng Li, Wei Liu, Xiang Li, Dajiang Zhu, Tianming Liu

classification 📡 eess.IV cs.AI

keywords gpt-4v · biomedical · evaluation · imaging · anatomy · applications · diagnosis · disease

In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Applications of Large Language Models in Radiation Oncology: From Workflow Automation to Clinical Intelligence
physics.med-ph · 2026-04 · unverdicted · novelty 2.0

This review summarizes how large language models are being used for workflow automation, clinical decision support, and patient engagement in radiation oncology.