M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation
Pith reviewed 2026-05-23 21:59 UTC · model grok-4.3
The pith
A single multi-modal LLM trained on conversational visual instructions performs chest X-ray report generation, visual grounding, and VQA while reaching state-of-the-art clinical accuracy in reports.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
M4CXR is trained on a visual instruction-following dataset that integrates various task-specific datasets in conversational format. This enables the model to support three tasks: medical report generation (using chain-of-thought prompting to identify findings before generating the report), visual grounding, and visual question answering. The model achieves state-of-the-art clinical accuracy in report generation and performs at levels comparable to specialized models in the other tasks, with adaptability to single-image, multi-image, and multi-study contexts.
What carries the argument
The integrated visual instruction-following dataset in conversational format that trains one multi-modal LLM to handle multiple CXR tasks, combined with chain-of-thought prompting that first identifies findings before report generation.
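To make the idea of a conversational training set concrete, the following is a minimal sketch of how samples from three task-specific datasets might be wrapped into one shared turn-based format and mixed. Everything here is an assumption for illustration: the class names `Turn` and `Conversation`, the prompt wording, and the bounding-box text format are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Conversation:
    images: List[str]  # paths to one or more CXR images
    turns: List[Turn] = field(default_factory=list)


def from_report_sample(image_paths: Sequence[str], report: str) -> Conversation:
    """Wrap a report-generation sample as a single question/answer exchange."""
    return Conversation(list(image_paths), [
        Turn("user", "Write a radiology report for the given chest X-ray."),
        Turn("assistant", report),
    ])


def from_vqa_sample(image_path: str, question: str, answer: str) -> Conversation:
    """Wrap a VQA sample; the question itself is the user turn."""
    return Conversation([image_path], [Turn("user", question), Turn("assistant", answer)])


def from_grounding_sample(image_path: str, phrase: str,
                          box: Tuple[float, float, float, float]) -> Conversation:
    """Wrap a grounding sample; the box is serialized as text (hypothetical format)."""
    x1, y1, x2, y2 = box
    return Conversation([image_path], [
        Turn("user", f"Where is '{phrase}' in the image?"),
        Turn("assistant", f"[{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]"),
    ])


def build_mixture(*task_sets: Sequence[Conversation], seed: int = 0) -> List[Conversation]:
    """Concatenate all task sets and shuffle them into one training list."""
    mixed = [conv for task in task_sets for conv in task]
    random.Random(seed).shuffle(mixed)
    return mixed
```

Because every task ends up as the same `Conversation` structure, a single training loop can consume the mixture without task-specific heads; whether this preserves per-task accuracy is exactly the premise examined below.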
Load-bearing premise
Combining different task datasets into one conversational training set lets the model learn all tasks without losing accuracy on any individual one.
What would settle it
A test on a held-out clinical benchmark for medical report generation where M4CXR is run without chain-of-thought prompting and its accuracy falls below that of prior specialized models.
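The settling test described above amounts to running the same model on the same inputs with and without the findings-first step. A minimal sketch of the two prompting arms follows, assuming the model is exposed as a callable that takes an image path and a message history and returns text. The `Model` signature, the prompt strings, and the `stub_model` are hypothetical stand-ins, not the paper's interface.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)
Model = Callable[[str, List[Message]], str]  # hypothetical model interface

# Hypothetical prompt wording; the paper's actual prompts may differ.
FINDINGS_PROMPT = "List the findings visible in this chest X-ray."
REPORT_PROMPT = "Write the findings section of the radiology report."


def generate_report(model: Model, image: str, use_cot: bool) -> str:
    """Generate a report either directly or after a findings-identification turn.

    The final report prompt is identical in both arms, so the only difference
    between them is whether the findings exchange is present in the history.
    """
    history: List[Message] = []
    if use_cot:
        history.append(("user", FINDINGS_PROMPT))
        findings = model(image, history)
        history.append(("assistant", findings))
    history.append(("user", REPORT_PROMPT))
    return model(image, history)


def stub_model(image: str, history: List[Message]) -> str:
    """Toy stand-in that only reports how much context it received."""
    if history[-1][1] == FINDINGS_PROMPT:
        return "findings: (placeholder)"
    return f"report after {len(history)} messages"
```

Calling `generate_report(stub_model, "img.png", use_cot=True)` and then with `use_cot=False` on every test study yields two sets of reports that can be scored with the same clinical accuracy metric, which is the comparison the referee report asks for.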
Original abstract
The rapid evolution of artificial intelligence, especially in large language models (LLMs), has significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis, previous studies have employed LLMs, but with limitations: either underutilizing the multi-tasking capabilities of LLMs or lacking clinical accuracy. This paper presents M4CXR, a multi-modal LLM designed to enhance CXR interpretation. The model is trained on a visual instruction-following dataset that integrates various task-specific datasets in a conversational format. As a result, the model supports multiple tasks such as medical report generation (MRG), visual grounding, and visual question answering (VQA). M4CXR achieves state-of-the-art clinical accuracy in MRG by employing a chain-of-thought prompting strategy, in which it identifies findings in CXR images and subsequently generates corresponding reports. The model is adaptable to various MRG scenarios depending on the available inputs, such as single-image, multi-image, and multi-study contexts. In addition to MRG, M4CXR performs visual grounding at a level comparable to specialized models and also demonstrates outstanding performance in VQA. Both quantitative and qualitative assessments reveal M4CXR's versatility in MRG, visual grounding, and VQA, while consistently maintaining clinical accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents M4CXR, a multi-modal LLM for chest X-ray interpretation trained on an integrated visual instruction-following dataset in conversational format. It supports multiple tasks including medical report generation (MRG), visual grounding, and VQA. The central claim is that M4CXR achieves state-of-the-art clinical accuracy in MRG via a chain-of-thought prompting strategy that first identifies findings in the CXR image and then generates the corresponding report; the model is also adaptable to single-image, multi-image, and multi-study MRG scenarios and performs comparably to specialized models in visual grounding while showing strong VQA results.
Significance. If the performance claims are substantiated with rigorous, isolated evaluations, the work would be significant for demonstrating that a single multi-modal LLM can handle diverse CXR tasks in a conversational format while preserving clinical accuracy, potentially reducing reliance on task-specific models. The conversational training approach and CoT strategy for MRG represent potentially reusable design choices if shown to generalize.
major comments (2)
- [Abstract] The assertion of 'state-of-the-art clinical accuracy in MRG' is unsupported by any numerical metrics, baseline comparisons, dataset specifications, or evaluation protocols, preventing verification of the claim.
- [Abstract] The attribution of SOTA clinical accuracy specifically to the chain-of-thought prompting strategy (identifying findings, then generating reports) lacks isolating evidence such as an ablation comparing CoT vs. standard prompting on the identical model and test set.
minor comments (1)
- [Abstract] The abstract would be strengthened by briefly naming the clinical accuracy metric (e.g., CheXbert F1 or RadGraph) and the primary baselines used to declare SOTA.
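For readers unfamiliar with label-based clinical accuracy metrics, the sketch below shows how a micro-averaged F1 over finding labels can be computed. It assumes each report has already been reduced to a set of positive finding labels, for example by a labeler such as CheXbert; that labeling step is not shown, and the function name `micro_f1` and the example labels are illustrative rather than the paper's evaluation code.

```python
from typing import Iterable, Set


def micro_f1(predicted: Iterable[Set[str]], reference: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-report sets of positive finding labels.

    Counts are pooled across all reports before computing F1, so frequent
    findings weigh more than rare ones.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)   # labels present in both
        fp += len(pred - ref)   # labels asserted only by the generated report
        fn += len(ref - pred)   # labels missed by the generated report
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed example: two generated reports versus their reference reports.
generated = [{"Edema", "Cardiomegaly"}, {"Pneumothorax"}]
references = [{"Edema"}, {"Pneumothorax", "Atelectasis"}]
score = micro_f1(generated, references)  # tp=2, fp=1, fn=1
```

Naming the metric and its averaging scheme in the abstract matters because micro- and macro-averaged scores over the same label set can rank models differently.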
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We agree that the claims require supporting details to be verifiable and will revise the abstract in the next version to address both points.
Point-by-point responses
- Referee: [Abstract] The assertion of 'state-of-the-art clinical accuracy in MRG' is unsupported by any numerical metrics, baseline comparisons, dataset specifications, or evaluation protocols, preventing verification of the claim.
  Authors: We agree that the abstract should be self-contained. The main text (Sections 4.1–4.3 and associated tables) reports the specific clinical accuracy metrics (via CheXbert/RadGraph factuality), baseline comparisons on MIMIC-CXR and other datasets, and the evaluation protocols used. We will revise the abstract to include concise numerical highlights and a brief mention of the evaluation setup. Revision: yes.
- Referee: [Abstract] The attribution of SOTA clinical accuracy specifically to the chain-of-thought prompting strategy (identifying findings, then generating reports) lacks isolating evidence such as an ablation comparing CoT vs. standard prompting on the identical model and test set.
  Authors: The paper presents the CoT strategy as the prompting method employed for the reported MRG results (Section 3.2). We do not provide an explicit ablation isolating CoT versus standard prompting on the same model and test set. We will revise the abstract to describe the prompting approach factually without claiming isolated causation for the SOTA result. Revision: yes.
Circularity Check
No significant circularity; empirical claims benchmarked externally
The paper presents an empirical multi-modal LLM trained on an integrated visual instruction-following dataset in conversational format, then evaluated on standard tasks including MRG, visual grounding, and VQA. The SOTA claim for clinical accuracy in MRG is attributed to a chain-of-thought prompting strategy at inference time and is positioned as a performance result against prior external models. No equations, parameter fits, or derivations are described that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The evaluation relies on external benchmarks, satisfying the condition for a self-contained result without circular reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- model training parameters
axioms (1)
- domain assumption: Multi-task training on conversational data improves performance across tasks, including clinical accuracy in report generation
Forward citations
Cited by 1 Pith paper
- RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
RA-RRG extracts key phrases with LLMs, retrieves them via multimodal similarity, and conditions report generation on them to achieve SOTA CheXbert scores and competitive RadGraph F1 on MIMIC-CXR and IU X-ray while sup...