M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

Byungmu Yoon; Jihun Hyun; Jonggwon Park; Kyoyun Choi; Soobum Kim

arxiv: 2408.16213 · v1 · submitted 2024-08-29 · 💻 cs.CV · cs.AI· cs.CL

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

Jonggwon Park , Soobum Kim , Byungmu Yoon , Jihun Hyun , Kyoyun Choi This is my paper

Pith reviewed 2026-05-23 21:59 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CL

keywords multi-modal LLMchest X-raymedical report generationvisual groundingvisual question answeringchain-of-thought promptingmulti-task learning

0 comments

The pith

A single multi-modal LLM trained on conversational visual instructions performs chest X-ray report generation, visual grounding, and VQA while reaching state-of-the-art clinical accuracy in reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that one multi-modal large language model, when trained on a dataset merging multiple chest X-ray tasks into conversational format, can handle medical report generation, visual grounding, and visual question answering without separate specialized models for each. A sympathetic reader would care because this multi-task setup could simplify AI tools in healthcare by cutting the need for multiple systems. If correct, it means one model adapts to different input scenarios like single or multiple images and still produces clinically accurate outputs through chain-of-thought reasoning that identifies findings first.

Core claim

M4CXR is trained on a visual instruction-following dataset that integrates various task-specific datasets in conversational format. This enables the model to support medical report generation by using chain-of-thought prompting to identify findings before generating reports, visual grounding, and visual question answering. The model achieves state-of-the-art clinical accuracy in report generation and performs at levels comparable to specialized models in the other tasks, with adaptability to single-image, multi-image, and multi-study contexts.

What carries the argument

The integrated visual instruction-following dataset in conversational format that trains one multi-modal LLM to handle multiple CXR tasks, combined with chain-of-thought prompting that first identifies findings before report generation.

Load-bearing premise

Combining different task datasets into one conversational training set lets the model learn all tasks without losing accuracy on any individual one.

What would settle it

A test on a held-out clinical benchmark for medical report generation where M4CXR is run without chain-of-thought prompting and its accuracy falls below that of prior specialized models.

Figures

Figures reproduced from arXiv: 2408.16213 by Byungmu Yoon, Jihun Hyun, Jonggwon Park, Kyoyun Choi, Soobum Kim.

**Figure 1.** Figure 1: Overview of the multi-tasking capabilities of M4CXR. Facilitated by CoT prompting in MRG, M4CXR produces [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: (a) The architecture of M4CXR. Utilizing the LLaVA framework, it allows visual tokens from each image to be [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Example of multi-turn CoT prompting. M4CXR [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Examples of M4CXR’s performance in (a) visual grounding and (b) VQA. The images are selected from the test splits [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Examples of medical report generation across various scenarios. For the same study, the top left shows the result for [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Examples of visual grounding. The ground-truth bounding box is represented by a yellow solid box, while the predic [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Examples of medical report explanations using easy language. The left shows the results from M4CXR, while the [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Examples of medical report summarization. The left shows the results from M4CXR, while the right shows the results [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗

**Figure 9.** Figure 9: Examples of medical treatment recommendation. The left shows the results from M4CXR, while the right shows the [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗

read the original abstract

The rapid evolution of artificial intelligence, especially in large language models (LLMs), has significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis, previous studies have employed LLMs, but with limitations: either underutilizing the multi-tasking capabilities of LLMs or lacking clinical accuracy. This paper presents M4CXR, a multi-modal LLM designed to enhance CXR interpretation. The model is trained on a visual instruction-following dataset that integrates various task-specific datasets in a conversational format. As a result, the model supports multiple tasks such as medical report generation (MRG), visual grounding, and visual question answering (VQA). M4CXR achieves state-of-the-art clinical accuracy in MRG by employing a chain-of-thought prompting strategy, in which it identifies findings in CXR images and subsequently generates corresponding reports. The model is adaptable to various MRG scenarios depending on the available inputs, such as single-image, multi-image, and multi-study contexts. In addition to MRG, M4CXR performs visual grounding at a level comparable to specialized models and also demonstrates outstanding performance in VQA. Both quantitative and qualitative assessments reveal M4CXR's versatility in MRG, visual grounding, and VQA, while consistently maintaining clinical accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

M4CXR shows a workable way to train one multi-modal LLM on mixed conversational CXR data for several tasks, but the SOTA claim on report generation has no numbers or ablations attached.

read the letter

The paper's real contribution is the training setup: they combine existing task-specific CXR datasets into a single visual instruction-following collection in conversational format, then fine-tune a multi-modal LLM so the same weights handle medical report generation, visual grounding, and VQA. That avoids the usual pattern of separate models per task and lets the system switch between single-image, multi-image, and multi-study inputs for reports. The description of how they format the data and prompt the model for different scenarios is clear enough to be useful to someone already working on medical vision-language models. The adaptability claim is the part that feels practical rather than just asserted. The main weakness is the performance section. The abstract states that chain-of-thought prompting produces state-of-the-art clinical accuracy on report generation, yet it supplies no metrics, no baseline tables, no dataset sizes or splits, and no comparison that isolates the CoT step from the base model or the training mix. Without those, the causal claim about the prompting strategy cannot be checked. The stress-test note is on target here; the attribution stays unsecured. The paper also does not show whether the multi-task training actually preserves accuracy across tasks or whether one task starts to dominate. This work is aimed at labs already running similar multi-modal setups who might want to try the conversational data recipe. It is not yet at the point where a serious referee would get much out of it, because the central result is still an unverified assertion. I would not bring it to a reading group or cite it until the numbers and controls appear.

Referee Report

2 major / 1 minor

Summary. The paper presents M4CXR, a multi-modal LLM for chest X-ray interpretation trained on an integrated visual instruction-following dataset in conversational format. It supports multiple tasks including medical report generation (MRG), visual grounding, and VQA. The central claim is that M4CXR achieves state-of-the-art clinical accuracy in MRG via a chain-of-thought prompting strategy that first identifies findings in the CXR image and then generates the corresponding report; the model is also adaptable to single-image, multi-image, and multi-study MRG scenarios and performs comparably to specialized models in visual grounding while showing strong VQA results.

Significance. If the performance claims are substantiated with rigorous, isolated evaluations, the work would be significant for demonstrating that a single multi-modal LLM can handle diverse CXR tasks in a conversational format while preserving clinical accuracy, potentially reducing reliance on task-specific models. The conversational training approach and CoT strategy for MRG represent potentially reusable design choices if shown to generalize.

major comments (2)

[Abstract] Abstract: the assertion of 'state-of-the-art clinical accuracy in MRG' is unsupported by any numerical metrics, baseline comparisons, dataset specifications, or evaluation protocols, preventing verification of the claim.
[Abstract] Abstract: the attribution of SOTA clinical accuracy specifically to the chain-of-thought prompting strategy (identifying findings then generating reports) lacks isolating evidence such as an ablation comparing CoT vs. standard prompting on the identical model and test set.

minor comments (1)

[Abstract] The abstract would be strengthened by briefly naming the clinical accuracy metric (e.g., CheXbert F1 or RadGraph) and the primary baselines used to declare SOTA.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that the claims require supporting details to be verifiable and will revise the abstract in the next version to address both points.

read point-by-point responses

Referee: [Abstract] Abstract: the assertion of 'state-of-the-art clinical accuracy in MRG' is unsupported by any numerical metrics, baseline comparisons, dataset specifications, or evaluation protocols, preventing verification of the claim.

Authors: We agree that the abstract should be self-contained. The main text (Sections 4.1–4.3 and associated tables) reports the specific clinical accuracy metrics (via CheXbert/RadGraph factuality), baseline comparisons on MIMIC-CXR and other datasets, and the evaluation protocols used. We will revise the abstract to include concise numerical highlights and a brief mention of the evaluation setup. revision: yes
Referee: [Abstract] Abstract: the attribution of SOTA clinical accuracy specifically to the chain-of-thought prompting strategy (identifying findings then generating reports) lacks isolating evidence such as an ablation comparing CoT vs. standard prompting on the identical model and test set.

Authors: The paper presents the CoT strategy as the prompting method employed for the reported MRG results (Section 3.2). We do not provide an explicit ablation isolating CoT versus standard prompting on the same model and test set. We will revise the abstract to describe the prompting approach factually without claiming isolated causation for the SOTA result. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims benchmarked externally

full rationale

The paper presents an empirical multi-modal LLM trained on an integrated visual instruction-following dataset in conversational format, then evaluated on standard tasks including MRG, visual grounding, and VQA. The SOTA claim for clinical accuracy in MRG is attributed to a chain-of-thought prompting strategy at inference time and is positioned as a performance result against prior external models. No equations, parameter fits, or derivations are described that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The evaluation relies on external benchmarks, satisfying the condition for a self-contained result without circular reduction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim depends on the effectiveness of the chain-of-thought strategy and the multi-task dataset integration for achieving SOTA results.

free parameters (1)

model training parameters
LLM training involves many hyperparameters fitted during optimization.

axioms (1)

domain assumption Multi-task training on conversational data improves performance across tasks including clinical accuracy in report generation
Central to the paper's approach as described in the abstract.

pith-pipeline@v0.9.0 · 5786 in / 1115 out tokens · 38164 ms · 2026-05-23T21:59:58.690284+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
cs.CV 2025-04 unverdicted novelty 6.0

RA-RRG extracts key phrases with LLMs, retrieves them via multimodal similarity, and conditions report generation on them to achieve SOTA CheXbert scores and competitive RadGraph F1 on MIMIC-CXR and IU X-ray while sup...

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Alayrac, J.-B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; et al. 2022. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems, 35: 23716--23736

work page 2022
[4]

Bae, S.; Kyung, D.; Ryu, J.; Cho, E.; Lee, G.; Kweon, S.; Oh, J.; Ji, L.; Chang, E.; Kim, T.; et al. 2024. EHRXQA: A multi-modal question answering dataset for electronic health records with chest x-ray images. Advances in Neural Information Processing Systems, 36

work page 2024
[5]

Bannur, S.; Bouzid, K.; Castro, D. C.; Schwaighofer, A.; Bond-Taylor, S.; Ilse, M.; Pérez-García, F.; Salvatelli, V.; Sharma, H.; Meissen, F.; Ranjit, M.; Srivastav, S.; Gong, J.; Falck, F.; Oktay, O.; Thieme, A.; Lungren, M. P.; Wetscherek, M. T.; Alvarez-Valle, J.; and Hyland, S. L. 2024. MAIRA-2: Grounded Radiology Report Generation. arXiv:2406.04449

work page arXiv 2024
[6]

C.; Schwaighofer, A.; Hyland, S.; Wetscherek, M.; Naumann, T.; Nori, A.; Alvarez-Valle, J.; et al

Boecking, B.; Usuyama, N.; Bannur, S.; Castro, D. C.; Schwaighofer, A.; Hyland, S.; Wetscherek, M.; Naumann, T.; Nori, A.; Alvarez-Valle, J.; et al. 2022 a . Making the most of text semantics to improve biomedical vision--language processing. In European conference on computer vision, 1--21. Springer

work page 2022
[7]

T.; Naumann, T.; Nori, A.; Alvarez Valle, J.; Poon, H.; and Oktay, O

Boecking, B.; Usuyama, N.; Bannur, S.; Coelho de Castro, D.; Schwaighofer, A.; Hyland, S.; Wetscherek, M. T.; Naumann, T.; Nori, A.; Alvarez Valle, J.; Poon, H.; and Oktay, O. 2022 b . MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing (version 0.1). https://doi.org/10.13026/b90j-vb87

work page doi:10.13026/b90j-vb87 2022
[8]

Cha, J.; Kang, W.; Mun, J.; and Roh, B. 2024. Honeybee: Locality-enhanced projector for multimodal llm. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13817--13827

work page 2024
[9]

Chaves, J. M. Z.; Huang, S.-C.; Xu, Y.; Xu, H.; Usuyama, N.; Zhang, S.; Wang, F.; Xie, Y.; Khademi, M.; Yang, Z.; Awadalla, H.; Gong, J.; Hu, H.; Yang, J.; Li, C.; Gao, J.; Gu, Y.; Wong, C.; Wei, M.; Naumann, T.; Chen, M.; Lungren, M. P.; Chaudhari, A.; Yeung-Levy, S.; Langlotz, C. P.; Wang, S.; and Poon, H. 2024. Towards a clinically accessible radiology...

work page arXiv 2024
[10]

Chen, J.; Zhu, D.; Shen, X.; Li, X.; Liu, Z.; Zhang, P.; Krishnamoorthi, R.; Chandra, V.; Xiong, Y.; and Elhoseiny, M. 2023 a . MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning. arXiv:2310.09478

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

Chen, Z.; Song, Y.; Chang, T.-H.; and Wan, X. 2022. Generating Radiology Reports via Memory-driven Transformer. arXiv:2010.16056

work page arXiv 2022
[12]

A vision- language foundation model to enhance efficiency of chest x-ray interpretation.arXiv preprint arXiv:2401.12208, 2024

Chen, Z.; Varma, M.; Delbrouck, J.-B.; Paschali, M.; Blankemeier, L.; Veen, D. V.; Valanarasu, J. M. J.; Youssef, A.; Cohen, J. P.; Reis, E. P.; Tsai, E. B.; Johnston, A.; Olsen, C.; Abraham, T. M.; Gatidis, S.; Chaudhari, A. S.; and Langlotz, C. 2024. CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. arXiv:2401.12208

work page arXiv 2024
[13]

Chen, Z.; Zhou, Y.; Tran, A.; Zhao, J.; Wan, L.; Ooi, G. S. K.; Cheng, L. T.-E.; Thng, C. H.; Xu, X.; Liu, Y.; et al. 2023 b . Medical phrase grounding with region-phrase context contrastive alignment. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 371--381. Springer

work page 2023
[14]

Chowdhury, M. E. H.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M. A.; Mahbub, Z. B.; Islam, K. R.; Khan, M. S.; Iqbal, A.; Emadi, N. A.; Reaz, M. B. I.; and Islam, M. T. 2020. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access, 8: 132665--132676

work page 2020
[15]

Dao, T. 2023. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. arXiv:2307.08691

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Degerli, A.; Ahishali, M.; Yamac, M.; Kiranyaz, S.; Chowdhury, M. E. H.; Hameed, K.; Hamid, T.; Mazhar, R.; and Gabbouj, M. 2021. COVID-19 infection map generation and detection from chest X-ray images. Health Information Science and Systems, 9(1): 15

work page 2021
[17]

Deng, J.; Yang, Z.; Chen, T.; Zhou, W.; and Li, H. 2022. TransVG: End-to-End Visual Grounding with Transformers. arXiv:2104.08541

work page arXiv 2022
[18]

LoRA: Low-Rank Adaptation of Large Language Models

Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; and Chen, W. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685

work page internal anchor Pith review Pith/arXiv arXiv 2021
[19]

M.; and Zhu, Y

Hu, X.; Gu, L.; An, Q.; Zhang, M.; Liu, L.; Kobayashi, K.; Harada, T.; Summers, R. M.; and Zhu, Y. 2023. Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23. ACM

work page 2023
[20]

L.; Bannur, S.; Bouzid, K.; Castro, D

Hyland, S. L.; Bannur, S.; Bouzid, K.; Castro, D. C.; Ranjit, M.; Schwaighofer, A.; Pérez-García, F.; Salvatelli, V.; Srivastav, S.; Thieme, A.; Codella, N.; Lungren, M. P.; Wetscherek, M. T.; Oktay, O.; and Alvarez-Valle, J. 2024. MAIRA-1: A specialised large multimodal model for radiology report generation. arXiv:2311.13668

work page arXiv 2024
[21]

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; Seekins, J.; Mong, D. A.; Halabi, S. S.; Sandberg, J. K.; Jones, R.; Larson, D. B.; Langlotz, C. P.; Patel, B. N.; Lungren, M. P.; and Ng, A. Y. 2019. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comp...

work page internal anchor Pith review Pith/arXiv arXiv 2019
[22]

Jain, S.; Agrawal, A.; Saporta, A.; Truong, S.; Duong, D. N. D. N.; Bui, T.; Chambon, P.; Zhang, Y.; Lungren, M.; Ng, A.; Langlotz, C.; Rajpurkar, P.; and Rajpurkar, P. 2021. RadGraph: Extracting Clinical Entities and Relations from Radiology Reports. In Vanschoren, J.; and Yeung, S., eds., Proceedings of the Neural Information Processing Systems Track on...

work page 2021
[23]

Mistral 7B

Jiang, A. Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D. S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; Lavaud, L. R.; Lachaux, M.-A.; Stock, P.; Scao, T. L.; Lavril, T.; Wang, T.; Lacroix, T.; and Sayed, W. E. 2023. Mistral 7B. arXiv:2310.06825

work page internal anchor Pith review Pith/arXiv arXiv 2023
[24]

Jin, H.; Che, H.; Lin, Y.; and Chen, H. 2024. PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3): 2607--2615

work page 2024
[25]

Jing, B.; Xie, P.; and Xing, E. 2018. On the Automatic Generation of Medical Imaging Reports. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics

work page 2018
[26]

Johnson, A. E. W.; Pollard, T. J.; Berkowitz, S. J.; Greenbaum, N. R.; Lungren, M. P.; Deng, C.; Mark, R. G.; and Horng, S. 2019. MIMIC-CXR: A large publicly available database of labeled chest radiographs. CoRR, abs/1901.07042

work page internal anchor Pith review Pith/arXiv arXiv 2019
[27]

J.; Chang, J.; and Ye, J

Lee, S.; Kim, W. J.; Chang, J.; and Ye, J. C. 2024. LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation. arXiv:2305.11490

work page arXiv 2024
[28]

Li, J.; Li, D.; Savarese, S.; and Hoi, S. 2023 a . Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, 19730--19742. PMLR

work page 2023
[29]

Li, M.; Lin, B.; Chen, Z.; Lin, H.; Liang, X.; and Chang, X. 2023 b . Dynamic graph enhanced contrastive learning for chest x-ray report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3334--3343

work page 2023
[30]

Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, 74--81

work page 2004
[31]

Liu, B.; Zhan, L.; Xu, L.; Ma, L.; Yang, Y.; and Wu, X. 2021. SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering. CoRR, abs/2102.09542

work page arXiv 2021
[32]

Liu, H.; Li, C.; Wu, Q.; and Lee, Y. J. 2024. Visual instruction tuning. Advances in neural information processing systems, 36

work page 2024
[33]

Liu, J.; Lian, J.; and Yu, Y. 2020. ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities. arXiv:2006.10550

work page arXiv 2020
[34]

Q.; Lam, K.; Le, L

Nguyen, H. Q.; Lam, K.; Le, L. T.; Pham, H. H.; Tran, D. Q.; Nguyen, D. B.; Le, D. D.; Pham, C. M.; Tong, H. T. T.; Dinh, D. H.; Do, C. D.; Doan, L. T.; Nguyen, C. N.; Nguyen, B. T.; Nguyen, Q. V.; Hoang, A. D.; Phan, H. N.; Nguyen, A. T.; Ho, P. H.; Ngo, D. T.; Nguyen, N. T.; Nguyen, N. T.; Dao, M.; and Vu, V. 2022. VinDr-CXR: An open dataset of chest X-...

work page arXiv 2022
[35]

Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 311--318

work page 2002
[36]

Pellegrini, C.; Özsoy, E.; Busam, B.; Navab, N.; and Keicher, M. 2023. RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance. arXiv:2311.18681

work page arXiv 2023
[37]

Pellegrini, C.; Özsoy, E.; Busam, B.; Navab, N.; and Keicher, M. 2024. RaDialog Instruct Dataset (version 1.1.0). PhysioNet

work page 2024
[38]

C.; Schwaighofer, A.; Lungren, M

Pérez-García, F.; Sharma, H.; Bond-Taylor, S.; Bouzid, K.; Salvatelli, V.; Ilse, M.; Bannur, S.; Castro, D. C.; Schwaighofer, A.; Lungren, M. P.; Wetscherek, M.; Codella, N.; Hyland, S. L.; Alvarez-Valle, J.; and Oktay, O. 2024. RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision. arXiv:2401.10815

work page arXiv 2024
[39]

Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140): 1--67

work page 2020
[40]

Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Abul Kashem, S.; Islam, M.; Al Maadeed, S.; Zughaier, S.; Khan, M.; and Chowdhury, M. 2021. Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection using Chest X-rays Images. Computers in Biology and Medicine, 104319

work page 2021
[41]

P.; de Paiva, J

Reis, E. P.; de Paiva, J. P.; da Silva, M. C.; Ribeiro, G. A.; Paiva, V. F.; Bulgarelli, L.; Lee, H. M.; Santos, P. V.; Brito, V. M.; Amaral, L. T.; Beraldo, G. L.; Haidar Filho, J. N.; Teles, G. B.; Szarf, G.; Pollard, T.; Johnson, A. E.; Celi, L. A.; and Amaro, E. J. 2022. BRAX, Brazilian labeled chest x-ray dataset. Scientific Data, 9(1): 487

work page 2022
[42]

Shih, G.; Wu, C.; Halabi, S.; Kohli, M.; Prevedello, L.; Cook, T.; Sharma, A.; Amorosa, J.; Arteaga, V.; Galperin-Aizenberg, M.; Gill, R.; Godoy, M.; Hobbs, S.; Jeudy, J.; Laroia, A.; Shah, P.; Vummidi, D.; Yaddanapudi, K.; and Stein, A. 2019. Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumon...

work page 2019
[43]

Shin, H.-C.; Roberts, K.; Lu, L.; Demner-Fushman, D.; Yao, J.; and Summers, R. M. 2016. Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2016
[44]

Shiraishi, J.; Katsuragawa, S.; Ikezoe, J.; Matsumoto, T.; Kobayashi, T.; Komatsu, K.; Matsui, M.; Fujita, H.; Kodera, Y.; and Doi, K. 2000. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. AJR Am J Roentgenol, 174(1): 71--74

work page 2000
[45]

Y.; and Lungren, M

Smit, A.; Jain, S.; Rajpurkar, P.; Pareek, A.; Ng, A. Y.; and Lungren, M. P. 2020. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT. arXiv:2004.09167

work page arXiv 2020
[46]

Tahir, A.; Chowdhury, M.; Qiblawey, Y.; Khandakar, A.; Rahman, T.; Kiranyaz, S.; Khurshid, U.; Ibtehaz, N.; Mahmud, S.; and Ezeddin, M. 2021. COVID-QU-Ex Dataset. https://www.kaggle.com/datasets/tahirahmed/covidquex

work page 2021
[47]

Tanida, T.; M \"u ller, P.; Kaissis, G.; and Rueckert, D. 2023. Interactive and explainable region-guided radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7433--7442

work page 2023
[48]

Xraygpt: Chest radiographs summarization using medical vision-language models

Thawkar, O.; Shaker, A.; Mullappilly, S. S.; Cholakkal, H.; Anwer, R. M.; Khan, S.; Laaksonen, J.; and Khan, F. S. 2023. XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models. arXiv:2306.07971

work page arXiv 2023
[49]

Tu, T.; Azizi, S.; Driess, D.; Schaekermann, M.; Amin, M.; Chang, P.-C.; Carroll, A.; Lau, C.; Tanno, R.; Ktena, I.; et al. 2024. Towards generalist biomedical AI. NEJM AI, 1(3): AIoa2300138

work page 2024
[50]

Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; and Summers, R. M. 2017. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3462--3471

work page 2017
[51]

Wang, Z.; Liu, L.; Wang, L.; and Zhou, L. 2023. Metransformer: Radiology report generation by transformer with multiple learnable expert tokens. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11558--11567

work page 2023
[52]

V.; Zhou, D.; et al

Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q. V.; Zhou, D.; et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35: 24824--24837

work page 2022
[53]

Wu, C.; Zhang, X.; Zhang, Y.; Wang, Y.; and Xie, W. 2023. Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. arXiv:2308.02463

work page arXiv 2023
[54]

T.; Agu, N

Wu, J. T.; Agu, N. N.; Lourentzou, I.; Sharma, A.; Paguio, J. A.; Yao, J. S.; Dee, E. C.; Mitchell, W.; Kashyap, S.; Giovannini, A.; Celi, L. A.; and Moradi, M. 2021. Chest ImaGenome Dataset for Clinical Reasoning. arXiv:2108.00316

work page arXiv 2021
[55]

Yang, L.; Xu, S.; Sellergren, A.; Kohlberger, T.; Zhou, Y.; Ktena, I.; Kiraly, A.; Ahmed, F.; Hormozdiari, F.; Jaroensri, T.; et al. 2024. Advancing multimodal medical capabilities of Gemini. arXiv:2405.03162

work page arXiv 2024
[56]

You, H.; Zhang, H.; Gan, Z.; Du, X.; Zhang, B.; Wang, Z.; Cao, L.; Chang, S.-F.; and Yang, Y. 2024. Ferret: Refer and Ground Anything Anywhere at Any Granularity. In The Twelfth International Conference on Learning Representations

work page 2024
[57]

K.; Baek, W.; and Roh, B

You, K.; Gu, J.; Ham, J.; Park, B.; Kim, J.; Hong, E. K.; Baek, W.; and Roh, B. 2023. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training. In Medical Image Computing and Computer Assisted Intervention -- MICCAI 2023, 101--111. Springer Nature Switzerland

work page 2023
[58]

P.; Fonseca, E

Yu, F.; Endo, M.; Krishnan, R.; Pan, I.; Tsai, A.; Reis, E. P.; Fonseca, E. K. U. N.; Lee, H. M. H.; Abad, Z. S. H.; Ng, A. Y.; et al. 2023. Evaluating progress in automatic chest x-ray radiology report generation. Patterns, 4(9)

work page 2023
[59]

Zawacki, A.; Wu, C.; Shih, G.; Elliott, J.; Fomitchev, M.; Hussain, M.; ParasLakhani; Culliton, P.; and Bao, S. 2019. SIIM-ACR Pneumothorax Segmentation

work page 2019
[60]

Zhang, Y.; Wang, X.; Xu, Z.; Yu, Q.; Yuille, A.; and Xu, D. 2020. When Radiology Report Generation Meets Knowledge Graph. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07): 12910--12917

work page 2020
[61]

Zheng, Y.; Gan, W.; Chen, Z.; Qi, Z.; Liang, Q.; and Yu, P. S. 2024. Large Language Models for Medicine: A Survey. arXiv:2405.13055

work page arXiv 2024
[62]

Zhou, Z.; Shi, M.; Wei, M.; Alabi, O.; Yue, Z.; and Vercauteren, T. 2024. Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning. arXiv:2403.06728

work page arXiv 2024
[63]

Zhu, D.; Chen, J.; Shen, X.; Li, X.; and Elhoseiny, M. 2023. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv:2304.10592

work page internal anchor Pith review Pith/arXiv arXiv 2023

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.a...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Alayrac, J.-B.; Donahue, J.; Luc, P.; Miech, A.; Barr, I.; Hasson, Y.; Lenc, K.; Mensch, A.; Millican, K.; Reynolds, M.; et al. 2022. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems, 35: 23716--23736

work page 2022

[4] [4]

Bae, S.; Kyung, D.; Ryu, J.; Cho, E.; Lee, G.; Kweon, S.; Oh, J.; Ji, L.; Chang, E.; Kim, T.; et al. 2024. EHRXQA: A multi-modal question answering dataset for electronic health records with chest x-ray images. Advances in Neural Information Processing Systems, 36

work page 2024

[5] [5]

Bannur, S.; Bouzid, K.; Castro, D. C.; Schwaighofer, A.; Bond-Taylor, S.; Ilse, M.; Pérez-García, F.; Salvatelli, V.; Sharma, H.; Meissen, F.; Ranjit, M.; Srivastav, S.; Gong, J.; Falck, F.; Oktay, O.; Thieme, A.; Lungren, M. P.; Wetscherek, M. T.; Alvarez-Valle, J.; and Hyland, S. L. 2024. MAIRA-2: Grounded Radiology Report Generation. arXiv:2406.04449

work page arXiv 2024

[6] [6]

C.; Schwaighofer, A.; Hyland, S.; Wetscherek, M.; Naumann, T.; Nori, A.; Alvarez-Valle, J.; et al

Boecking, B.; Usuyama, N.; Bannur, S.; Castro, D. C.; Schwaighofer, A.; Hyland, S.; Wetscherek, M.; Naumann, T.; Nori, A.; Alvarez-Valle, J.; et al. 2022 a . Making the most of text semantics to improve biomedical vision--language processing. In European conference on computer vision, 1--21. Springer

work page 2022

[7] [7]

T.; Naumann, T.; Nori, A.; Alvarez Valle, J.; Poon, H.; and Oktay, O

Boecking, B.; Usuyama, N.; Bannur, S.; Coelho de Castro, D.; Schwaighofer, A.; Hyland, S.; Wetscherek, M. T.; Naumann, T.; Nori, A.; Alvarez Valle, J.; Poon, H.; and Oktay, O. 2022 b . MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing (version 0.1). https://doi.org/10.13026/b90j-vb87

work page doi:10.13026/b90j-vb87 2022

[8] [8]

Cha, J.; Kang, W.; Mun, J.; and Roh, B. 2024. Honeybee: Locality-enhanced projector for multimodal llm. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13817--13827

work page 2024

[9] [9]

Chaves, J. M. Z.; Huang, S.-C.; Xu, Y.; Xu, H.; Usuyama, N.; Zhang, S.; Wang, F.; Xie, Y.; Khademi, M.; Yang, Z.; Awadalla, H.; Gong, J.; Hu, H.; Yang, J.; Li, C.; Gao, J.; Gu, Y.; Wong, C.; Wei, M.; Naumann, T.; Chen, M.; Lungren, M. P.; Chaudhari, A.; Yeung-Levy, S.; Langlotz, C. P.; Wang, S.; and Poon, H. 2024. Towards a clinically accessible radiology...

work page arXiv 2024

[10] [10]

Chen, J.; Zhu, D.; Shen, X.; Li, X.; Liu, Z.; Zhang, P.; Krishnamoorthi, R.; Chandra, V.; Xiong, Y.; and Elhoseiny, M. 2023 a . MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning. arXiv:2310.09478

work page internal anchor Pith review Pith/arXiv arXiv 2023

[11] [11]

Chen, Z.; Song, Y.; Chang, T.-H.; and Wan, X. 2022. Generating Radiology Reports via Memory-driven Transformer. arXiv:2010.16056

work page arXiv 2022

[12] [12]

A vision- language foundation model to enhance efficiency of chest x-ray interpretation.arXiv preprint arXiv:2401.12208, 2024

Chen, Z.; Varma, M.; Delbrouck, J.-B.; Paschali, M.; Blankemeier, L.; Veen, D. V.; Valanarasu, J. M. J.; Youssef, A.; Cohen, J. P.; Reis, E. P.; Tsai, E. B.; Johnston, A.; Olsen, C.; Abraham, T. M.; Gatidis, S.; Chaudhari, A. S.; and Langlotz, C. 2024. CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. arXiv:2401.12208

work page arXiv 2024

[13] [13]

Chen, Z.; Zhou, Y.; Tran, A.; Zhao, J.; Wan, L.; Ooi, G. S. K.; Cheng, L. T.-E.; Thng, C. H.; Xu, X.; Liu, Y.; et al. 2023 b . Medical phrase grounding with region-phrase context contrastive alignment. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 371--381. Springer

work page 2023

[14] [14]

Chowdhury, M. E. H.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M. A.; Mahbub, Z. B.; Islam, K. R.; Khan, M. S.; Iqbal, A.; Emadi, N. A.; Reaz, M. B. I.; and Islam, M. T. 2020. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access, 8: 132665--132676

work page 2020

[15] [15]

Dao, T. 2023. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. arXiv:2307.08691

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16] [16]

Degerli, A.; Ahishali, M.; Yamac, M.; Kiranyaz, S.; Chowdhury, M. E. H.; Hameed, K.; Hamid, T.; Mazhar, R.; and Gabbouj, M. 2021. COVID-19 infection map generation and detection from chest X-ray images. Health Information Science and Systems, 9(1): 15

work page 2021

[17] [17]

Deng, J.; Yang, Z.; Chen, T.; Zhou, W.; and Li, H. 2022. TransVG: End-to-End Visual Grounding with Transformers. arXiv:2104.08541

work page arXiv 2022

[18] [18]

LoRA: Low-Rank Adaptation of Large Language Models

Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; and Chen, W. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685

work page internal anchor Pith review Pith/arXiv arXiv 2021

[19] [19]

M.; and Zhu, Y

Hu, X.; Gu, L.; An, Q.; Zhang, M.; Liu, L.; Kobayashi, K.; Harada, T.; Summers, R. M.; and Zhu, Y. 2023. Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23. ACM

work page 2023

[20] [20]

L.; Bannur, S.; Bouzid, K.; Castro, D

Hyland, S. L.; Bannur, S.; Bouzid, K.; Castro, D. C.; Ranjit, M.; Schwaighofer, A.; Pérez-García, F.; Salvatelli, V.; Srivastav, S.; Thieme, A.; Codella, N.; Lungren, M. P.; Wetscherek, M. T.; Oktay, O.; and Alvarez-Valle, J. 2024. MAIRA-1: A specialised large multimodal model for radiology report generation. arXiv:2311.13668

work page arXiv 2024

[21] [21]

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; Seekins, J.; Mong, D. A.; Halabi, S. S.; Sandberg, J. K.; Jones, R.; Larson, D. B.; Langlotz, C. P.; Patel, B. N.; Lungren, M. P.; and Ng, A. Y. 2019. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comp...

work page internal anchor Pith review Pith/arXiv arXiv 2019

[22] [22]

Jain, S.; Agrawal, A.; Saporta, A.; Truong, S.; Duong, D. N. D. N.; Bui, T.; Chambon, P.; Zhang, Y.; Lungren, M.; Ng, A.; Langlotz, C.; Rajpurkar, P.; and Rajpurkar, P. 2021. RadGraph: Extracting Clinical Entities and Relations from Radiology Reports. In Vanschoren, J.; and Yeung, S., eds., Proceedings of the Neural Information Processing Systems Track on...

work page 2021

[23] [23]

Mistral 7B

Jiang, A. Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D. S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; Lavaud, L. R.; Lachaux, M.-A.; Stock, P.; Scao, T. L.; Lavril, T.; Wang, T.; Lacroix, T.; and Sayed, W. E. 2023. Mistral 7B. arXiv:2310.06825

work page internal anchor Pith review Pith/arXiv arXiv 2023

[24] [24]

Jin, H.; Che, H.; Lin, Y.; and Chen, H. 2024. PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3): 2607--2615

work page 2024

[25] [25]

Jing, B.; Xie, P.; and Xing, E. 2018. On the Automatic Generation of Medical Imaging Reports. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics

work page 2018

[26] [26]

Johnson, A. E. W.; Pollard, T. J.; Berkowitz, S. J.; Greenbaum, N. R.; Lungren, M. P.; Deng, C.; Mark, R. G.; and Horng, S. 2019. MIMIC-CXR: A large publicly available database of labeled chest radiographs. CoRR, abs/1901.07042

work page internal anchor Pith review Pith/arXiv arXiv 2019

[27] [27]

J.; Chang, J.; and Ye, J

Lee, S.; Kim, W. J.; Chang, J.; and Ye, J. C. 2024. LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation. arXiv:2305.11490

work page arXiv 2024

[28] [28]

Li, J.; Li, D.; Savarese, S.; and Hoi, S. 2023 a . Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In International conference on machine learning, 19730--19742. PMLR

work page 2023

[29] [29]

Li, M.; Lin, B.; Chen, Z.; Lin, H.; Liang, X.; and Chang, X. 2023 b . Dynamic graph enhanced contrastive learning for chest x-ray report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3334--3343

work page 2023

[30] [30]

Lin, C.-Y. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, 74--81

work page 2004

[31] [31]

Liu, B.; Zhan, L.; Xu, L.; Ma, L.; Yang, Y.; and Wu, X. 2021. SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering. CoRR, abs/2102.09542

work page arXiv 2021

[32] [32]

Liu, H.; Li, C.; Wu, Q.; and Lee, Y. J. 2024. Visual instruction tuning. Advances in neural information processing systems, 36

work page 2024

[33] [33]

Liu, J.; Lian, J.; and Yu, Y. 2020. ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities. arXiv:2006.10550

work page arXiv 2020

[34] [34]

Q.; Lam, K.; Le, L

Nguyen, H. Q.; Lam, K.; Le, L. T.; Pham, H. H.; Tran, D. Q.; Nguyen, D. B.; Le, D. D.; Pham, C. M.; Tong, H. T. T.; Dinh, D. H.; Do, C. D.; Doan, L. T.; Nguyen, C. N.; Nguyen, B. T.; Nguyen, Q. V.; Hoang, A. D.; Phan, H. N.; Nguyen, A. T.; Ho, P. H.; Ngo, D. T.; Nguyen, N. T.; Nguyen, N. T.; Dao, M.; and Vu, V. 2022. VinDr-CXR: An open dataset of chest X-...

work page arXiv 2022

[35] [35]

Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 311--318

work page 2002

[36] [36]

Pellegrini, C.; Özsoy, E.; Busam, B.; Navab, N.; and Keicher, M. 2023. RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance. arXiv:2311.18681

work page arXiv 2023

[37] [37]

Pellegrini, C.; Özsoy, E.; Busam, B.; Navab, N.; and Keicher, M. 2024. RaDialog Instruct Dataset (version 1.1.0). PhysioNet

work page 2024

[38] [38]

C.; Schwaighofer, A.; Lungren, M

Pérez-García, F.; Sharma, H.; Bond-Taylor, S.; Bouzid, K.; Salvatelli, V.; Ilse, M.; Bannur, S.; Castro, D. C.; Schwaighofer, A.; Lungren, M. P.; Wetscherek, M.; Codella, N.; Hyland, S. L.; Alvarez-Valle, J.; and Oktay, O. 2024. RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision. arXiv:2401.10815

work page arXiv 2024

[39] [39]

Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; and Liu, P. J. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140): 1--67

work page 2020

[40] [40]

Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Abul Kashem, S.; Islam, M.; Al Maadeed, S.; Zughaier, S.; Khan, M.; and Chowdhury, M. 2021. Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection using Chest X-rays Images. Computers in Biology and Medicine, 104319

work page 2021

[41] [41]

P.; de Paiva, J

Reis, E. P.; de Paiva, J. P.; da Silva, M. C.; Ribeiro, G. A.; Paiva, V. F.; Bulgarelli, L.; Lee, H. M.; Santos, P. V.; Brito, V. M.; Amaral, L. T.; Beraldo, G. L.; Haidar Filho, J. N.; Teles, G. B.; Szarf, G.; Pollard, T.; Johnson, A. E.; Celi, L. A.; and Amaro, E. J. 2022. BRAX, Brazilian labeled chest x-ray dataset. Scientific Data, 9(1): 487

work page 2022

[42] [42]

Shih, G.; Wu, C.; Halabi, S.; Kohli, M.; Prevedello, L.; Cook, T.; Sharma, A.; Amorosa, J.; Arteaga, V.; Galperin-Aizenberg, M.; Gill, R.; Godoy, M.; Hobbs, S.; Jeudy, J.; Laroia, A.; Shah, P.; Vummidi, D.; Yaddanapudi, K.; and Stein, A. 2019. Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumon...

work page 2019

[43] [43]

Shin, H.-C.; Roberts, K.; Lu, L.; Demner-Fushman, D.; Yao, J.; and Summers, R. M. 2016. Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

work page 2016

[44] [44]

Shiraishi, J.; Katsuragawa, S.; Ikezoe, J.; Matsumoto, T.; Kobayashi, T.; Komatsu, K.; Matsui, M.; Fujita, H.; Kodera, Y.; and Doi, K. 2000. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. AJR Am J Roentgenol, 174(1): 71--74

work page 2000

[45] [45]

Y.; and Lungren, M

Smit, A.; Jain, S.; Rajpurkar, P.; Pareek, A.; Ng, A. Y.; and Lungren, M. P. 2020. CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT. arXiv:2004.09167

work page arXiv 2020

[46] [46]

Tahir, A.; Chowdhury, M.; Qiblawey, Y.; Khandakar, A.; Rahman, T.; Kiranyaz, S.; Khurshid, U.; Ibtehaz, N.; Mahmud, S.; and Ezeddin, M. 2021. COVID-QU-Ex Dataset. https://www.kaggle.com/datasets/tahirahmed/covidquex

work page 2021

[47] [47]

Tanida, T.; M \"u ller, P.; Kaissis, G.; and Rueckert, D. 2023. Interactive and explainable region-guided radiology report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7433--7442

work page 2023

[48] [48]

Xraygpt: Chest radiographs summarization using medical vision-language models

Thawkar, O.; Shaker, A.; Mullappilly, S. S.; Cholakkal, H.; Anwer, R. M.; Khan, S.; Laaksonen, J.; and Khan, F. S. 2023. XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models. arXiv:2306.07971

work page arXiv 2023

[49] [49]

Tu, T.; Azizi, S.; Driess, D.; Schaekermann, M.; Amin, M.; Chang, P.-C.; Carroll, A.; Lau, C.; Tanno, R.; Ktena, I.; et al. 2024. Towards generalist biomedical AI. NEJM AI, 1(3): AIoa2300138

work page 2024

[50] [50]

Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; and Summers, R. M. 2017. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3462--3471

work page 2017

[51] [51]

Wang, Z.; Liu, L.; Wang, L.; and Zhou, L. 2023. Metransformer: Radiology report generation by transformer with multiple learnable expert tokens. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11558--11567

work page 2023

[52] [52]

V.; Zhou, D.; et al

Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q. V.; Zhou, D.; et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35: 24824--24837

work page 2022

[53] [53]

Wu, C.; Zhang, X.; Zhang, Y.; Wang, Y.; and Xie, W. 2023. Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data. arXiv:2308.02463

work page arXiv 2023

[54] [54]

T.; Agu, N

Wu, J. T.; Agu, N. N.; Lourentzou, I.; Sharma, A.; Paguio, J. A.; Yao, J. S.; Dee, E. C.; Mitchell, W.; Kashyap, S.; Giovannini, A.; Celi, L. A.; and Moradi, M. 2021. Chest ImaGenome Dataset for Clinical Reasoning. arXiv:2108.00316

work page arXiv 2021

[55] [55]

Yang, L.; Xu, S.; Sellergren, A.; Kohlberger, T.; Zhou, Y.; Ktena, I.; Kiraly, A.; Ahmed, F.; Hormozdiari, F.; Jaroensri, T.; et al. 2024. Advancing multimodal medical capabilities of Gemini. arXiv:2405.03162

work page arXiv 2024

[56] [56]

You, H.; Zhang, H.; Gan, Z.; Du, X.; Zhang, B.; Wang, Z.; Cao, L.; Chang, S.-F.; and Yang, Y. 2024. Ferret: Refer and Ground Anything Anywhere at Any Granularity. In The Twelfth International Conference on Learning Representations

work page 2024

[57] [57]

K.; Baek, W.; and Roh, B

You, K.; Gu, J.; Ham, J.; Park, B.; Kim, J.; Hong, E. K.; Baek, W.; and Roh, B. 2023. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training. In Medical Image Computing and Computer Assisted Intervention -- MICCAI 2023, 101--111. Springer Nature Switzerland

work page 2023

[58] [58]

P.; Fonseca, E

Yu, F.; Endo, M.; Krishnan, R.; Pan, I.; Tsai, A.; Reis, E. P.; Fonseca, E. K. U. N.; Lee, H. M. H.; Abad, Z. S. H.; Ng, A. Y.; et al. 2023. Evaluating progress in automatic chest x-ray radiology report generation. Patterns, 4(9)

work page 2023

[59] [59]

Zawacki, A.; Wu, C.; Shih, G.; Elliott, J.; Fomitchev, M.; Hussain, M.; ParasLakhani; Culliton, P.; and Bao, S. 2019. SIIM-ACR Pneumothorax Segmentation

work page 2019

[60] [60]

Zhang, Y.; Wang, X.; Xu, Z.; Yu, Q.; Yuille, A.; and Xu, D. 2020. When Radiology Report Generation Meets Knowledge Graph. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07): 12910--12917

work page 2020

[61] [61]

Zheng, Y.; Gan, W.; Chen, Z.; Qi, Z.; Liang, Q.; and Yu, P. S. 2024. Large Language Models for Medicine: A Survey. arXiv:2405.13055

work page arXiv 2024

[62] [62]

Zhou, Z.; Shi, M.; Wei, M.; Alabi, O.; Yue, Z.; and Vercauteren, T. 2024. Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning. arXiv:2403.06728

work page arXiv 2024

[63] [63]

Zhu, D.; Chen, J.; Shen, X.; Li, X.; and Elhoseiny, M. 2023. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv:2304.10592

work page internal anchor Pith review Pith/arXiv arXiv 2023