LLaMA-XR: A Novel Framework for Radiology Report Generation using LLaMA and QLoRA Fine Tuning

Israt Jahan; Md. Zihad Bin Jahangir; Minh Chau; Muhammad Ashad Kabir; Sumaiya Akter

arxiv: 2506.03178 · v2 · pith:ZUF6SDU7new · submitted 2025-05-29 · 📡 eess.IV · cs.AI · cs.CV

Md. Zihad Bin Jahangir, Muhammad Ashad Kabir, Sumaiya Akter, Israt Jahan, Minh Chau

Pith reviewed 2026-05-19 13:36 UTC · model grok-4.3

classification 📡 eess.IV · cs.AI · cs.CV

keywords radiology report generation · LLaMA · QLoRA · chest X-ray · medical image captioning · fine-tuning · natural language generation · clinical accuracy

The pith

LLaMA-XR generates more coherent and clinically accurate radiology reports from chest X-rays by pairing LLaMA 3.1 with DenseNet-121 embeddings and QLoRA fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LLaMA-XR as a system that takes chest radiographs, extracts visual features with DenseNet-121, and feeds them into a LLaMA 3.1 model that has been adapted with QLoRA for efficient training. The authors report that this produces reports scoring 0.433 on ROUGE-L and 0.336 on METEOR on the IU X-ray dataset, beating prior methods while using less memory and running faster. A reader would care because automated reports could lighten the routine workload for radiologists without demanding expensive hardware. The work frames this as a practical step toward reliable AI assistance in diagnostic imaging.

Core claim

LLaMA-XR integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. On the IU X-ray benchmark dataset it reaches a ROUGE-L score of 0.433 and a METEOR score of 0.336, outperforming existing methods in coherence and clinical accuracy while preserving computational efficiency through optimized parameter utilization and reduced memory overhead.

What carries the argument

QLoRA-adapted LLaMA 3.1 conditioned on DenseNet-121 image embeddings, which enables memory-efficient fine-tuning for medical report generation from radiographs.

If this is right

Outperforms prior state-of-the-art methods on the standard IU X-ray benchmark.
Produces reports with greater coherence and clinical accuracy.
Generates reports faster while requiring lower computational resources.
Provides enhanced clinical utility and reliability for automated radiology reporting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the metric gains translate to real clinical settings, hospitals could deploy similar systems to draft initial reports and let radiologists focus on ambiguous cases.
The quantized adaptation technique may allow other large language models to be specialized for additional medical imaging modalities without large compute budgets.
Testing the same architecture on larger, multi-institutional radiology datasets would reveal whether the reported improvements hold outside the IU X-ray collection.

Load-bearing premise

Higher scores on automatic similarity metrics such as ROUGE-L and METEOR reliably indicate improved clinical accuracy and usefulness in the generated reports.
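To make the premise concrete, here is a minimal sketch of a ROUGE-L style score, computed as the F1 of longest-common-subsequence precision and recall over whitespace tokens. It is a simplified stand-in, not the scoring package used in the paper, and the three example sentences are hypothetical. It illustrates why lexical overlap and clinical correctness can diverge: dropping a single negation leaves the overlap score high.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L: F1 of LCS precision and recall on whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, chosen only to illustrate the failure mode.
reference = "no pleural effusion or pneumothorax is seen"
flipped = "pleural effusion or pneumothorax is seen"   # negation dropped
paraphrase = "the lungs are clear without effusion"    # clinically compatible wording

print(round(rouge_l_f1(flipped, reference), 3))     # 0.923
print(round(rouge_l_f1(paraphrase, reference), 3))  # 0.154
```

In this toy case the sentence that reverses the finding scores far higher than the one that agrees with it, which is the gap the premise has to bridge.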

What would settle it

A head-to-head evaluation in which practicing radiologists rate the clinical accuracy, completeness, and diagnostic utility of LLaMA-XR reports against both human-written ground truth and outputs from prior models, showing no meaningful advantage for the new system.

Figures

Figures reproduced from arXiv: 2506.03178 by Israt Jahan, Md. Zihad Bin Jahangir, Minh Chau, Muhammad Ashad Kabir, Sumaiya Akter.

**Figure 1.** Overview of the proposed model architecture for radiology report generation. The X-ray images (AP and LAT views) are processed …

**Figure 2.** Illustration of the “DenseNet121-res224-all” output classes.

**Figure 3.** The heatmap highlights the critical regions in a chest X-ray image that influenced the DenseNet-121 model’s classification decision.

**Figure 4.** Example prompt for fine-tuning. The extracted caption runs on into the paper's training details: “… and gradient stability, the batch size was set to 8, with gradient accumulation steps of 4, effectively simulating a batch size of 32. The model was trained for three complete epochs, which was sufficient for convergence, given the size of the training dataset and the complexity of the task. A learning rate of 2 × 10⁻⁶ was used, optimized using the AdamW 8-bit optimizer (opti…”
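The Figure 4 excerpt says that a batch size of 8 with 4 gradient accumulation steps simulates a batch size of 32. The sketch below checks that arithmetic on a toy problem. The one-parameter model, the random data, and the helper `mse_grad` are all hypothetical and do not reproduce the paper's training loop. The equivalence shown holds when the loss is a mean and the micro-batches are equal in size.

```python
import random


def mse_grad(w, batch):
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


random.seed(0)
data = [(random.random(), random.random()) for _ in range(32)]  # toy (x, y) pairs
w = 0.5

micro_batch_size = 8     # per-step batch size quoted in the excerpt
accumulation_steps = 4   # gradient accumulation steps quoted in the excerpt

# Accumulate micro-batch gradients, each scaled by 1 / accumulation_steps.
accumulated = 0.0
for step in range(accumulation_steps):
    start = step * micro_batch_size
    micro_batch = data[start:start + micro_batch_size]
    accumulated += mse_grad(w, micro_batch) / accumulation_steps

full_batch = mse_grad(w, data)  # one pass over all 32 examples
effective_batch_size = micro_batch_size * accumulation_steps

print(effective_batch_size)                    # 32
print(abs(accumulated - full_batch) < 1e-9)    # True
```

Accumulation trades wall-clock time for memory: only 8 examples need to be resident at once, while the optimizer step sees the same averaged gradient as a batch of 32.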

Original abstract

Automated radiology report generation holds significant potential to reduce radiologists' workload and enhance diagnostic accuracy. However, generating precise and clinically meaningful reports from chest radiographs remains challenging due to the complexity of medical language and the need for contextual understanding. Existing models often struggle with maintaining both accuracy and contextual relevance. In this paper, we present LLaMA-XR, a novel framework that integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning. LLaMA-XR achieves improved coherence and clinical accuracy while maintaining computational efficiency. This efficiency is driven by an optimization strategy that enhances parameter utilization and reduces memory overhead, enabling faster report generation with lower computational resource demands. Extensive experiments conducted on the IU X-ray benchmark dataset demonstrate that LLaMA-XR outperforms a range of state-of-the-art methods. Our model achieves a ROUGE-L score of 0.433 and a METEOR score of 0.336, establishing new performance benchmarks in the domain. These results underscore LLaMA-XR's potential as an effective and efficient AI system for automated radiology reporting, offering enhanced clinical utility and reliability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper applies LLaMA with QLoRA and a DenseNet encoder to radiology report generation and reports ROUGE/METEOR gains on IU X-ray, but the link from those scores to actual clinical accuracy is not supported by the evidence shown.

The core of LLaMA-XR is a standard multimodal setup: DenseNet-121 extracts image features from chest X-rays, those features feed into LLaMA 3.1, and QLoRA handles the fine-tuning to keep memory use low. On the IU X-ray dataset the authors reach a ROUGE-L of 0.433 and a METEOR of 0.336 and present these as new benchmarks with better coherence and clinical accuracy. The efficiency angle from quantization is the part that feels practical for real deployment on modest hardware. That is the main thing the work contributes: an engineering recipe rather than a new algorithm or theory. The abstract does a clean job of describing the pipeline and the resource savings.

The soft spot is the jump from lexical overlap metrics to claims of improved clinical accuracy and utility. ROUGE and METEOR measure lexical overlap with reference reports; they do not test whether critical findings are missed or whether false pathologies appear. No radiologist scoring, no factuality checks with tools like CheXbert or RadGraph, and no error analysis are mentioned. Without those, the clinical claims rest on an untested assumption. The paper also gives little detail on the exact baselines, statistical tests, or data splits, so it is hard to tell how much the specific framework adds beyond ordinary hyperparameter tuning of the same base models.

This is the kind of paper that might interest teams that are already building or fine-tuning medical report generators and need a working example of QLoRA on this task. It is less useful for readers looking for new methods or rigorous clinical validation.

I would send it to peer review. The implementation is coherent enough that referees could usefully ask for the missing ablations and human or factual evaluations, and the efficiency focus is worth documenting if the numbers hold up.
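The kind of factuality check the letter finds missing can be sketched as a label-level comparison between the findings in a generated report and those in the reference. The code below is a simplified stand-in and is not CheXbert or RadGraph. The function `label_f1` and the finding labels are hypothetical, and a real evaluation would obtain the labels from an automatic labeler or from radiologists.

```python
def label_f1(predicted, reference):
    """Precision, recall, and F1 between two sets of finding labels."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical labels for one report pair.
reference_findings = {"cardiomegaly", "pleural effusion"}
generated_findings = {"cardiomegaly", "pneumothorax"}  # one miss, one false finding

precision, recall, f1 = label_f1(generated_findings, reference_findings)
missed = sorted(reference_findings - generated_findings)
hallucinated = sorted(generated_findings - reference_findings)

print(precision, recall, f1)   # 0.5 0.5 0.5
print(missed, hallucinated)    # ['pleural effusion'] ['pneumothorax']
```

Unlike an overlap score, this separates omitted findings from hallucinated ones, which are the two error types the letter singles out.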

Referee Report

2 major / 1 minor

Summary. The paper introduces LLaMA-XR, a framework integrating LLaMA 3.1 with DenseNet-121 image embeddings and QLoRA fine-tuning for automated generation of radiology reports from chest X-rays. It claims improved coherence and clinical accuracy with computational efficiency, reporting ROUGE-L of 0.433 and METEOR of 0.336 on the IU X-ray benchmark while outperforming state-of-the-art methods.

Significance. If properly validated, the use of QLoRA for efficient adaptation of LLaMA to medical report generation could offer a practical contribution to resource-efficient LLM fine-tuning in radiology. However, the current results rest on automatic lexical metrics without demonstrated links to clinical utility, limiting the work's immediate significance for diagnostic applications.

major comments (2)

[Abstract] The claim that ROUGE-L = 0.433 and METEOR = 0.336 establish 'improved coherence and clinical accuracy' plus 'enhanced clinical utility' is unsupported. These metrics quantify n-gram overlap with reference reports and do not assess omission of critical findings, hallucinated pathologies, or diagnostic correctness; no radiologist scoring or factuality metrics (e.g., RadGraph, CheXbert) are referenced to bridge this gap.
[Abstract] The assertion of outperforming 'a range of state-of-the-art methods' supplies no information on the exact baselines, statistical significance tests, ablation studies isolating the contribution of DenseNet-121 embeddings or QLoRA, or details on train/validation/test splits and data handling for the IU X-ray dataset, leaving the central empirical claim without visible supporting evidence.

minor comments (1)

The methods section should include explicit details on QLoRA rank, scaling factors, learning rate schedule, and exact training procedure to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating where revisions to the manuscript are planned.

Referee: [Abstract] The claim that ROUGE-L = 0.433 and METEOR = 0.336 establish 'improved coherence and clinical accuracy' plus 'enhanced clinical utility' is unsupported. These metrics quantify n-gram overlap with reference reports and do not assess omission of critical findings, hallucinated pathologies, or diagnostic correctness; no radiologist scoring or factuality metrics (e.g., RadGraph, CheXbert) are referenced to bridge this gap.

Authors: We agree that ROUGE-L and METEOR are lexical overlap metrics and do not directly measure clinical accuracy, factuality, omission of findings, or hallucination of pathologies. The abstract phrasing overstated the clinical implications of these scores. In the revised manuscript we will rephrase the abstract to report the metric values as performance on standard automatic evaluation benchmarks without claiming direct clinical accuracy or utility. We will also add a limitations paragraph that explicitly notes the scope of these metrics and identifies clinical validation and factuality metrics (such as CheXbert-based entity extraction) as important directions for future work.

Revision: yes

Referee: [Abstract] The assertion of outperforming 'a range of state-of-the-art methods' supplies no information on the exact baselines, statistical significance tests, ablation studies isolating the contribution of DenseNet-121 embeddings or QLoRA, or details on train/validation/test splits and data handling for the IU X-ray dataset, leaving the central empirical claim without visible supporting evidence.

Authors: The experimental section of the manuscript contains the full set of baseline comparisons, but the abstract is too concise to convey the necessary details. We will revise the abstract to name the primary state-of-the-art methods against which improvements are reported. We will also ensure the methods and results sections clearly document the train/validation/test splits used on IU X-ray, any statistical significance testing performed, and ablation experiments that isolate the contributions of the DenseNet-121 encoder and QLoRA adaptation. These details will be summarized or cross-referenced so that the empirical claims are fully supported.

Revision: yes
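The exchange above leaves open which significance test would be used. One common choice for comparing two systems on the same test set is a paired bootstrap over per-report scores, sketched below. This is an assumption about a reasonable procedure and not the authors' method. The function `paired_bootstrap` and the synthetic scores are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Fraction of resamples in which system A's mean score exceeds system B's."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample report indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Synthetic per-report metric scores for two systems on the same 200 reports.
rng = random.Random(1)
baseline = [rng.uniform(0.2, 0.6) for _ in range(200)]
candidate = [min(1.0, s + rng.uniform(-0.05, 0.10)) for s in baseline]

win_rate = paired_bootstrap(candidate, baseline)
print(win_rate)
```

A win rate close to 1.0 suggests the improvement is stable under resampling of the test set, while a value near 0.5 suggests the two systems are hard to distinguish on that metric.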

Circularity Check

0 steps flagged

No circularity: standard empirical fine-tuning and benchmark evaluation

The paper presents an empirical ML framework that combines LLaMA 3.1, DenseNet-121 image embeddings, and QLoRA fine-tuning, then reports ROUGE-L and METEOR scores on the IU X-ray dataset after training. No mathematical derivation chain exists that reduces claimed outputs to inputs by construction. Performance numbers are obtained via conventional train/test splits and standard NLP metrics; they are not self-defined, fitted parameters renamed as predictions, or justified solely by self-citations. The central claim of improved performance rests on external benchmark comparison rather than tautological redefinition, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the validity of standard NLP metrics as proxies for clinical quality and on the representativeness of the IU X-ray benchmark for real-world radiology reporting.

free parameters (1)

QLoRA rank and scaling factors
Low-rank adaptation parameters are selected to control memory use and are not derived from first principles.
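To show what the rank and scaling factor control, here is a minimal sketch of the low-rank update used by LoRA, where a frozen weight matrix receives an added term (alpha / rank) · B · A. The layer size, rank, and alpha below are hypothetical values chosen for illustration. The page does not report the paper's settings, and the 4-bit quantization of the base weights that QLoRA adds is not modelled here.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def lora_delta(B, A, alpha, rank):
    """Low-rank weight update (alpha / rank) * B @ A, using plain nested lists."""
    scale = alpha / rank
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


# Hypothetical layer size and hyperparameters, for illustration only.
d_in = d_out = 4096
rank, alpha = 16, 32

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)
print(full, lora, round(100 * lora / full, 2))  # 16777216 131072 0.78

# Tiny worked update with rank 1: B is 2 x 1, A is 1 x 2.
print(lora_delta([[1.0], [2.0]], [[3.0, 4.0]], alpha=2, rank=1))
# [[6.0, 8.0], [12.0, 16.0]]
```

Under these assumed values the adapter trains under 1% of the layer's weights, which is why the rank acts as a memory and capacity knob rather than a quantity derived from first principles.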

axioms (1)

domain assumption: ROUGE-L and METEOR scores are adequate proxies for clinical accuracy of radiology reports.
Invoked when the abstract equates higher metric values with improved clinical accuracy and utility.

pith-pipeline@v0.9.0 · 5771 in / 1485 out tokens · 85707 ms · 2026-05-19T13:36:22.483067+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "LLaMA-XR integrates LLaMA 3.1 with DenseNet-121-based image embeddings and Quantized Low-Rank Adaptation (QLoRA) fine-tuning... ROUGE-L score of 0.433 and a METEOR score of 0.336"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We employ fine-tuning techniques such as QLoRA... SFT to adapt LLaMA 3.1 to medical datasets"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 4 internal anchors

[1]

X. Wang, Y . Peng, L. Lu, Z. Lu, R. M. Summers, Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9049–9058. doi:10.1109/cvpr.2018.00943

work page doi:10.1109/cvpr.2018.00943 2018
[2]

R. M. MR, et al., Acquired heart disease in adults: what can a chest x-ray tell us?, Radiologia 59 (2017) 446–459

work page 2017
[3]

S. Bahl, T. Ramzan, R. Maraj, Interpretation and documentation of chest x-rays in the acute medical unit, Clinical Medicine 20 (2020) s73

work page 2020
[4]

Liu, T.-M

G. Liu, T.-M. H. Hsu, M. McDermott, W. Boag, W.-H. Weng, P. Szolovits, M. Ghassemi, Clinically accurate chest x-ray report generation, in: Machine Learning for Healthcare Conference, PMLR, 2019, pp. 249–269

work page 2019
[5]

Sloan, P

P. Sloan, P. Clatworthy, E. Simpson, M. Mirmehdi, Automated radiology report generation: A review of recent advances, IEEE Reviews in Biomedical Engineering (2024). doi:10.1109/RBME.2024.3408456

work page doi:10.1109/rbme.2024.3408456 2024
[6]

Q. You, H. Jin, Z. Wang, C. Fang, J. Luo, Image captioning with semantic attention, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4651–4659. doi:10.1109/CVPR.2016.503

work page doi:10.1109/cvpr.2016.503 2016
[7]

F. Liu, X. Ren, Y . Liu, H. Wang, X. Sun, simnet: Stepwise image-topic merging network for generating detailed and comprehensive image captions, 2018. arXiv:1808.08732

work page internal anchor Pith review Pith/arXiv arXiv 2018
[8]

Vinyals, A

O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: A neural image caption generator, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3156–3164. doi:10.1109/CVPR.2015.7298935

work page doi:10.1109/cvpr.2015.7298935 2015
[9]

Iftikhar, Iqra naz, anmol zahra, and syeda zainab yousuf zaidi

S. Iftikhar, Iqra naz, anmol zahra, and syeda zainab yousuf zaidi. 2022. report generation of lungs diseases from chest x-ray using nlp”, International Journal of Innovations in Science & Technology 3 (2022) 223–233

work page 2022
[10]

Ranjit, G

M. Ranjit, G. Ganapathy, R. Manuel, T. Ganu, Retrieval augmented chest x-ray report generation using openai gpt models, in: Machine Learning for Healthcare Conference, PMLR, 2023, pp. 650–666

work page 2023
[11]

L. C. Adams, D. Truhn, F. Busch, A. Kader, S. M. Niehues, M. R. Makowski, K. K. Bressem, Leveraging gpt-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study, Radiology 307 (2023) e230725. doi:10.1148/radiol. 230725

work page doi:10.1148/radiol 2023
[12]

Buckley, J

T. Buckley, J. Diao, R. Adam, A. Manrai, Accuracy of a vision-language model on challenging medical cases, 2023. arXiv:2311.05591

work page arXiv 2023
[13]

Z. Liu, Y . Huang, X. Yu, L. Zhang, Z. Wu, C. Cao, H. Dai, L. Zhao, Y . Li, P. Shu, F. Zeng, L. Sun, W. Liu, D. Shen, Q. Li, T. Liu, D. Zhu, X. Li, Deid-gpt: Zero-shot medical text de-identification by gpt-4, 2023. arXiv:2303.11032

work page arXiv 2023
[14]

T. H. Kung, M. Cheatham, A. Medenilla, C. Sillos, L. De Leon, C. Elepa ˜no, M. Madriaga, R. Aggabao, G. Diaz-Candido, J. Maningo, et al., Performance of chatgpt on usmle: potential for ai-assisted medical education using large language models, PLoS digital health 2 (2023) e0000198. doi:10.1371/journal.pdig.0000198

work page doi:10.1371/journal.pdig.0000198 2023
[15]

& Chen, C

T. Tanida, P. M ¨uller, G. Kaissis, D. Rueckert, Interactive and explainable region-guided radiology report generation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7433–7442. doi:10.1109/CVPR52729.2023.00718

work page doi:10.1109/cvpr52729.2023.00718 2023
[16]

Xu, Medicalgpt: Training medical gpt model, https://github.com/shibing624/MedicalGPT, 2023

M. Xu, Medicalgpt: Training medical gpt model, https://github.com/shibing624/MedicalGPT, 2023

work page 2023
[17]

Radford, K

A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by generative pre-training (2018). xxii

work page 2018
[18]

Devlin, M.-W

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceed- ings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

work page 2019
[19]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azhar, et al., Llama: Open and efficient foundation language models, arXiv preprint arXiv:2302.13971 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

The Llama 3 Herd of Models

A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al., The llama 3 herd of models, arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

Nicolson, J

A. Nicolson, J. Dowling, B. Koopman, Improving chest x-ray report generation by leveraging warm starting, Artificial intelligence in medicine 144 (2023) 102633. doi:10.1016/j.artmed.2023.102633

work page doi:10.1016/j.artmed.2023.102633 2023
[22]

Y . Tao, L. Ma, J. Yu, H. Zhang, Memory-based cross-modal semantic alignment network for radiology report generation, IEEE Journal of Biomedical and Health Informatics (2024). doi:10.1109/JBHI.2024.3393018

work page doi:10.1109/jbhi.2024.3393018 2024
[23]

J. P. Cohen, J. D. Viviano, P. Bertin, P. Morrison, P. Torabian, M. Guarrera, M. P. Lungren, A. Chaudhari, R. Brooks, M. Hashir, et al., Torchxrayvision: A library of chest x-ray datasets and models, in: International Conference on Medical Imaging with Deep Learning, PMLR, 2022, pp. 231–249

work page 2022
[24]

H. T. N. Nguyen, D. Nie, T. Badamdorj, Y . Liu, Y . Zhu, J. Truong, L. Cheng, Automated generation of accurate & fluent medical x-ray reports, 2021. arXiv:2108.12126

work page arXiv 2021
[25]

Dettmers, A

T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, Qlora: E fficient finetuning of quantized llms, Advances in Neural Information Processing Systems 36 (2024)

work page 2024
[26]

Demner-Fushman, M

D. Demner-Fushman, M. D. Kohli, M. B. Rosenman, S. E. Shooshan, L. Rodriguez, S. Antani, G. R. Thoma, C. J. McDonald, Preparing a collection of radiology examinations for distribution and retrieval, Journal of the American Medical Informatics Association 23 (2016) 304–310. doi:10.1093/jamia/ocv080

work page doi:10.1093/jamia/ocv080 2016
[27]

Vaswani, Attention is all you need, Advances in Neural Information Processing Systems (2017)

A. Vaswani, Attention is all you need, Advances in Neural Information Processing Systems (2017)

work page 2017
[28]

Y . Li, X. Liang, Z. Hu, E. P. Xing, Hybrid retrieval-generation reinforced agent for medical image report generation, Advances in neural information processing systems 31 (2018)

work page 2018
[29]

B. Jing, Z. Wang, E. Xing, Show, describe and conclude: On exploiting the structure information of chest x-ray reports, arXiv preprint arXiv:2004.12274 (2020)

work page arXiv 2004
[30]

Zhang, X

Y . Zhang, X. Wang, Z. Xu, Q. Yu, A. Yuille, D. Xu, When radiology report generation meets knowledge graph, in: Proceedings of the AAAI conference on artificial intelligence, volume 34, 2020, pp. 12910–12917. doi:10.1609/aaai.v34i07.6989

work page doi:10.1609/aaai.v34i07.6989 2020
[31]

Z. Chen, Y . Song, T.-H. Chang, X. Wan, Generating radiology reports via memory-driven transformer, arXiv preprint arXiv:2010.16056 (2020)

work page arXiv 2010
[32]

F. Liu, X. Wu, S. Ge, W. Fan, Y . Zou, Exploring and distilling posterior and prior knowledge for radiology report generation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 13753–13762. doi:10.1109/CVPR46437.2021.01354

work page doi:10.1109/cvpr46437.2021.01354 2021
[33]

J. Li, S. Li, Y . Hu, H. Tao, A self-guided framework for radiology report generation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2022, pp. 588–598. doi:10.1007/978-3-031-16452-1_56

work page doi:10.1007/978-3-031-16452-1_56 2022
[34]

F. Liu, S. Ge, Y . Zou, X. Wu, Competence-based multimodal curriculum learning for medical report generation, arXiv preprint arXiv:2206.14579 (2022)

work page arXiv 2022
[35]

Z. Chen, Y . Shen, Y . Song, X. Wan, Cross-modal memory networks for radiology report generation, 2022.arXiv:2204.13258

work page arXiv 2022
[36]

J. You, D. Li, M. Okumura, K. Suzuki, Jpg-jointly learn to align: Automated disease prediction and radiology report generation, in: Proceedings of the 29th international conference on computational linguistics, 2022, pp. 5989–6001

work page 2022
[37]

B. Yan, M. Pei, M. Zhao, C. Shan, Z. Tian, Prior guided transformer for accurate radiology reports generation, IEEE Journal of Biomedical and Health Informatics 26 (2022) 5631–5640. doi:10.1109/JBHI.2022.3197162

work page doi:10.1109/jbhi.2022.3197162 2022
[38]

L. Wang, M. Ning, D. Lu, D. Wei, Y . Zheng, J. Chen, An inclusive task-aware framework for radiology report generation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2022, pp. 568–577. doi: 10.1007/978-3-031- xxiii 16452-1_54

work page doi:10.1007/978-3-031- 2022
[39]

M. Li, B. Lin, Z. Chen, H. Lin, X. Liang, X. Chang, Dynamic graph enhanced contrastive learning for chest x-ray report generation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3334–3343. doi: 10.1109/CVPR52729. 2023.00325

work page doi:10.1109/cvpr52729 2023
[40]

H. Qin, Y . Song, Reinforced cross-modal alignment for radiology report generation, in: Findings of the Association for Computational Linguistics: ACL 2022, 2022, pp. 448–458. doi:10.18653/v1/2022.findings-acl.38

work page doi:10.18653/v1/2022.findings-acl.38 2022
[41]

Najdenkoska, X

I. Najdenkoska, X. Zhen, M. Worring, L. Shao, Variational topic inference for chest x-ray report generation, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, Springer, 2021, pp. 625–635. doi:10.1007/978-3-030-87199-4_59

work page doi:10.1007/978-3-030-87199-4_59 2021
[42]

F. Zeng, Z. Lyu, Q. Li, X. Li, Enhancing llms for impression generation in radiology reports through a multi-agent system, arXiv preprint arXiv:2412.06828 (2024). doi:10.48550/arXiv.2412.06828

work page doi:10.48550/arxiv.2412.06828 2024
[43]

Y . Li, B. Yang, X. Cheng, Z. Zhu, H. Li, Y . Zou, Unify, align and refine: Multi-level semantic alignment for radiology report generation, in: Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 2863–2874. doi:10.48550/arXiv.2303.15932

work page doi:10.48550/arxiv.2303.15932 2023
[44]

C. Yin, B. Qian, J. Wei, X. Li, X. Zhang, Y . Li, Q. Zheng, Automatic generation of medical imaging diagnostic report with hierarchical recurrent neural network, in: 2019 IEEE international conference on data mining (ICDM), IEEE, 2019, pp. 728–737. doi: 10.1109/ICDM. 2019.00083

work page doi:10.1109/icdm 2019
[45]

Islam, A

S. Islam, A. Dash, A. Seum, A. H. Raj, T. Hossain, F. M. Shah, Exploring video captioning techniques: A comprehensive survey on deep learning methods, SN Computer Science 2 (2021) 1–28. doi: 10.1007/s42979-021-00487-x

work page doi:10.1007/s42979-021-00487-x 2021
[46]

K. R. Suresh, A. Jarapala, P. Sudeep, Image captioning encoder–decoder models using cnn-rnn architectures: A comparative study, Circuits, Systems, and Signal Processing 41 (2022) 5719–5742. doi:10.1007/s00034-022-02050-2

work page doi:10.1007/s00034-022-02050-2 2022
[47]

Zhang, P

K. Zhang, P. Li, J. Wang, A review of deep learning-based remote sensing image caption: Methods, models, comparisons and future directions, Remote Sensing 16 (2024) 4113. doi: 10.3390/rs16214113

work page doi:10.3390/rs16214113 2024
[48]

G. Xu, S. Niu, M. Tan, Y . Luo, Q. Du, Q. Wu, Towards accurate text-based image captioning with content diversity exploration, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12637–12646. doi: 10.1109/CVPR46437. 2021.01245

work page doi:10.1109/cvpr46437 2021
[49]

L. Chen, Z. Jiang, J. Xiao, W. Liu, Human-like controllable image captioning with verb-specific semantic roles, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16846–16856. doi:10.1109/CVPR46437.2021.01657

work page doi:10.1109/cvpr46437.2021.01657 2021
[50]

A. Tran, A. Mathews, L. Xie, Transform and tell: Entity-aware news image captioning, in: Proceedings of the IEEE /CVF conference on computer vision and pattern recognition, 2020, pp. 13035–13045. doi:10.1109/CVPR42600.2020.01305

work page doi:10.1109/cvpr42600.2020.01305 2020
[51]

Jiang, C

Y . Jiang, C. Chen, D. Nguyen, B. M. Mervak, C. Tan, Gpt-4v cannot generate radiology reports yet, 2024. arXiv:2407.12176

work page arXiv 2024
[52]

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank adaptation of large language models, 2021. arXiv:2106.09685

work page internal anchor Pith review Pith/arXiv arXiv 2021
[53]

B leu: a Method for Automatic Evaluation of Machine Translation

K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318. doi:10.3115/1073083.1073135

work page doi:10.3115/1073083.1073135 2002
[54]

Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp

C.-Y . Lin, Rouge: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81

work page 2004
[55]

Denkowski, A

M. Denkowski, A. Lavie, Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems, in: Proceedings of the sixth workshop on statistical machine translation, 2011, pp. 85–91

work page 2011
[56]

Banerjee, A

S. Banerjee, A. Lavie, Meteor: An automatic metric for mt evaluation with improved correlation with human judgments, in: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, 2005, pp. 65–72

work page 2005
[57]

Nguyen, C

D. Nguyen, C. Chen, H. He, C. Tan, Pragmatic radiology report generation, in: Machine Learning for Health (ML4H), PMLR, 2023, pp. 385–402

work page 2023
[58]

A. E. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, R. G. Mark, S. Horng, Mimic-cxr, a de-identified publicly available database of chest radiographs with free-text reports, Scientific data 6 (2019) 317. doi:10.1038/s41597-019-0322-0 . xxiv

work page doi:10.1038/s41597-019-0322-0 2019
[59]

Kim, C.-k

S. Kim, C.-k. Lee, S.-s. Kim, Large language models: a guide for radiologists, Korean Journal of Radiology 25 (2024) 126. doi: 10.3348/ kjr.2023.0997

work page arXiv 2024
[60]

Dikici, M

E. Dikici, M. Bigelow, L. M. Prevedello, R. D. White, B. S. Erdal, Integrating ai into radiology workflow: levels of research, production, and feedback maturity, Journal of Medical Imaging 7 (2020) 016502–016502. doi:10.1117/1.JMI.7.1.016502

work page doi:10.1117/1.jmi.7.1.016502 2020
[61]

L. Guo, L. Xia, Q. Zheng, B. Zheng, S. Jaeger, M. L. Giger, J. Fuhrman, H. Li, F. Y . Lure, H. Li, et al., Can ai generate diagnostic reports for radiologist approval on cxr images? a multi-reader and multi-case observer performance study, Journal of X-Ray Science and Technology (2024) 1–16. doi:10.3233/XST-240051

[62] A. Watanabe, S. Ketabi, K. Namdar, F. Khalvati, Improving disease classification performance and explainability of deep learning models in radiology with heatmap generators, Frontiers in radiology 2 (2022) 991683. doi:10.3389/fradi.2022.991683

[63] V. Granata, F. De Muzio, C. Cutolo, F. Dell’Aversana, F. Grassi, R. Grassi, I. Simonetti, F. Bruno, P. Palumbo, G. Chiti, et al., Structured reporting in radiological settings: pitfalls and perspectives, Journal of Personalized Medicine 12 (2022) 1344. doi:10.3390/jpm12081344

[64] M. Ahluwalia, M. Abdalla, J. Sanayei, L. Seyyed-Kalantari, M. Hussain, A. Ali, B. Fine, The subgroup imperative: chest radiograph classifier generalization gaps in patient, setting, and pathology subgroups, Radiology: Artificial Intelligence 5 (2023) e220270. doi:10.1148/ryai.220270

[1] X. Wang, Y. Peng, L. Lu, Z. Lu, R. M. Summers, Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9049–9058. doi:10.1109/cvpr.2018.00943

[2] R. M. MR, et al., Acquired heart disease in adults: what can a chest x-ray tell us?, Radiologia 59 (2017) 446–459

[3] S. Bahl, T. Ramzan, R. Maraj, Interpretation and documentation of chest x-rays in the acute medical unit, Clinical Medicine 20 (2020) s73

[4] G. Liu, T.-M. H. Hsu, M. McDermott, W. Boag, W.-H. Weng, P. Szolovits, M. Ghassemi, Clinically accurate chest x-ray report generation, in: Machine Learning for Healthcare Conference, PMLR, 2019, pp. 249–269

[5] P. Sloan, P. Clatworthy, E. Simpson, M. Mirmehdi, Automated radiology report generation: A review of recent advances, IEEE Reviews in Biomedical Engineering (2024). doi:10.1109/RBME.2024.3408456

[6] Q. You, H. Jin, Z. Wang, C. Fang, J. Luo, Image captioning with semantic attention, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4651–4659. doi:10.1109/CVPR.2016.503

[7] F. Liu, X. Ren, Y. Liu, H. Wang, X. Sun, simnet: Stepwise image-topic merging network for generating detailed and comprehensive image captions, 2018. arXiv:1808.08732

[8] O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: A neural image caption generator, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3156–3164. doi:10.1109/CVPR.2015.7298935

[9] S. Iftikhar, I. Naz, A. Zahra, S. Z. Y. Zaidi, Report generation of lungs diseases from chest x-ray using nlp, International Journal of Innovations in Science & Technology 3 (2022) 223–233

[10] M. Ranjit, G. Ganapathy, R. Manuel, T. Ganu, Retrieval augmented chest x-ray report generation using openai gpt models, in: Machine Learning for Healthcare Conference, PMLR, 2023, pp. 650–666

[11] L. C. Adams, D. Truhn, F. Busch, A. Kader, S. M. Niehues, M. R. Makowski, K. K. Bressem, Leveraging gpt-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study, Radiology 307 (2023) e230725. doi:10.1148/radiol.230725

[12] T. Buckley, J. Diao, R. Adam, A. Manrai, Accuracy of a vision-language model on challenging medical cases, 2023. arXiv:2311.05591

[13] Z. Liu, Y. Huang, X. Yu, L. Zhang, Z. Wu, C. Cao, H. Dai, L. Zhao, Y. Li, P. Shu, F. Zeng, L. Sun, W. Liu, D. Shen, Q. Li, T. Liu, D. Zhu, X. Li, Deid-gpt: Zero-shot medical text de-identification by gpt-4, 2023. arXiv:2303.11032

[14] T. H. Kung, M. Cheatham, A. Medenilla, C. Sillos, L. De Leon, C. Elepaño, M. Madriaga, R. Aggabao, G. Diaz-Candido, J. Maningo, et al., Performance of chatgpt on usmle: potential for ai-assisted medical education using large language models, PLoS digital health 2 (2023) e0000198. doi:10.1371/journal.pdig.0000198

[15] T. Tanida, P. Müller, G. Kaissis, D. Rueckert, Interactive and explainable region-guided radiology report generation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7433–7442. doi:10.1109/CVPR52729.2023.00718

[16] M. Xu, Medicalgpt: Training medical gpt model, https://github.com/shibing624/MedicalGPT, 2023

[17] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., Improving language understanding by generative pre-training (2018)

[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

[19] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al., Llama: Open and efficient foundation language models, arXiv preprint arXiv:2302.13971 (2023)

[20] A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al., The llama 3 herd of models, arXiv preprint arXiv:2407.21783 (2024)

[21] A. Nicolson, J. Dowling, B. Koopman, Improving chest x-ray report generation by leveraging warm starting, Artificial intelligence in medicine 144 (2023) 102633. doi:10.1016/j.artmed.2023.102633

[22] Y. Tao, L. Ma, J. Yu, H. Zhang, Memory-based cross-modal semantic alignment network for radiology report generation, IEEE Journal of Biomedical and Health Informatics (2024). doi:10.1109/JBHI.2024.3393018

[23] J. P. Cohen, J. D. Viviano, P. Bertin, P. Morrison, P. Torabian, M. Guarrera, M. P. Lungren, A. Chaudhari, R. Brooks, M. Hashir, et al., Torchxrayvision: A library of chest x-ray datasets and models, in: International Conference on Medical Imaging with Deep Learning, PMLR, 2022, pp. 231–249

[24] H. T. N. Nguyen, D. Nie, T. Badamdorj, Y. Liu, Y. Zhu, J. Truong, L. Cheng, Automated generation of accurate & fluent medical x-ray reports, 2021. arXiv:2108.12126

[25] T. Dettmers, A. Pagnoni, A. Holtzman, L. Zettlemoyer, Qlora: Efficient finetuning of quantized llms, Advances in Neural Information Processing Systems 36 (2024)

[26] D. Demner-Fushman, M. D. Kohli, M. B. Rosenman, S. E. Shooshan, L. Rodriguez, S. Antani, G. R. Thoma, C. J. McDonald, Preparing a collection of radiology examinations for distribution and retrieval, Journal of the American Medical Informatics Association 23 (2016) 304–310. doi:10.1093/jamia/ocv080

[27] A. Vaswani, Attention is all you need, Advances in Neural Information Processing Systems (2017)

[28] Y. Li, X. Liang, Z. Hu, E. P. Xing, Hybrid retrieval-generation reinforced agent for medical image report generation, Advances in neural information processing systems 31 (2018)

[29] B. Jing, Z. Wang, E. Xing, Show, describe and conclude: On exploiting the structure information of chest x-ray reports, arXiv preprint arXiv:2004.12274 (2020)

[30] Y. Zhang, X. Wang, Z. Xu, Q. Yu, A. Yuille, D. Xu, When radiology report generation meets knowledge graph, in: Proceedings of the AAAI conference on artificial intelligence, volume 34, 2020, pp. 12910–12917. doi:10.1609/aaai.v34i07.6989

[31] Z. Chen, Y. Song, T.-H. Chang, X. Wan, Generating radiology reports via memory-driven transformer, arXiv preprint arXiv:2010.16056 (2020)

[32] F. Liu, X. Wu, S. Ge, W. Fan, Y. Zou, Exploring and distilling posterior and prior knowledge for radiology report generation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 13753–13762. doi:10.1109/CVPR46437.2021.01354

[33] J. Li, S. Li, Y. Hu, H. Tao, A self-guided framework for radiology report generation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2022, pp. 588–598. doi:10.1007/978-3-031-16452-1_56

[34] F. Liu, S. Ge, Y. Zou, X. Wu, Competence-based multimodal curriculum learning for medical report generation, arXiv preprint arXiv:2206.14579 (2022)

[35] Z. Chen, Y. Shen, Y. Song, X. Wan, Cross-modal memory networks for radiology report generation, 2022. arXiv:2204.13258

[36] J. You, D. Li, M. Okumura, K. Suzuki, Jpg-jointly learn to align: Automated disease prediction and radiology report generation, in: Proceedings of the 29th international conference on computational linguistics, 2022, pp. 5989–6001

[37] B. Yan, M. Pei, M. Zhao, C. Shan, Z. Tian, Prior guided transformer for accurate radiology reports generation, IEEE Journal of Biomedical and Health Informatics 26 (2022) 5631–5640. doi:10.1109/JBHI.2022.3197162

[38] L. Wang, M. Ning, D. Lu, D. Wei, Y. Zheng, J. Chen, An inclusive task-aware framework for radiology report generation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2022, pp. 568–577. doi:10.1007/978-3-031-16452-1_54

[39] M. Li, B. Lin, Z. Chen, H. Lin, X. Liang, X. Chang, Dynamic graph enhanced contrastive learning for chest x-ray report generation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3334–3343. doi:10.1109/CVPR52729.2023.00325

[40] H. Qin, Y. Song, Reinforced cross-modal alignment for radiology report generation, in: Findings of the Association for Computational Linguistics: ACL 2022, 2022, pp. 448–458. doi:10.18653/v1/2022.findings-acl.38

[41] I. Najdenkoska, X. Zhen, M. Worring, L. Shao, Variational topic inference for chest x-ray report generation, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part III 24, Springer, 2021, pp. 625–635. doi:10.1007/978-3-030-87199-4_59

[42] F. Zeng, Z. Lyu, Q. Li, X. Li, Enhancing llms for impression generation in radiology reports through a multi-agent system, arXiv preprint arXiv:2412.06828 (2024). doi:10.48550/arXiv.2412.06828

[43] Y. Li, B. Yang, X. Cheng, Z. Zhu, H. Li, Y. Zou, Unify, align and refine: Multi-level semantic alignment for radiology report generation, in: Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 2863–2874. doi:10.48550/arXiv.2303.15932

[44] C. Yin, B. Qian, J. Wei, X. Li, X. Zhang, Y. Li, Q. Zheng, Automatic generation of medical imaging diagnostic report with hierarchical recurrent neural network, in: 2019 IEEE international conference on data mining (ICDM), IEEE, 2019, pp. 728–737. doi:10.1109/ICDM.2019.00083

[45] S. Islam, A. Dash, A. Seum, A. H. Raj, T. Hossain, F. M. Shah, Exploring video captioning techniques: A comprehensive survey on deep learning methods, SN Computer Science 2 (2021) 1–28. doi:10.1007/s42979-021-00487-x

[46] K. R. Suresh, A. Jarapala, P. Sudeep, Image captioning encoder–decoder models using cnn-rnn architectures: A comparative study, Circuits, Systems, and Signal Processing 41 (2022) 5719–5742. doi:10.1007/s00034-022-02050-2

[47] K. Zhang, P. Li, J. Wang, A review of deep learning-based remote sensing image caption: Methods, models, comparisons and future directions, Remote Sensing 16 (2024) 4113. doi:10.3390/rs16214113

[48] G. Xu, S. Niu, M. Tan, Y. Luo, Q. Du, Q. Wu, Towards accurate text-based image captioning with content diversity exploration, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12637–12646. doi:10.1109/CVPR46437.2021.01245

[49] L. Chen, Z. Jiang, J. Xiao, W. Liu, Human-like controllable image captioning with verb-specific semantic roles, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16846–16856. doi:10.1109/CVPR46437.2021.01657

[50] A. Tran, A. Mathews, L. Xie, Transform and tell: Entity-aware news image captioning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 13035–13045. doi:10.1109/CVPR42600.2020.01305

[51] Y. Jiang, C. Chen, D. Nguyen, B. M. Mervak, C. Tan, Gpt-4v cannot generate radiology reports yet, 2024. arXiv:2407.12176

[52] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank adaptation of large language models, 2021. arXiv:2106.09685

[53] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318. doi:10.3115/1073083.1073135
