Astra: a generalizable report generation foundation model for 3D computed tomography

Chaohui Yu; Chao Zhu; Fang Chen; Fan Wang; Haojie Han; Hao Luo; Hongen Liao; Jia Guo; Jing Wang; Shipeng Zhang

arxiv: 2605.31437 · v2 · pith:B5HV3MNTnew · submitted 2026-05-29 · 💻 cs.CV

Zhuhao Wang, Fang Chen, Chaohui Yu, Zihan Li, Yuchao Zheng, Jing Wang, Xuan Yang, Jia Guo, Zhenlu Yang, Xingju Zheng, Yihua Sun, Haojie Han, Xiaoxiao Qin, Zhan Feng, Wenbo Xiao, Chao Zhu, Yuehua Li, Shipeng Zhang, Hao Luo, Yunsong Peng, Fan Wang, Hongen Liao

Pith reviewed 2026-06-28 23:14 UTC · model grok-4.3

classification 💻 cs.CV

keywords CT report generation, foundation model, 3D computed tomography, medical imaging, reinforcement learning, report style harmonization, thoracoabdominal CT, clinical workflow assistance

The pith

Astra generates style-consistent and diagnostically accurate reports from 3D CT scans across institutions by harmonizing report styles and using reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Astra, a foundation model trained on 90,678 thoracoabdominal CT-report pairs that cover 353,671 abnormalities across eight organ systems. It tackles inconsistencies in reporting style and diagnostic terms between different cohorts through style harmonization during training and reinforcement learning to improve diagnostic consistency. This produces reports that maintain accuracy for multiple anatomical regions even when tested on six separate external cohorts from different institutions. A sympathetic reader would care because the approach could cut the time radiologists spend reviewing hundreds of slices per exam while raising the completeness and consistency of the resulting reports. The model is also positioned as reusable infrastructure that can support other CT-related AI tasks through its generated reports.

Core claim

Astra, trained on 90,678 thoracoabdominal CT-report pairs spanning eight organ systems, employs report style harmonization and reinforcement learning to produce style-consistent and diagnostically accurate reports that generalize across diverse anatomical regions and institutions, delivering a 44.1 percent average gain in fine-grained diagnostic metrics on the main dataset and six external cohorts while accelerating chest report drafting by 29.6 percent and raising abdominal report completeness by 11.3 percent in real clinical workflows.

What carries the argument

Report style harmonization combined with reinforcement learning for diagnostic consistency, applied to a large multi-region CT-report dataset.
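
The target format of harmonization can be sketched from the Figure 9 caption: findings are regrouped under a fixed sequence of anatomical regions, and regions without findings are labeled normal. The snippet below is a minimal illustration of that output format only, not the authors' pipeline. The function name `harmonize`, the input dictionary, and the example findings are hypothetical, and the last region label is left as "trachea" because the caption is truncated after that word.

```python
# Chest region order as listed in the Figure 9 caption.
# The caption is cut off after "trachea and", so the final label is an assumption.
CHEST_REGIONS = [
    "abdomen", "bone", "breast", "esophagus", "heart",
    "lung", "mediastinum", "pleura", "thyroid", "trachea",
]


def harmonize(findings_by_region, regions=CHEST_REGIONS):
    """Render findings in a fixed region order; regions with no findings become 'normal'."""
    lines = []
    for region in regions:
        findings = findings_by_region.get(region, [])
        text = "; ".join(findings) if findings else "normal"
        lines.append(f"{region}: {text}")
    return "\n".join(lines)


# Hypothetical findings, invented for illustration only.
report = harmonize({
    "lung": ["nodule in the right upper lobe"],
    "heart": ["coronary artery calcification"],
})
print(report)
```

A fixed region order with an explicit "normal" label means two reports that describe the same findings differ only in the finding text, which is the kind of consistency the harmonization step is described as targeting.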

If this is right

Astra assistance reduces time to draft chest reports by 29.6 percent in live clinical settings.
Astra raises completeness of abdominal reports by 11.3 percent with statistical significance.
The model improves performance on downstream CT diagnostic tasks when used for pretraining or data synthesis.
High-quality synthetic reports from Astra scale vision-language pretraining for other CT AI models.
The same training approach supports multi-region reporting that remains robust outside the original training distribution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The style-harmonization step could transfer to report generation for other volumetric modalities such as MRI if similar terminology differences appear across sites.
Widespread use might reduce inter-institution variability in how findings are described, affecting how referring physicians interpret results.
Synthetic reports produced by the model could fill gaps in training data for rare conditions without requiring new manual annotations.
Embedding the model in reporting software might shift radiologist effort from initial drafting toward verification and refinement of AI output.

Load-bearing premise

The six external cohorts capture enough real-world variation in reporting style and diagnostic practice, and the fine-grained metrics measure clinical accuracy independently of the harmonization step used in training.

What would settle it

Performance measurements on a fresh external cohort from an unseen institution that show no improvement over prior methods in diagnostic metrics or no gain in workflow completeness and speed.

Figures

Figures reproduced from arXiv: 2605.31437 by Chaohui Yu, Chao Zhu, Fang Chen, Fan Wang, Haojie Han, Hao Luo, Hongen Liao, Jia Guo, Jing Wang, Shipeng Zhang, Wenbo Xiao, Xiaoxiao Qin, Xingju Zheng, Xuan Yang, Yihua Sun, Yuchao Zheng, Yuehua Li, Yunsong Peng, Zhan Feng, Zhenlu Yang, Zhuhao Wang, Zihan Li.

**Figure 2.** Evaluation of the harmonization strategy. a, t-SNE visualization of report text embeddings from the original CT-Rate reports. Embeddings were extracted using Qwen3-Embedding for the four most prevalent diseases. Red and grey points denote positive and negative cases, respectively. b, t-SNE visualization of report text embeddings from harmonized CT-Rate reports using Qwen3-Embedding. The harmonization stra…

**Figure 3.** Evaluation on public benchmark and external cohorts. a, Evaluation on CT-Rate (n = 1564), Inspect (n = 1000), BIMCV (n = 1505) and Merlin (n = 1000). b, Evaluation on CTRG-Chest (n = 324), AMOS-MM (n = 400), Inhouse-Abdomen-1 (n = 400), Inhouse-Abdomen-2 (n = 400), Inhouse-Abdomen-3 (n = 400), Inhouse-Chest (n = 400). NLG means the average score across BLEU-4, ROUGE-L and METEOR. Astra outperforms current general…

**Figure 4.** Human-AI collaboration in radiology report generation. a–d, Comparison of reporting efficiency and report quality for chest CT interpretation with and without AI assistance. P values were calculated using a two-sided Wilcoxon signed-rank test. e–h, Comparison of reporting efficiency and report quality for abdomen CT interpretation with and without AI assistance. Orange bars indicate results from junior rad…

**Figure 5.** Astra facilitates efficient development of diagnostic models. a, Current disease diagnosis pipelines rely on fine-tuning pretrained vision foundation models. b, Astra boosts the performance of pretrained CT encoders through an ensemble strategy. c–f, Performance comparison between the ensemble strategy and the pretrained encoder alone under linear probing settings across four datasets (l.p. means linear pro…

**Figure 6.** Astra scales CT vision-language pretraining. a, Scaling pretraining by integrating unlabeled NLST data (augmented with Astra-generated pseudo-reports) with the CT-Rate dataset yields consistent performance gains over pretraining on CT-Rate alone. b,c, Downstream classification performance of the visual encoder using 100% (b) and 20% (c) of labeled training samples. Astra-enhanced joint pretraining signifi…

**Figure 7.** Ablation studies of harmonization strategy and reinforcement learning. a, Performance gains from the reinforcement learning across multiple benchmarks. Baseline means the supervised fine-tuning model trained on CTRgDB, whereas RL-tuned means the model further optimized with reinforcement learning. Additional results are provided in supplementary Tab. B3,4. b–d, Case-level F1-score analysis on CT-Rate. The …

**Figure 8.** Architecture and training strategy of Astra. Astra adopts a LLaVA-style architecture composed of a 3D visual encoder, a Perceiver token compressor and a large language model decoder. Volumetric CT inputs are first encoded into visual features, compressed into a small set of latent tokens, and then projected into the language model for report generation. Training is performed in two stages: supervised fine-…

**Figure 9.** Representative harmonized chest and abdominal CT report cases. The original free-text reports are reorganized into predefined anatomical regions, and each region focuses on positive abnormalities while regions without relevant findings are labeled as normal. For chest CT reports, the standardized reporting order is abdomen, bone, breast, esophagus, heart, lung, mediastinum, pleura, thyroid, and trachea and…

**Figure 10.** Organ-level abnormality co-occurrence matrix in the CTRgDB dataset. The matrix shows multi-organ abnormality co-occurrence across 30 major organs in CTRgDB. In the abdomen, abnormalities of the liver, kidneys, peritoneum and lymph nodes frequently co-occur, consistent with common metastatic spread patterns. In the thorax, lung abnormalities co-occur broadly with abnormalities in multiple other organs. Tog…

**Figure 11.** Region-level abnormality co-occurrence chord diagram for chest CTs in CTRgDB. The diagram shows the frequency of multi-region abnormality involvement across 10 major thoracic regions in CTRgDB. Abnormalities in the mediastinum frequently co-occur with those in the lungs, consistent with the common spread of neoplastic and inflammatory processes to the mediastinal region. CTRgDB captures complex multi-regi…

**Figure 12.** Region-level abnormality co-occurrence chord diagram for abdominal CTs in CTRgDB. The diagram shows the frequency of multi-region abnormality involvement across 13 major abdominal regions in CTRgDB. CTRgDB captures complex multi-region abnormality involvement in abdominal CT examinations.

**Figure 13.** Disease-level co-occurrence matrix for chest CTs in CTRgDB. The matrix shows co-occurrence patterns among 18 major diseases defined in CT-Rate, highlighting frequent multi-disease involvement in individual chest CT examinations. These results demonstrate that CTRgDB captures complex disease-level abnormalities in chest CTs.

**Figure 14.** Disease-level co-occurrence matrix for abdominal CTs in CTRgDB. The matrix illustrates co-occurrence patterns among 30 major abdominal diseases defined with reference to Merlin, revealing frequent coexisting disease entities within individual abdominal CT examinations. This distribution supports the ability of CTRgDB to represent complex disease-level involvement in abdominal CTs.

**Figure 15.** Word clouds in CTRgDB reports. The word clouds highlight the rich diagnostic vocabulary used in chest and abdominal CT reports in CTRgDB. Reports frequently describe abnormalities in terms of degree, landmark and feature, enabling precise and clinically informative characterization. These patterns indicate that CTRgDB provides detailed diagnostic descriptions at the finding level.

**Figure 16.** Heterogeneity of imaging and reporting distributions across CT cohorts. a–c, Distributions of slice spacing, in-plane XY spacing and slice number across cohorts. d,e, Region-wise abnormality profiles in chest and abdominal cohorts, with dot size and colour denoting normalized abnormality counts. f,g, Report length distributions in chest and abdominal cohorts, measured by word count. The cohorts showed ma…

**Figure 17.** Case study of human-AI collaboration in chest CT report generation. Astra identified all abnormalities but provided insufficient characterization of a pulmonary nodule. The radiologist missed arteriosclerosis. Human–AI collaboration accurately detected all abnormalities with precise characterization.

**Figure 18.** Case study of human-AI collaboration in abdomen CT report generation. Astra identified bladder tumour, renal cyst, enlarged prostate, arteriosclerosis, and spinal degeneration, but misdiagnosed atelectasis. The radiologist missed enlarged prostate, arteriosclerosis, and spinal degeneration. Human–AI collaboration correctly detected all abnormalities, provided accurate characterizations, and corrected the…

**Figure 19.** Case study of human-AI collaboration in abdomen CT report generation. Astra missed prostatic calcification, while the radiologist missed hepatic steatosis. Human–AI collaboration accurately identified all abnormalities with precise characterization.
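
The organ-level co-occurrence matrix described in the Figure 10 caption can be illustrated with a small counting sketch. It assumes each examination has already been reduced to the set of organs with at least one abnormality. That preprocessing, the function name `cooccurrence`, and the three toy examinations are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(exams):
    """Count, for each unordered organ pair, the exams in which both organs are abnormal."""
    counts = Counter()
    for organs in exams:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(organs)), 2):
            counts[(a, b)] += 1
    return counts


# Toy examinations, invented for illustration only.
exams = [
    {"liver", "kidney", "lymph node"},
    {"liver", "kidney"},
    {"lung"},
]
counts = cooccurrence(exams)
print(counts[("kidney", "liver")])  # number of exams with both kidney and liver abnormal
```

Storing only the sorted pair keeps the matrix symmetric by construction; a dense 30 by 30 array could be filled from `counts` if a heatmap is needed.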

Original abstract

CT interpretation requires radiologists to review hundreds of volumetric slices per examination, making reporting time-consuming and highly expertise-dependent. Automated CT report generation offers a promising route to improving clinical efficiency, yet the field still lacks a generalizable CT report generation foundation model that supports multi-region reporting and remains robust across external real-world cohorts. Intrinsic inconsistencies in reporting style and diagnostic terminology across cohorts make naive joint training prone to noisy textual supervision, thereby limiting model generalizability. Here we present Astra, a generalizable CT report generation foundation model trained on 90,678 thoracoabdominal CT-report pairs (CTRgDB) with 353,671 abnormalities spanning eight organ systems. By harmonizing report style and further refining diagnostic consistency via reinforcement learning, Astra achieves style-consistent and diagnostically accurate report generation across diverse anatomical regions and institutions. Evaluated on CTRgDB and six external cohorts, Astra achieves state-of-the-art performance with a 44.1% average improvement in fine-grained diagnostic metrics (P<0.001). In real-world clinical workflows, Astra assistance accelerates chest report drafting by 29.6% and improves abdominal report completeness by 11.3% (P<0.001). Astra also demonstrates broad utility as a foundation for CT AI development, improving downstream diagnostic performance and scaling vision-language pretraining through high-quality report synthesis. Overall, Astra serves as a broadly accessible clinical assistant and a pivotal infrastructure for the next generation of AI-powered healthcare.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Astra scales up CT report generation with a large dataset and external cohorts, but the abstract's metric claims are hard to evaluate without methods details.

The punchline is that this paper trains on 90k thoracoabdominal CT-report pairs and reports gains across six external cohorts plus some clinical workflow numbers. That scale and the multi-institution testing are the concrete advances.

What is new is the CTRgDB dataset itself and the explicit use of style harmonization followed by reinforcement learning to reduce cross-cohort noise in the text supervision. The authors also run a real-world workflow study measuring drafting time and report completeness. Those elements go beyond most prior vision-language medical report papers, which often stay within single institutions or smaller sets.

The soft spots sit in the evaluation. The abstract states a 44.1% average improvement in fine-grained diagnostic metrics with p-values but gives no definition of those metrics, no ablation that isolates diagnostic content from the harmonized style, and no error bars or per-cohort breakdowns. The stress-test concern about possible confounding is reasonable on the current text: if the metrics reward phrasing or structure that the harmonization step directly targets, the external gains could partly reflect style transfer rather than independent accuracy. Without the methods section it is impossible to judge how cleanly the two are separated.

This paper is for groups building or deploying medical report generators who need large-scale multi-region data and external validation examples. Readers focused on clinical efficiency metrics will find the workflow numbers useful even if the core model details require more scrutiny.

It deserves peer review because the dataset size and external cohort design are substantial enough to warrant referee time, provided the authors supply the missing metric definitions and ablations.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Astra, a foundation model for generating reports from 3D CT scans. Trained on 90,678 thoracoabdominal CT-report pairs (CTRgDB) containing 353,671 abnormalities across eight organ systems, the model applies report style harmonization followed by reinforcement learning to produce consistent outputs. It claims state-of-the-art performance with a 44.1% average improvement in fine-grained diagnostic metrics (P<0.001) on CTRgDB and six external cohorts, real-world clinical benefits (29.6% faster chest report drafting; 11.3% better abdominal report completeness, P<0.001), and utility as a foundation for downstream CT AI tasks.

Significance. If the performance and generalizability claims are substantiated with full methodological detail and appropriate controls, the work would represent a meaningful advance in medical vision-language modeling by addressing cross-institution style and terminology inconsistencies at scale. The size of the training corpus and the inclusion of multi-cohort external evaluation plus clinical workflow measurements are positive features.

major comments (2)

[Abstract] The central claim of a 44.1% average improvement in 'fine-grained diagnostic metrics' (P<0.001) across CTRgDB and six external cohorts is presented without any definition of the metrics themselves, without ablation results that hold style fixed while varying only diagnostic content, and without error bars or dataset characteristics for the external cohorts. This is load-bearing for the generalizability and diagnostic-accuracy assertions because, per the stress-test note, the metrics may incorporate terminology or structure altered by the harmonization step, so measured gains could reflect partial style transfer rather than independent diagnostic improvement.
[Abstract] The training description states that style harmonization and reinforcement learning are used to refine diagnostic consistency, yet no implementation details, hyper-parameters, or ablation studies isolating the contribution of each step are supplied. Without these, it is impossible to assess whether the reported external-cohort gains are robust or whether they arise from overfitting to harmonized style patterns.

minor comments (2)

[Abstract] The phrase 'fine-grained diagnostic metrics' is used without elaboration; a brief parenthetical definition or reference to the exact metric formulation would improve interpretability of the 44.1% figure.
[Abstract] The clinical workflow results (29.6% acceleration, 11.3% completeness) are reported with p-values but without the number of participating radiologists, the exact study design, or confidence intervals, which would help readers gauge practical significance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each major comment below and will revise the manuscript to improve clarity while preserving the integrity of the reported results.

Referee: [Abstract] The central claim of a 44.1% average improvement in 'fine-grained diagnostic metrics' (P<0.001) across CTRgDB and six external cohorts is presented without any definition of the metrics themselves, without ablation results that hold style fixed while varying only diagnostic content, and without error bars or dataset characteristics for the external cohorts. This is load-bearing for the generalizability and diagnostic-accuracy assertions because, per the stress-test note, the metrics may incorporate terminology or structure altered by the harmonization step, so measured gains could reflect partial style transfer rather than independent diagnostic improvement.

Authors: We agree the abstract would benefit from greater specificity. The fine-grained diagnostic metrics are abnormality-level precision, recall, and F1 scores, defined in Methods Section 4.2. We will revise the abstract to include this definition and a parenthetical reference to the external cohort characteristics (Table 1) and error bars (Tables 2-3). Ablation results controlling for style via harmonized references while isolating diagnostic content appear in Supplementary Section S3.4; these show persistent gains attributable to diagnostic accuracy rather than style alone. The harmonization step is applied uniformly prior to training, and external-cohort evaluations use unharmonized target reports, supporting that gains are not limited to style transfer. revision: yes
Referee: [Abstract] The training description states that style harmonization and reinforcement learning are used to refine diagnostic consistency, yet no implementation details, hyper-parameters, or ablation studies isolating the contribution of each step are supplied. Without these, it is impossible to assess whether the reported external-cohort gains are robust or whether they arise from overfitting to harmonized style patterns.

Authors: The implementation details for style harmonization (Section 3.1) and reinforcement learning (Section 3.2), including the style classifier, reward formulation, and training procedure, are supplied in the Methods. Hyperparameters are listed in Supplementary Table S2, and ablation studies isolating each component's contribution (including style vs. diagnostic effects) are in Figure 7 and Supplementary Table S3. These ablations demonstrate additive gains from the RL stage on external cohorts. We will revise the abstract to briefly reference these sections so readers can locate the supporting analyses. revision: yes
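
The rebuttal names abnormality-level precision, recall, and F1 as the fine-grained diagnostic metrics. The sketch below shows one common way such scores are computed for a single report, assuming the abnormalities have already been extracted from the generated and reference reports as sets of labels. The extraction step, the function name `prf1`, and the example labels are hypothetical; the paper's exact formulation is not given in the text above.

```python
def prf1(predicted, reference):
    """Set-based precision, recall and F1 over abnormality labels for one report."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: 2 of 3 predictions are correct, 2 of 4 reference findings are found.
p, r, f = prf1(
    {"lung nodule", "pleural effusion", "atelectasis"},
    {"lung nodule", "pleural effusion", "cardiomegaly", "emphysema"},
)
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the score depends only on which abnormalities are matched, it is insensitive to sentence order and wording once labels are extracted, which is why the referee's question about how labels are derived from harmonized versus unharmonized text matters.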

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external validation

The paper describes training a foundation model on CTRgDB with style harmonization and reinforcement learning, followed by evaluation on CTRgDB plus six external cohorts. No equations, derivations, or first-principles chains appear in the provided text. Performance metrics (e.g., 44.1% improvement) are reported from held-out external data rather than any fitted parameter renamed as a prediction or any self-referential definition. Self-citations, if present in the full text, are not load-bearing for the central empirical claims. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review; ledger populated from stated assumptions in the summary text.

axioms (2)

domain assumption: External cohorts reflect real-world reporting style and diagnostic variability independent of training data. Required for the generalizability claim across institutions.
domain assumption: Fine-grained diagnostic metrics are valid proxies for clinical accuracy after style harmonization. Underpins the 44.1% improvement statement.

pith-pipeline@v0.9.1-grok · 5869 in / 1334 out tokens · 20784 ms · 2026-06-28T23:14:01.553683+00:00 · methodology


Mickael Tordjman, Zelong Liu, Murat Yuce, Valentin Fauveau, Yunhao Mei, Jerome Had- jadj, Ian Bolger, Haidara Almansour, Carolyn Horst, Ashwin Singh Parihar, et al. Com- parative benchmarking of the deepseek large language model on medical tasks and clinical reasoning.Nature medicine, 31(8):2550–2555, 2025

2025
[29]

Zero-shot information extraction from radiological reports using chatgpt.International Journal of Medical Infor- matics, 183:105321, 2024

Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, and Nan Wu. Zero-shot information extraction from radiological reports using chatgpt.International Journal of Medical Infor- matics, 183:105321, 2024

2024
[30]

Clinical entity augmented retrieval for clinical information extraction.NPJ digital medicine, 8(1):45, 2025

Ivan Lopez, Akshay Swaminathan, Karthik Vedula, Sanjana Narayanan, Fateme Nateghi Haredasht, Stephen P Ma, April S Liang, Steven Tate, Manoj Maddali, Robert Joseph Gallo, et al. Clinical entity augmented retrieval for clinical information extraction.NPJ digital medicine, 8(1):45, 2025

2025
[31]

Merlin: A vision language foundation model for 3d computed tomography

Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, et al. Merlin: A vision language foundation model for 3d computed tomography. Research Square, pages rs–3, 2024

2024
[32]

Pragmatic radiology report gener- ation

Dang Nguyen, Chacha Chen, He He, and Chenhao Tan. Pragmatic radiology report gener- ation. InMachine Learning for Health (ML4H), pages 385–402. PMLR, 2023

2023
[33]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645(8081):633–638, 2025

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645(8081):633–638, 2025

2025
[34]

Radgpt: Constructing 3d image-text tumor datasets

Pedro RAS Bassi, Mehmet Can Yavuz, Ibrahim Ethem Hamamci, Sezgin Er, Xiaoxi Chen, Wenxuan Li, Bjoern Menze, Sergio Decherchi, Andrea Cavalli, Kang Wang, et al. Radgpt: Constructing 3d image-text tumor datasets. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 23720–23730, 2025

2025
[35]

A foundation model utilizing chest ct volumes and radiology reports for supervised-level zero-shot detection of abnormalities.CoRR, 2024

Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esir- gun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, et al. A foundation model utilizing chest ct volumes and radiology reports for supervised-level zero-shot detection of abnormalities.CoRR, 2024

2024
[36]

Inspect: a multimodal dataset for pulmonary embolism diagnosis and prognosis.arXiv preprint arXiv:2311.10798, 2023

Shih-ChengHuang, ZepengHuo, EthanSteinberg, Chia-ChunChiang, MatthewPLungren, Curtis P Langlotz, Serena Yeung, Nigam H Shah, and Jason A Fries. Inspect: a multimodal dataset for pulmonary embolism diagnosis and prognosis.arXiv preprint arXiv:2311.10798, 2023

work page arXiv 2023
[37]

Bimcv-r: A land- mark dataset for 3d ct text-image retrieval

Yinda Chen, Che Liu, Xiaoyu Liu, Rossella Arcucci, and Zhiwei Xiong. Bimcv-r: A land- mark dataset for 3d ct text-image retrieval. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 124–134. Springer, 2024

2024
[38]

Ratescore: A metric for radiology report generation

Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. Ratescore: A metric for radiology report generation. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15004–15019, 2024

2024
[39]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176, 2025. 27

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne.Journal of machine learning research, 9(11), 2008

2008
[41]

Qwen3 Technical Report

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[42]

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Weiwen Xu, Hou Pong Chan, Long Li, Mahani Aljunied, Ruifeng Yuan, Jianyu Wang, Chenghao Xiao, Guizhen Chen, Chaoqun Liu, Zhaodonghui Li, et al. Lingshu: A general- ist foundation model for unified multimodal medical understanding and reasoning.arXiv preprint arXiv:2506.07044, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, et al. Medgemma technical report.arXiv preprint arXiv:2507.05201, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

R2gengpt: Radiology report generation with frozen llms.Meta-Radiology, 1(3):100033, 2023

Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. R2gengpt: Radiology report generation with frozen llms.Meta-Radiology, 1(3):100033, 2023

2023
[45]

A clinically accessible small multimodal radiology model and evaluation metric for chest x- ray findings.Nature Communications, 16(1):3108, 2025

Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, et al. A clinically accessible small multimodal radiology model and evaluation metric for chest x- ray findings.Nature Communications, 16(1):3108, 2025

2025
[46]

3d-rad: A comprehensive 3d radiology med-vqa dataset with multi-temporal analysis and diverse diagnostic tasks.arXiv preprint arXiv:2506.11147, 2025

Xiaotang Gai, Jiaxiang Liu, Yichen Li, Zijie Meng, Jian Wu, and Zuozhu Liu. 3d-rad: A comprehensive 3d radiology med-vqa dataset with multi-temporal analysis and diverse diagnostic tasks.arXiv preprint arXiv:2506.11147, 2025

work page arXiv 2025
[47]

Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation

Yuhao Tang, Haichen Yang, Liyan Zhang, and Ye Yuan. Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation. Expert Systems with Applications, 237:121442, 2024

2024
[48]

Lung cancer screening with low-dose helical ct: results from the national lung screening trial (nlst), 2011

Barnett S Kramer, Christine D Berg, Denise R Aberle, and Philip C Prorok. Lung cancer screening with low-dose helical ct: results from the national lung screening trial (nlst), 2011

2011
[49]

Machine-learning-basedmultipleabnormalityprediction with large-scale chest computed tomography volumes.Medical image analysis, 67:101857, 2021

Rachel Lea Draelos, David Dov, Maciej A Mazurowski, Joseph Y Lo, Ricardo Henao, Geof- freyDRubin, andLawrenceCarin. Machine-learning-basedmultipleabnormalityprediction with large-scale chest computed tomography volumes.Medical image analysis, 67:101857, 2021

2021
[50]

The rsna pulmonary embolism ct dataset.Radiology: Artificial Intelligence, 3 (2):e200254, 2021

Errol Colak, Felipe C Kitamura, Stephen B Hobbs, Carol C Wu, Matthew P Lungren, Luciano M Prevedello, Jayashree Kalpathy-Cramer, Robyn L Ball, George Shih, Anouk Stein, et al. The rsna pulmonary embolism ct dataset.Radiology: Artificial Intelligence, 3 (2):e200254, 2021

2021
[51]

Bleu: a method for automatic evaluation of machine translation

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. InProceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

2002
[52]

Rouge: A package for automatic evaluation of summaries

Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. InText summa- rization branches out, pages 74–81, 2004

2004
[53]

Meteor: An automatic metric for mt evaluation with improved correlation with human judgments

Satanjeev Banerjee and Alon Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. InProceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65– 72, 2005. 28

2005
[54]

Maira-2: Grounded radiology report generation

Shruthi Bannur, Kenza Bouzid, Daniel C Castro, Anton Schwaighofer, Anja Thieme, Sam Bond-Taylor, Maximilian Ilse, Fernando Pérez-García, Valentina Salvatelli, Harshita Sharma, et al. Maira-2: Grounded radiology report generation.arXiv preprint arXiv:2406.04449, 2024

work page arXiv 2024
[55]

Maira-1: A specialised large multimodal model for radiology report generation.arXiv preprint arXiv:2311.13668, 2023

Stephanie L Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, et al. Maira-1: A specialised large multimodal model for radiology report generation.arXiv preprint arXiv:2311.13668, 2023

work page arXiv 2023
[56]

Generating radiology reports via memory-driven transformer

Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xiang Wan. Generating radiology reports via memory-driven transformer. InProceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 1439–1449, 2020

2020
[57]

Cross-modal memory networks for radiology report generation

Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report generation. InProceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 5904–5914, 2021

2021
[58]

A vision–language pretrained transformer for versatile clinical respiratory disease applications.Nature Biomedical Engineering, pages 1–19, 2025

Liangdi Ma, Hengrui Liang, Yuwei He, Wei Wang, Zeping Yan, Wuchao Li, Rongpin Wang, Yongyi Li, Yuerong Lizhu, Yaou Liu, et al. A vision–language pretrained transformer for versatile clinical respiratory disease applications.Nature Biomedical Engineering, pages 1–19, 2025

2025
[59]

Col- laboration between clinicians and vision–language models in radiology report generation

Ryutaro Tanno, David GT Barrett, Andrew Sellergren, Sumedh Ghaisas, Sumanth Dathathri, Abigail See, Johannes Welbl, Charles Lau, Tao Tu, Shekoofeh Azizi, et al. Col- laboration between clinicians and vision–language models in radiology report generation. Nature Medicine, 31(2):599–608, 2025

2025
[60]

A deep learning based automatic report generator for retinal optical coherence tomography images.npj Digital Medicine, 8(1):618, 2025

Xinjian Chen, Huazhu Fu, Jingtao Wang, Tian Lin, Qian Cheng, Cangxin Li, Meng Wang, Zhongyue Chen, Aidi Lin, Anlin Zhang, et al. A deep learning based automatic report generator for retinal optical coherence tomography images.npj Digital Medicine, 8(1):618, 2025

2025
[61]

Keyword-based ai assistance in the generation of radiology reports: A pilot study.NPJ Digital Medicine, 8 (1):490, 2025

Fei Dong, Shouping Nie, Manling Chen, Fangfang Xu, and Qian Li. Keyword-based ai assistance in the generation of radiology reports: A pilot study.NPJ Digital Medicine, 8 (1):490, 2025

2025
[62]

Generating synthetic data for medical imaging.Radiology, 312(3):e232471, 2024

Lennart R Koetzier, Jie Wu, Domenico Mastrodicasa, Aline Lutz, Matthew Chung, W Adam Koszek, Jayanth Pratap, Akshay S Chaudhari, Pranav Rajpurkar, Matthew P Lungren, et al. Generating synthetic data for medical imaging.Radiology, 312(3):e232471, 2024

2024
[63]

Generative ai for misalignment- resistant virtual staining to accelerate histopathology workflows.Nature Communications, 2026

Jiabo Ma, Wenqiang Li, Jinbang Li, Ziyi Liu, Linshan Wu, Fengtao Zhou, Li Liang, Ronald Cheong Kin Chan, Terence TW Wong, and Hao Chen. Generative ai for misalignment- resistant virtual staining to accelerate histopathology workflows.Nature Communications, 2026

2026
Appendix A Supplementary Figures

Esophagus: Normal. Heart: heart size increased. calcific atheroma plaques are observed in the aorta ...

lower thorax: includes lower chest and lung bases
liver and biliary tree: includes liver and biliary tree (bile ducts), but NOT gallbladder
gallbladder: gallbladder only (separate from liver)
pancreas: pancreas only
adrenal glands: adrenal glands only
kidneys and ureters: includes kidneys and ureters
gastrointestinal tract: includes stomach, intestines, appendix; be careful not to miss the appendix
peritoneum: includes peritoneal space, peritoneal cavity, and abdominal wall
pelvic: includes pelvic organs, bladder, prostate and seminal vesicles, uterus and ovaries
vasculature: vasculature system
lymph nodes: lymph nodes
musculoskeletal: includes bones and muscles

Rule 1: Remove Non-Diagnostic Text
Before any other processing, you must identify and completely remove any text that represents communication between clinicians, summary codes, or procedural notes. This includes but is not limited to phrases like “FINDINGS GIVEN TO Dr. ...”, “SUMMARY 4:...”, “END OF IMPRESSIO...

If a region had no positive findings (i.e., it was normal, all findings were negative, or the section was not mentioned in the report), the value MUST be the string “normal”.

Report:{report}
Processed Report (JSON):...

Table 8: Prompt used for structured information extraction from abdominal CT reports.

Structured Chest CT Report Extraction Prompt: Sys...

abdomen: include all findings related to liver, gallbladder, pancreas, spleen, kidneys, adrenals, gastrointestinal tract, abdominal vessels, and abdominal lymph nodes, etc
esophagus: esophagus
mediastinum: mediastinum area

[17] [17]

Ct- graph: Hierarchical graph attention network for anatomy-guided ct report generation

Hamza Kalisch, Fabian Hörst, Jens Kleesiek, Ken Herrmann, and Constantin Seibold. Ct- graph: Hierarchical graph attention network for anatomy-guided ct report generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6775– 6784, 2025

2025

[18] [18]

Better tokens for better 3d: Advancing vision-language modeling in 3d medical imaging.Advances in Neural Information Processing Systems, 38:135074–135102, 2026

Ibrahim Ethem Hamamci, Sezgin Er, Suprosanna Shit, Hadrien Reynaud, Dong Yang, Pengfei Guo, Marc Edgar, Daguang Xu, Bernhard Kainz, and Bjoern Menze. Better tokens for better 3d: Advancing vision-language modeling in 3d medical imaging.Advances in Neural Information Processing Systems, 38:135074–135102, 2026

2026

[19] [19]

3d-ct-gpt: Generating 3d radiology reports through integration of large vision-language models.arXiv preprint arXiv:2409.19330, 2024

Hao Chen, Wei Zhao, Yingli Li, Tianyang Zhong, Yisong Wang, Youlan Shang, Lei Guo, Junwei Han, Tianming Liu, Jun Liu, et al. 3d-ct-gpt: Generating 3d radiology reports through integration of large vision-language models.arXiv preprint arXiv:2409.19330, 2024

work page arXiv 2024

[20] [20]

Automated structured radiology report generation

Jean-Benoit Delbrouck, Justin Xu, Johannes Moll, Alois Thomas, Zhihong Chen, Sophie Ostmeier, Asfandyar Azhar, Kelvin Zhenghao Li, Andrew Johnston, Christian Bluethgen, et al. Automated structured radiology report generation. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26813–26829, 2025

2025

[21] [21]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[22] [22]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models.arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[23] [23]

Towards generalist foundation model for radiology by leveraging web-scale 2d&3d medical data

Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Hui Hui, Yanfeng Wang, and Weidi Xie. Towards generalist foundation model for radiology by leveraging web-scale 2d&3d medical data. Nature Communications, 16(1):7866, 2025

2025

[24] [24]

Bai, et al., M3D: Advancing 3D medical image analysis with multi- modal large language modelsArXiv:2404.00578 (2024)

Fan Bai, Yuxin Du, Tiejun Huang, Max Q-H Meng, and Bo Zhao. M3d: Advanc- ing 3d medical image analysis with multi-modal large language models.arXiv preprint arXiv:2404.00578, 2024

work page arXiv 2024

[25] [25]

InInternational Conference on Medical Image Com- puting and Computer-Assisted Intervention, pages 268–277

Songtao Jiang, Yuan Wang, Sibo Song, Tianxiang Hu, Chenyi Zhou, Bin Pu, Yan Zhang, Zhibo Yang, Yang Feng, Joey Tianyi Zhou, et al. Hulu-med: A transparent generalist model towards holistic medical vision-language understanding.arXiv preprint arXiv:2510.08668, 2025

work page arXiv 2025

[26] [26]

Large language models improve transferability of electronic health record-based predictions across countries and coding systems.npj Digital Medicine, 2026

Matthias Kirchler, Matteo Ferro, Veronica Lorenzini, Robin P van de Water, FinnGen Ganna Andrea 3, Christoph Lippert, and Andrea Ganna. Large language models improve transferability of electronic health record-based predictions across countries and coding systems.npj Digital Medicine, 2026. 26

2026

[27] [27]

Benchmark evaluation of deepseek large language models in clinical decision-making.Nature medicine, 31(8):2546–2549, 2025

Sarah Sandmann, Stefan Hegselmann, Michael Fujarski, Lucas Bickmann, Benjamin Wild, Roland Eils, and Julian Varghese. Benchmark evaluation of deepseek large language models in clinical decision-making.Nature medicine, 31(8):2546–2549, 2025

2025

[28] [28]

Com- parative benchmarking of the deepseek large language model on medical tasks and clinical reasoning.Nature medicine, 31(8):2550–2555, 2025

Mickael Tordjman, Zelong Liu, Murat Yuce, Valentin Fauveau, Yunhao Mei, Jerome Had- jadj, Ian Bolger, Haidara Almansour, Carolyn Horst, Ashwin Singh Parihar, et al. Com- parative benchmarking of the deepseek large language model on medical tasks and clinical reasoning.Nature medicine, 31(8):2550–2555, 2025

2025

[29] [29]

Zero-shot information extraction from radiological reports using chatgpt.International Journal of Medical Infor- matics, 183:105321, 2024

Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, and Nan Wu. Zero-shot information extraction from radiological reports using chatgpt.International Journal of Medical Infor- matics, 183:105321, 2024

2024

[30] [30]

Clinical entity augmented retrieval for clinical information extraction.NPJ digital medicine, 8(1):45, 2025

Ivan Lopez, Akshay Swaminathan, Karthik Vedula, Sanjana Narayanan, Fateme Nateghi Haredasht, Stephen P Ma, April S Liang, Steven Tate, Manoj Maddali, Robert Joseph Gallo, et al. Clinical entity augmented retrieval for clinical information extraction.NPJ digital medicine, 8(1):45, 2025

2025

[31] [31]

Merlin: A vision language foundation model for 3d computed tomography

Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, et al. Merlin: A vision language foundation model for 3d computed tomography. Research Square, pages rs–3, 2024

2024

[32] [32]

Pragmatic radiology report gener- ation

Dang Nguyen, Chacha Chen, He He, and Chenhao Tan. Pragmatic radiology report gener- ation. InMachine Learning for Health (ML4H), pages 385–402. PMLR, 2023

2023

[33] [33]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645(8081):633–638, 2025

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1 incentivizes reasoning in llms through reinforcement learning.Nature, 645(8081):633–638, 2025

2025

[34] [34]

Radgpt: Constructing 3d image-text tumor datasets

Pedro RAS Bassi, Mehmet Can Yavuz, Ibrahim Ethem Hamamci, Sezgin Er, Xiaoxi Chen, Wenxuan Li, Bjoern Menze, Sergio Decherchi, Andrea Cavalli, Kang Wang, et al. Radgpt: Constructing 3d image-text tumor datasets. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 23720–23730, 2025

2025

[35] [35]

A foundation model utilizing chest ct volumes and radiology reports for supervised-level zero-shot detection of abnormalities.CoRR, 2024

Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esir- gun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, et al. A foundation model utilizing chest ct volumes and radiology reports for supervised-level zero-shot detection of abnormalities.CoRR, 2024

2024

[36] [36]

Inspect: a multimodal dataset for pulmonary embolism diagnosis and prognosis.arXiv preprint arXiv:2311.10798, 2023

Shih-ChengHuang, ZepengHuo, EthanSteinberg, Chia-ChunChiang, MatthewPLungren, Curtis P Langlotz, Serena Yeung, Nigam H Shah, and Jason A Fries. Inspect: a multimodal dataset for pulmonary embolism diagnosis and prognosis.arXiv preprint arXiv:2311.10798, 2023

work page arXiv 2023

[37] [37]

Bimcv-r: A land- mark dataset for 3d ct text-image retrieval

Yinda Chen, Che Liu, Xiaoyu Liu, Rossella Arcucci, and Zhiwei Xiong. Bimcv-r: A land- mark dataset for 3d ct text-image retrieval. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 124–134. Springer, 2024

2024

[38] [38]

Ratescore: A metric for radiology report generation

Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. Ratescore: A metric for radiology report generation. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15004–15019, 2024

2024

[39] [39]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176, 2025. 27

work page internal anchor Pith review Pith/arXiv arXiv 2025

[40] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008

[41] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

[42] Weiwen Xu, Hou Pong Chan, Long Li, Mahani Aljunied, Ruifeng Yuan, Jianyu Wang, Chenghao Xiao, Guizhen Chen, Chaoqun Liu, Zhaodonghui Li, et al. Lingshu: A generalist foundation model for unified multimodal medical understanding and reasoning. arXiv preprint arXiv:2506.07044, 2025

[43] Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, et al. Medgemma technical report. arXiv preprint arXiv:2507.05201, 2025

[44] Zhanyu Wang, Lingqiao Liu, Lei Wang, and Luping Zhou. R2gengpt: Radiology report generation with frozen llms. Meta-Radiology, 1(3):100033, 2023

[45] Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, et al. A clinically accessible small multimodal radiology model and evaluation metric for chest x-ray findings. Nature Communications, 16(1):3108, 2025

[46] Xiaotang Gai, Jiaxiang Liu, Yichen Li, Zijie Meng, Jian Wu, and Zuozhu Liu. 3d-rad: A comprehensive 3d radiology med-vqa dataset with multi-temporal analysis and diverse diagnostic tasks. arXiv preprint arXiv:2506.11147, 2025

[47] Yuhao Tang, Haichen Yang, Liyan Zhang, and Ye Yuan. Work like a doctor: Unifying scan localizer and dynamic generator for automated computed tomography report generation. Expert Systems with Applications, 237:121442, 2024

[48] Barnett S Kramer, Christine D Berg, Denise R Aberle, and Philip C Prorok. Lung cancer screening with low-dose helical ct: results from the national lung screening trial (nlst), 2011

[49] Rachel Lea Draelos, David Dov, Maciej A Mazurowski, Joseph Y Lo, Ricardo Henao, Geoffrey D Rubin, and Lawrence Carin. Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes. Medical image analysis, 67:101857, 2021

[50] Errol Colak, Felipe C Kitamura, Stephen B Hobbs, Carol C Wu, Matthew P Lungren, Luciano M Prevedello, Jayashree Kalpathy-Cramer, Robyn L Ball, George Shih, Anouk Stein, et al. The rsna pulmonary embolism ct dataset. Radiology: Artificial Intelligence, 3(2):e200254, 2021

[51] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

[52] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004

[53] Satanjeev Banerjee and Alon Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72, 2005

[54] Shruthi Bannur, Kenza Bouzid, Daniel C Castro, Anton Schwaighofer, Anja Thieme, Sam Bond-Taylor, Maximilian Ilse, Fernando Pérez-García, Valentina Salvatelli, Harshita Sharma, et al. Maira-2: Grounded radiology report generation. arXiv preprint arXiv:2406.04449, 2024

[55] Stephanie L Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, et al. Maira-1: A specialised large multimodal model for radiology report generation. arXiv preprint arXiv:2311.13668, 2023

[56] Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xiang Wan. Generating radiology reports via memory-driven transformer. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 1439–1449, 2020

[57] Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report generation. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 5904–5914, 2021

[58] Liangdi Ma, Hengrui Liang, Yuwei He, Wei Wang, Zeping Yan, Wuchao Li, Rongpin Wang, Yongyi Li, Yuerong Lizhu, Yaou Liu, et al. A vision–language pretrained transformer for versatile clinical respiratory disease applications. Nature Biomedical Engineering, pages 1–19, 2025

[59] Ryutaro Tanno, David GT Barrett, Andrew Sellergren, Sumedh Ghaisas, Sumanth Dathathri, Abigail See, Johannes Welbl, Charles Lau, Tao Tu, Shekoofeh Azizi, et al. Collaboration between clinicians and vision–language models in radiology report generation. Nature Medicine, 31(2):599–608, 2025

[60] Xinjian Chen, Huazhu Fu, Jingtao Wang, Tian Lin, Qian Cheng, Cangxin Li, Meng Wang, Zhongyue Chen, Aidi Lin, Anlin Zhang, et al. A deep learning based automatic report generator for retinal optical coherence tomography images. npj Digital Medicine, 8(1):618, 2025

[61] Fei Dong, Shouping Nie, Manling Chen, Fangfang Xu, and Qian Li. Keyword-based ai assistance in the generation of radiology reports: A pilot study. npj Digital Medicine, 8(1):490, 2025

[62] Lennart R Koetzier, Jie Wu, Domenico Mastrodicasa, Aline Lutz, Matthew Chung, W Adam Koszek, Jayanth Pratap, Akshay S Chaudhari, Pranav Rajpurkar, Matthew P Lungren, et al. Generating synthetic data for medical imaging. Radiology, 312(3):e232471, 2024

[63] Jiabo Ma, Wenqiang Li, Jinbang Li, Ziyi Liu, Linshan Wu, Fengtao Zhou, Li Liang, Ronald Cheong Kin Chan, Terence TW Wong, and Hao Chen. Generative ai for misalignment-resistant virtual staining to accelerate histopathology workflows. Nature Communications, 2026

[64] Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Weiwei Tian, Ya Zhang, Weidi Xie, and Yanfeng Wang. Development of a large-scale grounded vision language dataset for chest ct analysis. Scientific Data, 12(1):1636, 2025

Appendix A Supplementary Figures

Esophagus: Normal. Heart: heart size increased. calcific atheroma plaques are observed in the aorta ...

lower thorax: includes lower chest and lung bases
liver and biliary tree: includes liver and biliary tree (bile ducts), but NOT gallbladder
gallbladder: gallbladder only (separate from liver)
pancreas: pancreas only
adrenal glands: adrenal glands only
kidneys and ureters: includes kidneys and ureters
gastrointestinal tract: includes stomach, intestines, appendix; be careful not to miss the appendix
peritoneum: includes peritoneal space, peritoneal cavity, and abdominal wall
pelvic: includes pelvic organs, bladder, prostate and seminal vesicles, uterus and ovaries
vasculature: vasculature system
lymph nodes: lymph nodes
musculoskeletal: includes bones and muscles

Rule 1: Remove Non-Diagnostic Text

Before any other processing, you must identify and completely remove any text that represents communication between clinicians, summary codes, or procedural notes. This includes but is not limited to phrases like “FINDINGS GIVEN TO Dr. ...”, “SUMMARY 4:...”, “END OF IMPRESSIO...

If a region had no positive findings (i.e., it was normal, all findings were negative, or the section was not mentioned in the report), the value MUST be the string “normal”.

Report:{report}
Processed Report (JSON):...

Table 8: Prompt used for structured information extraction from abdominal CT reports.

Structured Chest CT Report Extraction Prompt: Sys...
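As an illustration of the “normal” default stated in the abdominal extraction prompt, the following minimal Python sketch post-checks a model's JSON output so that every region carries either a finding string or the string "normal". It is not the authors' pipeline: the function name `normalize_extraction`, the use of the region names as JSON keys, and the example input are assumptions made for illustration, and the region list contains only the names that survive in this appendix (the full prompt may define more).

```python
import json

# Region names copied from the surviving abdominal prompt fragments.
# Using them directly as JSON keys is an assumption of this sketch.
ABDOMINAL_REGIONS = [
    "lower thorax",
    "liver and biliary tree",
    "gallbladder",
    "pancreas",
    "adrenal glands",
    "kidneys and ureters",
    "gastrointestinal tract",
    "peritoneum",
    "pelvic",
    "vasculature",
    "lymph nodes",
    "musculoskeletal",
]


def normalize_extraction(raw_json, regions=ABDOMINAL_REGIONS):
    """Hypothetical post-check enforcing the 'normal' default per region."""
    parsed = json.loads(raw_json)
    cleaned = {}
    for region in regions:
        value = parsed.get(region)
        # Missing, empty, or non-string values count as "no positive findings".
        if not isinstance(value, str) or not value.strip():
            cleaned[region] = "normal"
        else:
            cleaned[region] = value.strip()
    return cleaned


# Made-up model output: one finding, one empty field, all other regions absent.
example = '{"gallbladder": "gallstones", "pancreas": ""}'
result = normalize_extraction(example)
print(json.dumps(result, indent=2))
```

A check of this kind keeps downstream consumers from having to distinguish between a missing key, an empty string, and an explicit "normal" value.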

abdomen: include all findings related to liver, gallbladder, pancreas, spleen, kidneys, adrenals, gastrointestinal tract, abdominal vessels, and abdominal lymph nodes, etc.
esophagus: esophagus
mediastinum: mediastinum area