A Vision-language Framework for Comparative Reasoning in Radiology
Pith reviewed 2026-06-28 01:39 UTC · model grok-4.3
The pith
Radiology comparison can be learned as entity-aware cross-image reasoning from routine clinical reports at scale.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By decomposing radiology reports into anatomical structures, abnormal findings, and pathological conditions, entity-aware models can be trained on more than 690,000 images to perform controllable retrieval of clinically analogous cases and to generate accurate interpretations of temporal change, with consistent gains over baselines in both retrieval recall and longitudinal accuracy across modalities and institutions.
What carries the argument
Entity-conditioned retrieval and generation, where an encoder conditions on report-derived entities to select reference cases and to produce comparative visual question answers.
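A minimal sketch of entity-conditioned retrieval follows. It assumes each indexed case carries an image embedding and a set of report-derived entity strings. Conditioning is approximated here by a hard filter on the chosen entity, followed by cosine ranking. MedReCo itself conditions the encoder on entities, so `Case`, `retrieve`, and the filter-then-rank strategy are hypothetical stand-ins, not the paper's method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical indexed study: an image embedding plus report-derived entities."""
    case_id: str
    embedding: list[float]
    entities: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], entity: str, index: list[Case], k: int = 1) -> list[str]:
    """Hypothetical entity-conditioned retrieval: keep only cases whose report
    mentions `entity`, then rank them by cosine similarity to the query."""
    pool = [c for c in index if entity in c.entities]
    pool.sort(key=lambda c: cosine(query, c.embedding), reverse=True)
    return [c.case_id for c in pool[:k]]


index = [
    Case("a", [1.0, 0.0], {"right lower lobe", "consolidation"}),
    Case("b", [0.9, 0.1], {"pleural effusion"}),
    Case("c", [0.0, 1.0], {"consolidation"}),
]
print(retrieve([1.0, 0.05], "consolidation", index, k=2))     # ['a', 'c']
print(retrieve([1.0, 0.05], "pleural effusion", index, k=2))  # ['b']
```

Case `b` is closer to the query overall than case `c`. It is nevertheless excluded when retrieval is conditioned on `consolidation`. Changing only the `entity` argument changes which neighbours are eligible, which is the sense of "controllable" retrieval this toy example illustrates.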
If this is right
- MedReCo achieves the highest Recall@1 across all 12 internal retrieval settings and raises external retrieval by an average of 6.0 percentage points.
- MedReCo-VLM raises longitudinal follow-up accuracy by 14.5-46.5 percentage points on chest radiographs and 13.0-27.9 percentage points on CT.
- Performance remains superior in clinically confusable differential diagnosis groups.
- The same entity-decomposition pipeline works across eight institutions, four countries, and seven imaging modalities.
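Recall@1 and percentage-point gains recur throughout these claims. The sketch below shows one common way to compute Recall@K and an absolute gain. It assumes a retrieved case counts as a hit when its id is in a per-query relevance set, which may differ from the paper's relevance criterion. A percentage-point gain is an absolute difference: moving from 50% to 56% is +6.0 pp, but a 12% relative improvement.

```python
def recall_at_k(ranked_ids: list[list[str]], relevant: list[set[str]], k: int = 1) -> float:
    """Fraction of queries whose top-k retrieved ids include at least one relevant id."""
    if not ranked_ids or len(ranked_ids) != len(relevant):
        raise ValueError("need at least one query and one relevance set per query")
    hits = sum(1 for ranks, rel in zip(ranked_ids, relevant) if rel & set(ranks[:k]))
    return hits / len(ranked_ids)


def pp_gain(new: float, baseline: float) -> float:
    """Absolute gain between two proportions, in percentage points."""
    return 100.0 * (new - baseline)


# Toy rankings and relevance sets (illustrative, not data from the paper).
ranked = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
relevant = [{"a"}, {"d"}, {"x"}, {"g"}]
r1 = recall_at_k(ranked, relevant, k=1)  # 0.5
r2 = recall_at_k(ranked, relevant, k=2)  # 0.75
print(r1, r2, pp_gain(r2, r1))           # 0.5 0.75 25.0
```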
Where Pith is reading between the lines
- The same decomposition-plus-conditioning pattern could be tested on non-radiology image-report corpora such as pathology slides or ophthalmology photographs.
- If the entity labels prove noisy, hybrid human-AI verification loops on a small fraction of reports might restore accuracy while retaining most of the scale advantage.
- The framework supplies a concrete route to measure how much additional clinical alignment is gained by explicit comparison modeling versus single-image interpretation.
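A hybrid verification loop on a small fraction of reports only helps if the sample covers every institution and modality, not just the most common ones. The following is a minimal sketch of reproducible stratified sampling. It assumes each report is a dict carrying a stratum field such as `modality`; the field names and the per-stratum quota are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(reports: list[dict], stratum_key: str, per_stratum: int, seed: int = 0) -> list[dict]:
    """Draw up to `per_stratum` reports from every stratum, reproducibly for a given seed."""
    rng = random.Random(seed)
    groups: dict[str, list[dict]] = defaultdict(list)
    for report in reports:
        groups[report[stratum_key]].append(report)
    sample: list[dict] = []
    for key in sorted(groups):  # fixed stratum order keeps the draw deterministic
        members = groups[key]
        # Small strata are taken whole instead of raising an error.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical pool: 10 CT reports and 3 chest radiograph reports.
reports = [{"id": i, "modality": "CT" if i < 10 else "CXR"} for i in range(13)]
chosen = stratified_sample(reports, "modality", per_stratum=5)
print(len(chosen), sum(r["modality"] == "CXR" for r in chosen))  # 8 3
```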
Load-bearing premise
Automatic decomposition of reports into anatomical structures, abnormal findings, and pathological conditions supplies sufficiently accurate and unbiased entity labels for model supervision.
What would settle it
If a manually verified subset of decomposed reports shows low entity-label accuracy and retraining on the corrected labels eliminates the reported performance gains, the central claim would be falsified.
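The falsification test above begins by scoring automatically extracted entities against a manually verified subset. The sketch below assumes each report is reduced to three sets of normalised entity strings, one per entity type. `DecomposedReport` and the micro-averaged precision, recall, and F1 per type are illustrative choices, not the paper's evaluation protocol. The retraining step is not shown.

```python
from dataclasses import dataclass, field

ENTITY_TYPES = ("anatomy", "findings", "conditions")


@dataclass
class DecomposedReport:
    """Hypothetical container for the three report-derived entity types."""
    anatomy: set[str] = field(default_factory=set)
    findings: set[str] = field(default_factory=set)
    conditions: set[str] = field(default_factory=set)


def per_type_scores(auto: list[DecomposedReport], manual: list[DecomposedReport]) -> dict[str, dict[str, float]]:
    """Micro-averaged precision, recall and F1 of automatic against manually verified entities, per type."""
    if len(auto) != len(manual):
        raise ValueError("automatic and manual labels must cover the same reports")
    scores = {}
    for etype in ENTITY_TYPES:
        tp = fp = fn = 0
        for a, m in zip(auto, manual):
            pred, gold = getattr(a, etype), getattr(m, etype)
            tp += len(pred & gold)  # extracted and confirmed by the reviewer
            fp += len(pred - gold)  # extracted but rejected by the reviewer
            fn += len(gold - pred)  # missed by the extractor
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = {"precision": p, "recall": r, "f1": f1}
    return scores


auto = [DecomposedReport({"left lung"}, {"opacity", "effusion"}, {"pneumonia"})]
manual = [DecomposedReport({"left lung"}, {"opacity"}, {"pneumonia", "heart failure"})]
print(per_type_scores(auto, manual)["findings"])  # precision 0.5, recall 1.0
```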
Original abstract
Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retrieval and temporal comparative interpretation. We construct MedReCo-DB, a large-scale comparative imaging resource derived from routine image-report pairs, comprising more than 690,000 images from over 160,000 patients across eight institutions, four countries and seven imaging modalities. Reports are decomposed into anatomical structures, abnormal findings and pathological conditions to provide supervision for entity-conditioned retrieval and comparative visual question answering. Using this resource, we develop MedReCo, an entity-aware visual encoder for controllable retrieval of clinically analogous cases, and MedReCo-VLM, a vision-language extension for generative interpretation of interval change. Across internal, external and cross-center evaluations, MedReCo achieved the highest Recall@1 in all 12 internal retrieval settings and improved external retrieval by a mean of 6.0 percentage points. In clinically confusable differential groups, it consistently outperformed the strongest baselines. MedReCo-VLM achieved the best performance across all comparative generation evaluations and improved longitudinal follow-up accuracy by 14.5-46.5 percentage points on chest radiographs and 13.0-27.9 percentage points on CT. These findings suggest that entity-aware comparative reasoning can be learned from routine clinical data at scale and may provide a more clinically aligned foundation for medical imaging AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates radiological comparison as an entity-aware cross-image reasoning task and introduces MedReCo-DB (>690k images from >160k patients across 8 institutions) derived from routine image-report pairs. Reports are automatically decomposed into anatomical structures, abnormal findings, and pathological conditions to supervise entity-conditioned retrieval (MedReCo) and comparative VQA/generation (MedReCo-VLM). The work claims state-of-the-art Recall@1 across all 12 internal settings, +6.0 pp mean external retrieval improvement, and large gains (13.0-46.5 pp) in longitudinal follow-up accuracy on chest radiographs and CT.
Significance. If the entity labels prove reliable, the scale and multi-center construction of MedReCo-DB together with the entity-aware models would represent a substantive step toward clinically aligned comparative reasoning in medical imaging AI, moving beyond isolated interpretation. The multi-institutional, multi-modality scope and held-out evaluations are positive features.
major comments (2)
- [Abstract] Abstract: the central supervision signal is created by automatic decomposition of reports into anatomical structures, abnormal findings, and pathological conditions, yet the manuscript provides no accuracy metrics, inter-annotator agreement, human validation study, or error analysis for this decomposition step. Because every reported gain (Recall@1, longitudinal accuracy) is conditioned on these labels, the absence of validation directly undermines the claim that the models have learned genuine entity-aware comparative reasoning rather than artifacts of the extraction process.
- [Abstract] Abstract and methods description: the abstract asserts consistent outperformance on internal, external, and cross-center tests with specific percentage gains, but supplies no ablation studies, statistical tests, error bars, or dataset-construction validation (e.g., confirmation that the held-out splits preserve entity distributions). These omissions make it impossible to determine whether the reported improvements are robust or sensitive to the particular decomposition pipeline.
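The second comment asks for confirmation that held-out splits preserve entity distributions. One way to check this is to compute a divergence between entity-label frequencies in the training and test partitions. The sketch below assumes label counts are available as `collections.Counter` objects. The direction KL(test || train) and the additive smoothing constant are assumptions.

```python
import math
from collections import Counter


def kl_divergence(train_counts: Counter, test_counts: Counter, eps: float = 1e-9) -> float:
    """KL(test || train) in nats between entity-label frequency distributions.

    Additive smoothing with `eps` keeps the value finite when a label appears
    in one partition only; large values flag labels that are rare or absent in training.
    """
    labels = set(train_counts) | set(test_counts)
    n_train = sum(train_counts.values()) + eps * len(labels)
    n_test = sum(test_counts.values()) + eps * len(labels)
    kl = 0.0
    for label in labels:
        p = (test_counts[label] + eps) / n_test
        q = (train_counts[label] + eps) / n_train
        kl += p * math.log(p / q)
    return kl


train = Counter({"pleural effusion": 200, "nodule": 100, "cardiomegaly": 100})
matched = Counter({"pleural effusion": 50, "nodule": 25, "cardiomegaly": 25})
shifted = Counter({"pleural effusion": 10, "nodule": 80, "cardiomegaly": 10})
print(kl_divergence(train, matched) < 1e-9, kl_divergence(train, shifted) > 0.1)  # True True
```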
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for validation of the report decomposition and additional robustness checks. We address each major comment below and will incorporate the suggested additions in the revised manuscript.
Referee: [Abstract] Abstract: the central supervision signal is created by automatic decomposition of reports into anatomical structures, abnormal findings, and pathological conditions, yet the manuscript provides no accuracy metrics, inter-annotator agreement, human validation study, or error analysis for this decomposition step. Because every reported gain (Recall@1, longitudinal accuracy) is conditioned on these labels, the absence of validation directly undermines the claim that the models have learned genuine entity-aware comparative reasoning rather than artifacts of the extraction process.
Authors: We acknowledge that the manuscript does not report quantitative validation metrics, inter-annotator agreement, or error analysis for the automatic decomposition step. While the pipeline builds on established medical NLP methods and the overall scale provides indirect support, we agree this leaves the entity-aware claims open to the concern raised. In revision we will add a new subsection with a human validation study on a stratified sample of 1,000 reports (two radiologists per report), reporting per-entity accuracy, Cohen's kappa, and a categorized error analysis. This will allow readers to assess whether gains reflect genuine reasoning. revision: yes
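Cohen's kappa corrects the raw agreement between the two radiologists for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes each rater assigns one categorical judgment per report, for example `correct` or `error` for an extracted entity set. That labelling scheme is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one categorical label to each of the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("raters must label the same non-empty list of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label independently.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined, 1.0 by convention
    return (p_observed - p_expected) / (1.0 - p_expected)


radiologist_1 = ["correct", "correct", "error", "correct"]
radiologist_2 = ["correct", "error", "error", "correct"]
print(cohens_kappa(radiologist_1, radiologist_2))  # 0.5
```

Here the raters agree on 3 of 4 reports (p_o = 0.75). Chance agreement is p_e = 0.5, so kappa is 0.5, noticeably lower than the raw 75% agreement.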
Referee: [Abstract] Abstract and methods description: the abstract asserts consistent outperformance on internal, external, and cross-center tests with specific percentage gains, but supplies no ablation studies, statistical tests, error bars, or dataset-construction validation (e.g., confirmation that the held-out splits preserve entity distributions). These omissions make it impossible to determine whether the reported improvements are robust or sensitive to the particular decomposition pipeline.
Authors: The manuscript contains component ablations and multi-center held-out results, yet we agree that statistical significance testing, error bars across runs, and explicit verification that entity distributions are preserved in the splits are not reported. In the revision we will add: (i) results from three random seeds with standard-error bars, (ii) paired statistical tests (McNemar or paired t-tests) for all key comparisons, and (iii) a table comparing entity-type frequencies and KL divergence between training and test partitions. These additions will show whether the gains are stable across training runs and whether the splits are balanced in entity content. Sensitivity to the decomposition pipeline itself is addressed by the human validation study described in our response to the first comment. revision: yes
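The proposed statistics can be computed with the Python standard library. McNemar's test applies to paired binary outcomes on the same test cases, for example whether each query's top-1 retrieval is a hit under two models, and depends only on the discordant counts. The exact binomial form is used here; it is one of several valid variants. The mean and standard error then summarise a metric over seeds. Function names are hypothetical, and this is a sketch, not the authors' analysis code.

```python
import math
import statistics


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: test cases model A got right and model B got wrong; c: the reverse.
    Under the null hypothesis the discordant outcomes follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean over repeated runs such as random seeds."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return mean, se


print(mcnemar_exact(10, 2))             # 158 / 4096, about 0.0386
print(mean_and_se([0.70, 0.72, 0.74]))  # mean about 0.72, SE about 0.0115
```

With only three seeds, the standard error is itself imprecise. It indicates run-to-run variability but is not a substitute for the paired tests.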
Circularity Check
No circularity: empirical dataset construction and held-out evaluation
The paper constructs MedReCo-DB from routine image-report pairs across multiple institutions, decomposes reports to create entity labels for supervision, trains MedReCo and MedReCo-VLM, and reports performance on internal/external/cross-center held-out evaluations. No step reduces a claimed prediction or result to a fitted parameter, self-citation chain, or input by construction; the central claims rest on new data collection and measurable improvements on unseen cases rather than definitional equivalence.
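MedReCo-DB holds several studies per patient, so "unseen cases" in practice requires that no patient contributes to both training and evaluation. How the splits were drawn is not described here. The following is a minimal sketch of one common approach: a deterministic patient-level split made by hashing the patient identifier. The `patient_id` field and the 20% test fraction are assumptions.

```python
import hashlib


def split_of(patient_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a patient, and hence all of their studies, to 'train' or 'test'."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64  # roughly uniform in [0, 1]
    return "test" if bucket < test_fraction else "train"


def split_studies(studies: list[dict], test_fraction: float = 0.2) -> dict[str, list[dict]]:
    """Partition studies so that no patient appears in both training and test sets."""
    parts: dict[str, list[dict]] = {"train": [], "test": []}
    for study in studies:
        parts[split_of(study["patient_id"], test_fraction)].append(study)
    return parts


# Hypothetical longitudinal data: 50 studies from 7 patients.
studies = [{"patient_id": f"p{i % 7}", "study": i} for i in range(50)]
parts = split_studies(studies)
train_patients = {s["patient_id"] for s in parts["train"]}
test_patients = {s["patient_id"] for s in parts["test"]}
print(train_patients.isdisjoint(test_patients))  # True
```

Hashing rather than shuffling keeps each patient's assignment stable as new studies arrive. For small cohorts, the realised test fraction only approximates `test_fraction`.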
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Radiology reports can be decomposed into anatomical structures, abnormal findings and pathological conditions to yield reliable supervision signals for entity-conditioned tasks.