A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation

Akshay S. Chaudhari; Alaa Youssef; Andrew Johnston; Cameron Olsen; Christian Bluethgen; Christopher F. Beaulieu; Curtis P. Langlotz; Dave Van Veen; Eduardo Pontes Reis; Emily B. Tsai

arxiv: 2401.12208 · v2 · pith:LD7FO4TLnew · submitted 2024-01-22 · 💻 cs.CV · cs.CL


Zhihong Chen, Maya Varma, Justin Xu, Magdalini Paschali, Dave Van Veen, Andrew Johnston, Alaa Youssef, Louis Blankemeier, Christian Bluethgen, Stephan Altmayer, Jeya Maria Jose Valanarasu, Mohamed Siddig Eltayeb Muneer, Eduardo Pontes Reis, Joseph Paul Cohen, Cameron Olsen, Tanishq Mathew Abraham, Emily B. Tsai, Christopher F. Beaulieu, Jenia Jitsev, Sergios Gatidis, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz


classification: cs.CV, cs.CL

keywords: radiologists, reports, chexagent, chexagent-drafted, foundation, interpretation, attending, chest



Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test. This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale dataset (CheXinstruct), which we utilized to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. Our clinical assessment with eight radiologists revealed a 36% time saving for residents using CheXagent-drafted reports, while attending radiologists showed no significant time difference editing resident-drafted or CheXagent-drafted reports. The CheXagent-drafted reports improved the writing efficiency of both radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.



Forward citations

Cited by 23 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
cs.CL · 2026-03 · unverdicted · novelty 8.0

Dental-TriageBench is the first expert-annotated multimodal benchmark for hierarchical dental triage and shows a substantial performance gap between 19 MLLMs and junior dentists, especially on multi-domain referral cases.

CheXpercept: A Benchmark for Evaluating Expert-Level Lesion Perception in Chest X-rays
cs.CV · 2026-06 · unverdicted · novelty 7.0

CheXpercept is a sequential multi-level perception benchmark showing VLMs perform adequately only on coarse lesion detection in chest X-rays while degrading sharply on finer tasks, with medical VLMs offering no advant...

A Vision-language Framework for Comparative Reasoning in Radiology
cs.CV · 2026-06 · unverdicted · novelty 7.0

Introduces MedReCo-DB dataset of 690k+ images and entity-aware models MedReCo/MedReCo-VLM that improve reference retrieval and comparative change interpretation in radiology across multiple centers and modalities.

DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
cs.CV · 2026-05 · unverdicted · novelty 7.0

DDX-TRACE is a physician-adjudicated benchmark for evaluating VLMs on evidence-supported diagnostic trajectories rather than final answers alone in multimodal neuroradiology.

HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
cs.CV · 2026-05 · conditional · novelty 7.0

HalluCXR benchmark shows 61.9-82.3% hallucination rates across VLMs on MIMIC-CXR images, identifies patterns such as length-based risk and over-fabrication of common findings, and demonstrates ensemble mitigation that...

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
cs.CV · 2026-05 · conditional · novelty 7.0

Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
cs.LG · 2026-04 · unverdicted · novelty 7.0

ECHO is a one-step block diffusion VLM for chest X-ray reports that improves RaTE and SemScore by over 60% while delivering 8x faster inference than autoregressive baselines.

CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging
cs.CV · 2026-04 · unverdicted · novelty 6.0

CheXmix combines masked autoencoder pretraining with early-fusion generative modeling to outperform prior models on chest X-ray classification by up to 8.6% AUROC, inpainting by 51%, and report generation by 45% on GREEN.

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning
cs.LG · 2026-04 · unverdicted · novelty 6.0

ESC-RL improves RL for radiology reports via group-wise evidence-aware rewards (GEAR) and LLM-driven self-correcting preference learning (SPL), reaching state-of-the-art on two chest X-ray datasets.

IMACT-CXR: An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation
cs.AI · 2025-11 · unverdicted · novelty 6.0

IMACT-CXR presents an integrated multi-agent system using AutoGen, Bayesian Knowledge Tracing, gaze feedback, and vision-language models to provide interactive tutoring for chest X-ray interpretation with preliminary ...

RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
cs.CV · 2025-04 · unverdicted · novelty 6.0

RA-RRG extracts key phrases with LLMs, retrieves them via multimodal similarity, and conditions report generation on them to achieve SOTA CheXbert scores and competitive RadGraph F1 on MIMIC-CXR and IU X-ray while sup...

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
cs.CV · 2023-05 · conditional · novelty 6.0

PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.

Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents
cs.AI · 2026-05 · unverdicted · novelty 5.0

A GRPO-based RL framework with probabilistic risk minimization, disagreement-aware synergy rewards, and entropy-guided sampling enables instance-level tool selection that closes the single-oracle risk gap on medical b...

CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation
cs.CV · 2026-04 · unverdicted · novelty 5.0

CXRMate-2 improves chest X-ray report generation via temporal embeddings and tractable RL, delivering metric gains and 45% acceptability in radiologist review with no significant preference difference on most findings.

Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning
cs.CV · 2026-04 · unverdicted · novelty 5.0

A vision-language model pre-trained via instruction tuning on CT-report pairs improves survival prediction accuracy over baselines, especially when clinical data alone is weak, while also producing text answers to cli...

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
cs.LG · 2026-04 · unverdicted · novelty 5.0

ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
cs.CV · 2025-10 · conditional · novelty 5.0

DINOv3 at 512x512 resolution with ConvNeXt-B outperforms prior initializations for adult chest X-ray classification but shows no benefit in pediatric cohorts or at 1024 resolution.

RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows
cs.MA · 2025-09 · unverdicted · novelty 5.0

RadAgents is a multi-agent framework coupling clinical priors with task-aware multimodal reasoning and radiologist-like workflows, plus grounding and retrieval-augmentation for conflict resolution in chest X-ray inter...

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
cs.CV · 2025-06 · unverdicted · novelty 5.0

Synthetic clinical demonstrations at inference time improve safety of Med-VLMs against visual and textual jailbreaks while preserving general performance on medical tasks.

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation
cs.CV · 2024-08 · unverdicted · novelty 5.0

M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.

NoduLoCC2026: Lung Nodule Localization and Classification Contest from Chest X-Ray Images
cs.CV · 2026-06 · unverdicted · novelty 4.0

A new contest provides a chest X-ray dataset for lung nodule classification and localization; the best entry reaches 0.72 balanced accuracy and 0.79 AUC on classification but only predicts the correct nodule count on ...

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs
cs.CV · 2026-06 · unverdicted · novelty 4.0

Proposes bidirectional token-wise KL regularizer and visual-contrastive grounding objective to create fine-grained on-policy preference pairs for medical LVLMs by minimally editing model outputs.

Data-Centric Foundation Models in Computational Healthcare: A Survey
cs.LG · 2024-01 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.