A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation
read the original abstract
Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test. This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale dataset (CheXinstruct), which we utilized to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. Our clinical assessment with eight radiologists revealed a 36% time saving for residents using CheXagent-drafted reports, while attending radiologists showed no significant time difference editing resident-drafted or CheXagent-drafted reports. The CheXagent-drafted reports improved the writing efficiency of both radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.
This paper has not been read by Pith yet.
Forward citations
Cited by 23 Pith papers
-
Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage
Dental-TriageBench is the first expert-annotated multimodal benchmark for hierarchical dental triage and shows a substantial performance gap between 19 MLLMs and junior dentists, especially on multi-domain referral cases.
-
CheXpercept: A Benchmark for Evaluating Expert-Level Lesion Perception in Chest X-rays
CheXpercept is a sequential multi-level perception benchmark showing VLMs perform adequately only on coarse lesion detection in chest X-rays while degrading sharply on finer tasks, with medical VLMs offering no advant...
-
A Vision-language Framework for Comparative Reasoning in Radiology
Introduces MedReCo-DB dataset of 690k+ images and entity-aware models MedReCo/MedReCo-VLM that improve reference retrieval and comparative change interpretation in radiology across multiple centers and modalities.
-
DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
DDX-TRACE is a physician-adjudicated benchmark for evaluating VLMs on evidence-supported diagnostic trajectories rather than final answers alone in multimodal neuroradiology.
-
HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation
HalluCXR benchmark shows 61.9-82.3% hallucination rates across VLMs on MIMIC-CXR images, identifies patterns such as length-based risk and over-fabrication of common findings, and demonstrates ensemble mitigation that...
-
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.
-
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
ECHO is a one-step block diffusion VLM for chest X-ray reports that improves RaTE and SemScore by over 60% while delivering 8x faster inference than autoregressive baselines.
-
CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging
CheXmix combines masked autoencoder pretraining with early-fusion generative modeling to outperform prior models on chest X-ray classification by up to 8.6% AUROC, inpainting by 51%, and report generation by 45% on GREEN.
-
Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning
ESC-RL improves RL for radiology reports via group-wise evidence-aware rewards (GEAR) and LLM-driven self-correcting preference learning (SPL), reaching state-of-the-art on two chest X-ray datasets.
-
IMACT-CXR: An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation
IMACT-CXR presents an integrated multi-agent system using AutoGen, Bayesian Knowledge Tracing, gaze feedback, and vision-language models to provide interactive tutoring for chest X-ray interpretation with preliminary ...
-
RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction
RA-RRG extracts key phrases with LLMs, retrieves them via multimodal similarity, and conditions report generation on them to achieve SOTA CheXbert scores and competitive RadGraph F1 on MIMIC-CXR and IU X-ray while sup...
-
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.
-
Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents
A GRPO-based RL framework with probabilistic risk minimization, disagreement-aware synergy rewards, and entropy-guided sampling enables instance-level tool selection that closes the single-oracle risk gap on medical b...
-
CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation
CXRMate-2 improves chest X-ray report generation via temporal embeddings and tractable RL, delivering metric gains and 45% acceptability in radiologist review with no significant preference difference on most findings.
-
Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning
A vision-language model pre-trained via instruction tuning on CT-report pairs improves survival prediction accuracy over baselines, especially when clinical data alone is weak, while also producing text answers to cli...
-
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.
-
Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
DINOv3 at 512x512 resolution with ConvNeXt-B outperforms prior initializations for adult chest X-ray classification but shows no benefit in pediatric cohorts or at 1024 resolution.
-
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows
RadAgents is a multi-agent framework coupling clinical priors with task-aware multimodal reasoning and radiologist-like workflows, plus grounding and retrieval-augmentation for conflict resolution in chest X-ray inter...
-
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Synthetic clinical demonstrations at inference time improve safety of Med-VLMs against visual and textual jailbreaks while preserving general performance on medical tasks.
-
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation
M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.
-
NoduLoCC2026: Lung Nodule Localization and Classification Contest from Chest X-Ray Images
A new contest provides a chest X-ray dataset for lung nodule classification and localization; the best entry reaches 0.72 balanced accuracy and 0.79 AUC on classification but only predicts the correct nodule count on ...
-
Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs
Proposes bidirectional token-wise KL regularizer and visual-contrastive grounding objective to create fine-grained on-policy preference pairs for medical LVLMs by minimally editing model outputs.
-
Data-Centric Foundation Models in Computational Healthcare: A Survey
The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.