A vision-language foundation model to enhance efficiency of chest x-ray interpretation. arXiv preprint arXiv:2401.12208, 2024

Zhihong Chen, Maya Varma, Justin Xu, Magdalini Paschali, Dave Van Veen, Andrew Johnston, Alaa Youssef, Louis Blankemeier, Christian Bluethgen, Stephan Altmayer, et al. · 2024 · arXiv 2401.12208

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 4

citation-polarity summary: background 4

representative citing papers

Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage

cs.CL · 2026-03-18 · unverdicted · novelty 8.0

Dental-TriageBench is the first expert-annotated multimodal benchmark for hierarchical dental triage and shows a substantial performance gap between 19 MLLMs and junior dentists, especially on multi-domain referral cases.

DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DDX-TRACE is a physician-adjudicated benchmark for evaluating VLMs on evidence-supported diagnostic trajectories rather than final answers alone in multimodal neuroradiology.

HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation

cs.CV · 2026-05-19 · conditional · novelty 7.0

HalluCXR benchmark shows 61.9-82.3% hallucination rates across VLMs on MIMIC-CXR images, identifies patterns such as length-based risk and over-fabrication of common findings, and demonstrates ensemble mitigation that cuts fabrication by up to 84.8%.

CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

cs.CV · 2026-05-07 · conditional · novelty 7.0

Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.

CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

CheXmix combines masked autoencoder pretraining with early-fusion generative modeling to outperform prior models on chest X-ray classification by up to 8.6% AUROC, inpainting by 51%, and report generation by 45% on GREEN.

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

ESC-RL improves RL for radiology reports via group-wise evidence-aware rewards (GEAR) and LLM-driven self-correcting preference learning (SPL), reaching state-of-the-art on two chest X-ray datasets.

IMACT-CXR: An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation

cs.AI · 2025-11-19 · unverdicted · novelty 6.0

IMACT-CXR presents an integrated multi-agent system using AutoGen, Bayesian Knowledge Tracing, gaze feedback, and vision-language models to provide interactive tutoring for chest X-ray interpretation with preliminary evidence of improved learner performance.

RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction

cs.CV · 2025-04-10 · unverdicted · novelty 6.0

RA-RRG extracts key phrases with LLMs, retrieves them via multimodal similarity, and conditions report generation on them to achieve SOTA CheXbert scores and competitive RadGraph F1 on MIMIC-CXR and IU X-ray while supporting multi-view inputs.

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

cs.CV · 2023-05-17 · conditional · novelty 6.0

PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.

CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

CXRMate-2 improves chest X-ray report generation via temporal embeddings and tractable RL, delivering metric gains and 45% acceptability in radiologist review with no significant preference difference on most findings.

Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

A vision-language model pre-trained via instruction tuning on CT-report pairs improves survival prediction accuracy over baselines, especially when clinical data alone is weak, while also producing text answers to clinical questions.

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

cs.LG · 2026-04-10 · unverdicted · novelty 5.0 · 2 refs

ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.

Resolution scaling governs DINOv3 transfer performance in chest radiograph classification

cs.CV · 2025-10-08 · conditional · novelty 5.0

DINOv3 at 512x512 resolution with ConvNeXt-B outperforms prior initializations for adult chest X-ray classification but shows no benefit in pediatric cohorts or at 1024 resolution.

RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows

cs.MA · 2025-09-24 · unverdicted · novelty 5.0

RadAgents is a multi-agent framework coupling clinical priors with task-aware multimodal reasoning and radiologist-like workflows, plus grounding and retrieval-augmentation for conflict resolution in chest X-ray interpretation.

Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations

cs.CV · 2025-06-08 · unverdicted · novelty 5.0

Synthetic clinical demonstrations at inference time improve safety of Med-VLMs against visual and textual jailbreaks while preserving general performance on medical tasks.

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

cs.CV · 2024-08-29 · unverdicted · novelty 5.0

M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.

Data-Centric Foundation Models in Computational Healthcare: A Survey

cs.LG · 2024-01-04 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.

citing papers explorer


Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage · cs.CL · 2026-03-18 · unverdicted · none · ref 1
DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs · cs.CV · 2026-05-22 · unverdicted · none · ref 6
HalluCXR: Benchmarking and Mitigating Hallucinations in Medical Vision-Language Models for Chest Radiograph Interpretation · cs.CV · 2026-05-19 · conditional · none · ref 6
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs · cs.CV · 2026-05-07 · conditional · none · ref 7
CheXmix: Unified Generative Pretraining for Vision Language Models in Medical Imaging · cs.CV · 2026-04-24 · unverdicted · none · ref 8
Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning · cs.LG · 2026-04-15 · unverdicted · none · ref 6
IMACT-CXR: An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation · cs.AI · 2025-11-19 · unverdicted · none · ref 14
RA-RRG: Multimodal Retrieval-Augmented Radiology Report Generation with Key Phrase Extraction · cs.CV · 2025-04-10 · unverdicted · none · ref 7
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering · cs.CV · 2023-05-17 · conditional · none · ref 12
CXRMate-2: Structured Multimodal Temporal Embeddings and Tractable Reinforcement Learning for Clinically Acceptable Chest X-ray Radiology Report Generation · cs.CV · 2026-04-21 · unverdicted · none · ref 65
Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning · cs.CV · 2026-04-20 · unverdicted · none · ref 7
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion · cs.LG · 2026-04-10 · unverdicted · none · ref 7 · 2 links
Resolution scaling governs DINOv3 transfer performance in chest radiograph classification · cs.CV · 2025-10-08 · conditional · none · ref 6
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows · cs.MA · 2025-09-24 · unverdicted · none · ref 4
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations · cs.CV · 2025-06-08 · unverdicted · none · ref 6
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation · cs.CV · 2024-08-29 · unverdicted · none · ref 12
Data-Centric Foundation Models in Computational Healthcare: A Survey · cs.LG · 2024-01-04 · unverdicted · none · ref 51
