SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation

Daqian Shi; David Cliffton; Fenglin Liu; Honghan Wu; Jinge Wu; Yunsoo Kim

arxiv: 2409.13321 · v1 · pith:55GHHJE7new · submitted 2024-09-20 · 💻 cs.LG · cs.AI · cs.CL · cs.CV


Jinge Wu, Yunsoo Kim, Daqian Shi, David Cliffton, Fenglin Liu, Honghan Wu

classification 💻 cs.LG · cs.AI · cs.CL · cs.CV

keywords llms · assistant · language · slava-cxr · small · training · automation · chest


Inspired by the success of large language models (LLMs), there is growing research interest in developing LLMs in the medical domain to assist clinicians. However, for hospitals, using closed-source commercial LLMs raises privacy issues, and developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-constrained regions and low-income countries. We propose an open-source Small Language and Vision Assistant (SLaVA-CXR) that can be used for chest X-ray report automation. To efficiently train a small assistant, we first propose the Re$^3$Training method, which simulates the cognitive development of radiologists and optimizes the model through a Recognition, Reasoning, and Reporting training scheme. Then, we introduce a data synthesis method, RADEX, which can generate a high-quality and diverse training corpus while complying with privacy regulations. Extensive experiments show that SLaVA-CXR, built on a 2.7B backbone, not only outperforms previous state-of-the-art larger models but also achieves 6 times faster inference.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-Source Supervision for Bone Infection Segmentation in Dual-Modality PET-CT
cs.CV 2026-05 unverdicted novelty 5.0

A decoupled dual-source learning framework trains parallel models on independent expert annotations for PET-CT bone infection segmentation and uses patient-level 3D evaluation to report performance variations.

RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis
cs.IR 2026-04 unverdicted novelty 3.0

RAG4Outcome is a retrieval-augmented multimodal framework for prognostic prediction in chronic osteomyelitis using imaging reports, structured records, and unstructured notes.