SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning
1 Pith paper cites this work.
abstract
Medical vision-language datasets are often limited in size and biased toward negative findings: clinicians mostly report abnormalities and may omit positive/neutral findings they consider irrelevant to the patient's condition. We propose a self-supervised data enrichment method that leverages semantic clustering of report sentences. We then enrich the findings in the training-set reports by adding positive/neutral observations from different clusters in a self-supervised manner. Our approach yields consistent gains in supervised fine-tuning (average gains of 5.63%, 3.04%, 7.40%, 5.30%, and 7.47% in COMET, BERTScore, sentence BLEU, CheXbert-F1, and RadGraph-F1, respectively). Ablation studies confirm that the improvements stem from semantic clustering rather than random augmentation. Furthermore, we introduce a way to incorporate semantic cluster information into the reward design for GRPO training, which leads to further gains (average gains of 2.78%, 3.14%, and 12.80% in COMET, BERTScore, and sentence BLEU, respectively). We share our code at https://anonymous.4open.science/r/SemEnrich-75CF
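The abstract states the idea (cluster report sentences, then add positive/neutral observations drawn from other clusters) but not the exact procedure. The following is a minimal, hypothetical sketch of that idea and not the SemEnrich implementation. It assumes each sentence already has an embedding vector, with toy 2-D vectors standing in for a real sentence encoder. It clusters those vectors with plain k-means and enriches a report by appending candidate sentences from clusters the report does not yet cover. The names `kmeans`, `enrich_report`, and `max_added`, the farthest-first initialization, and the one-new-sentence-per-cluster rule are all illustrative assumptions.

```python
# Hypothetical sketch of cluster-based report enrichment (not the SemEnrich code).
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iters: int = 50) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first initialization.

    Returns one cluster label per input point.
    """
    # Farthest-first init: start at the first point, then repeatedly pick the
    # point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def enrich_report(
    report: List[int],
    labels: List[int],
    candidates: List[int],
    max_added: int = 2,
) -> List[int]:
    """Append candidate sentences (indices into the sentence pool) whose cluster
    is not yet represented in the report, adding at most `max_added` of them.

    Assumption: `candidates` is already filtered to positive/neutral findings.
    """
    covered = {labels[i] for i in report}
    added: List[int] = []
    for j in candidates:
        if len(added) >= max_added:
            break
        if labels[j] not in covered:
            added.append(j)
            covered.add(labels[j])  # at most one new sentence per cluster
    return report + added


if __name__ == "__main__":
    sentences = [
        "There is a small left pleural effusion.",  # pleura
        "No pleural effusion.",                     # pleura
        "Heart size is normal.",                    # heart
        "Mild cardiomegaly.",                       # heart
        "No acute osseous abnormality.",            # bones
        "Degenerative changes of the spine.",       # bones
    ]
    # Toy embeddings: one well-separated 2-D blob per topic.
    embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
                  (5.1, 4.9), (0.0, 5.0), (0.1, 5.1)]
    labels = kmeans(embeddings, k=3)
    enriched = enrich_report([0], labels, candidates=[1, 2, 4])
    print(" ".join(sentences[i] for i in enriched))
    # prints: There is a small left pleural effusion. Heart size is normal. No acute osseous abnormality.
```

In this toy run, "No pleural effusion." is skipped because its cluster is already covered by the report, while normal cardiac and bone statements are added. Swapping the cluster check for random selection from `candidates` would give a rough analogue of the random-augmentation baseline in the ablation. The authors' actual encoder, number of clusters, and selection rule are not specified here.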
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Discrete Diffusion Language Models for Interactive Radiology Report Drafting
Diffusion LM matches AR performance on medical VQA, runs 3.5-4.4x faster, and enables bidirectional infilling for interactive radiology report drafting.