On the Automatic Generation of Medical Imaging Reports

Baoyu Jing; Eric Xing; Pengtao Xie

arxiv: 1711.08195 · v3 · pith:WEEKC3LVnew · submitted 2017-11-22 · 💻 cs.CL · cs.CV

classification 💻 cs.CL cs.CV

keywords medical, generation, imaging, automatic, challenges, containing, generate, long

Medical imaging is widely used in clinical practice for diagnosis and treatment. Report-writing can be error-prone for inexperienced physicians, and time-consuming and tedious for experienced physicians. To address these issues, we study the automatic generation of medical imaging reports. This task presents several challenges. First, a complete report contains multiple heterogeneous forms of information, including findings and tags. Second, abnormal regions in medical images are difficult to identify. Third, the reports are typically long, containing multiple sentences. To cope with these challenges, we (1) build a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, (2) propose a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and (3) develop a hierarchical LSTM model to generate long paragraphs. We demonstrate the effectiveness of the proposed methods on two publicly available datasets.
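
The abstract does not spell out the co-attention formulation, so the following is only a minimal sketch of the general idea, assuming plain dot-product attention: a single query (here a hypothetical decoder hidden state) attends separately over visual region features and over semantic tag features, and the two resulting context vectors are concatenated. All names (`attend`, `co_attention`, `visual`, `semantic`, `hidden`) and the toy values are illustrative assumptions, not the paper's actual model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(features, query):
    """Dot-product attention: weight each feature vector by its similarity to the query."""
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return context, weights


def co_attention(visual_feats, semantic_feats, hidden_state):
    """Attend over both modalities with the same query and concatenate the two contexts."""
    visual_ctx, visual_w = attend(visual_feats, hidden_state)
    semantic_ctx, semantic_w = attend(semantic_feats, hidden_state)
    return visual_ctx + semantic_ctx, visual_w, semantic_w


# Toy inputs (assumed): two image regions, three tag embeddings, one query vector.
visual = [[1.0, 0.0], [0.0, 1.0]]
semantic = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
hidden = [2.0, 0.0]
context, visual_w, semantic_w = co_attention(visual, semantic, hidden)
```

In this sketch the joint context has twice the feature dimension, and each set of weights sums to one, so the regions or tags most similar to the query dominate their half of the context. A learned model would typically replace the raw dot product with trainable projections.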

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation
cs.CV 2026-04 unverdicted novelty 6.0

RIHA proposes a hierarchical alignment transformer that uses multi-scale visual and textual feature pyramids plus optimal transport to generate more accurate radiology reports from medical images.

Justifying Diagnosis Decisions by Deep Neural Networks
cs.LG 2019-07 unverdicted novelty 5.0

A multi-task deep learning model maps frontal X-rays to continuous text for producing diagnoses, textual justifications, and alternative images, with expert study showing better justification than saliency maps.