AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation

Jilan Xu; Junjun He; Qingqiu Li; Quanli Shen; Rui Feng; Runtian Yuan; Seongsu Bae; Shujun Wang; Xiaobo Zhang; Yuejie Zhang

arxiv: 2505.02830 · v1 · pith:PU4MZ2PBnew · submitted 2025-05-05 · 💻 cs.CV · cs.CL

Qingqiu Li, Zihang Cui, Seongsu Bae, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Junjun He, Shujun Wang

keywords: reasoning, large, mlmms, accuracy, anatomical, chest, interpretation, lmms

Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. However, despite their strong visual understanding, current Medical LMMs (MLMMs) still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited accuracy and interpretability due to single-step reasoning. In this paper, we empower MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. Specifically, we first propose an Anatomical Ontology-Guided Reasoning (AOR) framework, which centers on cross-modal region-level information to facilitate multi-step reasoning. Next, under the guidance of expert physicians, we develop AOR-Instruction, a large instruction dataset for MLMM training. Our experiments demonstrate AOR's superior performance in both VQA and report generation tasks.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models
cs.CV 2026-06 unverdicted novelty 6.0

EasyLens introduces a plug-and-play amplifier that uses pathology-anatomy prototypes and morphology-guided residual enhancement to boost subtle-lesion cues in frozen medical VLMs.

Towards Responsible Multimodal Medical Reasoning via Context-Aligned Vision-Language Models
cs.CV 2026-04 unverdicted novelty 4.0

Context alignment in medical VLMs raises AUC from 0.918 to 0.925, cuts hallucinated keywords from 1.14 to 0.25, shortens explanations to 15.3 words, and maintains calibrated uncertainty without raising model confidence.