AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. However, despite their strong visual understanding, current Medical LMMs (MLMMs) still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited accuracy and interpretability due to single-step reasoning. In this paper, we empower MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. Specifically, we first propose an Anatomical Ontology-Guided Reasoning (AOR) framework, which centers on cross-modal region-level information to facilitate multi-step reasoning. Next, under the guidance of expert physicians, we develop AOR-Instruction, a large instruction dataset for MLMM training. Our experiments demonstrate AOR's superior performance in both VQA and report generation tasks.
Forward citations
Cited by 2 Pith papers
- EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models
  EasyLens introduces a plug-and-play amplifier that uses pathology-anatomy prototypes and morphology-guided residual enhancement to boost subtle-lesion cues in frozen medical VLMs.
- Towards Responsible Multimodal Medical Reasoning via Context-Aligned Vision-Language Models
  Context alignment in medical VLMs raises AUC from 0.918 to 0.925, cuts hallucinated keywords from 1.14 to 0.25, shortens explanations to 15.3 words, and maintains calibrated uncertainty without raising model confidence.