SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering

Bo Liu; Li-Ming Zhan; Lin Ma; Li Xu; Xiao-Ming Wu; Yan Yang

arxiv: 2102.09542 · v1 · pith:QOOIELZ7new · submitted 2021-02-18 · 💻 cs.CV · cs.AI · cs.CL

classification 💻 cs.CV cs.AI cs.CL

keywords slake dataset med-vqa medical answering development evaluation question

Medical visual question answering (Med-VQA) has tremendous potential in healthcare. However, the development of this technology is hindered by the lack of publicly available, high-quality labeled datasets for training and evaluation. In this paper, we present SLAKE, a large bilingual dataset with comprehensive semantic labels annotated by experienced physicians and a new structural medical knowledge base for Med-VQA. In addition, SLAKE includes richer modalities and covers more human body parts than the currently available dataset. We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems. The dataset can be downloaded from http://www.med-vqa.com/slake.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MKG-RAG-Bench: Benchmarking Retrieval in Multimodal Knowledge Graph-Augmented Generation
cs.AI 2026-06 unverdicted novelty 7.0

MKG-RAG-Bench is a cross-domain benchmark for retrieval in multimodal knowledge graph-augmented generation, constructed via LLM curation from two MKGs with aligned QA datasets.
MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models
cs.CV 2026-06 unverdicted novelty 7.0

Introduces MMBU benchmark for VLMs in biomedicine and demonstrates that established benchmarks mask perception deficiencies in evaluated models.
Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study
cs.CL 2026-06 conditional novelty 7.0

Introduces IndoRad-VQA dataset and reports 8-25% performance gap in medical VLMs between English and Indonesian radiology VQA prompts.
CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
cs.CV 2026-05 conditional novelty 7.0

Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.
Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA
cs.CV 2026-05 unverdicted novelty 5.0

Ask4VG learns a risk estimator from counterfactual visual probes to rerank question rewrites, reducing held-out hallucination risk from 0.658 to 0.623 and raising accuracy from 0.337 to 0.356 on VQA-RAD.
$M^3 QuestionIng$: Multi-modal Multi-span Medical Question Answering
cs.IR 2026-05 unverdicted novelty 5.0

Proposes a multi-modal multi-span medical QA framework and new dataset that outputs answers containing both text and relevant images.
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation
cs.CV 2024-08 unverdicted novelty 5.0

M4CXR is a multi-modal large language model that performs multiple chest X-ray analysis tasks, including report generation, and claims state-of-the-art clinical accuracy using chain-of-thought prompting.
BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery
eess.IV 2026-05 unverdicted novelty 4.0

BCER agent improves end-to-end reliability of long-horizon MRI workflows via compilation, artifact binding, and bounded local recovery, outperforming reactive baselines especially on long-chain tasks across brain, pro...