SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Medical visual question answering (Med-VQA) has tremendous potential in healthcare. However, the development of this technology is hindered by the lack of publicly available, high-quality labeled datasets for training and evaluation. In this paper, we present a large bilingual dataset, SLAKE, with comprehensive semantic labels annotated by experienced physicians and a new structural medical knowledge base for Med-VQA. In addition, SLAKE includes richer modalities and covers more human body parts than the currently available dataset. We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems. The dataset can be downloaded from http://www.med-vqa.com/slake.
Forward citations
Cited by 8 Pith papers
- MKG-RAG-Bench: Benchmarking Retrieval in Multimodal Knowledge Graph-Augmented Generation
  MKG-RAG-Bench is a cross-domain benchmark for retrieval in multimodal knowledge graph-augmented generation, constructed via LLM curation from two MKGs with aligned QA datasets.
- MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models
  Introduces MMBU benchmark for VLMs in biomedicine and demonstrates that established benchmarks mask perception deficiencies in evaluated models.
- Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study
  Introduces IndoRad-VQA dataset and reports 8-25% performance gap in medical VLMs between English and Indonesian radiology VQA prompts.
- CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs
  Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.
- Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA
  Ask4VG learns a risk estimator from counterfactual visual probes to rerank question rewrites, reducing held-out hallucination risk from 0.658 to 0.623 and raising accuracy from 0.337 to 0.356 on VQA-RAD.
- $M^3 QuestionIng$: Multi-modal Multi-span Medical Question Answering
  Proposes a multi-modal multi-span medical QA framework and new dataset that outputs answers containing both text and relevant images.
- M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation
  M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.
- BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery
  BCER agent improves end-to-end reliability of long-horizon MRI workflows via compilation, artifact binding, and bounded local recovery, outperforming reactive baselines especially on long-chain tasks across brain, pro...