
arxiv: 2102.09542 · v1 · pith:QOOIELZ7new · submitted 2021-02-18 · cs.CV · cs.AI · cs.CL

SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering

classification cs.CV · cs.AI · cs.CL
keywords slake · dataset · med-vqa · medical · answering · development · evaluation · question

Medical visual question answering (Med-VQA) has tremendous potential in healthcare. However, the development of this technology is hindered by the lack of publicly available, high-quality labeled datasets for training and evaluation. In this paper, we present a large bilingual dataset, SLAKE, with comprehensive semantic labels annotated by experienced physicians and a new structural medical knowledge base for Med-VQA. In addition, SLAKE includes richer modalities and covers more human body parts than the currently available dataset. We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems. The dataset can be downloaded from http://www.med-vqa.com/slake.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MKG-RAG-Bench: Benchmarking Retrieval in Multimodal Knowledge Graph-Augmented Generation

    cs.AI 2026-06 unverdicted novelty 7.0

    MKG-RAG-Bench is a cross-domain benchmark for retrieval in multimodal knowledge graph-augmented generation, constructed via LLM curation from two MKGs with aligned QA datasets.

  2. MMBU: A Massive Multi-modal Biomedical Understanding Benchmark to Probe the Perception Capabilities of Vision-Language Models

    cs.CV 2026-06 unverdicted novelty 7.0

    Introduces MMBU benchmark for VLMs in biomedicine and demonstrates that established benchmarks mask perception deficiencies in evaluated models.

  3. Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study

    cs.CL 2026-06 conditional novelty 7.0

    Introduces IndoRad-VQA dataset and reports 8-25% performance gap in medical VLMs between English and Indonesian radiology VQA prompts.

  4. CXR-ContraBench: Benchmarking Negated-Option Attraction in Medical VLMs

    cs.CV 2026-05 conditional novelty 7.0

    Medical VLMs frequently select negated options that contradict visible chest X-ray findings, achieving only ~30% accuracy on direct presence probes, but a post-hoc consistency verifier raises accuracy above 95%.

  5. Ask4VG: Risk-Aware Question Selection for Reducing Prior-Driven Answers in Medical VQA

    cs.CV 2026-05 unverdicted novelty 5.0

    Ask4VG learns a risk estimator from counterfactual visual probes to rerank question rewrites, reducing held-out hallucination risk from 0.658 to 0.623 and raising accuracy from 0.337 to 0.356 on VQA-RAD.

  6. $M^3 QuestionIng$: Multi-modal Multi-span Medical Question Answering

    cs.IR 2026-05 unverdicted novelty 5.0

    Proposes a multi-modal multi-span medical QA framework and new dataset that outputs answers containing both text and relevant images.

  7. M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

    cs.CV 2024-08 unverdicted novelty 5.0

    M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.

  8. BCER Agent: Reliable Long-Horizon MRI Workflow Execution via Compilation, Artifact Binding, and Bounded Local Recovery

    eess.IV 2026-05 unverdicted novelty 4.0

    BCER agent improves end-to-end reliability of long-horizon MRI workflows via compilation, artifact binding, and bounded local recovery, outperforming reactive baselines especially on long-chain tasks across brain, pro...