NeuroQA is a large-scale 3D brain MRI visual question answering benchmark with verified image-grounded QA pairs, multi-domain coverage, and baseline evaluations showing current models lag behind text-only performance.
arXiv preprint arXiv:2506.11147 (2025)
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Citation roles: dataset (1)
Citation polarities: use dataset (1)
Representative citing papers:
-
NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
-
MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
-
RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
RadThinking releases a large longitudinal CT VQA dataset stratified into foundation perception questions, single-rule reasoning questions, and compositional multi-step chains grounded in clinical reporting standards for cancer screening.
-
SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning
SemEnrich enriches radiology reports with positive and neutral findings via self-supervised semantic clustering. After fine-tuning, it yields average gains of 5-7% on COMET, BERTScore, sentence BLEU, CheXbert-F1 and RadGraph-F1, with further gains when cluster information is added to the GRPO rewards.