Nova: A benchmark for anomaly localization and clinical reasoning in brain MRI
3 Pith papers cite this work.
Years: 2026 (3)
Representative citing papers
WALDO improves zero-shot anomaly localization in medical imaging by selecting reference distributions via entropy-weighted Sliced Wasserstein distances and Goldilocks zone sampling, yielding a 19% relative gain on brain MRI benchmarks.
The GAZE framework, with viewer tools and literature retrieval, achieves 58.2 mAP@0.3 lesion localization and 34.9% top-1 diagnostic accuracy on 906 rare brain MRI cases in a zero-shot setting, with larger gains on the rarest pathologies.
citing papers explorer
- MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows
  MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
- Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging
  WALDO improves zero-shot anomaly localization in medical imaging by selecting reference distributions via entropy-weighted Sliced Wasserstein distances and Goldilocks zone sampling, yielding a 19% relative gain on brain MRI benchmarks.
- GAZE: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain MRI
  The GAZE framework, with viewer tools and literature retrieval, achieves 58.2 mAP@0.3 lesion localization and 34.9% top-1 diagnostic accuracy on 906 rare brain MRI cases in a zero-shot setting, with larger gains on the rarest pathologies.