arXiv preprint arXiv:2505.24173 , year=

DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? , author= · 2025 · arXiv 2505.24173

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows

cs.CV · 2026-03-25 · conditional · novelty 8.0

MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.

SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?

cs.AI · 2026-05-15

citing papers explorer

Showing 2 of 2 citing papers.

MedOpenClaw and MedFlowBench: Auditing Medical Agents in Full-Study Workflows cs.CV · 2026-03-25 · conditional · none · ref 49
MedFlowBench evaluates VLM agents on full radiology and pathology studies by requiring both task answers and verifiable evidence like key slices and regions of interest, revealing that answer-only scores overestimate performance.
SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows? cs.AI · 2026-05-15 · unreviewed · ref 92

arXiv preprint arXiv:2505.24173 , year=

fields

years

verdicts

representative citing papers

citing papers explorer