Evaluating LLM-generated multimodal diagnosis from medical images and symptom analysis
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
SABER integrates LLM semantics into brain networks via global self-attention and multi-scale hypergraphs with decision-level alignment, claiming SOTA performance, stability, and interpretability on ABIDE and ADHD-200.
citing papers explorer
- IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations
IMCBench is a new benchmark for image-grounded multi-turn medical conversations that evaluates eight multimodal LLMs on safety, accuracy, and uncertainty, finding Claude Opus highest overall but safety drops for malignant and rare conditions.
- SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs
SABER integrates LLM semantics into brain networks via global self-attention and multi-scale hypergraphs with decision-level alignment, claiming SOTA performance, stability, and interpretability on ABIDE and ADHD-200.