Evaluating LLM-generated multimodal diagnosis from medical images and symptom analysis

2024 · arXiv 2402.01730

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations

cs.AI · 2026-06-26 · unverdicted · novelty 7.0

IMCBench is a new benchmark for image-grounded, multi-turn medical conversations. It evaluates eight multimodal LLMs on safety, accuracy, and uncertainty, and finds that Claude Opus scores highest overall but that safety drops for malignant and rare conditions.

SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs

cs.LG · 2026-07-02 · unverdicted · novelty 5.0

SABER integrates LLM semantics into brain networks via global self-attention and multi-scale hypergraphs with decision-level alignment, claiming SOTA performance, stability, and interpretability on ABIDE and ADHD-200.

citing papers explorer


IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations · cs.AI · 2026-06-26 · unverdicted · none · ref 11
SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs · cs.LG · 2026-07-02 · unverdicted · none · ref 7
