MedHorizon benchmark reveals current multimodal LLMs achieve only 41.1% accuracy on long medical videos due to failures in sparse evidence retrieval and procedural reasoning.
arXiv preprint arXiv:2504.14391
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
EchoTrust is an evidence-driven actor-verifier framework that produces structured intermediate representations for more reliable and interpretable reasoning in echocardiography visual language models.
citing papers explorer
- MedHorizon: Towards Long-context Medical Video Understanding in the Wild
- Evidence-Based Actor-Verifier Reasoning for Echocardiographic Agents