arXiv preprint arXiv:2504.14391

How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? · 2025 · arXiv 2504.14391

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

cs.CV · 2026-05-07 · unverdicted · novelty 8.0

The MedHorizon benchmark reveals that current multimodal LLMs achieve only 41.1% accuracy on long medical videos due to failures in sparse evidence retrieval and procedural reasoning.

Evidence-Based Actor-Verifier Reasoning for Echocardiographic Agents

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

EchoTrust is an evidence-driven actor-verifier framework that produces structured intermediate representations for more reliable and interpretable reasoning in echocardiography visual language models.

citing papers explorer

Showing 2 of 2 citing papers.

MedHorizon: Towards Long-context Medical Video Understanding in the Wild

cs.CV · 2026-05-07 · unverdicted · none · ref 82

Evidence-Based Actor-Verifier Reasoning for Echocardiographic Agents

cs.CV · 2026-04-07 · unverdicted · none · ref 20
