OralMLLM-Bench reveals performance gaps between multimodal large language models and clinicians on cognitive tasks for dental radiographic analysis across periapical, panoramic, and cephalometric images.
Mimicking or reasoning: Rethinking multi-modal in-context learning in vision-language models
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.
MetaEmbed trains fixed learnable Meta Tokens to produce granularity-organized multi-vector embeddings that support test-time scaling in multimodal retrieval.
citing papers explorer
-
OralMLLM-Bench: Evaluating Cognitive Capabilities of Multimodal Large Language Models in Dental Practice
OralMLLM-Bench reveals performance gaps between multimodal large language models and clinicians on cognitive tasks for dental radiographic analysis across periapical, panoramic, and cephalometric images.
-
Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks
Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.
-
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MetaEmbed trains fixed learnable Meta Tokens to produce granularity-organized multi-vector embeddings that support test-time scaling in multimodal retrieval.