MME is a manually annotated benchmark evaluating MLLMs on perception and cognition across 14 subtasks to avoid data leakage and support fair model comparisons.
An overview of large ai models and their applications.Visual Intelligence
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
other 1
citation-polarity summary
fields
cs.CV 2roles
other 1polarities
unclear 1representative citing papers
VITA-1.5 integrates vision and speech into a single LLM through multi-stage training, delivering competitive benchmark results on image, video, and speech tasks with near real-time response speed.
citing papers explorer
-
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
MME is a manually annotated benchmark evaluating MLLMs on perception and cognition across 14 subtasks to avoid data leakage and support fair model comparisons.
-
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
VITA-1.5 integrates vision and speech into a single LLM through multi-stage training, delivering competitive benchmark results on image, video, and speech tasks with near real-time response speed.