EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models
With the integration of multimodal large language models (MLLMs) into robotic systems and AI applications, embedding emotional intelligence (EI) capabilities is essential for enabling these models to perceive, interpret, and respond to human emotions effectively in real-world scenarios. Existing static, text-based, or text-image benchmarks overlook the multimodal complexities of real interactions and fail to capture the dynamic, context-dependent nature of emotional expressions, rendering them inadequate for evaluating MLLMs' EI capabilities. To address these limitations, we introduce EmoBench-M, a systematic benchmark grounded in established psychological theories, designed to evaluate MLLMs across 13 evaluation scenarios spanning three hierarchical dimensions: foundational emotion recognition (FER), conversational emotion understanding (CEU), and socially complex emotion analysis (SCEA). We evaluated 27 state-of-the-art MLLMs using both objective task-specific metrics and LLM-based evaluation, revealing a substantial performance gap relative to human-level competence. Even the best-performing models, Gemini-3.0-Pro and GPT-5.2, reach only 70.5 and 66.5 points on EmoBench-M, respectively. Specialized models such as AffectGPT exhibit uneven performance across EmoBench-M, demonstrating strengths in certain scenarios but generally lacking comprehensive emotional intelligence. By providing a comprehensive, multimodal evaluation framework, EmoBench-M captures both the strengths and weaknesses of current MLLMs across diverse emotional contexts. All benchmark resources, including datasets and code, are publicly available at https://emo-gml.github.io/, facilitating further research and advancement in MLLM emotional intelligence.
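The abstract does not specify how the 13 scenario scores are aggregated into the single EmoBench-M number reported for each model. A minimal sketch of one plausible aggregation follows, assuming per-scenario scores are already normalized to a 0-100 scale and that the three dimensions (FER, CEU, SCEA) are weighted equally; the scenario names and values below are illustrative placeholders, not results or terminology from the paper.

```python
from statistics import mean

# Hypothetical per-scenario scores (0-100), grouped by dimension.
# Scenario names and numbers are illustrative, not from the paper.
scores = {
    "FER": {"facial_emotion": 72.0, "speech_emotion": 65.5},
    "CEU": {"dialogue_emotion": 61.0, "intent_understanding": 58.5},
    "SCEA": {"sarcasm_detection": 54.0, "humor_understanding": 50.5},
}

def emobench_score(scores: dict[str, dict[str, float]]) -> float:
    """Average scenarios within each dimension, then average the
    dimension means so FER, CEU, and SCEA carry equal weight."""
    dimension_means = [mean(s.values()) for s in scores.values()]
    return mean(dimension_means)

print(f"Overall score: {emobench_score(scores):.1f}")
```

Averaging within dimensions first prevents a dimension with many scenarios from dominating the overall score; whether EmoBench-M itself uses this scheme or a flat mean over all 13 scenarios is not stated in the abstract.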
Forward citations
Cited by 2 Pith papers
- EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
  EmoTrans is a new video benchmark with four progressive tasks, measuring how well current multimodal LLMs handle dynamic emotion transitions rather than static recognition.
- AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
  AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.