pith. sign in

arxiv: 2602.01173 · v2 · pith:KTOGG55Fnew · submitted 2026-02-01 · 💻 cs.CV

EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment

classification 💻 cs.CV
keywords eemo-logicassessmentdatasetemotiontextbfcomprehensivefine-grainedimage-evoked
0
0 comments X
read the original abstract

Understanding the multi-dimensional attributes and intensity nuances of image-evoked emotions is pivotal for advancing machine empathy and empowering diverse human-computer interaction applications. However, existing models are still limited to coarse-grained emotion perception or deficient reasoning capabilities. To bridge this gap, we introduce \textbf{EEmoDB}, the largest image-{\ul e}voked {\ul emo}tion understanding {\ul d}ataset to date. It features $5$ analysis dimensions spanning $5$ distinct task categories, facilitating comprehensive interpretation. Specifically, we compile $1.2M$ question-answering (QA) pairs (EEmoDB-QA) from $125K$ images via automated generation, alongside a $36K$ dataset (EEmoDB-Assess) curated from $25K$ images for fine-grained assessment. Furthermore, we propose \textbf{EEmo-Logic}, an \textbf{all-in-one} multimodal large language model (MLLM) developed via instruction fine-tuning and task-customized group relative preference optimization (GRPO) with novel reward design. Extensive experiments demonstrate that EEmo-Logic achieves robust performance in in-domain and cross-domain datasets, excelling in emotion QA and fine-grained assessment. The dataset and code are available at https://github.com/workerred/EEmo-Logic.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.