AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues

Ajay Kankipati; Akshaj Gupta; Dingkun Zhou; Gopala Anumanchipalli; Grace Wang; Guan-Ting Lin; Huang-Cheng Chou; Jiachen Lian; Kan Jen Cheng; Krish Patel

arxiv: 2510.07355 · v2 · pith:G3QNB57Anew · submitted 2025-10-08 · 💻 cs.MM · cs.SD

Dingkun Zhou, Krish Patel, Ajay Kankipati, Akshaj Gupta, Zeyi Austin Li, Mohul Shukla, Vibhor Narang, Sara Kofman, Zongli Ye, Grace Wang, Xiaoyu Shi, Tingle Li, Guan-Ting Lin, Kan Jen Cheng, Huang-Cheng Chou, Jiachen Lian, Gopala Anumanchipalli

keywords: reasoning, emotional, interaction, models, audiovisual, benchmark, cues, emotion

Emotions conveyed through voice and face shape engagement and context in human-AI interaction. Despite rapid progress in omni-modal large language models, the holistic evaluation of emotional reasoning with audiovisual cues remains limited. To address this gap, we introduce AV-EMO-Reasoning, a benchmark designed to systematically assess emotional reasoning abilities in large language models. The framework uses a curated audiovisual corpus comprising synthetic single-turn and multi-turn dialogues and a real-world subset, together with emotion perception and interaction reasoning metrics, to evaluate whether models can understand user emotions and produce appropriate responses. By releasing a systematic evaluation benchmark, AV-EMO-Reasoning offers a reproducible standard for evaluating emotion-aware dialogue and advances toward more natural, adaptive human-AI interaction.
