MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding

arxiv: 2604.19417 · v4 · submitted 2026-04-21 · 💻 cs.HC

Zheng Lian, Xiaojiang Peng, Kele Xu, Ziyu Jia, Xinyi Che, Zebang Cheng, Fei Ma, Laizhong Cui, Yazhou Zhang, Xin Liu, Liang Yang, Jia Li, Fan Zhang, Liumeng Xue, Erik Cambria, Guoying Zhao, Bjorn W. Schuller, Jianhua Tao

Pith reviewed 2026-05-10 01:52 UTC · model grok-4.3

classification 💻 cs.HC

keywords emotion recognition · multimodal large language models · generative emotion understanding · MER challenge · dyadic interaction · fine-grained emotion · physiological signals · human preferences

The pith

MER2026 advances emotion recognition from fixed basic labels to generative understanding with four new tracks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the fourth edition of the MER challenge series and documents its ongoing shift from tasks that assign one of a few basic emotion categories to tasks that require generative, descriptive outputs from multimodal large language models. It specifies four tracks for the 2026 edition, covering interactions between two people, finer emotion distinctions, human preferences among emotion descriptions, and physiological signals. A reader would care because the series supplies public datasets and evaluation baselines that can guide development of AI systems capable of handling nuanced, context-dependent emotions rather than isolated, coarse labels.

Core claim

MER2026 contains four tracks: MER-Cross for dyadic interaction scenarios, MER-FG for fine-grained emotion recognition, MER-Prefer for predicting human preferences over emotion descriptions, and MER-PS for emotion recognition from physiological signals, continuing the series' progression from discriminative to generative emotion understanding.

What carries the argument

The MER challenge series, which supplies datasets and baselines for tasks that test models on evolving aspects of emotion recognition.

If this is right

Emotion recognition systems will be tested on understanding exchanges between two people rather than single individuals.
Models will need to distinguish subtle emotion variations instead of selecting from a small fixed set of labels.
Models will be judged on how well they predict which emotion descriptions humans prefer.
Emotion recognition will draw on physiological signals (Figure 4 mentions EEG and fNIRS) rather than only on visual and audio input.
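To make the fine-grained implication above concrete, here is a minimal sketch of how a predicted set of open-vocabulary emotion labels might be compared against a reference set. It assumes exact string matching after lowercasing; the function name `set_f1` and the example labels are hypothetical, and the official MER-FG metric is not stated on this page and may differ (for example, by grouping synonyms).

```python
def set_f1(predicted, reference):
    """Return set-level (precision, recall, F1) between two label collections.

    Assumption: labels match only if they are identical after
    stripping whitespace and lowercasing. This is an illustration,
    not the challenge's official scoring rule.
    """
    pred = {label.strip().lower() for label in predicted}
    ref = {label.strip().lower() for label in reference}
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    overlap = len(pred & ref)
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 3 predicted labels appear among 4 reference labels.
p, r, f = set_f1(
    ["Anxious", "worried", "tense"],
    ["worried", "anxious", "sad", "frustrated"],
)
print(round(p, 3), round(r, 3), round(f, 3))
```

A set-based score rewards a model for covering several valid labels without penalizing their order, which is one reason it is a common choice when the label vocabulary is not fixed.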

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

These tracks could support development of conversational agents that respond more appropriately during real-time social exchanges.
The preference-prediction track may surface which kinds of explanations people actually find helpful in emotion AI applications.
Combining physiological data with language-model outputs could create more reliable systems for monitoring emotional states in health or education settings.

Load-bearing premise

That the new tasks will successfully draw on the broad vocabulary and multimodal capabilities of large language models to produce finer and more explainable emotion recognition.

What would settle it

An evaluation in which models trained or tested on the MER2026 tasks show no gains in fine-grained accuracy or human preference alignment compared with models from the prior discriminative-label editions.
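The "human preference alignment" part of such an evaluation could be measured in several ways. The sketch below shows one simple option, assuming each test item is a pair of candidate descriptions labeled "A" or "B" and that humans and the model each pick one. The function name and the data are hypothetical, and the metric actually used by MER-Prefer is not given on this page.

```python
def preference_agreement(human_choices, model_choices):
    """Fraction of items where the model picks the same option as humans.

    Assumption: each entry is the label of the preferred description
    for one pair (e.g. "A" or "B"), and both lists are aligned by item.
    """
    if len(human_choices) != len(model_choices):
        raise ValueError("human and model choices must have the same length")
    if not human_choices:
        return 0.0
    matches = sum(h == m for h, m in zip(human_choices, model_choices))
    return matches / len(human_choices)


# Hypothetical data: the model agrees with humans on 3 of 4 pairs.
human = ["A", "B", "A", "A"]
model = ["A", "B", "B", "A"]
score = preference_agreement(human, model)
print(score)
```

Comparing this score for a model from an earlier discriminative edition against a model built for the 2026 tasks would be one way to run the test described above.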

Figures

Figures reproduced from arXiv: 2604.19417 by Bjorn W. Schuller, Erik Cambria, Fan Zhang, Fei Ma, Guoying Zhao, Jia Li, Jianhua Tao, Kele Xu, Laizhong Cui, Liang Yang, Liumeng Xue, Xiaojiang Peng, Xin Liu, Xinyi Che, Yazhou Zhang, Zebang Cheng, Zheng Lian, Ziyu Jia.

**Figure 2.** MER-FG. Unlike previous MER tasks that focus on …

**Figure 3.** MER-Prefer. Given a video, the model needs to de…

**Figure 4.** MER-PS. EEG and fNIRS signals are synchronously …

Original abstract

MER2026 marks the fourth edition of the MER series of challenges. The MER series provides valuable data resources to the research community and offers tasks centered on recent research trends, establishing itself as one of the largest challenges in the field. Throughout its history, the focus of MER has shifted from discriminative emotion recognition to generative emotion understanding. Specifically, MER2023 concentrated on discriminative emotion recognition, restricting the emotion recognition scope to fixed basic labels. In MER2024 and MER2025, we transitioned to generative emotion understanding and introduced two new tasks: fine-grained emotion recognition and descriptive emotion analysis, aiming to leverage the extensive vocabulary and multimodal understanding capabilities of Multimodal Large Language Models (MLLMs) to facilitate fine-grained and explainable emotion recognition. Building on this trajectory, MER2026 continues to follow these research trends and contains four tracks: MER-Cross shifts the focus from individual to dyadic interaction scenarios; MER-FG centers on fine-grained emotion recognition; MER-Prefer aims to predict human preferences regarding different emotion descriptions; MER-PS focuses on emotion recognition based on physiological signals. More details regarding the dataset and baselines are available at https://zeroqiaoba.github.io/MER-Challenge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a challenge announcement for MER2026 that defines four new tracks extending the prior series, with no new empirical results or methods presented.

This document is an announcement for the fourth MER challenge rather than a research paper with findings. The core update is the addition of four tracks that continue the shift toward generative emotion tasks with multimodal large language models: MER-Cross for dyadic interactions, MER-FG for fine-grained labels, MER-Prefer for human preferences over descriptions, and MER-PS for physiological signals. The authors point to datasets and baselines on an external site, which keeps things concrete for participants.

Referee Report

0 major / 2 minor

Summary. The manuscript announces MER2026, the fourth edition of the MER challenge series. It traces the progression from discriminative emotion recognition with fixed labels in MER2023 to generative emotion understanding in MER2024/2025 via fine-grained and descriptive tasks that aim to exploit MLLMs. MER2026 introduces four tracks: MER-Cross (dyadic interactions), MER-FG (fine-grained recognition), MER-Prefer (human preference prediction over emotion descriptions), and MER-PS (physiological-signal-based recognition). Task definitions, historical context, and pointers to datasets/baselines on an external site are provided; no empirical results or derivations appear.

Significance. If the tracks are implemented with high-quality, publicly released datasets and clear evaluation protocols, the challenge will extend the MER series' role in supplying community resources and benchmarks. By targeting dyadic, fine-grained, preference-driven, and multimodal-physiological scenarios, it aligns with current trends toward explainable emotion understanding in HCI and could accelerate adoption of MLLMs for nuanced affective computing.

minor comments (2)

[Abstract] Abstract and §1: the four tracks are introduced in a single sentence each; adding one or two sentences per track on input modalities, output format, and evaluation metric (even if summarized from the website) would make the manuscript more self-contained for readers who do not immediately consult the external link.
The manuscript refers readers to https://zeroqiaoba.github.io/MER-Challenge for datasets and baselines but does not include even a high-level table summarizing track names, data sources, or participant requirements; such a table would improve readability and serve as a quick reference.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We sincerely thank the referee for their positive summary of the manuscript and for recommending minor revision. We appreciate the acknowledgment that the MER2026 tracks target important directions in dyadic, fine-grained, preference-driven, and physiological emotion understanding, and that the challenge series continues to supply valuable community resources.

Circularity Check

0 steps flagged

No significant circularity; purely descriptive challenge announcement

The paper is a challenge announcement describing the MER2026 tracks (MER-Cross, MER-FG, MER-Prefer, MER-PS) as a continuation of prior MER editions' shift from discriminative to generative emotion understanding. No derivations, equations, predictions, fitted parameters, or load-bearing claims appear. The text only defines tasks and points to external resources for datasets and baselines. No self-citation chain or self-definitional reduction exists; the announcement is self-contained as a forward-looking task description without internal logical steps that collapse to their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the document is a challenge announcement without technical derivations or new postulated constructs.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] Michael A Arbib and Jean-Marc Fellous. Emotions: from brain to robot. Trends in Cognitive Sciences, 8(12):554–561, 2004

[2] Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, and Cheston Tan. A survey of embodied AI: From simulators to research tasks. IEEE Transactions on Emerging Topics in Computational Intelligence, 6(2):230–244, 2022

[3] Robert Plutchik. A general psychoevolutionary theory of emotion. Emotion: Theory, research, and experience, 1, 1980

[4] Rosalind W Picard. Affective computing. MIT Press, 2000

[5] Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, and Jianhua Tao. Merbench: A unified evaluation benchmark for multimodal emotion recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

[6] Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, et al. MER 2023: Multi-label learning, modality robustness, and semi-supervised learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 9610–9614, 2023

[7] Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, et al. MER 2024: Semi-supervised learning, noise robustness, and open-vocabulary multimodal emotion recognition. In Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, pages 41–48, 2024

[8] Zheng Lian, Haiyang Sun, Licai Sun, Haoyu Chen, Lan Chen, Hao Gu, Zhuofan Wen, Shun Chen, Zhang Siyuan, Hailiang Yao, et al. OV-MER: Towards open-vocabulary multimodal emotion recognition. In Forty-second International Conference on Machine Learning, 2025

[9] Zheng Lian, Rui Liu, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, et al. MER 2025: When affective computing meets large language models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 13837–13842, 2025

[10] Zheng Lian, Licai Sun, Mingyu Xu, Haiyang Sun, Ke Xu, Zhuofan Wen, Shun Chen, Bin Liu, and Jianhua Tao. Explainable multimodal emotion reasoning. arXiv preprint arXiv:2306.15401, 2023

[11] Erik Cambria, Rui Mao, Xulang Zhang, Luwei Xiao, Tiesunlong Shen, and Avinash Anand. SenticNet 9: Generative commonsense for emotion AI via conceptual primitive discovery and time shift mechanism. IEEE Transactions on Computational Social Systems, 13, 2026

[12] Zheng Lian, Licai Sun, Lan Chen, Haoyu Chen, Zebang Cheng, Fan Zhang, Ziyu Jia, Ziyang Ma, Fei Ma, Xiaojiang Peng, et al. EmoPrefer: Can large language models understand human emotion preferences? In Proceedings of the International Conference on Learning Representations, ICLR, 2026

[13] Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, and Haizhou Li. Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection. In Proceedings of the 29th ACM International Conference on Multimedia, pages 3927–3935, 2021

[14] Paul Ekman. Are there basic emotions? 1992

[15] Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, et al. AffectGPT: A new dataset, model, and benchmark for emotion understanding with multimodal large language models. In Forty-second International Conference on Machine Learning, 2025

[16] Zheng Lian, Fan Zhang, Yazhou Zhang, Jianhua Tao, Rui Liu, Haoyu Chen, and Xiaobai Li. AffectGPT-R1: Leveraging reinforcement learning for open-vocabulary multimodal emotion recognition. arXiv preprint arXiv:2508.01318, 2026

[17] Vernon J Lawhern, Amelia J Solon, Nicholas R Waytowich, Stephen M Gordon, Chou P Hung, and Brent J Lance. EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. Journal of Neural Engineering, 15(5):056013, 2018

[18] Kaining Fang, Jing Qu, Zixing Ding, Junhang Ding, and Lingguo Bu. ASAC-Net: A novel multimodal alignment-complementary fusion framework for EEG-fNIRS emotion recognition. Information Fusion, 134:104329, 2026
