MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding
Pith reviewed 2026-05-10 01:52 UTC · model grok-4.3
The pith
MER2026 advances emotion recognition from fixed basic labels to generative understanding with four new tracks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MER2026 contains four tracks: MER-Cross for dyadic interaction scenarios, MER-FG for fine-grained emotion recognition, MER-Prefer for predicting human preferences over emotion descriptions, and MER-PS for emotion recognition from physiological signals, continuing the series' progression from discriminative to generative emotion understanding.
What carries the argument
The MER challenge series, which supplies datasets and baselines for tasks that test models on evolving aspects of emotion recognition.
If this is right
- Emotion recognition systems will be tested on understanding exchanges between two people rather than single individuals.
- Models will need to distinguish subtle emotion variations instead of selecting from a small fixed set of labels.
- Generated descriptions will be scored by how well they match human preferences for accuracy and usefulness.
- One track will test emotion recognition from physiological signals rather than only visual and audio input; the abstract does not specify which signals are recorded.
Where Pith is reading between the lines
- These tracks could support development of conversational agents that respond more appropriately during real-time social exchanges.
- The preference-prediction track may surface which kinds of explanations people actually find helpful in emotion AI applications.
- Combining physiological data with language-model outputs could create more reliable systems for monitoring emotional states in health or education settings.
Load-bearing premise
That the new tasks will successfully draw on the broad vocabulary and multimodal capabilities of large language models to produce finer and more explainable emotion recognition.
What would settle it
An evaluation in which models trained or tested on the MER2026 tasks show no gains in fine-grained accuracy or human preference alignment compared with models from the prior discriminative-label editions.
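To make the comparison above concrete, here is a minimal sketch of one way preference alignment could be scored: the fraction of description pairs on which a model picks the same description as the human annotators. This metric, the function name `pairwise_preference_accuracy`, and the toy labels are all assumptions for illustration; the abstract does not state the official MER-Prefer metric, and the numbers are not results from any MER edition.

```python
from typing import Sequence


def pairwise_preference_accuracy(human: Sequence[int], model: Sequence[int]) -> float:
    """Fraction of description pairs where the model's choice matches the human choice.

    Assumed encoding (hypothetical): 0 = first description preferred, 1 = second.
    """
    if len(human) != len(model):
        raise ValueError("label sequences must have equal length")
    if not human:
        raise ValueError("need at least one pair")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


# Toy labels, invented purely for illustration.
human_choice = [0, 1, 1, 0, 1]
baseline_choice = [0, 0, 1, 1, 1]   # stand-in for a prior-edition model
candidate_choice = [0, 1, 1, 0, 0]  # stand-in for a model built on MER2026 tasks

baseline_acc = pairwise_preference_accuracy(human_choice, baseline_choice)
candidate_acc = pairwise_preference_accuracy(human_choice, candidate_choice)

# The premise would be in doubt if this gain were zero or negative on real data.
gain = candidate_acc - baseline_acc
```

On real challenge data, the same comparison would need a held-out test set and some estimate of annotator disagreement before a small gain could be read as meaningful.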
Original abstract
MER2026 marks the fourth edition of the MER series of challenges. The MER series provides valuable data resources to the research community and offers tasks centered on recent research trends, establishing itself as one of the largest challenges in the field. Throughout its history, the focus of MER has shifted from discriminative emotion recognition to generative emotion understanding. Specifically, MER2023 concentrated on discriminative emotion recognition, restricting the emotion recognition scope to fixed basic labels. In MER2024 and MER2025, we transitioned to generative emotion understanding and introduced two new tasks: fine-grained emotion recognition and descriptive emotion analysis, aiming to leverage the extensive vocabulary and multimodal understanding capabilities of Multimodal Large Language Models (MLLMs) to facilitate fine-grained and explainable emotion recognition. Building on this trajectory, MER2026 continues to follow these research trends and contains four tracks: MER-Cross shifts the focus from individual to dyadic interaction scenarios; MER-FG centers on fine-grained emotion recognition; MER-Prefer aims to predict human preferences regarding different emotion descriptions; MER-PS focuses on emotion recognition based on physiological signals. More details regarding the dataset and baselines are available at https://zeroqiaoba.github.io/MER-Challenge.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript announces MER2026, the fourth edition of the MER challenge series. It traces the progression from discriminative emotion recognition with fixed labels in MER2023 to generative emotion understanding in MER2024/2025 via fine-grained and descriptive tasks that aim to exploit MLLMs. MER2026 introduces four tracks: MER-Cross (dyadic interactions), MER-FG (fine-grained recognition), MER-Prefer (human preference prediction over emotion descriptions), and MER-PS (physiological-signal-based recognition). Task definitions, historical context, and pointers to datasets/baselines on an external site are provided; no empirical results or derivations appear.
Significance. If the tracks are implemented with high-quality, publicly released datasets and clear evaluation protocols, the challenge will extend the MER series' role in supplying community resources and benchmarks. By targeting dyadic, fine-grained, preference-driven, and multimodal-physiological scenarios, it aligns with current trends toward explainable emotion understanding in HCI and could accelerate adoption of MLLMs for nuanced affective computing.
minor comments (2)
- [Abstract, §1] Each of the four tracks is introduced in a single sentence; adding one or two sentences per track on input modalities, output format, and evaluation metric (even if summarized from the website) would make the manuscript more self-contained for readers who do not immediately consult the external link.
- The manuscript refers readers to https://zeroqiaoba.github.io/MER-Challenge for datasets and baselines but does not include even a high-level table summarizing track names, data sources, or participant requirements; such a table would improve readability and serve as a quick reference.
Simulated Author's Rebuttal
We sincerely thank the referee for their positive summary of the manuscript and for recommending minor revision. We appreciate the acknowledgment that the MER2026 tracks target important directions in dyadic, fine-grained, preference-driven, and physiological emotion understanding, and that the challenge series continues to supply valuable community resources.
Circularity Check
No significant circularity; purely descriptive challenge announcement
The paper is a challenge announcement describing the MER2026 tracks (MER-Cross, MER-FG, MER-Prefer, MER-PS) as a continuation of prior MER editions' shift from discriminative to generative emotion understanding. No derivations, equations, predictions, fitted parameters, or load-bearing claims appear. The text only defines tasks and points to external resources for datasets and baselines. No self-citation chain or self-definitional reduction exists; the announcement is self-contained as a forward-looking task description without internal logical steps that collapse to their own inputs.