MECO: A Multimodal Dataset for Emotion and Cognitive Understanding in Older Adults

Hongbin Chen; Jianqing Li; Jie Li; Siyang Song; Wei Wang; Wentao Xiang; Xiao Gu

arxiv: 2604.03050 · v1 · submitted 2026-04-03 · 💻 cs.HC · cs.AI

MECO: A Multimodal Dataset for Emotion and Cognitive Understanding in Older Adults

Hongbin Chen , Jie Li , Wei Wang , Siyang Song , Xiao Gu , Jianqing Li , Wentao Xiang This is my paper

Pith reviewed 2026-05-13 18:29 UTC · model grok-4.3

classification 💻 cs.HC cs.AI

keywords MECO datasetmultimodal signalsolder adultsemotion recognitioncognitive assessmentEEGECGmild cognitive impairment

0 comments

The pith

MECO supplies synchronized video, audio, EEG and ECG recordings from 42 older adults with emotion and cognitive annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MECO to address the shortage of multimodal datasets focused on emotion and cognition in older adults rather than young subjects. Data were gathered from 42 participants in community settings using standardized protocols to yield roughly 38 hours of synchronized video, audio, EEG, and ECG signals across 30,592 samples. Annotations cover self-reported valence and arousal, six basic emotions, and Mini-Mental State Examination cognitive scores. Baseline benchmarks for emotion prediction and cognitive state estimation are reported to illustrate immediate usability. The resource targets downstream tasks such as personalized affect recognition and early mild cognitive impairment detection in everyday environments.

Core claim

MECO is a multimodal dataset that records approximately 38 hours of synchronized video, audio, electroencephalography and electrocardiography from 42 older adults in community settings, producing 30,592 labeled samples annotated with self-assessed emotional valence, arousal, six basic emotions and Mini-Mental State Examination scores, together with baseline results for emotion and cognitive prediction tasks.

What carries the argument

The MECO dataset of synchronized multimodal signals and annotations for affect and cognition collected from older adults under community protocols.

Load-bearing premise

Recordings from 42 participants under standardized community protocols represent ecologically valid emotional and cognitive expressions across the broader older adult population.

What would settle it

A replication study that collects comparable multimodal signals from a larger, demographically broader sample of older adults and shows that models trained on MECO lose substantial accuracy on the new data would falsify the dataset's claimed representativeness.

Figures

Figures reproduced from arXiv: 2604.03050 by Hongbin Chen, Jianqing Li, Jie Li, Siyang Song, Wei Wang, Wentao Xiang, Xiao Gu.

**Figure 1.** Figure 1: Overview of MECO dataset. (a) Emotion-inducing video stimuli for older adults. (b) Data acquisition protocol, encom [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Overview of the experimental protocol. Experimental Protocol As illustrated in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Distribution of MECO dataset. (a-c) Global-level sta [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

read the original abstract

While affective computing has advanced considerably, multimodal emotion prediction in aging populations remains underexplored, largely due to the scarcity of dedicated datasets. Existing multimodal benchmarks predominantly target young, cognitively healthy subjects, neglecting the influence of cognitive decline on emotional expression and physiological responses. To bridge this gap, we present MECO, a Multimodal dataset for Emotion and Cognitive understanding in Older adults. MECO includes 42 participants and provides approximately 38 hours of multimodal signals, yielding 30,592 synchronized samples. To maximize ecological validity, data collection followed standardized protocols within community-based settings. The modalities cover video, audio, electroencephalography (EEG), and electrocardiography (ECG). In addition, the dataset offers comprehensive annotations of emotional and cognitive states, including self-assessed valence, arousal, six basic emotions, and Mini-Mental State Examination cognitive scores. We further establish baseline benchmarks for both emotion and cognitive prediction. MECO serves as a foundational resource for multimodal modeling of affect and cognition in aging populations, facilitating downstream applications such as personalized emotion recognition and early detection of mild cognitive impairment (MCI) in real-world settings. The complete dataset and supplementary materials are available at https://maitrechen.github.io/meco-page/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MECO introduces a new multimodal dataset for older adults combining emotion and cognition, but lacks key sample details to support its MCI detection claims.

read the letter

The main thing your colleague should know is that this paper introduces MECO, a multimodal dataset for emotion and cognitive understanding in older adults. It includes data from 42 participants in community settings, covering about 38 hours of video, audio, EEG, and ECG signals, with annotations for emotional states and Mini-Mental State Examination scores. This is new because prior multimodal emotion datasets have mostly used young, healthy subjects, so having one tailored to aging populations with combined affect and cognition labels is a clear addition. The paper does a decent job on the collection side by using standardized protocols in real-world community environments rather than controlled labs. That choice supports better ecological validity. They also set up some baseline benchmarks for predicting emotions and cognitive states from the multimodal signals, and the data is made available publicly, which is helpful for the community. Where it falls short is in the supporting details for the stronger claims. The abstract and available info don't include the distribution of MMSE scores or how many participants might indicate mild cognitive impairment. Without that, it's tough to assess whether the dataset has enough variation to back up applications like early MCI detection. The sample size is modest at 42, and there's no information on annotation reliability, such as inter-rater agreement, or detailed error analysis from the baselines. These gaps make the downstream utility harder to evaluate right now. This work is aimed at researchers in affective computing, human-computer interaction, and those developing tools for elder care or cognitive health monitoring. A reader looking for new data resources in this underexplored area could find it valuable for initial experiments, though they'd likely need to supplement with their own analysis on the cognitive side. I think it deserves a serious referee because new datasets like this are rare and the topic is relevant given aging demographics, even if revisions are needed to strengthen the methods and results sections.

Referee Report

3 major / 3 minor

Summary. The paper introduces MECO, a multimodal dataset collected from 42 older adults in community-based settings, comprising ~38 hours of synchronized video, audio, EEG, and ECG recordings that produce 30,592 samples. It provides annotations for self-reported valence, arousal, six basic emotions, and MMSE cognitive scores, establishes baseline benchmarks for emotion and cognitive prediction, and positions the resource as enabling personalized emotion recognition and early MCI detection in aging populations.

Significance. If annotation reliability and cognitive variance are adequately demonstrated, MECO would address a clear gap in multimodal affective computing resources focused on older adults, supporting reproducible research on the intersection of cognitive status and emotional expression. The public release of the dataset and baselines contributes positively to the field by enabling downstream modeling work.

major comments (3)

[Abstract] Abstract: The claim that MECO facilitates early detection of mild cognitive impairment (MCI) is not supported by any reported statistics on MMSE score distribution, mean, range, standard deviation, or the count/proportion of participants scoring below 24; without these, it is impossible to verify sufficient variance in cognitive status for the stated downstream utility.
[Methods] Annotations (Methods section): No inter-rater agreement metrics (e.g., Fleiss' kappa or intraclass correlation) or reliability assessments are provided for the emotion annotations (valence, arousal, or basic emotions), which directly affects the interpretability of the baseline prediction results.
[Results] Baseline benchmarks (Results section): Cognitive prediction baselines are not stratified by MMSE status or compared to performance on younger-adult datasets, weakening the evaluation of the dataset's claimed advantage for MCI-related tasks.

minor comments (3)

[Introduction] Introduction: More explicit quantitative comparisons to existing multimodal emotion datasets (e.g., sample sizes, age ranges, and modality coverage) would strengthen the motivation section.
[Methods] Data collection: The description of synchronization across video, audio, EEG, and ECG modalities and any quality-control steps (e.g., artifact rejection) lacks sufficient technical detail for replication.
[Results] Figures: Axis labels and legends in the baseline performance plots are difficult to read at standard print size; consider increasing font size and adding error bars.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on the MECO dataset paper. We address each major comment point by point below, providing honest responses based on the manuscript content and planned revisions.

read point-by-point responses

Referee: [Abstract] The claim that MECO facilitates early detection of mild cognitive impairment (MCI) is not supported by any reported statistics on MMSE score distribution, mean, range, standard deviation, or the count/proportion of participants scoring below 24; without these, it is impossible to verify sufficient variance in cognitive status for the stated downstream utility.

Authors: We agree that the abstract claim would be strengthened by explicit MMSE statistics. In the revised manuscript, we will add the MMSE score distribution details, including mean, range, standard deviation, and the count/proportion of participants scoring below 24, to demonstrate cognitive variance in the sample. revision: yes
Referee: [Methods] No inter-rater agreement metrics (e.g., Fleiss' kappa or intraclass correlation) or reliability assessments are provided for the emotion annotations (valence, arousal, or basic emotions), which directly affects the interpretability of the baseline prediction results.

Authors: The valence, arousal, and six basic emotion labels are self-reported by participants rather than annotated by external raters. Inter-rater agreement metrics are therefore not applicable. We will revise the Methods section to explicitly state that these are self-assessments and discuss associated limitations in the context of cognitive status. revision: yes
Referee: [Results] Cognitive prediction baselines are not stratified by MMSE status or compared to performance on younger-adult datasets, weakening the evaluation of the dataset's claimed advantage for MCI-related tasks.

Authors: We report overall baselines as an initial benchmark for the dataset. Stratification by MMSE status risks underpowered subgroups given n=42; we will add a discussion of this limitation. Direct comparisons to younger-adult datasets are beyond the scope of this resource paper focused on older adults, though we can reference relevant literature for context. revision: partial

Circularity Check

0 steps flagged

No circularity: dataset paper with no derivation chain or fitted predictions

full rationale

The paper presents MECO as a new empirical multimodal dataset (42 participants, ~38 hours of video/audio/EEG/ECG signals, 30,592 samples, annotations including MMSE scores). It states that baseline benchmarks for emotion and cognitive prediction are established, but these are standard evaluations on the released data rather than any claimed derivation, prediction, or first-principles result that reduces to inputs by construction. No equations, self-definitional claims, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The contribution is data provision and basic benchmarking; the derivation chain is absent, rendering the work self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Minimal ledger as this is an empirical dataset contribution without theoretical derivations, new entities, or fitted parameters.

axioms (1)

domain assumption Data collection followed standardized protocols within community-based settings to maximize ecological validity.
Assumed to ensure real-world relevance but details on protocol adherence and potential artifacts not provided in abstract.

pith-pipeline@v0.9.0 · 5530 in / 1212 out tokens · 45157 ms · 2026-05-13T18:29:24.393538+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

[1]

Ingrid Arevalo-Rodriguez, Nadja Smailagic, Marta Roqué-Figuls, Agustín Ciap- poni, Erick Sanchez-Perez, Antri Giannakou, Olga L Pedraza, Xavier Bonfill Cosp, and Sarah Cullum. 2021. Mini-Mental State Examination (MMSE) for the early detection of dementia in people with mild cognitive impairment (MCI).Cochrane Database of Systematic Reviews2021, 7 (July 2021)

work page 2021
[2]

AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis- Philippe Morency. 2018. Multimodal Language Analysis in the Wild: CMU- MOSEI Dataset and Interpretable Dynamic Fusion Graph. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2236–2246

work page 2018
[3]

Tadas Baltrusaitis, Peter Robinson, and Louis-Philippe Morency. 2016. OpenFace: An open source facial behavior analysis toolkit. In2016 IEEE Winter Conference on Applications of Computer Vision (W ACV). 1–10

work page 2016
[4]

John R Beard, Alana Officer, Islene Araujo de Carvalho, Ritu Sadana, Anne Mar- griet Pot, Jean-Pierre Michel, Peter Lloyd-Sherlock, JoAnne E Epping-Jordan, G M E E (Geeske) Peeters, Wahyu Retno Mahanani, Jotheeswaran Amuthavalli Thiyagarajan, and Somnath Chatterji. 2016. The World report on ageing and health: a policy framework for healthy ageing.The Lanc...

work page 2016
[5]

Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is space-time attention all you need for video understanding?. InIcml, Vol. 2. 4

work page 2021
[6]

Chang, Sungbok Lee, and Shrikanth S

Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N. Chang, Sungbok Lee, and Shrikanth S. Narayanan. 2008. IEMOCAP: interactive emotional dyadic motion capture database.Language Resources and Evaluation42, 4 (Nov. 2008), 335–359

work page 2008
[7]

Yin Chen, Jia Li, Shiguang Shan, Meng Wang, and Richang Hong. 2025. From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expres- sion Recognition in Videos.IEEE Transactions on Affective Computing16, 2 (April 2025), 624–638

work page 2025
[8]

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1724–1734

work page 2014
[9]

Masters, Benjamin Goudey, Liang Jin, and Yijun Pan

Chenyin Chu, Yihan Wang, Paul Maruff, Colin L. Masters, Benjamin Goudey, Liang Jin, and Yijun Pan. 2024. Developing a machine learning stack model to forecast the progression of mild cognitive impairment to Alzheimer’s demen- tia, using the Australian Imaging, Biomarker & Lifestyle (AIBL) Study dataset. Alzheimer’s & Dementia20, S10 (Dec. 2024)

work page 2024
[10]

Kateryna Chumachenko, Alexandros Iosifidis, and Moncef Gabbouj. 2024. MMA- DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild. InCVPR Workshops. 4673–4682

work page 2024
[11]

Mini-mental state

Marshal F. Folstein, Susan E. Folstein, and Paul R. McHugh. 1975. “Mini-mental state”.Journal of Psychiatric Research12, 3 (Nov. 1975), 189–198

work page 1975
[12]

T. Higuchi. 1988. Approach to an irregular time series on the basis of the fractal theory.Physica D: Nonlinear Phenomena31, 2 (June 1988), 277–283

work page 1988
[13]

Yu-Liang Hsu, Jeen-Shing Wang, Wei-Chun Chiang, and Chien-Han Hung. 2020. Automatic ECG-Based Emotion Recognition in Music Listening.IEEE Transactions on Affective Computing11, 1 (Jan. 2020), 85–99

work page 2020
[14]

Smith, Yonas Geda, David Sultzer, Henry Brodaty, Gwenn Smith, Luis Agüera-Ortiz, Rob Sweet, David Miller, and Constantine G

Zahinoor Ismail, Eric E. Smith, Yonas Geda, David Sultzer, Henry Brodaty, Gwenn Smith, Luis Agüera-Ortiz, Rob Sweet, David Miller, and Constantine G. Lyketsos

work page
[15]

Neuropsychiatric symptoms as early manifestations of emergent dementia: Provisional diagnostic criteria for mild behavioral impairment.Alzheimer’s & Dementia12, 2 (June 2015), 195–202

work page 2015
[16]

Robert Jenke, Angelika Peer, and Martin Buss. 2014. Feature Extraction and Selection for Emotion Recognition from EEG.IEEE Transactions on Affective Computing5, 3 (July 2014), 327–339

work page 2014
[17]

Xingxun Jiang, Yuan Zong, Wenming Zheng, Chuangao Tang, Wanchuang Xia, Cheng Lu, and Jiateng Liu. 2020. DFEW: A Large-Scale Database for Recog- nizing Dynamic Facial Expressions in the Wild. InProceedings of the 28th ACM International Conference on Multimedia. 2881–2889

work page 2020
[18]

A. John, U. Patel, J. Rusted, M. Richards, and D. Gaysina. 2018. Affective problems and decline in cognitive state in older adults: a systematic review and meta- analysis.Psychological Medicine49, 3 (May 2018), 353–365

work page 2018
[19]

Stamos Katsigiannis and Naeem Ramzan. 2018. DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals From Wireless Low-cost Off-the-Shelf Devices.IEEE Journal of Biomedical and Health Informatics22, 1 (Jan. 2018), 98–107

work page 2018
[20]

Koelstra, C

S. Koelstra, C. Muhl, M. Soleymani, Jong-Seok Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras. 2012. DEAP: A Database for Emotion Analysis; Using Physiological Signals.IEEE Transactions on Affective Computing3, 1 (Jan. 2012), 18–31

work page 2012
[21]

Min-Ho Lee, Adai Shomanov, Balgyn Begim, Zhuldyz Kabidenova, Aruna Nyssan- bay, Adnan Yazici, and Seong-Whan Lee. 2024. EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts.Scientific Data11, 1 (2024)

work page 2024
[22]

Tao Liang, Junxiao Yu, Keke Shi, Yihao Yao, Jie Li, Bin Liu, Wei Wang, Chengyu Liu, Liangcheng Qu, Kuiying Yin, Wentao Xiang, and Jianqing Li. 2025. Con- struction and evaluation of an emotion-inducing video dataset towards Chinese elderly healthy controls and individuals with mild cognitive impairment.Cogni- tive Neurodynamics19, 1 (Sept. 2025)

work page 2025
[23]

Yuanyuan Liu, Lin Wei, Kejun Liu, Zijing Chen, Zhe Chen, Chang Tang, Jingying Chen, and Shiguang Shan. 2025. Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition.IEEE Transactions on Affective Computing16, 4 (Oct. 2025), 3404–3420

work page 2025
[24]

Ilya Loshchilov and Frank Hutter. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. InInternational Conference on Learning Representations

work page 2017
[25]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. InInternational Conference on Learning Representations

work page 2019
[26]

Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, and Brian MacWhinney. 2020. Alzheimer’s Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge. InInterspeech 2020. 2172–2176

work page 2020
[27]

Kaixin Ma, Xinyu Wang, Xinru Yang, Mingtong Zhang, Jeffrey M Girard, and Louis-Philippe Morency. 2019. ElderReact: A Multimodal Dataset for Recogniz- ing Emotional Response in Aging Adults. In2019 International Conference on Multimodal Interaction. 349–357

work page 2019
[28]

Mauss, Robert W

Iris B. Mauss, Robert W. Levenson, Loren McCarter, Frank H. Wilhelm, and James J. Gross. 2005. The Tie That Binds? Coherence Among Emotion Experience, Behavior, and Physiology.Emotion5, 2 (2005), 175–190

work page 2005
[29]

Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, and Ioannis Patras. 2021. AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups.IEEE Transactions on Affective Computing12, 2 (April 2021), 479–493

work page 2021
[30]

M. A. Nicolaou, H. Gunes, and M. Pantic. 2011. Continuous Prediction of Sponta- neous Affect from Multiple Cues and Modalities in Valence-Arousal Space.IEEE Transactions on Affective Computing2, 2 (April 2011), 92–105

work page 2011
[31]

Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, and Uichin Lee. 2020. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations.Scientific Data7, 1 (Sept. 2020)

work page 2020
[32]

Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. 2017. A review of affective computing: From unimodal analysis to multimodal fusion.Information Fusion37 (Sept. 2017), 98–125

work page 2017
[33]

Richman and J

Joshua S. Richman and J. Randall Moorman. 2000. Physiological time-series analysis using approximate entropy and sample entropy.American Journal of Physiology-Heart and Circulatory Physiology278, 6 (June 2000), H2039–H2049

work page 2000
[34]

Fabien Ringeval, Björn Schuller, Michel Valstar, Shashank Jaiswal, Erik Marchi, Denis Lalanne, Roddy Cowie, and Maja Pantic. 2015. AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data. InProceedings of the 5th International Workshop on Audio/Visual Emotion Challenge. 3–8

work page 2015
[35]

Robert, C.U

P. Robert, C.U. Onyike, A.F.G. Leentjens, K. Dujardin, P. Aalten, S. Starkstein, F.R.J. Verhey, J. Yessavage, J.P. Clement, D. Drapier, F. Bayle, M. Benoit, P. Boyer, P.M. Lorca, F. Thibaut, S. Gauthier, G. Grossberg, B. Vellas, and J. Byrne. 2009. Proposed diagnostic criteria for apathy in Alzheimer’s disease and other neuropsychiatric disorders.European...

work page 2009
[36]

Elena Ryumina, Denis Dresvyanskiy, and Alexey Karpov. 2022. In search of a robust facial expressions recognition model: A large-scale visual cross-corpus study.Neurocomputing514 (Dec. 2022), 435–450

work page 2022
[37]

Tulika Saha, Aditya Patra, Sriparna Saha, and Pushpak Bhattacharyya. 2020. Towards Emotion-aided Multi-modal Dialogue Act Classification. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics

work page 2020
[38]

Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul Baten, Wasifur Rahman, Md Kamrul Hasan, Kurtis Haut, Abdullah Al Mamun, Samiha Samrose, Anne Solbu, R

Taylan K. Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul Baten, Wasifur Rahman, Md Kamrul Hasan, Kurtis Haut, Abdullah Al Mamun, Samiha Samrose, Anne Solbu, R. Eric Barnes, Mark G. Frank, and Ehsan Hoque. 2023. DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expres- sion Features in Competitive Debate Speeches.IEEE Transa...

work page 2023
[39]

Li-Chen Shi, Ying-Ying Jiao, and Bao-Liang Lu. 2013. Differential entropy feature for EEG-based vigilance estimation. In2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 6627–6630

work page 2013
[40]

Shrout and Joseph L

Patrick E. Shrout and Joseph L. Fleiss. 1979. Intraclass correlations: Uses in assessing rater reliability.Psychological Bulletin86, 2 (1979), 420–428

work page 1979
[41]

Soleymani, J

M. Soleymani, J. Lichtenauer, T. Pun, and M. Pantic. 2012. A Multimodal Database for Affect Recognition and Implicit Tagging.IEEE Transactions on Affective Computing3, 1 (Jan. 2012), 42–55

work page 2012
[42]

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting.J. Mach. Learn. Res.15, 1 (Jan. 2014), 1929–1958

work page 2014
[43]

Vieriu, Stefan Winkler, and Nicu Sebe

Ramanathan Subramanian, Julia Wache, Mojtaba Khomami Abadi, Radu L. Vieriu, Stefan Winkler, and Nicu Sebe. 2018. ASCERTAIN: Emotion and Personality Recognition Using Commercial Sensors.IEEE Transactions on Affective Computing 9, 2 (April 2018), 147–160. Conference acronym ’XX, June 03–05, 2026, Woodstock, NY Chen et al

work page 2018
[44]

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri

work page
[45]

In 2015 IEEE International Conference on Computer Vision (ICCV)

Learning Spatiotemporal Features with 3D Convolutional Networks. In 2015 IEEE International Conference on Computer Vision (ICCV). 4489–4497

work page 2015
[46]

Michel Valstar, Jonathan Gratch, Björn Schuller, Fabien Ringeval, Denis Lalanne, Mercedes Torres Torres, Stefan Scherer, Giota Stratou, Roddy Cowie, and Maja Pantic. 2016. AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge. InProceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. 3–10

work page 2016
[47]

Weiner, Dallas P

Michael W. Weiner, Dallas P. Veitch, Paul S. Aisen, Laurel A. Beckett, Nigel J. Cairns, Robert C. Green, Danielle Harvey, Clifford R. Jack, William Jagust, Enchi Liu, John C. Morris, Ronald C. Petersen, Andrew J. Saykin, Mark E. Schmidt, Leslie Shaw, Judith A. Siuciak, Holly Soares, Arthur W. Toga, and John Q. Trojanowski

work page
[48]

The Alzheimer’s Disease Neuroimaging Initiative: A review of papers published since its inception.Alzheimer’s & Dementia8, 1S (Nov. 2011)

work page 2011
[49]

Qu Yang, Qinghongya Shi, Tongxin Wang, and Mang Ye. 2025. Uncertain Mul- timodal Intention and Emotion Understanding in the Wild. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 24700–24709

work page 2025
[50]

Shiqing Zhang, Yijiao Yang, Chen Chen, Xingnan Zhang, Qingming Leng, and Xiaoming Zhao. 2024. Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects.Expert Systems with Applications237 (2024), 121692

work page 2024
[51]

Zhicheng Zhang, Pancheng Zhao, Eunil Park, and Jufeng Yang. 2024. MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR). 12830–12840

work page 2024
[52]

Minghui Zhao, Hongxiang Gao, Xinru Qi, Haiyan Yin, Yulei Song, Yamei Bai, Jianqing Li, Lulu Zhao, and Chengyu Liu. 2025. Multi-Query Cross-Modal Atten- tion Fusion for Cognitive Impairment Recognition.IEEE Transactions on Neural Systems and Rehabilitation Engineering33 (2025), 2520–2530

work page 2025
[53]

Yuliang Zhao, Huawei Zhang, Jian Li, Siyang Song, Chao Lian, Yinghao Liu, Yulin Wang, and Changzeng Fu. 2025. Multimodal Depression Assessment Framework Integrating Personality and Gait for Older Adults With Medical Conditions.IEEE Transactions on Affective Computing16, 3 (July 2025), 2048–2061

work page 2025
[54]

Wei-Long Zheng, Wei Liu, Yifei Lu, Bao-Liang Lu, and Andrzej Cichocki. 2019. EmotionMeter: A Multimodal Framework for Recognizing Human Emotions. IEEE Transactions on Cybernetics49, 3 (March 2019), 1110–1122

work page 2019
[55]

Wei-Long Zheng and Bao-Liang Lu. 2015. Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE Transactions on Autonomous Mental Development7, 3 (Sept. 2015), 162–175. Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009

work page 2015

[1] [1]

Ingrid Arevalo-Rodriguez, Nadja Smailagic, Marta Roqué-Figuls, Agustín Ciap- poni, Erick Sanchez-Perez, Antri Giannakou, Olga L Pedraza, Xavier Bonfill Cosp, and Sarah Cullum. 2021. Mini-Mental State Examination (MMSE) for the early detection of dementia in people with mild cognitive impairment (MCI).Cochrane Database of Systematic Reviews2021, 7 (July 2021)

work page 2021

[2] [2]

AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis- Philippe Morency. 2018. Multimodal Language Analysis in the Wild: CMU- MOSEI Dataset and Interpretable Dynamic Fusion Graph. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2236–2246

work page 2018

[3] [3]

Tadas Baltrusaitis, Peter Robinson, and Louis-Philippe Morency. 2016. OpenFace: An open source facial behavior analysis toolkit. In2016 IEEE Winter Conference on Applications of Computer Vision (W ACV). 1–10

work page 2016

[4] [4]

John R Beard, Alana Officer, Islene Araujo de Carvalho, Ritu Sadana, Anne Mar- griet Pot, Jean-Pierre Michel, Peter Lloyd-Sherlock, JoAnne E Epping-Jordan, G M E E (Geeske) Peeters, Wahyu Retno Mahanani, Jotheeswaran Amuthavalli Thiyagarajan, and Somnath Chatterji. 2016. The World report on ageing and health: a policy framework for healthy ageing.The Lanc...

work page 2016

[5] [5]

Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. Is space-time attention all you need for video understanding?. InIcml, Vol. 2. 4

work page 2021

[6] [6]

Chang, Sungbok Lee, and Shrikanth S

Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N. Chang, Sungbok Lee, and Shrikanth S. Narayanan. 2008. IEMOCAP: interactive emotional dyadic motion capture database.Language Resources and Evaluation42, 4 (Nov. 2008), 335–359

work page 2008

[7] [7]

Yin Chen, Jia Li, Shiguang Shan, Meng Wang, and Richang Hong. 2025. From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expres- sion Recognition in Videos.IEEE Transactions on Affective Computing16, 2 (April 2025), 624–638

work page 2025

[8] [8]

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. InProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1724–1734

work page 2014

[9] [9]

Masters, Benjamin Goudey, Liang Jin, and Yijun Pan

Chenyin Chu, Yihan Wang, Paul Maruff, Colin L. Masters, Benjamin Goudey, Liang Jin, and Yijun Pan. 2024. Developing a machine learning stack model to forecast the progression of mild cognitive impairment to Alzheimer’s demen- tia, using the Australian Imaging, Biomarker & Lifestyle (AIBL) Study dataset. Alzheimer’s & Dementia20, S10 (Dec. 2024)

work page 2024

[10] [10]

Kateryna Chumachenko, Alexandros Iosifidis, and Moncef Gabbouj. 2024. MMA- DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild. InCVPR Workshops. 4673–4682

work page 2024

[11] [11]

Mini-mental state

Marshal F. Folstein, Susan E. Folstein, and Paul R. McHugh. 1975. “Mini-mental state”.Journal of Psychiatric Research12, 3 (Nov. 1975), 189–198

work page 1975

[12] [12]

T. Higuchi. 1988. Approach to an irregular time series on the basis of the fractal theory.Physica D: Nonlinear Phenomena31, 2 (June 1988), 277–283

work page 1988

[13] [13]

Yu-Liang Hsu, Jeen-Shing Wang, Wei-Chun Chiang, and Chien-Han Hung. 2020. Automatic ECG-Based Emotion Recognition in Music Listening.IEEE Transactions on Affective Computing11, 1 (Jan. 2020), 85–99

work page 2020

[14] [14]

Smith, Yonas Geda, David Sultzer, Henry Brodaty, Gwenn Smith, Luis Agüera-Ortiz, Rob Sweet, David Miller, and Constantine G

Zahinoor Ismail, Eric E. Smith, Yonas Geda, David Sultzer, Henry Brodaty, Gwenn Smith, Luis Agüera-Ortiz, Rob Sweet, David Miller, and Constantine G. Lyketsos

work page

[15] [15]

Neuropsychiatric symptoms as early manifestations of emergent dementia: Provisional diagnostic criteria for mild behavioral impairment.Alzheimer’s & Dementia12, 2 (June 2015), 195–202

work page 2015

[16] [16]

Robert Jenke, Angelika Peer, and Martin Buss. 2014. Feature Extraction and Selection for Emotion Recognition from EEG.IEEE Transactions on Affective Computing5, 3 (July 2014), 327–339

work page 2014

[17] [17]

Xingxun Jiang, Yuan Zong, Wenming Zheng, Chuangao Tang, Wanchuang Xia, Cheng Lu, and Jiateng Liu. 2020. DFEW: A Large-Scale Database for Recog- nizing Dynamic Facial Expressions in the Wild. InProceedings of the 28th ACM International Conference on Multimedia. 2881–2889

work page 2020

[18] [18]

A. John, U. Patel, J. Rusted, M. Richards, and D. Gaysina. 2018. Affective problems and decline in cognitive state in older adults: a systematic review and meta- analysis.Psychological Medicine49, 3 (May 2018), 353–365

work page 2018

[19] [19]

Stamos Katsigiannis and Naeem Ramzan. 2018. DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals From Wireless Low-cost Off-the-Shelf Devices.IEEE Journal of Biomedical and Health Informatics22, 1 (Jan. 2018), 98–107

work page 2018

[20] [20]

Koelstra, C

S. Koelstra, C. Muhl, M. Soleymani, Jong-Seok Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras. 2012. DEAP: A Database for Emotion Analysis; Using Physiological Signals.IEEE Transactions on Affective Computing3, 1 (Jan. 2012), 18–31

work page 2012

[21] [21]

Min-Ho Lee, Adai Shomanov, Balgyn Begim, Zhuldyz Kabidenova, Aruna Nyssan- bay, Adnan Yazici, and Seong-Whan Lee. 2024. EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts.Scientific Data11, 1 (2024)

work page 2024

[22] [22]

Tao Liang, Junxiao Yu, Keke Shi, Yihao Yao, Jie Li, Bin Liu, Wei Wang, Chengyu Liu, Liangcheng Qu, Kuiying Yin, Wentao Xiang, and Jianqing Li. 2025. Con- struction and evaluation of an emotion-inducing video dataset towards Chinese elderly healthy controls and individuals with mild cognitive impairment.Cogni- tive Neurodynamics19, 1 (Sept. 2025)

work page 2025

[23] [23]

Yuanyuan Liu, Lin Wei, Kejun Liu, Zijing Chen, Zhe Chen, Chang Tang, Jingying Chen, and Shiguang Shan. 2025. Leveraging Eye Movement for Instructing Robust Video-Based Facial Expression Recognition.IEEE Transactions on Affective Computing16, 4 (Oct. 2025), 3404–3420

work page 2025

[24] [24]

Ilya Loshchilov and Frank Hutter. 2017. SGDR: Stochastic Gradient Descent with Warm Restarts. InInternational Conference on Learning Representations

work page 2017

[25] [25]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. InInternational Conference on Learning Representations

work page 2019

[26] [26]

Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, and Brian MacWhinney. 2020. Alzheimer’s Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge. InInterspeech 2020. 2172–2176

work page 2020

[27] [27]

Kaixin Ma, Xinyu Wang, Xinru Yang, Mingtong Zhang, Jeffrey M Girard, and Louis-Philippe Morency. 2019. ElderReact: A Multimodal Dataset for Recogniz- ing Emotional Response in Aging Adults. In2019 International Conference on Multimodal Interaction. 349–357

work page 2019

[28] [28]

Mauss, Robert W

Iris B. Mauss, Robert W. Levenson, Loren McCarter, Frank H. Wilhelm, and James J. Gross. 2005. The Tie That Binds? Coherence Among Emotion Experience, Behavior, and Physiology.Emotion5, 2 (2005), 175–190

work page 2005

[29] [29]

Juan Abdon Miranda-Correa, Mojtaba Khomami Abadi, Nicu Sebe, and Ioannis Patras. 2021. AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups.IEEE Transactions on Affective Computing12, 2 (April 2021), 479–493

work page 2021

[30] [30]

M. A. Nicolaou, H. Gunes, and M. Pantic. 2011. Continuous Prediction of Sponta- neous Affect from Multiple Cues and Modalities in Valence-Arousal Space.IEEE Transactions on Affective Computing2, 2 (April 2011), 92–105

work page 2011

[31] [31]

Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim, Ahsan Habib Khandoker, Leontios Hadjileontiadis, Alice Oh, Yong Jeong, and Uichin Lee. 2020. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations.Scientific Data7, 1 (Sept. 2020)

work page 2020

[32] [32]

Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. 2017. A review of affective computing: From unimodal analysis to multimodal fusion.Information Fusion37 (Sept. 2017), 98–125

work page 2017

[33] [33]

Richman and J

Joshua S. Richman and J. Randall Moorman. 2000. Physiological time-series analysis using approximate entropy and sample entropy.American Journal of Physiology-Heart and Circulatory Physiology278, 6 (June 2000), H2039–H2049

work page 2000

[34] [34]

Fabien Ringeval, Björn Schuller, Michel Valstar, Shashank Jaiswal, Erik Marchi, Denis Lalanne, Roddy Cowie, and Maja Pantic. 2015. AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data. InProceedings of the 5th International Workshop on Audio/Visual Emotion Challenge. 3–8

work page 2015

[35] [35]

Robert, C.U

P. Robert, C.U. Onyike, A.F.G. Leentjens, K. Dujardin, P. Aalten, S. Starkstein, F.R.J. Verhey, J. Yessavage, J.P. Clement, D. Drapier, F. Bayle, M. Benoit, P. Boyer, P.M. Lorca, F. Thibaut, S. Gauthier, G. Grossberg, B. Vellas, and J. Byrne. 2009. Proposed diagnostic criteria for apathy in Alzheimer’s disease and other neuropsychiatric disorders.European...

work page 2009

[36] [36]

Elena Ryumina, Denis Dresvyanskiy, and Alexey Karpov. 2022. In search of a robust facial expressions recognition model: A large-scale visual cross-corpus study.Neurocomputing514 (Dec. 2022), 435–450

work page 2022

[37] [37]

Tulika Saha, Aditya Patra, Sriparna Saha, and Pushpak Bhattacharyya. 2020. Towards Emotion-aided Multi-modal Dialogue Act Classification. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics

work page 2020

[38] [38]

Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul Baten, Wasifur Rahman, Md Kamrul Hasan, Kurtis Haut, Abdullah Al Mamun, Samiha Samrose, Anne Solbu, R

Taylan K. Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul Baten, Wasifur Rahman, Md Kamrul Hasan, Kurtis Haut, Abdullah Al Mamun, Samiha Samrose, Anne Solbu, R. Eric Barnes, Mark G. Frank, and Ehsan Hoque. 2023. DBATES: Dataset for Discerning Benefits of Audio, Textual, and Facial Expres- sion Features in Competitive Debate Speeches.IEEE Transa...

work page 2023

[39] [39]

Li-Chen Shi, Ying-Ying Jiao, and Bao-Liang Lu. 2013. Differential entropy feature for EEG-based vigilance estimation. In2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 6627–6630

work page 2013

[40] [40]

Shrout and Joseph L

Patrick E. Shrout and Joseph L. Fleiss. 1979. Intraclass correlations: Uses in assessing rater reliability.Psychological Bulletin86, 2 (1979), 420–428

work page 1979

[41] [41]

Soleymani, J

M. Soleymani, J. Lichtenauer, T. Pun, and M. Pantic. 2012. A Multimodal Database for Affect Recognition and Implicit Tagging.IEEE Transactions on Affective Computing3, 1 (Jan. 2012), 42–55

work page 2012

[42] [42]

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting.J. Mach. Learn. Res.15, 1 (Jan. 2014), 1929–1958

work page 2014

[43] [43]

Vieriu, Stefan Winkler, and Nicu Sebe

Ramanathan Subramanian, Julia Wache, Mojtaba Khomami Abadi, Radu L. Vieriu, Stefan Winkler, and Nicu Sebe. 2018. ASCERTAIN: Emotion and Personality Recognition Using Commercial Sensors.IEEE Transactions on Affective Computing 9, 2 (April 2018), 147–160. Conference acronym ’XX, June 03–05, 2026, Woodstock, NY Chen et al

work page 2018

[44] [44]

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri

work page

[45] [45]

In 2015 IEEE International Conference on Computer Vision (ICCV)

Learning Spatiotemporal Features with 3D Convolutional Networks. In 2015 IEEE International Conference on Computer Vision (ICCV). 4489–4497

work page 2015

[46] [46]

Michel Valstar, Jonathan Gratch, Björn Schuller, Fabien Ringeval, Denis Lalanne, Mercedes Torres Torres, Stefan Scherer, Giota Stratou, Roddy Cowie, and Maja Pantic. 2016. AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge. InProceedings of the 6th International Workshop on Audio/Visual Emotion Challenge. 3–10

work page 2016

[47] [47]

Weiner, Dallas P

Michael W. Weiner, Dallas P. Veitch, Paul S. Aisen, Laurel A. Beckett, Nigel J. Cairns, Robert C. Green, Danielle Harvey, Clifford R. Jack, William Jagust, Enchi Liu, John C. Morris, Ronald C. Petersen, Andrew J. Saykin, Mark E. Schmidt, Leslie Shaw, Judith A. Siuciak, Holly Soares, Arthur W. Toga, and John Q. Trojanowski

work page

[48] [48]

The Alzheimer’s Disease Neuroimaging Initiative: A review of papers published since its inception.Alzheimer’s & Dementia8, 1S (Nov. 2011)

work page 2011

[49] [49]

Qu Yang, Qinghongya Shi, Tongxin Wang, and Mang Ye. 2025. Uncertain Mul- timodal Intention and Emotion Understanding in the Wild. In2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 24700–24709

work page 2025

[50] [50]

Shiqing Zhang, Yijiao Yang, Chen Chen, Xingnan Zhang, Qingming Leng, and Xiaoming Zhao. 2024. Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects.Expert Systems with Applications237 (2024), 121692

work page 2024

[51] [51]

Zhicheng Zhang, Pancheng Zhao, Eunil Park, and Jufeng Yang. 2024. MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR). 12830–12840

work page 2024

[52] [52]

Minghui Zhao, Hongxiang Gao, Xinru Qi, Haiyan Yin, Yulei Song, Yamei Bai, Jianqing Li, Lulu Zhao, and Chengyu Liu. 2025. Multi-Query Cross-Modal Atten- tion Fusion for Cognitive Impairment Recognition.IEEE Transactions on Neural Systems and Rehabilitation Engineering33 (2025), 2520–2530

work page 2025

[53] [53]

Yuliang Zhao, Huawei Zhang, Jian Li, Siyang Song, Chao Lian, Yinghao Liu, Yulin Wang, and Changzeng Fu. 2025. Multimodal Depression Assessment Framework Integrating Personality and Gait for Older Adults With Medical Conditions.IEEE Transactions on Affective Computing16, 3 (July 2025), 2048–2061

work page 2025

[54] [54]

Wei-Long Zheng, Wei Liu, Yifei Lu, Bao-Liang Lu, and Andrzej Cichocki. 2019. EmotionMeter: A Multimodal Framework for Recognizing Human Emotions. IEEE Transactions on Cybernetics49, 3 (March 2019), 1110–1122

work page 2019

[55] [55]

Wei-Long Zheng and Bao-Liang Lu. 2015. Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE Transactions on Autonomous Mental Development7, 3 (Sept. 2015), 162–175. Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009

work page 2015