
arxiv: 2606.00815 · v1 · pith:IA7IESKBnew · submitted 2026-05-30 · 💻 cs.LG

OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

Pith reviewed 2026-06-28 19:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords EEG foundation models · benchmark · scaling laws · pretraining diversity · model size · brain-computer interface · task families · evaluation protocols

The pith

EEG foundation models perform better when both their size and pretraining data diversity increase, as shown on a new benchmark covering 54 datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates OmniEEG-Bench to bring consistent evaluation to EEG foundation models by grouping tasks into six families and aligning 54 datasets under one set of protocols. Benchmarking ten existing models reveals that larger models trained on more varied pretraining data achieve higher average ranks across those datasets. A reader would care because this pattern points to scaling behavior that could guide how future brain-signal models are built for applications like monitoring or interaction. The work replaces scattered task setups with a shared task-card system that makes direct comparisons possible.

Core claim

OmniEEG-Bench standardizes deployment, task definitions, and metrics for EEG foundation models across six task families and 54 unified datasets; when ten representative models are evaluated, both pretraining dataset diversity and model size show significant positive association with better average ranks, indicating scaling-law behavior.
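To make the "average rank" quantity concrete, here is a minimal sketch, assuming every model has one scalar score per dataset and that higher scores are better. The summary does not state the paper's exact ranking or tie-handling rules, so the helper names `rank_descending` and `average_ranks`, the tie rule (tied models share the mean of their positions), and all scores below are illustrative assumptions.

```python
from statistics import mean


def rank_descending(scores):
    """Return 1-based ranks where the highest score gets rank 1.

    Tied scores share the average of the positions they occupy
    (an assumed convention, not one confirmed by the paper).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(score_table):
    """score_table: {dataset: {model: score}}. Returns {model: mean rank}."""
    models = sorted(next(iter(score_table.values())))
    per_model = {m: [] for m in models}
    for dataset_scores in score_table.values():
        ranks = rank_descending([dataset_scores[m] for m in models])
        for m, r in zip(models, ranks):
            per_model[m].append(r)
    return {m: mean(rs) for m, rs in per_model.items()}


# Made-up scores for three hypothetical models on two hypothetical datasets.
scores = {
    "dataset_a": {"model_x": 0.80, "model_y": 0.70, "model_z": 0.60},
    "dataset_b": {"model_x": 0.55, "model_y": 0.65, "model_z": 0.65},
}
avg = average_ranks(scores)
print(avg)  # {'model_x': 2.0, 'model_y': 1.75, 'model_z': 2.25}
```

A lower average rank means better overall standing, which is the direction used in the scaling claim.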

What carries the argument

OmniEEG-Bench task-card specification that enforces consistent model deployment, task definitions, and metrics across the six task families and 54 datasets.
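As a rough illustration of what a task card might hold, the sketch below models one as a small immutable record. The real OmniEEG-Bench schema is not shown in this summary, so the class name `TaskCard`, every field name, the metric string, and the protocol string are hypothetical. Only the six family names and the TUAB row (2 classes, 10 s windows, listed under biometrics and disease) come from the text.

```python
from dataclasses import dataclass

# The six task families named in the abstract.
TASK_FAMILIES = (
    "signal reliability",
    "biometrics and disease",
    "consciousness and state",
    "cognition and emotion",
    "naturalistic stimulus decoding",
    "motor and interaction",
)


@dataclass(frozen=True)
class TaskCard:
    """Hypothetical task card; field names are assumptions, not the paper's schema."""

    name: str
    family: str
    n_classes: int
    window_seconds: float
    metric: str    # assumed field: which score is reported
    protocol: str  # assumed field: which evaluation protocol applies

    def __post_init__(self):
        # Reject cards that fall outside the six families or are degenerate.
        if self.family not in TASK_FAMILIES:
            raise ValueError(f"unknown task family: {self.family!r}")
        if self.n_classes < 2:
            raise ValueError("a classification task needs at least 2 classes")
        if self.window_seconds <= 0:
            raise ValueError("window length must be positive")


# Example built from the TUAB row of the dataset inventory; the metric and
# protocol values are placeholders chosen for illustration.
card = TaskCard(
    name="TUAB",
    family="biometrics and disease",
    n_classes=2,
    window_seconds=10.0,
    metric="balanced_accuracy",
    protocol="cross-subject",
)
```

Validating the card at construction time is one way to enforce that every task uses a shared definition before any model is evaluated.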

If this is right

  • Scaling EEG foundation models requires both larger architectures and broader, more diverse pretraining data rather than size alone.
  • New EEG foundation models can be compared directly using the shared task-card protocols instead of custom setups.
  • Performance trends observed on the six task families can be used to forecast results on additional datasets that follow the same families.
  • Development efforts should expand pretraining collections to include more varied recording conditions and subject populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the scaling pattern holds, future work could test whether adding specific types of diversity, such as cross-device recordings, produces larger gains than simply increasing total hours of data.
  • The benchmark structure could be extended to measure transfer between task families, for example checking whether gains on motor tasks predict gains on emotion tasks.
  • Researchers might examine whether the same diversity-size relationship appears when the benchmark is applied to models trained from scratch rather than fine-tuned from existing checkpoints.

Load-bearing premise

The chosen 54 datasets and six task families capture typical real-world EEG capabilities without major selection effects that would create false associations between diversity, size, and performance.
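One simple way to probe this premise is a leave-one-family-out check: recompute each model's average rank with one task family removed and see whether the ordering survives. The sketch below assumes per-dataset ranks are already available. The function name `leave_one_family_out`, the model labels, and all numbers are made up for illustration, and this is not an analysis the paper is known to report.

```python
from statistics import mean


def leave_one_family_out(rank_table, family_of):
    """Recompute mean ranks with each task family held out in turn.

    rank_table: {dataset: {model: rank}}
    family_of:  {dataset: family name}
    Returns {held_out_family: {model: mean rank over the remaining datasets}}.
    Assumes at least two families, so something remains after each hold-out.
    """
    result = {}
    for held_out in sorted(set(family_of.values())):
        kept = [d for d in rank_table if family_of[d] != held_out]
        models = sorted(rank_table[kept[0]])
        result[held_out] = {
            m: mean(rank_table[d][m] for d in kept) for m in models
        }
    return result


# Made-up ranks for two hypothetical models on three hypothetical datasets.
rank_table = {
    "d1": {"small": 2, "large": 1},
    "d2": {"small": 2, "large": 1},
    "d3": {"small": 1, "large": 2},
}
family_of = {
    "d1": "motor and interaction",
    "d2": "cognition and emotion",
    "d3": "signal reliability",
}
result = leave_one_family_out(rank_table, family_of)
```

In this toy case the larger model leads only while both "d1" and "d2" are kept: holding out either of their families leaves the two models level at 1.5. A lead that vanishes under a single hold-out is the kind of selection sensitivity the premise rules out.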

What would settle it

Re-running the ten models on a fresh collection of datasets that deliberately balances or reduces diversity while keeping the same task families and finding no remaining link between diversity or size and rank would falsify the reported association.

Figures

Figures reproduced from arXiv: 2606.00815 by Chen Wei, Chenyu Huang, Jiahao Fan, Kexin Lou, Quanying Liu, Shinan Wang, Xiang Chen, XiaoQi Chen, Xinke Shen, Xin Xu, Yingyue Xin, Zhoujie Hou, Ziling Lu, Zongsheng Li.

Figure 1: Scaling law of pretraining data diversity (a) and model size (b) for linear-probing generalization of EEG foundation models. Tests on OmniEEG-Bench with 58 datasets show that EEG foundation models pretrained on a greater number of datasets and models with a larger number of parameters tend to achieve lower average ranks (i.e., better performance).
Figure 2: Task taxonomy of OmniEEG-Bench. We organize 58 tasks (from 54 datasets) into 6 …
Figure 3: OmniEEG-Bench evaluation pipeline, equipped with four evaluation protocols: cross …
Figure 4: Primary cross-subject-prioritized leaderboard of ten EEG foundation models.
Figure 5: The performance of zero-shot and few-shot learning.
Figure 6: The robustness of model performance on channel masking.
Figure 7: Pretraining design factors and cross-subject linear-probing performance of EEG foundation …
Original abstract

Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets and inconsistent task protocols. Here, we introduce OmniEEG-Bench, a unified benchmark and downstream task roadmap for EEG foundation models (FMs). It organizes evaluation of EEG FMs into six task families spanning (i) signal reliability, (ii) biometrics and disease, (iii) consciousness and state, (iv) cognition and emotion, (v) naturalistic stimulus decoding, and (vi) motor and interaction, introducing a new generation of tasks not systematically benchmarked in prior EEG FM work. OmniEEG-Bench standardizes model deployment, task definitions, and metrics through a task-card specification, and unifies 54 EEG datasets with consistent evaluation protocols. We benchmark 10 representative EEG foundation models and report a leaderboard that covers diverse evaluation settings. Both pretraining dataset diversity and model size are significantly associated with better average ranks across datasets, revealing scaling-law behavior in EEG foundation models (Figure 1). These results suggest that scaling EEG foundation models requires not only larger architectures but also broader and more diverse pretraining data. The benchmark code is available at https://github.com/ncclab-sustech/omni-eegbench.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces OmniEEG-Bench, a standardized benchmark organizing 54 EEG datasets into six task families (signal reliability, biometrics/disease, consciousness/state, cognition/emotion, naturalistic stimulus decoding, motor/interaction) with unified task-card specifications, metrics, and protocols. It benchmarks 10 EEG foundation models, produces a leaderboard, and reports that pretraining dataset diversity and model size are significantly associated with better average ranks across datasets, indicating scaling-law behavior (Figure 1).

Significance. A well-executed unified benchmark with new task families could reduce fragmentation in EEG FM evaluation and enable more comparable progress; the scaling observation, if supported by transparent statistics, would provide actionable guidance on data diversity versus architecture size. The open code release is a positive factor for reproducibility.

major comments (1)
  1. [Abstract] Abstract (and associated Figure 1 claim): the assertion that 'pretraining dataset diversity and model size are significantly associated with better average ranks' is presented without any description of the statistical test, p-values, confidence intervals, error bars, controls for confounding variables (e.g., model architecture family), or dataset exclusion criteria, rendering the central scaling-law claim unverifiable from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The single major comment identifies a clear gap in statistical transparency for the scaling-law claim. We address it directly below and will incorporate the requested details in the revision.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and associated Figure 1 claim): the assertion that 'pretraining dataset diversity and model size are significantly associated with better average ranks' is presented without any description of the statistical test, p-values, confidence intervals, error bars, controls for confounding variables (e.g., model architecture family), or dataset exclusion criteria, rendering the central scaling-law claim unverifiable from the provided text.

    Authors: We agree that the abstract and the text/figure associated with this claim do not currently provide the requested statistical details. In the revised manuscript we will (1) specify the exact statistical test(s) employed (e.g., Spearman rank correlation or linear regression on average rank), (2) report the corresponding p-values and confidence intervals, (3) add error bars or uncertainty measures to the relevant panels of Figure 1 where appropriate, (4) describe any controls or subgroup analyses performed for confounding factors such as model architecture family, and (5) explicitly state dataset exclusion criteria (or confirm that none were applied beyond the stated unification protocol). These additions will be placed in both the abstract (concisely) and the main results or methods section so the claim becomes verifiable.

    Revision: yes
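For readers who want to see what one of the tests named in the rebuttal could look like, the sketch below computes a Spearman rank correlation between pretraining-dataset count and average rank, with a Monte Carlo permutation p-value. It is a standard-library illustration only: the ten data points are made up, the helper names are hypothetical, and nothing here reflects the test the authors actually ran.

```python
import math
import random


def ranks(values):
    """1-based ranks (smallest value gets rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the Spearman correlation."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Made-up values for ten hypothetical models (not the paper's data).
n_pretraining_datasets = [1, 2, 4, 6, 9, 12, 15, 20, 25, 40]
average_rank = [8.1, 7.4, 7.9, 6.0, 5.2, 5.6, 4.1, 3.3, 3.9, 2.0]

rho = spearman(n_pretraining_datasets, average_rank)
p = permutation_p_value(n_pretraining_datasets, average_rank)
print(f"rho = {rho:.3f}, permutation p = {p:.4f}")
```

A negative rho here corresponds to the claimed pattern, because a lower average rank is better. With only ten models the uncertainty is large, and a rank correlation alone does not address confounders such as architecture family, which is why the referee's request for controls matters.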

Circularity Check

0 steps flagged

No significant circularity

Full rationale

This is an empirical benchmarking paper that unifies 54 external EEG datasets under standardized protocols and reports observed correlations between model size, pretraining diversity, and average ranks across those datasets. The scaling-law claim is a post-hoc statistical association from direct model evaluations rather than any derivation, equation, or self-citation that reduces the result to the paper's own inputs by construction. No load-bearing step invokes self-referential predictions, fitted parameters renamed as forecasts, or uniqueness theorems from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work is an empirical aggregation and comparison of existing datasets and models.



Reference graph

Works this paper leans on

124 extracted references · 8 canonical work pages

  1. [1] Gayal Kuruppu, Neeraj Wagh, Vaclav Kremen, Sandipan Pati, Gregory Worrell, and Yogatheesan Varatharajah. EEG foundation models: A critical review of current progress and future directions. arXiv preprint arXiv:2507.11783, 2025
  2. [2] Xinliang Zhou, Chenyu Liu, Zhisheng Chen, Kun Wang, Yi Ding, Ziyu Jia, and Qingsong Wen. Brain foundation models: A survey on advancements in neural signal processing and brain discovery. arXiv preprint arXiv:2503.00580, 2025
  3. [3] Jiamin Wu, Zichen Ren, Junyu Wang, Pengyu Zhu, Yonghao Song, Mianxin Liu, Qihao Zheng, Lei Bai, Wanli Ouyang, and Chunfeng Song. Adabrain-bench: Benchmarking brain foundation models for brain-computer interface applications. arXiv preprint arXiv:2507.09882, 2025
  4. [4] Chaoqi Yang, M Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. Advances in Neural Information Processing Systems, 36:78240–78260, 2023
  5. [5] Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous EEG data in BCI. In The Twelfth International Conference on Learning Representations, 2024
  6. [6] Qinfan Xiao, Ziyun Cui, Chi Zhang, Siqi Chen, Wen Wu, Andrew Thwaites, Alexandra Woolgar, Bowen Zhou, and Chao Zhang. Brainomni: A brain foundation model for unified eeg and meg signals. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
  7. [7] Hao Zhang, Qing-Qi Zhou, He Chen, Xiao-Qing Hu, Wei-Guang Li, Yang Bai, Jun-Xia Han, Yao Wang, Zhen-Hu Liang, Dan Chen, et al. The applied principles of eeg analysis methods in neuroscience and clinical neurology. Military Medical Research, 10(1):67, 2023
  8. [8] Huy Phan and Kaare Mikkelsen. Automatic sleep staging of eeg signals: recent development, challenges, and future directions. Physiological Measurement, 43(4):04TR01, 2022
  9. [9] Kaido Värbu, Naveed Muhammad, and Yar Muhammad. Past, present, and future of eeg-based bci applications. Sensors, 22(9):3331, 2022
  10. [10] Tijl Grootswagers, Ivy Zhou, Amanda K Robinson, Martin N Hebart, and Thomas A Carlson. Human eeg recordings for 1,854 concepts presented in rapid serial visual presentation streams. Scientific Data, 9(1):3, 2022
  11. [11] Michael P Broderick, Andrew J Anderson, Giovanni M Di Liberto, Michael J Crosse, and Edmund C Lalor. Electrophysiological correlates of semantic dissimilarity reflect the comprehension of natural, narrative speech. Current Biology, 28(5):803–809, 2018
  12. [12] Sitong Chen, Beiqianyi Li, Cuilin He, Dongyang Li, Mingyang Wu, Xinke Shen, Song Wang, Xuetao Wei, Xindi Wang, Haiyan Wu, et al. An eeg dataset for multimodal semantic alignment and neural decoding during reading and listening. Scientific Data, 2025
  13. [13] Saurabh Sonkusare, Michael Breakspear, and Christine Guo. Naturalistic stimuli in neuroscience: critically acclaimed. Trends in cognitive sciences, 23(8):699–714, 2019
  14. [14] Tim Martin, Erica Holliday, Cyril Okhio, Alexis Newman, Lamar LaTella, Makayla Mcginnis, Bruno Giordani, Voyko Kavcic, et al. States, traits, and the resting state eeg task aftereffect. International Journal of Psychophysiology, 210:112523, 2025
  15. [15] Xiaorong Gao, Yijun Wang, Xiaogang Chen, and Shangkai Gao. Interface, interaction, and intelligence in generalized brain–computer interfaces. Trends in cognitive sciences, 25(8):671–684, 2021
  16. [16] Haoming Zhang, Mingqi Zhao, Chen Wei, Dante Mantini, Zherui Li, and Quanying Liu. Eegdenoisenet: a benchmark dataset for deep learning solutions of eeg denoising. Journal of Neural Engineering, 18(5):056057, 2021
  17. [17] Shantanu Sarkar, Kevin Nathan, and Jose L. Contreras-Vidal. "dataset: Eeg-controlled exoskeleton for walking and standing - a longitudinal study of healthy individuals", 2025
  18. [18] Anahit Babayan, Miray Erbey, Deniz Kumral, Janis D Reinelt, Andrea MF Reiter, Josefin Röbbig, H Lina Schaare, Marie Uhlig, Alfred Anwander, Pierre-Louis Bazin, et al. A mind-brain-body dataset of mri, eeg, cognition, emotion, and peripheral physiology in young and old adults. Scientific data, 6(1):1–21, 2019
  19. [19] Dorottya Cserpan, Ece Boran, Richard Rosch, San Pietro Lo Biundo, Georgia Ramantani, and Johannes Sarnthein. "dataset of eeg recordings of pediatric patients with epilepsy based on the 10-20 system", 2021
  20. [20] Iyad Obeid and Joseph Picone. The temple university hospital eeg data corpus. Frontiers in Neuroscience, Volume 10 - 2016, 2016
  21. [21] Amir Harati, Meysam Golmohammadi, Silvia Lopez, Iyad Obeid, and Joseph Picone. Improved eeg event classification using differential energy. In IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2015
  22. [22] Neural Engineering Data Consortium. Temple university eeg corpus - downloads, 2026
  23. [23] Paolo Detti. Siena scalp eeg database. PhysioNet, 2020
  24. [24] Ghasem Sadeghi Bajestani, Shima Abedian, Fatemeh Makhloughi, Motahhareh Raoufitabar, and Hamid Saeedi. A dataset of eeg signals from adults with adhd and healthy controls: Resting state, cognitive function, and sound listening paradigm. Mendeley Data, 2023
  25. [25] Andreas Miltiadous, Katerina D Tzimourta, Theodora Afrantou, Panagiotis Ioannidis, Nikolaos Grigoriadis, Dimitrios G Tsalikakis, Pantelis Angelidis, Markos G Tsipouras, Euripidis Glavas, Nikolaos Giannakeas, et al. A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg. Data, 8(6):95, 2023
  26. [26] Alexander P. Rockhill, Nicko Jackson, Jobi George, Adam Aron, and Nicole C. Swann. "uc san diego resting state eeg data from patients with parkinson’s disease", 2021
  27. [27] Simin Jamshidi, Arturo Espinoza, Soura Dasgupta, and Nandakumar Narayanan. "eeg mortality dataset in parkinson’s disease", 2025
  28. [28] Wajid Mumtaz. MDD Patients and Healthy Controls EEG Data (New). 11 2016
  29. [29] Hanneke Van Dijk, Guido Van Wingen, Damiaan Denys, Sebastian Olbrich, Rosalinde Van Ruth, and Martijn Arns. The two decades brainclinics research archive for insights in neurophysiology (tdbrain) database. Scientific data, 9(1):333, 2022
  30. [30] James F Cavanagh jcavanagh@unm.edu. "eeg: Depression rest", 2021
  31. [31] Hanshu Cai, Yiwen Gao, Shuting Sun, et al. Modma dataset: a multi-modal open dataset for mental-disorder analysis. CoRR, abs/2002.09283, 2020
  32. [32] Imad J Bajwa, Andre S Nilsen, René Skukies, Arnfinn Aamodt, Gernot Ernst, Johan F Storm, and Bjørn E Juel. A repeated awakening study exploring the capacity of complexity measures to capture dreaming during propofol sedation. Scientific Reports, 15(1):32746, 2025
  33. [33] Sirvan Khalighi, Teresa Sousa, José Moutinho Santos, and Urbano Nunes. Isruc-sleep: A comprehensive public dataset for sleep researchers. Computer methods and programs in biomedicine, 124:180–192, 2016
  34. [34] Bob Kemp, Aeilko H Zwinderman, Bert Tuk, Hilbert AC Kamphuisen, and Josefien JL Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the eeg. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000
  35. [35] Diego Alvarez-Estevez and Roselyne Rijsman. Haaglanden medisch centrum sleep staging database. PhysioNet, 2022
  36. [36] Seyed Yahya Shirazi, Alexandre Franco, Maurício Scopel Hoffmann, Nathalia B Esper, Dung Truong, Arnaud Delorme, Michael P Milham, and Scott Makeig. Hbn-eeg: The fair implementation of the healthy brain network (hbn) electroencephalography dataset. bioRxiv, pages 2024–10, 2024
  37. [37] Lindsay M Alexander, Jasmine Escalera, Lei Ai, Charissa Andreotti, Karina Febre, Alexander Mangone, Natan Vega-Potler, Nicolas Langer, Alexis Alexander, Meagan Kovacs, et al. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Scientific data, 4(1):1–26, 2017
  38. [38] Patrycja Dzianok and Ewa Kublik. Pearl-neuro database: Eeg, fmri, health and lifestyle data of middle-aged people at risk of dementia. Scientific Data, 11(1):276, 2024
  39. [39] Ian Daly, Nicoletta Nicolaou, Duncan Williams, Faustina Hwang, Alexis Kirke, Eduardo Miranda, and Slawomir J. Nasuto. "an eeg dataset recorded during affective music listening", 2024
  40. [40] Wei-Long Zheng and Bao-Liang Lu. A multimodal approach to estimating vigilance using eeg and forehead eog. Journal of neural engineering, 14(2):026017, 2017
  41. [41] Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. Deap: A database for emotion analysis; using physiological signals. IEEE transactions on affective computing, 3(1):18–31, 2011
  42. [42] Jingjing Chen, Xiaobin Wang, Chen Huang, Xin Hu, Xinke Shen, and Dan Zhang. A large finer-grained affective computing eeg dataset. Scientific Data, 10(1):740, 2023
  43. [43] Ruo-Nan Duan, Jia-Yi Zhu, and Bao-Liang Lu. Differential entropy feature for eeg-based emotion classification. In 2013 6th international IEEE/EMBS conference on neural engineering (NER), pages 81–84. IEEE, 2013
  44. [44] Wei-Long Zheng, Wei Liu, Yifei Lu, Bao-Liang Lu, and Andrzej Cichocki. Emotionmeter: A multimodal framework for recognizing human emotions. IEEE transactions on cybernetics, 49(3):1110–1122, 2018
  45. [45] Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Transactions on Cognitive and Developmental Systems, 14(2):715–729, 2021
  46. [46] Wei-Bang Jiang, Xuan-Hao Liu, Wei-Long Zheng, and Bao-Liang Lu. Seed-vii: A multimodal dataset of six basic emotions with continuous labels for emotion recognition. IEEE Transactions on Affective Computing, 2024
  47. [47] Syed Anas Imtiaz and Esther Rodriguez-Villegas. An open-source toolbox for standardized use of physionet sleep edf expanded database. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 6014–6017. IEEE, 2015
  48. [48] Shaorun Zhang, Zhiyu He, Ziyi Ye, Peijie Sun, Qingyao Ai, Min Zhang, and Yiqun Liu. Eeg-svrec: An eeg dataset with user multidimensional affective engagement labels in short video recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 698–708, 2024
  49. [49] Shengrui He, Zhongjie Li, Jianwu Dang, Yingyi Luo, and Gaoyan Zhang. Cire: A chinese eeg dataset for decoding speech intention modulated by prosodic emotion. Scientific Data, 12(1):1664, 2025
  50. [50] Xin Xu, Xinke Shen, Xuyang Chen, Qingzhu Zhang, Sitian Wang, Yihan Li, Zongsheng Li, Dan Zhang, Mingming Zhang, and Quanying Liu. A multi-context emotional eeg dataset for cross-context emotion decoding. Scientific Data, 12(1):1142, 2025
  51. [51] Igor Zyma, Sergey Tukaev, Ivan Seleznov, Ken Kiyono, Anton Popov, Mariia Chernykh, and Olexii Shpenkov. Electroencephalograms during mental arithmetic task performance. Data, 4(1):14, 2019
  52. [52] Marcel F. Hinss, Emilie S. Jahanpour, Bertille Somon, et al. Open multi-session and multi-task eeg cognitive dataset for passive brain-computer interface applications. Scientific Data, 10:85, 2023
  53. [53] Seong-Whan Lee, Klaus-Robert Müller, and José del R. Millán. 2020 bci competition, track 3, 2020
  54. [54] Alessandro T Gifford, Kshitij Dwivedi, Gemma Roig, and Radoslaw M Cichy. A large and rich eeg dataset for modeling human visual object recognition. NeuroImage, 264:119754, 2022
  55. [55] Benjamin Blankertz, Guido Dornhege, Matthias Krauledat, Klaus-Robert Müller, and Gabriel Curio. The non-invasive berlin brain–computer interface: fast acquisition of effective performance in untrained subjects. NeuroImage, 37(2):539–550, 2007
  56. [56] Clemens Brunner, Robert Leeb, Gernot Müller-Putz, Alois Schlögl, and Gert Pfurtscheller. Bci competition 2008–graz data set a. Institute for knowledge discovery (laboratory of brain-computer interfaces), Graz University of Technology, 16(1-6):34, 2008
  57. [57] Gerwin Schalk, Dennis J McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R Wolpaw. Bci2000: a general-purpose brain-computer interface (bci) system. IEEE Transactions on biomedical engineering, 51(6):1034–1043, 2004
  58. [58] Jun Ma, Banghua Yang, Wenzheng Qiu, Yunzhe Li, Shouwei Gao, and XinXing Xia. SHU Multi-session Dataset. 8 2022
  59. [59] Bingchuan Liu, Xiaoshan Huang, Yijun Wang, Xiaogang Chen, and Xiaorong Gao. Beta: A large benchmark database toward ssvep-bci application. Frontiers in neuroscience, 14:627, 2020
  60. [60] Yijun Wang, Xiaogang Chen, Xiaorong Gao, and Shangkai Gao. A benchmark dataset for ssvep-based brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10):1746–1752, 2017
  61. [61] Yike Sun, Yuhan Li, Yuzhen Chen, et al. Efficient dual-frequency ssvep brain-computer interface system exploiting interocular visual resource disparities. Expert Systems with Applications, 252:124144, 2024
  62. [62] OpenNeuro. A multimodal neuroimaging dataset to study spatiotemporal dynamics of brain activity and gait during real-world walking with and without a lower-limb exoskeleton, 2025
  63. [63] Ricardo Chavarriaga and José del R Millán. Monitoring error–related potentials
  64. [64] Nadide Gulsah Gulenc and Mahmut Ozturk. Diagnosis of major depressive disorder using eeg signals. In 2024 Innovations in Intelligent Systems and Applications Conference (ASYU), pages 1–6. IEEE, 2024
  65. [65] Anna Tegon, Thorir Mar Ingolfsson, Xiaying Wang, Luca Benini, and Yawei Li. Femba: Efficient and scalable eeg analysis with a bidirectional mamba foundation model. arXiv preprint arXiv:2502.06438, 2025
  66. [66] Wei-Bang Jiang, Yansen Wang, Bao-Liang Lu, and Dongsheng Li. Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals. arXiv preprint arXiv:2409.00101, 2024
  67. [67] Jiquan Wang, Sha Zhao, Zhiling Luo, Yangxuan Zhou, Haiteng Jiang, Shijian Li, Tao Li, and Gang Pan. Cbramod: A criss-cross brain foundation model for eeg decoding. arXiv preprint arXiv:2412.07236, 2024
  68. [68] Jiquan Wang, Sha Zhao, Zhiling Luo, Yangxuan Zhou, Shijian Li, and Gang Pan. Eegmamba: An eeg foundation model with mamba. Neural Networks, page 107816, 2025
  69. [69] Wenhui Cui, Woojae Jeong, Philipp Thölke, Takfarinas Medani, Karim Jerbi, Anand A Joshi, and Richard M Leahy. Neuro-gpt: Towards a foundation model for eeg. In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2024
  70. [70] Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, and Giulia Lioi. Reve: A foundation model for eeg–adapting to any setup with large-scale pretraining on 25,000 subjects. arXiv preprint arXiv:2510.21585, 2025
  71. [71] Demetres Kostas, Stephane Aroca-Ouellette, and Frank Rudzicz. Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data. Frontiers in Human Neuroscience, 15:653659, 2021

A Tasks and datasets

This supplementary section provides a complete inventory of the datasets included in OmniEEG-Bench. Suppl...

  • EEGDenoiseNet [16] · 1 · 1 · 2 s · Single-channel noise-related binary classification (2 classes)

Test-retest reliability

  • Longitudinal test-retest [17] · 45 · 60 · 2 s · Cross-session subject identification (2 classes in the current mounted version)

Type-II: Biometrics and disease

Biometrics

  • MPI-LEMON-age [18] · 203 · 64 · 1 s · Age group classification derived from the MPI-LEMON cohort (4 groups in the current mounted version)
  • MPI-LEMON-gender [18] · 203 · 64 · 1 s · Gender classification derived from the MPI-LEMON cohort (2 classes)
  • MPI-LEMON-extraversion [18] · 203 · 64 · 1 s · Extraversion classification (2 classes)

Epilepsy and abnormalities

  • HFO [19] · 30 · 18 · 2 s · High-frequency oscillation related binary classification (2 classes)
  • TUAB [20] · 325 · 23 · 10 s · Clinical normal vs. abnormal EEG classification (2 classes)
  • TUEV [21] · 370 · 32 · 5 s · EEG event classification (6 classes)
  • TUEP [20] · 200 · 32 · 10 s · Seizure-related binary classification (2 classes)
