
arxiv: 2605.14698 · v1 · pith:DWDNVW7Jnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI

NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces

Pith reviewed 2026-06-30 21:07 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords EEG · foundation models · benchmark · clinical applications · brain-computer interface · time series

The pith

EEG-specific foundation models do not consistently outperform generic time-series models on clinical tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NeuroAtlas, the largest EEG benchmark with 42 datasets spanning clinical applications and brain-computer interfaces. It compares EEG-focused foundation models against generic time-series models using both standard and clinical-specific metrics such as event detection and brain age estimation. Results indicate that specialized models do not hold a clear advantage and that performance differs markedly across tasks. This challenges the expectation of a single pretrained model that works universally for EEG data in clinical settings.

Core claim

NeuroAtlas shows that EEG-specific foundation models do not consistently outperform generic time-series foundation models that have neither EEG-focused architectures nor EEG pretraining. Because standard metrics fall short for clinical assessment, the benchmark adds measures such as event-level decision quality, hypnogram-derived features, and the brain-age gap. Within domains, model rankings shift substantially. Overall, pretrained models perform largely on par, with only narrow advantages for a few, so current models do not yet deliver an out-of-the-box unified EEG model.

What carries the argument

The NeuroAtlas benchmark: 42 datasets totaling roughly 260k hours of EEG, paired with clinical evaluation metrics tailored to epilepsy, sleep medicine, and brain age estimation.

If this is right

  • Standard accuracy metrics alone cannot capture whether a model aids clinical decisions in EEG analysis.
  • Model performance must be assessed separately within each clinical domain due to substantial variation.
  • An effective unified EEG foundation model would need to exceed current performance levels on these clinical metrics across multiple datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers might explore pretraining strategies that incorporate diverse time-series data rather than EEG-only corpora.
  • The benchmark could serve as a standard testbed to track progress toward clinically viable EEG models.
  • If models remain domain-specific in practice, hybrid approaches combining foundation models with task-specific adaptations may be required for deployment.

Load-bearing premise

The selected 42 datasets and the custom clinical metrics sufficiently represent real clinical utility to allow meaningful comparisons between different foundation models.

What would settle it

A single foundation model that consistently outperformed all others across the benchmark's datasets and clinical metrics would challenge the conclusion that no unified EEG model exists yet.

Figures

Figures reproduced from arXiv: 2605.14698 by Angeliki-Ilektra Karaiskou, Anku Rani, Chanakya Ekbote, Christos Chatzichristos, Guido Gagliardi, Jaedong Hwang, Konstantinos Kontras, Maarten De Vos, Maarten Vanmarcke, Miguel Bhagubai, Mohammad Hossein Badiei, Paul Pu Liang, Stylianos G. Mouslech, Thomas Strypsteen, Trui Osselaer.

Figure 1
NeuroAtlas overview. NeuroAtlas spans four EEG domains (epilepsy, sleep, brain age, BCI), 42 datasets, and ∼260k hours of EEG, and benchmarks three model families on each. Per-domain task list and dataset coverage, with total EEG hours per domain (epilepsy: 58k h; sleep: 201k h, shared with brain age; BCI: 170 h). Each domain is evaluated with both standard and clinically grounded tasks: seizure detection wi…
Figure 2
Epilepsy: no model or family is consistently best across datasets, and most models perform on par with random initialization. (a) Family-best Event-Sens@FA AUC across seven continuously labeled datasets (labels include total EEG hours), computed as the area under the mean (5-fold) sensitivity vs. log10(FA/h) curve over 0.1–100 FA/h under any-overlap event scoring [19]. The leading EEG-FM differs across datasets and…
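The event-level metric named in the Figure 2 caption can be illustrated with a minimal Python sketch. It assumes events are (start, end) intervals in seconds, counts a reference seizure as detected if any predicted event overlaps it, and counts each predicted event with no overlap as one false alarm. The function names, the interpolation onto a log-spaced grid, and the normalization by the width of the log10(FA/h) range are assumptions made for illustration; the paper's scoring follows [19] and may differ in details such as tolerances or event merging.

```python
import math


def _overlaps(a, b):
    # Two (start_s, end_s) intervals overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def event_sens_and_fa_rate(ref_events, pred_events, duration_h):
    """Event-level sensitivity and false alarms per hour (any-overlap scoring)."""
    detected = sum(any(_overlaps(r, p) for p in pred_events) for r in ref_events)
    false_alarms = sum(
        not any(_overlaps(p, r) for r in ref_events) for p in pred_events
    )
    sensitivity = detected / len(ref_events) if ref_events else float("nan")
    return sensitivity, false_alarms / duration_h


def sens_fa_auc(points, lo=0.1, hi=100.0, n_grid=61):
    """Normalized area under sensitivity vs. log10(FA/h) between lo and hi.

    points: (fa_per_hour, sensitivity) operating points, e.g. one per threshold.
    Assumptions (not from the paper): linear interpolation in log10(FA/h),
    flat extrapolation outside the observed range, points with FA/h == 0 are
    dropped, and the area is divided by the log-range so it lies in [0, 1].
    """
    pts = sorted((math.log10(fa), s) for fa, s in points if fa > 0)
    if not pts:
        raise ValueError("need at least one operating point with FA/h > 0")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]

    def interp(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(1, len(xs)):
            if x <= xs[i]:
                t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
                return ys[i - 1] + t * (ys[i] - ys[i - 1])

    a, b = math.log10(lo), math.log10(hi)
    grid = [a + (b - a) * i / (n_grid - 1) for i in range(n_grid)]
    vals = [interp(x) for x in grid]
    # Trapezoidal rule on the log10(FA/h) axis.
    area = sum(
        (vals[i] + vals[i + 1]) / 2 * (grid[i + 1] - grid[i])
        for i in range(n_grid - 1)
    )
    return area / (b - a)
```

In use, one would sweep a detection threshold, call `event_sens_and_fa_rate` at each setting, and pass the resulting (FA/h, sensitivity) pairs to `sens_fa_auc`. Summarizing over a log-scaled false-alarm axis gives equal weight to each decade of false-alarm rates rather than letting the permissive end dominate.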
Figure 3
Sleep: Performance is strongly dataset-dependent across tasks. Metrics are shown for the best-performing model per category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models with and without temporal context (sequence length l=1), and a random-initialized reference. (a) Cohen’s κ sleep staging agreement with ground-truth labels. Model categories achieve similar performance when excluding the effects of seq…
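For readers unfamiliar with the agreement statistic used in the sleep captions, Cohen's κ is the observed agreement corrected for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The following is a small standard-library sketch of that textbook formula; the function name and the handling of the degenerate case are choices made here, not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both label sequences were independent,
    # given each sequence's own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single shared label): kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)
```

Because κ discounts chance agreement, it is less flattering than raw accuracy when one stage dominates a recording.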
Figure 4
Supervised Models outperform EEG-FMs for Brain Age Prediction, but performance does not translate to a neurodegenerative assessment biomarker. (A) Radar plot of performance (Pearson’s r) of best model per category of models in different cohorts. Supervised Brain Age prediction outperforms the rest. (B) Best average ranking of the best model of each category for MAE and r metrics. (C) Top-10 AUROC for cogniti…
Figure 5
BCI: MI performance is strongly driven by confounds. Linear probing performance of frozen EEG and time-series foundation models across 17 BCI datasets spanning four paradigms (MI, ERP, SSVEP, Cognitive). Metrics are reported as normalized balanced accuracy (%), where 50% equals chance regardless of the number of classes. The per-dataset significance threshold (p < 0.05, binomial test [17]) is shown as a das…
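The Figure 5 caption describes two quantities that are easy to misread: a balanced accuracy rescaled so that chance sits at 50% for any number of classes, and a binomial significance threshold. The sketch below is one plausible reading, not the paper's code. The linear rescaling (chance 1/K maps to 50%, perfect maps to 100%) is an assumption, since this page does not give the formula; the threshold is the smallest accuracy whose one-sided binomial p-value under chance falls below alpha, in the spirit of [17]. All function names are hypothetical.

```python
from math import comb


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def normalized_balanced_accuracy(bal_acc, n_classes):
    """ASSUMED rescaling: chance (1/K) -> 50 %, perfect -> 100 %."""
    chance = 1.0 / n_classes
    return 50.0 + 50.0 * (bal_acc - chance) / (1.0 - chance)


def binomial_chance_threshold(n_trials, n_classes, alpha=0.05):
    """Smallest accuracy k/n with P(X >= k) < alpha for X ~ Binomial(n, 1/K).

    A return value above 1.0 means no achievable accuracy is significant
    at this alpha for so few trials.
    """
    p = 1.0 / n_classes
    tail = 0.0
    # Accumulate the upper tail P(X >= k) from k = n downwards.
    for k in range(n_trials, -1, -1):
        tail += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        if tail >= alpha:
            # k correct trials is not significant, so k + 1 is the threshold.
            return (k + 1) / n_trials
    return 0.0
```

Under this rescaling a four-class model at 25% balanced accuracy and a two-class model at 50% both report 50%, which is what makes scores comparable across paradigms with different class counts.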
Figure 6
Per-window seizure-probability distributions, stratified by ILAE-2017 main type, on TUSZ
Figure 7
AUROC and event-level Sens@FA yield systematically different model rankings on every…
Figure 8
Foundation-model rankings on the three non-window EEG benchmarks (Bonn, TUAB,…
Figure 9
Sleep staging performance is strongly dataset-dependent. Cohen’s κ sleep staging agreement with ground-truth labels for all datasets and models per model category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models, with and without temporal context (sequence length l=1), and a random-initialized reference. Cohen’s κ (above) and standard deviation over folds (below) are shown for each. Black borders mark the…
Figure 10
Performance is not related to dataset size or embedding dimension. (a), (b) Cohen’s κ sleep staging agreement for best performing model per model category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models, with and without temporal context (sequence length l=1), and a random-initialized reference, is shown. (a) Dataset size in number of patients. (b) Dataset size in number of hours. (c) Mean Cohen’s κ ove…
Figure 11
Sleep staging performance is lowest for N1 and REM. F1-score per sleep stage for best performing model per category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models, with and without temporal context (sequence length l=1), and a random-initialized reference
Figure 12
Cohen’s κ sleep staging agreement for best performing EEG-FM per subset. (a) Subsets recorded at different sites differ significantly due to differing acquisition protocols. (b) Clinical cohorts are not systematically associated with lower performance. DOD-O contains patients with OSA, DOD-H contains healthy controls. ISRUC Subgroup I and II contain patients with sleep disorders, Subgroup III contains hea…
Figure 13
Sleep staging agreement is highest with consensus scoring, as ambiguous epochs are excluded. Cohen’s κ sleep staging agreement with ground-truth labels scored by different experts, and consensus scoring for best performing EEG-FM
Figure 14
Sleep staging and hypnogram feature performance are not consistently aligned across datasets. Normalized Mean Absolute Error (MAE) between ground truth and predicted hypnogram features for all datasets and models per model category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models, with and without temporal context (sequence length l=1), and a random-initialized reference. Black borders mark the cohort be…
Figure 15
FMs fail to adequately capture microarousals in their embeddings. Area under the precision-recall curve (AUPRC) for sleep event detection for best performing model per category. Prevalence indicates chance baseline. (a) Microarousals with a minimal duration of 3 seconds. (b) Apnea and hypopnea events with a minimal duration of 10 seconds. (c) Periodic limb movement with a minimal duration of 0.5 seconds…
Figure 16
Models insufficiently encode cognitive impairment for reliable detection. Area under the receiver operating characteristic curve (AUROC) for diagnosis of cognitive impairment in the PhysioNet 2026 dataset for best performing model per model category, i.e., EEG-FMs, TS-FMs, and Supervised-Pre models, with and without temporal context (sequence length l=1), and a random-initialized reference, is shown.
Figure 17
Embedding dimensionality does not correlate with brain-age performance. Each marker is one frozen-encoder checkpoint, with metrics fold-averaged across the seven main-pool brain-age cohorts (PN2026, WSC, ISRUC, SleepEDF SC, CFS, MrOS). Marker color denotes encoder family. (A) Embedding dim versus mean Pearson r between predicted and chronological age. (B) Embedding dim versus mean MAE improvement over a p…
Figure 18
Aggregating per subject tends to outperform aggregating per epoch. Each marker is one (model, cohort) pair, fold-averaged across the 5 folds. The horizontal axis shows ∆MAE = MAE_epoch − MAE_subject; positive values indicate that subject-level Ridge (mean-pool a recording’s epoch embeddings into one subject vector before fitting to age) beats epoch-level Ridge (fit per 30 s epoch, then average predictions per…
Figure 19
Brain-age-gap (BAG) distributions on the PN2026 Healthy/Cognitive Impairment holdout. BAG was computed per subject as predicted age minus chronological age using the 5-fold ensemble prediction. We show the 10 models with the largest positive mean BAG difference, ordered by ∆BAG = BAG_CI − BAG_H. For each model, box plots show the BAG distributions of 100 age-matched healthy subjects and 100 cognitively impa…
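The brain-age-gap arithmetic in the Figure 19 caption is simple enough to write out. The sketch below follows the caption's definitions (BAG is predicted minus chronological age; ∆BAG is the impaired-group mean minus the healthy-group mean). Averaging the fold predictions is an assumed form of the "5-fold ensemble prediction", and the function names are hypothetical.

```python
from statistics import mean


def ensemble_prediction(fold_predictions):
    """Average per-subject age predictions over folds (ASSUMED ensemble rule).

    fold_predictions: one list of per-subject predictions per fold, all in
    the same subject order.
    """
    return [mean(per_subject) for per_subject in zip(*fold_predictions)]


def brain_age_gap(predicted_age, chronological_age):
    """BAG per subject: predicted age minus chronological age."""
    return [p - c for p, c in zip(predicted_age, chronological_age)]


def delta_bag(bag_impaired, bag_healthy):
    """Difference in mean BAG between the impaired and healthy groups."""
    return mean(bag_impaired) - mean(bag_healthy)
```

A positive ∆BAG means the model's predictions run older, relative to chronological age, in the cognitively impaired group than in age-matched healthy subjects, which is the direction one would expect if BAG carried a neurodegeneration signal.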
Figure 20
Per-paradigm comparison of frozen EEG foundation models under No filtering vs. Confound filtering for ERP, SSVEP, and Cognitive BCI datasets. Bars show the mean normalized balanced accuracy (%) across datasets within each paradigm; error bars denote the standard error of the mean. Asterisks indicate significant paired differences between tracks (Wilcoxon signed-rank test; ∗ p < .05, ∗∗ p < .01, ∗∗∗ p < …
Figure 21
Normalized balanced accuracy of foundation model families under task-specific confound…
Original abstract

Foundation models (FMs) promise to extract unified representations that generalize across downstream tasks. They have emerged across fields, including electroencephalography (EEG), but it is less clear how effective they are in this particular field. Published evaluations differ in datasets, in the EEG-specific preprocessing that might influence reported results, and in the reported metrics, frequently obscuring the clinical relevance in EEG. We introduce NeuroAtlas, the largest EEG benchmark to date: 42 datasets and 260k hours covering clinical EEG (epilepsy, sleep medicine, brain age estimation) and brain-computer interfaces, and include multiple datasets per task along with bespoke clinical evaluation metrics. Besides evaluating EEG-FMs with respect to supervised baselines, we present results from generic time-series FMs. We report three findings. First, EEG-specific FMs do not consistently outperform time-series FMs, which have neither EEG-focused architectures nor been pretrained on EEG. Second, standard machine learning metrics are insufficient to assess clinical utility: thus, we thoroughly evaluate more appropriate measures such as the quality of event-level decision-making, hypnogram-derived features, and the brain-age gap in the domains of epilepsy, sleep, and brain age, respectively. Third, model rankings and performance can vary substantially within domains. We conclude that pretrained models perform largely on par, with only narrow advantages for a few, and that current models do not yet deliver on the promise of an out-of-the-box unified EEG model. NeuroAtlas exposes this gap and provides the datasets and metrics for the next generation of unified EEG FMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents NeuroAtlas, a large-scale benchmark comprising 42 EEG datasets (260k hours) spanning epilepsy, sleep, brain age, and BCI tasks. It evaluates EEG-specific foundation models against generic time-series foundation models and supervised baselines using both standard ML metrics and custom clinical metrics (event-level decisions, hypnogram features, brain-age gap). The three main findings are that EEG-specific FMs do not consistently outperform time-series FMs, standard metrics are insufficient for clinical utility, and model rankings vary substantially within domains; the conclusion is that current models fall short of delivering an out-of-the-box unified EEG model.

Significance. If the evaluation protocol and metric choices prove robust, NeuroAtlas would provide a valuable standardized resource for EEG foundation model development by exposing performance gaps, within-domain variability, and the limitations of accuracy/AUROC-style metrics. The inclusion of multiple datasets per task and direct comparison to non-EEG time-series models is a constructive contribution to the field.

major comments (2)
  1. [Abstract, §4] Abstract and §4 (results): The headline claim that EEG-specific FMs do not consistently outperform time-series FMs rests on the 42 datasets and bespoke metrics being representative of clinical utility. The manuscript does not provide external validation or selection criteria showing that these corpora and derived scores (event-level decisions, hypnogram features, brain-age gap) track real clinical endpoints better than standard metrics or generalize beyond the chosen collection.
  2. [Methods (evaluation protocol)] Methods section on evaluation protocol: No details are supplied on data splits, preprocessing consistency across the 42 datasets, or statistical testing for the reported performance gaps and domain variations. Without these, it is unclear whether the three findings are robustly supported rather than artifacts of particular splits or post-hoc metric choices.
minor comments (2)
  1. [§3, tables] Table captions and §3 should explicitly state the number of subjects, recording lengths, and class distributions for each of the 42 datasets to allow readers to assess balance.
  2. [Discussion] The paper should add a limitations paragraph discussing potential selection bias in the 42 datasets and how future work could expand coverage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on NeuroAtlas. We address the major comments point by point below, providing the strongest honest defense of the work while acknowledging where revisions are warranted.

Point-by-point responses
  1. Referee: [Abstract, §4] Abstract and §4 (results): The headline claim that EEG-specific FMs do not consistently outperform time-series FMs rests on the 42 datasets and bespoke metrics being representative of clinical utility. The manuscript does not provide external validation or selection criteria showing that these corpora and derived scores (event-level decisions, hypnogram features, brain-age gap) track real clinical endpoints better than standard metrics or generalize beyond the chosen collection.

    Authors: The 42 datasets were assembled from all publicly available sources that provide sufficient scale and task coverage for the four domains, with explicit inclusion of multiple datasets per task to quantify within-domain variability (as stated in the abstract and §3). The bespoke metrics were chosen because they map directly onto clinical decision points (e.g., seizure-event detection rather than per-sample classification). We agree that prospective external validation against downstream clinical outcomes lies outside the scope of a benchmarking study; the manuscript therefore presents these metrics as improved proxies rather than validated surrogates. In revision we will add an explicit subsection on dataset selection rationale and a limitations paragraph discussing the absence of external clinical-endpoint correlation. revision: partial

  2. Referee: [Methods (evaluation protocol)] Methods section on evaluation protocol: No details are supplied on data splits, preprocessing consistency across the 42 datasets, or statistical testing for the reported performance gaps and domain variations. Without these, it is unclear whether the three findings are robustly supported rather than artifacts of particular splits or post-hoc metric choices.

    Authors: We accept that the current Methods section is insufficiently detailed for full reproducibility. The revised manuscript will expand this section to specify (i) subject-wise or temporal split protocols used for each dataset, (ii) the single preprocessing pipeline applied uniformly across all 42 corpora, and (iii) the statistical tests (including multiple-comparison correction) used to assess significance of performance differences and domain-level rank variability. revision: yes
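To make the "subject-wise split" in the response above concrete, here is a minimal standard-library sketch of a cross-validation split in which no subject contributes recordings to both the training and the test side. It is an illustration of the general technique only: the function name, the round-robin assignment of subjects to folds, and the default of five folds (chosen because the figure captions mention 5-fold results) are assumptions, not the authors' protocol.

```python
def subject_wise_folds(subject_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs with no subject on both sides.

    subject_ids: one subject identifier per recording (or per epoch).
    Subjects are assigned to folds round-robin in sorted order, which is an
    illustrative choice; a real protocol might shuffle or stratify.
    """
    subjects = sorted(set(subject_ids))
    if n_folds < 2 or n_folds > len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")
    for fold in range(n_folds):
        test_subjects = set(subjects[fold::n_folds])
        test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
        yield train_idx, test_idx
```

Splitting by subject rather than by recording matters for EEG because epochs from the same person are highly correlated, so a per-epoch split can leak subject identity into the test set and inflate scores.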

Circularity Check

0 steps flagged

Empirical benchmarking study with no derivation chain or self-referential reductions

Full rationale

This paper is a purely empirical benchmarking study. It defines 42 external datasets, applies models to them, and reports performance using standard and bespoke metrics. There are no equations, derivations, fitted parameters renamed as predictions, uniqueness theorems, or ansatzes. All claims (e.g., EEG-specific FMs not consistently outperforming time-series FMs) follow directly from the reported evaluation results on the chosen corpora. No load-bearing step reduces to a self-citation or internal definition by construction. This is the normal case of a self-contained empirical paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on assumptions about dataset representativeness and metric validity rather than new mathematical constructs or fitted parameters.

axioms (2)
  • domain assumption: The chosen 42 datasets and bespoke clinical metrics accurately capture clinical utility across epilepsy, sleep, brain age, and BCI tasks.
    This premise is required for the second and third findings and the overall conclusion that current models fall short.
  • domain assumption: Preprocessing and evaluation protocols are consistent enough across the 42 datasets to allow meaningful model comparisons.
    The abstract notes that published evaluations differ in preprocessing; the benchmark claims to address this but the assumption underpins all reported rankings.


Reference graph

Works this paper leans on

104 extracted references · 19 canonical work pages · 5 internal anchors

  [1] Minkyu Ahn and Sung Chan Jun. Performance variation in motor imagery brain–computer interface: A brief review. Journal of Neuroscience Methods, 243:103–110, 2015.

  [2] Diego Alvarez-Estevez and Roselyne Rijsman. Haaglanden Medisch Centrum sleep staging database. PhysioNet, March 2022. Version 1.1.

  [3] Ralph G Andrzejak, Klaus Lehnertz, Florian Mormann, Christoph Rieke, Peter David, and Christian E Elger. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6):061907, 2001.

  [4] Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815, 2024.

  [5] GN Avakyan, DV Blinov, AV Lebedeva, SG Burd, and GG Avakyan. ILAE classification of the epilepsies: the 2017 revision and update. Epilepsy and Paroxysmal Conditions, 9(1):6–25, 2017.

  [6] Stylianos Bakas, Siegfried Ludwig, Dimitrios A Adamos, Nikolaos Laskaris, Yannis Panagakis, and Stefanos Zafeiriou. Latent alignment in deep learning models for EEG decoding. Journal of Neural Engineering, 22(1):016047, 2025.

  [7] Lucie Barateau, Fabio Pizza, Giuseppe Plazzi, and Yves Dauvilliers. Narcolepsy. Journal of Sleep Research, 31(4):e13631, August 2022.

  [8] Konstantinos Barmpas, Na Lee, Alexandros Koliousis, Yannis Panagakis, Dimitrios A Adamos, Nikolaos Laskaris, and Stefanos Zafeiriou. NeuroRVQ: Multi-scale EEG tokenization for generative large brainwave models. arXiv preprint arXiv:2510.13068, 2025.

  [9] Miguel Bhagubai, Christos Chatzichristos, Lauren Swinnen, Jaiver Macea, Jingwei Zhang, Lieven Lagae, Katrien Jansen, Andreas Schulze-Bonhage, Francisco Sales, Benno Mahler, et al. SeizeIT2: Wearable dataset of patients with focal epilepsy. Scientific Data, 12(1):1228, 2025.

  [10] Miguel Bhagubai, Lauren Swinnen, Evy Cleeren, Wim Van Paesschen, Maarten De Vos, and Christos Chatzichristos. Towards automated seizure detection with wearable EEG–grand challenge. IEEE Open Journal of Signal Processing, 5:717–724, 2024.

  [11] Miguel Bhagubai, Kaat Vandecasteele, Lauren Swinnen, Jaiver Macea, Christos Chatzichristos, Maarten De Vos, and Wim Van Paesschen. The power of ECG in semi-automated seizure detection in addition to two-channel behind-the-ear EEG. Bioengineering, 10(4), 2023.

  [12] Terri Blackwell, Kristine Yaffe, Sonia Ancoli-Israel, Susan Redline, Kristine E Ensrud, Marcia L Stefanick, Alison Laffan, Katie L Stone, and Osteoporotic Fractures in Men Study Group. Associations between sleep architecture and sleep-disordered breathing and cognition in older community-dwelling men: the osteoporotic fractures in men sleep study. J. Am. G...

  [13] Robyn M Busch, Olivia Hogue, Abagail F Postle, and Darlene P Floden. Automated detection of cognitive impairment in clinical practice. J. Neurol., 271(8):5187–5196, August 2024.

  [14] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.

  [15] Xiaoli Chen, Rui Wang, Phyllis Zee, Pamela L Lutsey, Sogol Javaheri, Carmela Alcántara, Chandra L Jackson, Michelle A Williams, and Susan Redline. Racial/ethnic differences in sleep disturbances: The Multi-Ethnic study of atherosclerosis (MESA). Sleep, 38(6):877–888, June 2015.

  [16] Julie Anja Engelhard Christensen, Poul Jennum, Henriette Koch, Rune Frandsen, Marielle Zoetmulder, Lars Arvastson, Søren Rahn Christensen, and Helge Bjarrup Dissing Sorensen. Sleep stability and transitions in patients with idiopathic REM sleep behavior disorder and patients with Parkinson’s disease. Clinical Neurophysiology, 127(1):537–543, January 2016.

  [17] Etienne Combrisson and Karim Jerbi. Exceeding chance level by chance: The caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy. Journal of Neuroscience Methods, 250:126–136, 2015. Cutting-edge EEG Methods.

  [18] Wenhui Cui, Woojae Jeong, Philipp Thölke, Takfarinas Medani, Karim Jerbi, Anand A Joshi, and Richard M Leahy. Neuro-GPT: Towards a foundation model for EEG. In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2024.

  [19] Jonathan Dan, Una Pale, Alireza Amirshahi, William Cappelletti, Thorir Mar Ingolfsson, Xiaying Wang, Andrea Cossettini, Adriano Bernini, Luca Benini, Sándor Beniczky, David Atienza, and Philippe Ryvlin. SzCORE: Seizure community open-source research evaluation framework for the validation of electroencephalography-based automated seizure detection algorit...

  [20] Silvia López de Diego. Automated interpretation of abnormal adult electroencephalograms. Temple University, 2017.

  [21] Paolo Detti. Siena Scalp EEG Database. PhysioNet, August 2020. Version 1.0.0.

  [22] Denis A. Engemann, Apolline Mellot, Richard Höchenberger, Hubert Banville, David Sabbagh, Lukas Gemein, Tonio Ball, and Alexandre Gramfort. A reusable benchmark of brain-age prediction from M/EEG resting-state signals. NeuroImage, 262:119521, 2022.

  [23] Josef Faller, Carmen Vidaurre, Teodoro Solis-Escalante, Christa Neuper, and Reinhold Scherer. Autocalibration and recurrent adaptation: Towards a plug and play online ERD-BCI. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(3):313–319, 2012.

  [24] Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. MOMENT: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024.

  [25] Pierre Guetschel, Théodore Papadopoulo, and Michael Tangermann. Neural network transfer learning with fast calibration for mental imagery decoding. In Proceedings of the 10th International Brain-Computer Interface Meeting 2023, June 2023.

  [26] Antoine Guillot, Fabien Sauvet, Emmanuel H. During, and Valentin Thorey. Dreem open datasets: Multi-scored sleep datasets to compare human and automated sleep staging. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(9):1955–1965, 2020.

  [27] John Guttag. CHB-MIT Scalp EEG Database. PhysioNet, June 2010. Version 1.0.0.

  [28] Elisabeth R.M. Heremans, Astrid Devulder, Pascal Borzée, Rik Vandenberghe, François-Laurent Winter, Mathieu Vandenbulcke, Maarten Van Den Bossche, Bertien Buyse, Dries Testelmans, Wim Van Paesschen, and Maarten De Vos. Wearable sleep recording augmented by artificial intelligence for Alzheimer’s disease screening, December 2024.

  [29] Ulrich Hoffmann, Jean-Marc Vesin, Touradj Ebrahimi, and Karin Diserens. An efficient P300-based brain–computer interface for disabled subjects. Journal of Neuroscience Methods, 167(1):115–125, 2008.

  [30] Wei-Bang Jiang, Yansen Wang, Bao-Liang Lu, and Dongsheng Li. NeuroLM: A universal multi-task foundation model for bridging the gap between language and EEG signals. arXiv preprint arXiv:2409.00101, 2024.

  [31] Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous EEG data in BCI. arXiv preprint arXiv:2405.18765, 2024.

  [32] Robert J. Jirsaraie, Alexander J. Gorelik, M. M. Gatavins, Denis A. Engemann, Ryan Bogdan, Deanna M. Barch, and Aristeidis Sotiras. A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4):100712, 2023.

  [33] Emily S Kappenman, Jaclyn L Farrens, Wendy Zhang, Andrew X Stewart, and Steven J Luck. ERP CORE: An open resource for human event-related potential research. NeuroImage, 225:117465, 2021.

  [34] Ard Kastrati, Josua Bürki, Jonas Lauer, Cheng Xuan, Raffaele Iaquinto, and Roger Wattenhofer. EEG-Bench: A Benchmark for EEG Foundation Models in Clinical Applications, 2025. Version Number: 1.

  [35] Stamos Katsigiannis and Naeem Ramzan. DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals From Wireless Low-cost Off-the-Shelf Devices. IEEE Journal of Biomedical and Health Informatics, 22(1):98–107, 2018.

  [36] Bastiaan Kemp, Aeilko Zwinderman, Bert Tuk, Hilbert Kamphuisen, and Josefien Oberyé. The sleep-EDF database [expanded], 2018.

  [37] Sirvan Khalighi, Teresa Sousa, José Santos, and Urbano Nunes. ISRUC-Sleep: A comprehensive public dataset for sleep researchers. Computer Methods and Programs in Biomedicine, 124, 11 2015.

  [38] Hassan Aqeel Khan, Rahat Ul Ain, Awais Mehmood Kamboh, Hammad Tanveer Butt, Saima Shafait, Wasim Alamgir, Didier Stricker, and Faisal Shafait. The NMT scalp EEG dataset: An open-source annotated dataset of healthy and pathological EEG recordings for predictive modeling. Frontiers in Neuroscience, 15:755817, 2022.

  [39] Heegyu Kim, Kyungho Won, Minkyu Ahn, and Sung Chan Jun. A 40-class SSVEP speller dataset: Beta range stimulation for low-fatigue BCI applications. Scientific Data, 12(1):1751, 2025.

  [40] Juliane Klatt, Hinnerk Feldwisch-Drentrup, Matthias Ihle, Vincent Navarro, Markus Neufang, Cesar Teixeira, Claude Adam, Mario Valderrama, Catalina Alvarado-Rojas, Adrien Witon, Michel Le Van Quyen, Francisco Sales, Antonio Dourado, Jens Timmer, Andreas Schulze-Bonhage, and Bjoern Schelter. The EPILEPSIAE database: An extensive electroencephalography dat...

  41. [41]

    Core- sleep: A multimodal fusion framework for time series robust to imperfect modalities.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 32:840–849, 2024

    Konstantinos Kontras, Christos Chatzichristos, Huy Phan, Johan Suykens, and Maarten De V os. Core- sleep: A multimodal fusion framework for time series robust to imperfect modalities.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 32:840–849, 2024

  42. [42]

    PhD thesis, GIPSA-lab, 2019

    Louis Korczowski, Ekaterina Ostaschenko, Anton Andreev, Grégoire Cattan, Pedro Luiz Coelho Ro- drigues, Violette Gautheret, and Marco Congedo.Brain Invaders calibration-less P300-based BCI using dry EEG electrodes Dataset (bi2014a). PhD thesis, GIPSA-lab, 2019

  43. [43]

    Eeg foundation models: a critical review of current progress and future directions.Journal of neural engineering, 23(2):021001, 2026

    Gayal Kuruppu, Neeraj Wagh, Vaclav Kremen, and Yogatheesan Varatharajah. Eeg foundation models: a critical review of current progress and future directions.Journal of neural engineering, 23(2):021001, 2026

  44. [44]

    Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces

    Vernon J Lawhern, Amelia J Solon, Nicholas R Waytowich, Stephen M Gordon, Chou P Hung, and Brent J Lance. Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces. Journal of neural engineering, 15(5):056013, 2018

  45. [45]

    SleePyCo: Automatic sleep scoring with feature pyramid and contrastive learning.Expert Systems with Applications, 240:122551, 2024

    Seongju Lee, Yeonguk Yu, Seunghyeok Back, Hogeon Seo, and Kyoobin Lee. SleePyCo: Automatic sleep scoring with feature pyramid and contrastive learning.Expert Systems with Applications, 240:122551, 2024

  46. [46]

    Interrater reliability of sleep stage scoring: a meta-analysis.Journal of Clinical Sleep Medicine, 18(1):193–202, January 2022

    Yun Ji Lee, Jae Yong Lee, Jae Hoon Cho, and Ji Ho Choi. Interrater reliability of sleep stage scoring: a meta-analysis.Journal of Clinical Sleep Medicine, 18(1):193–202, January 2022

  47. [47]

    Sijie Li, Bian Liu, Qing-hao Li, Yan Zhang, Haihua Zhang, Shan Gao, Longcai Wang, Tao Wang, Zhifa Han, Guiyou Liu, and Kun Wang. Evaluating the Bidirectional Causal Association Between Daytime Napping and Alzheimer’s Disease Using Mendelian Randomization.Journal of Alzheimer’s Disease, 89(4):1315–1322, October 2022

  48. [48]

    Eeg foundation models: Progresses, benchmarking, and open problems.arXiv preprint arXiv:2601.17883, 2026

    Dingkun Liu, Yuheng Chen, Zhu Chen, Zhenyao Cui, Yaozhi Wen, Jiayu An, Jingwei Luo, and Don- grui Wu. Eeg foundation models: Progresses, benchmarking, and open problems.arXiv preprint arXiv:2601.17883, 2026

  49. [49]

    Mirepnet: A pipeline and foundation model for eeg-based motor imagery classification.arXiv preprint arXiv:2507.20254, 2025

    Dingkun Liu, Zhu Chen, Jingwei Luo, Shijie Lian, and Dongrui Wu. Mirepnet: A pipeline and foundation model for eeg-based motor imagery classification.arXiv preprint arXiv:2507.20254, 2025

  50. [50]

    An eeg motor imagery dataset for brain computer interface in acute stroke patients.Scientific Data, 11(1):131, 2024

    Haijie Liu, Penghu Wei, Haochong Wang, Xiaodong Lv, Wei Duan, Meijie Li, Yan Zhao, Qingmei Wang, Xinyuan Chen, Gaige Shi, et al. An eeg motor imagery dataset for brain computer interface in acute stroke patients.Scientific Data, 11(1):131, 2024

  51. [51]

    Event-related potentials

    Steven J Luck. Event-related potentials. 2012

  52. [52]

    Shama, Jiasen Jing, and Archana Venkataraman

    Deeksha M. Shama, Jiasen Jing, and Archana Venkataraman. Deepsoz: A robust deep model for joint temporal and spatial seizure onset localization from multichannel eeg data. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 184–194. Springer, 2023

  53. [53]

    Walter McNicholas, Liam Doherty, Silke Ryan, John Garvey, Patricia Boyle, and Eric Chua. St. vincent’s university hospital / university college dublin sleep apnea database, 2004

  54. [54]

    A comparison study of canoni- cal correlation analysis based methods for detecting steady-state visual evoked potentials.PloS one, 10(10):e0140703, 2015

    Masaki Nakanishi, Yijun Wang, Yu-Te Wang, and Tzyy-Ping Jung. A comparison study of canoni- cal correlation analysis based methods for detecting steady-state visual evoked potentials.PloS one, 10(10):e0140703, 2015. 12

  55. [55]

    Epileptic EEG Dataset, March 2021

    Wassim Nasreddine. Epileptic EEG Dataset, March 2021

  56. [56]

    A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

    Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers.arXiv preprint arXiv:2211.14730, 2022

  57. [57]

    The temple university hospital eeg data corpus.Frontiers in neuroscience, 10:196, 2016

    Iyad Obeid and Joseph Picone. The temple university hospital eeg data corpus.Frontiers in neuroscience, 10:196, 2016

  58. [58]

    Montreal archive of sleep studies: an open-access resource for instrument benchmarking and exploratory research.Journal of Sleep Research, 23(6):628–635, 2014

    Christian O’Reilly, Nadia Gosselin, Julie Carrier, and Tore Nielsen. Montreal archive of sleep studies: an open-access resource for instrument benchmarking and exploratory research.Journal of Sleep Research, 23(6):628–635, 2014

  59. [59]

    Reve: A foundation model for eeg–adapting to any setup with large-scale pretraining on 25,000 subjects.arXiv preprint arXiv:2510.21585, 2025

    Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, and Giulia Lioi. Reve: A foundation model for eeg–adapting to any setup with large-scale pretraining on 25,000 subjects.arXiv preprint arXiv:2510.21585, 2025

  60. [60]

    DCSM sleep staging dataset, 2021

    Mathias Perslev, Sune Darkner, Lykke Kempfner, Miki Nikolic, Poul Jørgen Jennum, and Christian Igel. DCSM sleep staging dataset, 2021

  61. [61]

    Chen, and Maarten De V os

    Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y . Chen, and Maarten De V os. SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(3):400–410, March 2019

  62. [62]

    Chén, Philipp Koch, Alfred Mertins, and Maarten De V os

    Huy Phan, Kaare Mikkelsen, Oliver Y . Chén, Philipp Koch, Alfred Mertins, and Maarten De V os. SleepTransformer: Automatic Sleep Staging with Interpretability and Uncertainty Quantification.IEEE Transactions on Biomedical Engineering, 69(8):2456–2467, August 2022. arXiv:2105.11043 [cs]

  63. [63]

    Experimenters’ influence on mental- imagery based brain-computer interface user training.International Journal of Human-Computer Studies, 149:102603, 2021

    Léa Pillette, Aline Roc, Bernard N’kaoua, and Fabien Lotte. Experimenters’ influence on mental- imagery based brain-computer interface user training.International Journal of Human-Computer Studies, 149:102603, 2021

  64. [64]

    S. F. Quan, B. V . Howard, C. Iber, J. P. Kiley, F. J. Nieto, G. T. O’Connor, D. M. Rapoport, S. Redline, J. Robbins, J. M. Samet, and P. W. Wahl. The sleep heart health study: design, rationale, and methods. Sleep, 20(12):1077–1085, December 1997

  65. [65]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

  66. [66]

    Haider Raza, Dheeraj Rathee, Shang-Ming Zhou, Hubert Cecotti, and Girijesh Prasad. Covariate shift estimation based adaptive ensemble learning for handling non-stationarity in motor imagery related eeg-based brain-computer interface.Neurocomputing, 343:154–166, 2019. Learning in the Presence of Class Imbalance and Concept Drift

  67. [67]

    The familial aggregation of obstructive sleep apnea.Am

    S Redline, P V Tishler, T D Tosteson, J Williamson, K Kump, I Browner, V Ferrette, and P Krejci. The familial aggregation of obstructive sleep apnea.Am. J. Respir. Crit. Care Med., 151(3 Pt 1):682–687, March 1995

  68. [68]

    Attention and p300-based bci performance in people with amyotrophic lateral sclerosis.Frontiers in human neuroscience, 7:732, 2013

    Angela Riccio, Luca Simione, Francesca Schettini, Alessia Pizzimenti, Maurizio Inghilleri, Marta Olivetti Belardinelli, Donatella Mattia, and Febo Cincotti. Attention and p300-based bci performance in people with amyotrophic lateral sclerosis.Frontiers in human neuroscience, 7:732, 2013

  69. [69]

    Julio Rodriguez-Larios and Kaat Alaerts. Tracking transient changes in the neural frequency architecture: harmonic relationships between theta and alpha peaks facilitate cognitive performance.Journal of Neuroscience, 39(32):6291–6298, 2019

  70. [70]

    Carol L Rosen, Dennis Auckley, Ruth Benca, Nancy Foldvary-Schaefer, Conrad Iber, Vishesh Kapur, Michael Rueschman, Phyllis Zee, and Susan Redline. A multisite randomized trial of portable sleep studies and positive airway pressure autotitration versus laboratory-based polysomnography for the diagnosis and treatment of obstructive sleep apnea: the HomePAP ...

  71. [71]

    The temple university hospital seizure detection corpus.Frontiers in neuroinformatics, 12:83, 2018

    Vinit Shah, Eva V on Weltin, Silvia Lopez, James Riley McHugh, Lillian Veloso, Meysam Golmohammadi, Iyad Obeid, and Joseph Picone. The temple university hospital seizure detection corpus.Frontiers in neuroinformatics, 12:83, 2018

  72. [72]

    Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal, 2026

    Fanqi Shen, Enhong Yang, Jiahe Li, Junru Hong, Xiaoran Pan, Zhizhang Yuan, Meng Li, and Yang Yang. Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal, 2026. Version Number: 1. 13

  73. [73]

    Fome: A founda- tion model for eeg using adaptive temporal-lateral attention scaling.arXiv preprint arXiv:2409.12454, 2024

    Enze Shi, Kui Zhao, Qilong Yuan, Jiaqi Wang, Huawen Hu, Sigang Yu, and Shu Zhang. Fome: A founda- tion model for eeg using adaptive temporal-lateral attention scaling.arXiv preprint arXiv:2409.12454, 2024

  74. [74]

    Open access dataset for eeg+ nirs single-trial classification.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10):1735–1745, 2016

    Jaeyoung Shin, Alexander von Lühmann, Benjamin Blankertz, Do-Won Kim, Jichai Jeong, Han-Jeong Hwang, and Klaus-Robert Müller. Open access dataset for eeg+ nirs single-trial classification.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10):1735–1745, 2016

  75. [75]

    Mental workload estimation based on physiological features for pilot-UA V teaming applications.Frontiers in Human Neuroscience, 15:692878, 2021

    Gaganpreet Singh, Caroline PC Chanel, and Raphaëlle N Roy. Mental workload estimation based on physiological features for pilot-UA V teaming applications.Frontiers in Human Neuroscience, 15:692878, 2021

  76. [76]

    A dataset of neonatal eeg recordings with seizure annotations.Scientific data, 6(1):190039, 2019

    Nathan J Stevenson, Karoliina Tapani, Leena Lauronen, and Sampsa Vanhatalo. A dataset of neonatal eeg recordings with seizure annotations.Scientific data, 6(1):190039, 2019

  77. [77]

    Machine learning–based sleep electroencephalographic brain age index and dementia risk: An individual participant data meta-analysis.JAMA Network Open, 9(3):e261521, 2026

    Haoqi Sun, Sarah Milton, Yi Fang, et al. Machine learning–based sleep electroencephalographic brain age index and dementia risk: An individual participant data meta-analysis.JAMA Network Open, 9(3):e261521, 2026

  78. [78]

    Brain age from the electroencephalogram of sleep.Neurobiology of aging, 74:112–120, 2019

    Haoqi Sun, Luis Paixao, Jefferson T Oliva, Balaji Goparaju, Diego Z Carvalho, Kicky G van Leeuwen, Oluwaseun Akeju, Robert J Thomas, Sydney S Cash, Matt T Bianchi, et al. Brain age from the electroencephalogram of sleep.Neurobiology of aging, 74:112–120, 2019

  79. [79]

    Oliva, Balaji Goparaju, Diego Z

    Haoqi Sun, Luis Paixao, Jefferson T. Oliva, Balaji Goparaju, Diego Z. Carvalho, Kicky G. van Leeuwen, Oluwaseun Akeju, Robert J. Thomas, Sydney S. Cash, Matt T. Bianchi, and M. Brandon Westover. Brain age from the electroencephalogram of sleep.Neurobiology of Aging, 74:112–120, February 2019

  80. [80]

    Chu, Can Zhang, Jonathan Rosand, Emmanuel Mignot, Sydney S

    Haoqi Sun, Elissa Ye, Luis Paixao, Wolfgang Ganglberger, Catherine J. Chu, Can Zhang, Jonathan Rosand, Emmanuel Mignot, Sydney S. Cash, David Gozal, Robert J. Thomas, and M. Brandon Westover. The sleep and wake electroencephalogram over the lifespan.Neurobiology of Aging, 124:60–70, 2023

Showing first 80 references.