
arxiv: 2605.18085 · v1 · pith:J2FMNNV3new · submitted 2026-05-18 · 📡 eess.SP

PRiSE-EEG: A Prior-Guided Foundation Model with Depth-Stratified Experts for Cross-Paradigm EEG Representation Learning

Pith reviewed 2026-05-20 01:01 UTC · model grok-4.3

classification 📡 eess.SP
keywords EEG foundation model · cross-paradigm representation learning · mixture of experts · CKA similarity · prior-guided patching · depth-stratified experts · transformer · brain signals

The pith

PRiSE-EEG learns reusable EEG representations by patching signals with cortical priors and allocating experts according to layer-wise CKA sharedness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to build a foundation model that works across different EEG recording paradigms without separate fine-tuning protocols. It starts by showing that dense Transformers face optimization conflicts between paradigms and that similarity across paradigms drops with depth. This motivates PRiSE-EEG, which creates continuous multi-channel patches from static priors and short-time interactions, then routes through shared and specialized experts in a depth-dependent way using a sigmoid function on CKA values. The result is stronger performance on twelve public benchmarks when protocols are matched, outperforming both standard Transformers and simpler MoE variants.
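
The optimization conflict described above can be made concrete with a small sketch. It assumes that each dataset's fine-tuning gradient has been flattened into one vector and that conflict is read off the pairwise cosine similarity (the quantity named in the caption of Figure 1). The function names and toy vectors are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    return dot / (norm_a * norm_b)


def pairwise_gradient_similarity(grads):
    """grads maps a dataset name to its flattened gradient (list of floats)."""
    names = list(grads)
    return {(a, b): cosine_similarity(grads[a], grads[b])
            for a in names for b in names}


# Toy gradients for illustration only (assumption, not data from the paper).
toy_grads = {
    "task_a": [1.0, 0.0, 1.0],
    "task_b": [1.0, 0.0, 1.0],    # aligned with task_a
    "task_c": [-1.0, 0.0, -1.0],  # opposed to task_a, i.e. a conflict
}
similarity = pairwise_gradient_similarity(toy_grads)
```

Values near -1 indicate gradients that pull shared parameters in opposite directions, values near 0 indicate unrelated updates, and values near 1 indicate compatible tasks.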

Core claim

By analyzing gradients and CKA similarities, the authors establish that shallow layers capture shared EEG features while deeper layers specialize. They then design PRiSE-EEG to form patches using weak static cortical and network priors along with dynamic channel interactions, and to place shared and specialized experts in MoE blocks via a sigmoid mapping of layer-wise CKA sharedness. This yields strong cross-paradigm results on 12 benchmarks.
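
For readers unfamiliar with CKA, the following is a minimal sketch of linear CKA as defined by Kornblith et al. (2019), computed between two activation matrices with one row per sample. Whether the paper uses the linear or a kernel variant, and how it aggregates pairwise values into layer-wise sharedness, is not stated here, so the helper names and toy inputs are assumptions.

```python
import math


def center_columns(matrix):
    """Subtract the per-feature mean from every row of a row-major matrix."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices with the same row count."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            total += sum(x * y for x, y in zip(col_a, col_b)) ** 2
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return cross_frobenius_sq(x, y) / denom


# Toy activations (4 samples, 1 feature each); illustrative only.
acts_a = [[1.0], [-1.0], [1.0], [-1.0]]
acts_scaled = [[2.0 * v for v in row] for row in acts_a]  # same structure, rescaled
acts_unrelated = [[1.0], [1.0], [-1.0], [-1.0]]           # uncorrelated with acts_a

cka_same = linear_cka(acts_a, acts_scaled)
cka_unrelated = linear_cka(acts_a, acts_unrelated)
```

Linear CKA is invariant to isotropic rescaling, so the rescaled copy scores as fully similar, while the uncorrelated activations score zero. In the setting of the paper, the two inputs would be activations of one layer on batches from different paradigms.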

What carries the argument

CKA-calibrated Depth-Stratified Experts, which allocate shared versus specialized experts across MoE Transformer blocks based on a sigmoid function of layer-wise CKA similarity.
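
One way to picture the allocation rule is the sketch below, which passes each layer's CKA sharedness through a sigmoid and rounds the result to a number of shared experts. The midpoint, slope, expert count, and rounding scheme are hypothetical placeholders; the paper's actual parameterization is not given here.

```python
import math


def shared_expert_counts(cka_per_layer, num_experts, midpoint=0.5, slope=10.0):
    """Map layer-wise CKA sharedness in [0, 1] to shared-expert counts.

    midpoint and slope are assumed sigmoid parameters, not the paper's values.
    """
    counts = []
    for cka in cka_per_layer:
        shared_fraction = 1.0 / (1.0 + math.exp(-slope * (cka - midpoint)))
        counts.append(round(shared_fraction * num_experts))
    return counts


# Toy sharedness profile that decreases with depth (illustrative only).
toy_cka = [0.9, 0.7, 0.5, 0.3, 0.1]
shared = shared_expert_counts(toy_cka, num_experts=8)
specialized = [8 - s for s in shared]
```

With a sharedness profile that falls with depth, early blocks receive mostly shared experts and later blocks mostly specialized ones, which matches the qualitative behavior the paper describes.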

If this is right

  • Common EEG regularities are preserved in early blocks while later blocks gain specialized capacity.
  • Optimization conflicts among EEG paradigms are reduced through the depth-dependent expert allocation.
  • Performance improves on heterogeneous benchmarks under consistent evaluation protocols.
  • Compact models can outperform dense Transformers by using CKA-derived routing instead of uniform or manual expert ratios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar CKA-based routing might reduce conflicts in other heterogeneous signal domains such as multi-site recordings.
  • Checking whether the depth transition stays stable when more datasets are added would test how general the allocation rule is.
  • Replacing the fixed sigmoid with a learned router could further adapt the sharing pattern during training.

Load-bearing premise

The depth-wise pattern of decreasing cross-paradigm similarity seen in CKA analysis on the six training datasets will hold for other EEG data and justify using a fixed sigmoid-based expert split.

What would settle it

A new set of EEG benchmarks where the CKA sharedness does not decrease consistently with depth, or where the PRiSE-EEG model fails to improve over dense baselines, would challenge the central design choice.

Figures

Figures reproduced from arXiv: 2605.18085 by Jiangtong Li, Jie Li, Kun Zhu, Wei Xiong.

Figure 1: Pairwise cosine similarity of downstream fine-tuning gradients across datasets for LaBraM

Figure 2: The architecture of PRiSE-EEG framework. The Prior-Guided Continuous Tokenizer …

Figure 3: Ablation Study. We study the effect of the approaches for three modules in Prior-Guided …

Figure 4: CKA analysis and shared expert allocation. The left heatmap shows layer-wise CKA …

Figure 5: Gradient subspace affinity (rank=5) and similarity of routed expert distribution among …

Figure 6: Ablation Study of attention bias strength for static path on TUSL, Workload and SEED-V

Figure 7: Ablation Study of the stage that the neural prior has been introduced on TUSL, SEED-V, …

Figure 8: Grad-CAM visualization of PRiSE-EEG on SEED showing the region of model interest for …

Figure 9: Grad-CAM visualization of our PRiSE-EEG on ADFTD dataset showing the region of …

Figure 10: Grad-CAM visualization of PRiSE-EEG on Workload dataset showing the region of model …

Figure 11: Grad-CAM visualization of PRiSE-EEG on Workload dataset showing the region of model …

Figure 12: t-SNE visualizations of feature embeddings reduced to 40 dimensions by PCA on different …

Figure 13: Scaling behavior with model size N and a) Lp; b) HMC balanced accuracy; c) SEED balanced accuracy. Axes are all on a logarithmic scale.
[Plot panels: Pretrain Loss and SEED test balanced accuracy (B-Acc) versus training data size (ratio).]
Figure 14: Scaling behavior with data partition P and a) Lp; b) SEED balanced accuracy; c) HMC balanced accuracy. Axes are all on a logarithmic scale.

… of the test Lp with model size (N) is: Lp = −0.034 · ln(N) + 0.786, where R² is 0.974. The results on the SEED dataset show that the scaling law of the test balanced accuracy with model size (N) is: BAcc = 0.010 · ln(N) + 0.699, where R² is 0.940. The results on the …
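
The two fitted laws quoted above can be evaluated directly. The sketch below hard-codes the reported coefficients and shows the change each fit implies when N doubles. The unit of N is not stated in this excerpt, so the inputs are illustrative only, and the function names are hypothetical.

```python
import math

# Coefficients as quoted in the excerpt above.
LOSS_SLOPE, LOSS_INTERCEPT = -0.034, 0.786
BACC_SLOPE, BACC_INTERCEPT = 0.010, 0.699


def predicted_pretrain_loss(n):
    """Lp = -0.034 * ln(N) + 0.786 (unit of N unspecified in the excerpt)."""
    return LOSS_SLOPE * math.log(n) + LOSS_INTERCEPT


def predicted_seed_bacc(n):
    """BAcc = 0.010 * ln(N) + 0.699 for the SEED dataset."""
    return BACC_SLOPE * math.log(n) + BACC_INTERCEPT


def change_per_doubling(slope):
    """A log-linear law shifts by slope * ln(2) each time N doubles."""
    return slope * math.log(2.0)


loss_delta = change_per_doubling(LOSS_SLOPE)
bacc_delta = change_per_doubling(BACC_SLOPE)
```

Because both laws are linear in ln(N), each doubling of N shifts the prediction by a constant amount: roughly -0.024 in loss and +0.007 in SEED balanced accuracy under these fits.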
Original abstract

EEG foundation models aim to learn reusable representations across heterogeneous paradigms, yet existing approaches often use uniform adaptation mechanisms and are typically reported under separate downstream fine-tuning protocols. In this work, we first analyze dense EEG Transformers from two complementary perspectives. Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms, while CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition: shallow layers preserve stronger cross-paradigm similarity, whereas deeper layers become increasingly specialized. Motivated by these findings, we propose PRiSE-EEG, a prior-guided EEG foundation model with CKA-calibrated Depth-Stratified Experts. PRiSE-EEG forms continuous multi-channel EEG patches using weak static cortical and network priors and dynamic short-time channel interactions, then allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness. This design preserves common EEG regularities in early blocks while assigning more specialized capacity to later task-specific transformations. Experiments on 12 public EEG benchmarks show strong cross-paradigm performance under matched protocols. Compact ablations further show that CKA-derived expert allocation improves over dense Transformers, uniform MoE, and manually fixed shared-specific expert ratios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PRiSE-EEG, a foundation model for cross-paradigm EEG representation learning. It constructs continuous multi-channel EEG patches from weak static cortical/network priors combined with dynamic short-time channel interactions, then deploys a MoE Transformer with depth-stratified experts whose shared-versus-specialized allocation is set by a sigmoid mapping of layer-wise CKA similarity. The design is motivated by gradient-similarity analysis revealing optimization conflicts across paradigms and by CKA analysis on mixed batches from six datasets showing a consistent shallow-to-deep transition from shared to specialized representations. Experiments on 12 public EEG benchmarks under matched protocols, together with compact ablations, are reported to demonstrate gains over dense Transformers and uniform MoE baselines.

Significance. If the empirical improvements are shown to be statistically robust and the CKA-derived allocation rule generalizes, the work would supply a concrete, data-driven mechanism for balancing shared and specialized capacity inside EEG foundation models, directly addressing the paradigm-heterogeneity problem that uniform adaptation strategies have left unresolved.

major comments (2)
  1. [Motivation and Method (CKA analysis and expert allocation)] The sigmoid expert-allocation rule is calibrated exclusively on layer-wise CKA values obtained from mixed-paradigm batches drawn from the same six datasets used for the gradient and CKA analysis. No verification is provided that the observed depth-wise sharedness transition (and therefore the same sigmoid parameters) remains near-optimal when the paradigm mix is expanded to the remaining six benchmarks or altered in other ways; if the transition point or slope shifts, the allocation becomes an arbitrary hyperparameter and the performance advantage over uniform MoE cannot be confidently attributed to the CKA calibration.
  2. [Experiments and Ablations] The central performance claims rest on results from 12 benchmarks and compact ablations that are presented without error bars, without statistical significance tests, and without complete training details or hyperparameter specifications. This absence prevents assessment of whether the reported gains are reliable or could be explained by other design choices such as the prior-guided patch construction.
minor comments (1)
  1. [Method] Notation for the sigmoid mapping parameters and the precise definition of 'sharedness' used to compute CKA should be introduced earlier and used consistently throughout the method and ablation sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address each major comment below and outline the revisions we will make to improve the manuscript.

Point-by-point responses
  1. Referee: The sigmoid expert-allocation rule is calibrated exclusively on layer-wise CKA values obtained from mixed-paradigm batches drawn from the same six datasets used for the gradient and CKA analysis. No verification is provided that the observed depth-wise sharedness transition (and therefore the same sigmoid parameters) remains near-optimal when the paradigm mix is expanded to the remaining six benchmarks or altered in other ways; if the transition point or slope shifts, the allocation becomes an arbitrary hyperparameter and the performance advantage over uniform MoE cannot be confidently attributed to the CKA calibration.

    Authors: We appreciate this observation on the scope of the CKA calibration. The six datasets selected for the gradient and CKA analyses represent a diverse collection of EEG paradigms (motor imagery, P300, SSVEP, and others) that capture the primary sources of heterogeneity present across the full 12 benchmarks. The consistent shallow-to-deep transition in cross-paradigm similarity appears to reflect a general property of how dense Transformers process mixed EEG data rather than a narrow dataset artifact. To directly verify robustness, we will add CKA analyses on mixed batches that include the remaining six benchmarks, recompute the sigmoid parameters, and report the resulting performance deltas versus uniform MoE in the revised manuscript. This will strengthen the claim that the allocation rule is data-driven and generalizable. revision: yes

  2. Referee: The central performance claims rest on results from 12 benchmarks and compact ablations that are presented without error bars, without statistical significance tests, and without complete training details or hyperparameter specifications. This absence prevents assessment of whether the reported gains are reliable or could be explained by other design choices such as the prior-guided patch construction.

    Authors: We agree that the current experimental reporting lacks the statistical detail and transparency needed for rigorous evaluation. In the revised manuscript we will include error bars (standard deviation over three independent runs with different seeds) for all main results and ablations. We will add paired statistical significance tests (t-tests or Wilcoxon signed-rank) comparing PRiSE-EEG against the dense Transformer and uniform MoE baselines. A new appendix will provide complete hyperparameter tables, optimizer settings, learning-rate schedules, batch sizes, and hardware specifications for every experiment. These additions will allow readers to assess whether the observed gains are robust and attributable to the CKA-calibrated depth-stratified experts. revision: yes
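
The reporting the authors commit to can be sketched as follows: mean and sample standard deviation over seeds, plus a paired t statistic on per-seed differences. The scores are made-up placeholders, not results from the paper, and the function names are hypothetical. Converting the statistic to a p-value would need a t-distribution CDF from a statistics library.

```python
import math
import statistics


def mean_and_std(scores):
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples; degrees of freedom are len(scores_a) - 1."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Placeholder balanced accuracies for three seeds (NOT from the paper).
placeholder_model = [0.71, 0.73, 0.72]
placeholder_baseline = [0.69, 0.70, 0.70]

model_mean, model_std = mean_and_std(placeholder_model)
t_stat = paired_t_statistic(placeholder_model, placeholder_baseline)
```

With only three seeds the test has two degrees of freedom, so even a large t statistic gives limited evidence; more seeds or a per-benchmark pairing would tighten the comparison.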

Circularity Check

1 step flagged

CKA-derived sigmoid expert allocation is a data-informed design choice but does not reduce the central claim by construction

specific steps
  1. fitted input called prediction [Abstract (CKA analysis and PRiSE-EEG proposal)]
    "Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms, while CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition: shallow layers preserve stronger cross-paradigm similarity, whereas deeper layers become increasingly specialized. Motivated by these findings, we propose PRiSE-EEG, a prior-guided EEG foundation model with CKA-calibrated Depth-Stratified Experts. ... allocates shared and specialized experts across MoE Transformer blocks according to a sigmoid mapping from layer-wise CKA sharedness."

    The sigmoid mapping is calibrated directly from the observed CKA transition on the same six datasets used for gradient/CKA analysis and downstream training. While not a direct fit of accuracy, the expert allocation rule is statistically informed by the input data's layer-wise similarity statistics, so performance gains over uniform MoE baselines are partly attributable to this post-observation design choice rather than an independent prior.

full rationale

The paper computes CKA on held-out mixed-paradigm batches from six datasets solely to observe the depth-wise sharedness pattern and then selects a sigmoid mapping to allocate experts. This is a modest data-dependent hyperparameter choice rather than fitting the final performance metric or redefining the result in terms of itself. The core contributions (continuous patch formation with priors, MoE blocks, and cross-paradigm evaluation on 12 benchmarks) retain independent content, so the derivation chain is largely self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The design rests on two empirical observations (gradient conflicts across paradigms and depth-wise CKA transition) that are treated as stable properties of EEG Transformers; the sigmoid mapping introduces at least one tunable functional form whose parameters are not derived from first principles.

free parameters (1)
  • sigmoid mapping parameters
    Controls how CKA sharedness values are converted into the proportion of shared versus specialized experts at each depth; chosen after inspecting CKA curves on the analysis datasets.
axioms (2)
  • domain assumption: Gradient similarity across six downstream datasets reveals substantial optimization conflicts among EEG paradigms
    Invoked in the opening analysis paragraph to justify the need for depth-stratified rather than uniform adaptation.
  • domain assumption: CKA analysis on mixed-paradigm batches shows a consistent depth-wise transition from shared to specialized representations
    Used to motivate the sigmoid allocation rule; treated as a general property rather than dataset-specific.




Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 4 internal anchors

  1. [1]

    ISSN 0304-3975

    Finding frequent items in data streams.Theoretical Computer Science, 312(1):3–15, 2004. ISSN 0304-3975. Automata, Languages and Programming

  2. [2]

    Emotion estimation from eeg signals during listening to quran using psd features

    Mashail Alsolamy and Anas Fattouh. Emotion estimation from eeg signals during listening to quran using psd features. InCSIT, pages 1–5, 2016

  3. [3]

    Diego Alvarez-Estevez and Roselyne M. Rijsman. Inter-database validation of a deep learning approach for automatic sleep scoring.PLOS ONE, 16(8):1–27, 08 2021

  4. [4]

    Vivit: A video vision transformer

    Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lu ˇci´c, and Cordelia Schmid. Vivit: A video vision transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 6836–6846, 2021

  5. [5]

    Developmental fronto-parietal shift of brain activation during mental arithmetic across the lifespan: A registered report protocol.PLOS ONE, 16(8):1–13, 08 2021

    Christina Artemenko. Developmental fronto-parietal shift of brain activation during mental arithmetic across the lifespan: A registered report protocol.PLOS ONE, 16(8):1–13, 08 2021

  6. [6]

    Is space-time attention all you need for video understanding? InIcml, volume 2, page 4, 2021

    Gedas Bertasius, Heng Wang, and Lorenzo Torresani. Is space-time attention all you need for video understanding? InIcml, volume 2, page 4, 2021

  7. [7]

    The non-invasive berlin brain–computer interface: Fast acquisition of effective perfor- mance in untrained subjects.NeuroImage, 37(2):539–550, 2007

    Benjamin Blankertz, Guido Dornhege, Matthias Krauledat, Klaus-Robert Müller, and Gabriel Curio. The non-invasive berlin brain–computer interface: Fast acquisition of effective perfor- mance in untrained subjects.NeuroImage, 37(2):539–550, 2007

  8. [8]

    Be- havioural correlates of the p3b event-related potential in school-age children.International Journal of Psychophysiology, 76(3):148–157, 2010

    O Boucher, CH Bastien, G Muckle, D Saint-Amour, SW Jacobson, and JL Jacobson. Be- havioural correlates of the p3b event-related potential in school-age children.International Journal of Psychophysiology, 76(3):148–157, 2010

  9. [9]

    Investigating the electrophysiological basis of resting state networks using magnetoencephalography.Proceedings of the National Academy of Sciences, 108(40):16783–16788, 2011

    Matthew J Brookes, Mark Woolrich, Henry Luckhoo, Darren Price, Joanne R Hale, Mary C Stephenson, Gareth R Barnes, Stephen M Smith, and Peter G Morris. Investigating the electrophysiological basis of resting state networks using magnetoencephalography.Proceedings of the National Academy of Sciences, 108(40):16783–16788, 2011

  10. [10]

    Eeg-gnn: Graph neural networks for classification of electroencephalogram (eeg) signals

    Andac Demir, Toshiaki Koike-Akino, Ye Wang, Masaki Haruna, and Deniz Erdogmus. Eeg-gnn: Graph neural networks for classification of electroencephalogram (eeg) signals. InEMBC, pages 1061–1067, 2021

  11. [11]

    Siena scalp eeg database.physionet, 10:493, 2020

    Paolo Detti. Siena scalp eeg database.physionet, 10:493, 2020

  12. [12]

    Capturing heterogeneous group differences using mixture-of-experts: Application to a study of aging.Neuroimage, 125:498–514, 2016

    Harini Eavani, Meng Kang Hsieh, Yang An, Guray Erus, Lori Beason-Held, Susan Resnick, and Christos Davatzikos. Capturing heterogeneous group differences using mixture-of-experts: Application to a study of aging.Neuroimage, 125:498–514, 2016

  13. [13]

    The human brainnetome atlas: a new brain atlas based on connectional architecture.Cerebral cortex, 26(8):3508–3526, 2016

    Lingzhong Fan, Hai Li, Junjie Zhuo, Yu Zhang, Jiaojian Wang, Liangfu Chen, Zhengyi Yang, Congying Chu, Sangma Xie, Angela R Laird, et al. The human brainnetome atlas: a new brain atlas based on connectional architecture.Cerebral cortex, 26(8):3508–3526, 2016

  14. [14]

    Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity.Journal of Machine Learning Research, 23 (120):1–39, 2022

    William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity.Journal of Machine Learning Research, 23 (120):1–39, 2022

  15. [15]

    Eegmoe: A domain-decoupled mixture-of-experts model for self-supervised eeg representation learning.IEEE Transactions on Neural Networks and Learning Systems, 2026

    Xuange Gao, Danli Wang, and Yanyan Zhao. Eegmoe: A domain-decoupled mixture-of-experts model for self-supervised eeg representation learning.IEEE Transactions on Neural Networks and Learning Systems, 2026

  16. [16]

    A large and rich eeg dataset for modeling human visual object recognition.NeuroImage, 264:119754, 2022

    Alessandro T Gifford, Kshitij Dwivedi, Gemma Roig, and Radoslaw M Cichy. A large and rich eeg dataset for modeling human visual object recognition.NeuroImage, 264:119754, 2022

  17. [17]

    Engemann, Daniel Strohmeier, Christian Brodbeck, Roman Goj, Mainak Jas, Teon Brooks, Lauri Parkkonen, and Matti Hämäläinen

    Alexandre Gramfort, Martin Luessi, Eric Larson, Denis A. Engemann, Daniel Strohmeier, Christian Brodbeck, Roman Goj, Mainak Jas, Teon Brooks, Lauri Parkkonen, and Matti Hämäläinen. Meg and eeg data analysis with mne-python.Frontiers in Neuroscience, 7, 2013. 10

  18. [18]

    Human eeg recordings for 1,854 concepts presented in rapid serial visual presentation streams

    Tijl Grootswagers, Ivy Zhou, Amanda K Robinson, Martin N Hebart, and Thomas A Carlson. Human eeg recordings for 1,854 concepts presented in rapid serial visual presentation streams. Scientific Data, 9(1):3, 2022

  19. [19]

    Harati, M

    A. Harati, M. Golmohammadi, S. Lopez, I. Obeid, and J. Picone. Improved eeg event classifica- tion using differential energy. InSPMB, pages 1–4, 2015

  20. [20]

    Large- scale cortical correlation structure of spontaneous oscillatory activity.Nature neuroscience, 15 (6):884–890, 2012

    Joerg F Hipp, David J Hawellek, Maurizio Corbetta, Markus Siegel, and Andreas K Engel. Large- scale cortical correlation structure of spontaneous oscillatory activity.Nature neuroscience, 15 (6):884–890, 2012

  21. [21]

    Lora: Low-rank adaptation of large language models.ICLR, 1 (2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1 (2):3, 2022

  22. [22]

    Categorical reparametrization with gumble-softmax

    Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparametrization with gumble-softmax. In International Conference on Learning Representations (ICLR 2017), 2017

  23. [23]

    Multimodal signal dataset for 11 intuitive movement tasks from single upper extremity during multiple recording sessions.GigaScience, 9(10):giaa098, 2020

    Ji-Hoon Jeong, Jeong-Hyun Cho, Kyung-Hwan Shim, Byoung-Hee Kwon, Byeong-Hoo Lee, Do-Yeun Lee, Dae-Hyeok Lee, and Seong-Whan Lee. Multimodal signal dataset for 11 intuitive movement tasks from single upper extremity during multiple recording sessions.GigaScience, 9(10):giaa098, 2020

  24. [24]

    Large brain model for learning generic representations with tremendous eeg data in bci

    Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representations with tremendous eeg data in bci. InICLR, 2024

  25. [25]

    Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals

    Wei-Bang Jiang, Yansen Wang, Bao-Liang Lu, and Dongsheng Li. Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals. InICLR, 2025

  26. [26]

    Brain Invaders calibration-less P300-based BCI with modulation of flash duration Dataset (bi2015a)

    Louis Korczowski, Martine Cederhout, Anton Andreev, Grégoire Cattan, Pedro Luiz Coelho Ro- drigues, Violette Gautheret, and Marco Congedo. Brain Invaders calibration-less P300-based BCI with modulation of flash duration Dataset (bi2015a). Technical report, July 2019

  27. [27]

    Similarity of neural network representations revisited

    Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of neural network representations revisited. InInternational conference on machine learning, pages 3519–3529. PMlR, 2019

  28. [28]

    Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data.FHN, 15:653–659, 2021

    Demetres Kostas, Stephane Aroca-Ouellette, and Frank Rudzicz. Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data.FHN, 15:653–659, 2021

  29. [29]

    Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces.Journal of Neural Engineering, 15(5), 2018

    Vernon J Lawhern, Amelia J Solon, Nicholas R Waytowich, Stephen M Gordon, Chou P Hung, and Brent J Lance. Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces.Journal of Neural Engineering, 15(5), 2018

  30. [30]

    Estformer: Transformer utilising spatiotemporal dependencies for electroencephalogram super-resolution.Knowledge-Based Systems, 317:113345, 2025

    Dongdong Li, Zhongliang Zeng, Zhe Wang, and Hai Yang. Estformer: Transformer utilising spatiotemporal dependencies for electroencephalogram super-resolution.Knowledge-Based Systems, 317:113345, 2025

  31. [31]

    DeepSeek-V3 Technical Report

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024

  32. [32]

    Fine-grained interpretability for eeg emotion recognition: Concat-aided grad-cam and systematic brain functional network

    Bingxiu Liu, Jifeng Guo, CL Philip Chen, Xia Wu, and Tong Zhang. Fine-grained interpretability for eeg emotion recognition: Concat-aided grad-cam and systematic brain functional network. IEEE Transactions on Affective Computing, 15(2):671–684, 2023

  33. [33]

    Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition.IEEE TCDS, 14(2):715–729, 2022

    Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition.IEEE TCDS, 14(2):715–729, 2022

  34. [34]

    Wei Liu, Wei-Long Zheng, Ziyi Li, Si-Yuan Wu, Lu Gan, and Bao-Liang Lu. Identifying similarities and differences in emotion recognition with eeg and eye movements among chinese, german, and french people.Journal of Neural Engineering, 19(2):026012, 2022. 11

  35. [35]

    Luciw, Ewa Jarocka, and Benoni B

    Matthew D. Luciw, Ewa Jarocka, and Benoni B. Edin. Multi-channel eeg recordings during 3,936 grasp and lift trials with varying weight and friction.Scientific Data, 1(1):140047, Nov 2014

  36. [36]

    López, G

    S. López, G. Suarez, D. Jungreis, I. Obeid, and J. Picone. Automated identification of abnormal adult eegs. InSPMB, pages 1–5, 2015

  37. [37]

    CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

    Jingying Ma, Feng Wu, Qika Lin, Yucheng Xing, Chenyu Liu, Ziyu Jia, and Mengling Feng. Codebrain: Towards decoupled interpretability and multi-scale architecture for eeg foundation model.arXiv preprint arXiv:2506.09110, 2025

  38. [38]

    Objective and subjective evaluation of online error correction during p300-based spelling

    Perrin Margaux, Maby Emmanuel, Daligault Sébastien, Bertrand Olivier, and Mattout Jérémie. Objective and subjective evaluation of online error correction during p300-based spelling. Advances in Human-Computer Interaction, 2012(1):578295, 2012

  39. [39]

    Zero-phase-delay syn- chrony between interacting neural populations: implications for functional connectivity-derived biomarkers.Imaging Neuroscience, 3:IMAG–a, 2025

    Chirag Mehra, Ahmad Beyh, Petroula Laiou, Pilar Garces, Emily JH Jones, Luke Mason, Jan Buitelaar, Mark H Johnson, Declan Murphy, Eva Loth, et al. Zero-phase-delay syn- chrony between interacting neural populations: implications for functional connectivity-derived biomarkers.Imaging Neuroscience, 3:IMAG–a, 2025

  40. [40]

    Eeg microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: a review.Neuroimage, 180:577–593, 2018

    Christoph M Michel and Thomas Koenig. Eeg microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: a review.Neuroimage, 180:577–593, 2018

  41. [41]

    A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg.Data, 8(6):95, 2023

    Andreas Miltiadous, Katerina D Tzimourta, Theodora Afrantou, Panagiotis Ioannidis, Niko- laos Grigoriadis, Dimitrios G Tsalikakis, Pantelis Angelidis, Markos G Tsipouras, Euripidis Glavas, Nikolaos Giannakeas, et al. A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg.Data, 8(6):95, 2023

  42. [42]

    Contextual feature extraction hierarchies converge in large language models and the brain

    Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D Mehta, and Nima Mesgarani. Contextual feature extraction hierarchies converge in large language models and the brain. Nature Machine Intelligence, 6(12):1467–1477, 2024

  43. [43]

    Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018

    Ari Morcos, Maithra Raghu, and Samy Bengio. Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018

  44. [44]

    Usefulness of EEG techniques in distinguishing frontotemporal dementia from Alzheimer’s disease and other dementias. Disease Markers, 2018(1):6581490, 2018

    Raffaele Nardone, Luca Sebastianelli, Viviana Versace, Leopold Saltuari, Piergiorgio Lochner, Vanessa Frey, Stefan Golaszewski, Francesco Brigo, Eugen Trinka, and Yvonne Höller. Usefulness of EEG techniques in distinguishing frontotemporal dementia from Alzheimer’s disease and other dementias. Disease Markers, 2018(1):6581490, 2018

  45. [45]

    The Temple University Hospital EEG data corpus. Frontiers in Neuroscience, 10:196, 2016

    Iyad Obeid and Joseph Picone. The Temple University Hospital EEG data corpus. Frontiers in Neuroscience, 10:196, 2016

  46. [46]

    REVE: A foundation model for EEG–adapting to any setup with large-scale pretraining on 25,000 subjects. arXiv preprint arXiv:2510.21585, 2025

    Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, and Giulia Lioi. REVE: A foundation model for EEG - adapting to any setup with large-scale pretraining on 25,000 subjects. arXiv preprint arXiv:2510.21585, 2025

  47. [47]

    Updating P300: an integrative theory of P3a and P3b. Clinical Neurophysiology, 118(10):2128–2148, 2007

    John Polich. Updating P300: an integrative theory of P3a and P3b. Clinical Neurophysiology, 118(10):2128–2148, 2007

  48. [48]

    Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

    Jathurshan Pradeepkumar, Xihao Piao, Zheng Chen, and Jimeng Sun. Tokenizing single-channel EEG with time-frequency motif learning. arXiv preprint arXiv:2502.16060, 2025

  49. [49]

    The BrainLat project, a multimodal neuroimaging dataset of neurodegeneration from underrepresented backgrounds. Scientific Data, 10(1):889, 2023

    Pavel Prado, Vicente Medel, Raul Gonzalez-Gomez, Agustín Sainz-Ballesteros, Victor Vidal, Hernando Santamaría-García, Sebastian Moguilner, Jhony Mejia, Andrea Slachevsky, Maria Isabel Behrens, et al. The BrainLat project, a multimodal neuroimaging dataset of neurodegeneration from underrepresented backgrounds. Scientific Data, 10(1):889, 2023

  50. [50]

    SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems, 30, 2017

    Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems, 30, 2017

  51. [51]

    MRIcroGL: voxel-based visualization for neuroimaging. Nature Methods, 22(8):1613–1614, 2025

    Christopher Rorden. MRIcroGL: voxel-based visualization for neuroimaging. Nature Methods, 22(8):1613–1614, 2025

  52. [52]

    ChronoNet: A deep recurrent neural network for abnormal EEG identification

    Subhrajit Roy, Isabell Kiral-Kornek, and Stefan Harrer. ChronoNet: A deep recurrent neural network for abnormal EEG identification. In David Riaño, Szymon Wilk, and Annette ten Teije, editors, Artificial Intelligence in Medicine, pages 47–56, 2019

  53. [53]

    Emotion detection in the loop from brain signals and facial images

    Arman Savran, Koray Çiftçi, Guillaume Chanel, Javier Mota, Luong Viet, Bulent Sankur, Lale Akarun, Alice Caplier, and Michèle Rombaut. Emotion detection in the loop from brain signals and facial images. 01 2006

  54. [54]

    BCI2000: a general-purpose brain-computer interface (BCI) system

    G. Schalk, D.J. McFarland, T. Hinterberger, N. Birbaumer, and J.R. Wolpaw. BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE TBE, 51(6):1034–1043, 2004

  55. [55]

    Grad-CAM: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017

  56. [56]

    Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems, 31, 2018

    Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems, 31, 2018

  57. [57]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017

  58. [58]

    HBN-EEG: The FAIR implementation of the Healthy Brain Network (HBN) electroencephalography dataset. bioRxiv, pages 2024–10, 2024

    Seyed Yahya Shirazi, Alexandre Franco, Maurício Scopel Hoffmann, Nathalia B Esper, Dung Truong, Arnaud Delorme, Michael P Milham, and Scott Makeig. HBN-EEG: The FAIR implementation of the Healthy Brain Network (HBN) electroencephalography dataset. bioRxiv, pages 2024–10, 2024

  59. [59]

    LoRA vs full fine-tuning: An illusion of equivalence. arXiv preprint arXiv:2410.21228, 2024

    Reece Shuttleworth, Jacob Andreas, Antonio Torralba, and Pratyusha Sharma. LoRA vs full fine-tuning: An illusion of equivalence. arXiv preprint arXiv:2410.21228, 2024

  60. [60]

    Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170, 2021

    Yonghao Song, Xueyu Jia, Lie Yang, and Longhan Xie. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv preprint arXiv:2106.11170, 2021

  61. [61]

    EEG Conformer: Convolutional transformer for EEG decoding and visualization. TNSRE, 31:710–719, 2022

    Yonghao Song, Qingqing Zheng, Bingchuan Liu, and Xiaorong Gao. EEG Conformer: Convolutional transformer for EEG decoding and visualization. TNSRE, 31:710–719, 2022

  62. [62]

    Towards music imagery information retrieval: Introducing the OpenMIIR dataset of EEG recordings from music perception and imagination

    Sebastian Stober, Avital Sternin, Adrian M Owen, and Jessica A Grahn. Towards music imagery information retrieval: Introducing the OpenMIIR dataset of EEG recordings from music perception and imagination. In ISMIR, pages 763–769, 2015

  63. [63]

    Axiomatic attribution for deep networks

    Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017

  64. [64]

    BERT rediscovers the classical NLP pipeline

    Ian Tenney, Dipanjan Das, and Ellie Pavlick. BERT rediscovers the classical NLP pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4593–4601, 2019

  65. [65]

    Prediction of reaction time and vigilance variability from spatio-spectral features of resting-state EEG in a long sustained attention task. IEEE JBHI, 24(9):2550–2558, 2020

    Mastaneh Torkamani-Azar, Sumeyra Demir Kanik, Serap Aydin, and Mujdat Cetin. Prediction of reaction time and vigilance variability from spatio-spectral features of resting-state EEG in a long sustained attention task. IEEE JBHI, 24(9):2550–2558, 2020

  66. [66]

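    Mental effort and information-processing costs are inversely related to global brain free energy during visual categorization

    Logan T. Trujillo. Mental effort and information-processing costs are inversely related to global brain free energy during visual categorization. Frontiers in Neuroscience, 13, 2019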

  67. [67]

    The effect of electroencephalogram (EEG) reference choice on information-theoretic measures of the complexity and integration of EEG signals

    Logan T. Trujillo, Candice T. Stanfield, and Ruben D. Vela. The effect of electroencephalogram (EEG) reference choice on information-theoretic measures of the complexity and integration of EEG signals. Frontiers in Neuroscience, 11, 2017

  68. [68]

    EEG microstate sequences in healthy humans at rest reveal scale-free dynamics. Proceedings of the National Academy of Sciences, 107(42):18179–18184, 2010

    Dimitri Van de Ville, Juliane Britz, and Christoph M Michel. EEG microstate sequences in healthy humans at rest reveal scale-free dynamics. Proceedings of the National Academy of Sciences, 107(42):18179–18184, 2010

  69. [69]

    Neural discrete representation learning

    Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. In NeurIPS, 2017

  70. [70]

    Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008

  71. [71]

    Spatiotemporal characteristics of electrocortical brain activity during mental calculation. Human Brain Mapping, 35(12), 2014

    Mariska J Vansteensel, Martin G Bleichner, Zac V Freudenburg, Dora Hermes, Erik J Aarnoutse, Frans SS Leijten, Cyrille H Ferrier, Johan Martijn Jansma, and Nick F Ramsey. Spatiotemporal characteristics of electrocortical brain activity during mental calculation. Human Brain Mapping, 35(12), 2014

  72. [72]

    Dynamical relaying can yield zero time lag neuronal synchrony despite long conduction delays. Proceedings of the National Academy of Sciences, 105(44):17157–17162, 2008

    Raul Vicente, Leonardo L Gollo, Claudio R Mirasso, Ingo Fischer, and Gordon Pipa. Dynamical relaying can yield zero time lag neuronal synchrony despite long conduction delays. Proceedings of the National Academy of Sciences, 105(44):17157–17162, 2008

  73. [73]

    Electroencephalographic slowing: A primary source of error in automatic seizure detection

    E. von Weltin, T. Ahsan, V. Shah, D. Jamshed, M. Golmohammadi, I. Obeid, and J. Picone. Electroencephalographic slowing: A primary source of error in automatic seizure detection. In SPMB, pages 1–5, 2017

  74. [74]

    EEGPT: Pretrained transformer for universal and reliable representation of EEG signals

    Guangyu Wang, Wenchao Liu, Yuhong He, Cong Xu, Lin Ma, and Haifeng Li. EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. In NeurIPS, pages 39249–39280, 2024

  75. [75]

    CBraMod: A criss-cross brain foundation model for EEG decoding

    Jiquan Wang, Sha Zhao, Zhiling Luo, Yangxuan Zhou, Haiteng Jiang, Shijian Li, Tao Li, and Gang Pan. CBraMod: A criss-cross brain foundation model for EEG decoding. In ICLR, 2025

  76. [76]

    MoRE-Brain: Routed mixture of experts for interpretable and generalizable cross-subject fMRI visual decoding

    Yuxiang Wei, Yanteng Zhang, Xi Xiao, Tianyang Wang, Xiao Wang, and Vince D Calhoun. MoRE-Brain: Routed mixture of experts for interpretable and generalizable cross-subject fMRI visual decoding. arXiv preprint arXiv:2505.15946, 2025

  77. [77]

    Consistency of resting-state correlations between fMRI networks and EEG band power. Imaging Neuroscience, 3:IMAG–a, 2025

    Marta Xavier, Inês Esteves, João Jorge, Rodolfo Abreu, Anne-Lise Giraud, Sepideh Sadaghiani, Jonathan Wirsich, and Patrícia Figueiredo. Consistency of resting-state correlations between fMRI networks and EEG band power. Imaging Neuroscience, 3:IMAG–a, 2025

  78. [78]

    BIOT: Biosignal transformer for cross-data learning in the wild

    Chaoqi Yang, M Westover, and Jimeng Sun. BIOT: Biosignal transformer for cross-data learning in the wild. In NeurIPS, pages 78240–78260, 2023

  79. [79]

    Cross-modal information flow in multimodal large language models

    Zhi Zhang, Srishti Yadav, Fengze Han, and Ekaterina Shutova. Cross-modal information flow in multimodal large language models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 19781–19791, 2025

  80. [80]

    Chisco: An EEG-based BCI dataset for decoding of imagined speech. Scientific Data, 11(1):1265, 2024

    Zihan Zhang, Xiao Ding, Yu Bao, Yi Zhao, Xia Liang, Bing Qin, and Ting Liu. Chisco: An EEG-based BCI dataset for decoding of imagined speech. Scientific Data, 11(1):1265, 2024
