CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

Chenyu Liu; Feng Wu; Jingying Ma; Mengling Feng; Qika Lin; Yucheng Xing; Ziyu Jia

arxiv: 2506.09110 · v4 · submitted 2025-06-10 · 💻 cs.LG

Jingying Ma, Feng Wu, Qika Lin, Yucheng Xing, Chenyu Liu, Ziyu Jia, Mengling Feng

Pith reviewed 2026-05-19 10:10 UTC · model grok-4.3

keywords: EEG foundation model · discrete tokenizer · multi-scale architecture · temporal frequency decoupling · brain signal generalization · small-world topology · representation interpretability · pretrained EEG

The pith

Decoupling EEG signals into temporal and frequency tokens plus multi-scale attention lets a foundation model generalize across brain tasks and datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CodeBrain as a two-stage EEG foundation model that first converts raw brain signals into discrete tokens by separately handling their time-based and frequency-based parts. This tokenization step is meant to create richer, more distinguishable representations while also making the internal features easier to connect to known brain phenomena. The second stage then processes those tokens with an architecture that mixes broad global patterns and fine local details to match how brain networks are organized. When pretrained on a very large collection of EEG recordings, the resulting model is shown to handle eight different analysis tasks across ten separate datasets even when the data distributions change. A reader might care because EEG is used for real-time monitoring of brain activity, and better foundation models could reduce the need to build new systems for each new use case.

Core claim

CodeBrain is a two-stage EFM. In the first stage, the TFDual-Tokenizer decouples heterogeneous temporal and frequency EEG signals into discrete tokens, quadratically expanding the representation space to enhance discriminative power and offering domain-specific representation-level interpretability by suggesting potential links to neural events and spectral rhythms. In the second stage, the multi-scale EEGSSM architecture combines structured global convolution with sliding window attention to efficiently capture both sparse long-range and local dependencies, reflecting the brain's small-world topology. Pretrained on the largest public EEG corpus, CodeBrain achieves strong generalization across eight downstream tasks and ten datasets under distribution shifts.

What carries the argument

The TFDual-Tokenizer that separates temporal and frequency EEG components into discrete tokens, paired with the multi-scale EEGSSM that mixes global convolution and sliding-window attention to capture both long-range and local brain dependencies.
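
To make the decoupling idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes each patch is matched against small hand-written codebooks by nearest-neighbour lookup, with raw samples and a naive DFT magnitude spectrum standing in for the learned features of the real tokenizer. The names `dft_magnitudes`, `nearest_code`, and `tokenize_patch` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(patch):
    """Naive DFT magnitude spectrum (non-negative frequencies) of a real patch."""
    n = len(patch)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(patch))
        mags.append(abs(coeff))
    return mags


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def tokenize_patch(patch, temporal_codebook, frequency_codebook):
    """Return (temporal_token, frequency_token, joint_index) for one patch.

    Assumption: the two tokens are looked up independently, and the joint
    index simply enumerates the Cartesian product of the two codebooks.
    """
    t_tok = nearest_code(patch, temporal_codebook)
    f_tok = nearest_code(dft_magnitudes(patch), frequency_codebook)
    joint = t_tok * len(frequency_codebook) + f_tok
    return t_tok, f_tok, joint


# Toy codebooks (hypothetical values, far smaller than a real codebook).
temporal_codebook = [[0, 0, 0, 0], [0, 1, 0, -1], [1, 0, -1, 0]]
frequency_codebook = [[0, 0, 0], [0, 2, 0]]
print(tokenize_patch([0.0, 1.0, 0.0, -1.0], temporal_codebook, frequency_codebook))
```

With 3 temporal codes and 2 frequency codes, the joint index ranges over 3 × 2 = 6 values, which is the sense in which two separate codebooks multiply rather than add their sizes.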

If this is right

The model generalizes across eight downstream tasks on ten datasets even when data distributions shift.
Ablation studies, scaling-law analysis, and interpretability checks support the design choices.
The architecture mirrors the brain's small-world topology by handling both sparse long-range and local patterns.
Representation-level interpretability arises from linking tokens to neural events and spectral rhythms.
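
The local-plus-global mixing described in the points above can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the EEGSSM architecture: tokens are scalars, the sliding window is symmetric, and the "global" kernel is a plain hand-set causal kernel rather than a structured state-space parameterization. All function names are hypothetical.

```python
import math


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when position j lies within `window` steps of i."""
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def local_attention(queries, keys, values, window):
    """Softmax attention over scalar tokens, restricted to a sliding window."""
    n = len(queries)
    outputs = []
    for i in range(n):
        idx = [j for j in range(n) if abs(i - j) <= window]
        scores = [queries[i] * keys[j] for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * values[j] for w, j in zip(weights, idx)) / total)
    return outputs


def global_convolution(signal, kernel):
    """Causal convolution whose kernel may span the whole sequence."""
    return [
        sum(kernel[k] * signal[t - k] for k in range(min(t + 1, len(kernel))))
        for t in range(len(signal))
    ]


# A window of 1 over 8 positions keeps 22 of the 64 query-key pairs.
mask = sliding_window_mask(8, 1)
print(sum(sum(row) for row in mask), "of", 8 * 8, "pairs attended")
```

The windowed mask grows linearly with sequence length, while the long kernel lets every output position depend on distant inputs; combining the two is one way to read the "sparse long-range plus local" claim.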

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The token-based approach could make it easier to combine EEG data with recordings from other sensors without retraining from scratch.
If the scaling laws hold, larger versions of the model may continue to improve performance on rare or noisy brain signals.
Clinicians might one day inspect which tokens activate for specific symptoms, turning the model into a diagnostic aid rather than a black box.

Load-bearing premise

That separating temporal and frequency parts of EEG signals into tokens will both enlarge the space of possible representations and create meaningful links to actual brain events for interpretability.

What would settle it

Finding that CodeBrain shows no measurable gain in accuracy or robustness over prior EEG models when evaluated on the same eight tasks and ten datasets under distribution shifts.

Figures

Figures reproduced from arXiv: 2506.09110 by Chenyu Liu, Feng Wu, Jingying Ma, Mengling Feng, Qika Lin, Yucheng Xing, Ziyu Jia.

**Figure 1.** Overview of our contributions. (a) Decoupled vector quantization independently tokenizes temporal and frequency components to preserve heterogeneous EEG structures and enhance representation capacity. (b) State Space Model efficiently captures sparse global dependencies across patches. (c) Sliding Window Attention models fine-grained local dependencies within patches.

**Figure 2.** Overview of the CodeBrain framework. Left: TFDual-Tokenizer learns to discretize EEG signals into temporal and frequency tokens using two separate codebooks, by reconstructing both the temporal waveforms and the frequency-domain magnitude and phase. Right: EEGSSM learns representations by predicting the discrete tokens of masked patches generated by TFDual-Tokenizer.

**Figure 3.** Model and training data scaling laws of CodeBrain across three datasets on Cohen’s Kappa.

**Figure 4.** Decoupled time-frequency codes visualization on ISRUC_S3 dataset.

**Figure 5.** The model demonstrates a rapid initial decrease in loss during the first few epochs, followed …

**Figure 5.** Pretraining Loss Curve of TFDual-Tokenizer

**Figure 6.** Pretraining Loss Curve of TFDual-Tokenizer. Unused Codes Analysis. During pretraining, we track the number of unused codes in both the temporal and frequency codebooks of the TFDual-Tokenizer, each with a size of 4096.

**Figure 7.** Unused code dynamics of the TFDual-Tokenizer.

**Figure 8.** Pretraining Loss Curve of EEGSSM.

**Figure 9.** Class-Specific Code Ratio Across Different Codebooks.

**Figure 10.** Effect of Contrastive Loss on Temporal Codebook Learning.

**Figure 11.** Computational Overhead of Using Different Backbones in the …

**Figure 12.** Performance Across Different Mask Ratios on FACED, SEED-V, and ISRUC_S3.

**Figure 13.** EEGSSM Pre-Training Loss Curve for Different Mask Ratios.

**Figure 15.** As expected, larger pretraining data consistently lead to lower training loss, indicating more …

**Figure 14.** Training Data Scaling Laws on FACED, SEED-V, and ISRUC_S3.

**Figure 15.** EEGSSM Pre-Training Loss Curve for Different Training Data Volume

**Figure 16.** Model Size Scaling Laws on FACED, SEED-V, and ISRUC_S3.

**Figure 17.** EEGSSM Pre-Training Loss Curve for Different Model Size.
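
The figure text above mentions tracking unused codes in codebooks of size 4096. A minimal sketch of such a count follows, assuming token assignments are available as integer ids in `range(codebook_size)`. The function name `count_unused_codes` is hypothetical, and the paper's actual bookkeeping may differ.

```python
def count_unused_codes(token_ids, codebook_size):
    """Number of codebook entries that never appear in `token_ids`."""
    used = set()
    for tok in token_ids:
        if not 0 <= tok < codebook_size:
            raise ValueError(f"token id {tok} outside codebook of size {codebook_size}")
        used.add(tok)
    return codebook_size - len(used)


# Toy assignment stream (hypothetical ids) against a 4096-entry codebook.
print(count_unused_codes([0, 17, 17, 4095], 4096))
```

Calling this once per epoch for the temporal and the frequency token streams separately would give the kind of curve the unused-code figure describes.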

Original abstract

Electroencephalography (EEG) provides real-time insights into brain activity and supports diverse applications in neuroscience. While EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models, current approaches still yield clinically uninterpretable and weakly discriminative representations, inefficiently capturing global dependencies and neglecting important local neural events. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens, quadratically expanding the representation space to enhance discriminative power and offering domain-specific representation-level interpretability by suggesting potential links to neural events and spectral rhythms. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention to efficiently capture both sparse long-range and local dependencies, reflecting the brain's small-world topology. Pretrained on the largest public EEG corpus, CodeBrain achieves strong generalization across eight downstream tasks and ten datasets under distribution shifts, supported by comprehensive ablations, scaling-law analyzes, and interpretability evaluations. The code and the pretrained weights are available at https://github.com/jingyingma01/CodeBrain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CodeBrain's decoupled tokenizer and multi-scale SSM are a reasonable engineering move for EEG models, but the quadratic expansion and interpretability claims rest on assertion rather than shown derivation or controls.

CodeBrain's main new pieces are the TFDual-Tokenizer that splits temporal and frequency EEG signals into separate discrete codebooks and the EEGSSM backbone that mixes structured global convolution with sliding-window attention. They pretrain on the largest public EEG corpus and report results across eight tasks and ten datasets with distribution shifts, plus ablations and scaling checks. The code and weights are released on GitHub, which is useful on its own.

Referee Report

3 major / 2 minor

Summary. CodeBrain is a two-stage EEG foundation model. Stage one introduces the TFDual-Tokenizer that decouples temporal and frequency EEG signals into discrete tokens, claimed to quadratically expand the representation space and supply domain-specific interpretability via links to neural events and spectral rhythms. Stage two deploys the multi-scale EEGSSM architecture that combines structured global convolution with sliding-window attention to capture sparse long-range and local dependencies consistent with brain small-world topology. The model is pretrained on the largest public EEG corpus and evaluated for generalization across eight downstream tasks on ten datasets under distribution shifts, accompanied by ablations, scaling-law analyses, and interpretability studies. Code and pretrained weights are released.

Significance. If the performance and attribution claims hold, the work would advance EEG foundation models by improving both discriminative power and neurophysiological interpretability while respecting brain topology. The public release of code and weights, together with scaling-law analyses and comprehensive ablations, constitutes a clear strength that supports reproducibility and further research.

major comments (3)

§3.1 (TFDual-Tokenizer description): The assertion that decoupling into two discrete codebooks 'quadratically expands the representation space' is not accompanied by a derivation or explicit comparison. It remains unclear whether the model uses the Cartesian product of the two codebooks (size |C_t| × |C_f|) or a simple concatenation; without this formalization or an ablation against a single unified tokenizer, the claimed enhancement in discriminative power cannot be rigorously attributed to the decoupling step.
§4.3 and interpretability subsection: The claim of 'domain-specific representation-level interpretability' via potential links to neural events and spectral rhythms is presented without quantitative validation, such as alignment metrics, statistical tests, or controls that compare learned token assignments against established neurophysiological markers. This leaves the interpretability benefit motivational rather than demonstrated, weakening the link to downstream gains.
Table 2 (main results under distribution shifts): The reported generalization across ten datasets lacks error bars, confidence intervals, or statistical significance tests relative to baselines. Given that the central claim rests on 'strong generalization,' the absence of these elements makes it difficult to assess whether observed improvements are robust or attributable to the proposed architecture.
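
As an illustration of what the Table 2 comment asks for, here is a hedged Python sketch of per-method mean and standard deviation plus a paired t statistic across matched runs. The scores are placeholder numbers invented for the example, not results from the paper, and the function names are hypothetical.

```python
import math
import statistics


def mean_and_std(scores):
    """Sample mean and sample standard deviation, as used for error bars."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(model_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for matched runs or seeds."""
    if len(model_scores) != len(baseline_scores):
        raise ValueError("paired test needs equally many scores per method")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Placeholder scores for five seeds (hypothetical, for illustration only).
model = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline = [0.68, 0.69, 0.70, 0.70, 0.68]
print(mean_and_std(model))
print(paired_t_statistic(model, baseline))
```

The t statistic still has to be converted to a p-value with a t distribution; if SciPy is available, `scipy.stats.ttest_rel` performs the same paired test and returns the p-value directly.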

minor comments (2)

Abstract: 'scaling-law analyzes' should read 'scaling-law analyses'.
Figure 2 (architecture diagram): The caption and legend could more explicitly distinguish the flow from TFDual-Tokenizer outputs to the EEGSSM blocks to aid reader comprehension.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments on our manuscript. We have carefully considered each point and outline our responses below, along with the revisions we plan to implement.

Referee: §3.1 (TFDual-Tokenizer description): The assertion that decoupling into two discrete codebooks 'quadratically expands the representation space' is not accompanied by a derivation or explicit comparison. It remains unclear whether the model uses the Cartesian product of the two codebooks (size |C_t| × |C_f|) or a simple concatenation; without this formalization or an ablation against a single unified tokenizer, the claimed enhancement in discriminative power cannot be rigorously attributed to the decoupling step.

Authors: We appreciate this observation. Upon review, the TFDual-Tokenizer indeed employs separate discrete codebooks for temporal and frequency signals, with the effective representation space being the Cartesian product |C_t| × |C_f|. In the revised version, we will include a mathematical derivation of this quadratic expansion relative to a unified tokenizer, clarify the usage of the product, and add an ablation study comparing it to a single codebook approach to rigorously demonstrate the improvement in discriminative power. revision: yes
Referee: §4.3 and interpretability subsection: The claim of 'domain-specific representation-level interpretability' via potential links to neural events and spectral rhythms is presented without quantitative validation, such as alignment metrics, statistical tests, or controls that compare learned token assignments against established neurophysiological markers. This leaves the interpretability benefit motivational rather than demonstrated, weakening the link to downstream gains.

Authors: We agree that stronger quantitative evidence would enhance this section. We will revise the interpretability subsection to include quantitative validation, such as alignment metrics between token assignments and known neural events (e.g., P300, mu rhythm) and statistical tests against random baselines or controls, to better demonstrate the domain-specific interpretability and its connection to performance gains. revision: yes
Referee: Table 2 (main results under distribution shifts): The reported generalization across ten datasets lacks error bars, confidence intervals, or statistical significance tests relative to baselines. Given that the central claim rests on 'strong generalization,' the absence of these elements makes it difficult to assess whether observed improvements are robust or attributable to the proposed architecture.

Authors: We thank the referee for highlighting this. In the updated manuscript, we will augment Table 2 with error bars (standard deviations across runs or datasets), confidence intervals, and statistical significance tests (such as t-tests with p-values) against the baseline methods to provide a more robust assessment of the generalization performance under distribution shifts. revision: yes
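
One possible form of the alignment metric promised in the second response is normalized mutual information between token ids and annotated event labels. The sketch below is an assumption about how such a check could look, not something the paper or rebuttal specifies; the labels are made up for the example and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(tokens, events):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(tokens)
    joint = Counter(zip(tokens, events))
    p_tok, p_evt = Counter(tokens), Counter(events)
    return sum(
        (c / n) * math.log((c / n) / ((p_tok[t] / n) * (p_evt[e] / n)))
        for (t, e), c in joint.items()
    )


def normalized_mutual_information(tokens, events):
    """NMI in [0, 1]; defined as 0.0 when either sequence is constant."""
    h_tok, h_evt = entropy(tokens), entropy(events)
    if h_tok == 0 or h_evt == 0:
        return 0.0
    return mutual_information(tokens, events) / math.sqrt(h_tok * h_evt)


# Hypothetical per-patch token ids and event annotations.
tokens = [0, 0, 1, 1]
events = ["P300", "P300", "none", "none"]
print(normalized_mutual_information(tokens, events))
```

A control in the spirit of the response would recompute the score after shuffling `events` many times and compare the observed value against that shuffled distribution.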

Circularity Check

1 step flagged

TFDual-Tokenizer quadratic expansion presented as derived benefit but follows directly from dual codebook definition

specific steps

self-definitional [Abstract, first-stage description]
"we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens, quadratically expanding the representation space to enhance discriminative power and offering domain-specific representation-level interpretability by suggesting potential links to neural events and spectral rhythms."

The quadratic expansion is claimed as a benefit that enhances discriminative power, yet it is the immediate result of defining the tokenizer via two separate codebooks whose combined space size is their product; this holds by construction for any dual discrete tokenizer and does not require EEG-specific properties or additional derivation.

full rationale

The paper's central architectural claim in the first stage asserts that decoupling temporal and frequency signals into discrete tokens quadratically expands the representation space and supplies domain-specific interpretability. This expansion is a direct arithmetic consequence of combining two independent codebooks (product of their sizes) rather than an independent derivation from EEG signal properties or empirical validation. No equations are shown demonstrating quadratic growth beyond the definitional product, and the interpretability is framed as 'suggesting potential links' without quantitative mapping to neural events. The downstream generalization claims rest on this premise, but the expansion itself reduces to the tokenizer's construction. The multi-scale architecture and pretraining claims do not exhibit similar reductions and appear independent of fitted inputs or self-citations.
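
The definitional product referred to above can be written out as a short worked equation. This is a sketch under two assumptions: that each patch receives exactly one temporal code and one frequency code used jointly (as the simulated rebuttal states), and that both codebooks have the size 4096 quoted in the figure text. The symbol K is introduced here for illustration.

```latex
% Each patch is assigned a pair of codes, one from each codebook.
\[
  (z_t, z_f) \in \mathcal{C}_t \times \mathcal{C}_f ,
  \qquad
  |\mathcal{C}_t \times \mathcal{C}_f| = |\mathcal{C}_t| \, |\mathcal{C}_f| .
\]
% With equal codebook sizes |C_t| = |C_f| = K:
\[
  |\mathcal{C}_t \times \mathcal{C}_f| = K^2 ,
  \qquad
  \log_2 K^2 = 2 \log_2 K .
\]
% For K = 4096 = 2^{12}:
\[
  K^2 = 2^{24} = 16{,}777{,}216
  \quad \text{versus} \quad
  2K = 8192 = 2^{13}
  \text{ codes for a single codebook storing the same number of vectors.}
\]
```

The count holds for any pair of discrete codebooks, which is the point of the circularity flag: it bounds how many distinct pairs are available, not how many are actually used or how discriminative they are.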

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on the domain assumption that EEG signals contain separable temporal and frequency components whose decoupling yields interpretable tokens, plus the modeling choice that brain small-world topology is best captured by global convolution plus sliding-window attention. No explicit free parameters or invented physical entities are named in the abstract.

axioms (2)

domain assumption: EEG signals contain heterogeneous temporal and frequency components that can be decoupled into discrete tokens for improved representation.
Invoked in the description of the TFDual-Tokenizer stage.
domain assumption: The brain's small-world topology is effectively modeled by combining structured global convolution with sliding window attention.
Invoked in the description of the multi-scale EEGSSM architecture.

invented entities (2)

TFDual-Tokenizer (no independent evidence)
purpose: Decouple temporal and frequency EEG signals into discrete tokens
New component introduced in stage one to expand representation space and add interpretability.
EEGSSM (no independent evidence)
purpose: Multi-scale architecture capturing sparse long-range and local dependencies
New architecture introduced in stage two.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Foundation Model Guided Dual-Branch Co-Adaptation for Source-Free EEG Decoding
eess.SP · 2026-04 · unverdicted · novelty 7.0

FUSED integrates EEG foundation models into source-free domain adaptation via dual-branch co-adaptation, consensus filtering, and two-stage pseudo-label refinement to achieve state-of-the-art cross-subject EEG decoding.

PRiSE-EEG: A Prior-Guided Foundation Model with Depth-Stratified Experts for Cross-Paradigm EEG Representation Learning
eess.SP · 2026-05 · unverdicted · novelty 6.0

PRiSE-EEG is a prior-guided EEG foundation model that allocates shared and specialized experts across depth using CKA-derived sigmoid mappings and reports strong cross-paradigm results on 12 benchmarks.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · cited by 2 Pith papers · 6 internal anchors

[1]

Lippincott Williams & Wilkins, 2005

Ernst Niedermeyer and FH Lopes da Silva.Electroencephalography: basic principles, clinical applications, and related fields. Lippincott Williams & Wilkins, 2005

work page 2005
[2]

Eeg and meg: relevance to neuroscience.Neuron, 80(5):1112–1128, 2013

Fernando Lopes da Silva. Eeg and meg: relevance to neuroscience.Neuron, 80(5):1112–1128, 2013

work page 2013
[3]

Cognitive impairment during epileptiform discharges: is it ever justifiable to treat the eeg? The Lancet Neurology, 2(12):725–730, 2003

Colin D Binnie. Cognitive impairment during epileptiform discharges: is it ever justifiable to treat the eeg? The Lancet Neurology, 2(12):725–730, 2003

work page 2003
[4]

Xsleepnet: Multi-view sequential model for automatic sleep staging.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5903–5915, 2021

Huy Phan, Oliver Y Chén, Minh C Tran, Philipp Koch, Alfred Mertins, and Maarten De Vos. Xsleepnet: Multi-view sequential model for automatic sleep staging.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5903–5915, 2021

work page 2021
[5]

Explainable vision transformer for automatic visual sleep staging on multimodal psg signals

Hyojin Lee, You Rim Choi, Hyun Kyung Lee, Jaemin Jeong, Joopyo Hong, Hyun-Woo Shin, and Hyung-Sin Kim. Explainable vision transformer for automatic visual sleep staging on multimodal psg signals. npj Digital Medicine, 8(1):55, 2025

work page 2025
[6]

St-usleepnet: A spatial-temporal coupling prominence network for multi-channel sleep staging.arXiv preprint arXiv:2408.11884, 2024

Jingying Ma, Qika Lin, Ziyu Jia, and Mengling Feng. St-usleepnet: A spatial-temporal coupling prominence network for multi-channel sleep staging.arXiv preprint arXiv:2408.11884, 2024

work page arXiv 2024
[7]

Sst-emotionnet: Spatial-spectral-temporal based attention 3d dense network for eeg emotion recognition

Ziyu Jia, Youfang Lin, Xiyang Cai, Haobin Chen, Haijun Gou, and Jing Wang. Sst-emotionnet: Spatial-spectral-temporal based attention 3d dense network for eeg emotion recognition. In Proceedings of the 28th ACM international conference on multimedia, pages 2909–2917, 2020

work page 2020
[8]

Eeg emotion recognition based on dynamical graph attention network

Yi Guo, Chao Tang, Hao Wu, and Badong Chen. Eeg emotion recognition based on dynamical graph attention network. InICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1921–1925. IEEE, 2024

work page 2024
[9]

Dmmr: Cross-subject domain generalization for eeg-based emotion recognition via denoising mixed mutual reconstruction

Yiming Wang, Bin Zhang, and Yujiao Tang. Dmmr: Cross-subject domain generalization for eeg-based emotion recognition via denoising mixed mutual reconstruction. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 628–636, 2024

work page 2024
[10]

St-gf: Graph-based fusion of spatial and temporal features for eeg motor imagery decoding

Xuhui Wang, Kui Zhao, Enze Shi, Sigang Yu, Geng Chen, and Shu Zhang. St-gf: Graph-based fusion of spatial and temporal features for eeg motor imagery decoding. In2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 3811–3816. IEEE, 2024

work page 2024
[11]

Emre Arı and Ertuğrul Taçgın. Nf-eeg: A generalized cnn model for multi class eeg motor imagery classification without signal preprocessing for brain computer interfaces.Biomedical Signal Processing and Control, 92:106081, 2024

work page 2024
[12]

Learning space-time-frequency representation with two-stream attention based 3d network for motor imagery classification

Zhenqi Li, Jing Wang, Ziyu Jia, and Youfang Lin. Learning space-time-frequency representation with two-stream attention based 3d network for motor imagery classification. In2020 IEEE International Conference on Data Mining (ICDM), pages 1124–1129. IEEE, 2020

work page 2020
[13]

Exploring the diagnostic potential of llms in schizophrenia detection through eeg analysis

Michele Guerra, Roberto Milanese, Michele Deodato, Madalina G Ciobanu, and Fausto Fasano. Exploring the diagnostic potential of llms in schizophrenia detection through eeg analysis. In2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 6812–6819. IEEE, 2024

work page 2024
[14]

Exploring large-scale language models to evaluate eeg-based multimodal data for mental health

Yongquan Hu, Shuning Zhang, Ting Dang, Hong Jia, Flora D Salim, Wen Hu, and Aaron J Quigley. Exploring large-scale language models to evaluate eeg-based multimodal data for mental health. In Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 412–417, 2024

work page 2024
[15]

Brain foundation models: A survey on advancements in neural signal processing and brain discovery

Xinliang Zhou, Chenyu Liu, Zhisheng Chen, Kun Wang, Yi Ding, Ziyu Jia, and Qingsong Wen. Brain foundation models: A survey on advancements in neural signal processing and brain discovery. arXiv preprint arXiv:2503.00580, 2025

work page arXiv 2025
[16]

Neural discrete representation learning

Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. InAdvances in neural information processing systems, volume 30, 2017. 11

work page 2017
[17]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[18]

Eegpt: Pretrained transformer for universal and reliable representation of eeg signals

Guangyu Wang, Wenchao Liu, Yuhong He, Cong Xu, Lin Ma, and Haifeng Li. Eegpt: Pretrained transformer for universal and reliable representation of eeg signals. In Advances in Neural Information Processing Systems, volume 37, pages 39249–39280, 2024

work page 2024
[19]

Cbramod: A criss-cross brain foundation model for eeg decoding

Jiquan Wang, Sha Zhao, Zhiling Luo, Yangxuan Zhou, Haiteng Jiang, Shijian Li, Tao Li, and Gang Pan. Cbramod: A criss-cross brain foundation model for eeg decoding. InThe Third International Conference on Learning Representations, 2025

work page 2025
[20]

Large brain model for learning generic represen- tations with tremendous eeg data in bci

Weibang Jiang, Liming Zhao, and Bao-liang Lu. Large brain model for learning generic represen- tations with tremendous eeg data in bci. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[21]

Tokenizing Single-Channel EEG with Time-Frequency Motif Learning

Jathurshan Pradeepkumar, Xihao Piao, Zheng Chen, and Jimeng Sun. Single-channel eeg tok- enization through time-frequency modeling.arXiv preprint arXiv:2502.16060, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

Online clustered codebook

Chuanxia Zheng and Andrea Vedaldi. Online clustered codebook. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22798–22807, 2023

work page 2023
[23]

Finite scalar quantiza- tion: Vq-vae made simple

Fabian Mentzer, David Minnen, Eirikur Agustsson, and Michael Tschannen. Finite scalar quantiza- tion: Vq-vae made simple. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[24]

Decomposing eeg data into space–time–frequency components using parallel factor analysis.NeuroImage, 22(3):1035–1045, 2004

Fumikazu Miwakeichi, Eduardo Martınez-Montes, Pedro A Valdés-Sosa, Nobuaki Nishiyama, Hiroaki Mizuhara, and Yoko Yamaguchi. Decomposing eeg data into space–time–frequency components using parallel factor analysis.NeuroImage, 22(3):1035–1045, 2004

work page 2004
[25]

Vector quantization for recommender systems: a review and outlook

Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, and Xiao-Ming Wu. Vector quantization for recommender systems: a review and outlook. arXiv preprint arXiv:2405.03110, 2024

work page arXiv 2024
[26]

Complex brain networks: graph theoretical analysis of structural and functional systems.Nature reviews neuroscience, 10(3):186–198, 2009

Ed Bullmore and Olaf Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems.Nature reviews neuroscience, 10(3):186–198, 2009

work page 2009
[27]

Small-world brain networks

Danielle Smith Bassett and ED Bullmore. Small-world brain networks. The neuroscientist, 12(6):512–523, 2006

work page 2006
[28]

Uncovering intrinsic modular organization of spontaneous brain activity in humans.PloS one, 4(4):e5226, 2009

Yong He, Jinhui Wang, Liang Wang, Zhang J Chen, Chaogan Yan, Hong Yang, Hehan Tang, Chaozhe Zhu, Qiyong Gong, Yufeng Zang, et al. Uncovering intrinsic modular organization of spontaneous brain activity in humans.PloS one, 4(4):e5226, 2009

work page 2009
[29]

Biot: Biosignal transformer for cross-data learning in the wild

Chaoqi Yang, M Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. InAdvances in Neural Information Processing Systems, volume 36, pages 78240–78260, 2023

work page 2023
[30]

Eeg2rep: enhancing self-supervised eeg representation through informative masked inputs

Navid Mohammadi Foumani, Geoffrey Mackellar, Soheila Ghane, Saad Irtza, Nam Nguyen, and Mahsa Salehi. Eeg2rep: enhancing self-supervised eeg representation through informative masked inputs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5544–5555, 2024

work page 2024
[31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in neural information processing systems, volume 30, 2017

work page 2017
[32]

Long range arena : A benchmark for efficient transformers

Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. Long range arena : A benchmark for efficient transformers. In The Ninth International Conference on Learning Representations, 2021. 12

work page 2021
[33]

An Efficient Self-Supervised Framework for Long-Sequence EEG Modeling

Jiazhen Hong, Geoffrey Mackellar, and Soheila Ghane. Eegm2: An efficient mamba-2-based self-supervised framework for long-sequence eeg modeling.arXiv preprint arXiv:2502.17873, 2025

[34] Anna Tegon, Thorir Mar Ingolfsson, Xiaying Wang, Luca Benini, and Yawei Li. Femba: Efficient and scalable eeg analysis with a bidirectional mamba foundation model. arXiv preprint arXiv:2502.06438, 2025.

[35] William O Tatum IV. Handbook of EEG interpretation. Springer Publishing Company, 2021.

[36] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.

[37] Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. In ICLR, 2023.

[38] Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, and Debadeepta Dey. What makes convolutional models great on long sequence modeling? arXiv preprint arXiv:2210.09298, 2022.

[39] Albert Gu, Karan Goel, Ankit Gupta, and Christopher Ré. On the parameterization and initialization of diagonal state space models. Advances in Neural Information Processing Systems, 35:35971–35983, 2022.

[40] Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020.

[41] Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al. Internimage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14408–14419, 2023.

[42] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

[43] Ziyu Jia, Youfang Lin, Jing Wang, Ronghao Zhou, Xiaojun Ning, Yuanlai He, and Yaoshuai Zhao. Graphsleepnet: Adaptive spatial-temporal graph convolutional networks for sleep stage classification. In IJCAI, volume 2021, pages 1324–1330, 2020.

[44] Jiquan Wang, Sha Zhao, Haiteng Jiang, Yangxuan Zhou, Zhenghe Yu, Tao Li, Shijian Li, and Gang Pan. Caresleepnet: a hybrid deep learning network for automatic sleep staging. IEEE Journal of Biomedical and Health Informatics, 2024.

[45] Zheng Chen, Yasuko Matsubara, Yasushi Sakurai, and Jimeng Sun. Long-term eeg partitioning for seizure onset detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14221–14229, 2025.

[46] Chi-Sheng Chen, Ying-Jung Chen, and Aidan Hung-Wen Tsai. Large cognition model: Towards pretrained eeg foundation model. arXiv preprint arXiv:2502.17464, 2025.

[47] Demetres Kostas, Stephane Aroca-Ouellette, and Frank Rudzicz. Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data. Frontiers in Human Neuroscience, 15:653659, 2021.

[48] Daoze Zhang, Zhizhang Yuan, Yang Yang, Junru Chen, Jingjing Wang, and Yafeng Li. Brant: Foundation model for intracranial neural signal. Advances in Neural Information Processing Systems, 36:26304–26321, 2023.

[49] Zhizhang Yuan, Daoze Zhang, Junru Chen, Gefei Gu, and Yang Yang. Brant-2: Foundation model for brain signals. CoRR, 2024.

[50] Daoze Zhang, Zhizhang Yuan, Junru Chen, Kerui Chen, and Yang Yang. Brant-x: A unified physiological signal alignment framework. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4155–4166, 2024.

[51] Weibang Jiang, Yansen Wang, Bao-liang Lu, and Dongsheng Li. Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals. In The Thirteenth International Conference on Learning Representations, 2025.

[52] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725. Association for Computational Linguistics (ACL), 2016.

[53] Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, 2018.

[54] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018.

[55] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. Hippo: Recurrent memory with optimal polynomial projections. Advances in neural information processing systems, 33:1474–1487, 2020.

[56] Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022.

[57] Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, and Collin M. Stultz. Sequential multi-dimensional self-supervised learning for clinical time series, 2023.

[58] Tri Dao, Daniel Y Fu, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher Ré. Hungry hungry hippos: Towards language modeling with state space models. arXiv preprint arXiv:2212.14052, 2022.

[59] Linqi Zhou, Michael Poli, Winnie Xu, Stefano Massaroli, and Stefano Ermon. Deep latent state space models for time-series generation. In International Conference on Machine Learning, pages 42625–42643. PMLR, 2023.

[60] Xuan-The Tran, Linh Le, Quoc Toan Nguyen, Thomas Do, and Chin-Teng Lin. Eeg-ssm: Leveraging state-space model for dementia detection. arXiv preprint arXiv:2407.17801, 2024.

[61] Yiyu Gui, MingZhi Chen, Yuqi Su, Guibo Luo, and Yuchao Yang. Eegmamba: Bidirectional state space model with mixture of experts for eeg multi-task classification, 2024.

[62] James W Cooley and John W Tukey. An algorithm for the machine calculation of complex fourier series. Mathematics of computation, 19(90):297–301, 1965.

[63] Dani Kiyasseh, Tingting Zhu, and David A Clifton. Clocs: Contrastive learning of cardiac signals across space, time, and patients. In International Conference on Machine Learning, pages 5606–5615. PMLR, 2021.

[64] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.

[65] Biao Zhang and Rico Sennrich. Root mean square layer normalization, 2019.

[66] Iyad Obeid and Joseph Picone. The temple university hospital eeg data corpus. Frontiers in neuroscience, 10:196, 2016.

[67] Jasper HH. The ten-twenty electrode system of the international federation. Electroenceph clin Neurophysiol, 10:367–380, 1958.

[68] Jingjing Chen, Xiaobin Wang, Chen Huang, Xin Hu, Xinke Shen, and Dan Zhang. A large finer-grained affective computing eeg dataset. Scientific Data, 10(1):740, 2023.

[69] Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Transactions on Cognitive and Developmental Systems, 14(2):715–729, 2021.

[70] Sirvan Khalighi, Teresa Sousa, José Moutinho Santos, and Urbano Nunes. Isruc-sleep: A comprehensive public dataset for sleep researchers. Computer methods and programs in biomedicine, 124:180–192, 2016.

[71] Ji-Hoon Jeong, Jeong-Hyun Cho, Young-Eun Lee, Seo-Hyun Lee, Gi-Hwan Shin, Young-Seok Kweon, José del R Millán, Klaus-Robert Müller, and Seong-Whan Lee. 2020 international brain–computer interface competition: A review. Frontiers in human neuroscience, 16:898300, 2022.

[72] Ali Hossam Shoeb. Application of machine learning to epileptic seizure onset detection and treatment. PhD thesis, Massachusetts Institute of Technology, 2009.

[73] Wajid Mumtaz. MDD Patients and Healthy Controls EEG Data (New). Figshare, November 2016.

[74] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.

[75] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In The Tenth International Conference on Learning Representations, 2022.

[76] Daniel Y Fu, Elliot L Epstein, Eric Nguyen, Armin W Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré. Simple hardware-efficient long convolutions for sequence modeling. In International Conference on Machine Learning, pages 10373–10391. PMLR, 2023.

[77] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021.

[78] Richard B Berry, Rita Brooks, Charlene E Gamaldo, Susan M Harding, Carole Marcus, Bradley V Vaughn, et al. The aasm manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine, 176(2012):7, 2012.

[79] Vernon J Lawhern, Amelia J Solon, Nicholas R Waytowich, Stephen M Gordon, Chou P Hung, and Brent J Lance. Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces. Journal of neural engineering, 15(5):056013, 2018.

[80] Yonghao Song, Xueyu Jia, Lie Yang, and Longhan Xie. Transformer-based spatial-temporal feature learning for eeg decoding. arXiv preprint arXiv:2106.11170, 2021.

A Preliminaries

Convolution State Space Models. The state-space model is a classic model in control theory, and it represents the operational state of a system using first-order differential equa...

[28] [28]

Uncovering intrinsic modular organization of spontaneous brain activity in humans.PloS one, 4(4):e5226, 2009

Yong He, Jinhui Wang, Liang Wang, Zhang J Chen, Chaogan Yan, Hong Yang, Hehan Tang, Chaozhe Zhu, Qiyong Gong, Yufeng Zang, et al. Uncovering intrinsic modular organization of spontaneous brain activity in humans.PloS one, 4(4):e5226, 2009

work page 2009

[29] [29]

Biot: Biosignal transformer for cross-data learning in the wild

Chaoqi Yang, M Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. InAdvances in Neural Information Processing Systems, volume 36, pages 78240–78260, 2023

work page 2023

[30] [30]

Eeg2rep: enhancing self-supervised eeg representation through informative masked inputs

Navid Mohammadi Foumani, Geoffrey Mackellar, Soheila Ghane, Saad Irtza, Nam Nguyen, and Mahsa Salehi. Eeg2rep: enhancing self-supervised eeg representation through informative masked inputs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5544–5555, 2024

work page 2024

[31] [31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in neural information processing systems, volume 30, 2017

work page 2017

[32] [32]

Long range arena : A benchmark for efficient transformers

Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. Long range arena : A benchmark for efficient transformers. In The Ninth International Conference on Learning Representations, 2021. 12

work page 2021

[33] [33]

An Efficient Self-Supervised Framework for Long-Sequence EEG Modeling

Jiazhen Hong, Geoffrey Mackellar, and Soheila Ghane. Eegm2: An efficient mamba-2-based self-supervised framework for long-sequence eeg modeling.arXiv preprint arXiv:2502.17873, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[34] [34]

Femba: Effi- cient and scalable eeg analysis with a bidirectional mamba foundation model.arXiv preprint arXiv:2502.06438, 2025

Anna Tegon, Thorir Mar Ingolfsson, Xiaying Wang, Luca Benini, and Yawei Li. Femba: Effi- cient and scalable eeg analysis with a bidirectional mamba foundation model.arXiv preprint arXiv:2502.06438, 2025

work page arXiv 2025

[35] [35]

Springer Publishing Company, 2021

William O Tatum IV.Handbook of EEG interpretation. Springer Publishing Company, 2021

work page 2021

[36] [36]

Efficiently Modeling Long Sequences with Structured State Spaces

Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[37] [37]

Simplified state space layers for sequence modeling

Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. InICLR, 2023

work page 2023

[38] [38]

What makes convolutional models great on long sequence modeling?arXiv preprint arXiv:2210.09298, 2022

Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, and Debadeepta Dey. What makes convolutional models great on long sequence modeling?arXiv preprint arXiv:2210.09298, 2022

work page arXiv 2022

[39] [39]

On the parameterization and ini- tialization of diagonal state space models.Advances in Neural Information Processing Systems, 35:35971–35983, 2022

Albert Gu, Karan Goel, Ankit Gupta, and Christopher Ré. On the parameterization and ini- tialization of diagonal state space models.Advances in Neural Information Processing Systems, 35:35971–35983, 2022

work page 2022

[40] [40]

Longformer: The Long-Document Transformer

Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2004

[41] [41]

Internimage: Exploring large-scale vision foundation models with deformable convolutions

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al. Internimage: Exploring large-scale vision foundation models with deformable convolutions. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14408–14419, 2023

work page 2023

[42] [42]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[43] [43]

Graphsleepnet: Adaptive spatial-temporal graph convolutional networks for sleep stage classification

Ziyu Jia, Youfang Lin, Jing Wang, Ronghao Zhou, Xiaojun Ning, Yuanlai He, and Yaoshuai Zhao. Graphsleepnet: Adaptive spatial-temporal graph convolutional networks for sleep stage classification. InIjcai, volume 2021, pages 1324–1330, 2020

work page 2021

[44] [44]

Caresleepnet: a hybrid deep learning network for automatic sleep staging.IEEE Journal of Biomedical and Health Informatics, 2024

Jiquan Wang, Sha Zhao, Haiteng Jiang, Yangxuan Zhou, Zhenghe Yu, Tao Li, Shijian Li, and Gang Pan. Caresleepnet: a hybrid deep learning network for automatic sleep staging.IEEE Journal of Biomedical and Health Informatics, 2024

work page 2024

[45] [45]

Long-term eeg partitioning for seizure onset detection

Zheng Chen, Yasuko Matsubara, Yasushi Sakurai, and Jimeng Sun. Long-term eeg partitioning for seizure onset detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14221–14229, 2025

work page 2025

[46] [46]

Large cognition model: Towards pretrained eeg foundation model.arXiv preprint arXiv:2502.17464, 2025

Chi-Sheng Chen, Ying-Jung Chen, and Aidan Hung-Wen Tsai. Large cognition model: Towards pretrained eeg foundation model.arXiv preprint arXiv:2502.17464, 2025

work page arXiv 2025

[47] [47]

Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data.Frontiers in Human Neuroscience, 15:653659, 2021

Demetres Kostas, Stephane Aroca-Ouellette, and Frank Rudzicz. Bendr: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of eeg data.Frontiers in Human Neuroscience, 15:653659, 2021

work page 2021

[48] [48]

Brant: Foundation model for intracranial neural signal

Daoze Zhang, Zhizhang Yuan, Yang Yang, Junru Chen, Jingjing Wang, and Yafeng Li. Brant: Foundation model for intracranial neural signal. Advances in Neural Information Processing Systems, 36:26304–26321, 2023

work page 2023

[49] [49]

Brant-2: Foundation model for brain signals.CoRR, 2024

Zhizhang Yuan, Daoze Zhang, Junru Chen, Gefei Gu, and Yang Yang. Brant-2: Foundation model for brain signals.CoRR, 2024

work page 2024

[50] [50]

Brant-x: A unified physiological signal alignment framework

Daoze Zhang, Zhizhang Yuan, Junru Chen, Kerui Chen, and Yang Yang. Brant-x: A unified physiological signal alignment framework. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4155–4166, 2024. 13

work page 2024

[51] [51]

Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals

Weibang Jiang, Yansen Wang, Bao-liang Lu, and Dongsheng Li. Neurolm: A universal multi-task foundation model for bridging the gap between language and eeg signals. InThe Thirteenth International Conference on Learning Representations, 2025

work page 2025

[52] [52]

Neural machine translation of rare words with subword units

Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725. Association for Computational Linguistics (ACL), 2016

work page 2016

[53] [53]

Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, 2018

work page 2018

[54] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. Advances in neural information processing systems, 31, 2018

[55] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. Hippo: Recurrent memory with optimal polynomial projections. Advances in neural information processing systems, 33:1474–1487, 2020

[56] Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022

[57] Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, and Collin M. Stultz. Sequential multi-dimensional self-supervised learning for clinical time series, 2023

[58] Daniel Y Fu, Tri Dao, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher Ré. Hungry hungry hippos: Towards language modeling with state space models. arXiv preprint arXiv:2212.14052, 2022

[59] Linqi Zhou, Michael Poli, Winnie Xu, Stefano Massaroli, and Stefano Ermon. Deep latent state space models for time-series generation. In International Conference on Machine Learning, pages 42625–42643. PMLR, 2023

[60] Xuan-The Tran, Linh Le, Quoc Toan Nguyen, Thomas Do, and Chin-Teng Lin. Eeg-ssm: Leveraging state-space model for dementia detection. arXiv preprint arXiv:2407.17801, 2024

[61] Yiyu Gui, MingZhi Chen, Yuqi Su, Guibo Luo, and Yuchao Yang. Eegmamba: Bidirectional state space model with mixture of experts for eeg multi-task classification, 2024

[62] James W Cooley and John W Tukey. An algorithm for the machine calculation of complex fourier series. Mathematics of computation, 19(90):297–301, 1965

[63] Dani Kiyasseh, Tingting Zhu, and David A Clifton. Clocs: Contrastive learning of cardiac signals across space, time, and patients. In International Conference on Machine Learning, pages 5606–5615. PMLR, 2021

[64] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020

[65] Biao Zhang and Rico Sennrich. Root mean square layer normalization, 2019

[66] Iyad Obeid and Joseph Picone. The temple university hospital eeg data corpus. Frontiers in neuroscience, 10:196, 2016

[67] Jasper HH. The ten-twenty electrode system of the international federation. Electroenceph clin Neurophysiol, 10:367–380, 1958

[68] Jingjing Chen, Xiaobin Wang, Chen Huang, Xin Hu, Xinke Shen, and Dan Zhang. A large finer-grained affective computing eeg dataset. Scientific Data, 10(1):740, 2023

[69] Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition. IEEE Transactions on Cognitive and Developmental Systems, 14(2):715–729, 2021

[70] Sirvan Khalighi, Teresa Sousa, José Moutinho Santos, and Urbano Nunes. Isruc-sleep: A comprehensive public dataset for sleep researchers. Computer methods and programs in biomedicine, 124:180–192, 2016

[71] Ji-Hoon Jeong, Jeong-Hyun Cho, Young-Eun Lee, Seo-Hyun Lee, Gi-Hwan Shin, Young-Seok Kweon, José del R Millán, Klaus-Robert Müller, and Seong-Whan Lee. 2020 international brain–computer interface competition: A review. Frontiers in human neuroscience, 16:898300, 2022

[72] Ali Hossam Shoeb. Application of machine learning to epileptic seizure onset detection and treatment. PhD thesis, Massachusetts Institute of Technology, 2009

[73] Wajid Mumtaz. MDD Patients and Healthy Controls EEG Data (New). Figshare, November 2016

[74] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000

[75] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In The Tenth International Conference on Learning Representations, 2022

[76] Daniel Y Fu, Elliot L Epstein, Eric Nguyen, Armin W Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré. Simple hardware-efficient long convolutions for sequence modeling. In International Conference on Machine Learning, pages 10373–10391. PMLR, 2023

[77] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021

[78] Richard B Berry, Rita Brooks, Charlene E Gamaldo, Susan M Harding, Carole Marcus, Bradley V Vaughn, et al. The aasm manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine, 176(2012):7, 2012

[79] Vernon J Lawhern, Amelia J Solon, Nicholas R Waytowich, Stephen M Gordon, Chou P Hung, and Brent J Lance. Eegnet: a compact convolutional neural network for eeg-based brain–computer interfaces. Journal of neural engineering, 15(5):056013, 2018

[80] Yonghao Song, Xueyu Jia, Lie Yang, and Longhan Xie. Transformer-based spatial-temporal feature learning for eeg decoding. arXiv preprint arXiv:2106.11170, 2021

A Preliminaries

Convolution State Space Models

The state-space model is a classic model in control theory, and it represents the operational state of a system using first-order differential equa...