An Efficient Self-Supervised Framework for Long-Sequence EEG Modeling

Geoffrey Mackellar; Jiazhen Hong; Soheila Ghane

arxiv: 2502.17873 · v2 · submitted 2025-02-25 · 💻 cs.LG · eess.SP

Pith reviewed 2026-05-23 02:29 UTC · model grok-4.3

keywords: EEG · self-supervised learning · Mamba · long-sequence modeling · brain-computer interface · signal reconstruction · U-shaped architecture

The pith

EEGM2 uses Mamba-2 in a U-shaped encoder-decoder for linear-complexity self-supervised EEG modeling that captures long-range dependencies in raw signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EEGM2 as a self-supervised framework that replaces Transformer architectures with Mamba-2 to handle long EEG sequences efficiently. It combines a U-shaped encoder-decoder structure with a reconstruction objective that mixes L1 loss and Fourier spectral loss to preserve both temporal dynamics and frequency content. This design yields linear scaling in computation and memory while maintaining performance on classification tasks. The resulting model shows strong cross-subject generalization and domain transfer, positioning it for use in resource-limited brain-computer interface hardware.

Core claim

EEGM2 adopts a U-shaped encoder-decoder architecture integrated with Mamba-2 to achieve linear computational complexity, thereby reducing memory usage and improving inference speed. The selective information propagation mechanism of Mamba-2 enables the model to effectively capture and preserve long-range dependencies in raw EEG signals. It employs a self-supervised pre-training objective that reconstructs raw EEG using a combined L1 and spectral (Fourier-based) loss, enhancing generalization by jointly preserving temporal dynamics and spectral characteristics.
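The U-shaped data flow named in the claim can be sketched in a few lines. This is an illustrative skeleton only: it assumes a single channel, pairwise averaging for downsampling, sample repetition for upsampling, and additive skip connections, none of which is confirmed by the abstract. The `block` function is a hypothetical placeholder marking where a Mamba-2 layer would sit.

```python
def downsample(x):
    """Halve the length by averaging adjacent pairs (assumes even length)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the length by repeating each sample."""
    return [v for v in x for _ in range(2)]


def block(x):
    """Hypothetical stand-in for a sequence-mixing layer (identity here)."""
    return list(x)


def u_shaped_forward(x, depth=2):
    """Encoder stores one skip per level; decoder adds each skip back."""
    skips = []
    for _ in range(depth):
        x = block(x)
        skips.append(x)
        x = downsample(x)
    x = block(x)  # bottleneck at the coarsest resolution
    for skip in reversed(skips):
        x = upsample(x)
        x = [a + b for a, b in zip(x, skip)]  # additive skip (assumption)
        x = block(x)
    return x


signal = [float(i % 4) for i in range(16)]
output = u_shaped_forward(signal, depth=2)  # same length as the input
```

The point of the shape is that the output has the same length as the input, so a sample-wise reconstruction objective can be applied directly to raw signals.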

What carries the argument

U-shaped encoder-decoder with Mamba-2 blocks and combined L1 plus Fourier spectral reconstruction loss for self-supervised pre-training on raw EEG.
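A rough sketch of a combined time-domain and spectral reconstruction loss follows. The abstract does not give the exact form of the spectral term or its weighting, so this version compares DFT magnitude spectra with a mean absolute difference, and `weight` is a hypothetical parameter introduced for illustration.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def l1_loss(pred, target):
    """Mean absolute error in the time domain."""
    return sum(abs(p - q) for p, q in zip(pred, target)) / len(target)


def spectral_loss(pred, target):
    """Mean absolute difference between magnitude spectra (assumed form)."""
    pm, tm = dft_magnitudes(pred), dft_magnitudes(target)
    return sum(abs(a - b) for a, b in zip(pm, tm)) / len(tm)


def combined_loss(pred, target, weight=0.5):
    """L1 term plus a weighted spectral term; `weight` is hypothetical."""
    return l1_loss(pred, target) + weight * spectral_loss(pred, target)


n = 16
target = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
shifted = target[2:] + target[:2]  # circular time shift of the same waveform
time_error = l1_loss(shifted, target)            # clearly non-zero
frequency_error = spectral_loss(shifted, target)  # near zero
```

The shifted example shows why the two terms are complementary: a circular shift leaves the magnitude spectrum unchanged, so only the L1 term penalizes the timing error, while the spectral term penalizes errors in frequency content.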

If this is right

Achieves state-of-the-art results on both short- and long-sequence EEG modeling and classification.
Demonstrates consistent outperformance with strong generalization across subjects, tasks, and domains.
Provides an efficient, scalable option for deployment on resource-constrained BCI devices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same linear-complexity backbone could be tested on other noisy biomedical time series such as ECG or EMG.
Real-time inference on wearable hardware becomes feasible once memory and speed scale linearly with sequence length.
Joint temporal-spectral reconstruction may transfer to pre-training on other non-stationary signals outside neuroscience.

Load-bearing premise

Mamba-2's selective propagation mechanism successfully captures long-range dependencies in raw EEG signals where RNNs and CNNs fail.
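To make the premise concrete, here is a toy input-dependent gated recurrence. It is not Mamba-2's actual parameterization: the scalar state, the sigmoid gate, and the parameters `w_gate` and `b_gate` are assumptions chosen only to show how a gate computed from the input decides how much past state is kept, in a single pass over the sequence.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(x, w_gate=1.0, b_gate=0.0):
    """Toy recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t depends on the current input, so the model can choose,
    per step, between retaining the old state and writing the new input.
    One pass over the sequence, one scalar of state.
    """
    h = 0.0
    outputs = []
    for x_t in x:
        a_t = sigmoid(w_gate * x_t + b_gate)  # input-dependent retention
        h = a_t * h + (1.0 - a_t) * x_t
        outputs.append(h)
    return outputs


trace = selective_scan([1.0] * 5)
```

Whether gates learned on low-SNR EEG actually stay close to 1 over long spans is the empirical question the premise leaves open.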

What would settle it

An experiment showing EEGM2 fails to match or exceed baseline performance on long-sequence EEG classification or reconstruction tasks would disprove the claimed advantages.

Figures

Figures reproduced from arXiv: 2502.17873 by Geoffrey Mackellar, Jiazhen Hong, Soheila Ghane.

Figure 1: Overview of the EEGM2 framework. (a) Reconstruction-based self-supervised pretraining, where the model learns to ...

Figure 2: Comparison of 2D t-SNE projections of (a) raw mean ...

Figure 3: Memory usage and inference speed across varying ...

Figure 4: Training time of EEGM2 and its variants across four ...

Original abstract

Electroencephalogram (EEG) signals generally exhibit low signal-to-noise ratio (SNR) and high inter-subject variability, making generalization across subjects and domains challenging. Recent advances in deep learning, particularly self-supervised learning with Transformer-based architectures, have shown promise in EEG representation learning. However, their quadratic computational complexity increases memory usage and slows inference, making them inefficient for modeling long-range dependencies. Moreover, most existing approaches emphasize either explicit window segmentation of the temporal signal or spectral-only input embedding while neglecting raw temporal dynamics. In this paper, we propose EEGM2, a self-supervised framework that overcomes these limitations. EEGM2 adopts a U-shaped encoder-decoder architecture integrated with Mamba-2 to achieve linear computational complexity, thereby reducing memory usage and improving inference speed. Meanwhile, the selective information propagation mechanism of Mamba-2 enables the model to effectively capture and preserve long-range dependencies in raw EEG signals, where traditional RNN or CNN architectures often struggle. Moreover, EEGM2 employs a self-supervised pre-training objective that reconstructs raw EEG using a combined L1 and spectral (Fourier-based) loss, enhancing generalization by jointly preserving temporal dynamics and spectral characteristics. Experimental results demonstrate that EEGM2 achieves state-of-the-art performance in both short- and long-sequence modeling and classification. Further evaluations show that EEGM2 consistently outperforms existing models, demonstrating strong generalization across subjects and tasks, as well as transferability across domains. Overall, EEGM2 offers an efficient and scalable solution suitable for deployment on resource-constrained brain-computer interface (BCI) devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EEGM2 applies Mamba-2 in a U-shaped self-supervised setup for EEG but the abstract states the long-range dependency advantage without any supporting analysis or ablations.

The paper introduces EEGM2: a U-shaped encoder-decoder that uses Mamba-2 blocks instead of attention, pretrains with a combined L1 and Fourier reconstruction loss on raw EEG, and claims linear complexity plus SOTA results on both short and long sequences plus cross-subject and cross-domain transfer. That combination is the main new piece. It correctly identifies the quadratic cost problem with transformers on long raw signals and tries to keep both temporal and spectral structure in the objective, which is a reasonable direction for resource-limited BCI work. The architecture choice itself is straightforward once you know the Mamba papers.

The soft spot is exactly the one the stress test flags. The abstract asserts that Mamba-2's selective propagation captures long-range dependencies in low-SNR EEG where RNNs and CNNs fail, yet supplies no length-ablation results, no state-decay measurements on actual EEG traces, and no comparison isolating the SSM component from the U-shape or the loss. The generalization and transfer claims therefore sit on an untested premise. From the abstract alone the soundness is low because the central efficiency and modeling advantage is not demonstrated.

If the full paper contains those controls and the numbers are clean, the work would be worth citing for anyone building edge BCI pipelines. Otherwise it reads as an application note without the evidence needed to trust the performance edge.

This is for the EEG modeling and BCI crowd who already follow SSM papers. A reader who needs a drop-in efficient long-sequence model might skim it for the loss formulation, but would wait for the experiments before adopting anything. I would send it to referees so they can check whether the claimed advantages actually appear in the results section.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes EEGM2, a self-supervised U-shaped encoder-decoder framework that integrates Mamba-2 for linear-complexity modeling of long EEG sequences. It employs a combined L1 and Fourier-based spectral reconstruction loss during pre-training and asserts state-of-the-art results on short- and long-sequence modeling and classification tasks, plus strong cross-subject, cross-task, and cross-domain generalization suitable for resource-constrained BCI devices.

Significance. If the performance claims are substantiated, the work would demonstrate a practical efficiency advantage over quadratic Transformer models for EEG while addressing long-range temporal dependencies in low-SNR signals; this could matter for scalable BCI deployment.
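The size of the efficiency gap at stake can be illustrated with simple arithmetic. The sketch below assumes a single attention head whose full score matrix is materialized in 32-bit floats (memory-efficient attention kernels avoid storing it), a hypothetical 256 Hz sampling rate, and a hypothetical recurrent state of 64 values. None of these figures comes from the paper.

```python
def attention_matrix_bytes(seq_len, bytes_per_value=4):
    """Memory for one full seq_len x seq_len score matrix (one head, one layer)."""
    return seq_len * seq_len * bytes_per_value


def recurrent_state_bytes(state_size=64, bytes_per_value=4):
    """Memory for a fixed-size recurrent state; does not depend on seq_len."""
    return state_size * bytes_per_value


sampling_rate_hz = 256  # hypothetical EEG sampling rate
for seconds in (4, 60, 600):
    seq_len = sampling_rate_hz * seconds
    print(seconds, seq_len, attention_matrix_bytes(seq_len), recurrent_state_bytes())
```

Doubling the recording length quadruples the score-matrix term, while the recurrent state stays the same size. Activations in a state-space model still grow linearly with sequence length, so this compares only the two terms named above.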

major comments (2)

[Abstract] The central claim that 'EEGM2 achieves state-of-the-art performance in both short- and long-sequence modeling and classification' and 'consistently outperforms existing models' is presented with no numerical results, tables, figures, or references to experimental sections, so the data supporting the claim cannot be evaluated.
[Abstract] The load-bearing premise that 'the selective information propagation mechanism of Mamba-2 enables the model to effectively capture and preserve long-range dependencies in raw EEG signals, where traditional RNN or CNN architectures often struggle' receives no supporting analysis (e.g., sequence-length ablation, SSM state-decay comparison on EEG traces, or reconstruction error versus length metrics), leaving the claimed architectural advantage unverified.
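One of the checks named above, a state-decay comparison, can be sketched for a generic recurrence of the form h_t = a_t * h_{t-1} + b_t * x_t. The per-step decay gates a_t are assumed to be available as a plain list of values in (0, 1); how to extract them from a trained Mamba-2 layer is not covered here, and both function names are hypothetical.

```python
def retained_fraction(gates, start, end):
    """Scale applied to the input written at `start` by the time of step `end`.

    This is the product of the decay gates at steps start+1 .. end.
    """
    fraction = 1.0
    for a_t in gates[start + 1 : end + 1]:
        fraction *= a_t
    return fraction


def memory_horizon(gates, start, threshold=0.01):
    """Lag at which the contribution from `start` first drops below `threshold`."""
    fraction = 1.0
    for lag, a_t in enumerate(gates[start + 1 :], start=1):
        fraction *= a_t
        if fraction < threshold:
            return lag
    return None  # never fell below the threshold within this trace


fast_decay = [0.5] * 10
slow_decay = [0.99] * 5
```

Comparing the horizons measured on real EEG traces against the sequence lengths used in the experiments would show directly whether information survives over the claimed range.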

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that strengthen the abstract's support for our claims.

Referee: [Abstract] The central claim that 'EEGM2 achieves state-of-the-art performance in both short- and long-sequence modeling and classification' and 'consistently outperforms existing models' is presented with no numerical results, tables, figures, or references to experimental sections, so the data supporting the claim cannot be evaluated.

Authors: We agree that the abstract would benefit from more direct support. In the revised version we will incorporate concise numerical highlights (e.g., accuracy gains on key benchmarks) together with explicit references to the relevant tables and experimental sections.
Revision: yes

Referee: [Abstract] The load-bearing premise that 'the selective information propagation mechanism of Mamba-2 enables the model to effectively capture and preserve long-range dependencies in raw EEG signals, where traditional RNN or CNN architectures often struggle' receives no supporting analysis (e.g., sequence-length ablation, SSM state-decay comparison on EEG traces, or reconstruction error versus length metrics), leaving the claimed architectural advantage unverified.

Authors: The main text already contains long-sequence modeling results and architecture comparisons. Nevertheless, we acknowledge that the abstract itself provides no supporting analysis. We will revise the abstract to reference the existing experimental evidence and add a brief sequence-length ablation summary in the main text to make the architectural advantage more explicit.
Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claims with no derivations or self-referential steps

The paper advances EEGM2 as an empirical self-supervised model using Mamba-2 in a U-shaped architecture, with claims resting on experimental SOTA results for short- and long-sequence EEG tasks. No equations, parameter-fitting derivations, or mathematical chains appear in the abstract or described content. The premise that Mamba-2's selective mechanism captures long-range dependencies is stated as an architectural property but is not derived from prior results within the paper; it is presented as motivation for the design, with performance evaluated externally via experiments. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and no predictions reduce to fitted inputs by construction. The derivation chain is therefore self-contained against external benchmarks (experimental outcomes), warranting score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on parameters, axioms, or entities.

pith-pipeline@v0.9.0 · 5817 in / 1198 out tokens · 33730 ms · 2026-05-23T02:29:59.077671+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation to the paper passage: unclear
Paper passage: the selective information propagation mechanism of Mamba-2 enables the model to effectively capture and preserve long-range dependencies in raw EEG signals

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation to the paper passage: unclear
Paper passage: U-shaped encoder-decoder architecture integrated with Mamba-2

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model
cs.LG 2025-06 unverdicted novelty 6.0

CodeBrain introduces a decoupled TFDual-Tokenizer and multi-scale EEGSSM architecture for an EEG foundation model pretrained on a large corpus, claiming strong generalization across eight downstream tasks and ten datasets.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] J. Hong, W. Wang, and L. Najafizadeh, “ChatBCI: A P300 speller BCI leveraging large language models for improved sentence composition in realistic scenarios,” arXiv preprint arXiv:2411.15395, 2024.
[2] U. R. Acharya, S. V. Sree, G. Swapna, R. J. Martis, and J. S. Suri, “Automated EEG analysis of epilepsy: a review,” Knowledge-Based Systems, vol. 45, pp. 147–165, 2013.
[3] Z. Wang, C. Chen, J. Li, F. Wan, Y. Sun, and H. Wang, “ST-CapsNet: linking spatial and temporal attention with capsule network for P300 detection improvement,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 991–1000, 2023.
[4] J. Hong, F. Shamsi, and L. Najafizadeh, “A deep learning framework based on dynamic channel selection for early classification of left and right hand motor imagery tasks,” in 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2022, pp. 3550–3553.
[5] N. Mohammadi Foumani, G. Mackellar, S. Ghane, S. Irtza, N. Nguyen, and M. Salehi, “EEG2Rep: enhancing self-supervised EEG representation through informative masked inputs,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 5544–5555.
[6] D. Kostas, S. Aroca-Ouellette, and F. Rudzicz, “BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data,” Frontiers in Human Neuroscience, vol. 15, p. 653659, 2021.
[7] C. Yang, M. Westover, and J. Sun, “BIOT: Biosignal transformer for cross-data learning in the wild,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[8] G. Wang, W. Liu, Y. He, C. Xu, L. Ma, and H. Li, “EEGPT: Pretrained transformer for universal and reliable representation of EEG signals,” Advances in Neural Information Processing Systems, vol. 37, pp. 39249–39280, 2024.
[9] Y. Dai, X. Li, S. Liang, L. Wang, Q. Duan, H. Yang, C. Zhang, X. Chen, L. Li, X. Li et al., “Multichannelsleepnet: A transformer-based model for automatic sleep stage classification with PSG,” IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 9, pp. 4204–4215, 2023.
[10] A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” arXiv preprint arXiv:2111.00396, 2021. (internal anchor)
[11] Y. Chen, K. Ren, K. Song, Y. Wang, Y. Wang, D. Li, and L. Qiu, “EEGFormer: Towards transferable and interpretable large-scale EEG foundation model,” arXiv preprint arXiv:2401.10278, 2024.
[12] W.-B. Jiang, L.-M. Zhao, and B.-L. Lu, “Large brain model for learning generic representations with tremendous EEG data in BCI,” arXiv preprint arXiv:2405.18765, 2024.
[13] H.-Y. S. Chien, H. Goh, C. M. Sandino, and J. Y. Cheng, “MAEEG: Masked auto-encoder for EEG representation learning,” arXiv preprint arXiv:2211.02625, 2022.
[14–15] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III. Springer, 2015, pp. 234–241.
[16] T. Dao and A. Gu, “Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality,” arXiv preprint arXiv:2405.21060, 2024. (internal anchor)
[17] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10850–10869, 2023.
[18] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023. (internal anchor)
[19] B. Lenz, O. Lieber, A. Arazi, A. Bergman, A. Manevich, B. Peleg, B. Aviram, C. Almagor, C. Fridman, D. Padnos et al., “Jamba: Hybrid transformer-mamba language models,” in The Thirteenth International Conference on Learning Representations, 2025.
[20] P. Glorioso, Q. Anthony, Y. Tokpanov, J. Whittington, J. Pilault, A. Ibrahim, and B. Millidge, “Zamba: A compact 7B SSM hybrid model,” arXiv preprint arXiv:2405.16712, 2024.
[21] Y. Sugawara, S. Shiota, and H. Kiya, “Checkerboard artifacts free convolutional neural networks,” APSIPA Transactions on Signal and Information Processing, vol. 8, p. e9, 2019.
[22] S. Mazilu and J. Iria, “L1 vs. L2 regularization in text classification when learning from labeled features,” in 2011 10th international conference on machine learning and applications and workshops, vol. 1. IEEE, 2011, pp. 166–171.
[23] S. Lopez, G. Suarez, D. Jungreis, I. Obeid, and J. Picone, “Automated identification of abnormal adult EEGs,” in 2015 IEEE signal processing in medicine and biology symposium (SPMB). IEEE, 2015, pp. 1–5.
[24] N. S. Williams, W. King, G. Mackellar, R. Randeniya, A. McCormick, and N. A. Badcock, “Crowdsourced EEG experiments: A proof of concept for remote EEG acquisition using EmotivPRO builder and EmotivLABS,” Heliyon, vol. 9, no. 8, 2023.
[25] W. L. Lim, O. Sourina, and L. P. Wang, “STEW: Simultaneous task EEG workload data set,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 11, pp. 2106–2114, 2018.
[26] L. N. Smith and N. Topin, “Super-convergence: Very fast training of neural networks using large learning rates,” in Artificial intelligence and machine learning for multi-domain operations applications, vol. 11006. SPIE, 2019, pp. 369–386.
[27] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of machine learning research, vol. 9, no. 11, 2008.
[28] H. Li, M. Ding, R. Zhang, and C. Xiu, “Motor imagery EEG classification algorithm based on CNN-LSTM feature fusion network,” Biomedical signal processing and control, vol. 72, p. 103342, 2022.
[29] W. Y. Peh, Y. Yao, and J. Dauwels, “Transformer convolutional neural networks for automated artifact detection in scalp EEG,” in 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2022, pp. 3599–3602.
