Channel Adaptation for EEG Foundation Models: A Systematic Benchmark Across Architectures, Tasks, and Training Regimes
Pith reviewed 2026-05-08 08:31 UTC · model grok-4.3
The pith
EEG foundation models need architecture-specific channel adaptation, with a compact model outperforming much larger ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Systematic trials show that the best channel adaptation method is not universal but depends on the model's architecture and training regime. Rigid-montage models such as BENDR and Neuro-GPT improve with external techniques like Conv1d projection or spherical spline interpolation (SSI). Flexible models such as EEGPT and CBraMod match or exceed that performance when fine-tuned without external adaptation, yet benefit from it under frozen-encoder deployment. External adaptation can produce negative transfer when applied during fine-tuning of flexible models. On four of the five datasets, the compact 5M-parameter CBraMod outperforms models up to 31× larger, in line with independent evidence that compact EEG-specific architectures can match larger models.
What carries the argument
Four channel adaptation methods (Conv1d projection, spherical spline interpolation, source-space decomposition, and Riemannian re-centering) align heterogeneous EEG electrode montages with the inputs that pretrained foundation models expect, evaluated under different training regimes.
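To make the first of these methods concrete, here is a minimal NumPy sketch of a Conv1d-style channel projection. It assumes a kernel size of 1, in which case the convolution reduces to a per-sample linear map across channels; the paper's summary does not state the kernel size, and the channel counts, weight values, and the function name `conv1d_projection` are hypothetical. In practice the weight would be learned rather than drawn at random.

```python
import numpy as np

def conv1d_projection(x, weight, bias=None):
    """Map EEG of shape (n_source_channels, n_times) to (n_target_channels, n_times).

    ASSUMPTION: kernel size 1, so the convolution is a matrix product applied
    independently at every time sample. `weight` has shape
    (n_target_channels, n_source_channels).
    """
    y = weight @ x
    if bias is not None:
        y = y + bias[:, None]  # broadcast one bias per target channel over time
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((22, 1000))            # hypothetical 22-channel recording
weight = rng.standard_normal((19, 22)) * 0.1   # hypothetical projection to 19 channels
y = conv1d_projection(x, weight)
print(y.shape)
```

The projection changes only the channel axis, so the time dimension that the pretrained encoder consumes is left untouched.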
If this is right
- Rigid-montage models require external adaptation to work with new electrode configurations.
- Flexible models match or exceed externally adapted performance when fine-tuned on their native channel handling, but gain from external adaptation in frozen-encoder settings.
- External adaptation can cause negative transfer if used during fine-tuning of flexible models.
- The optimal adaptation technique changes with model architecture and task.
- Compact EEG-specific models can exceed the performance of models many times their size.
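The two training regimes contrasted in this list can be sketched with a toy model. This is an illustration only: the one-layer tanh "encoder", the logistic head, the synthetic data, and the name `train` are all hypothetical stand-ins, not the models or training code from the paper. The only point is the mechanical difference: a frozen-encoder probe updates the head alone, while fine-tuning updates the encoder as well.

```python
import numpy as np

def train(x, y, enc, head, freeze_encoder, lr=0.1, steps=200):
    """Gradient descent on mean binary cross-entropy for a toy encoder + head."""
    enc, head = enc.copy(), head.copy()
    for _ in range(steps):
        h = np.tanh(x @ enc)                      # encoder features, (n, d_hidden)
        p = 1.0 / (1.0 + np.exp(-(h @ head)))     # predicted probabilities, (n,)
        err = (p - y) / len(y)                    # gradient of the loss w.r.t. logits
        grad_head = h.T @ err
        if not freeze_encoder:
            # Backpropagate through tanh into the encoder weights.
            grad_h = np.outer(err, head) * (1.0 - h ** 2)
            enc -= lr * (x.T @ grad_h)
        head -= lr * grad_head
    return enc, head

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8))                  # synthetic inputs
y = (x[:, 0] > 0).astype(float)                   # synthetic binary labels
enc0 = rng.standard_normal((8, 4)) * 0.5          # stands in for pretrained weights
head0 = rng.standard_normal(4) * 0.1

enc_probe, head_probe = train(x, y, enc0, head0, freeze_encoder=True)
enc_sft, head_sft = train(x, y, enc0, head0, freeze_encoder=False)
print(np.array_equal(enc_probe, enc0), np.array_equal(enc_sft, enc0))
```

Because fine-tuning can move the encoder weights, an external adapter placed in front of it interacts with those updates, which is one plausible reading of why the two regimes can respond differently to the same adaptation method.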
Where Pith is reading between the lines
- Future EEG foundation models may benefit from building in montage flexibility to reduce the need for separate adaptation steps during training.
- The results encourage prioritizing parameter-efficient, domain-specific designs over simply increasing model size.
- Repeating the benchmark on clinical real-time tasks could expose additional practical limits not seen in the current offline datasets.
- Effective handling of montage differences would allow larger pretraining datasets drawn from many recording sites.
Load-bearing premise
The five chosen models, five tasks, and four adaptation methods are representative of the larger space of EEG foundation models and channel adaptation techniques.
What would settle it
Evidence that one fixed adaptation method performs best across all architectures and tasks, or a replication in which the 5M-parameter CBraMod no longer outperforms larger models on four of five datasets, would undermine the architecture-dependence and compact-model claims respectively.
Original abstract
Scaling EEG foundation models requires pooling data across heterogeneous electrode montages, a prerequisite both for larger pretraining corpora and for downstream deployment. We present the first systematic comparison of four channel adaptation methods (Conv1d projection, spherical spline interpolation (SSI), source-space decomposition, and Riemannian re-centering) across five pretrained EEG foundation models (5M–157M parameters), five downstream tasks, and two training regimes with 10–15 random seeds each. We find that rigid-montage models (BENDR, Neuro-GPT) require external adaptation, while flexible models (EEGPT, CBraMod) match or exceed it natively when fine-tuned but benefit from external methods under frozen-encoder deployment. A probe-SFT asymmetry exists: external adaptation can cause severe negative transfer during fine-tuning of flexible models. The optimal method is architecture-dependent (Conv1d for BENDR, SSI/Riemannian for Neuro-GPT, source-space decomposition for depression detection), and 5M-parameter CBraMod outperforms models up to 31× larger on 4/5 datasets, consistent with independent findings that compact EEG-specific architectures can match larger models.
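Of the four methods named in the abstract, re-centering is the simplest to sketch. The version below whitens every trial in a dataset by the inverse square root of that dataset's mean covariance, so that the mean covariance becomes the identity. It is a simplification under stated assumptions: it uses the arithmetic (Euclidean) mean of the covariances in place of the Riemannian mean that the method's name implies, and the function names, shapes, and synthetic data are hypothetical.

```python
import numpy as np

def inv_sqrtm(spd):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    # Scale each eigenvector column by 1/sqrt of its eigenvalue.
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T

def trial_covariances(trials):
    """Per-trial spatial covariance for trials of shape (n_trials, n_channels, n_times)."""
    return np.array([t @ t.T / t.shape[1] for t in trials])

def recenter(trials):
    """Whiten all trials by the dataset's mean covariance."""
    # ASSUMPTION: Euclidean mean used here instead of the Riemannian mean.
    mean_cov = trial_covariances(trials).mean(axis=0)
    w = inv_sqrtm(mean_cov)
    return np.array([w @ t for t in trials])

rng = np.random.default_rng(0)
mixing = rng.standard_normal((8, 8))                 # synthetic channel mixing
trials = np.array([mixing @ rng.standard_normal((8, 256)) for _ in range(20)])
centered = recenter(trials)
mean_after = trial_covariances(centered).mean(axis=0)
print(np.allclose(mean_after, np.eye(8)))
```

Applying the same step separately to each recording site moves every site to a common reference point, which is the intuition behind using it for heterogeneous datasets.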
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts the first systematic benchmark of four channel adaptation methods (Conv1d projection, spherical spline interpolation (SSI), source-space decomposition, and Riemannian re-centering) across five pretrained EEG foundation models (5M–157M parameters), five downstream tasks, and two training regimes (fine-tuning and frozen-encoder), each with 10–15 random seeds. It reports three findings. First, rigid-montage models require external adaptation, while flexible models match or exceed it natively under fine-tuning and can suffer severe negative transfer when external adaptation is applied in that regime. Second, optimal methods are architecture-dependent (e.g., Conv1d for BENDR, SSI/Riemannian for Neuro-GPT, source-space for depression detection). Third, the 5M-parameter CBraMod outperforms models up to 31× larger on 4/5 datasets.
Significance. If the empirical patterns hold, the work addresses a practical barrier to scaling EEG foundation models by enabling data pooling across heterogeneous montages and offers guidance for deployment. The multi-seed, multi-architecture, multi-regime design provides a useful reference point for the community and reinforces independent observations that compact EEG-specific models can compete with much larger ones.
major comments (1)
- [Abstract] The central claims, that optimal adaptation is architecture-dependent and that the 5M-parameter CBraMod outperforms models up to 31× larger on 4/5 datasets, depend on the representativeness of the five selected models and five tasks. No explicit coverage argument, sensitivity analysis, or justification for this selection is provided, leaving the generalizability of the headline findings as the least secure link in the argument.
minor comments (2)
- [Abstract] The phrase 'severe negative transfer' is used without accompanying quantitative thresholds or conditions; a brief indication of effect size or statistical support would improve precision.
- [Abstract] The specific identities of all five models and five tasks are not listed; including them would give readers immediate context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the paper's significance. We address the single major comment below and will incorporate revisions to strengthen the manuscript.
- Referee: The central claims, that optimal adaptation is architecture-dependent and that the 5M-parameter CBraMod outperforms models up to 31× larger on 4/5 datasets, depend on the representativeness of the five selected models and five tasks. No explicit coverage argument, sensitivity analysis, or justification for this selection is provided, leaving the generalizability of the headline findings as the least secure link in the argument.
Authors: We agree that the manuscript would benefit from an explicit justification of model and task selection to support the generalizability of the central claims. The five models were deliberately chosen to span key architectural axes relevant to channel adaptation: rigid-montage models (BENDR, Neuro-GPT) versus flexible-montage models (EEGPT, CBraMod), together with a range of parameter scales (5M–157M) that reflect publicly available EEG foundation models at the time of the study. The five downstream tasks were selected to cover diverse EEG applications, including motor imagery, emotion recognition, sleep staging, seizure detection, and clinical depression detection. We will add a dedicated subsection (likely in Methods or a new Limitations paragraph) that states these selection criteria, notes the coverage of rigid vs. flexible architectures and task domains, and acknowledges that the set is not exhaustive. While a full sensitivity analysis on alternative model/task combinations is not feasible within the current experimental budget, the reported results already include 10–15 random seeds per configuration to quantify variability. This addition will be made in the revised manuscript.
Revision: yes
Circularity Check
No circularity: purely empirical benchmark with no derivations or self-referential fits
The paper reports direct experimental comparisons of four adaptation methods across five models, five tasks, and two regimes, with results stated as observations from runs with 10-15 seeds. No equations, parameter fits, or predictions are presented as derived from inputs by construction; the architecture-dependent optimality and CBraMod outperformance are empirical outcomes, not reductions of the selection criteria. Self-citations are absent from load-bearing claims, and the representativeness of the chosen models/tasks is an explicit scope limitation rather than a hidden self-definition. The work is self-contained as a benchmark against external datasets and prior models.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] Multiple random seeds provide a reliable estimate of model performance variability.
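As a rough illustration of what this axiom buys, the sketch below summarizes one configuration's scores across seeds as a mean, a sample standard deviation, and a standard error. The ten scores are hypothetical placeholders, not results from the paper, and `summarize` is an illustrative name.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation, standard error) over seeds."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)        # sample std, n - 1 in the denominator
    sem = std / len(scores) ** 0.5        # standard error of the mean
    return mean, std, sem

# Hypothetical accuracies from 10 seeds of a single model/method/task configuration.
scores = [0.61, 0.63, 0.60, 0.64, 0.62, 0.59, 0.65, 0.62, 0.61, 0.63]
mean, std, sem = summarize(scores)
print(f"{mean:.3f} +/- {std:.3f} (SEM {sem:.3f}, n={len(scores)})")
```

With 10 to 15 seeds the standard error shrinks only by a factor of about 3 to 4 relative to the per-seed spread, so small gaps between methods may still sit within noise.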