Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models
Pith reviewed 2026-06-29 18:55 UTC · model grok-4.3
The pith
Reconstruction-based EEG foundation models capture aperiodic signal components while under-representing high-frequency oscillatory ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using controlled synthetic EEG inputs that separate aperiodic and oscillatory components, the paper shows that embeddings from reconstruction-based EEG foundation models preferentially encode aperiodic structure while under-representing oscillatory activity, with the effect strongest at higher frequencies. On real-world BCI datasets, linear probes show that these embeddings represent subject identity more strongly than task-relevant features, which the authors read as further evidence of the low-frequency and aperiodic bias induced by the reconstruction objective.
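The controlled inputs described above can be sketched as follows. This is a minimal illustration, not the paper's generator. It assumes an aperiodic background with a 1/f^χ power spectrum, made by shaping Gaussian noise in the frequency domain, plus a single sinusoidal rhythm. The sampling rate, exponent, frequency, and amplitude are illustrative values, and `aperiodic_noise` and `synthetic_eeg` are hypothetical helpers.

```python
import numpy as np


def aperiodic_noise(n_samples, fs, exponent, rng):
    """Unit-variance Gaussian noise whose power spectrum falls off as 1/f**exponent."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Complex Gaussian spectrum: independent real and imaginary parts.
    spectrum = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    # Amplitude ~ f**(-exponent / 2) gives power ~ f**(-exponent); DC is zeroed.
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-exponent / 2.0)
    noise = np.fft.irfft(spectrum * amplitude, n=n_samples)
    return noise / noise.std()


def synthetic_eeg(n_samples=2560, fs=256.0, exponent=1.5, osc_freq=20.0, osc_amp=0.3, seed=0):
    """Return (aperiodic, oscillatory, combined) single-channel signals."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    aperiodic = aperiodic_noise(n_samples, fs, exponent, rng)
    oscillation = osc_amp * np.sin(2.0 * np.pi * osc_freq * t)
    return aperiodic, oscillation, aperiodic + oscillation


aperiodic, oscillation, eeg = synthetic_eeg()
# The weak 20 Hz rhythm carries only a small share of the total power.
print(f"oscillatory share of power: {oscillation.var() / eeg.var():.3f}")
```

The function returns the two components separately. Each can then be passed to an encoder on its own, or the components can be mixed at different amplitudes, which is what makes the attribution controllable.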
What carries the argument
The reconstruction pretext task, which aligns embeddings with high-power aperiodic EEG components at the expense of low-power oscillatory ones.
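Parseval's theorem makes this mechanism concrete. Under a mean-squared-error reconstruction loss, the error contributed by a frequency band equals the power that band carries, so a decoder that drops a low-power band pays only a small penalty. The sketch below compares the penalty for omitting 1-4 Hz with the penalty for omitting 18-22 Hz, where a weak 20 Hz rhythm sits. It assumes an MSE objective on raw samples (actual models may use other losses) and an illustrative background with a fixed 1/f^1.5 magnitude spectrum and random phases, not the paper's setup.

```python
import numpy as np


def band_error_mse(x, fs, lo, hi):
    """MSE of a 'reconstruction' that is exact except that it omits [lo, hi] Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= lo) & (freqs <= hi)
    recon = np.fft.irfft(np.where(in_band, 0.0, X), n=x.size)
    return np.mean((x - recon) ** 2)


def band_power(x, fs, lo, hi):
    """Mean power carried by [lo, hi] Hz (Parseval; band excludes DC and Nyquist)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= lo) & (freqs <= hi)
    return 2.0 * np.sum(np.abs(X[in_band]) ** 2) / x.size ** 2


# Illustrative signal: fixed 1/f**1.5 magnitude spectrum with random phases,
# normalised to unit variance, plus a weak 20 Hz sinusoid.
fs, n = 256.0, 2560
rng = np.random.default_rng(0)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
magnitude = np.zeros_like(freqs)
magnitude[1:-1] = freqs[1:-1] ** -0.75  # DC and Nyquist left at zero
phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
background = np.fft.irfft(magnitude * np.exp(1j * phases), n=n)
background /= background.std()
x = background + 0.3 * np.sin(2.0 * np.pi * 20.0 * np.arange(n) / fs)

for lo, hi in [(1.0, 4.0), (18.0, 22.0)]:
    print(f"omit {lo:g}-{hi:g} Hz: MSE = {band_error_mse(x, fs, lo, hi):.3f}")
```

`band_power` returns the same value as `band_error_mse`, which is the Parseval identity at work. Losing the low-frequency background costs several times more than losing the rhythm, even though the rhythm is the feature a BCI decoder might need.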
If this is right
- Embeddings will show weaker performance on downstream tasks that depend on high-frequency oscillatory content.
- Linear probes will continue to recover subject identity more readily than task labels across multiple BCI datasets.
- The performance gap versus fully supervised models will remain largest in low-resource regimes where fine-tuning cannot overcome the spectral bias.
- Adding explicit auxiliary losses for high-frequency oscillatory structure during pretraining would reduce the mismatch.
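The auxiliary loss in the last item could take the form of a spectrally whitened reconstruction term. Each frequency's squared error is weighted by f^χ, which cancels an assumed 1/f^χ background so that every bin counts roughly equally. This sketch is a hypothetical illustration, not the paper's proposal or any library's loss. It assumes the exponent χ is known (in practice it would have to be estimated). It compares how plain MSE and the whitened term rank two imperfect reconstructions: one that drops a weak 20 Hz rhythm and one that drops 1-4 Hz background activity.

```python
import numpy as np


def mse(x, y):
    return np.mean((x - y) ** 2)


def whitened_spectral_loss(x, y, fs, exponent):
    """Squared spectral error weighted by f**exponent (DC ignored), normalised by the weights."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    err_power = np.abs(np.fft.rfft(x - y)) ** 2
    weights = np.zeros_like(freqs)
    weights[1:] = freqs[1:] ** exponent  # undoes the assumed 1/f**exponent trend
    return np.sum(weights * err_power) / np.sum(weights)


# Illustrative input: fixed 1/f**1.5 magnitudes with random phases, plus a 20 Hz rhythm.
fs, n, chi = 256.0, 2560, 1.5
rng = np.random.default_rng(0)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
magnitude = np.zeros_like(freqs)
magnitude[1:-1] = freqs[1:-1] ** (-chi / 2.0)
phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
background = np.fft.irfft(magnitude * np.exp(1j * phases), n=n)
background /= background.std()
rhythm = 0.3 * np.sin(2.0 * np.pi * 20.0 * np.arange(n) / fs)
x = background + rhythm

# Two imperfect reconstructions of x.
drop_rhythm = background
X = np.fft.rfft(x)
drop_low = np.fft.irfft(np.where((freqs >= 1.0) & (freqs <= 4.0), 0.0, X), n=n)

for name, y in [("drop 20 Hz rhythm", drop_rhythm), ("drop 1-4 Hz", drop_low)]:
    print(f"{name}: MSE={mse(x, y):.4f}, whitened={whitened_spectral_loss(x, y, fs, chi):.1f}")
```

Plain MSE penalises losing 1-4 Hz more heavily, while the whitened term penalises losing the rhythm more heavily. The f^χ weighting is only one choice; a log-spectral or band-normalised error would express the same intent. Whitening also up-weights high-frequency noise and artefacts, which is a practical cost.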
Where Pith is reading between the lines
- The same reconstruction-induced bias could appear in foundation models trained on other biosignals that exhibit strong aperiodic backgrounds.
- Pretraining objectives that directly penalize loss of oscillatory power, such as frequency-specific contrastive terms, offer a testable route to more balanced representations.
- Subject-identity dominance in embeddings suggests that current models may require explicit disentanglement steps before they can generalize across individuals.
Load-bearing premise
Synthetic EEG signals accurately reproduce the spectral decomposition and statistical properties of real EEG without introducing artifacts that exaggerate the observed bias.
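One elementary check on this premise is whether the aperiodic exponent requested from a generator is the one present in its output (the paper sweeps exponents from 1.0 to 2.0). The sketch below recovers χ from noiseless spectra with a straight-line fit in log-log space using `np.polyfit`. It assumes the knee-free form of the aperiodic model, log10 P = b - log10(k + f^χ) with k = 0. It is not FOOOF's fitting procedure, and `aperiodic_spectrum` and `fit_exponent` are hypothetical helpers.

```python
import numpy as np


def aperiodic_spectrum(freqs, exponent, offset=0.0, knee=0.0):
    """Power from the aperiodic model log10 P = offset - log10(knee + f**exponent)."""
    return 10.0 ** (offset - np.log10(knee + freqs ** exponent))


def fit_exponent(freqs, power):
    """Negative slope of a straight-line fit in log-log space (exact only when knee == 0)."""
    slope, _intercept = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope


freqs = np.linspace(1.0, 45.0, 200)
for chi in np.linspace(1.0, 2.0, 5):
    print(f"exponent {chi:.2f} -> fitted {fit_exponent(freqs, aperiodic_spectrum(freqs, chi)):.2f}")

# A knee flattens the low-frequency end, so a straight-line fit underestimates the exponent.
print(f"with knee=10: fitted {fit_exponent(freqs, aperiodic_spectrum(freqs, 2.0, knee=10.0)):.2f}")
```

The knee case shows why the fitting model has to match the generating model before exponent estimates from synthetic and real EEG can be compared.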
What would settle it
An experiment that measures the power spectrum of signals reconstructed from the model embeddings on synthetic inputs with isolated high-frequency oscillations and finds equal or stronger representation of those oscillations compared with aperiodic components.
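The measurement this experiment calls for can be sketched by comparing Welch power spectra of an input and its reconstruction, band by band. No model is available here, so the reconstruction is a hypothetical stand-in: `biased_reconstruction`, an ideal 15 Hz low-pass filter that mimics the claimed bias. With a real encoder-decoder, its output would replace the stand-in. The band edges and the 1/f^2 (Brownian) background are illustrative assumptions. If the claim holds, retention should be near 1 in the low band and near 0 in the band holding the isolated 40 Hz oscillation. Equal or higher retention at 40 Hz would count against the claim.

```python
import numpy as np
from scipy.signal import welch


def band_retention(x, x_hat, fs, bands, nperseg=512):
    """Ratio of reconstructed to input Welch band power for each named band."""
    f, p_in = welch(x, fs=fs, nperseg=nperseg)
    _, p_out = welch(x_hat, fs=fs, nperseg=nperseg)
    ratios = {}
    for name, (lo, hi) in bands.items():
        sel = (f >= lo) & (f < hi)
        ratios[name] = float(p_out[sel].sum() / p_in[sel].sum())
    return ratios


def biased_reconstruction(x, fs, cutoff=15.0):
    """Hypothetical stand-in for a model's output: keep only content below `cutoff` Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return np.fft.irfft(np.where(freqs < cutoff, X, 0.0), n=x.size)


fs, n = 256.0, 256 * 60
rng = np.random.default_rng(1)
t = np.arange(n) / fs
# Input: Brownian (1/f**2) background plus an isolated 40 Hz oscillation.
background = np.cumsum(rng.standard_normal(n))
background = (background - background.mean()) / background.std()
x = background + 0.2 * np.sin(2.0 * np.pi * 40.0 * t)

bands = {"low (1-8 Hz)": (1.0, 8.0), "gamma (35-45 Hz)": (35.0, 45.0)}
print(band_retention(x, biased_reconstruction(x, fs), fs, bands))
```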
Original abstract
EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail to outperform significantly smaller, fully supervised models in low-resource settings. We provide a mechanistic account of this shortcoming, attributing it to a fundamental mismatch between reconstruction-based pretext tasks and the idiosyncratic spectral structure of EEG signals, which decompose into distinct high-power aperiodic and low-power oscillatory components. Using controlled, synthetically generated EEG inputs, we demonstrate that EEG foundation model embeddings are biased to capture the aperiodic components of the EEG signal while under-representing oscillatory components, particularly at higher frequencies. Additionally, linear probe evaluations on real-world BCI datasets further reveal that embeddings encode subject identity more strongly than task-relevant information, thereby reinforcing the low-frequency and aperiodic component bias in foundation model embeddings trained primarily on reconstruction-based objectives. Together, these findings elucidate a failure mode in reconstruction-based EEG foundation models and motivate future work to incorporate auxiliary losses explicitly targeting high-frequency oscillatory structure as a path toward more capable and generalizable EEG representations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that reconstruction-based pretraining in EEG foundation models induces a spectral bias favoring high-power aperiodic components and low-frequency content while under-representing low-power high-frequency oscillations. This is shown via controlled synthetic EEG inputs that isolate the effect, plus linear probe experiments on real BCI datasets demonstrating stronger encoding of subject identity than task information; the authors conclude this mismatch explains poor low-resource performance and motivate auxiliary losses targeting oscillatory structure.
Significance. If the central attribution holds, the work supplies a concrete mechanistic account of why reconstruction objectives are mismatched to EEG statistics and supplies an actionable path (auxiliary losses) for improving foundation-model pretraining. The controlled synthetic setup, if shown to preserve real EEG joint statistics, would be a strength for causal isolation; the linear-probe results on subject vs. task encoding would further ground the practical relevance.
major comments (2)
- [Experiments / synthetic EEG generation] Synthetic EEG generation subsection (Experiments section): the central claim that the observed aperiodic bias is caused by the reconstruction objective requires that the synthetic inputs reproduce the joint spectral statistics of real EEG (non-stationarity, phase-amplitude coupling, channel correlations). The manuscript must supply the precise generation procedure and quantitative comparisons (e.g., PSD, coherence, PAC metrics) between synthetic and real data; absent these controls, the mismatch could originate in the synthetic construction itself rather than the pretext task.
- [Results / linear probe evaluation] Linear probe evaluation paragraph (Results section): the claim that embeddings encode subject identity more strongly than task-relevant information is load-bearing for the practical implication. The manuscript should report the exact probe accuracies, the number of subjects/tasks, cross-validation scheme, and statistical comparison (e.g., paired t-test or effect size) between subject and task probes; without these numbers the strength of the bias remains unquantified.
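The reporting this comment asks for can be sketched under stated assumptions. The probe is a closed-form ridge classifier on one-hot targets, a stand-in for the logistic-regression probes often used and not necessarily the paper's probe. Subject and task probes share the same five cross-validation folds, so their scores can be paired. Chance levels differ (1/8 for eight subjects, 1/2 for two tasks), so accuracies are rescaled as (acc - chance) / (1 - chance) before a paired t-test with `scipy.stats.ttest_rel`. The embeddings are simulated placeholders with strong subject offsets and a weak task effect; real embeddings and labels would replace them. The folds are random at the trial level, and a leave-subject-out split would answer a different question.

```python
import numpy as np
from scipy.stats import ttest_rel


def ridge_probe_accuracy(X_tr, y_tr, X_te, y_te, alpha=1.0):
    """Closed-form one-vs-rest ridge classifier; returns test accuracy."""
    classes = np.unique(y_tr)
    Y = (y_tr[:, None] == classes[None, :]).astype(float)  # one-hot targets
    x_mean, y_mean = X_tr.mean(axis=0), Y.mean(axis=0)
    Xc = X_tr - x_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ (Y - y_mean))
    scores = (X_te - x_mean) @ W + y_mean
    return np.mean(classes[scores.argmax(axis=1)] == y_te)


def normalised_fold_scores(X, y, folds):
    """Per-fold accuracy rescaled so that 0 = chance and 1 = perfect."""
    chance = 1.0 / np.unique(y).size
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        acc = ridge_probe_accuracy(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
        scores.append((acc - chance) / (1.0 - chance))
    return np.array(scores)


# Simulated placeholder embeddings: 8 subjects x 2 tasks x 40 trials, 32 dimensions.
rng = np.random.default_rng(0)
n_subjects, n_trials, dim = 8, 40, 32
subject = np.repeat(np.arange(n_subjects), 2 * n_trials)
task = np.tile(np.repeat([0, 1], n_trials), n_subjects)
subject_offsets = rng.standard_normal((n_subjects, dim))  # strong subject signature
task_direction = 0.15 * rng.standard_normal(dim)  # weak task effect
X = (subject_offsets[subject]
     + np.outer(2 * task - 1, task_direction)
     + rng.standard_normal((subject.size, dim)))

folds = np.array_split(rng.permutation(subject.size), 5)  # shared folds -> paired scores
subj_scores = normalised_fold_scores(X, subject, folds)
task_scores = normalised_fold_scores(X, task, folds)
result = ttest_rel(subj_scores, task_scores)
print(f"subject {subj_scores.mean():.2f}, task {task_scores.mean():.2f}, p = {result.pvalue:.2g}")
```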
minor comments (1)
- [Abstract] Abstract, penultimate sentence: the phrasing 'thereby reinforcing the low-frequency and aperiodic component bias' is circular; reword to separate the empirical observation from the interpretive claim.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the work.
- Referee: [Experiments / synthetic EEG generation] Synthetic EEG generation subsection (Experiments section): the central claim that the observed aperiodic bias is caused by the reconstruction objective requires that the synthetic inputs reproduce the joint spectral statistics of real EEG (non-stationarity, phase-amplitude coupling, channel correlations). The manuscript must supply the precise generation procedure and quantitative comparisons (e.g., PSD, coherence, PAC metrics) between synthetic and real data; absent these controls, the mismatch could originate in the synthetic construction itself rather than the pretext task.
  Authors: We agree that quantitative validation of the synthetic data against real EEG is necessary to attribute the observed bias specifically to the reconstruction objective. In the revised manuscript, we will expand the synthetic EEG generation subsection to provide the precise generation procedure along with direct comparisons of power spectral density (PSD), coherence, and phase-amplitude coupling (PAC) metrics between synthetic and real datasets. This addition will address the concern and reinforce the causal interpretation.
  Revision: yes.
- Referee: [Results / linear probe evaluation] Linear probe evaluation paragraph (Results section): the claim that embeddings encode subject identity more strongly than task-relevant information is load-bearing for the practical implication. The manuscript should report the exact probe accuracies, the number of subjects/tasks, cross-validation scheme, and statistical comparison (e.g., paired t-test or effect size) between subject and task probes; without these numbers the strength of the bias remains unquantified.
  Authors: We acknowledge that detailed quantitative reporting is required to substantiate the relative encoding of subject identity versus task information. In the revision, we will update the linear probe evaluation paragraph to include the exact probe accuracies, the number of subjects and tasks, the cross-validation scheme, and statistical comparisons (e.g., paired t-tests or effect sizes) between the subject and task probes. These additions will quantify the bias and support the practical implications.
  Revision: yes.
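The first two comparisons promised in the rebuttal (PSD and coherence) can be sketched with `scipy.signal.welch` and `scipy.signal.coherence`. Each multichannel recording is summarised by its channel-averaged log power spectrum and its average pairwise magnitude-squared coherence, and the discrepancy is reported within a band. The summary statistics, band, and segment length are illustrative assumptions. PAC is omitted, and the arrays below are random placeholders rather than real or synthetic EEG.

```python
import numpy as np
from itertools import combinations
from scipy.signal import coherence, welch


def spectral_summary(x, fs, nperseg=256):
    """x has shape (n_channels, n_samples). Returns freqs, mean log10 PSD, mean pairwise coherence."""
    f, psd = welch(x, fs=fs, nperseg=nperseg, axis=-1)
    cohs = [coherence(x[i], x[j], fs=fs, nperseg=nperseg)[1]
            for i, j in combinations(range(x.shape[0]), 2)]
    return f, np.log10(psd).mean(axis=0), np.mean(cohs, axis=0)


def compare_spectra(synthetic, real, fs, band=(1.0, 45.0)):
    """Band-limited discrepancies between two recordings' spectral summaries."""
    f, logpsd_s, coh_s = spectral_summary(synthetic, fs)
    _, logpsd_r, coh_r = spectral_summary(real, fs)
    sel = (f >= band[0]) & (f <= band[1])
    return {
        "log_psd_rmse": float(np.sqrt(np.mean((logpsd_s[sel] - logpsd_r[sel]) ** 2))),
        "coherence_mad": float(np.mean(np.abs(coh_s[sel] - coh_r[sel]))),
    }


# Placeholders: independent channels versus channels sharing a common source.
rng = np.random.default_rng(0)
fs, n = 256.0, 256 * 20
independent = rng.standard_normal((4, n))
shared = 0.5 * rng.standard_normal((4, n)) + rng.standard_normal(n)
print(compare_spectra(independent, shared, fs))
```

A generator can match the power spectrum while still missing the cross-channel structure. Reporting coherence separately from PSD makes that failure visible.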
Circularity Check
No circularity: empirical observations from synthetic and real-data experiments, no derivation chain
The paper advances its central claims through controlled experiments on synthetically generated EEG signals and linear-probe evaluations on real BCI datasets. No mathematical derivation, uniqueness theorem, or first-principles result is presented that reduces to fitted parameters, self-definitions, or self-citations. The abstract and described methodology rely on direct measurement of spectral bias and subject/task encoding, which are falsifiable against external benchmarks rather than tautological. This matches the default expectation for non-circular empirical work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: EEG signals decompose into distinct high-power aperiodic and low-power oscillatory components.
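This assumption is commonly formalised with the spectral-parameterisation model implemented in the FOOOF package [Donoghue et al., 2020], which the paper reports using to generate synthetic spectra. Treating it as the paper's exact decomposition is an assumption here. In this model, the log power spectrum is an aperiodic term plus Gaussian peaks:

```latex
\log_{10} P(f) = L(f) + \sum_{n=1}^{N} G_n(f), \qquad
L(f) = b - \log_{10}\!\left(k + f^{\chi}\right), \qquad
G_n(f) = a_n \exp\!\left(-\frac{(f - c_n)^2}{2 w_n^2}\right)
```

Here b is the offset, k the knee, χ the aperiodic exponent, and a_n, c_n, w_n the height, centre frequency, and width of the n-th oscillatory peak. With k = 0, the aperiodic term reduces to b - χ log10 f, a straight line of slope -χ in log-log coordinates. The aperiodic power therefore grows towards low frequencies, where it dominates the total power.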
Appendix excerpt: The FOOOF Python package was used to generate the frequency spectrum of single-channel EEG [Donoghue et al., 2020]. Fig. 6 illustrates examples generated by sweeping the exponent from 1.0 to 2.0. Appendix D, Linear Decodability Results: additional linear decodability plots for channels across different EEG montage lobes, Frontal (Fz) [Fig 7, Fig 8], Parietal (...