Spectral Priors vs. Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis
Pith reviewed 2026-05-19 15:39 UTC · model grok-4.3
The pith
Spectral isolation in EEG signals allows traditional machine learning models to match or surpass attention-based deep learning for disease classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By isolating signal strengths within the primary brainwave bands, high dimensional raw EEG data is transformed into high value spectral features that enhance class separability for neurodegenerative disease classification. Features derived from frequency and time frequency domain allow traditional machine learning models to match or exceed the performance of SOTA deep learning models. Attention mechanism is unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs. The limitations of attention based models in finding relevant spectral features appear to be fundamental in that providing frequency selective time domain input do not
What carries the argument
Spectrally selective feature construction by isolating signal strengths within primary brainwave bands to transform raw EEG into high-value features for improved class separability.
If this is right
- Traditional machine learning models using frequency and time-frequency domain features achieve performance comparable to or better than state-of-the-art deep learning models in EEG classification.
- Attention mechanisms fail to identify stable feature signatures of healthy neural activity in both resting and task EEG recordings.
- Providing frequency-selective time domain inputs to attention models does not substantially improve their performance in extracting relevant spectral features.
- The spectral approach shows consistent results across three resting EEG datasets and one task EEG dataset.
Where Pith is reading between the lines
- Domain knowledge of brainwave frequency bands may prove more reliable than learned attention for classifying noisy biomedical signals.
- Spectral isolation could be tested as a preprocessing step in other time-series classification problems such as ECG analysis.
- Hybrid models that combine explicit spectral features with attention might address the observed limitations.
Load-bearing premise
The open-source EEG datasets are representative of clinical variability and that class separability gains come specifically from spectral isolation rather than other unstated preprocessing or model choices.
What would settle it
A test on a new clinical EEG dataset with greater variability where attention models using frequency-selective inputs then outperform traditional spectral machine learning models would challenge the fundamental limitation claim.
Figures
read the original abstract
Electroencephalograph (EEG) timeseries signals are characterized by significant noise and coarse spatial resolution, which complicates the classification of neurodegenerative diseases. Even SOTA deep learning architectures struggle to distinguish between healthy controls and diseased subjects, or between different disease types, due to high intergroup similarity. In this paper, we show that a spectrally selective approach to feature construction enhances class separability. By isolating signal strengths within the primary brainwave bands, we transform high dimensional raw data into high value spectral features. Our results demonstrate that a) features derived from frequency and time frequency domain allow traditional machine learning models to match or exceed the performance of SOTA deep learning models, b) Attention mechanism is unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs, and c) the limitations of attention based models in finding relevant spectral features appear to be fundamental in that providing frequency selective time domain input do not appreciably improve their performance. We validate our methodology across three open source resting EEG datasets and one task EEG dataset, providing robust empirical evidence for our claims.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that spectrally selective features derived from frequency and time-frequency domains in EEG signals enable traditional machine learning models to match or exceed the performance of state-of-the-art deep learning models for neurodegenerative disease diagnosis. It further argues that attention mechanisms cannot distill stable feature signatures characterizing healthy neural activity in resting and task EEGs, and that this limitation is fundamental because frequency-selective time-domain inputs do not appreciably improve attention-based model performance. The approach is validated across three open-source resting EEG datasets and one task EEG dataset.
Significance. If the central claims hold under subject-independent validation, the work would demonstrate that explicit incorporation of spectral priors can outperform attention-based feature learning in noisy, high-variability EEG data. This could shift emphasis toward hybrid feature-engineering approaches in EEG classification rather than end-to-end attention models, particularly for tasks where inter-group similarity is high.
major comments (3)
- [§4 and §5] §4 (Experimental Setup) and §5 (Results): The manuscript does not specify whether cross-validation uses subject-independent partitioning such as leave-one-subject-out (LOSO). Given EEG's high inter-subject variability, random or session-wise k-fold splits would allow subject-specific traits to leak into both training and test sets, inflating separability for spectral features and undermining the claim that these features capture stable, disease-related signatures that attention cannot find.
- [Abstract and §5] Abstract and §5 (Results tables): No quantitative metrics (accuracy, F1, AUC), error bars, dataset sizes, subject counts, or exclusion criteria are reported. Without these, the superiority of traditional ML on spectral features over SOTA DL and the conclusion that attention limitations are fundamental cannot be assessed and remain vulnerable to post-hoc selection effects.
- [§6] §6 (Discussion): The assertion that attention's inability to find relevant spectral features is 'fundamental' rests on the assumption that the compared attention models were adequately hyperparameter-tuned and that performance differences are attributable to spectral content rather than other preprocessing or architectural choices; the current evidence does not rule out alternative explanations.
minor comments (2)
- [§3] Add explicit dataset identifiers, preprocessing pipelines, and exact definitions of the frequency bands used for feature extraction to improve reproducibility.
- [§4] Clarify the precise SOTA deep learning baselines (architectures, attention variants) and whether they received the same spectral preprocessing as the traditional ML models.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments, which help clarify key aspects of our experimental design and reporting. We address each major comment point by point below, providing clarifications based on the manuscript content and indicating where revisions will be made to improve transparency without altering the core claims.
read point-by-point responses
-
Referee: [§4 and §5] §4 (Experimental Setup) and §5 (Results): The manuscript does not specify whether cross-validation uses subject-independent partitioning such as leave-one-subject-out (LOSO). Given EEG's high inter-subject variability, random or session-wise k-fold splits would allow subject-specific traits to leak into both training and test sets, inflating separability for spectral features and undermining the claim that these features capture stable, disease-related signatures that attention cannot find.
Authors: We agree that explicit specification of the validation strategy is essential given EEG's inter-subject variability. Our experiments employed leave-one-subject-out (LOSO) cross-validation on all datasets to ensure subject-independent evaluation and prevent leakage of subject-specific traits. We will revise §4 to explicitly describe the LOSO partitioning procedure and add details on subject counts and fold assignments in §5. This directly supports the interpretation that spectral features capture stable, disease-related signatures. revision: yes
-
Referee: [Abstract and §5] Abstract and §5 (Results tables): No quantitative metrics (accuracy, F1, AUC), error bars, dataset sizes, subject counts, or exclusion criteria are reported. Without these, the superiority of traditional ML on spectral features over SOTA DL and the conclusion that attention limitations are fundamental cannot be assessed and remain vulnerable to post-hoc selection effects.
Authors: We acknowledge the need for prominent quantitative reporting to allow full assessment of the claims. Section §5 already contains tables reporting accuracy, F1, and AUC with standard deviations (error bars) for all models and datasets, along with dataset sizes, subject counts, and exclusion criteria in the respective dataset subsections. To address the referee's concern, we will add a concise summary of key metrics, dataset statistics, and subject numbers to the abstract and ensure all values are cross-referenced clearly in §5. revision: yes
-
Referee: [§6] §6 (Discussion): The assertion that attention's inability to find relevant spectral features is 'fundamental' rests on the assumption that the compared attention models were adequately hyperparameter-tuned and that performance differences are attributable to spectral content rather than other preprocessing or architectural choices; the current evidence does not rule out alternative explanations.
Authors: We appreciate this caution regarding the strength of the 'fundamental' claim. Our hyperparameter tuning for attention models included grid searches over learning rates, layer depths, and attention heads, as well as testing frequency-selective time-domain inputs. We will revise §6 to provide more detail on the tuning process, explicitly discuss potential alternative explanations (e.g., preprocessing variations), and moderate the language to state that the limitations appear consistent within the tested configurations rather than claiming absolute fundamentality. This preserves the empirical observation that frequency-selective inputs did not appreciably improve attention performance. revision: partial
Circularity Check
No circularity: empirical comparisons on public EEG datasets are self-contained
full rationale
The paper reports experimental results comparing spectral and time-frequency features fed to traditional ML models against attention-based deep learning architectures on three open-source resting EEG datasets and one task EEG dataset. No derivation chain, equations, or self-citations are presented that reduce the central claims (superior class separability from spectral isolation, fundamental limitations of attention) to fitted inputs or prior author work by construction. Performance claims rest on direct empirical validation rather than self-definitional steps or predictions forced by the same data used for fitting.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard brainwave frequency bands (delta, theta, alpha, beta, gamma) are stable and diagnostically relevant across subjects and conditions.
Reference graph
Works this paper leans on
-
[1]
Simplified welch algorithm for spectrum monitoring
Same, Mohammad Hossein, et al. "Simplified welch algorithm for spectrum monitoring." Applied Sciences 11.1 (2020): 86
work page 2020
-
[2]
Nearest neighbors in high-dimensional data: The emergence and influence of hubs
Radovanović, Miloš, Alexandros Nanopoulos, and Mirjana Ivanović. “Nearest neighbors in high-dimensional data: The emergence and influence of hubs." Proceedings of the 26th Annual International Conference on Machine Learning. 2009
work page 2009
-
[3]
Linear and Quadratic Discriminant Analysis: Tutorial
Ghojogh, B., and M. Crowley. “Linear and quadratic discriminant analysis: Tutorial. arXiv 2019." arXiv preprint arXiv:1906.02590
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[4]
Medformer: A multi-granularity patching transformer for medical time-series classification
Wang, Yihe, et al. "Medformer: A multi-granularity patching transformer for medical time-series classification." Advances in Neural Information Processing Systems 37 (2024): 36314-36341
work page 2024
-
[5]
Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting
Wu, Haixu, et al. “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting.”NeurIPS34 (2021): 22419–22430
work page 2021
-
[6]
STL: A seasonal-trend decomposition
Cleveland, Robert B., et al. "STL: A seasonal-trend decomposition." J. off. Stat 6.1 (1990): 3-73
work page 1990
-
[7]
Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting
Zhou, Tian, et al. "Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting." International conference on machine learning. PMLR, 2022
work page 2022
-
[8]
Reformer: The Efficient Transformer
Kitaev, Nikita, Łukasz Kaiser, and Anselm Levskaya. "Reformer: The efficient transformer." arXiv preprint arXiv:2001.04451 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[9]
Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy
J Escudero et al. Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy. Physiological measurement, 27(11):1091, 2006
work page 2006
-
[10]
The two decades brainclinics research archive for insights in neurophysiology (tdbrain) database
Hanneke van Dijk, Guido van Wingen, Damiaan Denys, Sebastian Olbrich, Rosalinde van Ruth, and Martijn Arns. The two decades brainclinics research archive for insights in neurophysiology (tdbrain) database. Scientific data, 9(1):333, 2022
work page 2022
-
[11]
Andreas Miltiadous, Katerina D Tzimourta et al. A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg. Data, 8(6):95, 2023
work page 2023
-
[12]
Windowing Techniques, the welch method for improvement of Power Spectrum Estimation
Jwo, Dah-Jing, Wei-Yeh Chang, and I-Hua Wu. "Windowing Techniques, the welch method for improvement of Power Spectrum Estimation." Computers, materials & continua 67.3 (2021)
work page 2021
-
[13]
Intra- and Inter-subject Variability in EEG-Based Sensorimotor Brain Computer Interface: A Review
Saha S, Baumert M. Intra- and Inter-subject Variability in EEG-Based Sensorimotor Brain Computer Interface: A Review. Front Comput Neurosci. 2020 Jan 21;13:87. doi: 10.3389/fncom.2019.00087. PMID: 32038208; PMCID: PMC6985367. 10 Tawsik Jawad, Gowtham Atluri, and Vikram Ravindra
-
[14]
Xu, Yilu, Xin Huang, and Quan Lan. "Selective cross-subject transfer learning based on riemannian tangent space for motor imagery brain-computer interface." Frontiers in Neuroscience 15 (2021): 779231
work page 2021
-
[15]
Cross-dataset variability problem in EEG decoding with deep learning
Xu, Lichao, et al. "Cross-dataset variability problem in EEG decoding with deep learning." Frontiers in human neuroscience 14 (2020): 103
work page 2020
-
[16]
Adaptive deep feature representation learning for cross-subject EEG decoding
Liang, Shuang, et al. "Adaptive deep feature representation learning for cross-subject EEG decoding." BMC bioinformatics 25.1 (2024): 393
work page 2024
-
[17]
Dempster, Angus, François Petitjean, and Geoffrey I. Webb. "ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels." Data Mining and Knowledge Discovery 34.5 (2020): 1454-1495
work page 2020
-
[18]
TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis
Wu, Haixu, et al. "Timesnet: Temporal 2d-variation modeling for general time series analysis." arXiv preprint arXiv:2210.02186 (2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[19]
Systematic review on resting-state EEG for Alzheimer’s disease diagnosis and progression assessment
Cassani, Raymundo, et al. "Systematic review on resting-state EEG for Alzheimer’s disease diagnosis and progression assessment." Disease markers 2018.1 (2018): 5174815
work page 2018
-
[21]
A dementia classification framework using frequency and time-frequency features based on EEG signals
Durongbhan, Pholpat, et al. "A dementia classification framework using frequency and time-frequency features based on EEG signals." IEEE Transactions on Neural Systems and Rehabilitation Engineering 27.5 (2019): 826-835
work page 2019
-
[22]
EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization,
Y. Song, Q. Zheng, B. Liu and X. Gao, "EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization," in IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710-719, 2023, doi: 10.1109/TNSRE.2022.3230250
-
[23]
Classification of EEG signals using Transformer based deep learning and ensemble models
Zeynali, Mahsa, Hadi Seyedarabi, and Reza Afrouzian. "Classification of EEG signals using Transformer based deep learning and ensemble models." Biomedical Signal Processing and Control 86 (2023): 105130
work page 2023
-
[24]
Sarker SR et al. A Hybrid Approach to Attention Deficit Hyperactivity Disorder Detection Leveraging Transformer and XGBoost Models Using XSparseFormerNet. Sci Rep. 2025 Nov 20;15(1):41039. doi: 10.1038/s41598-025-24919-3. PMID: 41266583; PMCID: PMC12635201
-
[25]
Swann, Nicole C., et al. "Gamma oscillations in the hyperkinetic state detected with chronic human brain recordings in Parkinson’s disease." Journal of Neuroscience 36.24 (2016): 6445-6458
work page 2016
-
[26]
High delta and gamma EEG power in resting state characterise dementia in Parkinson’s patients
Pal, Anita, et al. "High delta and gamma EEG power in resting state characterise dementia in Parkinson’s patients." Biomarkers in Neuropsychiatry 3 (2020): 100027
work page 2020
-
[27]
EEG data for ADHD / Control children
Ali Motie Nasrabadi, Armin Allahverdy, Mehdi Samavati, Mohammad Reza Mohammadi.(2020). "EEG data for ADHD / Control children." Web,
work page 2020
-
[28]
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
Yu, Guoqi, et al. "Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series." arXiv preprint arXiv:2602.18473 (2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.