pith. sign in

arxiv: 2605.15433 · v1 · pith:DOJBNU2Znew · submitted 2026-05-14 · 💻 cs.LG

Spectral Priors vs. Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis

Pith reviewed 2026-05-19 15:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords EEG classificationspectral featuresattention mechanismsneurodegenerative diseasemachine learningbrainwave bandstime-frequency analysis
0
0 comments X

The pith

Spectral isolation in EEG signals allows traditional machine learning models to match or surpass attention-based deep learning for disease classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that transforming EEG data by isolating strengths in key brainwave frequency bands creates features that make disease classification easier. A sympathetic reader would care because current deep learning approaches struggle with noisy EEG signals and high similarity between groups, suggesting a simpler path might work better. The work shows that attention mechanisms cannot effectively find the stable patterns of healthy brain activity. This holds for both resting state and task-based EEG recordings, and even feeding attention models pre-filtered frequency data does not fix the issue.

Core claim

By isolating signal strengths within the primary brainwave bands, high dimensional raw EEG data is transformed into high value spectral features that enhance class separability for neurodegenerative disease classification. Features derived from frequency and time frequency domain allow traditional machine learning models to match or exceed the performance of SOTA deep learning models. Attention mechanism is unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs. The limitations of attention based models in finding relevant spectral features appear to be fundamental in that providing frequency selective time domain input do not

What carries the argument

Spectrally selective feature construction by isolating signal strengths within primary brainwave bands to transform raw EEG into high-value features for improved class separability.

If this is right

  • Traditional machine learning models using frequency and time-frequency domain features achieve performance comparable to or better than state-of-the-art deep learning models in EEG classification.
  • Attention mechanisms fail to identify stable feature signatures of healthy neural activity in both resting and task EEG recordings.
  • Providing frequency-selective time domain inputs to attention models does not substantially improve their performance in extracting relevant spectral features.
  • The spectral approach shows consistent results across three resting EEG datasets and one task EEG dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Domain knowledge of brainwave frequency bands may prove more reliable than learned attention for classifying noisy biomedical signals.
  • Spectral isolation could be tested as a preprocessing step in other time-series classification problems such as ECG analysis.
  • Hybrid models that combine explicit spectral features with attention might address the observed limitations.

Load-bearing premise

The open-source EEG datasets are representative of clinical variability and that class separability gains come specifically from spectral isolation rather than other unstated preprocessing or model choices.

What would settle it

A test on a new clinical EEG dataset with greater variability where attention models using frequency-selective inputs then outperform traditional spectral machine learning models would challenge the fundamental limitation claim.

Figures

Figures reproduced from arXiv: 2605.15433 by Gowtham Atluri, Tawsik Jawad, Vikram Ravindra.

Figure 1
Figure 1. Figure 1: Holdout-set confusion matrices on ADFTD comparing a classical pipeline vs. a Transformer. Rows denote ground-truth labels (‘A’:Alzheimers, ‘C’:Healthy Controls, ‘F’:Dementia) and columns denote predicted labels; darker diagonal entries indicate better class-wise performance. Left: Quadratic Discriminant Analysis (QDA) trained on aggregated spectral features (Welch-FFT/DWT, channel￾and window-averaged) show… view at source ↗
read the original abstract

Electroencephalograph (EEG) timeseries signals are characterized by significant noise and coarse spatial resolution, which complicates the classification of neurodegenerative diseases. Even SOTA deep learning architectures struggle to distinguish between healthy controls and diseased subjects, or between different disease types, due to high intergroup similarity. In this paper, we show that a spectrally selective approach to feature construction enhances class separability. By isolating signal strengths within the primary brainwave bands, we transform high dimensional raw data into high value spectral features. Our results demonstrate that a) features derived from frequency and time frequency domain allow traditional machine learning models to match or exceed the performance of SOTA deep learning models, b) Attention mechanism is unable to distill the stable feature signatures that characterize healthy neural activity in both resting and task EEGs, and c) the limitations of attention based models in finding relevant spectral features appear to be fundamental in that providing frequency selective time domain input do not appreciably improve their performance. We validate our methodology across three open source resting EEG datasets and one task EEG dataset, providing robust empirical evidence for our claims.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that spectrally selective features derived from frequency and time-frequency domains in EEG signals enable traditional machine learning models to match or exceed the performance of state-of-the-art deep learning models for neurodegenerative disease diagnosis. It further argues that attention mechanisms cannot distill stable feature signatures characterizing healthy neural activity in resting and task EEGs, and that this limitation is fundamental because frequency-selective time-domain inputs do not appreciably improve attention-based model performance. The approach is validated across three open-source resting EEG datasets and one task EEG dataset.

Significance. If the central claims hold under subject-independent validation, the work would demonstrate that explicit incorporation of spectral priors can outperform attention-based feature learning in noisy, high-variability EEG data. This could shift emphasis toward hybrid feature-engineering approaches in EEG classification rather than end-to-end attention models, particularly for tasks where inter-group similarity is high.

major comments (3)
  1. [§4 and §5] §4 (Experimental Setup) and §5 (Results): The manuscript does not specify whether cross-validation uses subject-independent partitioning such as leave-one-subject-out (LOSO). Given EEG's high inter-subject variability, random or session-wise k-fold splits would allow subject-specific traits to leak into both training and test sets, inflating separability for spectral features and undermining the claim that these features capture stable, disease-related signatures that attention cannot find.
  2. [Abstract and §5] Abstract and §5 (Results tables): No quantitative metrics (accuracy, F1, AUC), error bars, dataset sizes, subject counts, or exclusion criteria are reported. Without these, the superiority of traditional ML on spectral features over SOTA DL and the conclusion that attention limitations are fundamental cannot be assessed and remain vulnerable to post-hoc selection effects.
  3. [§6] §6 (Discussion): The assertion that attention's inability to find relevant spectral features is 'fundamental' rests on the assumption that the compared attention models were adequately hyperparameter-tuned and that performance differences are attributable to spectral content rather than other preprocessing or architectural choices; the current evidence does not rule out alternative explanations.
minor comments (2)
  1. [§3] Add explicit dataset identifiers, preprocessing pipelines, and exact definitions of the frequency bands used for feature extraction to improve reproducibility.
  2. [§4] Clarify the precise SOTA deep learning baselines (architectures, attention variants) and whether they received the same spectral preprocessing as the traditional ML models.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify key aspects of our experimental design and reporting. We address each major comment point by point below, providing clarifications based on the manuscript content and indicating where revisions will be made to improve transparency without altering the core claims.

read point-by-point responses
  1. Referee: [§4 and §5] §4 (Experimental Setup) and §5 (Results): The manuscript does not specify whether cross-validation uses subject-independent partitioning such as leave-one-subject-out (LOSO). Given EEG's high inter-subject variability, random or session-wise k-fold splits would allow subject-specific traits to leak into both training and test sets, inflating separability for spectral features and undermining the claim that these features capture stable, disease-related signatures that attention cannot find.

    Authors: We agree that explicit specification of the validation strategy is essential given EEG's inter-subject variability. Our experiments employed leave-one-subject-out (LOSO) cross-validation on all datasets to ensure subject-independent evaluation and prevent leakage of subject-specific traits. We will revise §4 to explicitly describe the LOSO partitioning procedure and add details on subject counts and fold assignments in §5. This directly supports the interpretation that spectral features capture stable, disease-related signatures. revision: yes

  2. Referee: [Abstract and §5] Abstract and §5 (Results tables): No quantitative metrics (accuracy, F1, AUC), error bars, dataset sizes, subject counts, or exclusion criteria are reported. Without these, the superiority of traditional ML on spectral features over SOTA DL and the conclusion that attention limitations are fundamental cannot be assessed and remain vulnerable to post-hoc selection effects.

    Authors: We acknowledge the need for prominent quantitative reporting to allow full assessment of the claims. Section §5 already contains tables reporting accuracy, F1, and AUC with standard deviations (error bars) for all models and datasets, along with dataset sizes, subject counts, and exclusion criteria in the respective dataset subsections. To address the referee's concern, we will add a concise summary of key metrics, dataset statistics, and subject numbers to the abstract and ensure all values are cross-referenced clearly in §5. revision: yes

  3. Referee: [§6] §6 (Discussion): The assertion that attention's inability to find relevant spectral features is 'fundamental' rests on the assumption that the compared attention models were adequately hyperparameter-tuned and that performance differences are attributable to spectral content rather than other preprocessing or architectural choices; the current evidence does not rule out alternative explanations.

    Authors: We appreciate this caution regarding the strength of the 'fundamental' claim. Our hyperparameter tuning for attention models included grid searches over learning rates, layer depths, and attention heads, as well as testing frequency-selective time-domain inputs. We will revise §6 to provide more detail on the tuning process, explicitly discuss potential alternative explanations (e.g., preprocessing variations), and moderate the language to state that the limitations appear consistent within the tested configurations rather than claiming absolute fundamentality. This preserves the empirical observation that frequency-selective inputs did not appreciably improve attention performance. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical comparisons on public EEG datasets are self-contained

full rationale

The paper reports experimental results comparing spectral and time-frequency features fed to traditional ML models against attention-based deep learning architectures on three open-source resting EEG datasets and one task EEG dataset. No derivation chain, equations, or self-citations are presented that reduce the central claims (superior class separability from spectral isolation, fundamental limitations of attention) to fitted inputs or prior author work by construction. Performance claims rest on direct empirical validation rather than self-definitional steps or predictions forced by the same data used for fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Paper rests on standard EEG band definitions and the assumption that public datasets capture the relevant disease signatures; no new entities or fitted constants are introduced in the abstract.

axioms (1)
  • domain assumption Standard brainwave frequency bands (delta, theta, alpha, beta, gamma) are stable and diagnostically relevant across subjects and conditions.
    Invoked when isolating signal strengths within primary brainwave bands to create features.

pith-pipeline@v0.9.0 · 5726 in / 1249 out tokens · 34451 ms · 2026-05-19T15:39:05.386157+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 4 internal anchors

  1. [1]

    Simplified welch algorithm for spectrum monitoring

    Same, Mohammad Hossein, et al. "Simplified welch algorithm for spectrum monitoring." Applied Sciences 11.1 (2020): 86

  2. [2]

    Nearest neighbors in high-dimensional data: The emergence and influence of hubs

    Radovanović, Miloš, Alexandros Nanopoulos, and Mirjana Ivanović. “Nearest neighbors in high-dimensional data: The emergence and influence of hubs." Proceedings of the 26th Annual International Conference on Machine Learning. 2009

  3. [3]

    Linear and Quadratic Discriminant Analysis: Tutorial

    Ghojogh, B., and M. Crowley. “Linear and quadratic discriminant analysis: Tutorial. arXiv 2019." arXiv preprint arXiv:1906.02590

  4. [4]

    Medformer: A multi-granularity patching transformer for medical time-series classification

    Wang, Yihe, et al. "Medformer: A multi-granularity patching transformer for medical time-series classification." Advances in Neural Information Processing Systems 37 (2024): 36314-36341

  5. [5]

    Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

    Wu, Haixu, et al. “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting.”NeurIPS34 (2021): 22419–22430

  6. [6]

    STL: A seasonal-trend decomposition

    Cleveland, Robert B., et al. "STL: A seasonal-trend decomposition." J. off. Stat 6.1 (1990): 3-73

  7. [7]

    Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting

    Zhou, Tian, et al. "Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting." International conference on machine learning. PMLR, 2022

  8. [8]

    Reformer: The Efficient Transformer

    Kitaev, Nikita, Łukasz Kaiser, and Anselm Levskaya. "Reformer: The efficient transformer." arXiv preprint arXiv:2001.04451 (2020)

  9. [9]

    Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy

    J Escudero et al. Analysis of electroencephalograms in alzheimer’s disease patients with multiscale entropy. Physiological measurement, 27(11):1091, 2006

  10. [10]

    The two decades brainclinics research archive for insights in neurophysiology (tdbrain) database

    Hanneke van Dijk, Guido van Wingen, Damiaan Denys, Sebastian Olbrich, Rosalinde van Ruth, and Martijn Arns. The two decades brainclinics research archive for insights in neurophysiology (tdbrain) database. Scientific data, 9(1):333, 2022

  11. [11]

    A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg

    Andreas Miltiadous, Katerina D Tzimourta et al. A dataset of scalp eeg recordings of alzheimer’s disease, frontotemporal dementia and healthy subjects from routine eeg. Data, 8(6):95, 2023

  12. [12]

    Windowing Techniques, the welch method for improvement of Power Spectrum Estimation

    Jwo, Dah-Jing, Wei-Yeh Chang, and I-Hua Wu. "Windowing Techniques, the welch method for improvement of Power Spectrum Estimation." Computers, materials & continua 67.3 (2021)

  13. [13]

    Intra- and Inter-subject Variability in EEG-Based Sensorimotor Brain Computer Interface: A Review

    Saha S, Baumert M. Intra- and Inter-subject Variability in EEG-Based Sensorimotor Brain Computer Interface: A Review. Front Comput Neurosci. 2020 Jan 21;13:87. doi: 10.3389/fncom.2019.00087. PMID: 32038208; PMCID: PMC6985367. 10 Tawsik Jawad, Gowtham Atluri, and Vikram Ravindra

  14. [14]

    Selective cross-subject transfer learning based on riemannian tangent space for motor imagery brain-computer interface

    Xu, Yilu, Xin Huang, and Quan Lan. "Selective cross-subject transfer learning based on riemannian tangent space for motor imagery brain-computer interface." Frontiers in Neuroscience 15 (2021): 779231

  15. [15]

    Cross-dataset variability problem in EEG decoding with deep learning

    Xu, Lichao, et al. "Cross-dataset variability problem in EEG decoding with deep learning." Frontiers in human neuroscience 14 (2020): 103

  16. [16]

    Adaptive deep feature representation learning for cross-subject EEG decoding

    Liang, Shuang, et al. "Adaptive deep feature representation learning for cross-subject EEG decoding." BMC bioinformatics 25.1 (2024): 393

  17. [17]

    ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels

    Dempster, Angus, François Petitjean, and Geoffrey I. Webb. "ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels." Data Mining and Knowledge Discovery 34.5 (2020): 1454-1495

  18. [18]

    TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis

    Wu, Haixu, et al. "Timesnet: Temporal 2d-variation modeling for general time series analysis." arXiv preprint arXiv:2210.02186 (2022)

  19. [19]

    Systematic review on resting-state EEG for Alzheimer’s disease diagnosis and progression assessment

    Cassani, Raymundo, et al. "Systematic review on resting-state EEG for Alzheimer’s disease diagnosis and progression assessment." Disease markers 2018.1 (2018): 5174815

  20. [21]

    A dementia classification framework using frequency and time-frequency features based on EEG signals

    Durongbhan, Pholpat, et al. "A dementia classification framework using frequency and time-frequency features based on EEG signals." IEEE Transactions on Neural Systems and Rehabilitation Engineering 27.5 (2019): 826-835

  21. [22]

    EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization,

    Y. Song, Q. Zheng, B. Liu and X. Gao, "EEG Conformer: Convolutional Transformer for EEG Decoding and Visualization," in IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710-719, 2023, doi: 10.1109/TNSRE.2022.3230250

  22. [23]

    Classification of EEG signals using Transformer based deep learning and ensemble models

    Zeynali, Mahsa, Hadi Seyedarabi, and Reza Afrouzian. "Classification of EEG signals using Transformer based deep learning and ensemble models." Biomedical Signal Processing and Control 86 (2023): 105130

  23. [24]

    A Hybrid Approach to Attention Deficit Hyperactivity Disorder Detection Leveraging Transformer and XGBoost Models Using XSparseFormerNet

    Sarker SR et al. A Hybrid Approach to Attention Deficit Hyperactivity Disorder Detection Leveraging Transformer and XGBoost Models Using XSparseFormerNet. Sci Rep. 2025 Nov 20;15(1):41039. doi: 10.1038/s41598-025-24919-3. PMID: 41266583; PMCID: PMC12635201

  24. [25]

    Gamma oscillations in the hyperkinetic state detected with chronic human brain recordings in Parkinson’s disease

    Swann, Nicole C., et al. "Gamma oscillations in the hyperkinetic state detected with chronic human brain recordings in Parkinson’s disease." Journal of Neuroscience 36.24 (2016): 6445-6458

  25. [26]

    High delta and gamma EEG power in resting state characterise dementia in Parkinson’s patients

    Pal, Anita, et al. "High delta and gamma EEG power in resting state characterise dementia in Parkinson’s patients." Biomarkers in Neuropsychiatry 3 (2020): 100027

  26. [27]

    EEG data for ADHD / Control children

    Ali Motie Nasrabadi, Armin Allahverdy, Mehdi Samavati, Mohammad Reza Mohammadi.(2020). "EEG data for ADHD / Control children." Web,

  27. [28]

    Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series

    Yu, Guoqi, et al. "Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series." arXiv preprint arXiv:2602.18473 (2026)