pith. sign in

arxiv: 2606.00884 · v1 · pith:EX7NOHSVnew · submitted 2026-05-30 · 💻 cs.LG · cs.AI

Dive into Waves: Morlet Spectral Transformer for Cross-Subject Emotion Decoding from EEG

Pith reviewed 2026-06-28 18:54 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords cross-subject EEGemotion recognitionMorlet waveletspectral transformerbrain-computer interfacetime-frequency representationbaseline removal
0
0 comments X

The pith

The Morlet Spectral Transformer outperforms large pretrained EEG models in cross-subject emotion recognition without pretraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a Transformer built around three spectral-specific components can decode emotions from EEG more accurately across subjects than either massive pretrained foundation models or conventional frequency encoders. Emotion signals appear mainly as weak, noisy changes in spectral power that shift between people, so standard waveform or generic tokenization approaches fail to generalize. By tokenizing with Morlet wavelets to align with multi-scale brain rhythms, stripping subject drift via long-context baseline removal, and applying separate spatial projections per frequency band, the model produces better results on SEED-family datasets. This matters because it points to a lighter, more interpretable route for brain-computer interfaces that avoids the data and compute demands of large-scale pretraining.

Core claim

The Morlet Spectral Transformer integrates Morlet wavelet tokenization that extends differential entropy into a time-frequency form suitable for Transformers, long-context baseline removal that normalizes out subject-specific drift and redundancy, and frequency-specific spatial projection that learns independent channel mixers per band. These elements are placed inside a spatiotemporal Transformer backbone. The resulting model consistently exceeds both large pretrained EEG foundation models and prior frequency-based methods on every SEED-family dataset for cross-subject emotion recognition, even when trained from scratch.

What carries the argument

Morlet wavelet tokenization paired with long-context baseline removal and frequency-specific spatial projection inside a Transformer backbone.

If this is right

  • Careful spectral representation design can serve as an accurate, cost-effective alternative to large-scale pretraining for EEG tasks.
  • The frequency-specific spatial projection yields interpretable band-specific patterns that reduce unwanted cross-channel mixing.
  • Long-context baseline removal provides a simple normalization that improves generalization by removing subject drift across windows.
  • The overall approach is particularly suited to signals whose information lies in spectral power rather than clear waveform signatures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tokenization and normalization steps could be tested on other EEG tasks that rely on spectral rather than temporal signatures.
  • The emphasis on matching multi-scale rhythms suggests similar wavelet-based tokenization might help in other noisy, multi-scale biosignal problems.
  • If the baseline removal step generalizes, it could become a standard preprocessing choice for handling inter-subject variability in neural recordings.

Load-bearing premise

The reported gains come from the three proposed components rather than the shared Transformer backbone or characteristics of the SEED datasets.

What would settle it

An ablation study that removes each of the three components one at a time and checks whether accuracy falls to or below the level of the pretrained baselines or standard frequency methods.

Figures

Figures reproduced from arXiv: 2606.00884 by Jiaxin Qing, Lexin Li.

Figure 1
Figure 1. Figure 1: Overview of MST. Raw EEG is decomposed into a time-frequency grid by fixed Morlet wavelets. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Temporal redundancy. The left panel shows pairwise cosine similarity of raw log-amplitude [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Learned frequency-specific spatial projections. Each panel shows one spatial component [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Effects of four data augmentation strategies on a representative sample. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Per-subject cross-subject LOSO accuracy on SEED for MST. Red bars indicate subjects whose [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Morlet wavelet tokenization. (a) The complex Morlet kernels at different center frequencies, with [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
read the original abstract

We study cross-subject emotion recognition from EEG, a practically important yet challenging problem in brain-computer interfaces. Unlike tasks with clear waveform signatures, emotion-related EEG signals are primarily encoded in spectral power and are weak, noisy, and highly variable across subjects. Existing approaches rely either on large pretrained EEG foundation models, which require massive data yet still struggle with cross-subject variability, or frequency-domain encoders, which better reflect spectral structure but suffer from mismatched representations, drift-dominated tokenization, and lack of band-specific spatial modeling. In this article, we propose the Morlet Spectral Transformer (MST), built around three key components and integrated with a spatiotemporal Transformer backbone. First, Morlet wavelet tokenization provides a time-frequency representation that matches the multi-scale structure of brain rhythms, and extends classical differential entropy features to a form suitable for Transformers. Second, long-context baseline removal acts as a simple temporal normalization that removes subject-specific drift and redundancy across nearby windows. Third, frequency-specific spatial projection learns a separate channel mixer for each frequency band, capturing interpretable band-specific patterns and reducing cross-channel mixing. We show that, even without pretraining, MST consistently outperforms both large pretrained EEG foundation models and frequency-based methods across all SEED-family datasets. These results suggest that careful representation design can yield an accurate, cost-effective, and interpretable alternative to large-scale pretraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Morlet Spectral Transformer (MST) for cross-subject EEG-based emotion recognition. It integrates a spatiotemporal Transformer backbone with three components: Morlet wavelet tokenization to produce time-frequency representations matching brain rhythms (extending differential entropy), long-context baseline removal as temporal normalization to reduce subject-specific drift, and frequency-specific spatial projection to learn band-wise channel mixers. The central empirical claim is that MST, even without pretraining, consistently outperforms both large pretrained EEG foundation models and existing frequency-based methods across SEED-family datasets.

Significance. If the performance gains can be rigorously attributed to the proposed components rather than the Transformer backbone or dataset artifacts, the work would support the value of domain-specific representation design over scale in EEG decoding tasks, providing a lower-cost, more interpretable alternative to foundation-model pretraining for cross-subject generalization.

major comments (2)
  1. [Results / Experiments (section containing the main tables and comparisons)] The central claim that the three components (Morlet tokenization, long-context baseline removal, frequency-specific spatial projection) drive the reported gains requires controlled ablations; no such experiments (e.g., replacing Morlet with standard DE/FFT features, removing baseline correction, or using a shared spatial mixer) are described that would show corresponding performance drops when these elements are ablated.
  2. [Abstract and Results section] The abstract and results presentation supply no statistical tests, error bars, dataset splits, or subject-wise variance measures to support the claim of 'consistent outperformance' across SEED-family datasets; without these, the strength of the empirical comparison cannot be evaluated.
minor comments (2)
  1. [Method section] Notation for the Morlet wavelet parameters and the exact formulation of the long-context baseline removal should be made explicit with equations to allow reproduction.
  2. [Figures] Figure captions for any architecture or tokenization diagrams should explicitly label the three proposed components and their integration with the Transformer backbone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects for strengthening the empirical claims. We agree that both controlled ablations and statistical reporting are needed to rigorously support the contribution of the proposed components and will add these elements to the revised manuscript.

read point-by-point responses
  1. Referee: [Results / Experiments (section containing the main tables and comparisons)] The central claim that the three components (Morlet tokenization, long-context baseline removal, frequency-specific spatial projection) drive the reported gains requires controlled ablations; no such experiments (e.g., replacing Morlet with standard DE/FFT features, removing baseline correction, or using a shared spatial mixer) are described that would show corresponding performance drops when these elements are ablated.

    Authors: We agree that the absence of component-specific ablations limits the ability to attribute gains specifically to Morlet tokenization, long-context baseline removal, and frequency-specific spatial projection rather than the Transformer backbone alone. In the revised manuscript we will add a dedicated ablation study section that systematically replaces each component (Morlet with standard DE/FFT, removal of baseline correction, and replacement of per-band spatial mixers with a shared mixer) and reports the resulting performance drops on the SEED-family datasets. These experiments will be presented alongside the main results tables. revision: yes

  2. Referee: [Abstract and Results section] The abstract and results presentation supply no statistical tests, error bars, dataset splits, or subject-wise variance measures to support the claim of 'consistent outperformance' across SEED-family datasets; without these, the strength of the empirical comparison cannot be evaluated.

    Authors: We acknowledge that the current presentation lacks statistical tests, error bars, explicit dataset splits, and subject-wise variance, which weakens the evaluation of the 'consistent outperformance' claim. In the revision we will (i) report mean and standard deviation across multiple random seeds and cross-subject folds, (ii) include paired statistical tests (e.g., Wilcoxon signed-rank) with p-values against baselines, (iii) specify the exact train/validation/test subject splits used, and (iv) add subject-wise performance variance measures. These will appear in both the abstract (concise summary) and the results section tables/figures. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external benchmarks

full rationale

The paper introduces the MST architecture with three described components (Morlet wavelet tokenization, long-context baseline removal, frequency-specific spatial projection) integrated into a Transformer backbone, then reports empirical outperformance on SEED-family datasets without pretraining. No equations, derivations, or self-citations appear that reduce any claimed result to its inputs by construction, nor any fitted parameters renamed as predictions, ansatzes smuggled via citation, or uniqueness theorems. Performance attribution is presented via direct experimental comparison to baselines, making the chain self-contained against external data rather than internally tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the design relies on standard assumptions of wavelet transforms and Transformer architectures from prior literature.

pith-pipeline@v0.9.1-grok · 5771 in / 998 out tokens · 18128 ms · 2026-06-28T18:54:29.417027+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Filter bank common spatial pattern (FBCSP) in brain-computer interface

    Kai Keng Ang, Zheng Y ang Chin, Haihong Zhang, and Cuntai Guan. Filter bank common spatial pattern (FBCSP) in brain-computer interface. In2008 IEEE International Joint Conference on Neural Networks, pages 2390–2397. IEEE, 2008

  2. [2]

    MIT Press, 2014

    Mike X Cohen.Analyzing Neural Time Series Data: Theory and Practice. MIT Press, 2014

  3. [3]

    Cover and Joy A

    Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2 edition, 2006

  4. [4]

    Differential entropy feature for EEG-based emotion recognition

    Ruo-Nan Duan, Jia-Yi Zhu, and Bao-Liang Lu. Differential entropy feature for EEG-based emotion recognition. In6th International IEEE/EMBS Conference on Neural Engineering (NER), pages 81–84. IEEE, 2013

  5. [5]

    Evgenia Gkintoni, Anthimos Aroutzidis, Hera Antonopoulou, and Constantinos Halkiopoulos. From neural networks to emotional networks: A systematic review of eeg-based emotion recognition in cognitive neuroscience and real-world applications.Brain Sciences, 15(3):220, 2025. doi: 10.3390/brainsci15030220

  6. [6]

    SEED-VII: A multimodal dataset of six basic emotions with continuous labels for emotion recognition.IEEE Transactions on Affective Computing, 2024

    Wei-Bang Jiang, Xuan-Hao Liu, Wei-Long Zheng, and Bao-Liang Lu. SEED-VII: A multimodal dataset of six basic emotions with continuous labels for emotion recognition.IEEE Transactions on Affective Computing, 2024

  7. [7]

    Large brain model for learning generic representa- tions with tremendous EEG data in BCI

    Wei-Bang Jiang, Li-Ming Zhao, and Bao-Liang Lu. Large brain model for learning generic representa- tions with tremendous EEG data in BCI. InInternational Conference on Learning Representations, 2024

  8. [8]

    EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis.Brain Research Reviews, 29(2-3):169–195, 1999

    Wolfgang Klimesch. EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis.Brain Research Reviews, 29(2-3):169–195, 1999

  9. [9]

    DEAP: A database for emotion analysis using physiological signals.IEEE Transactions on Affective Computing, 3(1):18–31, 2012

    Sander Koelstra, Christian Mühl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Y azdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. DEAP: A database for emotion analysis using physiological signals.IEEE Transactions on Affective Computing, 3(1):18–31, 2012

  10. [10]

    Journal of Neural Engineering15(5), 056013 (2018).https: //doi.org/10.1088/1741-2552/aace8c

    V ernon J. Lawhern, Amelia J. Solon, Nicholas R. Waytowich, Stephen M. Gordon, Chou P . Hung, and Brent J. Lance. EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces.Journal of Neural Engineering, 15(5):056013, 2018. doi: 10.1088/1741-2552/aace8c

  11. [11]

    Multisource transfer learning for cross-subject EEG emotion recognition.IEEE Transactions on Cybernetics, 50(7):3281–3293, 2020

    Jinpeng Li, Shuang Qiu, Changde Du, Yixin Wang, and Huiguang He. Multisource transfer learning for cross-subject EEG emotion recognition.IEEE Transactions on Cybernetics, 50(7):3281–3293, 2020

  12. [12]

    Domain adaptation for EEG emotion recognition based on latent representation similarity.IEEE Transactions on Cognitive and Developmental Systems, 12(2):344–353, 2020

    Jinpeng Li, Shuang Qiu, Y uan-Y uan Shen, Cheng-Lin Liu, and Huiguang He. Domain adaptation for EEG emotion recognition based on latent representation similarity.IEEE Transactions on Cognitive and Developmental Systems, 12(2):344–353, 2020

  13. [13]

    Eeg foundation models: Progresses, benchmarking, and open problems.arXiv preprint arXiv:2601.17883, 2026

    Dingkun Liu, Y uheng Chen, Zhu Chen, Zhenyao Cui, Y aozhi Wen, Jiayu An, Jingwei Luo, and Dongrui Wu. EEG foundation models: Progresses, benchmarking, and open problems.arXiv preprint arXiv:2601.17883, 2025

  14. [14]

    Tokenizing single-channel EEG with time-frequency motif learning

    Jiarui Liu et al. Tokenizing single-channel EEG with time-frequency motif learning. InInternational Conference on Learning Representations (ICLR), 2025

  15. [15]

    Wei Liu, Jie-Lin Qiu, Wei-Long Zheng, and Bao-Liang Lu. Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition.IEEE Transactions on Cognitive and Developmental Systems, 14(2):715–729, 2022

  16. [16]

    Ravikiran Mane, Effie Chew, Karen Chua, Kai Keng Ang, Neethu Robinson, A. P . Vinod, Seong-Whan Lee, and Cuntai Guan. FBCNet: A multi-view convolutional neural network for brain-computer interface.arXiv preprint arXiv:2104.01233, 2021

  17. [17]

    Unnikrishna Pillai.Probability, Random V ariables, and Stochastic Processes

    Athanasios Papoulis and S. Unnikrishna Pillai.Probability, Random V ariables, and Stochastic Processes. McGraw-Hill, 4 edition, 2002

  18. [18]

    Stephen O. Rice. Mathematical analysis of random noise.The Bell System T echnical Journal, 23 (3):282–332, 1944. 10

  19. [19]

    EEG conformer: Convolutional transformer for EEG decoding and visualization.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 31:710–719, 2023

    Y onghao Song, Qingqing Zheng, Bingchuan Liu, and Xiaorong Gao. EEG conformer: Convolutional transformer for EEG decoding and visualization.IEEE Transactions on Neural Systems and Rehabilitation Engineering, 31:710–719, 2023

  20. [20]

    RoFormer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024

    Jianlin Su, Murtadha Ahmed, Y u Lu, Shengfeng Pan, Wen Bo, and Y unfeng Liu. RoFormer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024

  21. [21]

    Oscillatory gamma-band (30–70 Hz) activity induced by a visual search task in humans.Journal of Neuroscience, 17(2):722–734, 1997

    Catherine Tallon-Baudry, Olivier Bertrand, Claude Delpuech, and Jacques Pernier. Oscillatory gamma-band (30–70 Hz) activity induced by a visual search task in humans.Journal of Neuroscience, 17(2):722–734, 1997

  22. [22]

    Attention is all you need

    Ashish V aswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, volume 30, 2017

  23. [23]

    BrainBERT: Self-supervised representation learning for intracranial recordings

    Christopher Wang, Vighnesh Subramaniam, Adam Uri Y aari, Gabriel Kreiman, Boris Katz, Ignacio Cases, and Andrei Barbu. BrainBERT: Self-supervised representation learning for intracranial recordings. InInternational Conference on Learning Representations (ICLR), 2023

  24. [24]

    EEGPT: Pretrained transformer for universal and reliable representation of EEG signals

    Guangyu Wang, Wenchao Liu, Y uhong He, Cong Xu, Lin Ma, and Haifeng Li. EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. InAdvances in Neural Information Processing Systems, volume 37, 2024

  25. [25]

    CBraMod: A criss-cross brain foundation model for EEG decoding

    Jiquan Wang, Sha Zhao, Zhiling Luo, Y angxuan Zhou, Haiteng Jiang, Shijian Li, Tao Li, and Gang Pan. CBraMod: A criss-cross brain foundation model for EEG decoding. InInternational Conference on Learning Representations, 2025

  26. [26]

    Brandon Westover, and Jimeng Sun

    Chaoqi Y ang, M. Brandon Westover, and Jimeng Sun. BIOT: Biosignal transformer for cross-data learning in the wild. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  27. [27]

    CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model

    Jiahe Zhang et al. CodeBrain: Bridging decoupled tokenizer and multi-scale architecture for EEG foundation model.arXiv preprint arXiv:2506.09110, 2025

  28. [28]

    Wei-Long Zheng and Bao-Liang Lu. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks.IEEE Transactions on Autonomous Mental Development, 7(3):162–175, 2015

  29. [29]

    EmotionMeter: A multimodal framework for recognizing human emotions.IEEE Transactions on Cybernetics, 49(3):1110–1122, 2018

    Wei-Long Zheng, Wei Liu, Yifei Lu, Bao-Liang Lu, and Andrzej Cichocki. EmotionMeter: A multimodal framework for recognizing human emotions.IEEE Transactions on Cybernetics, 49(3):1110–1122, 2018

  30. [30]

    CSBrain: A cross-scale spatiotemporal brain foundation model for EEG decoding

    Y uchen Zhou, Jiamin Wu, Zichen Ren, Zhouheng Y ao, Weiheng Lu, Kunyu Peng, Qihao Zheng, Chunfeng Song, Wanli Ouyang, and Chao Gou. CSBrain: A cross-scale spatiotemporal brain foundation model for EEG decoding. InAdvances in Neural Information Processing Systems, volume 38, 2025. 11 A T echnical appendices and supplementary material A.1 Visualization of t...