EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding
Pith reviewed 2026-05-10 18:55 UTC · model grok-4.3
The pith
EEG-MFTNet adds multi-scale temporal convolutions and a transformer stream to EEGNet to reach 58.9 percent average accuracy on cross-session motor imagery tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EEG-MFTNet achieves an average classification accuracy of 58.9 percent on the SHU dataset under subject-dependent cross-session conditions for motor imagery decoding, outperforming EEGNet and its recent derivatives by jointly capturing short- and long-range temporal structure through multi-scale convolutions and transformer fusion.
What carries the argument
The EEG-MFTNet architecture is built on EEGNet. It adds parallel multi-scale temporal convolution branches, which extract features at several temporal resolutions, and a transformer encoder stream, which models global dependencies before final classification.
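The idea of pairing multi-scale temporal filtering with attention can be sketched on a single channel in plain Python. This is an illustrative toy, not the paper's implementation: the kernel sizes `(3, 7, 15)`, the averaging kernels, the identity-projection attention, and the concatenation in `fuse` are all assumptions, and the source does not specify how the two streams are actually wired together.

```python
import math

def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    left = (len(kernel) - 1) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - left
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def multi_scale_features(signal, kernel_sizes=(3, 7, 15)):
    """Run one averaging kernel per scale; return one feature vector per time step."""
    branches = [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]
    return [list(step) for step in zip(*branches)]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def fuse(signal, kernel_sizes=(3, 7, 15)):
    """Concatenate local (multi-scale) and global (attention) features per time step."""
    local = multi_scale_features(signal, kernel_sizes)
    global_context = self_attention(local)
    return [l + g for l, g in zip(local, global_context)]  # assumed fusion: concatenation

signal = [math.sin(0.3 * t) for t in range(32)]  # synthetic stand-in for one EEG channel
fused = fuse(signal)
print(len(fused), len(fused[0]))  # 32 time steps, 6 features each
```

Short kernels respond to fast changes and long kernels to slow trends, while the attention step lets every time step draw on every other one. A real model would learn the kernels and projections rather than fix them.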
If this is right
- The model supports real-time BCI use because computational complexity and inference latency remain low.
- The architecture reduces sensitivity to cross-session variability while preserving the original EEGNet's efficiency.
- The same design approach can extend to other EEG decoding tasks that require both local and long-range temporal modeling.
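The low-latency claim in the list above is something a reader can check directly. The following is a minimal timing sketch, assuming a callable `predict` that maps one trial to a class index; `dummy_predict`, the trial shape, and the warm-up and repeat counts are hypothetical placeholders rather than anything taken from the paper.

```python
import statistics
import time

def measure_latency_ms(predict, sample, warmup=5, repeats=50):
    """Time single-trial inference; return median and worst-case latency in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the timings
        predict(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}

def dummy_predict(trial):
    """Placeholder for a trained model: returns the index of the row with the largest sum."""
    return max(range(len(trial)), key=lambda i: sum(trial[i]))

# Hypothetical trial shape (channels x samples); substitute the real input here.
trial = [[0.0] * 1000 for _ in range(32)]
stats = measure_latency_ms(dummy_predict, trial)
print(stats)
```

Reporting the median alongside the maximum separates typical latency from occasional stalls, which matters for real-time use.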
Where Pith is reading between the lines
- Comparable fusion of multi-scale convolutions and transformers may improve cross-session performance on other EEG classification problems such as emotion recognition or sleep staging.
- Evaluating the model on additional public motor-imagery datasets would test whether the reported gains hold beyond the SHU collection.
- The low-latency profile suggests the architecture could run on embedded hardware for portable BCI devices.
Load-bearing premise
The accuracy improvement stems mainly from the added multi-scale convolutions and transformer components rather than from training procedure differences or dataset-specific tuning.
What would settle it
Re-training the baseline EEGNet under identical optimizer, learning-rate schedule, epoch count, and data handling as EEG-MFTNet on the same SHU cross-session splits and measuring whether the accuracy difference vanishes.
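One way to enforce such a matched comparison is to route every model through a single immutable configuration. The sketch below assumes a user-supplied `train_and_eval(model_name, config)` function that returns accuracy; that function, the field names, and all default values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Shared training settings; every value here is a hypothetical placeholder."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    lr_schedule: str = "constant"
    epochs: int = 100
    batch_size: int = 32
    seed: int = 0

def matched_comparison(train_and_eval, model_names, config):
    """Train and evaluate every model under the same frozen config."""
    return {name: train_and_eval(name, config) for name in model_names}

def accuracy_gap(results, baseline, candidate):
    """Positive values mean the candidate beats the baseline."""
    return results[candidate] - results[baseline]
```

Because `TrainConfig` is frozen, no model-specific code path can silently change the learning rate or epoch count between runs. If the gap shrinks toward zero under this setup, the training procedure rather than the architecture explains the reported difference.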
Original abstract
Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices, providing critical support for individuals with motor impairments. However, accurate motor imagery (MI) decoding from electroencephalography (EEG) remains challenging due to noise and cross-session variability. This study introduces EEG-MFTNet, a novel deep learning model based on the EEGNet architecture, enhanced with multi-scale temporal convolutions and a Transformer encoder stream. These components are designed to capture both short and long-range temporal dependencies in EEG signals. The model is evaluated on the SHU dataset using a subject-dependent cross-session setup, outperforming baseline models, including EEGNet and its recent derivatives. EEG-MFTNet achieves an average classification accuracy of 58.9% while maintaining low computational complexity and inference latency. The results highlight the model's potential for real-time BCI applications and underscore the importance of architectural innovations in improving MI decoding. This work contributes to the development of more robust and adaptive BCI systems, with implications for assistive technologies and neurorehabilitation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes EEG-MFTNet, an enhanced EEGNet architecture that adds multi-scale temporal convolutions and a Transformer encoder stream to capture short- and long-range temporal dependencies in EEG for motor imagery decoding. Evaluated in a subject-dependent cross-session setup on the SHU dataset, the model reports 58.9% average accuracy, outperforming EEGNet and recent derivatives while maintaining low computational complexity and inference latency, with implications for real-time BCI applications.
Significance. If the accuracy gains can be rigorously attributed to the architectural additions, the work offers a practical direction for improving cross-session robustness in MI-BCI systems without excessive compute overhead. The explicit attention to inference latency is a strength for deployment. However, the single-dataset scope and lack of controlled isolation of components limit the strength of claims about broader applicability.
major comments (1)
- [Experiments] Experiments section: No ablation studies are reported that isolate the multi-scale temporal convolutions and Transformer encoder stream by removing them while holding the training pipeline, optimizer, regularization, preprocessing, and hyperparameter schedule fixed to those used for the full EEG-MFTNet. Without these controls, the 58.9% accuracy cannot be confidently attributed to the proposed components rather than to differences in implementation details or capacity.
minor comments (2)
- [Results] Results: Provide error bars, standard deviations across subjects or sessions, and statistical significance tests (e.g., paired t-tests or Wilcoxon) for all accuracy comparisons against baselines.
- [Method] Method: Include a clear diagram or equations detailing the exact fusion operation between the multi-scale convolution branch and the Transformer stream.
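The paired tests requested in the minor comments can be computed without external libraries. The sketch below implements a paired t statistic and an exact two-sided Wilcoxon signed-rank test. The per-subject accuracies are made-up placeholder integers (percent correct) chosen only to exercise the code; they are not results from the paper.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n - 1

def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test; zero differences are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n  # ranks times 2, so averaged ranks for ties stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)  # 2 * mean of ranks i+1 .. j+1
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # Count the sign assignments that produce each possible value of 2 * W+.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    extreme = min(w_plus2, total2 - w_plus2)
    p_value = min(1.0, 2.0 * sum(counts[: extreme + 1]) / 2 ** n)
    return w_plus2 / 2.0, p_value

# Placeholder per-subject accuracies in percent; NOT values from the paper.
proposed = [62, 58, 71, 55, 60, 66]
baseline = [60, 55, 65, 56, 57, 61]
t_stat, dof = paired_t_statistic(proposed, baseline)
w_plus, p_value = wilcoxon_signed_rank(proposed, baseline)
print(t_stat, dof, w_plus, p_value)
```

For these placeholder values the signed-rank statistic is W+ = 20.0 with an exact two-sided p of 0.0625. The Wilcoxon test is the safer choice when per-subject differences are few or clearly non-normal.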
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The major comment highlights a valid point about experimental controls, which we address directly below. We are committed to enhancing the rigor of our claims through revisions.
- Referee: Experiments section: No ablation studies are reported that isolate the multi-scale temporal convolutions and Transformer encoder stream by removing them while holding the training pipeline, optimizer, regularization, preprocessing, and hyperparameter schedule fixed to those used for the full EEG-MFTNet. Without these controls, the 58.9% accuracy cannot be confidently attributed to the proposed components rather than to differences in implementation details or capacity.
Authors: We agree that controlled ablation studies are necessary to rigorously attribute performance improvements to the multi-scale temporal convolutions and Transformer encoder stream. In the revised manuscript, we will add a dedicated ablation subsection in the Experiments section. These studies will remove each component individually (and in combination) while strictly maintaining the identical training pipeline, optimizer, regularization, preprocessing steps, and hyperparameter schedule used for the full EEG-MFTNet. Results will be reported with the same cross-session evaluation protocol on the SHU dataset to isolate the contributions of the proposed additions. This will strengthen the evidence that the observed gains over EEGNet and its derivatives stem from the architectural innovations rather than implementation variances.
Revision: yes
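The ablation grid described in this response (each component removed individually and in combination) can be enumerated mechanically. This is a small sketch; the component names and the `without_...` naming scheme are hypothetical labels introduced here for illustration.

```python
from itertools import product

# Hypothetical labels for the two proposed additions.
COMPONENTS = ("multi_scale_conv", "transformer_stream")

def ablation_variants(components=COMPONENTS):
    """Return every on/off combination of the components, full model first."""
    return [dict(zip(components, flags))
            for flags in product((True, False), repeat=len(components))]

def variant_name(variant):
    """Name a variant after the components that were removed."""
    removed = [name for name, enabled in variant.items() if not enabled]
    return "full" if not removed else "without_" + "+".join(removed)

for variant in ablation_variants():
    print(variant_name(variant), variant)
```

With two components this yields four variants. The one with both removed presumably corresponds to the EEGNet backbone alone, which gives the ablation its reference point, provided each variant is trained under the same fixed pipeline.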
Circularity Check
No circularity: empirical architecture proposal with standard evaluation
The paper proposes EEG-MFTNet as an architectural modification to EEGNet (multi-scale temporal convolutions plus Transformer stream) and reports empirical accuracies (58.9% average) on the SHU dataset under subject-dependent cross-session splits. No derivation, first-principles result, or prediction is claimed that reduces by construction to fitted inputs, self-citations, or ansatzes. Evaluation follows ordinary supervised training and held-out testing; any performance lift is presented as an empirical observation rather than a mathematical necessity. This is the expected non-finding for an applied ML architecture paper.
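A subject-dependent cross-session protocol can be sketched as follows: for each subject separately, train on some sessions and test on a held-out session, then average the per-subject accuracies. The source does not describe which SHU sessions are used for training and testing, so the split below is an assumption. The trial dictionaries, `fit_majority`, and `score_majority` are hypothetical stand-ins for real data and a real model.

```python
from statistics import mean

def cross_session_split(trials, test_session):
    """Split one subject's trials by session: held-out session for test, rest for train."""
    train = [t for t in trials if t["session"] != test_session]
    test = [t for t in trials if t["session"] == test_session]
    return train, test

def fit_majority(train):
    """Placeholder model: remember the most frequent training label."""
    labels = [t["label"] for t in train]
    return max(set(labels), key=labels.count)

def score_majority(model, test):
    """Fraction of test trials whose label matches the remembered label."""
    return sum(t["label"] == model for t in test) / len(test)

def subject_dependent_average(per_subject_trials, fit, score, test_session):
    """Train and test within each subject, then average accuracy across subjects."""
    accs = {}
    for subject, trials in per_subject_trials.items():
        train, test = cross_session_split(trials, test_session)
        accs[subject] = score(fit(train), test)
    return mean(list(accs.values())), accs

# Toy data for illustration only; real trials would carry EEG arrays as well.
toy = {
    "s1": [{"session": 1, "label": 0}, {"session": 1, "label": 0},
           {"session": 1, "label": 1}, {"session": 2, "label": 0},
           {"session": 2, "label": 1}],
    "s2": [{"session": 1, "label": 1}, {"session": 1, "label": 1},
           {"session": 1, "label": 0}, {"session": 2, "label": 1},
           {"session": 2, "label": 1}],
}
average, per_subject = subject_dependent_average(toy, fit_majority, score_majority, test_session=2)
print(average, per_subject)
```

No trial from the test session ever reaches `fit`, which is what makes the estimate a measure of cross-session robustness rather than within-session fit.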
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network hyperparameters (filter counts, kernel sizes, attention heads, learning rate schedule)
axioms (2)
- domain assumption: EEG motor imagery signals contain both short-range and long-range temporal dependencies that multi-scale convolutions and transformers can usefully capture
- domain assumption: The SHU dataset in a subject-dependent cross-session protocol is representative for evaluating real-world BCI robustness