Handwriting decoding as a challenging motor task for EEG Foundation Models
Pith reviewed 2026-05-20 17:04 UTC · model grok-4.3
The pith
EEG foundation models are outperformed by smaller task-specific models on a new handwriting decoding dataset
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that handwriting decoding serves as a challenging motor task for EEG foundation models; on a new dataset for 4-letter classification that avoids prior confounds, existing foundation models are outperformed by smaller task-specific models, even while those same foundation models achieve state-of-the-art results on multiple motor imagery datasets.
What carries the argument
A new EEG handwriting dataset, designed to avoid the confounds of prior work, on which 4-letter classification is tested with and without movement-onset knowledge.
If this is right
- Knowledge of movement onset is crucial: average performance across subjects drops from 41.3 percent to 32.4 percent without it.
- Increasing test-time signal quality raises performance substantially, for example from 45 percent to 78 percent in the best subject.
- Scaling training data steadily improves results, yet existing foundation models still do not surpass specialist models on handwriting decoding.
- Handwriting decoding exposes specific challenges in EEG motor tasks that coarse motor imagery benchmarks miss.
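To put the movement-onset numbers above in context, it can help to compare them against chance. The sketch below assumes the four letters are balanced, so chance accuracy is 25 percent; class balance is an assumption, since this page does not state it. The name `above_chance_margin` is illustrative and does not come from the paper.

```python
# Assumption: 4 balanced classes, so chance accuracy is 100 / 4 = 25 percent.
CHANCE = 100.0 / 4


def above_chance_margin(accuracy_pct, chance_pct=CHANCE):
    """Return accuracy minus chance, in percentage points."""
    return accuracy_pct - chance_pct


with_onset = 41.3     # average accuracy with movement-onset knowledge (from the abstract)
without_onset = 32.4  # average accuracy without it (from the abstract)

absolute_drop = with_onset - without_onset
margin_with = above_chance_margin(with_onset)
margin_without = above_chance_margin(without_onset)
retained_fraction = margin_without / margin_with

print(f"absolute drop: {absolute_drop:.1f} points")
print(f"margin above chance: {margin_with:.1f} -> {margin_without:.1f} points")
print(f"fraction of above-chance margin retained: {retained_fraction:.2f}")
```

Under the balanced-class assumption, the 8.9-point drop removes a little more than half of the margin above chance, which is a stronger effect than the raw percentages suggest.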
Where Pith is reading between the lines
- Foundation models may need pretraining on finer-grained motor sequences rather than only coarse imagery to close the gap with specialists.
- Hybrid systems that keep a general EEG foundation model but add lightweight task-specific heads could be more practical for complex BCI applications.
- The performance gap suggests that future benchmarks should include both onset-agnostic and high-signal-quality conditions to track real progress.
Load-bearing premise
The introduced dataset more rigorously evaluates models by avoiding confounds present in prior handwriting EEG datasets.
What would settle it
Showing that foundation models match or exceed task-specific models on the new dataset even when movement-onset timing is withheld would falsify the claim that current foundation models are insufficient for this task.
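One way such a comparison could be made concrete is a paired test on per-subject accuracies. The sketch below is an exact sign-flip permutation test, chosen here for illustration; the page does not say which statistical test, if any, the paper uses. The function name and the accuracy values are hypothetical placeholders, not results from the paper.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """One-sided exact sign-flip test that mean(scores_a - scores_b) > 0.

    Each element is one subject's accuracy under model A and model B.
    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every sign pattern is enumerated (feasible for small subject counts).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs)
    tol = 1e-12  # guards against floating-point ties
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - tol:
            count += 1
    return count / total


# Placeholder accuracies for five hypothetical subjects (NOT results from the paper).
foundation_model = [0.36, 0.41, 0.33, 0.38, 0.35]
specialist_model = [0.31, 0.37, 0.30, 0.32, 0.33]
p_value = paired_sign_flip_pvalue(foundation_model, specialist_model)
print(f"one-sided p-value: {p_value:.5f}")
```

With only a handful of subjects the smallest attainable p-value is 1 / 2^n, so a small cohort limits how decisively either outcome can be shown.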
Original abstract
Recent attempts at creating Foundation Models (FMs) for Electroencephalography (EEG) have achieved state-of-the-art performance on multiple tasks including Motor Imagery (MI). These MI tasks have typically involved coarse classification between imagined limb movements. However, the development of foundation models necessitates diverse datasets, both for pretraining and evaluating the progress of these models. In this work, we propose handwriting decoding as a challenging motor task for FMs. We show that several existing datasets are potentially confounded, and introduce a dataset that more rigorously evaluates models. On this dataset, we find that current FMs, despite showing SOTA performance in multiple MI datasets are outperformed by smaller task-specific models. We also highlight challenges specific to EEG-based handwriting decoding to inform future work. In our 4-letter classification task, we show that (a) Knowledge of movement-onset is crucial to reported decoding performance in prior works, with average performance across subjects dropping from $41.3\%$ to $32.4\%$. (b) Increasing test-time signal quality provides significant performance improvements ($45\%$ to $78\%$ in our best subject) compared to scaling training data with single-trial EEG. (c) While scaling training data steadily improves decoding performance, existing FMs do not outperform specialist models in handwriting decoding. We make our code available at https://anonymous.4open.science/r/EEG-Handwriting-BCI-DFCD/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that handwriting decoding is a more challenging motor task than standard motor imagery for evaluating EEG foundation models (FMs). It identifies movement-onset timing as a potential confound in prior EEG handwriting and MI datasets, introduces a new 4-letter handwriting dataset intended to avoid such confounds, and reports that on this dataset current FMs are outperformed by smaller task-specific models. Additional experiments show that withholding knowledge of movement onset lowers average accuracy across subjects from 41.3% to 32.4%, and that improving test-time signal quality yields larger gains (from 45% to 78% in the best subject) than scaling training data with single-trial EEG.
Significance. If the new dataset successfully removes the identified timing confounds, the result would provide a stronger test of FM generalization on fine-grained motor tasks and could motivate more robust pretraining strategies. The public release of code is a positive contribution to reproducibility. The significance is tempered by the need to confirm that the new recordings themselves do not inadvertently leak onset information.
major comments (2)
- [§3 / Experiments on new dataset] The central claim that the introduced 4-letter dataset 'more rigorously evaluates models' by avoiding movement-onset confounds (abstract and §3) rests on the assertion that prior datasets are confounded, yet the manuscript does not report the same controlled ablation (with vs. without explicit movement-onset knowledge) on the new dataset that was performed on existing datasets. Without this check it remains possible that the observed performance gap versus FMs reflects residual timing information rather than a genuinely harder, confound-free task.
- [Results / Prior dataset re-evaluation] Table or figure reporting the 41.3% → 32.4% drop (abstract) should explicitly state the preprocessing steps and subject count used for the prior-work re-evaluation so that readers can assess whether the same protocol was applied uniformly when claiming the new dataset is stricter.
minor comments (2)
- [§3] Clarify in the methods whether the new dataset collection protocol (e.g., cue timing, trial segmentation) was designed to eliminate onset leakage or merely to reduce it; a short paragraph on this design choice would strengthen the rigor claim.
- [Results / FM comparison] The statement that 'existing FMs do not outperform specialist models' should be accompanied by the exact model sizes, pretraining corpora, and fine-tuning hyperparameters used for the FM baselines to allow direct comparison.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below, agreeing where the manuscript can be strengthened through clarification or additional reporting, and explaining our position on the substance of the claims.
Point-by-point responses
- Referee: [§3 / Experiments on new dataset] The central claim that the introduced 4-letter dataset 'more rigorously evaluates models' by avoiding movement-onset confounds (abstract and §3) rests on the assertion that prior datasets are confounded, yet the manuscript does not report the same controlled ablation (with vs. without explicit movement-onset knowledge) on the new dataset that was performed on existing datasets. Without this check it remains possible that the observed performance gap versus FMs reflects residual timing information rather than a genuinely harder, confound-free task.
Authors: We agree that performing the identical with/without movement-onset ablation on the new dataset would provide direct evidence that residual timing information is not driving the observed performance gap. The new dataset was recorded with a protocol that does not provide explicit onset cues to participants or models during data collection, but we did not include the controlled ablation in the submitted version. In the revision we will add this experiment (reporting average and per-subject accuracies with and without onset knowledge) to §3 and the associated figure, allowing readers to directly compare the magnitude of any drop against the prior-dataset results. revision: yes
- Referee: [Results / Prior dataset re-evaluation] Table or figure reporting the 41.3% → 32.4% drop (abstract) should explicitly state the preprocessing steps and subject count used for the prior-work re-evaluation so that readers can assess whether the same protocol was applied uniformly when claiming the new dataset is stricter.
Authors: We accept that the current presentation of the 41.3% to 32.4% result lacks sufficient detail for readers to verify protocol uniformity. The re-evaluation used the same bandpass filtering, epoching relative to movement onset, and subject inclusion criteria as the original prior-work papers, applied to the publicly released data for the reported number of subjects. In the revised manuscript we will expand the caption of the relevant table/figure and add a short methods paragraph in §3 that lists the exact preprocessing pipeline and subject count for this ablation, making the comparison with the new dataset fully transparent. revision: yes
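The with/without-onset ablation discussed in these responses comes down to where each trial window is anchored. The sketch below shows one plausible reading: onset-locked epochs start at the measured movement onset, while onset-agnostic epochs start at the cue, so the latency to movement varies across trials. This page does not describe how the paper implements either condition; the function names, sample indices, and window length are all hypothetical.

```python
def extract_epoch(signal, start, length):
    """Return `length` samples beginning at index `start`, or None if out of range."""
    if start < 0 or start + length > len(signal):
        return None
    return signal[start:start + length]


def epochs_from_anchors(signal, anchors, length, offset=0):
    """Cut one fixed-length epoch per anchor sample, shifted by `offset` samples."""
    epochs = []
    for anchor in anchors:
        epoch = extract_epoch(signal, anchor + offset, length)
        if epoch is not None:  # skip trials that run past the recording
            epochs.append(epoch)
    return epochs


# Toy single-channel recording: each sample value equals its own index.
signal = list(range(100))
cue_samples = [10, 45]    # when the letter cue appeared (hypothetical)
onset_samples = [20, 50]  # when the pen started moving (hypothetical)

onset_locked = epochs_from_anchors(signal, onset_samples, length=5)
cue_locked = epochs_from_anchors(signal, cue_samples, length=5)

# In the cue-locked case the delay from window start to movement differs per trial.
cue_to_onset = [o - c for c, o in zip(cue_samples, onset_samples)]
print(onset_locked)
print(cue_locked)
print(cue_to_onset)
```

A classifier trained on cue-locked windows has to tolerate this per-trial latency jitter, which is one reason accuracy could fall when onset knowledge is withheld.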
Circularity Check
No significant circularity; empirical comparisons are self-contained
full rationale
The paper's central claims rest on direct experimental evaluations of foundation models versus task-specific models on a newly collected 4-letter handwriting EEG dataset. Performance metrics (e.g., accuracy drops from 41.3% to 32.4% when withholding movement-onset information on prior datasets) are reported from controlled ablations and model training runs, not from any equations, parameter fits, or derivations that loop back to the inputs by construction. The assertion that the new dataset avoids confounds present in prior work is presented as a design choice and motivation, but the reported results do not reduce to self-citations, renamed known patterns, or fitted inputs called predictions. No load-bearing self-citation chains or ansatz smuggling appear in the provided text; the work is a standard empirical ML comparison paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions in EEG preprocessing and classification hold for the new dataset.