pith.

arxiv: 2605.15698 · v1 · pith:RFID5OIEnew · submitted 2026-05-15 · 💻 cs.HC

Handwriting decoding as a challenging motor task for EEG Foundation Models

Pith reviewed 2026-05-20 17:04 UTC · model grok-4.3

classification 💻 cs.HC
keywords: EEG · foundation models · handwriting decoding · motor imagery · brain-computer interface · motor task

The pith

EEG foundation models are outperformed by smaller task-specific models on a new handwriting decoding dataset

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes handwriting decoding from EEG signals as a challenging motor task to test and advance foundation models. It points out that several existing handwriting datasets may contain confounds that inflate performance, and introduces a new dataset meant to evaluate models more rigorously. On this dataset the authors find that current foundation models, which lead on coarse motor imagery tasks, are beaten by smaller models built specifically for the handwriting classification problem. They further show that knowing when movement starts is important for reported accuracy and that better test-time signal quality lifts performance more than simply adding training trials. This matters because foundation models for EEG require varied, clean benchmarks to move beyond simple limb-imagery classification.

Core claim

The paper claims that handwriting decoding serves as a challenging motor task for EEG foundation models; on a new dataset for 4-letter classification that avoids prior confounds, existing foundation models are outperformed by smaller task-specific models, even while those same foundation models achieve state-of-the-art results on multiple motor imagery datasets.

What carries the argument

A new EEG handwriting dataset, designed to avoid the confounds found in prior work and used for 4-letter classification to test model performance with and without knowledge of movement onset.
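Why onset knowledge matters can be illustrated with a minimal numpy sketch on synthetic data. This is not the paper's data or pipeline. A fixed deflection at movement onset survives trial averaging when epochs are locked to the onset, but it smears out when epochs are locked to the cue and reaction times jitter. The `epoch` helper, the sampling rate, the window bounds and the reaction-time range are all hypothetical choices made for illustration.

```python
import numpy as np


def epoch(eeg: np.ndarray, events: np.ndarray, tmin: int, tmax: int) -> np.ndarray:
    """Cut windows [event + tmin, event + tmax) from continuous EEG (hypothetical helper).

    eeg: (n_channels, n_samples); events: sample index of the locking event.
    Returns an array of shape (n_events, n_channels, tmax - tmin).
    """
    return np.stack([eeg[:, e + tmin : e + tmax] for e in events])


rng = np.random.default_rng(0)
fs = 250                                    # hypothetical sampling rate (Hz)
n_trials = 40
cues = np.arange(n_trials) * 3 * fs + fs    # one cue every 3 s
rts = rng.integers(int(0.3 * fs), int(0.9 * fs), n_trials)  # jittered reaction times
onsets = cues + rts                         # true movement onsets

# Synthetic single-channel recording: a brief deflection at each movement onset.
eeg = np.zeros((1, cues[-1] + 3 * fs))
bump = np.hanning(25)
for o in onsets:
    eeg[0, o : o + bump.size] += bump

move_locked = epoch(eeg, onsets, -fs // 2, fs)     # -0.5 s to +1 s around onset
cue_locked = epoch(eeg, cues, 0, int(1.5 * fs))    # 0 to 1.5 s after the cue

# Averaging keeps the deflection's full height only when trials are onset-aligned.
print("movement-locked peak:", move_locked.mean(axis=0).max())
print("cue-locked peak:", cue_locked.mean(axis=0).max())
```

In real recordings the onset has to be estimated, for example from pen or EMG signals. A decoder that is handed the true onset is therefore solving an easier problem than one deployed without it.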

If this is right

  • Knowledge of movement onset is crucial: average performance across subjects drops from 41.3 percent to 32.4 percent without it.
  • Increasing test-time signal quality raises performance substantially, for example from 45 percent to 78 percent in the best subject.
  • Scaling training data steadily improves results, yet existing foundation models still do not surpass specialist models on handwriting decoding.
  • Handwriting decoding exposes specific challenges in EEG motor tasks that coarse motor imagery benchmarks miss.
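With four letters, chance level is 25%. The distance between 32.4% and chance therefore matters as much as the drop from 41.3%. The following is a minimal sketch of a one-sided exact binomial test against chance. The 160-trial test-set size is a hypothetical placeholder, and balanced classes are assumed. Because the reported figures are cross-subject averages, the p-values this prints are illustrative only and are not a re-analysis of the paper.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p): one-sided p-value against chance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_test = 160        # hypothetical number of test trials for one subject
chance = 1 / 4      # four letters, balanced classes assumed
for acc in (0.413, 0.324):
    k = round(acc * n_test)
    print(f"{acc:.1%} -> {k}/{n_test} correct, p = {binom_sf(k, n_test, chance):.2g}")
```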

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Foundation models may need pretraining on finer-grained motor sequences rather than only coarse imagery to close the gap with specialists.
  • Hybrid systems that keep a general EEG foundation model but add lightweight task-specific heads could be more practical for complex BCI applications.
  • The performance gap suggests that future benchmarks should include both onset-agnostic and high-signal-quality conditions to track real progress.

Load-bearing premise

The introduced dataset more rigorously evaluates models by avoiding confounds present in prior handwriting EEG datasets.

What would settle it

Showing that foundation models match or exceed task-specific models on the new dataset even when movement-onset timing is withheld would falsify the claim that current foundation models are insufficient for this task.
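The following sketch shows one way to operationalize this falsification test under stated assumptions. Take per-subject accuracies for a foundation model and a specialist on the onset-agnostic split. An exact paired sign-flip permutation test then checks whether their mean difference could plausibly arise by chance. The function name and the accuracy values are invented placeholders, not results from the paper.

```python
import itertools

import numpy as np


def paired_sign_flip_test(acc_a, acc_b) -> float:
    """Exact two-sided sign-flip permutation test on per-subject accuracy differences.

    Under the null hypothesis the two models are exchangeable within each subject,
    so every difference is equally likely to have either sign. Enumerating all
    2**n sign patterns is cheap for the small subject counts typical of EEG studies.
    """
    d = np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float)
    observed = abs(d.mean())
    signs = np.array(list(itertools.product((1.0, -1.0), repeat=d.size)))
    null = np.abs((signs * d).mean(axis=1))
    # Small tolerance so the observed pattern itself always counts as "as extreme".
    return float(np.mean(null >= observed - 1e-12))


# Hypothetical onset-agnostic accuracies for five subjects (placeholders, not paper results).
fm = [0.30, 0.28, 0.35, 0.33, 0.31]
specialist = [0.34, 0.31, 0.40, 0.36, 0.33]
print(paired_sign_flip_test(fm, specialist))
```

Every placeholder difference favors the specialist, so the test returns 2/32 = 0.0625. This is the smallest two-sided p-value attainable with five subjects, so a study of that size cannot reach p < 0.05 with this test no matter how consistent the gap is.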

Figures

Figures reproduced from arXiv: 2605.15698 by Ishayu Ghosh, Nora Zajzon, Srinivas Ravishankar, Teng Fei, Virginia de Sa.

Figure 1: Experimental design for a single trial. The participant fixates on the monitor while writing …
Figure 2: Epoching settings that represent different difficulties. Movement locked epoching (left) is …
Figure 3: (a) Participant-wise decoding performance on 3 different settings. Performance drops …
Figure 4: Decoding performance on P1’s last 160 ME movement-locked trials, when scaling training …
Figure 5: (a) Movement Related Cortical Potential (MRCP) signal during Handwriting Imagery …
Figure 6: (a) Flawed exp. design, with horizontal eye motion predictive of the letter being written. …
Figure 7: Experimental design with potential visual decoding confounds. Participant is instructed to …
Figure 8: Confusion matrix showing evidence consistent with temporal confounding in a block …
Figure 9: A subset of the standard 10-10 montage designed to record more densely from the midline …
Figure 10: Effect of epoching strategy on decoding performance for the 5-letter classification. The …
Figure 11: Mean-subtracted reaction time distribution indicates that cue-locked setting introduces …
Original abstract

Recent attempts at creating Foundation Models (FMs) for Electroencephalography (EEG) have achieved state-of-the-art performance on multiple tasks, including Motor Imagery (MI). These MI tasks have typically involved coarse classification between imagined limb movements. However, the development of foundation models necessitates diverse datasets, both for pretraining and for evaluating the progress of these models. In this work, we propose handwriting decoding as a challenging motor task for FMs. We show that several existing datasets are potentially confounded, and introduce a dataset that more rigorously evaluates models. On this dataset, we find that current FMs, despite showing SOTA performance on multiple MI datasets, are outperformed by smaller task-specific models. We also highlight challenges specific to EEG-based handwriting decoding to inform future work. In our 4-letter classification task, we show that (a) knowledge of movement onset is crucial to reported decoding performance in prior works, with average performance across subjects dropping from 41.3% to 32.4% without it; (b) increasing test-time signal quality provides significant performance improvements (45% to 78% in our best subject) compared to scaling training data with single-trial EEG; and (c) while scaling training data steadily improves decoding performance, existing FMs do not outperform specialist models in handwriting decoding. We make our code available at https://anonymous.4open.science/r/EEG-Handwriting-BCI-DFCD/
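The abstract contrasts better test-time signal quality with more single-trial training data, but it does not say here how signal quality was raised. The sketch below assumes one common mechanism: averaging k repeated test trials. For independent, zero-mean noise this shrinks the noise standard deviation by √k, so amplitude SNR grows as √k. The template waveform, the noise level and the function name are hypothetical. Real EEG noise is partly correlated across trials, so practical gains are usually smaller than this idealized case.

```python
import numpy as np

rng = np.random.default_rng(1)
n_times = 200
# Hypothetical class-specific waveform buried in much larger single-trial noise.
template = np.sin(np.linspace(0, 2 * np.pi, n_times))
noise_sd = 3.0


def snr_after_averaging(k: int, n_rep: int = 500) -> float:
    """Empirical amplitude SNR of the average of k noisy repetitions of `template`."""
    trials = template + noise_sd * rng.standard_normal((n_rep, k, n_times))
    residual = trials.mean(axis=1) - template   # noise left after averaging
    return float(template.std() / residual.std())


for k in (1, 4, 16):
    predicted = template.std() / noise_sd * np.sqrt(k)
    print(f"k={k:2d}  empirical SNR={snr_after_averaging(k):.3f}  predicted={predicted:.3f}")
```

Adding training trials cannot raise the SNR of the single test trial the decoder must classify. That is one plausible reading of why test-time quality can outweigh training-set size.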

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that handwriting decoding represents a more challenging motor task than standard motor imagery for evaluating EEG foundation models (FMs). It identifies movement-onset timing as a potential confound in prior EEG handwriting and MI datasets, introduces a new 4-letter handwriting dataset intended to avoid such confounds, and reports that on this dataset current FMs are outperformed by smaller task-specific models. Additional experiments show that withholding movement-onset knowledge lowers average performance from 41.3% to 32.4%, and that improving test-time signal quality yields larger gains (from 45% to 78% in the best subject) than simply scaling training data.

Significance. If the new dataset successfully removes the identified timing confounds, the result would provide a stronger test of FM generalization on fine-grained motor tasks and could motivate more robust pretraining strategies. The public release of code is a positive contribution to reproducibility. The significance is tempered by the need to confirm that the new recordings themselves do not inadvertently leak onset information.

major comments (2)
  1. [§3 / Experiments on new dataset] The central claim that the introduced 4-letter dataset 'more rigorously evaluates models' by avoiding movement-onset confounds (abstract and §3) rests on the assertion that prior datasets are confounded, yet the manuscript does not report the same controlled ablation (with vs. without explicit movement-onset knowledge) on the new dataset that was performed on existing datasets. Without this check it remains possible that the observed performance gap versus FMs reflects residual timing information rather than a genuinely harder, confound-free task.
  2. [Results / Prior dataset re-evaluation] The table or figure reporting the 41.3% → 32.4% drop (abstract) should explicitly state the preprocessing steps and subject count used for the prior-work re-evaluation, so that readers can assess whether the same protocol was applied uniformly when claiming the new dataset is stricter.
minor comments (2)
  1. [§3] Clarify in the methods whether the new dataset collection protocol (e.g., cue timing, trial segmentation) was designed to eliminate onset leakage or merely to reduce it; a short paragraph on this design choice would strengthen the rigor claim.
  2. [Results / FM comparison] The statement that 'existing FMs do not outperform specialist models' should be accompanied by the exact model sizes, pretraining corpora, and fine-tuning hyperparameters used for the FM baselines to allow direct comparison.
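The referee's concern about residual timing leakage can be probed with a timing-only control. If a classifier that sees only behavioral timing (here, reaction time) predicts the letter above chance, then timing carries label information that an EEG model could exploit. The sketch below uses synthetic data. The feature, the trial counts and the leave-one-out nearest-class-mean classifier are hypothetical choices, not the paper's analysis.

```python
import numpy as np


def loo_nearest_mean_accuracy(feature: np.ndarray, labels: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-class-mean classifier on a 1-D feature."""
    classes = np.unique(labels)
    correct = 0
    for i in range(labels.size):
        keep = np.arange(labels.size) != i          # hold out trial i
        means = np.array([feature[keep & (labels == c)].mean() for c in classes])
        correct += classes[np.argmin(np.abs(means - feature[i]))] == labels[i]
    return float(correct / labels.size)


rng = np.random.default_rng(2)
labels = np.repeat(np.arange(4), 40)                 # 4 letters x 40 trials (hypothetical)
rt_clean = rng.normal(0.6, 0.1, labels.size)         # reaction time independent of letter
rt_leaky = 0.5 + 0.1 * labels + rng.normal(0, 0.02, labels.size)  # letter-dependent timing
print("clean:", loo_nearest_mean_accuracy(rt_clean, labels))
print("leaky:", loo_nearest_mean_accuracy(rt_leaky, labels))
```

On real data, the same control would use the recorded timing variables and whichever cross-validation split the EEG models use. Accuracy near 25% is what a confound-free design should produce.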

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, agreeing where the manuscript can be strengthened through clarification or additional reporting, and explaining our position on the substance of the claims.

Point-by-point responses
  1. Referee: [§3 / Experiments on new dataset] The central claim that the introduced 4-letter dataset 'more rigorously evaluates models' by avoiding movement-onset confounds (abstract and §3) rests on the assertion that prior datasets are confounded, yet the manuscript does not report the same controlled ablation (with vs. without explicit movement-onset knowledge) on the new dataset that was performed on existing datasets. Without this check it remains possible that the observed performance gap versus FMs reflects residual timing information rather than a genuinely harder, confound-free task.

    Authors: We agree that performing the identical with/without movement-onset ablation on the new dataset would provide direct evidence that residual timing information is not driving the observed performance gap. The new dataset was recorded with a protocol that does not provide explicit onset cues to participants or models during data collection, but we did not include the controlled ablation in the submitted version. In the revision we will add this experiment (reporting average and per-subject accuracies with and without onset knowledge) to §3 and the associated figure, allowing readers to directly compare the magnitude of any drop against the prior-dataset results. revision: yes

  2. Referee: [Results / Prior dataset re-evaluation] Table or figure reporting the 41.3% → 32.4% drop (abstract) should explicitly state the preprocessing steps and subject count used for the prior-work re-evaluation so that readers can assess whether the same protocol was applied uniformly when claiming the new dataset is stricter.

    Authors: We accept that the current presentation of the 41.3% to 32.4% result lacks sufficient detail for readers to verify protocol uniformity. The re-evaluation used the same bandpass filtering, epoching relative to movement onset, and subject inclusion criteria as the original prior-work papers, applied to the publicly released data for the reported number of subjects. In the revised manuscript we will expand the caption of the relevant table/figure and add a short methods paragraph in §3 that lists the exact preprocessing pipeline and subject count for this ablation, making the comparison with the new dataset fully transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical comparisons are self-contained

Full rationale

The paper's central claims rest on direct experimental evaluations of foundation models versus task-specific models on a newly collected 4-letter handwriting EEG dataset. Performance metrics (e.g., the drop in accuracy from 41.3% to 32.4% when movement-onset information is withheld) are reported from controlled ablations and model training runs, not from any equations, parameter fits, or derivations that loop back to the inputs by construction. The assertion that the new dataset avoids confounds present in prior work is presented as a design choice and motivation, but the reported results do not reduce to self-citations, renamed known patterns, or fitted inputs called predictions. No load-bearing self-citation chains or ansatz smuggling appear in the provided text; the work is a standard empirical ML comparison paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard assumptions in EEG signal processing and machine learning evaluation rather than new postulates. No invented entities or heavy free parameters are introduced beyond typical model hyperparameters.

axioms (1)
  • Domain assumption: Standard assumptions in EEG preprocessing and classification hold for the new dataset.
    Invoked when claiming the new dataset avoids confounds and enables fair model comparison.

pith-pipeline@v0.9.0 · 5794 in / 1319 out tokens · 80890 ms · 2026-05-20T17:04:35.191758+00:00 · methodology

