EEG-based AI-BCI Wheelchair Advancement: Hybrid Deep Learning with Motor Imagery for Brain Computer Interface

Biplov Paneru; Bipul Thapa; Bishwash Paneru; Khem Narayan Poudyal

arXiv: 2509.25667 · v2 · submitted 2025-09-30 · 💻 cs.LG · cs.AI · cs.HC

Pith reviewed 2026-05-18 11:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.HC

keywords EEG classification · motor imagery · brain-computer interface · deep learning hybrid · CNN-Transformer · wheelchair control · BCI simulation

The pith

A hybrid CNN-Transformer model classifies motor imagery EEG signals with 91.73 percent accuracy for BCI wheelchair control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a Convolutional Neural Network-Transformer Hybrid Model, or CTHM, to classify EEG data from imagined left and right hand movements. The model processes pre-filtered signals segmented into 19 by 200 arrays at 200 Hz sampling and achieves 91.73 percent test accuracy, surpassing baselines such as XGBoost, EEGNet, and a pure transformer. It integrates these classifications into a Tkinter simulation for wheelchair navigation. If the results hold, this hybrid architecture offers a more effective way to turn brain signals into reliable commands for assistive devices.
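
The segmentation step can be illustrated with a minimal sketch. It assumes a continuous recording held as a channels-by-samples NumPy array and a list of window start indices. The function name `segment_trials`, the synthetic recording, and the onset positions are all hypothetical, since the abstract does not say how the open-source data is stored or cued.

```python
import numpy as np

FS_HZ = 200           # sampling rate stated in the abstract
N_CHANNELS = 19       # channel count implied by the 19x200 arrays
WINDOW_SAMPLES = 200  # samples per segment, so each window spans 1 second


def segment_trials(recording, onsets):
    """Cut a (channels, samples) recording into (n, 19, 200) windows.

    `onsets` holds the sample index where each window starts. Windows that
    would run past the end of the recording are skipped.
    """
    recording = np.asarray(recording)
    if recording.shape[0] != N_CHANNELS:
        raise ValueError("expected 19 channels")
    windows = [
        recording[:, start:start + WINDOW_SAMPLES]
        for start in onsets
        if 0 <= start and start + WINDOW_SAMPLES <= recording.shape[1]
    ]
    if not windows:
        return np.empty((0, N_CHANNELS, WINDOW_SAMPLES))
    return np.stack(windows)


# Hypothetical data: 10 seconds of synthetic noise, not real EEG.
rng = np.random.default_rng(0)
fake_recording = rng.standard_normal((N_CHANNELS, 10 * FS_HZ))
fake_onsets = [0, 400, 1000, 1900]  # the last onset overruns and is dropped
segments = segment_trials(fake_recording, fake_onsets)
window_seconds = WINDOW_SAMPLES / FS_HZ
```

Each resulting segment has shape 19 by 200, and dividing 200 samples by 200 Hz makes explicit that every window covers one second of signal.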

Core claim

The paper establishes that the CTHM framework, which combines convolutional layers for spatial feature extraction with transformer attention for temporal dependencies in EEG, delivers 91.73 percent accuracy on motor imagery classification and maintains a mean of 90 percent under stratified cross-validation, outperforming the listed machine learning baselines.
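
To make the division of labour concrete, the following is a shape-level forward pass of a generic convolution-plus-attention hybrid, written in NumPy with random, untrained weights. It is not the authors' CTHM: the abstract gives no layer counts, kernel sizes, or attention settings, so every size, name, and parameter here (`hybrid_forward`, `init_params`, 8 filters, kernel 20, stride 10, a single attention head) is an assumption chosen only for illustration. Layer normalisation, positional encoding, and the feed-forward sublayer of a full Transformer block are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_channels=19, n_filters=8, kernel=20, stride=10, seed=0):
    """Random weights for the sketch; all sizes are illustrative guesses."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "w_spatial": s * rng.standard_normal((n_filters, n_channels)),
        "w_temporal": s * rng.standard_normal((n_filters, kernel)),
        "stride": stride,
        "wq": s * rng.standard_normal((n_filters, n_filters)),
        "wk": s * rng.standard_normal((n_filters, n_filters)),
        "wv": s * rng.standard_normal((n_filters, n_filters)),
        "w_out": s * rng.standard_normal((2, n_filters)),
        "b_out": np.zeros(2),
    }


def hybrid_forward(x, params):
    """Map one EEG window of shape (19, 200) to two class probabilities."""
    # 1) Spatial convolution: each filter is a weighted mix of the channels.
    spatial = params["w_spatial"] @ x                        # (F, 200)

    # 2) Strided temporal convolution, one kernel per filter (depthwise).
    kernel = params["w_temporal"].shape[1]
    stride = params["stride"]
    n_tokens = (spatial.shape[1] - kernel) // stride + 1
    tokens = np.stack([
        (spatial[:, i * stride:i * stride + kernel]
         * params["w_temporal"]).sum(axis=1)
        for i in range(n_tokens)
    ])                                                       # (T, F)
    tokens = np.maximum(tokens, 0.0)                         # ReLU

    # 3) Single-head scaled dot-product self-attention over time tokens.
    q = tokens @ params["wq"]
    k = tokens @ params["wk"]
    v = tokens @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (T, T)
    attended = tokens + attn @ v                             # residual add

    # 4) Mean-pool over time, then a linear layer and softmax.
    pooled = attended.mean(axis=0)                           # (F,)
    return softmax(params["w_out"] @ pooled + params["b_out"])


params = init_params()
window = np.random.default_rng(1).standard_normal((19, 200))  # synthetic input
probs = hybrid_forward(window, params)
```

With these illustrative sizes the convolutional stage turns the 200 time samples into 19 tokens of 8 features each, and the attention stage lets every token weigh every other token before pooling.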

What carries the argument

The CTHM, a hybrid architecture that merges CNN and Transformer components to classify motor imagery from EEG arrays.

If this is right

The hybrid model demonstrates superior performance over standalone CNN or transformer approaches in this EEG task.
Stratified cross-validation yields consistent 90 percent mean accuracy, indicating robustness to data splits.
The system successfully simulates wheelchair movements using classified EEG signals in a graphical interface.
This method advances practical BCI applications by leveraging open-source data without custom collection.
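
For readers unfamiliar with stratified cross-validation, here is a minimal NumPy sketch of the idea: trials are dealt into folds class by class so that each fold preserves the left/right balance. The helper names, the nearest-centroid stand-in classifier, and the synthetic two-class data are assumptions for illustration only; the abstract does not state the number of folds or the tooling used. In practice scikit-learn's `StratifiedKFold` offers the same kind of split.

```python
import numpy as np


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each trial to a fold so every fold keeps the class balance."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal this class's trials round-robin across the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of


def cross_val_accuracy(features, labels, fit_predict, n_splits=5, seed=0):
    """Return the mean accuracy and the per-fold accuracies."""
    fold_of = stratified_folds(labels, n_splits, seed)
    scores = []
    for fold in range(n_splits):
        test = fold_of == fold
        preds = fit_predict(features[~test], labels[~test], features[test])
        scores.append(float(np.mean(preds == labels[test])))
    return float(np.mean(scores)), scores


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: label each test row by the closest class mean."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic, well-separated data: 50 trials per class, 4 features each.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 4)) + 5.0 * y[:, None]
mean_acc, fold_accs = cross_val_accuracy(X, y, nearest_centroid)
folds = stratified_folds(y)
```

Note that stratifying by class label says nothing about which subject a trial came from, so a consistent mean across folds does not by itself rule out subject-level leakage.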

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world deployment on physical wheelchairs would test whether simulation accuracy translates to live control.
The 19 by 200 segmentation at 200 Hz may highlight specific onset features that drive the classification success.
Extending the hybrid design to multi-class motor imagery or other BCI paradigms could broaden its utility beyond binary left-right decisions.
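
One way to probe what a one-second segment can carry is to look at its spectral content. The sketch below estimates band power with a plain periodogram, using the conventional mu (8 to 13 Hz) and beta (14 to 30 Hz) ranges. The function name `band_power`, the band edges, and the synthetic 10 Hz tone are assumptions for illustration; nothing here comes from the paper's pipeline.

```python
import numpy as np

FS_HZ = 200           # sampling rate stated in the abstract
WINDOW_SAMPLES = 200  # one-second segment


def band_power(signal, low_hz, high_hz, fs=FS_HZ):
    """Mean periodogram power of a 1-D signal within [low_hz, high_hz]."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return float(power[mask].mean())


# A 200-sample window at 200 Hz gives frequency bins spaced 1 Hz apart.
freq_resolution_hz = FS_HZ / WINDOW_SAMPLES

# Synthetic single-channel signal: a 10 Hz tone standing in for a mu rhythm.
t = np.arange(WINDOW_SAMPLES) / FS_HZ
synthetic = np.sin(2 * np.pi * 10 * t)
mu = band_power(synthetic, 8, 13)
beta = band_power(synthetic, 14, 30)
```

With 1 Hz bins the mu band is covered by only six frequency bins, which gives a sense of how coarse the spectral picture is for windows of this length.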

Load-bearing premise

The open-source pre-filtered EEG dataset, when reshaped into 19x200 arrays, sufficiently captures the essential patterns of right-left hand motor imagery for reliable model training and testing.

What would settle it

Retraining and testing the CTHM on a fresh set of raw, unsegmented EEG recordings from multiple subjects performing actual motor imagery tasks would show if accuracy remains above 85 percent or collapses.

Original abstract

This paper presents an Artificial Intelligence (AI) integrated approach to Brain-Computer Interface (BCI)-based wheelchair development, utilizing a motor imagery right-left-hand movement mechanism for control. The system is designed to simulate wheelchair navigation based on motor imagery right and left-hand movements using electroencephalogram (EEG) data. A pre-filtered dataset, obtained from an open-source EEG repository, was segmented into arrays of 19x200 to capture the onset of hand movements. The data was acquired at a sampling frequency of 200Hz. The system integrates a Tkinter-based interface for simulating wheelchair movements, offering users a functional and intuitive control system. We propose a framework that uses Convolutional Neural Network-Transformer Hybrid Model, named CTHM, for motor imagery EEG classification. The model achieves a test accuracy of 91.73% compared with various machine learning baseline models, including XGBoost, EEGNet, and a transformer-based model. The CTHM achieved a mean accuracy of 90% through stratified cross-validation, showcasing the effectiveness of the CNN-Transformer hybrid architecture in BCI applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

This applies a CNN-Transformer hybrid to motor-imagery EEG for a wheelchair simulator and reports 91.73% test accuracy, but the abstract omits subject counts, CV partitioning, and preprocessing steps that would let us judge whether the numbers reflect real generalization.

The main takeaway is that the authors built a CNN-Transformer hybrid they call CTHM, trained it on segmented 19x200 EEG arrays from a pre-filtered open dataset at 200 Hz, and got it to classify left versus right hand motor imagery well enough to drive a Tkinter wheelchair simulation. On held-out test data it reaches 91.73% and averages 90% under stratified cross-validation, beating the XGBoost, EEGNet, and plain transformer baselines they tried. That is the concrete result they deliver.

The hybrid itself is not a new architectural idea, but putting it on this particular simulation task and showing the interface is a straightforward applied step. They also keep the evaluation on external test data and use stratified CV, which at least supplies some external check rather than pure training-set performance. Those are the parts that actually move the work forward from the abstract alone.

The soft spots sit right where the stress-test note flags them. The abstract gives no subject count, no trial numbers, and no statement on whether the stratified folds keep subjects completely separate. In motor-imagery BCI that distinction usually decides whether the accuracy will hold for new users or just reflects within-subject patterns. The one-second windows are also short; typical mu and beta modulations often need a bit more post-cue time to appear clearly, so it is unclear how much signal the segments actually contain. Without those details the headline numbers stay hard to interpret for practical wheelchair control.

This paper is aimed at applied BCI engineers who want to see hybrid models tested on mobility tasks and who might reuse the simulation setup or the basic comparison list. A reader hunting for new theory or large-scale subject-independent benchmarks will not find them here. The work shows clear engagement with standard baselines and reports reproducible-sounding numbers on the given data, so it is coherent on its own terms.

I would send it to peer review rather than desk-reject it. A referee could ask for the missing subject and partitioning information, check the exact architecture and hyperparameters, and test whether the short windows capture the expected rhythms. Once those pieces are supplied the applied claim becomes easier to evaluate.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a hybrid Convolutional Neural Network-Transformer Model (CTHM) for classifying motor imagery EEG signals corresponding to right and left hand movements to enable simulated wheelchair control. Using a pre-filtered open-source dataset segmented into 19x200 arrays sampled at 200 Hz to capture movement onset, the CTHM achieves 91.73% test accuracy and a 90% mean accuracy via stratified cross-validation, outperforming baselines including XGBoost, EEGNet, and a transformer-based model. A Tkinter interface is integrated for simulation.

Significance. If the accuracies hold under subject-independent evaluation with appropriate capture of mu/beta modulations, the CNN-Transformer hybrid could offer a useful architecture for EEG-based BCI applications. The stratified cross-validation provides some grounding beyond a single train-test split, but the absence of subject-level details prevents assessment of whether the result generalizes beyond the training distribution.

major comments (3)

Abstract: The reported 91.73% test accuracy and 90% stratified-CV mean accuracy are presented without disclosing the number of subjects, trials per subject, or whether CV folds keep all trials from a given subject within the same fold. In BCI settings this omission leaves open the possibility that performance reflects subject-specific leakage rather than true generalization to new users, directly undermining the claim of effectiveness for wheelchair control.
Abstract: No description is supplied of the CTHM architecture (layer counts, kernel sizes, attention mechanisms), training hyperparameters, loss function, or optimizer. Without these the superiority over EEGNet and the transformer baseline cannot be verified or reproduced, rendering the central performance claim unevaluable.
Abstract: Segmentation into 19x200 arrays at 200 Hz produces 1-second windows asserted to capture movement onset. Typical motor-imagery ERD/ERS signatures appear 0.5–2 s post-cue; it is unclear whether these short windows contain the discriminative spectral features or whether the unspecified pre-filtering steps adequately mitigate ocular and muscular artifacts.
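
The leakage concern in the first major comment can be checked mechanically once subject identifiers are available. The sketch below assigns whole subjects to folds and reports any subject that appears on both sides of a split. The subject count (10), trials per subject (20), and the helper names are hypothetical, since the abstract discloses none of these. scikit-learn's `GroupKFold` and `StratifiedGroupKFold` provide comparable splits.

```python
import numpy as np


def subject_wise_folds(subject_ids, n_splits=5, seed=0):
    """Assign whole subjects to folds so no subject spans train and test."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    if len(subjects) < n_splits:
        raise ValueError("need at least one subject per fold")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subjects)
    fold_of_subject = {int(s): i % n_splits for i, s in enumerate(shuffled)}
    return np.array([fold_of_subject[int(s)] for s in subject_ids])


def leaked_subjects(subject_ids, fold_of, fold):
    """Return subjects that appear on both sides of the given fold's split."""
    subject_ids = np.asarray(subject_ids)
    test = fold_of == fold
    in_test = {int(s) for s in subject_ids[test]}
    in_train = {int(s) for s in subject_ids[~test]}
    return in_test & in_train


# Hypothetical layout: 10 subjects with 20 trials each.
subject_ids = np.repeat(np.arange(10), 20)

grouped = subject_wise_folds(subject_ids)       # subject-wise split
trialwise = np.arange(len(subject_ids)) % 5     # naive trial-level split

leak_grouped = leaked_subjects(subject_ids, grouped, fold=0)
leak_trialwise = leaked_subjects(subject_ids, trialwise, fold=0)
```

In this toy layout the subject-wise split leaks no subjects, while the trial-level split places every subject in both the training and test sets, which is the situation the referee asks the authors to rule out.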

minor comments (1)

Abstract: The Tkinter simulation interface is mentioned only in passing; a brief description of how classification outputs map to navigation commands would clarify the end-to-end system.
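
As an example of the kind of description requested, the following sketch shows one plausible mapping from classifier output to navigation commands. It is purely hypothetical: the abstract does not describe the paper's mapping, and the label encoding, the 0.7 confidence threshold, the 15-degree turn step, and all names are assumptions. The GUI is left out so the logic runs headless; in a Tkinter front end the chair's canvas item could be redrawn after each command.

```python
from dataclasses import dataclass


@dataclass
class Wheelchair:
    heading_deg: float = 90.0  # 90 means facing "up" on a canvas-like plane


def to_command(p_right, threshold=0.7):
    """Turn P(right-hand imagery) into a command, holding when unsure."""
    if p_right >= threshold:
        return "turn_right"
    if p_right <= 1.0 - threshold:
        return "turn_left"
    return "hold"


def apply_command(chair, command, turn_deg=15.0):
    """Update the simulated chair's heading; 'hold' leaves it unchanged."""
    if command == "turn_right":
        chair.heading_deg = (chair.heading_deg - turn_deg) % 360
    elif command == "turn_left":
        chair.heading_deg = (chair.heading_deg + turn_deg) % 360
    return chair


# Hypothetical stream of classifier outputs, one per EEG window.
chair = Wheelchair()
commands = []
for p_right in [0.9, 0.55, 0.1, 0.95]:
    command = to_command(p_right)
    commands.append(command)
    apply_command(chair, command)
```

The "hold" band between the two thresholds is a design choice that trades responsiveness for fewer spurious turns when the classifier is uncertain.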

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their valuable comments on our manuscript. We address each of the major comments below and will revise the manuscript accordingly to improve clarity and reproducibility.

Referee: Abstract: The reported 91.73% test accuracy and 90% stratified-CV mean accuracy are presented without disclosing the number of subjects, trials per subject, or whether CV folds keep all trials from a given subject within the same fold. In BCI settings this omission leaves open the possibility that performance reflects subject-specific leakage rather than true generalization to new users, directly undermining the claim of effectiveness for wheelchair control.

Authors: We agree that this information is critical for evaluating the generalizability of the results in a BCI context. The abstract summarizes the key results but does not include these dataset details. In the revised manuscript, we will explicitly state the number of subjects, the number of trials per subject, and clarify the stratified cross-validation procedure, including whether subject trials are kept within the same fold to prevent leakage. This will strengthen the claim for wheelchair control applications. revision: yes

Referee: Abstract: No description is supplied of the CTHM architecture (layer counts, kernel sizes, attention mechanisms), training hyperparameters, loss function, or optimizer. Without these the superiority over EEGNet and the transformer baseline cannot be verified or reproduced, rendering the central performance claim unevaluable.

Authors: We recognize the importance of providing sufficient details for reproducibility. While the full manuscript likely contains these in the methods section, the abstract focuses on the high-level approach and results. To address this, we will include a concise description of the CTHM architecture, key hyperparameters, loss function, and optimizer in the revised abstract or as a note to make the performance claims verifiable. revision: yes

Referee: Abstract: Segmentation into 19x200 arrays at 200 Hz produces 1-second windows asserted to capture movement onset. Typical motor-imagery ERD/ERS signatures appear 0.5–2 s post-cue; it is unclear whether these short windows contain the discriminative spectral features or whether the unspecified pre-filtering steps adequately mitigate ocular and muscular artifacts.

Authors: The 19x200 segmentation at 200 Hz corresponds to 1-second windows chosen to capture the onset of motor imagery movements based on the dataset characteristics. We will revise the manuscript to provide more justification for this window size in relation to ERD/ERS timing and elaborate on the pre-filtering steps applied to the open-source dataset to handle artifacts such as ocular and muscular noise. This will clarify how the discriminative features are captured. revision: yes

Circularity Check

0 steps flagged

Reported accuracies grounded in held-out test data and stratified cross-validation

The abstract reports an empirical test accuracy of 91.73% and a mean stratified cross-validation accuracy of 90% for the proposed CTHM model on pre-filtered, segmented 19x200 EEG arrays. These performance figures are obtained via standard held-out testing and cross-validation splits, supplying external grounding against the chosen data partitions rather than reducing to a definitional identity or fitted input renamed as prediction. No equations, self-citations, uniqueness theorems, or ansatzes appear in the provided text, and the central claim does not invoke prior author work to force the result. Minor uncertainty remains around hyperparameter tuning or subject-wise leakage details (not disclosed in the abstract), but this does not constitute circularity under the defined criteria; the evaluation remains self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Abstract-only review limits visibility into exact modeling choices; the ledger captures the main assumptions and parameters implied by the described pipeline.

free parameters (2)

CTHM architecture hyperparameters
Number of layers, attention heads, and other design choices in the hybrid model were selected to reach the stated accuracy but are not enumerated.
Segmentation dimensions
19x200 array size at 200 Hz sampling frequency chosen to capture movement onset.

axioms (2)

domain assumption: Motor imagery right-left hand movements produce reliably distinguishable patterns in the pre-filtered EEG data
Fundamental to the classification task and taken from standard BCI assumptions.
domain assumption: The open-source pre-filtered dataset is representative and free of critical artifacts for this application
Relies on external repository without additional validation steps described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We propose a framework that uses Convolutional Neural Network-Transformer Hybrid Model, named CTHM, for motor imagery EEG classification. The model achieves a test accuracy of 91.73%... BiLSTM-BiGRU attention-based model achieved a mean accuracy of 90.13% through cross-validation"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "segmented into arrays of 19x200 to capture the onset of hand movements... sampling frequency of 200Hz"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.