Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders
Pith reviewed 2026-06-25 23:03 UTC · model grok-4.3
The pith
No single decoding pipeline dominates EEG motor imagery even in the easiest single-subject regime.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Covariance tangent-space projection and Common Spatial Patterns form the strongest families, yet on the largest heterogeneous cohort their rankings are statistically indistinguishable, the single best pipeline serves only 35 percent of participants, and participant-specific selection improves accuracy by about seven points over the best fixed pipeline.
What carries the argument
Friedman omnibus test followed by Nemenyi critical-difference analysis on 1,056 decoding configurations evaluated subject-by-subject.
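The rank-based machinery named above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names and the toy accuracy table are invented for the example, the Friedman statistic omits the tie correction, and the q values are the alpha = 0.05 Nemenyi critical values tabulated by Demšar (2006) for 2 to 10 methods.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05 for k = 2..10 compared
# methods, as tabulated by Demšar (2006).
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
              7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def rank_row(scores):
    """Rank one subject's scores, 1 = highest; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def friedman_nemenyi(acc):
    """acc[s][p] = accuracy of pipeline p on subject s (hypothetical layout)."""
    n, k = len(acc), len(acc[0])
    ranks = [rank_row(row) for row in acc]
    mean_ranks = [sum(r[p] for r in ranks) / n for p in range(k)]
    # Friedman chi-square (no tie correction) and Kendall's W derived from it.
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    kendall_w = chi2 / (n * (k - 1))
    # Two pipelines differ at alpha = 0.05 if their mean ranks differ by more than cd.
    cd = Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6 * n))
    return mean_ranks, chi2, kendall_w, cd


# Toy data, invented for illustration: 4 subjects x 3 pipelines.
toy_acc = [[0.90, 0.80, 0.70],
           [0.85, 0.80, 0.60],
           [0.70, 0.75, 0.60],
           [0.90, 0.70, 0.65]]
mean_ranks, chi2, kendall_w, cd = friedman_nemenyi(toy_acc)
```

In the toy table the first two pipelines have mean ranks closer together than the critical difference, so they would be reported as indistinguishable even though one has the better average rank. That is the situation the review describes for cov-tgsp and CSP on PhysionetMI.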
If this is right
- Feature representation choice matters more than classifier or scaler choice.
- Dataset identity changes which family ranks first.
- Nonlinear descriptors are optimal for roughly one third of participants.
- Aggregate rankings conceal large per-subject variability.
Where Pith is reading between the lines
- BCI pipelines may need built-in mechanisms for rapid per-user model selection or adaptation.
- Future benchmarks should report the fraction of participants for whom each pipeline wins rather than only mean ranks.
- The observed variability may reflect subject-specific spatial patterns that fixed extractors cannot capture uniformly.
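The per-participant quantities discussed above (the share of participants each pipeline wins, and the gain from participant-specific selection over the best fixed pipeline) can be computed from the same subject-by-pipeline accuracy table. The sketch below is an assumption-laden illustration: the function name, the "nonlinear" label, and the toy numbers are invented, and the "oracle" picks each subject's winner from the same scores it reports, so it gives an optimistic upper bound rather than a deployable estimate.

```python
from collections import Counter


def per_subject_summary(acc, names):
    """acc[s][p] = accuracy of pipeline p on subject s; names[p] labels pipeline p."""
    n, k = len(acc), len(names)
    # Best fixed pipeline: the one with the highest mean accuracy over subjects.
    means = [sum(row[p] for row in acc) / n for p in range(k)]
    best_fixed = max(range(k), key=lambda p: means[p])
    # Per-subject winner; the first index wins ties (a real analysis should state its tie rule).
    winners = Counter(max(range(k), key=lambda p: row[p]) for row in acc)
    win_fraction = {names[p]: winners.get(p, 0) / n for p in range(k)}
    # Oracle selection: each subject gets their own best pipeline, chosen after the fact.
    oracle_mean = sum(max(row) for row in acc) / n
    gain = oracle_mean - means[best_fixed]
    return names[best_fixed], win_fraction, gain


# Toy data, invented for illustration: 4 subjects x 3 pipeline families.
names = ["cov-tgsp", "CSP", "nonlinear"]
toy_acc = [[0.80, 0.70, 0.60],
           [0.60, 0.90, 0.70],
           [0.70, 0.60, 0.90],
           [0.90, 0.70, 0.60]]
best, win_fraction, gain = per_subject_summary(toy_acc, names)
```

Here the best fixed pipeline wins for only half of the toy subjects, and the oracle gain is the difference between the mean of the per-subject maxima and the best column mean.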
Load-bearing premise
The three chosen public datasets and the within-session single-subject fitting regime are representative enough to support the conclusion that no universal decoder exists.
What would settle it
A new dataset in which one fixed pipeline is the single best choice for more than 70 percent of participants would falsify the claim.
Original abstract
Electroencephalography (EEG) is the dominant non-invasive modality for brain-computer interfaces (BCIs), yet reliable decoding of motor imagery is hampered by inter- and intra-individual variability. A recurring claim is that one decoding pipeline, most often a spatial or Riemannian method, is broadly preferable. We test the weakest version of that claim under the most favourable conditions. Using the Mother of All BCI Benchmarks (MOABB) framework, we evaluated 1,056 decoding configurations (feature extractor × scaler × classifier), >340,000 subject-level model fits, across three public left-versus-right motor-imagery datasets (PhysionetMI, 109 participants; Cho2017, 52; Zhou2016, 4) and two frequency bands (8-15 Hz, 8-30 Hz). Every model is fit and tested within a single session of a single participant, the easiest regime, giving every pipeline its best chance. We apply the statistics standard for multi-classifier comparison: Friedman omnibus tests, Nemenyi critical-difference analysis and Wilcoxon signed-rank tests with effect sizes. Covariance tangent-space projection (cov-tgsp) and Common Spatial Patterns (CSP) are the strongest families, but their ordering is dataset-dependent and, on the largest and most heterogeneous cohort (PhysionetMI), statistically indistinguishable (Nemenyi p = 0.27; Kendall's W = 0.11). At the individual level the single best pipeline is optimal for only 35% of PhysionetMI participants, and nonlinear descriptors are best for roughly one third; matching pipeline to participant adds about seven accuracy points over the best fixed choice. The ranking is not an artefact of dimensionality, and classifier and scaler choices are secondary to the feature representation. Even in the easiest regime, no single pipeline dominates: a lower bound on the personalization problem and a quantitative case for participant-aware model selection rather than a universal decoder.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates 1,056 decoding configurations (feature extractor × scaler × classifier) on three public left-vs-right motor-imagery EEG datasets (PhysionetMI, N = 109; Cho2017, N = 52; Zhou2016, N = 4) using more than 340,000 within-session single-subject fits in the MOABB framework. It applies standard Friedman omnibus, Nemenyi post-hoc, and Wilcoxon signed-rank tests and reports that the covariance tangent-space and CSP families are strongest but statistically indistinguishable on PhysionetMI (Nemenyi p = 0.27, Kendall's W = 0.11), that the single best pipeline is optimal for only 35% of PhysionetMI participants, and that participant-specific matching yields a gain of approximately seven accuracy points over the best fixed pipeline. The central claim is that these results constitute a lower bound showing that no universal decoder exists even under the easiest regime.
Significance. If the empirical results hold, the work supplies a large-scale, reproducible quantitative lower bound on the personalization problem in EEG-BCI motor-imagery decoding. The more than 340,000 subject-level fits, the MOABB evaluation protocol, and the standard non-parametric multi-comparison statistics (Friedman-Nemenyi and Wilcoxon with effect sizes) are clear strengths that make the dataset-dependent rankings and per-participant optimality counts directly falsifiable and extensible.
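For readers unfamiliar with the pairwise step, the Wilcoxon signed-rank statistic and one common effect size (the matched-pairs rank-biserial correlation) can be sketched as follows. The paper does not say which effect-size definition it uses, so the choice here is an assumption, as are the function name and the toy accuracies. The sketch returns no p-value; in practice that would come from a statistics library such as `scipy.stats.wilcoxon`.

```python
def wilcoxon_rank_biserial(x, y):
    """Paired scores x[s], y[s] for two pipelines evaluated on the same subjects."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are discarded
    if not diffs:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, 1 = smallest; ties share the average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Rank-biserial correlation: +1 if x always wins, -1 if y always wins.
    effect = (w_plus - w_minus) / (w_plus + w_minus)
    return w_plus, w_minus, effect


# Toy accuracies in integer percent (invented; integers keep tie detection exact).
acc_a = [80, 75, 90, 60, 70]
acc_b = [70, 80, 85, 60, 55]
w_plus, w_minus, effect = wilcoxon_rank_biserial(acc_a, acc_b)
```

An effect size alongside the test matters at this scale: with more than a hundred participants, a small and practically irrelevant accuracy difference can still reach significance.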
minor comments (3)
- The abstract states that 'classifier and scaler choices are secondary to the feature representation' and that 'the ranking is not an artefact of dimensionality'; the main text should contain an explicit ablation or supplementary table that isolates these factors so readers can verify the claim without re-running the full 1,056-configuration grid.
- Results for the smallest cohort (Zhou2016, N=4) are mentioned but contribute little statistical power; reporting them in a separate supplementary table or noting their limited weight in the aggregate conclusion would improve clarity.
- The per-participant optimality count (35%) would benefit from an accompanying binomial or permutation test against the null that any single pipeline is best for all subjects, even if the result is obvious by inspection.
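The binomial check suggested in the last comment could look like the sketch below. Two assumptions are made and should be read as such: the win count of about 38 is back-calculated from the rounded 35% of 109 participants, not taken from the paper, and the null proportion is set to the 70% threshold from "What would settle it". The null as literally worded in the comment (one pipeline best for all subjects) is degenerate, since any count below 109 has probability zero under it.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_subjects = 109                    # PhysionetMI cohort size, from the paper
n_wins = round(0.35 * n_subjects)   # assumption: back-calculated from the rounded 35%
p_null = 0.70                       # assumption: threshold from "What would settle it"

# One-sided p-value: probability of seeing this few wins or fewer if the best
# fixed pipeline truly won for 70% of participants.
p_value = binom_cdf(n_wins, n_subjects, p_null)
```

A permutation test would need the full subject-by-pipeline accuracy table, which this page does not reproduce.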
Simulated Author's Rebuttal
We thank the referee for the positive evaluation, the recognition of the scale of the benchmark (>340k fits), the use of standard non-parametric tests, and the recommendation to accept. No major comments were raised.
Circularity Check
No circularity: direct empirical benchmark outcomes
The paper reports results from exhaustive evaluation of 1,056 pipelines on three public datasets under within-session single-subject fitting, using standard Friedman-Nemenyi and Wilcoxon tests. All central claims (dataset-dependent family rankings, 35% unique-best pipelines, +7 point personalization gain) are computed outputs of these runs and the MOABB protocol; no equations, fitted parameters, or self-citations are invoked to derive the conclusions, and the methodology contains no self-definitional, predictive, or ansatz-smuggling steps.
Axiom & Free-Parameter Ledger
free parameters (2)
- frequency bands
- pipeline configurations
axioms (2)
- domain assumption: The MOABB framework correctly implements every one of the 1,056 pipelines without implementation errors.
- domain assumption: The three public datasets capture the relevant inter- and intra-individual variability for motor-imagery BCI.