BCI-sift: An automated feature selection toolbox for Brain Computer Interface applications

Dirk Keller; Elena C Offenberg; Julia Berezutskaya; Mariska J Vansteensel; Nick F Ramsey; Zachary V Freudenburg

arxiv: 2605.19646 · v1 · pith:ZZ2Q3TYBnew · submitted 2026-05-19 · 🧬 q-bio.NC · cs.LG

Elena C Offenberg, Dirk Keller, Mariska J Vansteensel, Zachary V Freudenburg, Nick F Ramsey, Julia Berezutskaya

Pith reviewed 2026-05-20 02:00 UTC · model grok-4.3

keywords: BCI, feature selection, electrocorticography, optimization algorithms, machine learning, speech decoding, toolbox, sensorimotor cortex

The pith

BCI-sift toolbox applies optimization algorithms to select relevant neural features from high-dimensional brain recordings, raising classification accuracy while highlighting consistent brain patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

BCI-sift is a Python toolbox that integrates optimization methods to identify the most useful features in brain-computer interface datasets for machine learning tasks. When tested on high-density electrocorticography data from eight participants repeatedly speaking 12 words, the toolbox selected electrodes over the sensorimotor cortex, time points clustered around speech production, and high-frequency signals as most informative. These choices remained consistent across participants and matched known functional organization of the brain. Using only the selected features raised classification accuracy compared with using the complete set of signals. The toolbox is designed to make such feature selection easier, more automated, and more interpretable for BCI research across different recording types.

Core claim

BCI-sift identifies informative neural features across electrode, temporal, and frequency dimensions in HD ECoG data from a 12-word speech task. The anatomical locations of the selected electrodes prove consistent across participants and align with the known functional organization of the sensorimotor cortex; relevant time points cluster around speech production, and the high-frequency band emerges as most informative. Feature selection performed by the toolbox improves classification accuracy relative to using all available features.

What carries the argument

BCI-sift toolbox, a scikit-learn-compatible Python package that integrates multiple optimization algorithms to automate selection of electrode, time, and frequency features from BCI recordings.
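To make the idea of selecting along separate electrode and frequency dimensions concrete, the following is a minimal sketch of how a boolean feature mask over flattened features can be mapped back to electrodes. The array layout, the variable names, and the hand-written mask are assumptions for illustration; this is not BCI-sift's actual API or output format.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: trials x electrodes x frequency bands (synthetic data).
n_trials, n_electrodes, n_bands = 40, 8, 3
X = rng.normal(size=(n_trials, n_electrodes, n_bands))

# scikit-learn estimators expect 2D input, so flatten (electrode, band)
# into a single feature axis.
X_flat = X.reshape(n_trials, n_electrodes * n_bands)

# Hypothetical selection result: a boolean mask over the flattened features.
flat_mask = np.zeros(n_electrodes * n_bands, dtype=bool)
flat_mask[[2, 5, 11, 23]] = True

# Reshape the mask to recover which electrodes and bands were kept.
mask_2d = flat_mask.reshape(n_electrodes, n_bands)
electrodes_kept = np.flatnonzero(mask_2d.any(axis=1))
bands_kept_per_electrode = mask_2d.sum(axis=1)

# Apply the mask to obtain the reduced feature matrix.
X_selected = X_flat[:, flat_mask]
```

Keeping the mask in the same shape as the original feature grid is what allows a selection result to be read anatomically (which electrodes) and spectrally (which bands) rather than as an opaque list of column indices.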

If this is right

Feature selection yields higher accuracy in classifying neural signals for spoken words than using the full feature set.
Selected electrodes, times, and frequency bands remain consistent across participants and match established sensorimotor cortex maps.
The approach simplifies automated feature analysis while increasing interpretability of which signals drive decoding.
The toolbox works with both implanted and non-implanted BCI modalities beyond the tested HD ECoG setup.
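The accuracy comparison in the first point can be sketched with plain scikit-learn. The example below uses synthetic data and scikit-learn's own `RFE` class as a stand-in; the classifier, the number of retained features, and the fold count are assumptions and are not taken from the paper. Feature selection is placed inside the pipeline so that it is refit on each training fold, which avoids leaking test data into the selection step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a trials x features matrix with three classes.
X, y = make_classification(
    n_samples=120, n_features=60, n_informative=8, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: classifier trained on all features.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# With selection: RFE is refit inside every training fold.
with_rfe = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5),
    LogisticRegression(max_iter=1000),
)

acc_all = cross_val_score(baseline, X, y, cv=cv)
acc_rfe = cross_val_score(with_rfe, X, y, cv=cv)
print(f"all features: {acc_all.mean():.3f} +/- {acc_all.std():.3f}")
print(f"after RFE:    {acc_rfe.mean():.3f} +/- {acc_rfe.std():.3f}")
```

Whether selection helps on a given dataset depends on the data; the sketch only shows how the two conditions can be evaluated on identical folds.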

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying the toolbox to larger or clinical populations could reveal how feature patterns change with neurological conditions.
Integration into real-time BCI pipelines might allow ongoing adaptation of selected features during use.
Comparing outputs across multiple optimization methods within the toolbox could highlight which algorithms best suit specific BCI tasks.
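As an illustration of what one of the non-greedy optimizers involves, here is a minimal simulated annealing sketch over a binary feature mask. It is a generic textbook version written for this note, assuming a geometric cooling schedule, single-bit flips, and cross-validated accuracy as the objective; it does not reproduce the toolbox's implementation, and all function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def score_mask(mask, X, y):
    """Mean 3-fold accuracy using only the features where mask is True."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()


def anneal_mask(X, y, n_iter=40, t_start=0.1, t_end=0.001, seed=0):
    """Search for a good feature mask by simulated annealing (illustrative)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape[1]) < 0.5
    current = score_mask(mask, X, y)
    best_mask, best = mask.copy(), current
    for i in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / max(n_iter - 1, 1))
        # Propose a neighbor by flipping one randomly chosen feature.
        candidate = mask.copy()
        flip = rng.integers(X.shape[1])
        candidate[flip] = ~candidate[flip]
        cand_score = score_mask(candidate, X, y)
        delta = cand_score - current
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp).
        if delta >= 0 or rng.random() < np.exp(delta / temp):
            mask, current = candidate, cand_score
            if current > best:
                best_mask, best = mask.copy(), current
    return best_mask, best


X, y = make_classification(
    n_samples=90, n_features=20, n_informative=4, n_redundant=0, random_state=0
)
best_mask, best_score = anneal_mask(X, y)
print(f"kept {best_mask.sum()} of {best_mask.size} features, score {best_score:.3f}")
```

The score returned here is optimistic because the same data drive both the search and the evaluation; a fair comparison between optimizers would wrap the search in an outer cross-validation loop.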

Load-bearing premise

The integrated optimization algorithms will reliably find generalizable, task-relevant features when applied to new BCI datasets or participants beyond the eight tested individuals and the specific speech task.

What would settle it

Running BCI-sift on a fresh dataset from new participants or a different BCI task and observing no gain in classification accuracy or selections that vary widely across individuals instead of aligning with known brain organization.

Figures

Figures reproduced from arXiv: 2605.19646 by Dirk Keller, Elena C Offenberg, Julia Berezutskaya, Mariska J Vansteensel, Nick F Ramsey, Zachary V Freudenburg.

**Figure 1.** Overview of the BCI-sift workflow. The toolbox receives brain data from a BCI task, along with corresponding labels and user-defined analysis parameters for the feature selection procedure. BCI-sift then performs the selected optimization analysis and provides the following outputs: a feature mask indicating the most relevant features, a results table summarizing the evaluation metrics, and plots for resul…

**Figure 2.** Test-set classification accuracies for combined electrode-frequency selection using recursive feature elimination (RFE). For each participant, blue bars show the mean cross-validated accuracy without feature selection, while orange bars indicate performance after applying RFE. Error bars represent the standard deviation across cross-validation folds.

**Figure 3.** Elimination of frequency band features during combined electrode-frequency selection per participant using recursive feature elimination (RFE). The stars, shown at the top of each plot, indicate the point along the x-axis (RFE progression) at which classification performance was highest for each cross-validation fold. The later in the process a feature gets eliminated, the more important it is for the clas…

**Figure 4.** Elimination per frequency band in combined electrode-frequency selection using recursive feature elimination across participants. The proportion of each frequency band removed at each normalized elimination step is shown. HFB is eliminated last, indicating that it carries the most informative features for classification.

**Figure 5.** Test-set accuracies for electrode selection using recursive feature elimination (RFE). For each participant, blue bars show the mean cross-validated accuracy on the HFB features without electrode selection, while orange bars indicate performance after applying RFE on the electrode dimension. Error bars represent the standard deviation across cross-validation folds.

**Figure 6.** Electrodes selected by recursive feature elimination for each participant. The color scale indicates the ratio of cross-validation folds (out of 10) in which a given electrode appeared in the final feature selection. Electrodes excluded from analysis due to noisiness or flat signals are shown in gray, and the central sulci are outlined in white. “A” and “P” denote anterior and posterior directions on the c…

**Figure 7.** Electrodes selected by recursive feature elimination for all participants, projected onto a common MNI brain. The color scale indicates the cumulative ratio of cross-validation folds (out of 10) in which a given electrode appeared in the final feature selection across participants. For visualization purposes, data from P2 is mapped onto the left hemisphere. Finally, RFE was applied solely to the time dimen…

**Figure 8.** Test-set accuracies for time point selection using recursive feature elimination (RFE). For each participant, blue bars show the mean cross-validated accuracy with all time points, while orange bars indicate performance after applying RFE on the time dimension. Error bars represent the standard deviation across cross-validation folds.

**Figure 10.** Time points selected by recursive feature elimination on cue-aligned data. A) Proportion of cross-validation folds (out of 10) in which each time point was retained in the final feature selection, shown separately for each participant. The red line indicates the cue per trial. The dashed blue line indicates the mean voice onset time, and the shaded blue region is the standard deviation of voice onset time…

**Figure 11.** Test-set accuracies for electrode selection analysis including electrodes previously marked as noisy. For each participant, blue bars indicate the mean cross-validated accuracy without feature selection, orange bars show performance after applying recursive feature elimination, and green bars show performance when noisy electrodes were excluded manually. Participant P1 was excluded because no electrodes w…

**Figure 12.** Results of electrode feature selection, showing electrodes selected by each optimization algorithm for channel selection across all participants, projected onto a common MNI brain. The color scale indicates the number of cross-validation folds (out of 10) in which a given electrode appeared in the final feature set; for the electrode-density visualization, each electrode was instead assigned a fixed value…
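Several of the captions above color electrodes by the proportion of cross-validation folds in which they were retained. A rough sketch of how such a per-feature selection ratio can be computed is shown below, using scikit-learn's `RFE` on synthetic data. The dataset, the classifier, and the choice of four retained features are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in: each column plays the role of one electrode.
X, y = make_classification(
    n_samples=100, n_features=16, n_informative=4, n_redundant=0, random_state=1
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Count, per feature, how many folds kept it in the final selection.
counts = np.zeros(X.shape[1])
for train_idx, _ in cv.split(X, y):
    selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
    selector.fit(X[train_idx], y[train_idx])
    counts += selector.support_

# Ratio in [0, 1]; a value of 1.0 means the feature survived in every fold.
selection_ratio = counts / cv.get_n_splits()
print(np.round(selection_ratio, 1))
```

Features with a ratio near 1 are selected stably across folds, which is the property the brain maps visualize.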

Original abstract

Advancements in clinical Brain-Computer Interfaces (BCIs) depend on precise and reliable signal interpretation. However, the high-dimensional and noisy nature of data captured from both implanted and non-implanted BCIs poses significant challenges, motivating the use of feature selection algorithms. We introduce BCI-sift (BCI Systematic and Interpretable Feature Tuning), a Python-based toolbox designed to streamline the application of diverse optimization algorithms to BCI datasets for identifying the most relevant features in machine learning tasks. Our scikit-learn-compatible toolbox (github.com/UMCU-RIBS/BCI-sift) simplifies feature selection in BCI tasks by integrating advanced optimization methods. We validated the toolbox on high-density electrocorticography (HD ECoG) data from eight able-bodied participants with 64-128 electrodes implanted over the sensorimotor cortex, who repeatedly spoke 12 words. BCI-sift identified informative neural features across electrode, temporal, and frequency dimensions. The anatomical locations of electrode selections were consistent across participants and aligned with known functional organization of the sensorimotor cortex. Relevant time points clustered around speech production, and the high-frequency band was identified as most informative, in line with prior work. Feature selection improved classification accuracy compared to using all features. BCI-sift provides an accessible and versatile platform for feature selection in BCI research, enabling improved decoding performance, automated feature analysis, and enhanced interpretability. While validated on HD ECoG data, the approach is broadly applicable to other BCI modalities. By enhancing classification accuracy and interpretability, BCI-sift addresses key challenges in developing efficient and transparent BCI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BCI-sift is a convenient wrapper for existing feature selection methods with a narrow but coherent validation on ECoG speech data.

BCI-sift packages standard feature selection algorithms into a scikit-learn compatible toolbox aimed at BCI researchers. The authors ran it on high-density ECoG data from eight able-bodied people speaking twelve words and report that it picks consistent electrodes in sensorimotor areas, relevant time windows around speech, and high-frequency bands. This is useful because it automates what many labs do manually and provides some interpretability by showing which features matter. The code is on GitHub, which lowers the barrier for others to try it. The anatomical consistency they found matches what we expect from prior ECoG studies, so that part holds up.

The soft spots are in the scope of the validation. Everything is from able-bodied participants with the same task and implant locations. We lack tests on held-out subjects, clinical populations, or other signal types like EEG or fNIRS. Without those, it's unclear if the toolbox generalizes or just fits this particular dataset well. The abstract also omits specific accuracy numbers and details on how they handled cross-validation or multiple comparisons.

This paper is for applied BCI groups that want a ready-made tool rather than for theorists. A reader looking for practical improvements in decoding pipelines could get value from trying the code. It deserves a serious referee because the core idea is straightforward and the implementation could be checked for bugs or usability issues. I recommend sending it out for peer review with a note to expand the validation set.

Referee Report

2 major / 2 minor

Summary. The paper introduces BCI-sift, a scikit-learn-compatible Python toolbox that applies diverse optimization algorithms for automated feature selection in BCI machine learning tasks. Validation is performed on high-density ECoG recordings from eight able-bodied participants with 64-128 electrodes over sensorimotor cortex during repeated production of 12 words; the toolbox identifies consistent electrode locations aligned with functional organization, time points clustered around speech production, high-frequency bands as most informative, and reports that selected features yield higher classification accuracy than the full feature set. The work positions the toolbox as accessible, interpretable, and broadly applicable to other BCI modalities.

Significance. If the empirical outcomes hold under broader testing, BCI-sift would supply a practical, open-source (github.com/UMCU-RIBS/BCI-sift) platform that lowers the barrier to interpretable feature selection in high-dimensional BCI data. The reported anatomical consistency and high-frequency preference match existing neurophysiological knowledge, and the emphasis on reproducibility via scikit-learn integration is a clear strength for the field.

major comments (2)

Abstract and Results: the statement that 'feature selection improved classification accuracy compared to using all features' is presented without any numerical values, confidence intervals, or statistical tests. Because this performance gain is the primary empirical support for the toolbox's utility, the absence of quantitative metrics leaves the central claim unverifiable from the provided text.
Validation experiments (described in the abstract and methods): all eight participants share identical able-bodied status, sensorimotor coverage, and the same 12-word speech task. This narrow design does not address whether the same pipelines produce reliable gains on held-out participants, different tasks, non-ECoG modalities, or clinical populations, which directly limits support for the abstract's claim of broad applicability.

minor comments (2)

Methods: the criteria and cross-validation scheme used to select or tune hyperparameters of the integrated optimization algorithms are not detailed; explicit description would strengthen reproducibility.
The manuscript would benefit from a summary table listing per-participant selected electrodes, time windows, and frequency bands to allow direct inspection of the claimed consistency.
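The first minor comment asks how hyperparameters were tuned. One common way to make this explicit is nested cross-validation, sketched below with scikit-learn on synthetic data. The pipeline, the parameter grid, and the fold counts are assumptions chosen for illustration; the manuscript's actual scheme is not described in the text available here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=120, n_features=24, n_informative=6, n_redundant=0, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), step=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop: choose how many features RFE keeps.
param_grid = {"select__n_features_to_select": [4, 8, 16]}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=inner_cv)

# Outer loop: estimate accuracy of the whole tune-then-fit procedure
# on data never seen during tuning.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(f"nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Reporting the inner grid and both fold schemes alongside the results would address the reproducibility concern directly.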

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. These have helped us clarify the presentation of our empirical results and the scope of the validation. We address each major comment in turn below.

Referee: Abstract and Results: the statement that 'feature selection improved classification accuracy compared to using all features' is presented without any numerical values, confidence intervals, or statistical tests. Because this performance gain is the primary empirical support for the toolbox's utility, the absence of quantitative metrics leaves the central claim unverifiable from the provided text.

Authors: We agree that the absence of specific numerical values, confidence intervals, and statistical tests in the abstract (and potentially the summarized results) makes the central claim difficult to verify at a glance. Although comparative results are shown in the manuscript's results section and figures, we acknowledge that explicit quantification and statistical support were not sufficiently highlighted. In the revised version we have updated the abstract to report key quantitative outcomes (mean accuracy improvement across participants with standard deviation) and added a statement that a paired statistical test confirmed significant improvement (p < 0.05). We have also ensured that the results section now explicitly includes confidence intervals or equivalent variability measures alongside the accuracy comparisons.
revision: yes
Referee: Validation experiments (described in the abstract and methods): all eight participants share identical able-bodied status, sensorimotor coverage, and the same 12-word speech task. This narrow design does not address whether the same pipelines produce reliable gains on held-out participants, different tasks, non-ECoG modalities, or clinical populations, which directly limits support for the abstract's claim of broad applicability.

Authors: We recognize that the validation dataset is homogeneous: all participants were able-bodied, used the same HD ECoG coverage over sensorimotor cortex, and performed the identical 12-word speech task. This choice was deliberate to enable direct comparison of selected features against established neurophysiological expectations. The toolbox itself is implemented as a general, scikit-learn-compatible package that does not embed assumptions specific to ECoG or speech. To address the concern about broad applicability, we have added an explicit limitations paragraph in the discussion that acknowledges the narrow validation scope and outlines how the same pipelines can be applied to held-out participants, other tasks, different recording modalities, and clinical populations. We have also softened the abstract's phrasing regarding broad applicability to better reflect that the current empirical demonstration is on one high-dimensional dataset while the software supports wider use.
revision: partial
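The first response mentions a paired statistical test without naming it. As one possible reading, the sketch below applies a one-sided Wilcoxon signed-rank test from SciPy to per-participant accuracies. The accuracy values are randomly generated placeholders, not results from the paper, and the choice of the Wilcoxon test is an assumption made here because it suits a small sample of eight paired observations.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_participants = 8

# PLACEHOLDER data: random numbers standing in for per-participant
# mean accuracies. These are NOT values reported in the paper.
acc_all = rng.uniform(0.30, 0.60, size=n_participants)
acc_selected = acc_all + rng.normal(0.05, 0.03, size=n_participants)

# Paired, one-sided test: is accuracy with selection greater than without?
result = wilcoxon(acc_selected, acc_all, alternative="greater")

# Summary of the paired differences for reporting alongside the p-value.
gain = acc_selected - acc_all
mean_gain = float(gain.mean())
sd_gain = float(gain.std(ddof=1))
print(f"mean gain {mean_gain:.3f} (SD {sd_gain:.3f}), p = {result.pvalue:.4f}")
```

With real data, the two arrays would hold each participant's cross-validated accuracy under the two conditions, in the same participant order.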

Circularity Check

0 steps flagged

No circularity: toolbox applies standard methods with empirical validation only

The paper introduces BCI-sift as a scikit-learn-compatible Python toolbox integrating existing optimization algorithms for feature selection in BCI data. Validation consists of applying these methods to HD ECoG recordings from eight participants performing a 12-word speech task, then reporting observed outcomes such as selected electrodes aligning with sensorimotor cortex, time points clustering around speech production, high-frequency band preference, and improved classification accuracy versus the full feature set. No equations, predictions, or first-principles derivations are presented that reduce to fitted parameters or self-citations by construction. All load-bearing claims are direct empirical results from the applied algorithms on the given dataset, with no self-referential loops or renamed inputs masquerading as outputs. The approach remains self-contained against external benchmarks as a methods contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard assumptions from neuroscience and machine learning rather than new postulates; no free parameters or invented entities are introduced in the abstract description.

axioms (1)

domain assumption: High-frequency band activity in sensorimotor cortex is most informative for speech-related BCI tasks.
Abstract states this finding aligns with prior work but treats it as a validation outcome rather than a new axiom.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

We introduce the Python-based BCI-sift ... toolbox ... integrating advanced optimization methods ... recursive feature elimination, simulated annealing, evolutionary strategies, and particle swarm optimization.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
