pith. machine review for the scientific record.

arxiv: 2511.21740 · v5 · submitted 2025-11-21 · 💻 cs.CL · cs.AI

Recognition: unknown

A cross-species neural foundation model for end-to-end speech decoding

Authors on Pith: no claims yet
classification 💻 cs.CL cs.AI
keywords: end-to-end · neural · speech · decoding · activity · approach · attempted · audio
abstract

Speech brain-computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phonemes before assembling sentences with an n-gram language model (LM), which prevents joint optimization across stages. Here, we introduce an end-to-end BraIn-to-Text (BIT) framework that translates neural activity into coherent sentences using a single differentiable neural network. Central to our approach is a cross-task, cross-species pretrained neural encoder whose representations transfer to both attempted and imagined speech. In a cascaded setting with an n-gram LM, the pretrained encoder establishes a new state-of-the-art (SOTA) on the Brain-to-Text '24 and '25 benchmarks. Integrated end-to-end with audio large language models (LLMs) and trained with contrastive learning for cross-modal alignment, BIT reduces the word error rate (WER) of the prior end-to-end method from 24.69% to 10.22%. Notably, we find that small-scale audio LLMs markedly improve end-to-end decoding. Beyond record-setting performance, BIT aligns attempted and imagined speech embeddings to enable cross-task generalization. Altogether, our approach advances the integration of large, diverse neural datasets, paving the way for an end-to-end decoding framework that supports seamless, differentiable optimization.
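The abstract does not spell out the contrastive objective, but "contrastive learning for cross-modal alignment" between neural and audio embeddings is commonly implemented as a symmetric InfoNCE loss. A minimal numpy sketch, assuming paired (neural, audio) sentence embeddings of equal dimension and a CLIP-style temperature; all names and the batch/dimension sizes here are illustrative, not taken from the paper:

```python
import numpy as np

def info_nce_loss(neural_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired embeddings (illustrative sketch).

    neural_emb, audio_emb: (batch, dim) arrays; row i of each array is
    assumed to come from the same underlying sentence.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    n, a = normalize(neural_emb), normalize(audio_emb)
    # (batch, batch) cosine-similarity logits; the diagonal holds matched pairs.
    logits = n @ a.T / temperature

    def xent(l):
        # Cross-entropy with targets = diagonal indices, computed stably.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Contrast in both directions: neural -> audio and audio -> neural.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy usage: 4 sentence pairs with 16-dim embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
y = rng.normal(size=(4, 16))
loss_random = info_nce_loss(x, y)    # unrelated pairs -> high loss
loss_aligned = info_nce_loss(x, x)   # identical pairs -> near-zero loss
```

Minimizing this loss pulls each neural embedding toward its own audio embedding and pushes it away from the other sentences in the batch, which is one standard way to align two modalities in a shared space.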

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DANCE: Detect and Classify Events in EEG

cs.LG · 2026-05 · unverdicted · novelty 6.0

    DANCE frames EEG event identification as a set-prediction problem to jointly detect and classify events directly from raw, unaligned signals, outperforming existing methods on seizure monitoring and matching onset-inf...

  2. MoDAl: Self-Supervised Neural Modality Discovery via Decorrelation for Speech Neuroprosthesis

q-bio.NC · 2026-04 · unverdicted · novelty 6.0

    MoDAl discovers complementary neurolinguistic modalities via contrastive-decorrelation objectives, cutting brain-to-text word error rate from 26.3% to 21.6% by incorporating area 44 signals.