End-to-End Intracortical Speech Decoding from Neural Activity
Pith reviewed 2026-06-30 13:55 UTC · model grok-4.3
The pith
An end-to-end Conformer decoder extracts character sequences from intracortical brain signals with a character error rate of 23.80 percent, without any external language model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An end-to-end Conformer-based neural decoder trained directly on intracortical recordings from a participant with ALS achieves a character error rate of 23.80 percent on held-out validation data without any external language model. Performance variability stems mainly from inter-session signal degradation, and the dominant error type is incorrect word boundary segmentation. These outcomes establish that effective character-level decoding is possible in a fully end-to-end framework and that the decoded neural signal supplies a strong foundation for downstream linguistic processing.
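To pin down the metric behind the headline number, the following is a minimal sketch of a character error rate computation. It assumes the common definition: total character-level edit distance divided by total reference length, pooled over sentences. The page does not say whether the paper pools over the corpus or averages per sentence, so `corpus_cer` and its pooling are assumptions, and the example strings are invented.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def corpus_cer(refs, hyps) -> float:
    """Pooled CER: total edits over total reference characters (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


# Invented example: one dropped space and one dropped letter.
refs = ["i am here", "hello world"]
hyps = ["iam here", "hello word"]
cer = corpus_cer(refs, hyps)  # 2 edits over 20 reference characters
```

Under this definition, a CER of 23.80 percent corresponds to roughly one character edit for every four reference characters.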
What carries the argument
The end-to-end Conformer-based neural decoder trained directly on intracortical recordings, which maps raw neural activity to character sequences without intermediate language-model correction.
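The Methods excerpt quoted further down this page states that the encoder is trained with a CTC objective. As an illustration of how per-frame predictions can become a character string with no language model involved, here is a sketch of greedy CTC decoding. Whether the authors decode greedily or with a beam search is not stated on this page; the vocabulary, `blank_id`, and the frame labels below are hypothetical.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated labels, then drop the CTC blank symbol."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank_id:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


# Hypothetical vocabulary: 0 is the blank, 3 is the space character.
id_to_char = {1: "h", 2: "i", 3: " "}
# Hypothetical per-frame argmax labels from the encoder.
frames = [1, 1, 0, 2, 2, 0, 0, 3, 1, 0, 2]
text = ctc_greedy_decode(frames, id_to_char)
```

Because the space is just another output label in this scheme, a missed or spurious space frame directly produces a word-boundary error, which is consistent with the error pattern the paper reports.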
If this is right
- Character sequences can be produced from neural activity alone, removing the need for an external language model at inference time.
- The decoded output remains usable as input to any later language-processing stage.
- Inter-session signal changes are the primary driver of performance drops, pointing to signal stability as the next limiting factor.
- Word-boundary errors dominate over letter-level mistakes, suggesting boundary detection as a high-value target for further refinement.
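One way to check the word-boundary claim is to compare edit counts with and without spaces. The sketch below is an assumed analysis, not the paper's procedure: it attributes to boundaries whatever edit distance disappears once spaces are stripped from both reference and hypothesis. The function name `boundary_error_share` and the example sentences are invented for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def boundary_error_share(refs, hyps) -> float:
    """Fraction of all character edits that vanish when spaces are ignored."""
    with_spaces = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    letters_only = sum(
        edit_distance(r.replace(" ", ""), h.replace(" ", ""))
        for r, h in zip(refs, hyps)
    )
    if with_spaces == 0:
        return 0.0  # no errors at all, so no boundary errors either
    return (with_spaces - letters_only) / with_spaces


# Invented example: one merged word pair and one letter substitution.
refs = ["a lot of rain", "see you"]
hyps = ["alot of rain", "sea you"]
share = boundary_error_share(refs, hyps)
```

A share well above one half on real decoder output would support the reading that segmentation, rather than letter identity, is the main bottleneck.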
Where Pith is reading between the lines
- If signal stability across sessions can be improved through hardware or preprocessing changes, the same decoder architecture would likely show lower error rates on new data.
- The end-to-end character stream could be fed into existing language models as an additional input rather than being replaced by them, potentially combining the strengths of both.
- The approach isolates the contribution of the raw neural signal, allowing direct comparison of decoder performance across different recording sites or participant groups without confounding language-model effects.
Load-bearing premise
Recordings from a single participant contain enough stable information that a decoder trained on some sessions will continue to work on held-out sessions despite changes in the recorded signal.
What would settle it
If the same decoder architecture, retrained and tested on additional held-out sessions from the same participant, yielded character error rates near 100 percent, that would show the reported performance does not generalize beyond the specific training sessions used.
Original abstract
Current high-performing intracortical speech neuroprostheses achieve low word error rates but typically rely on external language models during inference, increasing memory, computation, and latency. In this work, we investigate whether meaningful character-level decoding is achievable without such models. We propose an end-to-end Conformer-based neural decoder trained directly on intracortical recordings from a participant with amyotrophic lateral sclerosis (ALS). Without any external language model, the system achieves a character error rate (CER) of 23.80% on held-out validation data. Analysis shows that performance variability is driven by inter-session signal degradation, while dominant errors arise from incorrect word boundary segmentation. These results demonstrate that effective character-level decoding is possible in a fully end-to-end framework, providing a strong neural signal for downstream linguistic processing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an end-to-end Conformer-based neural decoder trained directly on intracortical recordings from a single ALS participant. It reports a character error rate of 23.80% on held-out validation data without any external language model, attributes performance variability to inter-session signal degradation, identifies word-boundary segmentation as the dominant error type, and concludes that effective character-level decoding is achievable in a fully end-to-end framework, yielding a strong neural signal for downstream linguistic processing.
Significance. If the held-out validation set is demonstrably session-disjoint, the result would be significant because it establishes that usable character-level decoding is possible without an external LM, directly addressing latency, memory, and compute concerns in intracortical speech neuroprostheses. The explicit reporting of a numeric CER on held-out data and the error analysis constitute concrete, falsifiable claims that strengthen the contribution relative to LM-dependent baselines.
major comments (1)
- [Abstract / Methods] The claim that the 23.80% CER on held-out validation data reflects a 'strong neural signal' independent of session effects is load-bearing, yet the manuscript provides no explicit description of how the train/validation split respects session boundaries. Because the abstract itself states that performance variability is driven by inter-session signal degradation, it is necessary to verify that validation utterances come from temporally and session-disjoint blocks; otherwise the reported CER could be inflated by shared non-stationarities rather than stable neural information.
minor comments (2)
- The manuscript should report model hyperparameters, training procedure, data-split statistics (number of sessions, utterances per split), and any statistical significance testing around the 23.80% CER to allow independent assessment of the result.
- A figure or table presenting per-session CER values would directly support the inter-session degradation analysis and make the variability claim more transparent.
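The per-session breakdown requested above could be produced with a few lines of bookkeeping. This is a sketch under assumptions: each validation sentence is taken to carry a session label, CER is pooled within each session, and the session identifiers and sentences are invented.

```python
from collections import defaultdict


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def per_session_cer(records):
    """records: iterable of (session_id, reference, hypothesis) tuples."""
    edits = defaultdict(int)
    chars = defaultdict(int)
    for session, ref, hyp in records:
        edits[session] += edit_distance(ref, hyp)
        chars[session] += len(ref)
    # Skip sessions with no reference characters to avoid dividing by zero.
    return {s: edits[s] / chars[s] for s in edits if chars[s] > 0}


# Invented records with hypothetical session labels.
records = [
    ("s01", "good day", "good day"),
    ("s01", "yes", "yes"),
    ("s02", "good day", "gooday"),
]
table = per_session_cer(records)
```

Sorting the resulting dictionary by session date and plotting it would show directly whether error grows across sessions, as the degradation claim implies.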
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the data splitting procedure. We address the single major comment below.
- Referee: [Abstract / Methods] The claim that the 23.80% CER on held-out validation data reflects a 'strong neural signal' independent of session effects is load-bearing, yet the manuscript provides no explicit description of how the train/validation split respects session boundaries. Because the abstract itself states that performance variability is driven by inter-session signal degradation, it is necessary to verify that validation utterances come from temporally and session-disjoint blocks; otherwise the reported CER could be inflated by shared non-stationarities rather than stable neural information.
Authors: We agree that an explicit description of the session-disjoint nature of the split is required to support the interpretation of the reported CER. The current manuscript does not provide this level of detail in the Methods section. In the revision we will add a clear statement that the train/validation partition was performed at the session level, with all validation utterances drawn from temporally later sessions that do not overlap with the training sessions. This procedure was chosen to account for the inter-session signal degradation highlighted in the abstract and to ensure that the CER reflects generalization across sessions rather than shared within-session non-stationarities.
Revision: yes
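The session-level partition described in this exchange can be illustrated with a short sketch. It assumes each trial carries a session identifier that sorts chronologically (ISO dates here) and holds out the most recent sessions for validation. The field names, dates, and the function `split_by_session` are hypothetical, and nothing on this page confirms that the benchmark's own split works this way.

```python
def split_by_session(trials, n_val_sessions):
    """Hold out the chronologically last sessions so no session spans both sets."""
    sessions = sorted({t["session"] for t in trials})
    if not 0 < n_val_sessions < len(sessions):
        raise ValueError("n_val_sessions must leave at least one training session")
    val_sessions = set(sessions[-n_val_sessions:])
    train = [t for t in trials if t["session"] not in val_sessions]
    val = [t for t in trials if t["session"] in val_sessions]
    return train, val


# Invented trials; session labels are ISO dates so string order is time order.
trials = [
    {"session": "2024-01-05", "text": "hello"},
    {"session": "2024-01-12", "text": "good morning"},
    {"session": "2024-02-02", "text": "thank you"},
    {"session": "2024-01-05", "text": "see you"},
]
train, val = split_by_session(trials, n_val_sessions=1)
```

Splitting on the session key rather than on shuffled sentences is the design choice that matters: a random sentence-level split would let training and validation share the same day's signal statistics.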
Circularity Check
No circularity in empirical reporting of held-out performance
The paper presents an empirical result: a Conformer model trained on intracortical recordings achieves 23.80% CER on held-out validation data without an external language model. This is a direct measurement on data not used in training, with no mathematical derivation chain, no parameters fitted to a subset then renamed as predictions, and no load-bearing self-citations or uniqueness theorems invoked. The central claim rests on observable performance metrics rather than any reduction to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Intracortical neural recordings from a single ALS participant contain sufficient information for meaningful character-level speech decoding without external linguistic models.
Reference graph
Works this paper leans on
-
[1]
Introduction. Neural speech prostheses [1, 2] represent one of the most ambitious frontiers in modern neuroscience and biomedical engineering, offering the prospect of restoring lost communication to individuals with severe neurological conditions [3, 4, 5, 6]. Among the populations who stand to benefit most are those affected by amyotrophic lateral ...
arXiv, 2026
-
[2]
Related Work. The decoding of speech and language from neural signals has progressed rapidly across multiple recording modalities [39, 40, 41]. Early work with ECoG demonstrated that neural activity in speech-related cortical regions contains sufficient information to reconstruct acoustic features and classify phonemes [19, 41]. Sequence-to-sequence appr...
-
[3]
Methods. We evaluate an end-to-end intracortical speech decoder on the public Brain-to-Text ’25 benchmark. The proposed pipeline, depicted in Fig. 1, first applies a session-specific alignment layer to the neural features, followed by temporal patch embedding and a Conformer encoder that predicts character sequences with a CTC objective. During training,...
-
[4]
Results. In this section, we report the performance of the proposed model on the Brain-to-Text ’25 benchmark, focusing on the validation set (1,426 sentences), and analyze the main factors influencing its behavior. 4.1. Overall Performance and Model Comparison. We first compare in Table 3 the proposed Conformer-based model against the baseline provided ...
-
[5]
Conclusion. In this work, we presented an end-to-end Conformer-based decoder for intracortical speech neuroprostheses that directly maps neural activity to character sequences. By combining dataset augmentation, a session-specific alignment layer, temporal patch embedding, and a Conformer encoder trained with a CTC objective and entropy regularization, t...
-
[6]
Acknowledgement. This work was supported by grants PID2022-141378OB-C22 and AIA2025-163317-C32 funded by MICIU/AEI/10.13039/501100011033 and ERDF/EU
-
[7]
Brain-computer interfaces for restoring communication,
E. F. Chang, “Brain-computer interfaces for restoring communication,” New England Journal of Medicine, vol. 391, no. 7, pp. 654–657, 2024
2024
-
[8]
The speech neuroprosthesis,
A. B. Silva, K. T. Littlejohn, J. R. Liu, D. A. Moses, and E. F. Chang, “The speech neuroprosthesis,” Nature Reviews Neuroscience, vol. 25, no. 7, pp. 473–492, 2024
2024
-
[9]
Neuronal ensemble control of prosthetic devices by a human with tetraplegia,
L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn, and J. P. Donoghue, “Neuronal ensemble control of prosthetic devices by a human with tetraplegia,” Nature, vol. 442, no. 7099, pp. 164–171, 2006
2006
-
[10]
Cortical control of arm movements: A dynamical systems perspective,
K. V. Shenoy, M. Sahani, and M. M. Churchland, “Cortical control of arm movements: A dynamical systems perspective,” Annual Review of Neuroscience, vol. 36, pp. 337–359, 2013
2013
-
[11]
Cognitive neural prosthetics,
R. A. Andersen, J. W. Burdick, S. Musallam, B. Pesaran, and J. G. Cham, “Cognitive neural prosthetics,” Trends in Cognitive Sciences, vol. 8, no. 11, pp. 486–493, 2004
2004
-
[12]
Connecting cortex to machines: Recent advances in brain interfaces,
J. P. Donoghue, “Connecting cortex to machines: Recent advances in brain interfaces,” Nature Neuroscience, vol. 5, pp. 1085–1088, 2002
2002
-
[13]
A spelling device for the paralysed,
N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kübler, J. Perelmouter, E. Taub, and H. Flor, “A spelling device for the paralysed,” Nature, vol. 398, no. 6725, pp. 297–298, 1999
1999
-
[14]
Brain-computer interfaces for communication and rehabilitation,
U. Chaudhary, N. Birbaumer, and A. Ramos-Murguialday, “Brain-computer interfaces for communication and rehabilitation,” Nature Reviews Neurology, vol. 12, no. 9, pp. 513–525, 2016
2016
-
[15]
Fully implanted brain-computer interface in a locked-in patient with ALS,
M. J. Vansteensel, E. G. M. Pels, M. G. Bleichner, M. P. Branco, T. Denison, Z. V. Freudenburg, P. Gosselaar, S. Leinders, T. H. Ottens, M. A. Van Den Boom, P. C. Van Rijen, E. J. Aarnoutse, and N. F. Ramsey, “Fully implanted brain-computer interface in a locked-in patient with ALS,” New England Journal of Medicine, vol. 375, no. 21, pp. 2060–2066, 2016
2016
-
[16]
Brain-computer interfaces for communication and control,
J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-computer interfaces for communication and control,” Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002
2002
-
[17]
Brain-machine interfaces: Past, present and future,
M. A. Lebedev and M. A. L. Nicolelis, “Brain-machine interfaces: Past, present and future,” Trends in Neurosciences, vol. 29, no. 9, pp. 536–546, 2006
2006
-
[18]
A brain-computer interface using electrocorticographic signals in humans,
E. C. Leuthardt, G. Schalk, J. R. Wolpaw, J. G. Ojemann, and D. W. Moran, “A brain-computer interface using electrocorticographic signals in humans,” Journal of Neural Engineering, vol. 1, no. 2, pp. 63–71, 2004
2004
-
[19]
The open dataset of EEG motor imagery: BCI motor imagery data from healthy subjects,
G. Wang, C. Teng, K. Li, Z. Zhang, and Y. Chai, “The open dataset of EEG motor imagery: BCI motor imagery data from healthy subjects,” Frontiers in Neuroscience, vol. 16, p. 1044299, 2022
2022
-
[20]
Semantic reconstruction of continuous language from non-invasive brain recordings,
J. Tang, A. LeBel, S. Jain, and A. G. Huth, “Semantic reconstruction of continuous language from non-invasive brain recordings,” Nature Neuroscience, vol. 26, no. 5, pp. 858–866, 2023
2023
-
[21]
Enhancing detection of SSVEPs for a high-speed brain speller using task-related component analysis,
M. Nakanishi, Y. Wang, X. Chen, Y.-T. Wang, X. Gao, and T.-P. Jung, “Enhancing detection of SSVEPs for a high-speed brain speller using task-related component analysis,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 1, pp. 104–112, 2018
2018
-
[22]
A comprehensive review of EEG-based brain-computer interface paradigms,
R. Abiri, S. Borhani, E. W. Sellers, Y. Jiang, and X. Zhao, “A comprehensive review of EEG-based brain-computer interface paradigms,” Journal of Neural Engineering, vol. 16, no. 1, p. 011001, 2019
2019
-
[23]
Machine translation of cortical activity to text with an encoder-decoder framework,
J. G. Makin, D. A. Moses, and E. F. Chang, “Machine translation of cortical activity to text with an encoder-decoder framework,” Nature Neuroscience, vol. 23, no. 4, pp. 575–582, 2020
2020
-
[24]
Neuroprosthesis for decoding speech in a paralyzed person with anarthria,
D. A. Moses, S. L. Metzger, J. R. Liu, G. K. Anumanchipalli, J. G. Makin, P. F. Sun, J. Chartier, M. E. Dougherty, P. M. Liu, G. M. Abrams, A. Tu-Chan, K. Ganguly, and E. F. Chang, “Neuroprosthesis for decoding speech in a paralyzed person with anarthria,” New England Journal of Medicine, vol. 385, no. 3, pp. 217–227, 2021
2021
-
[25]
Brain-to-text: Decoding spoken phrases from phone representations in the brain,
C. Herff, D. Heger, A. De Pesters, D. Telaar, P. Brunner, G. Schalk, and T. Schultz, “Brain-to-text: Decoding spoken phrases from phone representations in the brain,” Frontiers in Neuroscience, vol. 9, p. 217, 2015
2015
-
[26]
Speech synthesis from ECoG using densely connected 3D convolutional neural networks,
M. Angrick, C. Herff, E. Mugler, M. C. Tate, M. W. Slutzky, D. J. Krusienski, and T. Schultz, “Speech synthesis from ECoG using densely connected 3D convolutional neural networks,” Journal of Neural Engineering, vol. 16, no. 3, p. 036019, 2019
2019
-
[27]
High performance communication by people with paralysis using an intracortical brain-computer interface,
C. Pandarinath, P. Nuyujukian, C. H. Blabe, B. L. Sorice, J. Saab, F. R. Willett, L. R. Hochberg, K. V. Shenoy, and J. M. Henderson, “High performance communication by people with paralysis using an intracortical brain-computer interface,” eLife, vol. 6, p. e18554, 2017
2017
-
[28]
Clinical translation of a high-performance neural prosthesis,
V. Gilja, C. Pandarinath, C. H. Blabe, P. Nuyujukian, J. D. Simeral, A. A. Sarma, B. L. Sorice, J. A. Perge, B. Jarosiewicz, L. R. Hochberg, K. V. Shenoy, and J. M. Henderson, “Clinical translation of a high-performance neural prosthesis,” Nature Medicine, vol. 21, no. 10, pp. 1142–1145, 2015
2015
-
[29]
Reach and grasp by people with tetraplegia using a neurally controlled robotic arm,
L. R. Hochberg, D. Bacher, B. Jarosiewicz, N. Y. Masse, J. D. Simeral, J. Vogel, S. Haddadin, J. Liu, S. S. Cash, P. van der Smagt, and J. P. Donoghue, “Reach and grasp by people with tetraplegia using a neurally controlled robotic arm,” Nature, vol. 485, no. 7398, pp. 372–375, 2012
2012
-
[30]
Accurate estimation of neural population dynamics without spike sorting,
E. M. Trautmann, S. D. Stavisky, S. Lahiri, K. C. Ames, M. T. Kaufman, D. J. O’Shea, S. Vyas, X. Sun, I. Bhowmick, S. Bhowmick, B. M. Yu, N. Even-Chen, J. M. Henderson, and K. V. Shenoy, “Accurate estimation of neural population dynamics without spike sorting,” Neuron, vol. 103, no. 2, pp. 292–308, 2019
2019
-
[31]
High-performance brain-to-text communication via handwriting,
F. R. Willett, D. T. Avansino, L. R. Hochberg, J. M. Henderson, and K. V. Shenoy, “High-performance brain-to-text communication via handwriting,” Nature, vol. 593, no. 7858, pp. 249–254, 2021
2021
-
[32]
A high-performance speech neuroprosthesis,
F. R. Willett, E. M. Kunz, C. Fan, D. T. Avansino, G. H. Wilson, E. Y. Choi, F. Kamdar, L. R. Hochberg, J. M. Henderson, P. Bhatt, P. Rezaii, and K. V. Shenoy, “A high-performance speech neuroprosthesis,” Nature, vol. 620, no. 7976, pp. 1031–1036, 2023
2023
-
[33]
An accurate and rapidly calibrating speech neuroprosthesis,
N. S. Card, M. Wairagkar, C. Iacono, P. Bhatt, T. Singer-Clark, F. R. Willett, K. C. Ames, J. Liu, P. Rezaii, L. R. Hochberg, J. M. Henderson, K. V. Shenoy, and D. M. Brandman, “An accurate and rapidly calibrating speech neuroprosthesis,” New England Journal of Medicine, vol. 391, no. 7, pp. 609–618, 2024
2024
-
[34]
wav2vec 2.0: A framework for self-supervised learning of speech representations,
A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460, 2020
2020
-
[35]
Robust speech recognition via large-scale weak supervision,
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” Proceedings of the International Conference on Machine Learning, pp. 28492–28518, 2023
2023
-
[36]
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,
A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks,” Proceedings of the International Conference on Machine Learning, pp. 369–376, 2006
2006
-
[37]
Deep Speech: Scaling up end-to-end speech recognition
A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng, “Deep speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014
arXiv, 2014
-
[38]
Single-trial dynamics of motor cortex and their applications to brain-machine interfaces,
J. C. Kao, P. Nuyujukian, S. I. Ryu, M. M. Churchland, J. P. Cunningham, and K. V. Shenoy, “Single-trial dynamics of motor cortex and their applications to brain-machine interfaces,” Nature Communications, vol. 6, p. 7759, 2015
2015
-
[39]
Conformer: Convolution-augmented transformer for speech recognition,
A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, “Conformer: Convolution-augmented transformer for speech recognition,” in Proc. Interspeech, 2020, pp. 5036–5040
2020
-
[40]
Neural control of cursor trajectory and click by a human with tetraplegia 1000 days after implant of an intracortical microelectrode array,
J. D. Simeral, S.-P. Kim, M. J. Black, J. P. Donoghue, and L. R. Hochberg, “Neural control of cursor trajectory and click by a human with tetraplegia 1000 days after implant of an intracortical microelectrode array,” Journal of Neural Engineering, vol. 8, no. 2, p. 025027, 2011
2011
-
[41]
Neural manifolds for the control of movement,
J. A. Gallego, M. G. Perich, L. E. Miller, and S. A. Solla, “Neural manifolds for the control of movement,” Neuron, vol. 94, no. 5, pp. 978–984, 2017
2017
-
[42]
Single-unit stability using chronically implanted multielectrode arrays in motor cortex of macaque monkeys,
C. A. Chestek, V. Gilja, P. Nuyujukian, J. D. Foster, J. M. Fan, M. T. Kaufman, M. M. Churchland, Z. Rivera-Alvidrez, J. P. Cunningham, S. I. Ryu, and K. V. Shenoy, “Single-unit stability using chronically implanted multielectrode arrays in motor cortex of macaque monkeys,” Journal of Neurophysiology, vol. 105, no. 2, pp. 567–579, 2011
2011
-
[43]
Jasper: An end-to-end convolutional neural acoustic model,
J. Li, V. Lavrukhin, B. Ginsburg, R. Leary, O. Kuchaiev, J. M. Cohen, H. Nguyen, and R. T. Gadde, “Jasper: An end-to-end convolutional neural acoustic model,” in Proc. Interspeech, 2019, pp. 71–75
2019
-
[44]
SpecAugment: A simple data augmentation method for automatic speech recognition,
D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” in Proc. Interspeech, 2019, pp. 2613–2617
2019
-
[45]
Brain-computer interfaces for speech communication,
J. S. Brumberg, A. Nieto-Castanon, P. R. Kennedy, and F. H. Guenther, “Brain-computer interfaces for speech communication,” Speech Communication, vol. 52, no. 4, pp. 367–379, 2010
2010
-
[46]
Decoding spectrotemporal features of overt and covert speech from the human cortex,
S. Martin, P. Brunner, C. Holdgraf, H.-J. Heinze, N. E. Crone, J. Rieger, G. Schalk, R. T. Knight, and B. N. Pasley, “Decoding spectrotemporal features of overt and covert speech from the human cortex,” Frontiers in Neuroengineering, vol. 7, p. 14, 2014
2014
-
[47]
Speech synthesis from neural decoding of spoken sentences,
G. K. Anumanchipalli, J. Chartier, and E. F. Chang, “Speech synthesis from neural decoding of spoken sentences,” Nature, vol. 568, no. 7753, pp. 493–498, 2019
2019
-
[48]
A high-performance neuroprosthesis for speech decoding and avatar control,
S. L. Metzger, K. T. Littlejohn, A. B. Silva, D. A. Moses, M. P. Seaton, R. Wang, M. E. Dougherty, J. R. Liu, P. Wu, M. A. Berger, I. Zhuravleva, A. Tu-Chan, K. Ganguly, G. K. Anumanchipalli, and E. F. Chang, “A high-performance neuroprosthesis for speech decoding and avatar control,” Nature, vol. 620, no. 7976, pp. 1037–1046, 2023
2023
-
[49]
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4960–4964
2016
-
[50]
Virtual typing by people with tetraplegia using a self-calibrating intracortical brain-computer interface,
B. Jarosiewicz, A. A. Sarma, D. Bacher, N. Y. Masse, J. D. Simeral, B. Sorice, E. M. Oakley, C. Blabe, C. Pandarinath, V. Gilja, S. S. Cash, E. N. Eskandar, G. Friehs, J. M. Henderson, K. V. Shenoy, J. P. Donoghue, and L. R. Hochberg, “Virtual typing by people with tetraplegia using a self-calibrating intracortical brain-computer interface,” Science...
2015
-
[51]
Stabilization of a brain-computer interface via the alignment of low-dimensional spaces of neural activity,
A. D. Degenhart, W. E. Bishop, E. R. Oby, E. C. Tyler-Kabara, S. M. Chase, A. P. Batista, and B. M. Yu, “Stabilization of a brain-computer interface via the alignment of low-dimensional spaces of neural activity,” Nature Biomedical Engineering, vol. 4, no. 7, pp. 672–685, 2020
2020
-
[52]
An image is worth 16x16 words: Transformers for image recognition at scale,
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in Proc. International Conference on Learning Representations, 2021
2021
-
[53]
Swin transformer: Hierarchical vision transformer using shifted windows,
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022
2021