Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features

Adeen Flinker; Amirhossein Khalilian-Gourtani; Beatrice Fumagalli; Chenqian Le; Ruisi Li; Tianyu He; Xupeng Chen; Yao Wang; Yasamin Esmaeili

arXiv: 2604.18920 · v2 · submitted 2026-04-20 · cs.SD · cs.CL

Pith reviewed 2026-05-10 02:46 UTC · model grok-4.3

keywords: sEMG encoding · SPARC · articulatory features · phoneme features · silent speech · mTRF · speech modes

The pith

SPARC articulatory features predict sEMG envelopes more accurately than phoneme representations across all tested speech modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether Speech Articulatory Coding features can linearly predict surface electromyography signals from speech muscles in aloud, mimed, and subvocal conditions. Using a regularized linear model on data from 24 subjects, it shows SPARC outperforms simple phoneme codes on nearly every electrode and in every mode, including when speech is silent. Subvocal speech remains predictable above chance level, and the model weights point to consistent links between electrodes and specific articulatory movements. This matters because it identifies a robust intermediate representation that could support silent speech interfaces without relying on audible output or phoneme-based assumptions.

Core claim

SPARC features yield higher prediction accuracy than phoneme one-hot representations on nearly all electrodes and in all speech modes. Aloud and mimed speech perform comparably, subvocal speech remains above chance, variance partitioning shows substantial unique contribution from SPARC, and mTRF weight patterns reveal anatomically interpretable relationships consistent across modes. This supports SPARC as a robust intermediate target for sEMG-based silent-speech modeling.

What carries the argument

Speech Articulatory Coding (SPARC) features as the central representation in elastic-net regularized multivariate temporal response function (mTRF) models for predicting sEMG envelopes.
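To make the machinery concrete, here is a minimal sketch of an elastic-net encoding model over time-lagged features. It is not the authors' implementation: the data are synthetic, the function names (`lagged_design`, `elastic_net_fit`), the lag count, and the penalty values are all illustrative assumptions, and the solver is a plain coordinate descent rather than whatever optimizer the paper uses.

```python
import math
import random


def lagged_design(features, n_lags):
    """Row t holds features[t], features[t-1], ..., features[t-n_lags+1] (zero-padded)."""
    n_feat = len(features[0])
    rows = []
    for t in range(len(features)):
        row = []
        for lag in range(n_lags):
            src = t - lag
            row.extend(features[src] if src >= 0 else [0.0] * n_feat)
        rows.append(row)
    return rows


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(X, y, alpha=0.01, l1_ratio=0.5, n_iter=100):
    """Cyclic coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||_2^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def predict(X, w):
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in data (assumption): 2 feature tracks and 1 "electrode" envelope
# that depends on feature 0 at lag 1 and feature 1 at lag 0.
rng = random.Random(0)
feats = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
envelope = [
    0.8 * (feats[t - 1][0] if t > 0 else 0.0) - 0.5 * feats[t][1] + rng.gauss(0, 0.1)
    for t in range(300)
]

X = lagged_design(feats, n_lags=3)  # 3 lags x 2 features = 6 columns
weights = elastic_net_fit(X[:200], envelope[:200])  # fit on the first 200 samples
r_test = pearson_r(predict(X[200:], weights), envelope[200:])  # score on held-out samples
```

The fitted weights, arranged by lag and feature, are what a weight map of this kind would display; the held-out Pearson r plays the role of the prediction accuracy discussed on this page.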

If this is right

Aloud and mimed speech show comparable encoding performance using SPARC.
Subvocal speech exhibits detectable articulatory activity above chance levels.
SPARC contributes uniquely to predictions beyond what phoneme features provide.
Anatomically interpretable mTRF weights remain consistent across speech modes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These findings suggest SPARC could serve as a target for training decoders in practical silent speech applications.
The consistency across modes implies potential for models trained on audible speech to generalize to silent conditions.
Extending the analysis to real-time decoding scenarios could test whether the linear advantage holds under streaming constraints.

Load-bearing premise

The assumption that a linear model with elastic-net regularization and sentence-level cross-validation adequately captures the true encoding relationship without overfitting or missing important nonlinear dynamics.
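Sentence-level cross-validation means every sample from a given sentence lands on the same side of the train/test split, so temporally adjacent samples cannot leak across it. A minimal sketch, assuming each sample carries a sentence label; the function name, fold count, and labels below are hypothetical and not taken from the paper:

```python
import random


def sentence_level_folds(sentence_ids, n_folds, seed=0):
    """Return (train_idx, test_idx) pairs of sample indices, keeping each sentence whole."""
    unique_sentences = sorted(set(sentence_ids))
    random.Random(seed).shuffle(unique_sentences)
    # Deal the shuffled sentences round-robin into folds.
    fold_of = {s: k % n_folds for k, s in enumerate(unique_sentences)}
    folds = []
    for fold in range(n_folds):
        test_idx = [i for i, s in enumerate(sentence_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(sentence_ids) if fold_of[s] != fold]
        folds.append((train_idx, test_idx))
    return folds


# Hypothetical labels: 6 sentences with 4 samples each.
sentence_ids = [s for s in range(6) for _ in range(4)]
folds = sentence_level_folds(sentence_ids, n_folds=3)
```

Splitting by sentence rather than by sample guards against leakage but does not by itself address overfitting from feature capacity, which is the separate concern raised in the referee report.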

What would settle it

Finding no significant accuracy advantage for SPARC over phonemes when using a nonlinear decoder on the same data, or observing subvocal prediction accuracy drop to chance levels in an independent replication.

Figures

Figures reproduced from arXiv: 2604.18920 by Adeen Flinker, Amirhossein Khalilian-Gourtani, Beatrice Fumagalli, Chenqian Le, Ruisi Li, Tianyu He, Xupeng Chen, Yao Wang, Yasamin Esmaeili.

**Figure 2.** Encoding performance by electrode and speech mode. (a) Mean Pearson correlation r (average over 24 subjects) for SPARC articulatory features on each channel in Aloud, Mimed, and Subvocal speech. Error bars represent standard error of the mean (SEM). Chance level (dotted black lines) indicates significance threshold by permutation test (p < 0.05). (b) Per-subject advantage of SPARC over phoneme one-hots, ∆r…

**Figure 3.** Variance partitioning of explained variance (…

**Figure 4.** Per-channel normalized linear weight maps (mean across subjects)

We test whether Speech Articulatory Coding (SPARC) features can linearly predict surface electromyography (sEMG) envelopes across aloud, mimed, and subvocal speech in twenty-four subjects. Using elastic-net multivariate temporal response function (mTRF) with sentence-level cross-validation, SPARC yields higher prediction accuracy than phoneme one-hot representations on nearly all electrodes and in all speech modes. Aloud and mimed speech perform comparably, and subvocal speech remains above chance, indicating detectable articulatory activity. Variance partitioning shows a substantial unique contribution from SPARC and a minimal unique contribution from phoneme features. mTRF weight patterns reveal anatomically interpretable relationships between electrode sites and articulatory movements that remain consistent across modes. This study focuses on representation/encoding analysis (not end-to-end decoding) and supports SPARC as a robust and interpretable intermediate target for sEMG-based silent-speech modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

SPARC features give higher linear prediction accuracy for sEMG than phoneme one-hots across modes, but the gain could partly trace to feature richness rather than pure representational superiority.

The paper's central result is that Speech Articulatory Coding features outperform phoneme one-hot encodings when used to predict sEMG envelopes via elastic-net mTRF, and this holds for aloud, mimed, and subvocal speech in 24 subjects. Variance partitioning attributes most of the explained variance to SPARC with little unique contribution from phonemes, and the mTRF weights show anatomically plausible patterns that stay consistent across modes. Subvocal performance stays above chance, which is useful for silent-speech applications.

The design uses sentence-level cross-validation, which is a reasonable step against leakage, and the focus stays on encoding rather than end-to-end decoding. That keeps the claims grounded. The work is incremental but cleanly executed on the comparison it sets out to do.

The main soft spot is the dimensionality difference: SPARC supplies continuous multi-dimensional articulatory parameters while phoneme one-hots are sparse and limited by inventory size. Elastic-net regularization reduces but does not remove the risk that SPARC simply has more degrees of freedom to fit noise, especially in lower-SNR subvocal conditions. The abstract gives no numbers on feature counts, exact regularization schedule, or any explicit capacity-matched control, so a referee would need to see those details to judge whether the unique SPARC variance truly reflects better articulatory information.

The citation pattern looks standard for the subfield and does not appear to over-claim prior results. This is the kind of targeted empirical study that matters for the BCI and speech-neuroscience crowd working on intermediate representations. Readers who care about sEMG encoding or silent-speech interfaces will find the head-to-head and the anatomical weight maps worth their time.

The paper is coherent on its own terms and shows clear thinking about the comparison, so it deserves a serious referee even if the effect sizes turn out modest and the dimensionality concern needs addressing in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript compares Speech Articulatory Coding (SPARC) features against phoneme one-hot encodings for linearly predicting sEMG envelopes in aloud, mimed, and subvocal speech. Using elastic-net regularized mTRF models with sentence-level cross-validation across 24 subjects, it reports higher prediction accuracy for SPARC on nearly all electrodes and modes, substantial unique variance from SPARC in partitioning analyses, minimal unique variance from phonemes, and anatomically interpretable mTRF weights consistent across modes. The work positions SPARC as a robust intermediate representation for sEMG-based silent speech modeling.

Significance. If robust after addressing dimensionality confounds, the results would support SPARC as a more effective and interpretable articulatory target than phoneme encodings for sEMG interfaces, particularly for subvocal speech where above-chance encoding is shown. The cross-mode consistency and variance partitioning provide useful empirical data on articulatory feature encoding from muscle signals.

major comments (2)

[Methods (mTRF and feature comparison)] Methods section on mTRF modeling and feature sets: The comparison applies the same elastic-net regularization schedule to SPARC (continuous, multi-dimensional articulatory parameters) and phoneme one-hot (sparse, bounded by ~40-60 phonemes). No explicit control for effective degrees of freedom or feature dimensionality is described, raising the possibility that SPARC's higher capacity contributes to elevated r-values and unique variance rather than superior representation of sEMG. A matched-dimensionality control or reporting of effective df would be required to support the central claim.
[Results (variance partitioning)] Results section on variance partitioning: The reported unique contribution of SPARC may partly reflect its richer basis set rather than unique articulatory information. It is unclear whether the partitioning isolates representation quality after accounting for the continuous vs. discrete nature of the features; subsampling SPARC to phoneme dimensionality or adding a capacity-matched baseline would test this.
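For reference, a common two-feature-set formulation of variance partitioning is sketched below. It assumes three models are fit per electrode (SPARC only, phonemes only, and both concatenated) and scored by held-out explained variance; the symbols are illustrative, and the paper may define or normalize these quantities differently.

```latex
\begin{aligned}
U_{\mathrm{SPARC}} &= R^2_{\mathrm{joint}} - R^2_{\mathrm{phon}} \\
U_{\mathrm{phon}}  &= R^2_{\mathrm{joint}} - R^2_{\mathrm{SPARC}} \\
S                  &= R^2_{\mathrm{SPARC}} + R^2_{\mathrm{phon}} - R^2_{\mathrm{joint}}
\end{aligned}
```

The three terms sum to $R^2_{\mathrm{joint}}$. Because $U_{\mathrm{SPARC}}$ depends on how well the joint model fits, a higher-capacity SPARC block can inflate it, which is why a capacity-matched baseline is informative here.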

minor comments (2)

[Abstract] Abstract: No quantitative accuracy values (e.g., mean r or percentage improvement), SPARC extraction details, electrode montage, or statistical test descriptions are provided, limiting immediate assessment of effect sizes.
[Results] Results: The claim that subvocal speech remains above chance would be strengthened by explicit reporting of the chance-level baseline, exact p-values, and correction for multiple comparisons across electrodes.
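The kind of reporting requested in the last comment can be sketched as follows: a permutation p-value for one electrode's prediction accuracy, followed by a Benjamini-Hochberg step-up correction across electrodes. This is an illustration under stated assumptions, not the paper's procedure. The data are synthetic, the function names are hypothetical, and shuffling individual samples is the simplest possible null; for autocorrelated envelopes, shuffling whole sentences or applying circular shifts would give a more conservative threshold.

```python
import math
import random


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def permutation_p_value(predicted, observed, n_perm=199, seed=0):
    """One-sided p-value: how often a shuffled prediction matches or beats the real r."""
    rng = random.Random(seed)
    actual = pearson_r(predicted, observed)
    shuffled = list(predicted)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, observed) >= actual:
            exceed += 1
    return actual, (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, q=0.05):
    """Return a reject/keep flag per test, controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank  # step-up: keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Synthetic single-electrode example (assumption): prediction = signal + noise.
rng = random.Random(1)
observed = [rng.gauss(0, 1) for _ in range(100)]
predicted = [o + 0.5 * rng.gauss(0, 1) for o in observed]
r_actual, p_value = permutation_p_value(predicted, observed)

# Hypothetical per-electrode p-values, corrected jointly.
rejected = benjamini_hochberg([0.001, 0.03, 0.035, 0.5], q=0.05)
```

Note that the second p-value (0.03) fails its own rank threshold but is still rejected, because the step-up rule rejects everything up to the largest passing rank.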

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which identify a valid methodological concern about potential dimensionality confounds in our feature comparison. We address each point below and commit to revisions that directly test this issue.

Referee: Methods section on mTRF modeling and feature sets: The comparison applies the same elastic-net regularization schedule to SPARC (continuous, multi-dimensional articulatory parameters) and phoneme one-hot (sparse, bounded by ~40-60 phonemes). No explicit control for effective degrees of freedom or feature dimensionality is described, raising the possibility that SPARC's higher capacity contributes to elevated r-values and unique variance rather than superior representation of sEMG. A matched-dimensionality control or reporting of effective df would be required to support the central claim.

Authors: We acknowledge that the continuous, multi-dimensional nature of SPARC versus the discrete phoneme one-hot encoding could introduce a capacity difference, even under elastic-net regularization. Although the L1/L2 penalties and sentence-level cross-validation are intended to limit effective model complexity, we agree that an explicit control is needed to isolate representation quality. In the revised manuscript, we will add a matched-dimensionality analysis by randomly subsampling SPARC features to approximately 50 dimensions (matching the phoneme set) and re-evaluate both prediction accuracies and unique variances. We will also report effective degrees of freedom derived from the regularization paths for both feature sets.

Revision: yes

Referee: Results section on variance partitioning: The reported unique contribution of SPARC may partly reflect its richer basis set rather than unique articulatory information. It is unclear whether the partitioning isolates representation quality after accounting for the continuous vs. discrete nature of the features; subsampling SPARC to phoneme dimensionality or adding a capacity-matched baseline would test this.

Authors: We agree that the variance partitioning results require additional controls to rule out capacity effects. To directly address this, the revised manuscript will include a capacity-matched baseline in which SPARC features are subsampled to phoneme dimensionality before repeating the unique-variance analysis. This will clarify whether SPARC's unique contribution persists after dimensionality matching, thereby strengthening the claim that it captures superior articulatory information for sEMG encoding.

Revision: yes
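The subsampling control described in these responses could look roughly like the sketch below: draw a random subset of feature columns, refit, and repeat over many draws so the result does not hinge on one lucky subset. The function name and the matrix sizes are placeholders; this page does not report the actual feature counts, so the sketch raises an error if the requested dimensionality exceeds what is available rather than assuming it fits.

```python
import random


def subsample_columns(features, n_keep, seed=0):
    """Keep a random subset of n_keep feature columns, preserving column order."""
    n_cols = len(features[0])
    if n_keep > n_cols:
        raise ValueError(f"cannot keep {n_keep} of {n_cols} columns")
    keep = sorted(random.Random(seed).sample(range(n_cols), n_keep))
    return keep, [[row[j] for j in keep] for row in features]


# Placeholder sizes (assumption): 5 time samples, 8 feature columns, cut down to 3.
features = [[float(10 * t + j) for j in range(8)] for t in range(5)]

# Repeat over several seeds; each draw would be refit and scored separately.
draws = [subsample_columns(features, n_keep=3, seed=s) for s in range(20)]
```

Averaging accuracy and unique variance over the draws, and reporting their spread, would show whether the SPARC advantage survives once both feature sets have the same number of columns.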

Circularity Check

0 steps flagged

Empirical comparison via cross-validated mTRF shows no circularity

The paper reports an empirical encoding analysis: elastic-net mTRF models are trained on sentence-level cross-validation to predict sEMG envelopes from either SPARC articulatory features or phoneme one-hot vectors. Reported accuracies (and variance partitioning) are computed on held-out sentences and electrodes; no equation or procedure reduces these metrics to a fitted parameter by construction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the methodology or claims. The central result is a data-driven comparison whose validity can be assessed against external benchmarks (e.g., SNR, electrode anatomy) without tautology.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that a linear temporal response function suffices to model feature-to-sEMG mapping and that SPARC features are well-defined from prior work.

free parameters (1)

elastic-net regularization strength
Selected via cross-validation but not reported as a specific value; affects the reported accuracies.
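For orientation, one common parametrization of the elastic-net objective is written out below. The symbols are assumptions for illustration: $\lambda$ is the overall strength referred to above, $\rho \in [0, 1]$ is the L1/L2 mixing ratio, $X$ is the time-lagged feature matrix, $y$ is one electrode's envelope, and $n$ is the number of samples. The paper's exact scaling may differ.

```latex
\hat{w} = \arg\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
          + \lambda \left( \rho \lVert w \rVert_1 + \frac{1 - \rho}{2} \lVert w \rVert_2^2 \right)
```

Setting $\rho = 0$ recovers ridge regression and $\rho = 1$ recovers the lasso. Strictly speaking, both $\lambda$ and $\rho$ are tunable, so whether the mixing ratio was fixed or also cross-validated affects how many free parameters the comparison carries.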

axioms (1)

Domain assumption: linear relationship between articulatory/phoneme features and sEMG envelopes via mTRF
Invoked by the choice of mTRF encoding model in the abstract.
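Written out, the linearity assumption says each electrode's envelope is a weighted sum of the features over a window of time lags. The notation below is illustrative rather than the paper's: $x_f(t)$ is feature $f$ at time $t$, $w_c(f, \tau)$ is the weight linking feature $f$ at lag $\tau$ to electrode $c$, and the lag window $[\tau_{\min}, \tau_{\max}]$ is a modeling choice not reported on this page.

```latex
\hat{y}_c(t) = \sum_{f=1}^{F} \; \sum_{\tau = \tau_{\min}}^{\tau_{\max}} w_c(f, \tau) \, x_f(t - \tau)
```

Any relationship that is not expressible in this form, such as interactions between articulators or saturation of muscle activation, is invisible to the model, which is what makes this assumption load-bearing.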

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] W. Li, J. Yuan, L. Zhang, J. Cui, X. Wang, and H. Li, “sEMG-based technology for silent voice recognition,” Computers in Biology and Medicine, vol. 152, p. 106336, 2023.

[2] X. Chen, X. Zhang, X. Chen, and X. Chen, “Decoding silent speech based on high-density surface electromyogram using spatiotemporal neural network,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 2069–2078, 2023.

[3] C. J. Cho, P. Wu, A. Mohamed, and G. K. Anumanchipalli, “Evidence of vocal tract articulation in self-supervised learning of speech,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5.

[4] T. Rebernik, J. Jacobi, R. Jonkers, A. Noiray, and M. Wieling, “A review of data collection practices using electromagnetic articulography,” Laboratory Phonology, vol. 12, no. 1, p. 6, 2021. [Online]. Available: https://doi.org/10.5334/labphon.237

[5] C. J. Cho, P. Wu, T. S. Prabhune, D. Agarwal, and G. K. Anumanchipalli, “Coding speech through vocal tract kinematics,” arXiv preprint arXiv:2406.12998, 2024. [Online]. Available: https://arxiv.org/abs/2406.12998

[6] M. J. Crosse, G. M. Di Liberto, A. Bednar, and E. C. Lalor, “The multivariate temporal response function (mTRF) toolbox: A MATLAB toolbox for relating neural signals to continuous stimuli,” Frontiers in Human Neuroscience, vol. 10, p. 604, 2016.

[7] G. M. Di Liberto, J. A. O’Sullivan, and E. C. Lalor, “Low-frequency cortical entrainment to speech reflects phoneme-level processing,” Current Biology, vol. 25, no. 19, pp. 2457–2465, 2015.

[8] M. D. Lescroart, D. E. Stansbury, and J. L. Gallant, “Fourier power, subjective distance, and object categories all provide plausible models of BOLD responses in scene-selective visual areas,” Frontiers in Computational Neuroscience, vol. 9, p. 135, 2015.

[9] L. Yu, P. Dugan, W. Doyle, O. Devinsky, D. Friedman, and A. Flinker, “A left-lateralized dorsolateral prefrontal network for naming,” Cell Reports, vol. 44, no. 5, p. 115677, 2025. [Online]. Available: https://doi.org/10.1016/j.celrep.2025.115677

[10] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.

[11] A. H. Khalilian, “ADMM mTRF: A fast implementation of multivariate temporal response function (mTRF) with elastic net,” https://github.com/amirhkhalilian/ADMM mTRF, 2025, MIT License.

[12] M. McAuliffe, M. Socolof, S. Mihuc, M. Wagner, and M. Sonderegger, “Montreal Forced Aligner: Trainable text-speech alignment using Kaldi,” in Proc. Interspeech, 2017, pp. 498–502.

[13] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.

[14] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, “TIMIT Acoustic-Phonetic Continuous Speech Corpus,” Linguistic Data Consortium, LDC93S1, Philadelphia, 1993.

[15] G. S. Meltzner, G. Colby, Y. Deng, and J. T. Heaton, “Signal acquisition and processing techniques for sEMG based silent speech recognition,” in 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 4848–4851.

[16] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995.

[17] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.

[18] D. Gaddy and D. Klein, “Digital voicing of silent speech,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020, pp. 5521–5530. [Online]. Available: https://aclanthology.org/2020.emnlp-main.445/
