HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-Plus-Noise Filter and Inverse Short Time Fourier Transform
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Unified guidance framework for Flow Matching speech synthesis achieves nearly 3x faster inference and improved speaker similarity by combining heterogeneous data augmentation with intrinsic model guidance to eliminate CFG overhead.
citing papers explorer
- Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
Masked multimodal training on sEMG and lipreading reduces word error rate by up to 14 percentage points and improves robustness to modality loss in silent speech synthesis.