Masked multimodal training on sEMG and lipreading reduces word error rate by up to 14 percentage points and improves robustness to modality loss in silent speech synthesis.
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-Plus-Noise Filter and Inverse Short Time Fourier Transform
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Enhancing Flow Matching with A Unified Guidance Framework for Efficient and Robust Speech Synthesis
Unified guidance framework for Flow Matching speech synthesis achieves nearly 3x faster inference and improved speaker similarity by combining heterogeneous data augmentation with intrinsic model guidance to eliminate CFG overhead.