wav2vec 2.0: A framework for self-supervised learning of speech representations
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
  A 616K-parameter CNN predicts upcoming stuttering from current 3s audio with AUC 0.58 overall but higher (0.60-0.62) for severe blocks and sound repetitions, runs on-device at low latency, and shows cross-dataset transfer.
- Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
  Frame-aligned fusion of Canary and WavLM encoders, with WavLM temporally prepared via learnable strided convolution, outperforms other fusion strategies and reaches Eval RMSE 24.96 and Corr 0.796 on non-intrusive intelligibility prediction.