arxiv: 2606.20338 · v1 · pith:VMIUZSEHnew · submitted 2026-06-18 · 📡 eess.AS

Stuttering Classification and Segmentation with Attention-Based Multiple Instance Learning

classification 📡 eess.AS
keywords: stuttering, classification, clip-level, frame-level, instance, learning, multiple, models

Stuttering detection and classification using deep learning methods have the potential to improve stuttering severity assessment. Most stuttering classification datasets provide only clip-level labels, making them unsuitable for the fine-grained frame-level classification needed to determine the duration of individual stuttering dysfluencies. To overcome this challenge, we present a multiple instance neural network architecture based on fine-tuned wav2vec 2.0, WavLM and Whisper encoders. We apply instance- and embedding-based multiple instance learning approaches to train models on a clip-level dataset for both clip-level and frame-level stuttering classification tasks. Our results show a 23% improvement in frame-level F1 score and improvements of between 2% and 9% in clip-level F1 score, demonstrating the ability of our models to utilize clip-level data for frame-level segmentation.
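
The distinction between instance- and embedding-based multiple instance learning mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the single-vector attention scoring, the linear classifier, the function names (`embedding_mil`, `instance_mil`) and the toy weights `w_att` and `w_clf` are all assumptions chosen for illustration, and the frame embeddings stand in for the outputs of a speech encoder. The clip is treated as a bag of frames and only a clip-level prediction would be supervised; the per-frame attention weights or probabilities are what could then be read out for frame-level segmentation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_weights(frames, w_att):
    # One attention score per frame, normalized over the whole clip (bag).
    # Assumption: a single linear scoring vector, purely for illustration.
    return softmax([dot(f, w_att) for f in frames])

def embedding_mil(frames, w_att, w_clf):
    # Embedding-based MIL: pool frame embeddings into one clip embedding,
    # then classify the clip. Returns (clip probability, attention weights).
    alphas = attention_weights(frames, w_att)
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    return sigmoid(dot(pooled, w_clf)), alphas

def instance_mil(frames, w_att, w_clf):
    # Instance-based MIL: classify every frame, then pool the frame
    # probabilities into a clip probability with the attention weights.
    alphas = attention_weights(frames, w_att)
    frame_probs = [sigmoid(dot(f, w_clf)) for f in frames]
    clip_prob = sum(a * p for a, p in zip(alphas, frame_probs))
    return clip_prob, frame_probs

# Toy clip: four 2-dimensional frame embeddings (hypothetical values).
frames = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.0, 0.2]]
w_att = [1.0, 1.0]   # hypothetical attention parameters
w_clf = [1.5, 1.0]   # hypothetical classifier parameters

emb_clip_prob, alphas = embedding_mil(frames, w_att, w_clf)
inst_clip_prob, frame_probs = instance_mil(frames, w_att, w_clf)
most_attended = alphas.index(max(alphas))
```

In this sketch the instance-based variant yields a probability per frame directly, while the embedding-based variant exposes only attention weights at frame level; how the paper derives frame-level labels in each case is not stated in the abstract.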
