Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization

Anna Silnova; Federico Landini; Jan Cernocky; Jiangyu Han; Johan Rohdin; Lukas Burget; Mireia Diez

arxiv: 2505.24111 · v1 · pith:XIPCVVYQnew · submitted 2025-05-30 · 📡 eess.AS


Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Diez, Jan Cernocky, Lukas Burget

classification 📡 eess.AS

keywords models · pruning · large · wavlm · base · before · diarization · performance


Self-supervised learning (SSL) models such as WavLM are effective building blocks for speaker diarization systems, but they are often large and slow, which limits their use in resource-constrained scenarios. Previous studies have explored compression techniques, but usually at the cost of degraded performance at high pruning ratios. In this work, we propose to compress SSL models through structured pruning with knowledge distillation. Unlike existing work, we emphasize the importance of fine-tuning SSL models before pruning. Experiments on the far-field single-channel AMI, AISHELL-4, and AliMeeting datasets show that our method can remove up to 80% of the parameters of WavLM Base+ and WavLM Large without any performance degradation. After pruning, inference on a single GPU is 4.0 times faster for the Base+ model and 2.6 times faster for the Large model. Our source code is publicly available.
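To make the idea of structured pruning concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state how units are selected for removal, so this example assumes a simple magnitude-based score as a stand-in, and it omits knowledge distillation and fine-tuning entirely. All function names (`neuron_scores`, `prune_ffn`, `ffn`, `count_params`) and the toy layer sizes are hypothetical. It shows how removing whole hidden units from a feed-forward block yields physically smaller weight matrices, which is why structured pruning can translate into faster inference.

```python
import math
import random


def neuron_scores(w_in, w_out):
    """Score hidden unit j by ||row j of w_in|| * ||column j of w_out||.

    This magnitude criterion is an assumption made for illustration only.
    """
    scores = []
    for j, row in enumerate(w_in):
        in_norm = math.sqrt(sum(v * v for v in row))
        out_norm = math.sqrt(sum(r[j] ** 2 for r in w_out))
        scores.append(in_norm * out_norm)
    return scores


def prune_ffn(w_in, b_in, w_out, ratio):
    """Remove the lowest-scoring fraction `ratio` of hidden units.

    Shapes: w_in is hidden x d_model, b_in is hidden, w_out is d_model x hidden.
    Returns smaller matrices plus the sorted indices of the kept units.
    """
    hidden = len(w_in)
    n_keep = max(1, round(hidden * (1.0 - ratio)))
    scores = neuron_scores(w_in, w_out)
    ranked = sorted(range(hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    new_w_in = [w_in[j] for j in keep]
    new_b_in = [b_in[j] for j in keep]
    new_w_out = [[row[j] for j in keep] for row in w_out]
    return new_w_in, new_b_in, new_w_out, keep


def ffn(x, w_in, b_in, w_out):
    """Two-layer feed-forward block with ReLU (output bias omitted)."""
    h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(w_in, b_in)]
    return [sum(w * v for w, v in zip(row, h)) for row in w_out]


def count_params(w_in, b_in, w_out):
    """Count the scalar parameters in the block."""
    return sum(len(r) for r in w_in) + len(b_in) + sum(len(r) for r in w_out)


# Toy block with random weights; real SSL layers are far larger.
random.seed(0)
d_model, hidden = 8, 40
w_in = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(hidden)]
b_in = [random.gauss(0, 1) for _ in range(hidden)]
w_out = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(d_model)]

p_w_in, p_b_in, p_w_out, kept = prune_ffn(w_in, b_in, w_out, ratio=0.8)
before = count_params(w_in, b_in, w_out)
after = count_params(p_w_in, p_b_in, p_w_out)
print(f"hidden units: {hidden} -> {len(kept)}, parameters: {before} -> {after}")
```

In this toy setting, pruning 80% of the hidden units also removes 80% of the block's parameters, because every weight in the block is attached to exactly one hidden unit. A one-shot cut like this would normally hurt accuracy; the abstract's claim of no degradation concerns the authors' full method, which combines pruning with knowledge distillation after fine-tuning.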


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline
eess.AS 2026-04 unverdicted novelty 1.0

The paper explains the DiariZen hybrid speaker diarization pipeline stage by stage, covering WavLM feature extraction, Conformer powerset classification, VBx clustering, and providing runnable code for the full system.