Pruning via merging: Compressing LLMs via manifold alignment based layer merging,
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.AS (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
A new pruning recipe for Whisper on Bambara with 32h data uses low-rank embedding compression, feature distillation, and layer merging to produce a model 48x smaller and 2.15x faster that retains 90% of original performance.