
arxiv: 2412.17778 · v2 · pith:RHRHJBHBnew · submitted 2024-12-23 · 📡 eess.AS · cs.AI · cs.LG

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology

classification 📡 eess.AS cs.AI cs.LG
keywords gr-kan, layers, complex, enhancement, expressiveness, improving, kan-based, mp-senet

Deep neural network (DNN)-based speech enhancement (SE) usually uses conventional activation functions, which lack the expressiveness to capture the complex multiscale structures needed for high-fidelity SE. Group-Rational KAN (GR-KAN), a variant of Kolmogorov-Arnold Networks (KAN), retains KAN's expressiveness while improving scalability on complex tasks. We adapt GR-KAN to existing DNN-based SE by replacing dense layers with GR-KAN layers in the time-frequency (T-F) domain MP-SENet and by integrating GR-KAN's activations into the 1D CNN layers of the time-domain Demucs. Results on Voicebank-DEMAND show that GR-KAN requires up to 4x fewer parameters while improving PESQ by up to 0.1. In contrast, KAN, which faces scalability issues, outperforms MLP on a small-scale signal modeling task but fails to improve MP-SENet. We demonstrate the first successful use of KAN-based methods for consistent improvement in both time-domain and state-of-the-art (SoTA) T-F-domain SE, establishing GR-KAN as a promising alternative for SE.
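
To make the idea of a GR-KAN layer concrete, here is a minimal pure-Python sketch of a group-rational activation followed by a linear map. The abstract does not spell out the formulation, so everything here is an assumption: the safe rational form P(x) / (1 + |Q(x)|), the sharing of one coefficient set per group of input channels, and the names `rational` and `gr_kan_layer` are all illustrative and are not taken from the paper or from any library.

```python
from typing import List, Sequence


def rational(x: float, a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed safe rational form: P(x) / (1 + |Q(x)|).

    a holds numerator coefficients a_0..a_m (P(x) = sum a_i * x**i).
    b holds denominator coefficients b_1..b_n (Q(x) = sum b_j * x**j, j >= 1).
    The denominator is always >= 1, so the function has no poles.
    """
    numerator = sum(a_i * x ** i for i, a_i in enumerate(a))
    denominator = 1.0 + abs(sum(b_j * x ** (j + 1) for j, b_j in enumerate(b)))
    return numerator / denominator


def gr_kan_layer(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    num_coeffs: Sequence[Sequence[float]],
    den_coeffs: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical GR-KAN-style layer: group-wise rational, then linear.

    Input channels are split into len(num_coeffs) equal groups; every
    channel in a group shares that group's rational coefficients.
    weight has shape (d_out, d_in); bias has length d_out.
    """
    groups = len(num_coeffs)
    if len(den_coeffs) != groups:
        raise ValueError("need one denominator coefficient set per group")
    if groups == 0 or len(x) % groups != 0:
        raise ValueError("input size must be divisible by the number of groups")
    group_size = len(x) // groups

    # Apply the learnable activation channel by channel, sharing per group.
    activated = [
        rational(v, num_coeffs[i // group_size], den_coeffs[i // group_size])
        for i, v in enumerate(x)
    ]
    # Ordinary dense projection on the activated features.
    return [
        sum(w * v for w, v in zip(row, activated)) + b
        for row, b in zip(weight, bias)
    ]


if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    out = gr_kan_layer(
        x=[1.0, 2.0, 3.0, 4.0],
        weight=identity,
        bias=[0.0] * 4,
        num_coeffs=[[0.0, 1.0], [0.0, 0.0, 1.0]],  # group 0: x, group 1: x**2
        den_coeffs=[[], [1.0]],                    # group 0: 1, group 1: 1 + |x|
    )
    print(out)
```

Under these assumptions the activation adds only a handful of coefficients per group rather than per input-output edge, which is one plausible reading of how such a layer could stay compact; the paper itself should be consulted for the actual design and for how it is placed inside MP-SENet and Demucs.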


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Impact Analysis of Speech Representation Learning Models for Acoustic Side-Channel Attack

    cs.CR 2026-06 unverdicted novelty 5.0

    KEYAC dataset created; KAN fine-tuning achieves SOTA on acoustic side-channel keystroke recognition from speech representations under zero-shot and partial fine-tuning.

  2. Impact Analysis of Speech Representation Learning Models for Acoustic Side-Channel Attack

    cs.CR 2026-06 unverdicted novelty 5.0

    KEYAC dataset benchmarks speech models for keyboard acoustic side-channel attacks, with KAN fine-tuning setting new SOTA by addressing nonlinear feature interactions.