DGSNA dynamically generates scene-specific noise via prompt-driven language models and text-to-audio diffusion, then mixes it with speech to improve recognition and keyword spotting robustness by up to 11.32%.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SD (2)
Years: 2024 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- DGSNA: Dynamic Generative Scene-based Noise Addition method
  DGSNA dynamically generates scene-specific noise via prompt-driven language models and text-to-audio diffusion, then mixes it with speech to improve recognition and keyword spotting robustness by up to 11.32%.
- Classification of Short Segment Pediatric Heart Sounds Based on a Transformer-Based Convolutional Neural Network
  A Transformer-based 1D CNN with MFCC features classifies pediatric heart sounds with 93.69% accuracy on 5-second segments; the study also identifies a minimum effective segment length and an RMSSD/ZCR quality threshold of 0.4.