Audio Set: An ontology and human-labeled dataset for audio events
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
Real-Time Voicemail Detection in Telephony Audio Using Temporal Speech Activity Features
Temporal speech activity features from a VAD classified by a tree ensemble distinguish voicemail from live human answers at 96.1% accuracy on 764 telephony recordings with 46 ms inference.