SEAM achieves 0.971 ROC-AUC on external interview data for real-time scripted speech detection by combining shortcut-prevention data techniques with a compact audio backbone.
SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails
abstract
Scripted vs. spontaneous speech detection is appealing for interview guardrails, but benchmark performance can be inflated by shortcuts tied to corpus identity, channel conditions, and recording artifacts rather than speaking style itself. We present SEAM, a shortcut-aware framework for real-time scriptedness detection that combines uniform preprocessing, seam-aware sampling, non-speech augmentation, and a compact DistilHuBERT backbone. With 8 s windows, the model achieves 0.971 ± 0.004 ROC-AUC on an external interview-domain evaluation set. Removing the shortcut-prevention components improves internal held-out metrics but sharply reduces external performance, indicating shortcut learning. Post-training quantization reduces the model footprint to 41.8 MB with little loss in external performance. The results demonstrate that robust real-time scriptedness detection depends not only on the backbone, but also on shortcut-aware data design and evaluation. We release code and model checkpoints.
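To make the 8 s windowing concrete, the following is a minimal sketch of how a mono audio stream might be cut into fixed-length windows before scoring. It is an illustration only, not the released SEAM code: the function name `sliding_windows`, the 16 kHz sample rate, and the 2 s hop are assumptions, since the abstract states only the window length.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000    # assumption: 16 kHz mono input, typical for HuBERT-family models
WINDOW_SECONDS = 8.0    # window length stated in the abstract
HOP_SECONDS = 2.0       # assumption: the abstract does not give a hop size


def sliding_windows(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
    hop_seconds: float = HOP_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length windows; a trailing partial window is dropped."""
    window = int(round(window_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    # The last valid start index leaves exactly `window` samples remaining.
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]
```

Each yielded window would then be passed to the classifier; audio shorter than one window produces no windows under this sketch, so a real-time system would need its own policy for the first few seconds of a stream.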
fields: eess.AS (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)