AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

arXiv: 1808.00606 · v2 · submitted 2018-08-02 · cs.SD · eess.AS



Keywords: speech, activity, dataset, applications, approaches, AVA-Speech, available, co-occurring


Abstract

Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward specific end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or on datasets that are not openly available. This makes it difficult to compare approaches and to understand their strengths and weaknesses. In this paper, we describe a new dataset, which we will release publicly, containing densely labeled speech activity in YouTube videos, with the goal of creating a shared, available dataset for this task. The labels annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise. These conditions enable analysis of model performance in more challenging settings, where speech overlaps with background music or noise. We report benchmark performance numbers on AVA-Speech using off-the-shelf, state-of-the-art audio and vision models that serve as a baseline to facilitate future research.
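The three labeled conditions make it possible to break a detector's accuracy down by acoustic condition. The following is a minimal Python sketch of that kind of analysis, not the paper's evaluation protocol. It assumes annotations are available as time segments carrying one of four hypothetical label strings (`NO_SPEECH`, `CLEAN_SPEECH`, `SPEECH_WITH_MUSIC`, `SPEECH_WITH_NOISE`), expands them into fixed-hop frames, and, for a given score threshold, reports the fraction of frames detected as speech under each condition plus the false positive rate on non-speech frames. The `Segment` type, the helper names, and the 10 ms default hop are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical label strings: one non-speech class plus the three
# speech conditions described in the abstract.
NO_SPEECH = "NO_SPEECH"
SPEECH_CONDITIONS = ("CLEAN_SPEECH", "SPEECH_WITH_MUSIC", "SPEECH_WITH_NOISE")


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds (exclusive)
    label: str    # one of the hypothetical label strings above


def frame_labels(segments: List[Segment], duration: float, hop: float = 0.01) -> List[str]:
    """Expand dense segment annotations into one label per frame of length `hop`.

    Frames not covered by any segment default to NO_SPEECH.
    """
    n_frames = int(round(duration / hop))
    labels = [NO_SPEECH] * n_frames
    for seg in segments:
        first = int(round(seg.start / hop))
        last = min(n_frames, int(round(seg.end / hop)))
        for i in range(first, last):
            labels[i] = seg.label
    return labels


def per_condition_rates(labels: List[str], scores: List[float], threshold: float) -> Dict[str, float]:
    """Fraction of frames with score >= threshold, per label.

    For each speech condition this is the true positive rate; for NO_SPEECH
    frames it is the false positive rate, reported under the key "FPR".
    Labels with no frames are omitted.
    """
    rates: Dict[str, float] = {}
    for cond in SPEECH_CONDITIONS + (NO_SPEECH,):
        idx = [i for i, lab in enumerate(labels) if lab == cond]
        if not idx:
            continue
        detected = sum(1 for i in idx if scores[i] >= threshold)
        key = "FPR" if cond == NO_SPEECH else cond
        rates[key] = detected / len(idx)
    return rates


if __name__ == "__main__":
    # Toy example: 3 s clip, 0.5 s hop, clean speech then speech over music.
    segs = [Segment(0.0, 1.0, "CLEAN_SPEECH"), Segment(1.0, 2.0, "SPEECH_WITH_MUSIC")]
    labels = frame_labels(segs, duration=3.0, hop=0.5)
    scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6]  # made-up detector scores
    print(per_condition_rates(labels, scores, threshold=0.5))
```

Reporting the detection rate separately per condition, rather than pooled over all speech frames, shows whether a model that does well on clean speech degrades when music or noise overlaps the speech.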
