Context-aware child-directed speech detection from long-form recordings

2026 · eess.AS · arXiv 2606.01134

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Automatically distinguishing child-directed speech from adult-directed speech in long-form recordings is key to scalable analyses of children's language environments. Existing approaches process utterances in isolation and have been evaluated primarily on English. We address these gaps along three dimensions. First, we fine-tune and evaluate six self-supervised models on a multilingual dataset of 182 children, showing that in-domain pre-training on child-centered recordings substantially outperforms models trained on adult speech. Second, we demonstrate that incorporating surrounding context substantially improves classification, with an absolute gain of 13.8% in average F1-score. Third, we evaluate our model in a realistic end-to-end pipeline, from adult speech detection to addressee classification, showing that performance drops under automatic segmentation but still consistently outperforms a rule-based baseline.
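
To make the "absolute gain in average F1-score" concrete, the sketch below computes per-class F1 from confusion counts, averages it across the two classes, and contrasts an absolute gain (in percentage points) with a relative one. It is an illustration only: the unweighted per-class average is an assumed reading of "average F1", the function names are hypothetical, and the counts are made-up toy values, not results from the paper.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_f1(counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 (assumed meaning of 'average F1')."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Hypothetical toy counts (tp, fp, fn) per class; NOT results from the paper.
# CDS = child-directed speech, ADS = adult-directed speech.
isolated = {"CDS": (60, 30, 40), "ADS": (50, 40, 30)}
with_context = {"CDS": (80, 20, 20), "ADS": (70, 20, 20)}

base = average_f1(isolated)
ctx = average_f1(with_context)

# Absolute gain: difference between the two scores, in percentage points.
absolute_gain = (ctx - base) * 100
# Relative gain: the same difference expressed as a percent of the baseline.
relative_gain = (ctx - base) / base * 100

print(f"baseline {base:.3f}, with context {ctx:.3f}")
print(f"absolute gain {absolute_gain:.1f} points, relative gain {relative_gain:.1f}%")
```

An absolute gain is smaller than the corresponding relative gain whenever the baseline score is below 1, so the two should not be compared directly.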

representative citing papers

Context-aware child-directed speech detection from long-form recordings

eess.AS · 2026-05-31 · unverdicted · novelty 5.0

Context from neighboring speech raises average F1 by 13.8 points for child-directed speech classification; in-domain pre-training on child recordings outperforms adult-speech models, and the pipeline still beats a rule baseline after automatic segmentation.

