pith. machine review for the scientific record.

arxiv: 1803.10609 · v1 · submitted 2018-03-28 · 💻 cs.SD · cs.AI · eess.AS

Recognition: unknown

The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

Authors on Pith no claims yet
classification 💻 cs.SD · cs.AI · eess.AS
keywords speech · challenge · task · chime · conversational · systems · track · capture
read the original abstract

The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We discuss the rationale for the challenge and provide a detailed description of the data collection procedure, the task, and the baseline systems for array synchronization, speech enhancement, and conventional and end-to-end ASR.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

    cs.AI 2026-04 unverdicted novelty 7.0

    ProVoice-Bench is the first framework to evaluate proactive voice agents, revealing that state-of-the-art multimodal LLMs struggle with over-triggering and context-aware reasoning.

  2. SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

    cs.IR 2026-02 unverdicted novelty 7.0

SQuTR aggregates 37k queries from six text retrieval datasets, synthesizes speech from 200 speakers, adds 17 noise categories at varying SNR, and shows that even large retrieval models degrade sharply under extreme acoustic noise.

  3. Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

    cs.CY 2026-04 conditional novelty 6.0

    AI content watermarking exhibits detection disparities across languages, cultures, and demographics due to content-dependent signal properties, with benchmarks failing to disaggregate performance and watermarking held...