arxiv: 1512.01882 · v2 · pith:WSKNHQUHnew · submitted 2015-12-07 · 💻 cs.CL · cs.SD

THCHS-30 : A Free Chinese Speech Corpus

classification 💻 cs.CL cs.SD
keywords speech data research chinese free recognition database institutes

Speech data is crucially important for speech recognition research. There are quite a few speech databases that can be purchased at prices that are reasonable for most research institutes. However, for young people who are just starting research activities or who have just gained an initial interest in this direction, the cost of data is still an annoying barrier. We support the 'free data' movement in speech recognition: research institutes (particularly those supported by public funds) publish their data freely so that new researchers can obtain sufficient data to kick off their careers. In this paper, we follow this trend and release a free Chinese speech database, THCHS-30, that can be used to build a full-fledged Chinese speech recognition system. We report the baseline system established with this database, including its performance under highly noisy conditions.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Speech-Driven End-to-End Language Discrimination towards Chinese Dialects

    cs.CL 2026-06 unverdicted novelty 4.0

    A speech-driven pipeline with MFCC features, HMM-DNN speech recognition, attention, and CNN fusion is presented for fine-grained Chinese dialect discrimination and evaluated on two benchmark corpora.