Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

arxiv: 2010.06778 · v1 · pith:7XI46ASN · submitted 2020-10-14 · 💻 cs.CL

Alena Butryna, Shan-Hui Cathy Chu, Isin Demirsahin, Alexander Gutkin, Linne Ha, Fei He, Martin Jansche, Cibu Johny, Anna Katanova, Oddur Kjartansson, Chenfang Li, Tatiana Merkulova, Yin May Oo, Knot Pipatsrisawat, Clara Rivera, Supheakmungkol Sarin, Pasindu de Silva, Keshan Sodimana, Richard Sproat, Theeraphol Wattanavekin, Jaka Aris Eko Wibawa

keywords: languages, speech, corpora, developing, dialects, overview, presents, resources

This paper presents an overview of a program designed to address the growing need for developing freely available speech resources for under-represented languages. At present we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used for developing such corpora and presents some of our findings that could benefit under-represented language communities.
