AI4D -- African Language Program
read the original abstract
Advances in speech and language technologies enable tools such as voice-search, text-to-speech, speech recognition and machine translation. These are however only available for high resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open source, African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through hosting of competitive ML challenges.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
Introduces the Annotation Scarcity Paradox to describe how model scaling in low-resource NLP outpaces the human expertise required for authentic evaluation, threatening the validity of reported progress.
-
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
A critical narrative survey conceptualizes the Annotation Scarcity Paradox as a structural limit on the epistemic validity of progress claims in low-resource NLP due to strained sociolinguistic expertise and extractiv...
-
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.
-
A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development
A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.