AI4D -- African Language Program


arXiv: 2104.02516 · v1 · submitted 2021-04-06 · cs.CL


Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I. Adelani, Amelia Taylor, Jamiil Toure Ali, Kevin Degila, Momboladji Balogoun, Thierno Ibrahima Diop, Davis David, Chayma Fourati, Hatem Haddad, Malek Naski



Keywords: datasets, language, African, AI4D, annotated, challenges, competitive, creation


Abstract

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition, and machine translation. However, these tools are available only for high-resource languages such as English, French, or Chinese. African languages are considered low-resource in the digital context, and without foundational digital resources for them, these advanced tools remain out of reach. This work describes the AI4D - African Language Program, a three-part project that 1) incentivised the crowd-sourcing, collection, and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive machine learning (ML) challenges based on these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African-language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the competitive ML challenges.


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
cs.CL 2026-05 unverdicted novelty 5.0

Introduces the Annotation Scarcity Paradox to describe how model scaling in low-resource NLP outpaces the human expertise required for authentic evaluation, threatening the validity of reported progress.
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
cs.CL 2026-05 unverdicted novelty 4.0

A critical narrative survey conceptualizes the Annotation Scarcity Paradox as a structural limit on the epistemic validity of progress claims in low-resource NLP due to strained sociolinguistic expertise and extractiv...
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
cs.CL 2026-05 unverdicted novelty 4.0

A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.
A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development
cs.CL 2026-04 unverdicted novelty 4.0

A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.