arxiv: 2104.02516 · v1 · pith:MHRYBDCBnew · submitted 2021-04-06 · 💻 cs.CL

AI4D -- African Language Program

classification 💻 cs.CL
keywords datasets · language · african · ai4d · annotated · challenges · competitive · creation

Advances in speech and language technologies enable tools such as voice search, text-to-speech, speech recognition and machine translation. These are, however, only available for high-resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This work details the AI4D - African Language Program, a 3-part project that 1) incentivised the crowd-sourcing, collection and curation of language datasets through an online quantitative and qualitative challenge, 2) supported research fellows for a period of 3-4 months to create datasets annotated for NLP tasks, and 3) hosted competitive Machine Learning challenges on the basis of these datasets. Key outcomes of the work so far include 1) the creation of 9+ open-source African language datasets annotated for a variety of ML tasks, and 2) the creation of baseline models for these datasets through the hosting of competitive ML challenges.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

    cs.CL 2026-05 unverdicted novelty 5.0

    Introduces the Annotation Scarcity Paradox to describe how model scaling in low-resource NLP outpaces the human expertise required for authentic evaluation, threatening the validity of reported progress.

  2. The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

    cs.CL 2026-05 unverdicted novelty 4.0

    A critical narrative survey conceptualizes the Annotation Scarcity Paradox as a structural limit on the epistemic validity of progress claims in low-resource NLP due to strained sociolinguistic expertise and extractiv...

  3. The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

    cs.CL 2026-05 unverdicted novelty 4.0

    A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.

  4. A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development

    cs.CL 2026-04 unverdicted novelty 4.0

    A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.