arxiv: 2003.11529 · v1 · pith:GFK5AYOPnew · submitted 2020-03-13 · 💻 cs.CL

Masakhane -- Machine Translation For Africa

classification 💻 cs.CL
keywords african, community, languages, africa, identified, lack, language, machine

Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To begin to address the identified problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort for machine translation for African languages, was founded. In this paper, we discuss our methodology for building the community and spurring research from the African continent, as well as outline the success of the community in terms of addressing the identified problems affecting African NLP.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

    cs.CL 2026-06 unverdicted novelty 7.0

    AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

  2. Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

    cs.CY 2026-04 unverdicted novelty 6.0

    Community members from the UK blind community, Kerala, and Tamil Nadu helped define what counts as culturally appropriate depictions of artifacts, and the authors tested whether those definitions can be turned into re...

  3. Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe

    cs.CL 2026-04 unverdicted novelty 5.0

    GPT-4o Mini extracts 6-41 times more usable Hausa and Fongbe text per API call than Gemini 2.5 Flash, with optimal elicitation strategies differing by language.

  4. Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

    cs.CY 2026-04 unverdicted novelty 5.0

    Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image mo...

  5. A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development

    cs.CL 2026-04 unverdicted novelty 4.0

    A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.

  6. Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities

    cs.CL 2026-05 unverdicted novelty 3.0

    Proposes a hermeneutics-informed evaluation framework with metrics for cultural alignment, cross-lingual stability, and reasoning faithfulness for multilingual LLMs in SSH research.

  7. Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research

    cs.CL 2024-11 unverdicted novelty 2.0

    This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.