Masakhane -- Machine Translation For Africa

Abdallah Bashir; Adewale Akinfaderin; Alp Öktem; Arshath Ramkilowan; Blessing Sibanda; Bonaventure Dossou; Chris Emezue; Daniel Whitenack; Elan Van Biljon; Espoir Murhabazi

arxiv: 2003.11529 · v1 · pith:GFK5AYOPnew · submitted 2020-03-13 · 💻 cs.CL

Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir

keywords: african, community, languages, africa, identified, lack, language, machine

Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To begin to address the identified problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort for machine translation for African languages, was founded. In this paper, we discuss our methodology for building the community and spurring research from the African continent, as well as outline the success of the community in terms of addressing the identified problems affecting African NLP.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages
cs.CL 2026-06 unverdicted novelty 7.0

AfriSUD supplies new SUD-annotated dependency treebanks for nine Sub-Saharan African languages and demonstrates that existing models exhibit clear limitations on their syntax.

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
cs.CY 2026-04 unverdicted novelty 6.0

Community members from the UK blind community, Kerala, and Tamil Nadu helped define what counts as culturally appropriate depictions of artifacts, and the authors tested whether those definitions can be turned into re...

Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe
cs.CL 2026-04 unverdicted novelty 5.0

GPT-4o Mini extracts 6-41 times more usable Hausa and Fongbe text per API call than Gemini 2.5 Flash, with optimal elicitation strategies differing by language.

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
cs.CY 2026-04 unverdicted novelty 5.0

Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image mo...

A Survey of Text and Speech Resources for Hausa and Fongbe: Availability, Quality, and Gaps for NLP Development
cs.CL 2026-04 unverdicted novelty 4.0

A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.

Toward Responsible and Epistemically Grounded Multilingual LLMs for Computational Social Science and Humanities
cs.CL 2026-05 unverdicted novelty 3.0

Proposes a hermeneutics-informed evaluation framework with metrics for cultural alignment, cross-lingual stability, and reasoning faithfulness for multilingual LLMs in SSH research.

Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research
cs.CL 2024-11 unverdicted novelty 2.0

This survey paper identifies opportunities for LLMs in low-resource language humanities research along with challenges in data accessibility, model adaptability, and cultural sensitivity.