Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Abdallah Bashir; Adewale Akinfaderin; Alp Öktem; Arshath Ramkilowan; Ayodele Olabiyi; Blessing Itoro Bassey; Blessing Sibanda; Bonaventure Dossou; Chris Emezue; Christopher Onyefuluchi

arxiv: 2010.02353 · v2 · pith:53SSHP6Wnew · submitted 2020-10-05 · 💻 cs.CL · cs.AI · cs.LG

Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad

Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir

classification 💻 cs.CL cs.AI cs.LG

keywords languages research low-resourced participatory translation african benchmarks case

Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role in information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released under https://github.com/masakhane-io/masakhane-mt.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents
cs.CL · 2026-05 · unverdicted · novelty 4.0

Audio language models are benchmarked on five semantic and paralinguistic reasoning tasks to reveal limitations in handling spoken audio evidence, accent variation, and domain shifts.