AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Abinew Ali Ayele; Bello Shehu Bello; Bernard Opoku; David Ifeoluwa Adelani; Davis David; Falalu Ibrahim; Felermino Dário Mário António Ali; Hagos Tesfahun Gebremichael; Hailu Beshada Balcha; Ibrahim Sa'id Ahmad

arxiv: 2302.08956 · v5 · pith:SKI7J5AKnew · submitted 2023-02-17 · 💻 cs.CL

Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif

Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur

classification 💻 cs.CL

keywords: languages, african, afrisenti, afrisenti-semeval, analysis, annotated, arabic, benchmark

Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (The AfriSenti Shared Task had over 200 participants. See website at https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse
cs.CL 2026-06 unverdicted novelty 5.0

The Meaning Intelligence Framework raises zero-shot register classification accuracy from 33.3% to 73.3% on a 30-item Nigerian discourse calibration set while showing that smaller models can outperform larger ones on ...