Diacritization of Maghrebi Arabic Sub-Dialects

Ahmed Abdelali; Hamdy Mubarak; Kareem Darwish; Mohammed Attia; Younes Samih

arxiv: 1810.06619 · v3 · pith:PNCHTHNMnew · submitted 2018-10-15 · 💻 cs.CL

classification 💻 cs.CL

keywords: arabic, diacritization, maghrebi, moroccan, process, sub-dialects, tunisian, achieves

The diacritization process attempts to restore the short vowels in written Arabic text, which are typically omitted. This process is essential for applications such as Text-to-Speech (TTS). While diacritization of Modern Standard Arabic (MSA) still holds the lion's share of attention, research on dialectal Arabic (DA) diacritization is very limited. In this paper, we present our contribution and results on the automatic diacritization of two sub-dialects of Maghrebi Arabic, namely Tunisian and Moroccan, using a character-level deep neural network architecture with two stacked bi-LSTM layers followed by a CRF output layer. The model achieves a word error rate of 2.7% for Moroccan and 3.6% for Tunisian, and is capable of implicitly identifying the sub-dialect of the input.
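
To make the character-level framing and the reported metric concrete, here is a minimal Python sketch. It splits a diacritized word into (base character, diacritic label) pairs, which is the kind of input/target view a character-level tagger would consume, and it computes a word error rate in the conventional sense for diacritization: a word counts as an error if any of its characters carries the wrong diacritics. The function names (`to_char_labels`, `strip_diacritics`, `word_error_rate`) are hypothetical, and the one-to-one word alignment and the diacritic range U+064B to U+0652 are assumptions; the paper's actual evaluation script is not described in this abstract.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
# Assumption: this range covers the marks of interest; the paper may use another set.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def to_char_labels(word):
    """Split a diacritized word into (base character, diacritic label) pairs.

    Marks attached to the same base character are sorted so that, for example,
    shadda+fatha and fatha+shadda yield the same label.
    """
    bases, labels = [], []
    for ch in word:
        if ch in DIACRITICS:
            if labels:  # a mark with no preceding base character is ignored
                labels[-1].append(ch)
        else:
            bases.append(ch)
            labels.append([])
    return [(b, "".join(sorted(marks))) for b, marks in zip(bases, labels)]


def strip_diacritics(text):
    """Remove all diacritic marks, giving the undiacritized model input."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def word_error_rate(reference, hypothesis):
    """Fraction of words whose diacritization differs from the reference.

    Assumes both strings tokenize on whitespace into the same number of words.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis must have the same word count")
    if not ref_words:
        return 0.0
    errors = sum(
        1
        for ref, hyp in zip(ref_words, hyp_words)
        if to_char_labels(ref) != to_char_labels(hyp)
    )
    return errors / len(ref_words)
```

Under this definition, a two-word reference in which the system gets one short vowel wrong in the second word scores a word error rate of 0.5, since the whole word is counted as incorrect.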

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
cs.CL 2026-05 unverdicted novelty 2.0

A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.