NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

AbdelRahim Elmadany; Abdurrahman Juma; Amirbek Djanibekov; Bashar Talafha; Chiyu Zhang; Hamad AlShehhi; Hanan Aldarmaki; Hawau Olamide Toyin; Muhammad Abdul-Mageed; Mustafa Jarrar

arxiv: 2509.02038 · v2 · pith:UHEFKN5Ynew · submitted 2025-09-02 · 💻 cs.CL · cs.SD


Bashar Talafha, Hawau Olamide Toyin, Peter Sullivan, AbdelRahim Elmadany, Abdurrahman Juma, Amirbek Djanibekov, Chiyu Zhang, Hamad Alshehhi, Hanan Aldarmaki, Mustafa Jarrar, Nizar Habash, Muhammad Abdul-Mageed


classification 💻 cs.CL cs.SD

keywords: subtask, teams, dialect, arabic, speech, submissions, identification, nadi


Abstract

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase, 100 valid submissions were received from eight unique teams. The distribution was as follows: 34 submissions for Subtask 1 (five teams), 47 submissions for Subtask 2 (six teams), and 19 submissions for Subtask 3 (two teams). The best-performing systems achieved 79.8% accuracy on Subtask 1, a WER/CER of 35.68/12.20 (overall average) on Subtask 2, and a WER/CER of 55/13 on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing across all three tasks: dialect identification, speech recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.
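Subtasks 2 and 3 are scored with word error rate (WER) and character error rate (CER). The following is a minimal sketch assuming the standard definition: the Levenshtein edit distance (substitutions, insertions, deletions) between reference and hypothesis, divided by the reference length and expressed as a percentage. The function names are illustrative and no text normalization is applied. The shared task's official scorer may normalize text (for example, punctuation or diacritics) and may compute its "overall average" across evaluation sets in a way this sketch does not reproduce.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent over raw code points (spaces included)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(wer("a b c d", "a x c"))  # one substitution + one deletion over 4 words
    print(cer("abcd", "abd"))       # one deletion over 4 characters
```

Because CER here operates on Unicode code points, Arabic short-vowel diacritics (which are separate combining characters) each count as one character. This is one reason diacritized output, as in Subtask 3, can show a noticeably higher CER than undiacritized transcripts.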



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An End-to-End Hybrid Framework for Rumour Detection in Low-Resources Algerian Dialect
cs.CL 2026-06 unverdicted novelty 6.0

A hybrid framework combining transformer embeddings with a classical classifier achieves 0.84 F1 for rumour detection in Algerian dialect on a newly constructed dataset built from real posts, synthetic data, and the FASSILA corpus.

Dziri Voicebot: An End-to-End Low-Resource Speech-to-Speech Conversational System for Algerian Dialect
cs.CL 2026-06 unverdicted novelty 5.0

Presents a modular end-to-end speech-to-speech conversational system for Algerian Dialect by fine-tuning pretrained models on dedicated telecom datasets and reports strong component-level performance.

Dziri Voicebot: An End-to-End Low-Resource Speech-to-Speech Conversational System for Algerian Dialect
cs.CL 2026-06 unverdicted novelty 4.0

The authors construct and evaluate an end-to-end speech-to-speech pipeline for Algerian Dialect by adapting Whisper for ASR, transformer embeddings for NLU, and a neural TTS on custom dialectal data.