Guidelines for Fine-grained Sentence-level Arabic Readability Annotation

Abdallah Abushmaes; Hanada Taha-Thomure; Khalid N. Elmadani; Nizar Habash; Zeina Zeino

arxiv: 2410.08674 · v3 · pith:6NJPKTMTnew · submitted 2024-10-11 · 💻 cs.CL



keywords: readability, guidelines, annotation, arabic, across, agreement, barec, corpus


This paper presents the annotation guidelines of the Balanced Arabic Readability Evaluation Corpus (BAREC), a large-scale resource for fine-grained sentence-level readability assessment in Arabic. BAREC includes 69,441 sentences (1M+ words) labeled across 19 levels, from kindergarten to postgraduate. Based on the Taha/Arabi21 framework, the guidelines were refined through iterative training with native Arabic-speaking educators. We highlight key linguistic, pedagogical, and cognitive factors in determining readability and report high inter-annotator agreement: Quadratic Weighted Kappa 81.8% (substantial/excellent agreement) in the last annotation phase. We also benchmark automatic readability models across multiple classification granularities (19-, 7-, 5-, and 3-level). The corpus and guidelines are publicly available.
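The agreement figure above is a Quadratic Weighted Kappa (QWK), which penalizes disagreements between two annotators by the squared distance between the levels they chose. The following is a minimal sketch of the standard QWK formula, not the authors' evaluation code. The function name, the 1-based integer level encoding, and the choice to return 1.0 when both annotators use one identical level throughout are assumptions made for illustration.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, num_levels):
    """Standard QWK between two equal-length lists of integer levels.

    Assumption: levels are integers in 1..num_levels (e.g. 1..19).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")

    n = len(ratings_a)

    # Observed confusion matrix: rows are annotator A, columns annotator B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a - 1][b - 1] += 1

    # Marginal histograms for each annotator.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels))
              for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            # Quadratic weight: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two annotators.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        # Both annotators used the same single level for every item.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings on a 19-level scale, for illustration only.
annotator_1 = [3, 7, 12, 12, 15, 19]
annotator_2 = [3, 8, 12, 11, 16, 19]
print(quadratic_weighted_kappa(annotator_1, annotator_2, num_levels=19))
```

Because the weights grow quadratically, a one-level disagreement on a 19-level scale costs far less than a ten-level one, which is why QWK suits ordinal readability labels better than unweighted agreement.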
