AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
Pith reviewed 2026-05-25 04:55 UTC · model grok-4.3
The pith
AraHopeCorpus is the first annotated Arabic dataset of hope speech, built from ten thousand YouTube comments on the Gaza war, in which hopeful expressions account for more than sixty-four percent of comments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents AraHopeCorpus as the first annotated hope speech resource for Arabic, collected from crisis-related YouTube comments. Its annotation shows that hopeful language is the most common category, at more than sixty-four percent, expressed primarily through religious, solidarity-based, and optimistic language.
What carries the argument
AraHopeCorpus, a dataset of ten thousand annotated Arabic YouTube comments categorized into hope speech, no hope speech, and neutral or unclear, along with the accompanying annotation guidelines.
Load-bearing premise
The three annotation categories can be applied consistently to informal, dialectal Arabic text even when sarcasm or implicit cultural references are present.
What would settle it
Re-annotating a random sample of the comments with new annotators and obtaining substantially lower agreement scores or a different category distribution.
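The agreement check described above can be sketched in a few lines. This is a minimal illustration, assuming two label lists over the same sampled comments; the function name `cohen_kappa`, the label strings, and the toy data are hypothetical and do not come from the paper or the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels invented for illustration only (not AraHopeCorpus data).
first_pass = ["hope", "hope", "no_hope", "neutral", "hope", "neutral"]
second_pass = ["hope", "neutral", "no_hope", "neutral", "hope", "hope"]
kappa = cohen_kappa(first_pass, second_pass)
```

A re-annotation would compare the new value against the reported 0.71, and the `Counter` of each pass against the published category distribution.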
Original abstract
Social media has become a crucial arena for shaping public narratives during armed conflicts, providing space for both harmful and constructive communication. While hate speech and misinformation have been widely studied, expressions that promote resilience, solidarity, and optimism remain underexplored, particularly in Arabic contexts. This paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech collected from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. Using a detailed annotation framework, comments were classified into three categories: hope speech, no hope speech, and neutral or unclear discourse. The dataset shows that hopeful language dominates, accounting for more than sixty four percent of all comments. These expressions of hope appear mainly as religious encouragement, collective solidarity, and optimism for endurance and justice. No hope speech, representing about thirteen percent, reflects despair and disillusionment, while the rest of the comments contain neutral or mixed content. Inter-Annotator Agreement reached substantial levels (Cohen's Kappa equals 0.71), though dialectal variation, sarcasm, and implicit meaning posed annotation challenges. A comparative analysis between human annotators and ChatGPT revealed that large language models can support annotation but remain limited in handling dialectal and culturally embedded expressions. AraHopeCorpus will be released for research purposes under an open and non commercial license. It provides a valuable resource for studying constructive digital discourse, enabling further research on hope speech detection, crisis communication, and resilience in Arabic social media.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech from 10,000 YouTube comments on the 2023-2024 Gaza war. Comments are labeled into three categories (hope speech >64%, no hope speech ~13%, neutral/unclear) using explicit guidelines; Cohen's Kappa reaches 0.71. The work includes annotation challenges, a human-LLM comparison, and plans to release the corpus under an open non-commercial license.
Significance. If the annotations hold, the resource fills a clear gap in non-English hope-speech research during crises and supports downstream work on constructive discourse detection. The reported IAA, category definitions, and release commitment are direct strengths that enable community use.
minor comments (3)
- [Data Collection] Data collection section: specify the exact keywords, channels, and sampling procedure used to obtain the 10,000 comments so that the corpus can be replicated or extended.
- [Comparative Analysis] LLM comparison section: state the exact prompt template and temperature settings given to ChatGPT; without them the comparative results cannot be reproduced.
- [Results] Table 1 (category distribution): add row percentages with 95% confidence intervals to quantify uncertainty around the reported 64% and 13% figures.
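The confidence intervals requested in the last comment could be computed along these lines. This is a sketch only: it uses the Wilson score interval as one reasonable choice (the referee does not name a method), and the count of 6,400 out of 10,000 is a placeholder standing in for the reported "more than sixty-four percent", since the exact per-category counts are not given here.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p_hat = successes / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    ) / denom
    return center - half_width, center + half_width


# Placeholder count, assumed for illustration; not taken from Table 1.
low, high = wilson_interval(6400, 10000)
```

With ten thousand comments the interval is narrow, roughly one percentage point on either side of the point estimate. The calculation treats comments as independent draws, which may be optimistic for comments that cluster within the same video or thread.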
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript and the recommendation to accept. The review accurately captures the contribution of AraHopeCorpus as the first annotated Arabic dataset for hope speech in crisis-related social media discourse.
Circularity Check
No significant circularity
The paper is an empirical contribution that introduces AraHopeCorpus by collecting and annotating 10,000 YouTube comments into three categories, reporting observed distributions (>64% hope speech) and Cohen's Kappa of 0.71. No equations, parameter fitting, derivations, predictions, or self-citation chains exist that could reduce any claim to its own inputs by construction. The annotation process is validated against external human judgments rather than internally forced, and the work is transparent about dialectal and cultural challenges without claiming generalizability beyond the sample.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard practices for defining and applying hope/no-hope/neutral categories to social media comments produce reliable labels when inter-annotator agreement reaches substantial levels.