A survey of large language models for Arabic language and its dialects
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- HalluScore: Large Language Model Hallucination Question Answering Benchmark
  HalluScore is a curated Arabic QA dataset with 827 questions, ground-truth evidence, and human annotations used to measure hallucination rates across 17 LLMs.
- State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation
  Arabic-DeepSeek-R1 sets new state-of-the-art results on the Open Arabic LLM Leaderboard by combining sparse MoE fine-tuning with culturally-informed CoT distillation on a controlled bilingual dataset.