Pith Number

pith:TW2NMZMH

pith:2026:TW2NMZMHICZI74RCPKWHQHSDWM
not attested · not anchored · not stored · refs resolved

HalluScore: Large Language Model Hallucination Question Answering Benchmark

Aisha Alansari, Hamzah Luqman

HalluScore is a curated Arabic QA dataset with 827 questions, ground-truth evidence, and human annotations used to measure hallucination rates across 17 LLMs.

arxiv:2605.17007 v1 · 2026-05-16 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TW2NMZMHICZI74RCPKWHQHSDWM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We introduce HalluScore, a structured Arabic question answering benchmark designed to evaluate hallucination behavior in LLMs across different levels of reasoning difficulty, various knowledge domains, historical timelines, and culturally grounded Arabic scenarios. It contains 827 carefully curated questions.

C2 weakest assumption

The model-driven selection process successfully retains only questions that consistently trigger hallucinations while preserving factual validity and cultural grounding; this premise is stated in the abstract's description of the construction pipeline but lacks independent verification details.

C3 one line summary

HalluScore is a curated Arabic QA dataset with 827 questions, ground-truth evidence, and human annotations used to measure hallucination rates across 17 LLMs.

References

69 extracted · 69 resolved · 5 Pith anchors

[1] On faithfulness and factuality in abstractive summarization, 2020
[2] A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions, 2025
[3] Arahallueval: A fine-grained hallucination evaluation framework for arabic llms, 2025
[4] Survey of hallucination in natural language generation, 2023
[5] A survey of automatic hallucination evaluation on natural language generation, 2024
Receipt and verification
First computed 2026-05-20T00:03:35.680979Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86

Aliases

arxiv: 2605.17007 · arxiv_version: 2605.17007v1 · doi: 10.48550/arxiv.2605.17007 · pith_short_12: TW2NMZMHICZI · pith_short_16: TW2NMZMHICZI74RC · pith_short_8: TW2NMZMH
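
The identifier and its short aliases appear to be derived from the canonical hash: for the values on this page, the 26-character Pith Number equals the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is its first N characters. This is an observation from the values shown here, not a documented rule of the schema; the function names below are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86"
PITH_NUMBER = "TW2NMZMHICZI74RCPKWHQHSDWM"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; the schema itself is not quoted here.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)
print(derived)
print(aliases)
```

If the derivation holds for other records, a mirror could sanity-check an identifier against its hash offline before fetching anything.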
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TW2NMZMHICZI74RCPKWHQHSDWM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d282ecd04265ed3c1cdc4244c43228492da185798c0dbee60c0c8c18612f8d13",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T14:08:15Z",
    "title_canon_sha256": "8d9db1692352343d3d6137a75d4941442578da79660e749c022489853b683c95"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17007",
    "kind": "arxiv",
    "version": 1
  }
}
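
The shell one-liner under "Verify this Pith Number yourself" can also be written as a small standalone Python function, which may be easier to reuse offline. The sketch below applies the same serialization settings as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record printed above. It assumes the record shown on this page matches the served `canonical_record` exactly; compare the printed digest with the canonical hash yourself rather than taking it as given. The name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "d282ecd04265ed3c1cdc4244c43228492da185798c0dbee60c0c8c18612f8d13",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-16T14:08:15Z",
        "title_canon_sha256": "8d9db1692352343d3d6137a75d4941442578da79660e749c022489853b683c95",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17007", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed in this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.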