Pith Number

pith:TW2NMZMH

pith:2026:TW2NMZMHICZI74RCPKWHQHSDWM

not attested · not anchored · not stored · refs resolved

HalluScore: Large Language Model Hallucination Question Answering Benchmark

Aisha Alansari, Hamzah Luqman

HalluScore is a curated Arabic QA dataset with 827 questions, ground-truth evidence, and human annotations used to measure hallucination rates across 17 LLMs.

arxiv:2605.17007 v1 · 2026-05-16 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{TW2NMZMHICZI74RCPKWHQHSDWM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We introduce HalluScore, a structured Arabic question answering benchmark designed to evaluate hallucination behavior in LLMs across different levels of reasoning difficulty, various knowledge domains, historical timelines, and culturally grounded Arabic scenarios. It contains 827 carefully curated questions.

C2 · weakest assumption

The model-driven selection process successfully retains only questions that consistently trigger hallucinations while preserving factual validity and cultural grounding; this premise is stated in the abstract's description of the construction pipeline but lacks independent verification details.

C3 · one-line summary

HalluScore is a curated Arabic QA dataset with 827 questions, ground-truth evidence, and human annotations used to measure hallucination rates across 17 LLMs.

References

69 extracted · 69 resolved · 5 Pith anchors

[1] On faithfulness and factuality in abstractive summarization, 2020

[2] A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions, 2025

[3] AraHalluEval: A fine-grained hallucination evaluation framework for Arabic LLMs, 2025

[4] Survey of hallucination in natural language generation, 2023

[5] A survey of automatic hallucination evaluation on natural language generation, 2024

Receipt and verification

First computed	2026-05-20T00:03:35.680979Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86

Aliases

arxiv: 2605.17007 · arxiv_version: 2605.17007v1 · doi: 10.48550/arxiv.2605.17007 · pith_short_12: TW2NMZMHICZI · pith_short_16: TW2NMZMHICZI74RC · pith_short_8: TW2NMZMH
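
Judging only from the values on this page, the aliases appear to follow simple rules: each short form is a prefix of the 26-character identifier, and that identifier matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The sketch below checks that relationship. The function name `pith_id_from_hash` is hypothetical, and the rule is an inference from this record, not a documented specification.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, dropping '=' padding.

    Assumption: this mirrors how the identifier on this page relates to the
    canonical hash; it is inferred from the displayed values only.
    """
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


pith_id = pith_id_from_hash(CANONICAL_HASH)

# Short aliases are taken as plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}

print(pith_id)
print(aliases)
```

For this record the derived identifier is `TW2NMZMHICZI74RCPKWHQHSDWM`, and the three prefixes match the `pith_short_8`, `pith_short_12`, and `pith_short_16` values listed above.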


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/TW2NMZMHICZI74RCPKWHQHSDWM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9db4d6658740b28ff2227aac781e43b32ca440b2c613e029041c30f8eeef9b86

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d282ecd04265ed3c1cdc4244c43228492da185798c0dbee60c0c8c18612f8d13",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-16T14:08:15Z",
    "title_canon_sha256": "8d9db1692352343d3d6137a75d4941442578da79660e749c022489853b683c95"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17007",
    "kind": "arxiv",
    "version": 1
  }
}
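
The verification command can also be written as a standalone script. This is a minimal sketch that assumes the canonicalization is exactly what the one-liner does: sorted keys, compact separators, no ASCII escaping, UTF-8 bytes. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical, and the record literal is copied from the JSON shown on this page, not fetched from the resolver.

```python
import hashlib
import json

# Canonical record as displayed on this page (copied by hand, not fetched).
record = {
    "metadata": {
        "abstract_canon_sha256": "d282ecd04265ed3c1cdc4244c43228492da185798c0dbee60c0c8c18612f8d13",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-16T14:08:15Z",
        "title_canon_sha256": "8d9db1692352343d3d6137a75d4941442578da79660e749c022489853b683c95",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17007", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and compact separators, then UTF-8 encode."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
```

Compare the printed digest with the canonical hash listed under "Canonical hash". Because keys are sorted before hashing, the field order of the input JSON does not affect the result.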