pith:FMJQSJSF
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
ChatGPT averages 63.41% accuracy across ten reasoning categories and improves only modestly with human interaction.
arxiv:2302.04023 v4 · 2023-02-08 · cs.CL · cs.AI
Claims
ChatGPT averages 63.41% accuracy across 10 reasoning categories spanning logical, non-textual, and commonsense reasoning, which makes it an unreliable reasoner. For example, it is better at deductive than inductive reasoning.
The evaluation rests on the premise that the 23 chosen datasets, the newly designed multimodal dataset, and the 10 reasoning categories provide a representative, low-bias measure of ChatGPT's capabilities, without major sensitivity to prompt wording or to subjective hallucination labeling.
ChatGPT outperforms zero-shot LLMs on most tasks and improves with interaction, but it scores only 63.41% on the reasoning categories and generates extrinsic hallucinations from its training data.
Receipt and verification
| First computed | 2026-05-17T23:38:13.237057Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
2b13092645281cb300a86c66d9d4af39df07b806108f271b30a4904d70721687
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FMJQSJSFFAOLGAFINRTNTVFPHH \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2b13092645281cb300a86c66d9d4af39df07b806108f271b30a4904d70721687
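The hashing step of the one-liner above can be unpacked into a small Python sketch. It assumes the record has already been fetched and parsed into a `dict`; the function names `canonical_sha256` and `verify` are illustrative and are not part of any Pith API. The canonicalization mirrors the options used in the one-liner: sorted keys, compact separators, no ASCII escaping, UTF-8 encoding.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical form as the shell one-liner.

    Hypothetical helper: serializes with sorted keys, compact separators,
    and ensure_ascii=False, then takes SHA-256 of the UTF-8 bytes.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Return True if the record's canonical hash matches the expected hex digest."""
    return canonical_sha256(record) == expected_hex.strip().lower()
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON as displayed; only the values matter. To check this record, one would pass the parsed `canonical_record` object and the canonical hash listed above to `verify`.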
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a2a18e59511c993ddda9d3136987bf9209d89feec0b355f8f79871b9e7e58bd3",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-02-08T12:35:34Z",
    "title_canon_sha256": "0af1cd3d0f93626347676b13538faee2652ea3b76a43db3c8090a2df56595df6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2302.04023",
    "kind": "arxiv",
    "version": 4
  }
}