pith.
Pith Number

pith:GNBHBX5A

pith:2024:GNBHBX5AV57ZBW4XO7HMQW3VML
not attested · not anchored · not stored · refs resolved

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Hooman Shahrokhi, Iman Mirzadeh, Keivan Alizadeh, Mehrdad Farajtabar, Oncel Tuzel, Samy Bengio

Large language models cannot perform genuine mathematical reasoning and instead replicate patterns from training data.

arxiv:2410.05229 v2 · 2024-10-07 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GNBHBX5AV57ZBW4XO7HMQW3VML}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
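The page does not show the bundle's event schema or the merge rules, so the following is only a minimal sketch of what a deterministic merge of signed events can look like. The `Event` fields, the content-addressed `event_id`, and the last-writer-wins rule are all hypothetical, and signature verification is omitted. The point is that deduplicating by content hash and replaying in a fixed total order lets any mirror reach the same state regardless of the order in which it received the events.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape; the real bundle schema is not shown on this page."""
    timestamp: str  # fixed-width ISO-8601 UTC, so string order equals time order
    field: str      # which part of the record state the event updates
    value: str

    @property
    def event_id(self) -> str:
        # Content-addressed id: the same event delivered by two mirrors collapses to one.
        body = json.dumps([self.timestamp, self.field, self.value], separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def merge(canonical_record: dict, *event_sources: Iterable[Event]) -> dict:
    """Deterministic last-writer-wins merge (an assumed strategy, not Pith's documented one).

    Signature checks are omitted in this sketch.
    """
    unique: Dict[str, Event] = {}
    for source in event_sources:
        for ev in source:
            unique[ev.event_id] = ev  # same id implies same content, so overwriting is harmless
    state = {"record": copy.deepcopy(canonical_record), "fields": {}}
    # Total order: timestamp first, event_id as a tie-breaker, so every mirror replays identically.
    for ev in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state["fields"][ev.field] = ev.value
    return state
```

Because the result depends only on the set of distinct events, `merge(record, events_a, events_b)` and `merge(record, events_b, events_a)` return the same state, even when both sources contain duplicates.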

Claims

C1 · strongest claim

Current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data.

C2 · weakest assumption

That the added clauses are truly irrelevant to the solution process, and that the resulting performance drops therefore demonstrate an absence of genuine reasoning rather than sensitivity to prompt length or surface features.
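The assumption above is easiest to see in the shape of the evaluation: a clause is appended to each question, the original ground-truth answer is reused unchanged, and the accuracy difference is attributed to the clause. The sketch below is a minimal illustration under that reading; the two toy "models", the example questions, and the distractor clause are hypothetical stand-ins, not LLMs or items from the benchmark. It also shows why the drop by itself only measures sensitivity to the added text: a solver that keys on surface features loses accuracy, one that tracks the relevant quantities does not.

```python
import re
from typing import Callable, Sequence, Tuple

Model = Callable[[str], int]  # any function from question text to an integer answer
Item = Tuple[str, int]        # (question text, ground-truth answer)


def accuracy(model: Model, items: Sequence[Item]) -> float:
    return sum(model(q) == a for q, a in items) / len(items)


def noop_accuracy_drop(model: Model, items: Sequence[Item],
                       add_clause: Callable[[str], str]) -> float:
    # Key assumption (C2): add_clause does not change the correct answer,
    # so the original ground truth is reused for every perturbed question.
    perturbed = [(add_clause(q), a) for q, a in items]
    return accuracy(model, items) - accuracy(model, perturbed)


def sum_every_number(question: str) -> int:
    """Toy surface-level 'model': adds every number it sees (a caricature, not an LLM)."""
    return sum(int(n) for n in re.findall(r"\d+", question))


def sum_first_two_numbers(question: str) -> int:
    """Toy 'model' that uses only the two quantities the question is about."""
    nums = [int(n) for n in re.findall(r"\d+", question)]
    return nums[0] + nums[1]


# Hypothetical example items.
ITEMS = [
    ("Mia has 3 pens and buys 4 more. How many pens?", 7),
    ("Raj has 10 cards and gets 5 more. How many cards?", 15),
]


def add_distractor(question: str) -> str:
    # Hypothetical no-op clause: mentions a quantity that plays no role in the answer.
    return question.replace(" How many", " 2 of them are slightly faded. How many")
```

Here `noop_accuracy_drop(sum_every_number, ITEMS, add_distractor)` is 1.0 and `noop_accuracy_drop(sum_first_two_numbers, ITEMS, add_distractor)` is 0.0.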

C3 · one-line summary

LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
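GSM-Symbolic builds its variants from symbolic templates of grade-school questions, in which names and numbers become placeholders that are re-sampled for each instance. The following is a minimal sketch of that idea, not the benchmark's generator: the `SymbolicTemplate` class, the pencil question, and the value ranges are hypothetical. It generates reproducible sets of variants with a known answer for each one, and summarizes per-set accuracy as a mean and a population standard deviation, which is the kind of spread the summary above refers to as variance.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class SymbolicTemplate:
    """Hypothetical template: question text with {placeholders} plus a reference solver."""
    text: str
    names: Tuple[str, ...]                  # candidate values for the {name} placeholder
    ranges: Dict[str, range]                # allowed values for each numeric placeholder
    solve: Callable[[Dict[str, int]], int]  # ground-truth answer from the sampled numbers

    def instantiate(self, rng: random.Random) -> Tuple[str, int]:
        values = {key: rng.choice(allowed) for key, allowed in self.ranges.items()}
        name = rng.choice(self.names)
        return self.text.format(name=name, **values), self.solve(values)


def generate_variants(template: SymbolicTemplate, n: int, seed: int = 0) -> List[Tuple[str, int]]:
    # A fixed seed makes each generated set reproducible.
    rng = random.Random(seed)
    return [template.instantiate(rng) for _ in range(n)]


def accuracy_spread(set_accuracies: Sequence[float]) -> Tuple[float, float]:
    # Mean and population standard deviation of accuracy across generated sets.
    return statistics.mean(set_accuracies), statistics.pstdev(set_accuracies)


PENCILS = SymbolicTemplate(
    text="{name} buys {x} pencils and then {y} more. How many pencils does {name} have?",
    names=("Ava", "Ben", "Chen", "Dana"),
    ranges={"x": range(2, 50), "y": range(2, 50)},
    solve=lambda v: v["x"] + v["y"],
)
```

For example, `for question, answer in generate_variants(PENCILS, 3, seed=7): print(question, "->", answer)` prints three differently named and numbered versions of the same question, each with its correct answer.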

References

85 extracted · 85 resolved · 14 Pith anchors


Formal links

1 machine-checked theorem link

Cited by

43 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:19.672939Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72

Aliases

arxiv: 2410.05229 · arxiv_version: 2410.05229v2 · doi: 10.48550/arxiv.2410.05229 · pith_short_12: GNBHBX5AV57Z · pith_short_16: GNBHBX5AV57ZBW4X · pith_short_8: GNBHBX5A
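The short aliases are prefixes of the full Pith Number. For this record, the Pith Number itself also matches the first 26 characters (the first 130 bits) of the RFC 4648 Base32 encoding of the canonical hash. That relationship is inferred from the values on this page rather than documented here, so treat the sketch below, and its function names, as an assumption.

```python
import base64
from typing import Dict


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    # Inferred rule (not documented on this page): Base32 of the SHA-256 digest, truncated.
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith: str) -> Dict[str, str]:
    # The pith_short_N aliases are the first N characters of the full number.
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
```

With the canonical hash above, `pith_number("334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72")` returns `"GNBHBX5AV57ZBW4XO7HMQW3VML"`, and `short_aliases` reproduces the three `pith_short_*` values listed in the aliases.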
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GNBHBX5AV57ZBW4XO7HMQW3VML \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72
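The same check can be done in a single Python file using only the standard library. The sketch below reuses the URL, the `Accept: application/ld+json` header, the `canonical_record` member, and the canonicalization (sorted keys, no whitespace, `ensure_ascii=False`, UTF-8) from the one-liner above. The function names are hypothetical. Because jq's compact output is parsed and re-serialized in the original pipeline, loading the JSON directly gives the same bytes to hash.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/GNBHBX5AV57ZBW4XO7HMQW3VML"


def canonical_sha256(record) -> str:
    # Same canonicalization as the shell pipeline: sorted keys, compact separators, raw UTF-8.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    # Mirrors the curl call: request JSON-LD and extract the canonical_record member.
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

Calling `canonical_sha256(fetch_canonical_record())` and comparing the result with the published canonical hash performs the same verification as the curl command. Since keys are sorted before hashing, the result does not depend on key order or whitespace in the server's response.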
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1dc6c56c53a6fe5ea687fb26bc48ca00a6117e241edd9dbd65d2c9cd925265e1",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-10-07T17:36:37Z",
    "title_canon_sha256": "acaeed5a707ab7d7d3f7aaad19d1f014386246dea511a2a9263886064c877d57"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.05229",
    "kind": "arxiv",
    "version": 2
  }
}