Pith Number

pith:GNBHBX5A

pith:2024:GNBHBX5AV57ZBW4XO7HMQW3VML

not attested · not anchored · not stored · refs resolved

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Hooman Shahrokhi, Iman Mirzadeh, Keivan Alizadeh, Mehrdad Farajtabar, Oncel Tuzel, Samy Bengio

Large language models cannot perform genuine mathematical reasoning and instead replicate patterns from training data.

arxiv:2410.05229 v2 · 2024-10-07 · cs.LG · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{GNBHBX5AV57ZBW4XO7HMQW3VML}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp

2. Internet Archive

3. Author claim: open

4. Citations: open

5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
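
The property described above, that any mirror recomputes the same state from the same signed events, can be illustrated with a minimal sketch. This is not Pith's actual merge algorithm, which this page does not specify. It assumes hypothetical events with `event_id`, `timestamp`, `field`, and `value` keys, and shows one common way to make a merge independent of arrival order: deduplicate, sort by a total order, then fold.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict in a fixed, arrival-independent order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"event_id": str, "timestamp": str, "field": str, "value": Any}
    """
    # Deduplicate by event_id; copies of the same event are assumed identical.
    unique = {event["event_id"]: event for event in events}
    # Sort by (timestamp, event_id) so ties are broken the same way everywhere.
    ordered = sorted(unique.values(), key=lambda e: (e["timestamp"], e["event_id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        # Later events overwrite earlier ones for the same field.
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, for illustration only.
example_events = [
    {"event_id": "e1", "timestamp": "2026-01-01T00:00:00Z", "field": "x", "value": 1},
    {"event_id": "e2", "timestamp": "2026-01-02T00:00:00Z", "field": "x", "value": 2},
    {"event_id": "e3", "timestamp": "2026-01-02T00:00:00Z", "field": "y", "value": 3},
]
print(merge_events(example_events))
print(merge_events(list(reversed(example_events))))
```

Both calls print the same state because the result depends only on the set of events, not on the order in which a mirror received them.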

Claims

C1 · strongest claim

Current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data.

C2 · weakest assumption

That the added clauses are truly irrelevant to the solution process and that performance drops therefore demonstrate absence of genuine reasoning rather than sensitivity to prompt length or surface features.

C3 · one line summary

LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.

References

85 extracted · 85 resolved · 14 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

43 papers in Pith

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

Large Language Models for Multi-Robot Systems: A Survey

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Gemma 3 Technical Report

Robust Reasoning Benchmark

Receipt and verification

First computed	2026-05-17T23:39:19.672939Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72

Aliases

arxiv: 2410.05229 · arxiv_version: 2410.05229v2 · doi: 10.48550/arxiv.2410.05229 · pith_short_12: GNBHBX5AV57Z · pith_short_16: GNBHBX5AV57ZBW4X · pith_short_8: GNBHBX5A
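
The values on this page suggest how the identifier and its short aliases relate to the canonical hash: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is its first N characters. This relationship is inferred from the values shown here, not taken from the Pith schema, and the helper name `pith_id_from_hash` is hypothetical. A minimal sketch:

```python
import base64


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    The 26-character length is an assumption read off this page's identifier.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


canonical_hash = "334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72"
full_id = pith_id_from_hash(canonical_hash)

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id)  # GNBHBX5AV57ZBW4XO7HMQW3VML
for name, value in aliases.items():
    print(name, value)
```

Because the aliases are prefixes, a shorter alias identifies a record less uniquely than the full identifier, so the full form is the safer one to store.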

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/GNBHBX5AV57ZBW4XO7HMQW3VML \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 334270dfa0af7f90db9777cec85b7562e5edc3e94d8b66606e23715261873f72

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "1dc6c56c53a6fe5ea687fb26bc48ca00a6117e241edd9dbd65d2c9cd925265e1",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-10-07T17:36:37Z",
    "title_canon_sha256": "acaeed5a707ab7d7d3f7aaad19d1f014386246dea511a2a9263886064c877d57"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.05229",
    "kind": "arxiv",
    "version": 2
  }
}
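
The canonicalization step in the verification command (sorted keys, compact separators, UTF-8 bytes, then SHA-256) can also be written as a standalone Python function. The sketch below applies it to the record as printed on this page; the function name `canonical_sha256` is hypothetical, and whether the printed JSON matches the served record byte for byte is an assumption, so compare the output against the canonical hash above rather than taking a match for granted.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the one-liner above does."""
    # Sorted keys and compact separators make the serialization independent
    # of key order and whitespace in the input.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "1dc6c56c53a6fe5ea687fb26bc48ca00a6117e241edd9dbd65d2c9cd925265e1",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-10-07T17:36:37Z",
        "title_canon_sha256": "acaeed5a707ab7d7d3f7aaad19d1f014386246dea511a2a9263886064c877d57",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.05229", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys is what makes the hash reproducible: two mirrors that store the same record with different key order or indentation still compute the same digest.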