pith:Y2PRIO4E
Are NLP Models really able to Solve Simple Math Word Problems?
NLP solvers for simple math word problems achieve high benchmark scores by exploiting shallow patterns instead of actual reasoning.
arxiv:2103.07191 v2 · 2021-03-12 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y2PRIO4E3LEENHWXTSUJMEA4GQ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bag-of-words can also achieve surprisingly high accuracy. The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP.
The work assumes that the carefully chosen variations used to create SVAMP are sufficient to block all shallow heuristics while still testing the intended arithmetic reasoning.
NLP models for elementary math word problems rely on shallow heuristics rather than genuine understanding, performing well without questions or as bag-of-words but dropping substantially on the new SVAMP variation dataset.
Receipt and verification
| First computed | 2026-05-17T23:38:47.115752Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y2PRIO4E3LEENHWXTSUJMEA4GQ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c
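The same check can be sketched in plain Python, without `curl` or `jq`. This is a minimal sketch, not an official client: it assumes the endpoint returns JSON with a `canonical_record` key (as the `jq` filter above implies) and that the canonical form is exactly what the one-liner produces (sorted keys, compact separators, raw UTF-8). The function names `canonical_bytes`, `canonical_hash`, `fetch_canonical_record`, and `verify` are illustrative and not part of any Pith API.

```python
import hashlib
import json
import urllib.request

# Values copied from the record above.
RECORD_URL = "https://pith.science/pith/Y2PRIO4E3LEENHWXTSUJMEA4GQ"
EXPECTED_HASH = "c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c"


def canonical_bytes(record):
    """Serialize with sorted keys, compact separators, and raw UTF-8.

    Mirrors the json.dumps call in the shell one-liner; assumed (not
    confirmed) to be the full canonicalization rule.
    """
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record):
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url=RECORD_URL):
    """Fetch the JSON-LD document and return its 'canonical_record' field."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]


def verify(url=RECORD_URL, expected=EXPECTED_HASH):
    """True if the fetched record hashes to the expected canonical hash."""
    return canonical_hash(fetch_canonical_record(url)) == expected
```

Calling `verify()` performs the network request and compares the digest against the canonical hash listed above. `canonical_hash` can also be applied offline to a locally saved copy of the canonical record JSON.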
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "620056386ef10602f08752d34cf4fdc9b37ad1f22161507fa11deb87ca0ad34a",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CL",
"submitted_at": "2021-03-12T10:23:47Z",
"title_canon_sha256": "27961b34ee3f17ffb2c39dc34923abd3cbb8e795187eb39ceb96d48b3f33953d"
},
"schema_version": "1.0",
"source": {
"id": "2103.07191",
"kind": "arxiv",
"version": 2
}
}