Pith Number

pith:Y2PRIO4E

pith:2021:Y2PRIO4E3LEENHWXTSUJMEA4GQ
not attested · not anchored · not stored · refs resolved

Are NLP Models really able to Solve Simple Math Word Problems?

Arkil Patel, Navin Goyal, Satwik Bhattamishra

NLP solvers for simple math word problems achieve high benchmark scores by exploiting shallow patterns instead of actual reasoning.

arxiv:2103.07191 v2 · 2021-03-12 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Y2PRIO4E3LEENHWXTSUJMEA4GQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
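
The merge rules themselves are not described on this page, so the following is only an illustration of what "deterministic" means here: any mirror that holds the same set of events arrives at the same state, regardless of arrival order or duplicates. The event fields (`at`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for this sketch, and signature checking is omitted.

```python
import hashlib
import json
from typing import Dict, Iterable


def event_id(event: dict) -> str:
    """Content hash of an event, used for de-duplication and as a tie-breaker."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def merge(events: Iterable[dict]) -> Dict[str, str]:
    """Fold events into a state in a total order that ignores arrival order."""
    # De-duplicate by content hash so replayed events change nothing.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first, content hash second (hypothetical rule).
    ordered = sorted(unique.items(), key=lambda item: (item[1]["at"], item[0]))
    state: Dict[str, str] = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Made-up events, purely for illustration.
EVENTS = [
    {"at": "2030-01-01T00:00:00Z", "field": "x", "value": "first"},
    {"at": "2030-01-02T00:00:00Z", "field": "x", "value": "second"},
    {"at": "2030-01-02T00:00:00Z", "field": "y", "value": "only"},
]

if __name__ == "__main__":
    print(merge(EVENTS))
    print(merge(list(reversed(EVENTS))) == merge(EVENTS + EVENTS))
```

Sorting on a key that is derived only from event content is what removes any dependence on the order in which a mirror happened to receive the events.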

Claims

C1 strongest claim

MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs. Similarly, models that treat MWPs as bag-of-words can also achieve surprisingly high accuracy. The best accuracy achieved by state-of-the-art models is substantially lower on SVAMP.

C2 weakest assumption

That the carefully chosen variations used to create SVAMP are sufficient to block all shallow heuristics while still testing the intended arithmetic reasoning.

C3 one line summary

NLP models for elementary math word problems rely on shallow heuristics rather than genuine understanding: they score well even when the question is withheld or the problem is reduced to a bag of words, yet their accuracy drops substantially on SVAMP, a new dataset of problem variations.
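
The two input ablations named in the claims can be sketched as follows. This is not the authors' code: the helper names and the sample problems are hypothetical, chosen only to show why a solver that truly needs the question, or the word order, should not be able to answer from the ablated input.

```python
import re
from collections import Counter
from typing import List


def split_sentences(problem: str) -> List[str]:
    """Naive sentence splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", problem.strip()) if s]


def remove_question(problem: str) -> str:
    """Question-removed input: keep the narrative body, drop sentences ending in '?'."""
    return " ".join(s for s in split_sentences(problem) if not s.endswith("?"))


def bag_of_words(problem: str) -> Counter:
    """Bag-of-words input: token counts with all word-order information discarded."""
    return Counter(re.findall(r"[a-z0-9]+", problem.lower()))


# Hypothetical problems (not taken from any dataset).
a = "Sam had 9 apples. Sam gave 4 apples to Ria. How many apples does Sam have now?"  # answer 5
b = "Sam had 9 apples. Ria gave 4 apples to Sam. How many apples does Sam have now?"  # answer 13
c = "Sam had 9 apples. Sam gave 4 apples to Ria. How many apples did Sam give away?"  # answer 4

if __name__ == "__main__":
    # Same body, different questions and answers.
    print(remove_question(a) == remove_question(c))
    # Same bag of words, different answers.
    print(bag_of_words(a) == bag_of_words(b))
```

Problems `a` and `c` become identical once the question is removed, and `a` and `b` share one bag of words, although each pair has different answers. A benchmark on which ablated inputs still yield high accuracy therefore contains few such contrasting pairs.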

References

12 extracted · 12 resolved · 0 Pith anchors

Formal links

1 machine-checked theorem link

Cited by

40 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.115752Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c

Aliases

arxiv: 2103.07191 · arxiv_version: 2103.07191v2 · doi: 10.48550/arxiv.2103.07191 · pith_short_12: Y2PRIO4E3LEE · pith_short_16: Y2PRIO4E3LEENHWX · pith_short_8: Y2PRIO4E
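
For this record, the 26-character number coincides with the unpadded Base32 encoding (RFC 4648 alphabet) of the first 16 bytes of the canonical hash, and the short aliases are prefixes of it. Whether that is the official derivation rule is an assumption; the sketch below only reproduces the values shown on this page, and the function name is hypothetical.

```python
import base64

CANONICAL_HASH = "c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c"


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and strip '=' padding."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full = pith_number_from_hash(CANONICAL_HASH)

if __name__ == "__main__":
    print(full)       # Y2PRIO4E3LEENHWXTSUJMEA4GQ
    print(full[:8])   # Y2PRIO4E (pith_short_8)
    print(full[:12])  # Y2PRIO4E3LEE (pith_short_12)
    print(full[:16])  # Y2PRIO4E3LEENHWX (pith_short_16)
```
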
Agent API
Verify this Pith Number yourself
# Fetch the record, re-serialize it as compact key-sorted JSON, and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Y2PRIO4E3LEENHWXTSUJMEA4GQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "620056386ef10602f08752d34cf4fdc9b37ad1f22161507fa11deb87ca0ad34a",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-03-12T10:23:47Z",
    "title_canon_sha256": "27961b34ee3f17ffb2c39dc34923abd3cbb8e795187eb39ceb96d48b3f33953d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2103.07191",
    "kind": "arxiv",
    "version": 2
  }
}
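
The same check can be run offline against the record printed above. This is a minimal sketch that mirrors the canonicalization used in the one-liner (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes); the function name `canonical_hash` is an assumption, and the script reports whether the digest matches the published value rather than asserting it.

```python
import hashlib
import json

# Copied from the "Canonical record JSON" shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "620056386ef10602f08752d34cf4fdc9b37ad1f22161507fa11deb87ca0ad34a",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2021-03-12T10:23:47Z",
        "title_canon_sha256": "27961b34ee3f17ffb2c39dc34923abd3cbb8e795187eb39ceb96d48b3f33953d",
    },
    "schema_version": "1.0",
    "source": {"id": "2103.07191", "kind": "arxiv", "version": 2},
}

PUBLISHED_HASH = "c69f143b84dac8469ed79ca896101c343937602e2da37b54f59641f4f9c4056c"


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record as compact, key-sorted JSON."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order still compute the same digest.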