An automatic numeric-remapping attack generator reveals 12-26 point accuracy drops on GSM8K for three LLMs while MAWPS and MultiArith stay near 98%.
Evaluating Robustness of LLMs to Numerical Varia- tions in Mathematical Reasoning
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks
An automatic numeric-remapping attack generator reveals 12-26 point accuracy drops on GSM8K for three LLMs while MAWPS and MultiArith stay near 98%.