Numerical sensitivity and robustness: Exploring the flaws of mathematical reasoning in large language models, 2025

Zhishen Sun, Guang Dai, Ivor Tsang, Haishan Ye · 2025 · arXiv 2511.08022

2 Pith papers cite this work.

Representative citing papers

Robust Reasoning Benchmark

cs.LG · 2026-03-26 · unverdicted · novelty 7.0 · 2 refs · cited as ref 48

The Robust Reasoning Benchmark shows that frontier LLMs are largely robust to textual perturbations of AIME problems, whereas open-weight models suffer accuracy drops of up to 54% and show declining accuracy on later problems, which is attributed to attention dilution during chain-of-thought.

Look Twice before You Leap: A Rational Framework for Localized Adversarial Anonymization

cs.CR · 2025-12-07 · unverdicted · novelty 5.0 · cited as ref 5

RLAA is a localized adversarial anonymization framework that adds an arbitrator to filter out "ghost leaks" and enforce rational early stopping; it reports better privacy-utility trade-offs than greedy baselines on benchmarks.

