Introduces rewriting categories to formalize proof equivariance and success invariance, shows LLM provers violate both, and demonstrates test-time aggregation recovers invariance and boosts performance.
GSM-symbolic: Understanding the limitations of mathematical reasoning in large language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.
citing papers explorer
-
What are the Right Symmetries for Formal Theorem Proving?
Introduces rewriting categories to formalize proof equivariance and success invariance, shows LLM provers violate both, and demonstrates test-time aggregation recovers invariance and boosts performance.
-
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.