LinAlg-Bench shows LLMs switch from execution errors to computational abandonment and structured fabrication at 4x4 matrix scale, indicating a working memory limit rather than knowledge gaps.
Teaching arithmetic to small transformers
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1
representative citing papers
A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.
In a structured-output NW matrix task, Transformers generalize fastest at intermediate dataset sizes while larger sets can accelerate memorization in partial-competence regimes.
FSLR explicitly supervises the initial logical planning step in math problems, boosting LLM accuracy by 3-5% while using 80% fewer training tokens than standard CoT fine-tuning.
FoNE encodes numbers as single tokens via Fourier features and outperforms subword and digit-wise embeddings on addition, subtraction, and multiplication with far less data.
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
Parameter reconstruction algorithm for SNN training obtained by extending convexification of parallel feedforward threshold networks to the recurrent case that subsumes SNNs.
citing papers explorer
- Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning