Teaching arithmetic to small transformers
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
verdicts: UNVERDICTED 7
citing papers explorer
- LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning
  LinAlg-Bench shows LLMs switch from execution errors to computational abandonment and structured fabrication at 4x4 matrix scale, indicating a working memory limit rather than knowledge gaps.
- Arithmetic Pedagogy for Language Models
  A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.
- Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning
  In a structured-output NW matrix task, Transformers generalize fastest at intermediate dataset sizes while larger sets can accelerate memorization in partial-competence regimes.
- From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs
  FSLR explicitly supervises the initial logical planning step in math problems, boosting LLM accuracy by 3-5% while using 80% fewer training tokens than standard CoT fine-tuning.
- FoNE: Precise Single-Token Number Embeddings via Fourier Features
  FoNE encodes numbers as single tokens via Fourier features and outperforms subword and digit-wise embeddings on addition, subtraction, and multiplication with far less data.
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
  LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
- Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
  Parameter reconstruction algorithm for SNN training obtained by extending convexification of parallel feedforward threshold networks to the recurrent case that subsumes SNNs.