Teaching arithmetic to small transformers

Lee, N · 2023 · arXiv 2307.03381

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

LinAlg-Bench shows LLMs switch from execution errors to computational abandonment and structured fabrication at 4x4 matrix scale, indicating a working memory limit rather than knowledge gaps.

Arithmetic Pedagogy for Language Models

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

A small GPT-2 model trained from scratch on GASING-derived CoT supervision for arithmetic reaches over 80% held-out accuracy, exhibits three learning phases, and develops both procedural and associative reasoning.

Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

In a structured-output NW matrix task, Transformers generalize fastest at intermediate dataset sizes while larger sets can accelerate memorization in partial-competence regimes.

From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs

cs.CL · 2026-01-07 · unverdicted · novelty 6.0

FSLR explicitly supervises the initial logical planning step in math problems, boosting LLM accuracy by 3-5% while using 80% fewer training tokens than standard CoT fine-tuning.

FoNE: Precise Single-Token Number Embeddings via Fourier Features

cs.CL · 2025-02-13 · unverdicted · novelty 6.0

FoNE encodes numbers as single tokens via Fourier features and outperforms subword and digit-wise embeddings on addition, subtraction, and multiplication with far less data.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction

cs.NE · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

Parameter reconstruction algorithm for SNN training obtained by extending convexification of parallel feedforward threshold networks to the recurrent case that subsumes SNNs.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning

cs.LG · 2026-05-14 · unverdicted · none · ref 5

In a structured-output NW matrix task, Transformers generalize fastest at intermediate dataset sizes while larger sets can accelerate memorization in partial-competence regimes.
