Positional description matters for transformers arithmetic

Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang · 2023 · arXiv 2311.14737

3 Pith papers cite this work.

representative citing papers

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

cs.AI · 2026-04-23 · conditional · novelty 7.0

Unembedding collapse in transformers prevents distinguishing unseen tokens in symbolic reasoning, but targeted interventions restore generalization.

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

cs.LG · 2026-03-30 · unverdicted · novelty 7.0

The grokking delay in encoder-decoder models on one-step Collatz prediction stems from decoder inability to use early-learned encoder representations of parity and residue structure, with numeral base acting as a strong inductive bias that can raise accuracy from failure to 99.8%.

FoNE: Precise Single-Token Number Embeddings via Fourier Features

cs.CL · 2025-02-13 · unverdicted · novelty 6.0

FoNE encodes numbers as single tokens via Fourier features and outperforms subword and digit-wise embeddings on addition, subtraction, and multiplication with far less data.

citing papers explorer

Showing 3 of 3 citing papers.

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning · cs.AI · 2026-04-23 · conditional · none · ref 14
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior · cs.LG · 2026-03-30 · unverdicted · none · ref 24
FoNE: Precise Single-Token Number Embeddings via Fourier Features · cs.CL · 2025-02-13 · unverdicted · none · ref 38
