Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

2026 · cs.AI · arXiv 2605.01148

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Does structure in representations imply structure in computation? We study how Llama-3.1-8B reasons over cyclic concepts (e.g., "what month is six months after August?"). Even though Llama-3.1-8B's representations for these concepts are circularly structured, we find that instead of directly computing modular addition in the period of the cyclic concept (e.g., 12 for months), the model re-uses a generic addition mechanism across tasks that operates independently of concept-specific geometry. First, it computes the sum of its two inputs using base-10 addition (six + August = 14). Then, it maps this sum back to cyclic concept space (14 -> February). We show that Llama-3.1-8B uses task-agnostic Fourier features to compute these sums; in fact, these features have periods that respect standard base-10 addition, e.g., 2, 5, and 10, rather than the cyclic concept period (e.g., 12 for months). Furthermore, we identify a sparse set of 28 MLP neurons re-used across all tasks (approximately 0.2% of the MLP at layer 18) that can be partitioned into disjoint clusters, each computing the sum for a Fourier feature with a different period. Our work highlights how an interplay between causal abstraction and feature geometry can deepen our mechanistic understanding of LMs.
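The two-step computation described in the abstract can be illustrated with a minimal Python sketch. This is only a toy rendering of the abstract's example, not the model's circuit or the authors' code: the function names (`add_months_two_step`, `fourier_feature`, `add_in_fourier_space`) are hypothetical, and using the angle-addition identity to combine periodic features is an assumption made here for illustration, since the abstract does not say how the neurons form the sum.

```python
import math

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def add_months_two_step(offset: int, month: str) -> tuple[int, str]:
    """Step 1: plain integer sum. Step 2: map the sum back onto the 12-month cycle."""
    start = MONTHS.index(month) + 1        # August -> 8 (1-indexed)
    total = offset + start                 # ordinary base-10 sum, no mod 12 yet
    return total, MONTHS[(total - 1) % 12]


def fourier_feature(n: int, period: int) -> tuple[float, float]:
    """Represent n as a point on the unit circle with the given period."""
    angle = 2 * math.pi * n / period
    return math.cos(angle), math.sin(angle)


def add_in_fourier_space(a: int, b: int, period: int) -> tuple[float, float]:
    """Feature of a + b from the features of a and b (angle-addition identity).

    Assumption: this identity is used only to show that a periodic feature
    of the sum is computable from the summands' features.
    """
    ca, sa = fourier_feature(a, period)
    cb, sb = fourier_feature(b, period)
    return ca * cb - sa * sb, ca * sb + sa * cb


print(add_months_two_step(6, "August"))    # -> (14, 'February')

# Periods 2, 5, and 10 are the examples named in the abstract.
for period in (2, 5, 10):
    combined = add_in_fourier_space(6, 8, period)
    direct = fourier_feature(14, period)
    print(period, all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(combined, direct)))
```

Note that none of the periods used in the second half is 12: the features track the sum 14 itself, and the wrap-around to February happens only in the final mapping step.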

representative citing papers

Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Transformer hidden states contain rank-indexed orientation signatures for true r-argument relations (r=3-6) that survive surface controls and can be patched to alter model outputs on relation tasks.

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Larger models succeed on rare and complex tasks by reducing gradient interference from common tasks, allowing rare-task features to accumulate, as shown via synthetic task mixtures and OLMo pretraining from 4M to 4B parameters.

citing papers explorer

Showing 2 of 2 citing papers.

Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames cs.LG · 2026-05-28 · unverdicted · none · ref 14 · internal anchor
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention cs.LG · 2026-05-28 · unverdicted · none · ref 73 · internal anchor
