When models manipulate manifolds: The geometry of a counting task

Wes Gurnee, Emmanuel Ameisen, Isaac Kauvar, Julius Tarng, Adam Pearce, Chris Olah, Joshua Batson · 2026 · arXiv 2601.04480

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 · support 1

representative citing papers

SMIXAE: Towards Unsupervised Manifold Discovery in Language Models

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

SMIXAE is a new mixture-of-autoencoders architecture that learns multidimensional manifolds directly from transformer activations, recovering known structures and identifying novel ones in Gemma 2 2B and 9B models.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

In a controlled synthetic setting, transformers implement in-distribution task inference via convex combinations of task vectors and out-of-distribution inference via nearly orthogonal extrapolative representations.

A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

LLMs exhibit three geometric phases in next-token prediction—seeding multiplexing, hoisting overriding, and focal convergence—where predictive subspaces rise, stabilize, and converge across layers.

The Position Curse: LLMs Struggle to Locate the Last Few Items in a List

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

LLMs exhibit the Position Curse, with backward position retrieval in lists lagging far behind forward retrieval, showing only partial gains from PosBench fine-tuning.

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models

eess.AS · 2026-05-04 · unverdicted · novelty 6.0

LID rises under low-SNR perturbations in models like WavLM and wav2vec 2.0, diverges between benign and adversarial noise at high SNR, co-occurs with higher WER, and supports anomaly detection at AUROC 0.78-1.00.

citing papers explorer

Showing 6 of 6 citing papers.

SMIXAE: Towards Unsupervised Manifold Discovery in Language Models
cs.LG · 2026-05-09 · unverdicted · none · ref 3

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
cs.LG · 2026-05-06 · unverdicted · none · ref 189

Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers
cs.LG · 2026-05-05 · unverdicted · none · ref 16

A Geometric Perspective on Next-Token Prediction in Large Language Models: Three Emerging Phases
cs.LG · 2026-05-09 · unverdicted · none · ref 12

The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
cs.LG · 2026-05-08 · unverdicted · none · ref 8

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
eess.AS · 2026-05-04 · unverdicted · none · ref 33
