arXiv preprint arXiv:2402.02619

Quirke, P · 2024 · arXiv 2402.02619

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Dynamic Latent Routing

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Dynamic Latent Routing jointly learns discrete latent codes, routing policies, and model parameters via dynamic search to match or exceed supervised fine-tuning by 6.6 points on average in low-data settings across four datasets and six models.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

A transformer represents, but does not causally transmit, staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.

Generalization in LLM Problem Solving: The Case of the Shortest Path

cs.AI · 2026-04-16 · unverdicted · novelty 6.0

LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail at length scaling due to recursive instability, with data coverage setting hard limits.

citing papers explorer

Showing 5 of 5 citing papers.

Dynamic Latent Routing · cs.LG · 2026-05-14 · unverdicted · none · ref 33
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior · cs.LG · 2026-05-06 · unverdicted · none · ref 139
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts · cs.AI · 2026-05-01 · unverdicted · none · ref 144
Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer · cs.LG · 2026-05-21 · unverdicted · none · ref 21
Generalization in LLM Problem Solving: The Case of the Shortest Path · cs.AI · 2026-04-16 · unverdicted · none · ref 41
