Sparse autoencoders reveal temporal difference learning in large language models.ArXiv, abs/2410.01280

Can Demircan, Tankred Saanum, Akshay Kumar Jagadish, Marcel Binz, Eric Schulz · arXiv 2410.01280

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

Steered Generation via Gradient-Based Optimization on Sparse Query Features

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

Prototype-Based Sparse Steering decomposes query activations with SAEs and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.

citing papers explorer

Showing 3 of 3 citing papers.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 260
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space cs.CL · 2026-05-12 · unverdicted · none · ref 42
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
Steered Generation via Gradient-Based Optimization on Sparse Query Features cs.LG · 2026-05-21 · unverdicted · none · ref 11
Prototype-Based Sparse Steering decomposes query activations with SAEs and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.

Sparse autoencoders reveal temporal difference learning in large language models.ArXiv, abs/2410.01280

fields

years

verdicts

representative citing papers

citing papers explorer