Neuronpedia: An Open Platform for Mechanistic Interpretability Research

· 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Reading Task Failure Off the Activations: A Sparse-Feature Audit of GPT-2 Small on Indirect Object Identification

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

An empirical audit identifies a strong SAE feature correlate for GPT-2 small failures on 'keys' prompts in the IOI task, performs ablation and baseline controls showing it is not causal, and presents the audit pipeline as the primary contribution.

citing papers explorer

Showing 1 of 1 citing paper.

Reading Task Failure Off the Activations: A Sparse-Feature Audit of GPT-2 Small on Indirect Object Identification

cs.LG · 2026-05-21 · unverdicted · none · ref 24
