GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
Advances in Neural Information Processing Systems , editor=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.
citing papers explorer
-
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
-
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment
Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.