Advances in Neural Information Processing Systems , editor=

Causal Abstractions of Neural Networks , author= · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

cs.LG · 2022-11-01 · conditional · novelty 8.0

GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.

citing papers explorer

Showing 2 of 2 citing papers.

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small cs.LG · 2022-11-01 · conditional · none · ref 60
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment cs.AI · 2026-05-07 · unverdicted · none · ref 19
Finite-answer projections of continuation probabilities stabilize before the answer is parseable, showing 17-31 token mean lead in delayed-verdict tasks with Qwen3-4B-Instruct.

Advances in Neural Information Processing Systems , editor=

fields

years

verdicts

representative citing papers

citing papers explorer