Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
Causal analysis of syntactic agreement mechanisms in neural language models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2representative citing papers
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
citing papers explorer
-
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.