Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
arXiv preprint arXiv:2106.06087 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2representative citing papers
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
citing papers explorer
-
Localizing Model Behavior with Path Patching
Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
-
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.