Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2representative citing papers
Activation patching provides evidence about neural network circuits when the choice of metric is aligned with the hypothesis and common interpretation errors are avoided.
citing papers explorer
-
Localizing Model Behavior with Path Patching
Path patching provides a method to express and quantitatively test hypotheses that neural network behaviors are localized to sets of paths.
-
How to use and interpret activation patching
Activation patching provides evidence about neural network circuits when the choice of metric is aligned with the hypothesis and common interpretation errors are avoided.