McCormick, and David Madigan
2 Pith papers cite this work.
- Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation
  MechaRule localizes agonist neurons in LLMs via contrastive hierarchical ablation, grounding rule extraction in circuitry. It recalls 96.8% of high-effect neurons, and suppressing the localized neurons reduces task performance.
- A study on the Interpretability of Neural Retrieval Models using DeepSHAP
  Explores reference document choices for applying DeepSHAP to neural retrieval models and reports that its explanations differ substantially from those of LIME.