pith:6W7OXTAI
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Varying metrics and corruption methods in activation patching can produce conflicting pictures of which model components matter.
arxiv:2309.16042 v2 · 2023-09-27 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6W7OXTAIG7NINMBJSTPPYRCD5T}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results.
The recommendations assume that the specific localization and circuit discovery tasks and models examined are representative enough to apply broadly to activation patching usage.
Varying evaluation metrics and corruption methods in activation patching produces different localization and circuit discovery outcomes in language models, leading to recommendations for preferred practices.
Receipt and verification
| First computed | 2026-05-17T23:38:14.187241Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
f5beebcc0837da86b02994defc4443ecf487aefdf1df4577fa756af1d02069e2
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6W7OXTAIG7NINMBJSTPPYRCD5T \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f5beebcc0837da86b02994defc4443ecf487aefdf1df4577fa756af1d02069e2
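The Python step of the one-liner above can also be read as a standalone function. The following is a minimal sketch under the assumption that the canonical form is exactly what that one-liner produces: JSON with sorted keys, compact separators, no ASCII escaping, encoded as UTF-8. The name `canonical_sha256` and the toy records are hypothetical and only illustrate that key order does not affect the digest; they are not the full record, so their digest is not expected to match the canonical hash listed above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same serialization as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not matter
        separators=(",", ":"),   # compact form, no spaces after , or :
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical toy records: same content, different key order.
a = {"source": {"id": "2309.16042", "kind": "arxiv", "version": 2},
     "schema_version": "1.0"}
b = {"schema_version": "1.0",
     "source": {"version": 2, "kind": "arxiv", "id": "2309.16042"}}

print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check the real record, one would pass the parsed `canonical_record` object returned by the API to the same function and compare the result with the canonical hash.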
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "080c9b3f05b25967043221c33a05f6dfd8524334bf4eeacae0d6cdfa03a8f9f7",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-09-27T21:53:56Z",
    "title_canon_sha256": "ddff9c6515e0ed0b541d738fa2ef96374070398ddde2305c90772355b2954c95"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.16042",
    "kind": "arxiv",
    "version": 2
  }
}