Pith Number

pith:6W7OXTAI

pith:2023:6W7OXTAIG7NINMBJSTPPYRCD5T
not attested · not anchored · not stored · refs resolved

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Fred Zhang, Neel Nanda

Varying metrics and corruption methods in activation patching can produce conflicting pictures of which model components matter.

arxiv:2309.16042 v2 · 2023-09-27 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6W7OXTAIG7NINMBJSTPPYRCD5T}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results.

C2 weakest assumption

That the specific localization and circuit discovery tasks and models examined are representative enough for the derived recommendations to apply broadly to activation patching usage.

C3 one line summary

Varying evaluation metrics and corruption methods in activation patching produces different localization and circuit discovery outcomes in language models, leading to recommendations for preferred practices.
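
The kind of disagreement described in the claims above can be illustrated with a toy calculation. This sketch is not taken from the paper: the logits are made-up numbers, the component names `head_a` and `head_b` are hypothetical, and logit difference and correct-token probability are assumed here as two example metrics for scoring a patched run.

```python
import math


def logit_difference(logits, correct, wrong):
    """Logit of the correct token minus logit of one chosen wrong token."""
    return logits[correct] - logits[wrong]


def correct_probability(logits, correct):
    """Softmax probability of the correct token (numerically stabilized)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[correct] / sum(exps)


# Hypothetical 3-token vocabulary: index 0 is correct, index 1 is the
# contrasting wrong answer, index 2 is an unrelated token.
CORRECT, WRONG = 0, 1

# Made-up output logits after patching each (hypothetical) component.
patched_logits = {
    "head_a": [3.0, 0.0, 5.0],
    "head_b": [2.0, 0.0, 0.0],
}

scores = {
    name: {
        "logit_diff": logit_difference(logits, CORRECT, WRONG),
        "prob": correct_probability(logits, CORRECT),
    }
    for name, logits in patched_logits.items()
}

# Rank the components under each metric separately.
best_by_logit_diff = max(scores, key=lambda n: scores[n]["logit_diff"])
best_by_prob = max(scores, key=lambda n: scores[n]["prob"])

print("top component by logit difference:", best_by_logit_diff)
print("top component by probability:", best_by_prob)
```

In this toy setup `head_a` widens the gap between the correct and wrong token but also boosts an unrelated third token. Logit difference ignores that third token while probability penalizes it, so the two metrics pick different components.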

Cited by

22 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.187241Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

f5beebcc0837da86b02994defc4443ecf487aefdf1df4577fa756af1d02069e2

Aliases

arxiv: 2309.16042 · arxiv_version: 2309.16042v2 · doi: 10.48550/arxiv.2309.16042 · pith_short_12: 6W7OXTAIG7NI · pith_short_16: 6W7OXTAIG7NINMBJ · pith_short_8: 6W7OXTAI
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6W7OXTAIG7NINMBJSTPPYRCD5T \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f5beebcc0837da86b02994defc4443ecf487aefdf1df4577fa756af1d02069e2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "080c9b3f05b25967043221c33a05f6dfd8524334bf4eeacae0d6cdfa03a8f9f7",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-09-27T21:53:56Z",
    "title_canon_sha256": "ddff9c6515e0ed0b541d738fa2ef96374070398ddde2305c90772355b2954c95"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.16042",
    "kind": "arxiv",
    "version": 2
  }
}
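
The verification command above can also be run offline against a saved copy of the canonical record. The following is a minimal sketch that mirrors the serialization settings of the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding). The function name `canonical_sha256` is an illustrative choice, not part of any Pith API, and the record literal is copied from the JSON shown above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as listed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "080c9b3f05b25967043221c33a05f6dfd8524334bf4eeacae0d6cdfa03a8f9f7",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2023-09-27T21:53:56Z",
        "title_canon_sha256": "ddff9c6515e0ed0b541d738fa2ef96374070398ddde2305c90772355b2954c95",
    },
    "schema_version": "1.0",
    "source": {"id": "2309.16042", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed in the record
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields were stored or transmitted. That property is what allows a mirror to recompute the same value independently.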