Pith Number

pith:2SQRZQWV

pith:2026:2SQRZQWVLP3RE5QRKYYPLS5KIE
not attested · not anchored · not stored · refs resolved

Exemplar Partitioning for Mechanistic Interpretability

Jessica Rumbelow

Exemplar Partitioning constructs feature dictionaries for language model activations by clustering around observed exemplars, achieving near-SAE performance at much lower computational cost.

arxiv:2605.14347 v1 · 2026-05-14 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2SQRZQWVLP3RE5QRKYYPLS5KIE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

On AxBench latent concept detection at layer 20 of Gemma-2-2B-it, EP at p1 reaches a mean AUROC of 0.881. That is 0.126 above the canonical GemmaScope SAE leaderboard entry and within 0.030 of SAE-A's 0.911, at roughly 10^3× less build compute.
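For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example receives a higher detector score than a randomly chosen negative one. The sketch below is a minimal pairwise implementation of that definition, not the AxBench evaluation code. The function name `auroc` and the toy scores are hypothetical, and the two gap values are simple arithmetic on the figures quoted in the claim.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Pairwise O(n_pos * n_neg) version for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores for one concept (not data from the paper).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)

# Arithmetic on the figures quoted in the claim above.
implied_baseline = round(0.881 - 0.126, 3)  # AUROC implied for the canonical SAE entry
gap_to_sae_a = round(0.911 - 0.881, 3)      # remaining gap to SAE-A
```

A benchmark-level "mean AUROC" would average this quantity over many concepts. How AxBench aggregates is not stated on this page.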

C2 · weakest assumption

That nearest-exemplar Voronoi regions defined by a single distance threshold correspond to causally meaningful and human-interpretable features rather than arbitrary geometric clusters.

C3 · one-line summary

Exemplar Partitioning creates activation-space dictionaries via leader-clustered Voronoi partitions around real observed exemplars, delivering competitive concept-detection performance with far lower build cost than SAEs.
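The summary above names two ingredients: leader clustering with a single distance threshold, and a nearest-exemplar (Voronoi) assignment. The following is a minimal sketch of the textbook versions of both, assuming Euclidean distance and a single pass over the data. The function names, the threshold value, and the toy points are hypothetical, and the paper's actual procedure may differ in metric, ordering, or normalization.

```python
import math


def leader_cluster(points, threshold):
    """Single-pass leader clustering.

    A point becomes a new exemplar only if every existing exemplar is
    farther away than `threshold`; otherwise it is skipped. Exemplars are
    therefore always real observed points.
    """
    exemplars = []
    for x in points:
        if all(math.dist(x, e) > threshold for e in exemplars):
            exemplars.append(x)
    return exemplars


def assign(x, exemplars):
    """Return the index of the nearest exemplar, i.e. the Voronoi cell of x."""
    return min(range(len(exemplars)), key=lambda i: math.dist(x, exemplars[i]))


# Hypothetical 2-D stand-ins for activation vectors.
toy_activations = [
    (0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
    (5.2, 4.9), (0.0, 0.2), (10.0, 0.0),
]
exemplars = leader_cluster(toy_activations, threshold=1.0)
cells = [assign(x, exemplars) for x in toy_activations]
```

On this toy input the pass keeps three exemplars, and each point maps to the cell of the exemplar it is closest to. The result depends on the order in which points are visited, which is a known property of leader clustering.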

References

39 extracted · 39 resolved · 0 Pith anchors

[1] Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature.
[2] Toy Models of Superposition. 2022.
[3] Sparse Autoencoders Find Highly Interpretable Features in Language Models. 2023.
[4] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. 2023.
[5] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. 2024.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:08.115573Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

d4a11cc2d55bf71276115630f5cbaa4115dd4ff6e69d2c2f6875ad656d7711a9

Aliases

arxiv: 2605.14347 · arxiv_version: 2605.14347v1 · doi: 10.48550/arxiv.2605.14347 · pith_short_12: 2SQRZQWVLP3R · pith_short_16: 2SQRZQWVLP3RE5QR · pith_short_8: 2SQRZQWV
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2SQRZQWVLP3RE5QRKYYPLS5KIE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d4a11cc2d55bf71276115630f5cbaa4115dd4ff6e69d2c2f6875ad656d7711a9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "636aa3e6dcc915b5ec565a8db37bd6a75c7ef2486ee07309547e27d472f16e3a",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T04:15:30Z",
    "title_canon_sha256": "325b159300e80cf7f2e654225d65059c4c87a20c84cfd4e299bf50ee7c414895"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14347",
    "kind": "arxiv",
    "version": 1
  }
}
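
The verification command above serializes the canonical record with sorted keys and compact separators, then takes the SHA-256 of the UTF-8 bytes. The sketch below repeats that step offline on the record as printed on this page, using only the Python standard library. The helper name `canonical_sha256` is hypothetical, and whether the record as rendered here reproduces the published hash byte for byte has not been checked, so the comparison result is computed rather than asserted.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and no whitespace."""
    blob = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # UTF-8 by default
    return hashlib.sha256(blob).hexdigest()


# Canonical record copied from the page above.
record = {
    "metadata": {
        "abstract_canon_sha256": "636aa3e6dcc915b5ec565a8db37bd6a75c7ef2486ee07309547e27d472f16e3a",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T04:15:30Z",
        "title_canon_sha256": "325b159300e80cf7f2e654225d65059c4c87a20c84cfd4e299bf50ee7c414895",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14347", "kind": "arxiv", "version": 1},
}

EXPECTED = "d4a11cc2d55bf71276115630f5cbaa4115dd4ff6e69d2c2f6875ad656d7711a9"
digest = canonical_sha256(record)
matches = digest == EXPECTED  # inspect this rather than assuming it is True
```

Because keys are sorted before hashing, the digest does not depend on the key order of the input dictionary. Any mirror that holds the same record can therefore recompute the same value.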