pith. machine review for the scientific record.
Pith Number

pith:QXWMAQZU

pith:2024:QXWMAQZURQIIK5MLWJWKSA4UXL
not attested · not anchored · not stored · refs resolved

Improving Dictionary Learning with Gated Sparse Autoencoders

Arthur Conmy, János Kramár, Lewis Smith, Neel Nanda, Rohin Shah, Senthooran Rajamanoharan, Tom Lieberum, Vikrant Varma

Gated Sparse Autoencoders separate feature selection from magnitude estimation to eliminate L1-induced shrinkage in language model dictionary learning.

arxiv:2404.16014 v2 · 2024-04-24 · cs.LG · cs.AI

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.
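The claim rests on two measurable quantities: how many features fire per input (L0), and whether reconstructions are systematically too small (shrinkage). A minimal sketch, assuming feature activations and reconstructions are given as plain lists of floats; the helper names `mean_l0` and `relative_reconstruction_bias` are hypothetical, and the shrinkage measure is a least-squares rescaling factor in the spirit of the paper's diagnostic, not a verified copy of its exact definition.

```python
from typing import Sequence

Vector = Sequence[float]


def mean_l0(features: Sequence[Vector]) -> float:
    """Hypothetical helper: average number of nonzero ("firing") features per input."""
    return sum(sum(1 for v in row if v != 0.0) for row in features) / len(features)


def relative_reconstruction_bias(xs: Sequence[Vector], x_hats: Sequence[Vector]) -> float:
    """Hypothetical helper: the scale gamma minimising sum ||x_hat / gamma - x||^2.

    Writing c = 1 / gamma, least squares gives c = sum(x_hat . x) / sum(||x_hat||^2),
    so gamma = sum(||x_hat||^2) / sum(x_hat . x). gamma < 1 suggests shrinkage
    (reconstructions systematically smaller than inputs); gamma = 1 means none.
    """
    dot = sum(sum(a * b for a, b in zip(x_hat, x)) for x_hat, x in zip(x_hats, xs))
    sq = sum(sum(a * a for a in x_hat) for x_hat in x_hats)
    return sq / dot


if __name__ == "__main__":
    xs = [[1.0, 2.0], [3.0, -1.0]]
    shrunk = [[0.8 * v for v in x] for x in xs]  # every reconstruction scaled down by 20%
    print(mean_l0([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]]))  # 1.5
    print(round(relative_reconstruction_bias(xs, shrunk), 6))  # 0.8
```

Under this measure, "comparable fidelity at half the firing features" means matching reconstruction error while `mean_l0` is roughly halved, and "solving shrinkage" means gamma moves toward 1.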

C2 · weakest assumption

That restricting the L1 penalty to the gating branch does not introduce new biases or degrade feature quality in dimensions not measured by the reported reconstruction and interpretability metrics.

C3 · one-line summary

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.
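The decoupling summarised above can be sketched in NumPy. This is a forward-pass-only illustration, assuming the formulation as we understand it from the paper: a gate path whose pre-activations decide which features fire, a magnitude path that shares the gate weights up to a per-feature rescaling `exp(r_mag)`, an L1 penalty on the rectified gate pre-activations only, and an auxiliary reconstruction term. The class and parameter names are hypothetical, initialisation is arbitrary, and no training loop is shown; check details against the paper before relying on them.

```python
import numpy as np


class GatedSAE:
    """Hypothetical minimal Gated SAE: forward pass and loss only, no training."""

    def __init__(self, d_model: int, d_sae: int, rng: np.random.Generator):
        self.W_gate = rng.normal(scale=0.1, size=(d_sae, d_model))
        self.b_gate = np.zeros(d_sae)
        self.r_mag = np.zeros(d_sae)  # per-feature log-scale: W_mag = exp(r_mag) * W_gate
        self.b_mag = np.zeros(d_sae)
        self.W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
        self.b_dec = np.zeros(d_model)

    def encode(self, x: np.ndarray):
        x_c = x - self.b_dec
        # Gate path: decides WHICH features fire (binary mask).
        pi_gate = x_c @ self.W_gate.T + self.b_gate
        f_gate = (pi_gate > 0).astype(x.dtype)
        # Magnitude path: decides HOW LARGE active features are.
        W_mag = np.exp(self.r_mag)[:, None] * self.W_gate
        f_mag = np.maximum(x_c @ W_mag.T + self.b_mag, 0.0)
        return f_gate * f_mag, pi_gate

    def decode(self, f: np.ndarray) -> np.ndarray:
        return f @ self.W_dec.T + self.b_dec

    def loss(self, x: np.ndarray, l1_coeff: float) -> float:
        f, pi_gate = self.encode(x)
        recon = np.sum((x - self.decode(f)) ** 2, axis=-1).mean()
        gate_act = np.maximum(pi_gate, 0.0)
        # Sparsity penalty sees only the gate path; r_mag and b_mag never enter it.
        sparsity = l1_coeff * gate_act.sum(axis=-1).mean()
        # Auxiliary term: reconstruct from gate activations. In an autodiff framework
        # the decoder here would be a frozen (stop-gradient) copy.
        aux = np.sum((x - self.decode(gate_act)) ** 2, axis=-1).mean()
        return float(recon + sparsity + aux)


if __name__ == "__main__":
    sae = GatedSAE(d_model=8, d_sae=32, rng=np.random.default_rng(0))
    x = np.random.default_rng(1).normal(size=(4, 8))
    f, pi_gate = sae.encode(x)
    n_active = np.count_nonzero(f)
    sae.b_mag += 1.0  # change the magnitude path only
    f_shifted, _ = sae.encode(x)
    print(n_active == np.count_nonzero(f_shifted))  # True: same features fire, larger values
    print(sae.loss(x, l1_coeff=1e-3))
```

Because the magnitude path's own parameters (`r_mag`, `b_mag`) never appear in the sparsity term, the penalty gives them no incentive to shrink active features, which is the mechanism behind the shrinkage claim.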

References

255 extracted · 255 resolved · 5 Pith anchors

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, 2006. doi:10.1109/tsp.2006.881199
[2] Introducing the next generation of Claude, 2024.
[3] J. Batson, B. Chen, A. Jones, A. Templeton, T. Conerly, J. Marcus, T. Henighan, N. L. Turner, and A. Pearce. Circuits Updates - March 2024. Transformer Circuits Thread, 2024. URL https://transformer-
[4] Y. Bengio. Deep learning of representations: Looking forward, 2013.
[5] S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O'Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language models across trai…, 2023.

Formal links

2 machine-checked theorem links

Cited by

18 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.270899Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

85ecc043348c1085758bb26ca90394baf62ecde2b4a65f0faa10053019f8335c

Aliases

arxiv: 2404.16014 · arxiv_version: 2404.16014v2 · doi: 10.48550/arxiv.2404.16014 · pith_short_12: QXWMAQZURQII · pith_short_16: QXWMAQZURQIIK5ML · pith_short_8: QXWMAQZU
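The short aliases are prefixes of the full Pith Number, and for this record the full number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the raw SHA-256 digest listed as the canonical hash. The sketch below reproduces both observations with Python's standard library. This derivation is inferred from this single record, not taken from a published Pith specification, and `pith_from_digest` and `short_aliases` are hypothetical names.

```python
import base64

CANONICAL_HASH = "85ecc043348c1085758bb26ca90394baf62ecde2b4a65f0faa10053019f8335c"


def pith_from_digest(sha256_hex: str, length: int = 26) -> str:
    """ASSUMPTION (inferred from one record): Pith Number = Base32(raw digest)[:26]."""
    digest = bytes.fromhex(sha256_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith: str) -> dict:
    """Short aliases as plain prefixes of the full Pith Number."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    pith = pith_from_digest(CANONICAL_HASH)
    print(pith)  # QXWMAQZURQIIK5MLWJWKSA4UXL
    print(short_aliases(pith))
```

If this relationship holds generally, a Pith Number commits to the canonical record's hash: 26 Base32 characters carry 130 bits of the 256-bit digest.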
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QXWMAQZURQIIK5MLWJWKSA4UXL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 85ecc043348c1085758bb26ca90394baf62ecde2b4a65f0faa10053019f8335c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a5a5978bca297540afaf137cdc1c11e59dd3aa7ff92132d2dba627675ae9dca9",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-04-24T17:47:22Z",
    "title_canon_sha256": "de78f0873097f3b9f45e65322afe73347a1488e485186984f4ef162891cec806"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.16014",
    "kind": "arxiv",
    "version": 2
  }
}
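For offline checking, the same canonicalisation as the `jq`/`python3` pipeline (sorted keys, compact separators, UTF-8 without ASCII escaping) can be applied to a saved copy of the record. A minimal sketch, assuming the `canonical_record` served by the API is exactly the JSON shown here; if the served object differs, only the live pipeline's result is authoritative. `canonical_sha256` and `RECORD_JSON` are hypothetical names.

```python
import hashlib
import json

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "a5a5978bca297540afaf137cdc1c11e59dd3aa7ff92132d2dba627675ae9dca9",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-04-24T17:47:22Z",
    "title_canon_sha256": "de78f0873097f3b9f45e65322afe73347a1488e485186984f4ef162891cec806"
  },
  "schema_version": "1.0",
  "source": {"id": "2404.16014", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    # Same settings as the shell pipeline: sorted keys, no spaces, non-ASCII kept as UTF-8.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    record = json.loads(RECORD_JSON)
    print(canonical_sha256(record))  # compare with the canonical hash listed above
```

Because `sort_keys=True` fixes key order, two mirrors that parse the same record into differently ordered dictionaries still produce the same digest.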