pith. machine review for the scientific record.
Pith Number

pith:DPM7NWRD

pith:2022:DPM7NWRDQBDKWACAS324KYME6L
not attested · not anchored · not stored · refs resolved

Scaling Laws and Interpretability of Learning from Repeated Data

Ben Mann, Catherine Olsson, Chris Olah, Danny Hernandez, Dario Amodei, Dawn Drain, Jared Kaplan, Nelson Elhage, Nicholas Joseph, Nova DasSarma, Sam McCandlish, Scott Johnston, Sheer El-Showk, Tom Brown, Tom Conerly, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds

Repeating 0.1% of training data 100 times makes an 800M model perform like a 400M model

arxiv:2205.10487 v1 · 2022-05-21 · cs.LG · cs.AI

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Performance of an 800M parameter model can be degraded to that of a 2x smaller model (400M params) by repeating 0.1% of the data 100 times, despite the other 90% of the training tokens remaining unique.
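
The 90% figure follows from simple arithmetic, sketched below. The sketch assumes the 0.1% subset is measured as a share of the total training-token budget; the function name `token_budget_split` is hypothetical and used only for illustration.

```python
from fractions import Fraction


def token_budget_split(subset_fraction: Fraction, repeats: int) -> tuple[Fraction, Fraction]:
    """Return (repeated_share, unique_share) of the training-token budget.

    Assumption: `subset_fraction` is expressed relative to the total number
    of training tokens, so seeing the subset `repeats` times consumes
    subset_fraction * repeats of the budget.
    """
    repeated = subset_fraction * repeats
    if not 0 <= repeated <= 1:
        raise ValueError("repeated share must lie between 0 and 1")
    return repeated, 1 - repeated


# 0.1% of the data, seen 100 times.
repeated, unique = token_budget_split(Fraction(1, 1000), 100)
print(f"repeated share of training tokens: {float(repeated):.0%}")
print(f"unique share of training tokens:   {float(unique):.0%}")
```

Exact fractions are used so that the 10% / 90% split is not blurred by floating-point rounding.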

C2 weakest assumption

That the performance degradation is primarily caused by memorization consuming model capacity rather than by changes in optimization dynamics or other unmeasured factors.

C3 one line summary

Repeating 0.1% of training data 100 times degrades an 800M parameter model's performance to that of a 400M model by damaging copying mechanisms and induction heads associated with generalization.

References

71 extracted · 71 resolved · 18 Pith anchors

[1] Learning Transferable Visual Models From Natural Language Supervision 2021 · doi:10.48550/arxiv.2103.00020
[2] Multimodal neurons in artificial neural networks · doi:10.23915/distill.00030
[3] In-context Learning and Induction Heads
[4] Training language models to follow instructions with human feedback 2022 · doi:10.48550/arxiv.2203.02155
[5] A Variational Approach to Learning Curves 2001

Formal links

1 machine-checked theorem link

Cited by

20 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.661649Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

1bd9f6da238046ab004096f5c56184f2ee4f9d899bfef8747904d11cde8645ea

Aliases

arxiv: 2205.10487 · arxiv_version: 2205.10487v1 · doi: 10.48550/arxiv.2205.10487 · pith_short_12: DPM7NWRDQBDK · pith_short_16: DPM7NWRDQBDKWACA · pith_short_8: DPM7NWRD
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DPM7NWRDQBDKWACAS324KYME6L \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1bd9f6da238046ab004096f5c56184f2ee4f9d899bfef8747904d11cde8645ea
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1f3ba547302854ee4ff49f5540a368b48db97ee6f792bc5d1b6ce32b750eb0bd",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-05-21T02:14:27Z",
    "title_canon_sha256": "5a369711a870bc18ae971249f94ed6b0f5346791131e8e2f0ab4be8f4502fb45"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2205.10487",
    "kind": "arxiv",
    "version": 1
  }
}
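
For offline checks, the canonicalization step of the shell one-liner above can be written as a small Python function. This is a sketch, not the service's reference implementation: it assumes the canonical form is exactly what the one-liner produces (sorted keys, compact separators, non-ASCII left unescaped, UTF-8 encoded), and the name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the shell one-liner does.

    Assumption: canonical form = JSON with sorted keys, no whitespace
    between tokens, non-ASCII characters left unescaped, UTF-8 bytes.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The canonical record shown above, copied verbatim.
record = {
    "metadata": {
        "abstract_canon_sha256": "1f3ba547302854ee4ff49f5540a368b48db97ee6f792bc5d1b6ce32b750eb0bd",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2022-05-21T02:14:27Z",
        "title_canon_sha256": "5a369711a870bc18ae971249f94ed6b0f5346791131e8e2f0ab4be8f4502fb45",
    },
    "schema_version": "1.0",
    "source": {"id": "2205.10487", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields arrive. Whether it equals the canonical hash listed above depends on the service using this exact serialization, so compare the printed value against that hash rather than taking the match for granted.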