Pith Number

pith:CIHWLDKT

pith:2023:CIHWLDKTKGCTCOIPMPEHC3WEPE
not attested · not anchored · not stored · refs resolved

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Aviya Skowron, Edward Raff, Eric Hallahan, Hailey Schoelkopf, Herbie Bradley, Kyle O'Brien, Lintang Sutawika, Mohammad Aflah Khan, Oskar van der Wal, Quentin Anthony, Shivanshu Purohit, Stella Biderman, USVSN Sai Prashanth

A suite of 16 language models, ranging from 70M to 12B parameters and trained on identical public data in the same order, enables direct tracking of how abilities emerge during training and across scales.

arxiv:2304.01373 v2 · 2023-04-03 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CIHWLDKTKGCTCOIPMPEHC3WEPE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics.

C2 · weakest assumption

That training all models on the exact same public data in identical order, combined with released checkpoints, will produce reproducible and generalizable insights into training dynamics without major unaccounted confounding from data selection or implementation details.

C3 · one-line summary

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.

References

198 extracted · 198 resolved · 27 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

29 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:50.686760Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183

Aliases

arxiv: 2304.01373 · arxiv_version: 2304.01373v2 · doi: 10.48550/arxiv.2304.01373 · pith_short_12: CIHWLDKTKGCT · pith_short_16: CIHWLDKTKGCTCOIP · pith_short_8: CIHWLDKT
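
For this record, the identifiers above appear to be tied to the canonical hash: Base32-encoding the first 16 bytes of the SHA-256 digest (RFC 4648 alphabet, padding stripped) gives the 26-character Pith Number, and each pith_short_N alias is its first N characters. The page does not state this rule, so the sketch below is an observation about this one record, not a documented derivation. The function names are hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding.

    Assumption: inferred from this record only, not from a published spec.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters, as in the pith_short_N aliases."""
    return pith_number[:length]


number = pith_number_from_hash(CANONICAL_HASH)
print(number)                  # CIHWLDKTKGCTCOIPMPEHC3WEPE
print(short_alias(number, 8))  # CIHWLDKT
```

If the rule holds in general, a short alias can be checked against a full Pith Number with a plain prefix comparison, without any network access.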
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CIHWLDKTKGCTCOIPMPEHC3WEPE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "cb96737b1f74b472e0e4c6f7b934e1843aa0112ab33accf7b0435c9c26ebe4ce",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-04-03T20:58:15Z",
    "title_canon_sha256": "0ace9f335158bb7146489e81651673b6809aa20162fa00ad6de66d9a408ff3db"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2304.01373",
    "kind": "arxiv",
    "version": 2
  }
}
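
The verification one-liner above canonicalizes the record by serializing it with sorted keys, compact separators, and unescaped non-ASCII characters, then hashing the UTF-8 bytes with SHA-256. A minimal offline sketch of the same steps, applied to the record as printed on this page, might look like this. The helper names are hypothetical, and whether the digest equals the canonical hash depends on the served `.canonical_record` being identical to the listing above, which this sketch does not check.

```python
import hashlib
import json

# Copied from the "Canonical record JSON" listing on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "cb96737b1f74b472e0e4c6f7b934e1843aa0112ab33accf7b0435c9c26ebe4ce",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-04-03T20:58:15Z",
        "title_canon_sha256": "0ace9f335158bb7146489e81651673b6809aa20162fa00ad6de66d9a408ff3db",
    },
    "schema_version": "1.0",
    "source": {"id": "2304.01373", "kind": "arxiv", "version": 2},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same json.dumps options as the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written or parsed, which is what lets a mirror recompute it independently.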