Pith Number

pith:CIHWLDKT

pith:2023:CIHWLDKTKGCTCOIPMPEHC3WEPE

not attested · not anchored · not stored · refs resolved

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Aviya Skowron, Edward Raff, Eric Hallahan, Hailey Schoelkopf, Herbie Bradley, Kyle O'Brien, Lintang Sutawika, Mohammad Aflah Khan, Oskar van der Wal, Quentin Anthony, Shivanshu Purohit, Stella Biderman, USVSN Sai Prashanth

A suite of 16 language models, ranging from 70M to 12B parameters and trained on identical public data in the same order, enables direct tracking of how abilities emerge during training and across scales.

arxiv:2304.01373 v2 · 2023-04-03 · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{CIHWLDKTKGCTCOIPMPEHC3WEPE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
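
The page does not describe the merge algorithm itself, so the following is only a minimal sketch of what "deterministic merge" means in general: every mirror folds the same set of events in the same total order, so arrival order cannot change the result. The event fields (`event_id`, `timestamp`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical and not taken from the Pith bundle format. Signature checking is omitted.

```python
from typing import Any

# Invented sample events for illustration only; not real Pith data.
EVENTS = [
    {"event_id": "e1", "timestamp": "2026-05-17T23:38:50Z", "field": "author_claim", "value": "open"},
    {"event_id": "e2", "timestamp": "2026-05-18T08:00:00Z", "field": "citations", "value": 28},
    {"event_id": "e3", "timestamp": "2026-05-19T08:00:00Z", "field": "citations", "value": 29},
]


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict using a fixed total order.

    Assumes events sharing an event_id are identical copies, so
    deduplication does not depend on arrival order.
    """
    unique = {e["event_id"]: e for e in events}
    state: dict[str, Any] = {}
    # Sort by (timestamp, event_id): ties on timestamp are broken by id,
    # so every mirror applies the events in the same sequence.
    for e in sorted(unique.values(), key=lambda e: (e["timestamp"], e["event_id"])):
        state[e["field"]] = e["value"]  # hypothetical last-writer-wins rule
    return state


if __name__ == "__main__":
    print(merge_events(EVENTS))
    print(merge_events(list(reversed(EVENTS))) == merge_events(EVENTS))
```

Under these assumptions, two mirrors that receive the same events in different orders, or receive some of them twice, arrive at the same state.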

Claims

C1 strongest claim

We introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics.

C2 weakest assumption

That training all models on the exact same public data in identical order, combined with released checkpoints, will produce reproducible and generalizable insights into training dynamics without major unaccounted confounding from data selection or implementation details.

C3 one line summary

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.
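
The counts in claim C1 (16 models, 154 checkpoints each) can be reproduced with a little arithmetic. The breakdown below is not stated on this page. It follows the Pythia paper's description as understood here: eight sizes, each trained on both the standard and the deduplicated dataset, with checkpoints at step 0, ten log-spaced early steps, and every 1000 steps up to 143000. The `pythia-<size>` and `step<N>` naming is the convention used for the public model releases and should be read as an assumption in this sketch.

```python
# Eight model sizes, each released in a standard and a "-deduped" variant.
SIZES = ["70m", "160m", "410m", "1b", "1.4b", "2.8b", "6.9b", "12b"]
MODELS = [f"pythia-{size}{suffix}" for size in SIZES for suffix in ("", "-deduped")]


def pythia_checkpoint_steps() -> list[int]:
    """Return the training steps at which checkpoints were saved."""
    steps = [0]                                 # initialization
    steps += [2**i for i in range(10)]          # 1, 2, 4, ..., 512
    steps += list(range(1000, 143001, 1000))    # 1000, 2000, ..., 143000
    return steps


if __name__ == "__main__":
    steps = pythia_checkpoint_steps()
    revisions = [f"step{s}" for s in steps]
    print(len(MODELS))      # 16
    print(len(steps))       # 154
    print(revisions[:3], revisions[-1])
```

Because every model shares this schedule and the same data order, a given `step<N>` revision marks the same point in the data stream at every scale, which is what makes cross-scale comparisons direct.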

References

198 extracted · 198 resolved · 27 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

29 papers in Pith

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Baichuan 2: Open Large-scale Language Models

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Features have life history. And we should care

Receipt and verification

First computed	2026-05-17T23:38:50.686760Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183

Aliases

arxiv: 2304.01373 · arxiv_version: 2304.01373v2 · doi: 10.48550/arxiv.2304.01373 · pith_short_12: CIHWLDKTKGCT · pith_short_16: CIHWLDKTKGCTCOIP · pith_short_8: CIHWLDKT
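
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are prefixes of it. This is an observation from the values shown here, not a documented rule, and the function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183"


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and strip '=' padding."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    full = pith_number_from_hash(CANONICAL_HASH)
    print(full)                              # CIHWLDKTKGCTCOIPMPEHC3WEPE
    print(full[:8], full[:12], full[:16])    # CIHWLDKT CIHWLDKTKGCT CIHWLDKTKGCTCOIP
```

If the relationship holds in general, anyone holding the canonical record can recompute the identifier locally instead of trusting the resolver.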

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/CIHWLDKTKGCTCOIPMPEHC3WEPE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 120f658d53518531390f63c8716ec479191aa9c9eb53d212b17f5f5f4bd0e183

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "cb96737b1f74b472e0e4c6f7b934e1843aa0112ab33accf7b0435c9c26ebe4ce",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-04-03T20:58:15Z",
    "title_canon_sha256": "0ace9f335158bb7146489e81651673b6809aa20162fa00ad6de66d9a408ff3db"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2304.01373",
    "kind": "arxiv",
    "version": 2
  }
}
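
The shell one-liner above can also be run offline against the record shown here. The sketch below applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8) and hashes the result. It assumes the JSON printed on this page is exactly what the resolver returns as `canonical_record`. The resulting digest has not been checked against the canonical hash here, so compare the output yourself.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "cb96737b1f74b472e0e4c6f7b934e1843aa0112ab33accf7b0435c9c26ebe4ce",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-04-03T20:58:15Z",
        "title_canon_sha256": "0ace9f335158bb7146489e81651673b6809aa20162fa00ad6de66d9a408ff3db",
    },
    "schema_version": "1.0",
    "source": {"id": "2304.01373", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare this against the canonical hash listed on the page.
    print(canonical_sha256(RECORD))
```

Sorting keys and fixing the separators is what makes the digest independent of how a given client happens to order or pretty-print the JSON.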