Pith Number

pith:3WE7LPRT

pith:2023:3WE7LPRTJJF7PEZ2S25A2IAP3W
not attested · not anchored · not stored · refs resolved

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Chang Zhou, Chengpeng Li, Chuanqi Tan, Guanting Dong, Hongyi Yuan, Jingren Zhou, Keming Lu, Zheng Yuan

Pre-training loss predicts LLM mathematical reasoning performance better than parameter count, and rejection sampling fine-tuning lifts LLaMA-7B to 49.3 percent accuracy on GSM8K.

arxiv:2308.01825 v2 · 2023-08-03 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3WE7LPRTJJF7PEZ2S25A2IAP3W}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We find that pre-training loss is a better indicator of the model's performance than the model's parameter count. ... we combine rejection samples from multiple models which push LLaMA-7B to an accuracy of 49.3% on GSM8K which outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.

C2 · weakest assumption

That model-generated reasoning paths can be reliably verified as correct by the same or similar models without introducing systematic errors or false positives in the rejection filter.

C3 · one-line summary

Pre-training loss predicts LLM math reasoning better than parameter count; rejection sampling fine-tuning with diverse paths raises LLaMA-7B accuracy on GSM8K from 35.9% with SFT to 49.3%.
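The rejection sampling step named in the claims can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes GSM8K-style reasoning paths (calculator annotations in `<<...>>` and a final `#### answer` line), assumes the filter is a plain match against the reference answer, and treats "diverse paths" as paths with distinct equation lists. The names `rejection_sample`, `sample_fn`, and the canned paths are hypothetical.

```python
import re
from typing import Callable, Optional


def final_answer(path: str) -> Optional[str]:
    """Return the number after the GSM8K-style '####' marker, if any."""
    match = re.search(r"####\s*(-?[\d,.]+)", path)
    if match is None:
        return None
    return match.group(1).replace(",", "").rstrip(".")


def equation_list(path: str) -> tuple:
    """Return the ordered equations found in <<...>> annotations."""
    return tuple(eq.replace(" ", "") for eq in re.findall(r"<<(.+?)>>", path))


def rejection_sample(
    question: str,
    gold_answer: str,
    sample_fn: Callable[[str], str],  # hypothetical: one model sample per call
    k: int,
) -> list:
    """Draw k paths, keep correct ones, drop paths that repeat an equation list."""
    kept = {}
    for _ in range(k):
        path = sample_fn(question)
        if final_answer(path) != gold_answer:
            continue  # rejected: final answer does not match the reference
        kept.setdefault(equation_list(path), path)  # first path per equation list
    return list(kept.values())


# Canned samples standing in for model output (illustrative only).
canned = iter([
    "Half is <<48/2=24>>24. Total is <<48+24=72>>72.\n#### 72",
    "May sales: <<48/2=24>>24. Both months: <<48+24=72>>72.\n#### 72",
    "Total is <<48+48/2=72>>72.\n#### 72",
    "Total is <<48+48=96>>96.\n#### 96",
])
selected = rejection_sample("toy question", "72", lambda q: next(canned), k=4)
print(len(selected))
```

In this toy run the second path is dropped as a duplicate equation list and the fourth is rejected for a wrong answer. Pooling samples from multiple models, as the claim describes, would amount to merging the kept paths across several `sample_fn` sources before fine-tuning.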

References

93 extracted · 93 resolved · 10 Pith anchors

[2] Emergent Abilities of Large Language Models. Trans. Mach. Learn. Res.
[3] Finetuned Language Models Are Zero-Shot Learners. ArXiv.
[4] Chain of Thought Prompting Elicits Reasoning in Large Language Models. ArXiv.
[5] The Eleventh International Conference on Learning Representations.
[6] Scaling Data-Constrained Language Models. 2023.

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:19.734436Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

dd89f5be334a4bf7933a96ba0d200fdda07c40408dee4eb6a83fba23befb2395

Aliases

arxiv: 2308.01825 · arxiv_version: 2308.01825v2 · doi: 10.48550/arxiv.2308.01825 · pith_short_12: 3WE7LPRTJJF7 · pith_short_16: 3WE7LPRTJJF7PEZ2 · pith_short_8: 3WE7LPRT
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3WE7LPRTJJF7PEZ2S25A2IAP3W \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: dd89f5be334a4bf7933a96ba0d200fdda07c40408dee4eb6a83fba23befb2395
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1b9def5d64481b95ede3a3744d495f79937b5bf1849f35e0103828ced38257f7",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-08-03T15:34:01Z",
    "title_canon_sha256": "c762f07c2dee1db9acd2261bf247e9a3ef1c3108732525103ccfca0718b6a249"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2308.01825",
    "kind": "arxiv",
    "version": 2
  }
}
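
The canonicalization used in the verification command (sorted keys, compact separators, no ASCII escaping, then SHA-256 over the UTF-8 bytes) can also be run offline against the record shown above. The sketch below is an illustration under that assumption; the function name `canonical_hash` is hypothetical, and whether the printed digest equals the published canonical hash depends on the served record matching this JSON exactly, which is not asserted here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record copied from the page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1b9def5d64481b95ede3a3744d495f79937b5bf1849f35e0103828ced38257f7",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-08-03T15:34:01Z",
        "title_canon_sha256": "c762f07c2dee1db9acd2261bf247e9a3ef1c3108732525103ccfca0718b6a249",
    },
    "schema_version": "1.0",
    "source": {"id": "2308.01825", "kind": "arxiv", "version": 2},
}

digest = canonical_hash(record)
print(digest)  # compare by eye against the canonical hash listed on the page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.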