Pith Number

pith:VV7D6X4N

pith:2023:VV7D6X4NKA5XSLSHIYQRE5SR3I
not attested · not anchored · not stored · refs resolved

Textbooks Are All You Need II: phi-1.5 technical report

Allie Del Giorno, Ronen Eldan, Sébastien Bubeck, Suriya Gunasekar, Yin Tat Lee, Yuanzhi Li

A 1.3-billion-parameter model trained on synthetic textbooks performs comparably to models five times larger on natural language tasks and surpasses most non-frontier LLMs on grade-school math and basic coding.

arxiv:2309.05463 v1 · 2023-09-11 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VV7D6X4NKA5XSLSHIYQRE5SR3I}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

phi-1.5 ... with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding.

C2 weakest assumption

That standard benchmarks for grade-school math and basic coding sufficiently measure general common sense reasoning without the model overfitting to patterns in the synthetic textbook data.

C3 one-line summary

phi-1.5 is a 1.3B-parameter model trained on synthetic textbook data that is comparable to models five times larger on natural language tasks and surpasses most non-frontier LLMs on grade-school math and basic coding.

References

24 extracted · 24 resolved · 14 Pith anchors

[1] Program Synthesis with Large Language Models · arXiv:2108.07732
[2] Identify, align, and integrate: Matching knowledge graphs to commonsense reasoning tasks
[3] Sparks of Artificial General Intelligence: Early experiments with GPT-4 · arXiv:2303.12712
[4] On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021
[5] PIQA: Reasoning about Physical Commonsense in Natural Language · arXiv:1911.11641

Formal links

2 machine-checked theorem links

Cited by

40 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:22.089627Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

ad7e3f5f8d503b792e474621127651da1367db03a76710f51beb0b221c4806d7

Aliases

arxiv: 2309.05463 · arxiv_version: 2309.05463v1 · doi: 10.48550/arxiv.2309.05463 · pith_short_12: VV7D6X4NKA5X · pith_short_16: VV7D6X4NKA5XSLSH · pith_short_8: VV7D6X4N
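The aliases above appear to follow from the canonical hash. Base32-encoding the first 16 bytes of the SHA-256 digest and dropping the `=` padding reproduces the 26-character Pith Number shown on this page, and the `pith_short_*` aliases are its 8-, 12-, and 16-character prefixes. This is an observation from the values listed here, not a documented rule of the `pith-number/v1.0` schema. The function names in the sketch below are hypothetical.

```python
import base64

# Canonical hash copied from the "Canonical hash" section of this record.
CANONICAL_HASH = "ad7e3f5f8d503b792e474621127651da1367db03a76710f51beb0b221c4806d7"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: unpadded Base32 of the first 16 digest bytes."""
    first_16 = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Prefix aliases as they appear in the Aliases line (8, 12, 16 chars)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)  # VV7D6X4NKA5XSLSHIYQRE5SR3I
    print(short_aliases(number))
```

If the relationship holds for other records, a short alias can be checked locally against a known canonical hash without contacting the service.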
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VV7D6X4NKA5XSLSHIYQRE5SR3I \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ad7e3f5f8d503b792e474621127651da1367db03a76710f51beb0b221c4806d7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f2258d654a2f0d6c28d086ba9fe59e7ffe77bc4cefb4f99f008ef8e38d28238b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-09-11T14:01:45Z",
    "title_canon_sha256": "99f1dc3a00721b4964d2f8fcf43443912ce5445816a292edfbd4c7d6b695e336"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.05463",
    "kind": "arxiv",
    "version": 1
  }
}
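The one-liner under "Verify this Pith Number yourself" can be unpacked into a small script that hashes a local copy of the record. The sketch below applies the same canonicalization as that command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the JSON shown above. `canonical_sha256` is a hypothetical helper name, and whether the record as rendered on this page reproduces the expected hash byte for byte has not been checked here, so the script reports a match or mismatch rather than assuming one.

```python
import hashlib
import json

# Expected value copied from the "Canonical hash" section of this record.
EXPECTED = "ad7e3f5f8d503b792e474621127651da1367db03a76710f51beb0b221c4806d7"

# Local copy of the canonical record JSON shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "f2258d654a2f0d6c28d086ba9fe59e7ffe77bc4cefb4f99f008ef8e38d28238b",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-09-11T14:01:45Z",
    "title_canon_sha256": "99f1dc3a00721b4964d2f8fcf43443912ce5445816a292edfbd4c7d6b695e336"
  },
  "schema_version": "1.0",
  "source": {"id": "2309.05463", "kind": "arxiv", "version": 1}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(json.loads(RECORD_JSON))
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on how the JSON is indented or on the order in which keys were written.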