Pith Number

pith:HR6EXQBO

pith:2023:HR6EXQBOEZBW37EITU7YOOTWPH
not attested · not anchored · not stored · refs resolved

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Ahmed Awadallah, Arindam Mitra, Ganesh Jawahar, Hamid Palangi, Sahaj Agarwal, Subhabrata Mukherjee

A 13B model trained on GPT-4's step-by-step explanations reaches ChatGPT parity on complex reasoning benchmarks.

arxiv:2306.02707 v1 · 2023-06-05 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HR6EXQBOEZBW37EITU7YOOTWPH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4.

C2 · weakest assumption

The assumption that the imitation data's explanation traces cause genuine transfer of reasoning processes rather than style or pattern matching, and that benchmark gains reflect true capability improvements rather than data contamination or evaluation artifacts.

C3 · one-line summary

A 13B model called Orca learns detailed reasoning from GPT-4 explanation traces and reaches parity with ChatGPT on Big-Bench Hard while outperforming other 13B models.

References

37 extracted · 37 resolved · 5 Pith anchors

[1] AGIEval: A human-centric benchmark for evaluating foundation models, 2023
[3] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Mich 2021
[4] Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2022
[5] Training language models to follow instructions with human feedback, 2022 · arXiv:2203.02155
[6] Constitutional AI: Harmlessness from AI Feedback, 2022 · arXiv:2212.08073

Cited by

34 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.866062Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

3c7c4bc02e26436dfc889d3f873a7679f57fa512b1b9f839c9e9d5d11ce7a4e0

Aliases

arxiv: 2306.02707 · arxiv_version: 2306.02707v1 · doi: 10.48550/arxiv.2306.02707 · pith_short_12: HR6EXQBOEZBW · pith_short_16: HR6EXQBOEZBW37EI · pith_short_8: HR6EXQBO
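
The identifiers on this page appear to be related in a simple way. The sketch below is an observation from this one record, not a documented rule of the Pith schema. In this record, the Pith number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash bytes, and each `pith_short_N` alias matches the first N characters of the Pith number. The function names `pith_number_from_hash` and `short_aliases` are hypothetical helpers introduced for illustration.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "3c7c4bc02e26436dfc889d3f873a7679f57fa512b1b9f839c9e9d5d11ce7a4e0"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_8/12/16 aliases as plain prefixes (assumed rule)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
print(pith_number)                 # compare with the Pith number at the top of the page
print(short_aliases(pith_number))  # compare with the aliases listed above
```

If this relationship holds in general, a client could check that an alias belongs to a given canonical hash without contacting the server. Treat that as a convenience check rather than a substitute for the signature verification described under "Receipt and verification".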
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HR6EXQBOEZBW37EITU7YOOTWPH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3c7c4bc02e26436dfc889d3f873a7679f57fa512b1b9f839c9e9d5d11ce7a4e0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f99376ddf09c239247038747abe31c74b056ace90aab7d796babbd73df4813f5",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-06-05T08:58:39Z",
    "title_canon_sha256": "4daa66731f0c8281c32fb5ae122014e12d616b75e13008bace6110b67d198d9a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.02707",
    "kind": "arxiv",
    "version": 1
  }
}
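
The same check can be run offline against the JSON shown above. The following is a minimal sketch that uses the serialization settings from the one-liner (`sort_keys=True`, compact separators, `ensure_ascii=False`, UTF-8 encoding). The helper names `canonical_bytes` and `canonical_sha256` are hypothetical. The digest should reproduce the canonical hash only if the record printed on this page is identical to the `canonical_record` the server returns, which is assumed here rather than confirmed.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "f99376ddf09c239247038747abe31c74b056ace90aab7d796babbd73df4813f5",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-06-05T08:58:39Z",
        "title_canon_sha256": "4daa66731f0c8281c32fb5ae122014e12d616b75e13008bace6110b67d198d9a",
    },
    "schema_version": "1.0",
    "source": {"id": "2306.02707", "kind": "arxiv", "version": 1},
}

# Hash the page says to expect.
EXPECTED = "3c7c4bc02e26436dfc889d3f873a7679f57fa512b1b9f839c9e9d5d11ce7a4e0"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and fixing the separators makes the byte string independent of how the JSON was formatted or ordered in transit, so any mirror can recompute the same hash.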