pith. machine review for the scientific record.
Pith Number

pith:MQKP6N75

pith:2022:MQKP6N75QRFPM5BVMR6HDPEPJ4
not attested · not anchored · not stored · refs resolved

Measuring Progress on Scalable Oversight for Large Language Models

Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Ben Mann, Cameron McKinnon, Christopher Olah, Craig Pettit, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Edwin Chen, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Kaplan, Jared Mueller, Jeeyoon Hyun, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamilė Lukošiūtė, Liane Lovitt, Nelson Elhage, Nicholas Joseph, Nicholas Schiefer, Noemí Mercado, Nova DasSarma, Robin Larson, Sam McCandlish, Samuel R. Bowman, Sandipan Kundu, Scott Heiner, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds

Humans chatting with an unreliable LLM outperform both the model and unaided humans on specialist tasks.

arxiv:2211.03540 v2 · 2022-11-04 · cs.HC · cs.AI · cs.CL

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance.

C2 weakest assumption

That tasks like MMLU and time-limited QuALITY, where specialists succeed but unaided humans and current AI fail, serve as valid proxies for the challenges of supervising future AI systems that broadly outperform humans.

C3 one-line summary

Humans chatting with an unreliable LLM assistant outperform both the model alone and unaided humans on MMLU and time-limited QuALITY tasks.

References

42 extracted · 42 resolved · 14 Pith anchors

[2] The case for aligning narrowly superhuman models
[3] Paul Christiano, Mark Xu, and Ajeya Cotra
[4] Geoffrey Irving, Paul Christiano, and Dario Amodei
[7] Gagan Bansal, Tongshuang Sherry Wu, Joyce Zhou, Raymond Fok, Besmira Nushi, Ece Kamar, Marco Tulio Ribeiro, and Daniel S. Weld. Does the Whole Exceed its Parts?
[9] Advances in Neural Information Processing Systems

Formal links

2 machine-checked theorem links

Cited by

22 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.752688Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee

Aliases

arxiv: 2211.03540 · arxiv_version: 2211.03540v2 · doi: 10.48550/arxiv.2211.03540 · pith_short_12: MQKP6N75QRFP · pith_short_16: MQKP6N75QRFPM5BV · pith_short_8: MQKP6N75
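The aliases and the canonical hash on this page appear to be related: base32-encoding the first 16 bytes of the canonical hash (RFC 4648 alphabet, padding stripped) reproduces the 26-character identifier, and the `pith_short_*` aliases are its 8-, 12- and 16-character prefixes. This is an observation that holds for the values shown here, not a documented rule of the `pith-number/v1.0` schema; the function name `pith_id_from_hash` is hypothetical. A minimal sketch in Python:

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee"
FULL_ID = "MQKP6N75QRFPM5BVMR6HDPEPJ4"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex SHA-256 digest, without '=' padding.

    Assumption: this derivation is inferred from the values on this page only.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


derived = pith_id_from_hash(CANONICAL_HASH)

# The short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": derived[:n] for n in (8, 12, 16)}

print(derived == FULL_ID)
for name, value in aliases.items():
    print(name, value)
```

If the relationship holds for other records, a client could sanity-check that an identifier and a canonical hash belong together before fetching anything.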
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MQKP6N75QRFPM5BVMR6HDPEPJ4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9d8be59b4fe0892a9e7bc36fee065f6d3ad2279d919dc940b513451532307883",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.HC",
    "submitted_at": "2022-11-04T17:03:49Z",
    "title_canon_sha256": "cd7a48720f06f77097cb2d9e16b055b11610edd9b281b874f9a13a8c18b52b2c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2211.03540",
    "kind": "arxiv",
    "version": 2
  }
}
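The canonicalization step of the verification command can also be run offline against the record printed above. The sketch below mirrors the `json.dumps` arguments from the one-liner (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes. It assumes the JSON displayed on this page is exactly what the API returns in `.canonical_record`; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed identical to the API response).
CANONICAL_RECORD = """
{
  "metadata": {
    "abstract_canon_sha256": "9d8be59b4fe0892a9e7bc36fee065f6d3ad2279d919dc940b513451532307883",
    "cross_cats_sorted": ["cs.AI", "cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.HC",
    "submitted_at": "2022-11-04T17:03:49Z",
    "title_canon_sha256": "cd7a48720f06f77097cb2d9e16b055b11610edd9b281b874f9a13a8c18b52b2c"
  },
  "schema_version": "1.0",
  "source": {"id": "2211.03540", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record_json: str) -> str:
    """Re-serialize JSON with sorted keys and compact separators, then SHA-256 it."""
    obj = json.loads(record_json)
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)  # compare by eye with the value listed under "Canonical hash"
```

Because keys are sorted and whitespace is normalized before hashing, the digest does not depend on how the record was pretty-printed or on the order in which a mirror stores its keys.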