pith:MQKP6N75
Measuring Progress on Scalable Oversight for Large Language Models
Humans chatting with an unreliable LLM outperform both the model and unaided humans on specialist tasks.
arxiv:2211.03540 v2 · 2022-11-04 · cs.HC · cs.AI · cs.CL
Claims
Human participants who interact through chat with an unreliable large language model dialog assistant, a trivial baseline strategy for scalable oversight, substantially outperform both the model alone and their own unaided performance.
Tasks like MMLU and time-limited QuALITY, where specialists succeed but unaided humans and current AI fail, are taken to serve as valid proxies for the challenges of supervising future AI systems that broadly outperform humans.
Humans chatting with an unreliable LLM assistant outperform both the model alone and unaided humans on MMLU and time-limited QuALITY tasks.
Receipt and verification
| First computed | 2026-05-17T23:38:13.752688Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MQKP6N75QRFPM5BVMR6HDPEPJ4 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "9d8be59b4fe0892a9e7bc36fee065f6d3ad2279d919dc940b513451532307883",
"cross_cats_sorted": [
"cs.AI",
"cs.CL"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.HC",
"submitted_at": "2022-11-04T17:03:49Z",
"title_canon_sha256": "cd7a48720f06f77097cb2d9e16b055b11610edd9b281b874f9a13a8c18b52b2c"
},
"schema_version": "1.0",
"source": {
"id": "2211.03540",
"kind": "arxiv",
"version": 2
}
}
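The shell one-liner above can also be written as a small standalone script. The following is a minimal sketch, assuming the canonical form is exactly what that one-liner produces: JSON serialized with sorted keys, compact separators, and no ASCII escaping, then encoded as UTF-8 and hashed with SHA-256. The function name `canonical_sha256` is hypothetical, and the record is copied from the JSON shown above rather than fetched from the server.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does (assumed canonical form)."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # compact form, no whitespace
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section.
record = {
    "metadata": {
        "abstract_canon_sha256": "9d8be59b4fe0892a9e7bc36fee065f6d3ad2279d919dc940b513451532307883",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.HC",
        "submitted_at": "2022-11-04T17:03:49Z",
        "title_canon_sha256": "cd7a48720f06f77097cb2d9e16b055b11610edd9b281b874f9a13a8c18b52b2c",
    },
    "schema_version": "1.0",
    "source": {"id": "2211.03540", "kind": "arxiv", "version": 2},
}

# Hash listed under "Canonical hash" on this page.
expected = "6414ff37fd844af67435647c71bc8f4f3d78dfb19d13a59aba476e3da5f27cee"

digest = canonical_sha256(record)
print("computed:", digest)
print("matches published hash:", digest == expected)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON. A mismatch against the published hash would suggest that the served record differs from the copy shown here, or that the builder canonicalizes differently than assumed.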