pith. machine review for the scientific record.
Pith Number

pith:RIN4LFOD

pith:2023:RIN4LFODOD6LAI57IADUKMTMY2
not attested · not anchored · not stored · refs resolved

Language Models can Solve Computer Tasks

Geunwoo Kim, Pierre Baldi, Stephen McAleer

Pre-trained language models solve novel computer tasks by recursively criticizing and improving their own outputs.

arxiv:2303.17491 v3 · 2023-03-30 · cs.CL · cs.AI · cs.HC · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function.

C2 · weakest assumption

That the pre-trained LLM already contains sufficient world knowledge and self-critique capability to generate and iteratively refine correct computer actions for novel tasks when given only a few demonstrations and a simple prompting template.

C3 · one-line summary

Pre-trained LLMs using recursive criticism and improvement prompting achieve state-of-the-art results on the MiniWoB++ computer task benchmark with only a handful of demonstrations and no task-specific reward function.
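The claims above describe a loop in which the model drafts an answer, criticizes it, and then improves it. A minimal Python sketch of that loop follows. It is an illustration under stated assumptions, not the authors' implementation: the function names `rci` and `toy_llm`, the prompt wording, and the `rounds` parameter are all hypothetical, and the stand-in model returns canned replies so that the example runs without any API.

```python
from typing import Callable

# Hypothetical prompt wording; the paper's actual templates may differ.
CRITIQUE_PROMPT = "Review your previous answer and find problems with your answer."
IMPROVE_PROMPT = "Based on the problems you found, improve your answer."


def rci(llm: Callable[[str], str], task: str, rounds: int = 1) -> str:
    """Draft an answer, then alternate critique and improvement `rounds` times."""
    answer = llm(task)
    for _ in range(rounds):
        # Ask the model to criticize its own previous answer.
        critique = llm(f"{task}\n{answer}\n{CRITIQUE_PROMPT}")
        # Ask the model to rewrite the answer in light of that critique.
        answer = llm(f"{task}\n{answer}\n{critique}\n{IMPROVE_PROMPT}")
    return answer


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: canned replies keyed on how the prompt ends."""
    if prompt.endswith(CRITIQUE_PROMPT):
        return "The answer clicks the wrong button."
    if prompt.endswith(IMPROVE_PROMPT):
        return "click submit"
    return "click cancel"


if __name__ == "__main__":
    print(rci(toy_llm, "Task: press the submit button."))
```

Passing the model as a plain callable keeps the loop independent of any particular provider; a real client would replace `toy_llm`.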

References

102 extracted · 102 resolved · 19 Pith anchors

[1] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances 2022 · arXiv:2204.01691
[2] Flamingo: a visual language model for few-shot learning 2022
[3] Constitutional AI: Harmlessness from AI Feedback 2022 · arXiv:2212.08073
[4] Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos 2022
[5] Language models are few-shot learners 2020

Formal links

1 machine-checked theorem link

Cited by

18 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.139474Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

8a1bc595c370fcb023bf400745326cc699c5b84031d15b20656b81b207f550dc

Aliases

arxiv: 2303.17491 · arxiv_version: 2303.17491v3 · doi: 10.48550/arxiv.2303.17491 · pith_short_12: RIN4LFODOD6L · pith_short_16: RIN4LFODOD6LAI57 · pith_short_8: RIN4LFOD
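The short aliases above are prefixes of the full Pith Number, and the full number itself appears to be the first 26 characters of the Base32 encoding of the canonical hash. That relationship is inferred from the values on this page, not from a documented specification, so the Python sketch below should be read as an assumption; the function name `pith_number_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "8a1bc595c370fcb023bf400745326cc699c5b84031d15b20656b81b207f550dc"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is inferred
    from the values shown in this record, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    full = pith_number_from_hash(CANONICAL_HASH)
    print(full)  # RIN4LFODOD6LAI57IADUKMTMY2
    for n in (8, 12, 16):
        # Short aliases are plain prefixes of the full identifier.
        print(f"pith_short_{n}: {full[:n]}")
```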
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RIN4LFODOD6LAI57IADUKMTMY2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8a1bc595c370fcb023bf400745326cc699c5b84031d15b20656b81b207f550dc
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "69132e31a7eb5356f32a2e7c61b3915b3437ad300418ab3fa4792d45de0579dc",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.HC",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-03-30T16:01:52Z",
    "title_canon_sha256": "3052200996bccbdd3aea210ba8a81dc343126e1d0a133066f47e02fa852b0ed5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2303.17491",
    "kind": "arxiv",
    "version": 3
  }
}
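The canonicalization step in the shell one-liner above can also be written as a small Python function, which makes it clearer why the hash is reproducible: keys are sorted and whitespace is removed before hashing, so any two serializations of the same record yield the same digest. This is a sketch that mirrors the `json.dumps` arguments in the command; the function name `canonical_sha256` and the sample record are hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it with sorted keys and no extra whitespace."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input no longer matters
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8, not \u escapes
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical sample record: the same content with two different key orders.
    a = {"source": {"id": "2303.17491", "kind": "arxiv"}, "schema_version": "1.0"}
    b = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "2303.17491"}}
    print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check a real record, load the `canonical_record` JSON with `json.loads` and compare the result of `canonical_sha256` with the canonical hash listed on the page.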