Pith Number

pith:YUTF4YCI

pith:2024:YUTF4YCITC3EGOBXRLV65Q3MUC
not attested · not anchored · not stored · refs resolved

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Huan Sun, Jihyung Kil, Yu Su

GPT-4V completes 51.1 percent of tasks on live websites when its textual plans are manually grounded into actions.

arxiv:2401.01614 v2 · 2024-01-03 · cs.IR · cs.AI · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YUTF4YCITC3EGOBXRLV65Q3MUC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
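
This record does not spell out the merge algorithm. As a minimal sketch of what a deterministic merge over signed events could look like, the Python below deduplicates events by content hash, sorts them by a total ordering key, and folds them into a state, so any mirror holding the same events reaches the same result regardless of arrival order. The event fields (`target`, `value`, `at`), the ordering rule, and the last-writer-wins fold are all assumptions made for illustration, not the Pith schema, and signature verification is omitted.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event (assumed: canonical JSON hashed with SHA-256)."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(*event_sets: list) -> dict:
    """Fold any number of event lists into one state, independent of input order."""
    # Deduplicate: the same event seen by two mirrors collapses to one entry.
    unique = {event_id(e): e for events in event_sets for e in events}
    # Total order: timestamp first, content hash as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {}
    for _, event in ordered:
        state[event["target"]] = event["value"]  # hypothetical last-writer-wins rule
    return state


if __name__ == "__main__":
    # Hypothetical events; field names and values are illustrative only.
    e1 = {"target": "archive", "value": "pending", "at": "2026-05-17T00:00:00Z"}
    e2 = {"target": "archive", "value": "stored", "at": "2026-05-18T00:00:00Z"}
    e3 = {"target": "claim", "value": "open", "at": "2026-05-17T00:00:00Z"}
    print(merge([e1, e3], [e2, e1]) == merge([e2], [e3, e1]))
```

The tie-breaker on the content hash is what makes the ordering total: two events with the same timestamp still sort the same way on every mirror.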

Claims

C1 · strongest claim

we show that GPT-4V presents a great potential for web agents -- it can successfully complete 51.1% of the tasks on live websites if we manually ground its textual plans into actions on the websites.

C2 · weakest assumption

That manual grounding of the model's textual plans provides a valid upper-bound proxy for evaluating the agent's planning and reasoning capability, while automatic grounding methods remain underdeveloped.

C3 · one line summary

GPT-4V achieves 51.1% success on live web tasks as a generalist agent when plans are manually grounded, outperforming text-only models, but automatic grounding lags far behind oracle performance.

References

42 extracted · 42 resolved · 13 Pith anchors

[1] Flamingo: a Visual Language Model for Few-Shot Learning · arXiv:2204.14198
[2] org/CorpusID:248476411
[3] Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic · arXiv:2306.15195
[4] org/CorpusID:259262082
[5] Scaling Instruction-Finetuned Language Models · arXiv:2210.11416

Formal links

2 machine-checked theorem links

Cited by

34 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.385915Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

c5265e604898b64338378aebeec36ca0a9bd6641f715b6691a2cec878dad0d8f

Aliases

arxiv: 2401.01614 · arxiv_version: 2401.01614v2 · doi: 10.48550/arxiv.2401.01614 · pith_short_12: YUTF4YCITC3E · pith_short_16: YUTF4YCITC3EGOBX · pith_short_8: YUTF4YCI
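
The identifiers on this record appear to be related by a simple encoding: Base32-encoding the canonical hash bytes and keeping the first 26 characters reproduces the full Pith Number, and each `pith_short_N` alias is its first N characters. This is inferred from the values shown here, not from a published specification, so the sketch below is only a consistency check under that assumption; the function names are illustrative.

```python
import base64

CANONICAL_HASH = "c5265e604898b64338378aebeec36ca0a9bd6641f715b6691a2cec878dad0d8f"
PITH_NUMBER = "YUTF4YCITC3EGOBXRLV65Q3MUC"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters (assumed rule)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Return the n-character prefix used as a short alias."""
    return pith_number[:n]


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for n in (8, 12, 16):
        print(n, short_alias(PITH_NUMBER, n))
```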
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YUTF4YCITC3EGOBXRLV65Q3MUC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c5265e604898b64338378aebeec36ca0a9bd6641f715b6691a2cec878dad0d8f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c5f8fd01b2b3b4d5ef5e685f92a621b6282414be9d2e0bdc8ea8d15f6eb155eb",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2024-01-03T08:33:09Z",
    "title_canon_sha256": "226fa896a9db28a6cfee31311a098a43f3414f63a0fab3c5023cb7fce7453933"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2401.01614",
    "kind": "arxiv",
    "version": 2
  }
}
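
The same check can be run offline against the record shown above. The sketch below mirrors the serialization used in the one-liner (sorted keys, compact separators, UTF-8, no ASCII escaping) and compares the digest with the canonical hash listed on this page. It assumes the JSON displayed here canonicalizes to the same bytes as the `canonical_record` returned by the endpoint; the function names are illustrative.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c5f8fd01b2b3b4d5ef5e685f92a621b6282414be9d2e0bdc8ea8d15f6eb155eb",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2024-01-03T08:33:09Z",
        "title_canon_sha256": "226fa896a9db28a6cfee31311a098a43f3414f63a0fab3c5023cb7fce7453933",
    },
    "schema_version": "1.0",
    "source": {"id": "2401.01614", "kind": "arxiv", "version": 2},
}
EXPECTED = "c5265e604898b64338378aebeec36ca0a9bd6641f715b6691a2cec878dad0d8f"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode()


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches canonical hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which fields appear in the downloaded JSON does not affect the digest.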