Pith Number

pith:B6N5RPC6

pith:2024:B6N5RPC67O33FJY2I5FPMYLZZO
not attested · not anchored · not stored · refs resolved

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin, Alexandre Lacoste, David Vazquez, Issam H. Laradji, Léo Boisvert, Manuel Del Verme, Massimo Caccia, Maxime Gasse, Megh Thakkar, Nicolas Chapados, Quentin Cappart, Tom Marty

Web agents based on large language models show some success on enterprise tasks but leave a large gap to full automation

arxiv:2403.07718 v5 · 2024-03-12 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B6N5RPC67O33FJY2I5FPMYLZZO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs

C2 weakest assumption

The 33 tasks chosen for WorkArena are representative of the typical daily work of knowledge workers utilizing enterprise software systems.

C3 one line summary

WorkArena benchmark shows LLM web agents achieve partial success on enterprise tasks but have a substantial gap to full automation and perform worse with open-source models.

References

36 extracted · 36 resolved · 12 Pith anchors

[1] The unsolved challenges of LLMs in open-ended web tasks: A case study 2023
[2] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. OpenAI Gym, 2016
[3] Mind2Web: Towards a Generalist Agent for the Web 2023 · arXiv:2306.06070
[4] Multimodal web navigation with instruction-finetuned foundation models 2023
[5] Chrome DevTools Protocol, 2023

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:53.769379Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad

Aliases

arxiv: 2403.07718 · arxiv_version: 2403.07718v5 · doi: 10.48550/arxiv.2403.07718 · pith_short_12: B6N5RPC67O33 · pith_short_16: B6N5RPC67O33FJY2 · pith_short_8: B6N5RPC6
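The aliases above appear to be related in a simple way: each pith_short_N value is the first N characters of the full Pith Number, and the full number matches the first 26 characters of the base32 encoding of the canonical hash. The Python sketch below checks both observations. The base32 relationship is inferred from the values on this page and is an assumption, not a documented rule of the pith-number/v1.0 schema; the helper names short_alias and pith_from_hash are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad"
PITH_NUMBER = "B6N5RPC67O33FJY2I5FPMYLZZO"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number (hypothetical helper)."""
    return pith_number[:length]


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32-encode the raw SHA-256 digest, then truncate."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
    print("derived from hash:", pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
```

If the assumption holds for other records too, a short alias can be expanded only by lookup, since truncation discards the remaining characters.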
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B6N5RPC67O33FJY2I5FPMYLZZO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "241cc0cc95b853603ea2fb29976c470fc5f752468f33e0ea0bfdf7a31e2cb398",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-03-12T14:58:45Z",
    "title_canon_sha256": "6ac53eeabc9ba4a7957514da4595c3bd216575a61e7de3fd99f2fd3b9d5a0af2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2403.07718",
    "kind": "arxiv",
    "version": 5
  }
}
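The same check as the curl pipeline above can be run offline against the canonical record shown here. The sketch below is a minimal Python version using the serialization settings from that one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes); the function names canonical_bytes and canonical_sha256 are hypothetical, and the printed digest should be compared by hand against the canonical hash listed on this page.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "241cc0cc95b853603ea2fb29976c470fc5f752468f33e0ea0bfdf7a31e2cb398",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-03-12T14:58:45Z",
        "title_canon_sha256": "6ac53eeabc9ba4a7957514da4595c3bd216575a61e7de3fd99f2fd3b9d5a0af2",
    },
    "schema_version": "1.0",
    "source": {"id": "2403.07718", "kind": "arxiv", "version": 5},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the one-liner above."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written or parsed.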