Pith Number

pith:B6N5RPC6

pith:2024:B6N5RPC67O33FJY2I5FPMYLZZO

not attested · not anchored · not stored · refs resolved

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin, Alexandre Lacoste, David Vazquez, Issam H. Laradji, Léo Boisvert, Manuel Del Verme, Massimo Caccia, Maxime Gasse, Megh Thakkar, Nicolas Chapados, Quentin Cappart, Tom Marty

Web agents based on large language models show some success on enterprise tasks but leave a large gap to full automation

arxiv:2403.07718 v5 · 2024-03-12 · cs.LG · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{B6N5RPC67O33FJY2I5FPMYLZZO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
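The page does not spell out the merge algorithm, so the following is only a generic illustration of what a deterministic, order-independent merge can look like. It is not Pith's actual procedure. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for the example.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"id": str, "ts": str, "field": str, "value": Any}
    """
    # Deduplicate by id so that re-delivered events change nothing
    # (assumes two events with the same id are identical).
    unique = {e["id"]: e for e in events}
    # Impose a total order: timestamp first, id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer wins
    return state


# Made-up events, used only to exercise the function.
events = [
    {"id": "e1", "ts": "2026-05-17T23:38:53Z", "field": "a", "value": 1},
    {"id": "e2", "ts": "2026-05-18T09:00:00Z", "field": "a", "value": 2},
    {"id": "e3", "ts": "2026-05-18T09:00:00Z", "field": "b", "value": 3},
]

# Two mirrors that receive the events in different orders reach the same state.
assert merge_events(events) == merge_events(list(reversed(events)))
```

The point of the sketch is the sort on a total order before folding: any mirror holding the same set of events replays them identically, regardless of how the bundle was transported.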

Claims

C1 · strongest claim

while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs

C2 · weakest assumption

The 33 tasks chosen for WorkArena are representative of the typical daily work of knowledge workers utilizing enterprise software systems.

C3 · one line summary

The WorkArena benchmark shows that LLM-based web agents achieve partial success on enterprise tasks, remain far from full automation, and perform worse with open-source models than with closed-source ones.

References

36 extracted · 36 resolved · 12 Pith anchors

[1] The unsolved challenges of LLMs in open-ended web tasks: A case study 2023

[2] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. OpenAI Gym, 2016

[3] Mind2Web: Towards a Generalist Agent for the Web 2023 · arXiv:2306.06070

[4] Multimodal web navigation with instruction-finetuned foundation models 2023

[5] Chrome DevTools Protocol, 2023

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation

SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?

Receipt and verification

First computed	2026-05-17T23:38:53.769379Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad

Aliases

arxiv: 2403.07718 · arxiv_version: 2403.07718v5 · doi: 10.48550/arxiv.2403.07718 · pith_short_12: B6N5RPC67O33 · pith_short_16: B6N5RPC67O33FJY2 · pith_short_8: B6N5RPC6
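The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the leading characters of the RFC 4648 Base32 encoding of the canonical hash, and each `pith_short_N` alias is its first N characters. This relationship is inferred from the values shown here rather than taken from a specification, so treat the sketch below as a consistency check. `pith_from_hash` and `short_aliases` are hypothetical helper names.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad"
PITH_NUMBER = "B6N5RPC67O33FJY2I5FPMYLZZO"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    The length of 26 is an assumption read off the identifier on this page.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith: str, sizes: tuple[int, ...] = (8, 12, 16)) -> dict[str, str]:
    """Build the pith_short_N aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith[:n] for n in sizes}


assert pith_from_hash(CANONICAL_HASH) == PITH_NUMBER
print(short_aliases(PITH_NUMBER))
```

If the inferred relationship holds for other records, any alias can be checked against a hash without contacting the resolver. A mismatch would mean either the record changed or the inference does not generalize.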

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/B6N5RPC67O33FJY2I5FPMYLZZO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0f9bd8bc5efbb7b2a71a474af66179cb8b53111d2184deeaedb5e532799e08ad

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "241cc0cc95b853603ea2fb29976c470fc5f752468f33e0ea0bfdf7a31e2cb398",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-03-12T14:58:45Z",
    "title_canon_sha256": "6ac53eeabc9ba4a7957514da4595c3bd216575a61e7de3fd99f2fd3b9d5a0af2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2403.07718",
    "kind": "arxiv",
    "version": 5
  }
}
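The shell one-liner under "Verify this Pith Number yourself" can also be run offline. The sketch below applies the same serialization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record printed above and hashes it with SHA-256. `canonical_hash` is a hypothetical helper name, and the digest it prints is meant to be compared by hand with the Canonical hash on this page.

```python
import hashlib
import json

# The canonical record, copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "241cc0cc95b853603ea2fb29976c470fc5f752468f33e0ea0bfdf7a31e2cb398",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-03-12T14:58:45Z",
    "title_canon_sha256": "6ac53eeabc9ba4a7957514da4595c3bd216575a61e7de3fd99f2fd3b9d5a0af2"
  },
  "schema_version": "1.0",
  "source": {"id": "2403.07718", "kind": "arxiv", "version": 5}
}
"""


def canonical_hash(record: dict) -> str:
    """Serialize the record the same way as the shell command, then SHA-256 it."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
print(canonical_hash(record))
```

Because keys are sorted and whitespace is removed before hashing, the digest depends only on the record's content, not on how a particular mirror happened to format the JSON.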