EconWebArena: Benchmarking autonomous agents on economic tasks in realistic web environments. arXiv preprint arXiv:2506.08136

Zefang Liu, Yinzhu Quan · arXiv 2506.08136


1 Pith paper citing it


representative citing papers

cs.CL · 2026-04-09 · unverdicted · novelty 7.0

ClawBench is a benchmark of 153 live-web tasks where AI agents achieve low success rates, e.g. 33.3% for Claude Sonnet 4.6.


ClawBench: Can AI Agents Complete Everyday Online Tasks? cs.CL · 2026-04-09 · unverdicted · none · ref 5