Workspace-Bench reveals that AI agents achieve only 43.3% average success on workspace tasks with large-scale file dependencies, compared to 80.7% for humans.
Employ any necessary tools—such as code-based parsing scripts or vision-based image conversion—to accurately extract content
One Pith paper cites this work. Polarity classification is still being indexed.
Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies