SaaS-Bench provides 106 realistic professional tasks across 23 deployable SaaS platforms to evaluate LLM-based agents, finding that even the strongest models complete fewer than 4% of tasks end-to-end.
arXiv preprint arXiv:2510.13394 , year=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?
SaaS-Bench provides 106 realistic professional tasks across 23 deployable SaaS platforms to evaluate LLM-based agents, finding that even the strongest models complete fewer than 4% of tasks end-to-end.