pith:H6UDO7IP
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering
Over 95% of coding-agent failures on enterprise SaaS tasks occur before the agents reach business logic.
arxiv:2605.17526 v1 · 2026-05-17 · cs.SE · cs.AI
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{H6UDO7IPENMTQKXAXTNMJ33II6}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Over 95% of task failures occur before agents even reach deep business logic, with models often falling victim to overconfidence and prematurely halting during foundational system setup, or getting trapped in ineffective debugging loops.
The 30 tasks and 5,370 validation nodes sufficiently capture the heterogeneity, coupling, and long-horizon constraints of real enterprise SaaS systems without introducing artificial simplifications that favor or penalize particular agent behaviors.
The paper introduces SaaSBench, a heterogeneous benchmark for enterprise SaaS engineering, and shows that over 95% of state-of-the-art coding agents' failures occur before they reach deep business logic, because of setup and integration problems.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:04:44.046775Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28
Verify this Pith Number yourself
```sh
curl -sH 'Accept: application/ld+json' https://pith.science/pith/H6UDO7IPENMTQKXAXTNMJ33II6 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28
```
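The command above checks only the canonical hash. The Pith Number itself appears to be derived from that hash: for the values on this page, the 26-character number equals the first 26 characters of the RFC 4648 Base32 encoding of the 32-byte SHA-256 digest (the first 130 bits), and the short form `pith:H6UDO7IP` is its first 8 characters. This relationship is inferred from the values shown here rather than documented on the page, so the sketch below is an assumption; `pith_number_from_hash` is a hypothetical helper name.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28"
PITH_NUMBER = "H6UDO7IPENMTQKXAXTNMJ33II6"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode the raw digest and truncate.

    Assumption (inferred, not documented): the Pith Number is the first
    `length` characters of the RFC 4648 Base32 (A-Z, 2-7) encoding of the
    SHA-256 digest bytes. 26 characters cover the first 130 bits.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived)                 # H6UDO7IPENMTQKXAXTNMJ33II6
    print(derived == PITH_NUMBER)  # True
    print("pith:" + derived[:8])   # pith:H6UDO7IP
```

Chaining this after the hash check would verify the number end to end, provided the inferred rule holds for other records too.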
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "163597358c8bb8172230ee1c963c10c414ddaed2eb7ac29ae3f07ca21a601e21",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-17T16:15:56Z",
    "title_canon_sha256": "8915d7c6cbf7d59a1304c8146e19184686d39e5b04b1747292b3026376a3e130"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17526",
    "kind": "arxiv",
    "version": 1
  }
}
```
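To recompute the hash offline, the same canonicalization performed by the Python stage of the verification command (sorted keys, `,` and `:` separators with no whitespace, non-ASCII characters left unescaped, UTF-8 encoding, then SHA-256) can be applied directly to the record above. This is a minimal sketch assuming the record shown is the exact `canonical_record` object served by the endpoint; `canonical_bytes` and `canonical_hash` are hypothetical helper names. If that assumption holds, the printed digest should equal the canonical hash listed on this page.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "163597358c8bb8172230ee1c963c10c414ddaed2eb7ac29ae3f07ca21a601e21",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-17T16:15:56Z",
        "title_canon_sha256": "8915d7c6cbf7d59a1304c8146e19184686d39e5b04b1747292b3026376a3e130",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17526", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with the same json.dumps settings as the verification one-liner."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    """Hypothetical helper: SHA-256 hex digest of the canonical bytes."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    print(canonical_hash(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the server or `jq` emits object keys.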