Pith Number

pith:H6UDO7IP

pith:2026:H6UDO7IPENMTQKXAXTNMJ33II6
not attested · not anchored · not stored · refs resolved

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

Feng Zhao, Kou Shi, Lin Chen, Qingnan Ren, Qisheng Su, Shiting Huang, Shun Zou, Xiangxiang Chu, Yiming Zhao, Yong Wang, Yu Zeng, Zehui Chen, Zhen Fang, Ziao Zhang

Over 95% of coding-agent failures on enterprise SaaS tasks occur before the agent reaches deep business logic.

arxiv:2605.17526 v1 · 2026-05-17 · cs.SE · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{H6UDO7IPENMTQKXAXTNMJ33II6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
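
The bundle format and the merge algorithm are not specified on this page. As a generic illustration only, the sketch below shows one way a merge can be made deterministic: deduplicate events by ID, sort them by a total order, and fold them into a state. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for this example, and signature checking is omitted.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    """
    # Drop duplicate deliveries of the same event.
    unique = {event["id"]: event for event in events}
    state = {}
    # Sorting by (ts, id) gives every mirror the same total order.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Illustrative events only; these are not taken from the real bundle.
events = [
    {"id": "e1", "ts": 1, "field": "anchored", "value": False},
    {"id": "e2", "ts": 2, "field": "anchored", "value": True},
    {"id": "e3", "ts": 2, "field": "citations", "value": 1},
]

state_a = merge_events(events)
# A mirror that receives the events reversed, with one duplicate.
state_b = merge_events(reversed(events + events[:1]))
print(state_a == state_b)
```

Because the fold runs over a sorted, deduplicated sequence, two mirrors holding the same set of events reach the same state regardless of the order in which they received them.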

Claims

C1 · strongest claim

Over 95% of task failures occur before agents even reach deep business logic, with models often falling victim to overconfidence and prematurely halting during foundational system setup, or getting trapped in ineffective debugging loops.

C2 · weakest assumption

The 30 tasks and 5,370 validation nodes sufficiently capture the heterogeneity, coupling, and long-horizon constraints of real enterprise SaaS systems without introducing artificial simplifications that favor or penalize particular agent behaviors.

C3 · one-line summary

SaaSBench introduces a heterogeneous benchmark for enterprise SaaS engineering and shows that over 95% of the task failures of state-of-the-art coding agents occur before the agent reaches deep business logic, owing to setup and integration problems.

References

56 extracted · 56 resolved · 8 Pith anchors

[1] Claude Code: AI-powered coding assistant, 2024
[2] System card: Claude Opus 4 & Claude Sonnet 4, 2025
[3] System card: Claude Sonnet 4.5, 2025
[4] Introducing Claude Opus 4.7, 2026
[5] Program Synthesis with Large Language Models, 2021 · arXiv:2108.07732

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:04:44.046775Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28

Aliases

arxiv: 2605.17526 · arxiv_version: 2605.17526v1 · doi: 10.48550/arxiv.2605.17526 · pith_short_12: H6UDO7IPENMT · pith_short_16: H6UDO7IPENMTQKXA · pith_short_8: H6UDO7IP
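
The short aliases above are prefixes of the full Pith Number. The values on this page also suggest that the full number is the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That relationship is an observation from this one record, not a rule stated here, so treat the sketch below as a consistency check under that assumption. The helper names are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28"
FULL_NUMBER = "H6UDO7IPENMTQKXAXTNMJ33II6"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode a hex digest and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(number: str) -> dict:
    """Build the 8-, 12- and 16-character prefix aliases of a number."""
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


# Assumption: the number is a truncated base32 rendering of the hash.
derived = base32_prefix(CANONICAL_HASH, len(FULL_NUMBER))
aliases = short_aliases(FULL_NUMBER)

print(derived == FULL_NUMBER)  # True for the values on this page
print(aliases)
```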
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/H6UDO7IPENMTQKXAXTNMJ33II6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "163597358c8bb8172230ee1c963c10c414ddaed2eb7ac29ae3f07ca21a601e21",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-17T16:15:56Z",
    "title_canon_sha256": "8915d7c6cbf7d59a1304c8146e19184686d39e5b04b1747292b3026376a3e130"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17526",
    "kind": "arxiv",
    "version": 1
  }
}
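
The verification command above can also be run offline against the record as printed here. The sketch below repeats the same canonicalization as the `python3 -c` one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8) and compares the digest with the published hash. It assumes the JSON shown on this page is byte-for-byte what the API returns in `canonical_record`; if the comparison prints `False`, that assumption is the first thing to check. The function names are hypothetical.

```python
import hashlib
import json

# Canonical record as printed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "163597358c8bb8172230ee1c963c10c414ddaed2eb7ac29ae3f07ca21a601e21",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-17T16:15:56Z",
        "title_canon_sha256": "8915d7c6cbf7d59a1304c8146e19184686d39e5b04b1747292b3026376a3e130",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17526", "kind": "arxiv", "version": 1},
}

EXPECTED = "3fa8377d0f2359382ae0bcdac4ef684789c1edcd81da74c9e5535912cce13f28"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happened to be formatted or ordered when it was served.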