Pith Number

pith:GNDFWN72

pith:2026:GNDFWN72KKEFXVRCBNX2INK4V6
not attested · not anchored · not stored · refs resolved

From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

Jiakai Xu, Jingyu Xiao, Michael R Lyu, Tingshuo Liang, Yintong Huo, Yuxuan Wan

TDDev automates test-driven development so coding agents can generate functional full-stack web apps from requirements

arxiv:2605.17242 v1 · 2026-05-17 · cs.SE

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GNDFWN72KKEFXVRCBNX2INK4V6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
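
The merge algorithm itself is not described on this page. As a rough illustration of what "deterministic" means here, the sketch below folds a set of events into a state after sorting them into a canonical order, so that two mirrors receiving the same events in different orders arrive at the same result. The event fields (`id`, `timestamp`, `field`, `value`), the sample values, and the last-writer-wins rule are all hypothetical and are not taken from the Pith bundle format.

```python
from typing import Any

# Hypothetical event shape; the real bundle schema is not shown on this page.
Event = dict[str, Any]


def merge_events(events: list[Event]) -> dict[str, Any]:
    """Fold events into a state in a canonical order, independent of input order."""
    # Drop duplicate deliveries (assumes events sharing an id are identical).
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id) so every mirror applies events in the same total order.
    ordered = sorted(unique.values(), key=lambda e: (e["timestamp"], e["id"]))
    state: dict[str, Any] = {}
    for e in ordered:
        # Assumed rule: the last writer in canonical order wins.
        state[e["field"]] = e["value"]
    return state


# Made-up events, for illustration only.
events = [
    {"id": "e1", "timestamp": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e2", "timestamp": "2026-05-19T00:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e3", "timestamp": "2026-05-19T00:00:00Z", "field": "replications", "value": "open"},
]

if __name__ == "__main__":
    # Both orderings produce the same merged state.
    print(merge_events(events) == merge_events(list(reversed(events))))
```

The property that matters is that the output depends only on the set of events, not on arrival order or on which mirror runs the fold.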

Claims

C1 strongest claim

TDD infrastructure consistently improves generation quality by 34-48 percentage points over a no-TDD baseline. The central finding is that the optimal protocol depends on the model's generation style: models that build applications holistically benefit most from agentic enforcement, while models that extend code conservatively benefit from incremental enforcement.

C2 weakest assumption

The paper assumes that browser-based interaction simulation can reliably detect functional failures and translate them into structured repair reports that the coding agent can act on without human mediation. This is the step the paper presents as the core difficulty that current agents cannot perform on their own.

C3 one line summary

TDDev automates the full TDD loop for web app generation from requirements, delivering 34-48 percentage point quality gains and zero manual intervention in user studies.

References

59 extracted · 59 resolved · 2 Pith anchors

Receipt and verification
First computed 2026-05-20T00:03:47.134497Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

33465b37fa52885bd6220b6fa4355cafbc9d85e7634acae73e21ca62f9d45c2f

Aliases

arxiv: 2605.17242 · arxiv_version: 2605.17242v1 · doi: 10.48550/arxiv.2605.17242 · pith_short_12: GNDFWN72KKEF · pith_short_16: GNDFWN72KKEFXVRC · pith_short_8: GNDFWN72
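
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the Pith Number. This is an observation from the values shown here, not a documented derivation. The helper names below are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "33465b37fa52885bd6220b6fa4355cafbc9d85e7634acae73e21ca62f9d45c2f"
PITH_NUMBER = "GNDFWN72KKEFXVRCBNX2INK4V6"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the base32 encoding of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths: tuple[int, ...] = (8, 12, 16)) -> dict[str, str]:
    """Build the pith_short_N aliases as plain prefixes of the Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


if __name__ == "__main__":
    # Observed relationship on this page (an assumption about the scheme in general).
    print(base32_prefix(CANONICAL_HASH, 26) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```

If the relationship holds in general, a client could check that a Pith Number and a canonical hash belong together without any network call.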
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GNDFWN72KKEFXVRCBNX2INK4V6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 33465b37fa52885bd6220b6fa4355cafbc9d85e7634acae73e21ca62f9d45c2f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ec43e51d781993157f004d2f4a81905a76800d6adf4c5cb18e48d22478f539b2",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-17T03:48:41Z",
    "title_canon_sha256": "83c9de78f150d1c5f3f28e16a196ccf37f6a0df0b0a147a8ab48f5d0c47d32dd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17242",
    "kind": "arxiv",
    "version": 1
  }
}
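
The shell one-liner above can be unpacked into a small function for readability. The sketch below applies the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) before hashing with SHA-256. The function name `canonical_sha256` is hypothetical, and the record literal is transcribed from this page. If the transcription is faithful, the digest should equal the canonical hash listed above, but that has not been checked here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the page's verification command."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "ec43e51d781993157f004d2f4a81905a76800d6adf4c5cb18e48d22478f539b2",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-17T03:48:41Z",
        "title_canon_sha256": "83c9de78f150d1c5f3f28e16a196ccf37f6a0df0b0a147a8ab48f5d0c47d32dd",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17242", "kind": "arxiv", "version": 1},
}

if __name__ == "__main__":
    print(canonical_sha256(record))
```

Because keys are sorted before hashing, two clients that parse the same record into differently ordered dictionaries still compute the same digest.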