pith:3BSYH5ML
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench
Software-tuned AI agents struggle with hardware engineering because bugs propagate through signal flows across instantiated modules rather than along call graphs.
arxiv:2605.15226 v1 · 2026-05-13 · cs.AR · cs.AI · cs.SE
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3BSYH5ML66C6PQIBFRZ4HXGBA3}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Software and hardware are fundamentally different engineering tasks. The same agent loses 37% to 58% in resolved rate when moving from SWE-bench Verified to Phoenix-bench, because hardware bugs propagate across parallel instantiated modules through signal flow rather than along a software-style call graph. Software-tuned agents stop at the symptom file instead of tracing back through the instantiation chain.
The 511 instances drawn from 114 GitHub repositories, together with their developer patches and testbenches, form a representative sample of real-world hardware engineering work that requires repository navigation, hierarchy-aware localization, EDA verification, and multi-file patching.
Phoenix-bench shows that agentic AI systems lose 37-58% in resolved rate when moving from SWE-bench Verified to hardware tasks, because bugs spread across parallel modules via signal flow. Testbench feedback lifts performance by 42-45%, while file-level oracles add only 1.4%.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:47.281949Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f
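The identifiers on this page look related: Base32-encoding the canonical hash above and keeping the first 26 characters reproduces the Pith Number, and the first 8 characters give the short form. This is an observation drawn from the values shown here, not a rule the page documents, and the function name `pith_number_from_hash` is hypothetical. A minimal sketch under that assumption:

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix.

    Assumption: the 26-character prefix length is inferred from the
    identifier shown on this page, not taken from a published spec.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


canonical_hash = "d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f"
pith_number = pith_number_from_hash(canonical_hash)

print(pith_number)      # 3BSYH5ML66C6PQIBFRZ4HXGBA3
print(pith_number[:8])  # 3BSYH5ML
```

If the relationship holds in general, a mismatch between a recomputed hash and a Pith Number would show up in the identifier itself, without a separate lookup.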
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3BSYH5ML66C6PQIBFRZ4HXGBA3 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f
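The same check can be sketched in plain Python, without `curl` or `jq`. The serialization mirrors the one-liner above (sorted keys, compact separators, non-ASCII characters kept as-is, UTF-8 bytes). The function names `canonical_sha256`, `fetch_canonical_record`, and `verify` are hypothetical, and the sketch assumes the endpoint returns a JSON object with a `canonical_record` key, as the `jq` filter implies.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record: dict) -> str:
    """Hash a record using the canonical form from the one-liner above."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str, timeout: float = 30.0) -> dict:
    """Fetch the JSON-LD document and return its `canonical_record` field."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/ld+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["canonical_record"]


def verify(url: str, expected_hash: str) -> bool:
    """Return True when the recomputed hash matches the published one."""
    return canonical_sha256(fetch_canonical_record(url)) == expected_hash
```

Calling `verify("https://pith.science/pith/3BSYH5ML66C6PQIBFRZ4HXGBA3", "d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f")` would perform the network request and the comparison. Keeping `canonical_sha256` separate from the fetch lets the hashing step be checked offline against a saved copy of the record.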
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "902eb9cd27c5ad8e290462ad99ed711299f34815d01cb4dc434a1f3b60727fdf",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.SE"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AR",
    "submitted_at": "2026-05-13T14:14:54Z",
    "title_canon_sha256": "b260dbe540a2309f007647add5e11c061faa3163f40b17033c76fa95326e86d5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15226",
    "kind": "arxiv",
    "version": 1
  }
}