Pith Number

pith:3BSYH5ML

pith:2026:3BSYH5ML66C6PQIBFRZ4HXGBA3
not attested · not anchored · not stored · refs resolved

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

Bingsheng He, Feng Yu, Hongshi Tan, Qingyun Zou, WengFai Wong

Software-tuned AI agents struggle with hardware engineering because bugs propagate through signal flows across instantiated modules rather than along call graphs.

arxiv:2605.15226 v1 · 2026-05-13 · cs.AR · cs.AI · cs.SE

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3BSYH5ML66C6PQIBFRZ4HXGBA3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Software and hardware are fundamentally different engineering tasks: the same agent's resolved rate drops by 37% to 58% when moving from SWE-bench Verified to Phoenix-bench, because hardware bugs propagate across parallel instantiated modules through signal flow rather than along a software-style call graph. Software-tuned agents stop at the symptom file instead of tracing back through the instantiation chain.

C2 · weakest assumption

The 511 instances drawn from 114 GitHub repositories, together with their developer patches and testbenches, form a representative sample of real-world hardware engineering work that requires repository navigation, hierarchy-aware localization, EDA verification, and multi-file patching.

C3 · one-line summary

Phoenix-bench shows that agentic AI systems lose 37-58% in resolved rate when moving from SWE-bench Verified to hardware tasks, because bugs spread across parallel modules via signal flow. Testbench feedback lifts performance by 42-45%, while file-level oracles add only 1.4%.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:00:47.281949Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f

Aliases

arxiv: 2605.15226 · arxiv_version: 2605.15226v1 · doi: 10.48550/arxiv.2605.15226 · pith_short_12: 3BSYH5ML66C6 · pith_short_16: 3BSYH5ML66C6PQIB · pith_short_8: 3BSYH5ML
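
The identifiers on this page appear to be related. The full Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. The sketch below checks this for the record shown here. It is an observation drawn from this page's values, not a documented derivation rule, and the function names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f"
PITH_NUMBER = "3BSYH5ML66C6PQIBFRZ4HXGBA3"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed rule: base32-encode the raw digest bytes and keep a prefix."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(number: str, sizes=(8, 12, 16)) -> dict:
    """Assumed rule: each short alias is a plain prefix of the full number."""
    return {f"pith_short_{n}": number[:n] for n in sizes}


derived = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)

print(derived)
print(aliases)
```

If the relationship holds for other records, a client could recompute the short forms locally rather than storing them, but that would need confirming against the schema before relying on it.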
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3BSYH5ML66C6PQIBFRZ4HXGBA3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "902eb9cd27c5ad8e290462ad99ed711299f34815d01cb4dc434a1f3b60727fdf",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.SE"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AR",
    "submitted_at": "2026-05-13T14:14:54Z",
    "title_canon_sha256": "b260dbe540a2309f007647add5e11c061faa3163f40b17033c76fa95326e86d5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15226",
    "kind": "arxiv",
    "version": 1
  }
}
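
The verification one-liner above can be unpacked into a small offline script. This is a minimal sketch that applies the same serialization settings as the embedded `python3 -c` command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the canonical record shown on this page. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical. The page states the digest should equal the canonical hash, so the script prints both for comparison rather than assuming the result.

```python
import hashlib
import json

# Canonical hash as listed on this page, printed for manual comparison.
EXPECTED = "d86583f58bf785e7c1012c73c3dcc106d5dca1a9ee298848b42ad23b7e77393f"

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "902eb9cd27c5ad8e290462ad99ed711299f34815d01cb4dc434a1f3b60727fdf",
        "cross_cats_sorted": ["cs.AI", "cs.SE"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AR",
        "submitted_at": "2026-05-13T14:14:54Z",
        "title_canon_sha256": "b260dbe540a2309f007647add5e11c061faa3163f40b17033c76fa95326e86d5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15226", "kind": "arxiv", "version": 1},
}


def canonical_bytes(obj: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then encode as UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print("computed:", digest)
print("expected:", EXPECTED)
print("match:", digest == EXPECTED)
```

Sorting keys before hashing makes the digest independent of the key order in which a mirror happens to store the record.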