Pith Number

pith:I5GZ6MFW

pith:2026:I5GZ6MFWK7FFEIOWPADQLIOKE2
not attested · not anchored · not stored · refs resolved

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Chaozheng Wang, Haau-sing Li, Hange Liu, Jen-tse Huang, Jingyu Xiao, Man Ho Lam, Michael R. Lyu, Terry Yue Zhuo

Coding agents resolve an average of 44.8 percent of chained release-level package upgrades while preserving prior functionality.

arxiv:2605.14415 v1 · 2026-05-14 · cs.SE · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{I5GZ6MFWK7FFEIOWPADQLIOKE2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
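
The page does not describe the merge algorithm itself. As a minimal sketch of what "deterministic" can mean here, the example below assumes events carry a timestamp, a field name, and a value (all hypothetical names), removes duplicates by content hash, and applies events in a canonical order so that arrival order cannot change the result. Signature checking is deliberately left out.

```python
import hashlib
import json
import random


def event_id(event: dict) -> str:
    """Content hash of an event; used as a tie-breaker so the ordering is total."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def merge(events: list[dict]) -> dict:
    """Fold events into a state dict in a canonical order.

    Duplicates are dropped by content hash, then events are applied sorted by
    (timestamp, content hash), so arrival order cannot affect the result.
    """
    unique = {event_id(e): e for e in events}
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["ts"], kv[0]))
    state: dict = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Hypothetical events: field names and values are illustrative only,
# not taken from the actual bundle format.
events = [
    {"ts": "2026-05-17T23:39:07Z", "field": "author_claim", "value": "open"},
    {"ts": "2026-05-18T08:00:00Z", "field": "citations", "value": "open"},
    {"ts": "2026-05-19T10:30:00Z", "field": "author_claim", "value": "claimed"},
]

shuffled = events[:]
random.shuffle(shuffled)
# A mirror that receives the same events in another order, with one duplicate,
# still computes the same state.
assert merge(events) == merge(shuffled + events[:1])
print(merge(events))
```

Sorting on a key derived only from event content is one common way to get this property; the real bundle may use a different ordering rule.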

Claims

C1 strongest claim

Across nine frontier agent-model configurations, agents achieve an average resolving rate of 44.8%, precision of 65.4%, and F1 of 50.2% under the Build+Fix regime, with Claude-Opus-4.7 leading at a 60.8% resolving rate. Current agents still struggle to make correct upgrades across chained package releases without breaking existing functionality.

C2 weakest assumption

The divide-and-conquer synthesis pipeline produces upgrade specifications that are both grounded in actual code changes and feasible for agents to implement without introducing artificial simplifications that do not occur in real maintenance.

C3 one line summary

SWE-Chain provides 155 chained version transitions and 1,660 requirements across 9 Python packages, where frontier agents resolve 44.8% of tasks on average and struggle to preserve functionality across releases.

Receipt and verification
First computed: 2026-05-17T23:39:07.314178Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

474d9f30b657ca5221d6780705a1ca269b60eecb7ae13c4f9d51845fcff4cd23

Aliases

arxiv: 2605.14415 · arxiv_version: 2605.14415v1 · doi: 10.48550/arxiv.2605.14415 · pith_short_12: I5GZ6MFWK7FF · pith_short_16: I5GZ6MFWK7FFEIOW · pith_short_8: I5GZ6MFW
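
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and the short aliases match prefixes of that number. This is an observation from the values shown here, not a documented rule, and the helper names below are hypothetical. A small sketch to check it:

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "474d9f30b657ca5221d6780705a1ca269b60eecb7ae13c4f9d51845fcff4cd23"
PITH_NUMBER = "I5GZ6MFWK7FFEIOWPADQLIOKE2"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(number: str) -> dict[str, str]:
    """Short aliases as plain prefixes of the full number (observed, not specified)."""
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)
print(short_aliases(derived))
```

If the relationship holds in general, a client could recompute the identifier from the hash alone, but that should be confirmed against the schema before relying on it.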
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/I5GZ6MFWK7FFEIOWPADQLIOKE2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 474d9f30b657ca5221d6780705a1ca269b60eecb7ae13c4f9d51845fcff4cd23
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d07b6fa4a78b1bd684a2c3c1593038313b95af421cf353c5eac080d737a608cf",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-14T06:04:40Z",
    "title_canon_sha256": "d29a8e6f072e3bfa4839c7409bdd0d6d6e603043268be74ba3cb087ad580d879"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14415",
    "kind": "arxiv",
    "version": 1
  }
}
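
The serialization step of the shell one-liner can also be written as a small Python function, which makes the canonicalization rules easier to read: sorted keys, compact separators, and unescaped non-ASCII characters, encoded as UTF-8 before hashing. The function names are hypothetical. Whether the digest equals the canonical hash listed on this page depends on the JSON shown above being exactly the record the service hashes, so the script prints the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical hash as listed on this page.
EXPECTED = "474d9f30b657ca5221d6780705a1ca269b60eecb7ae13c4f9d51845fcff4cd23"

# Canonical record JSON as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "d07b6fa4a78b1bd684a2c3c1593038313b95af421cf353c5eac080d737a608cf",
    "cross_cats_sorted": ["cs.AI", "cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-14T06:04:40Z",
    "title_canon_sha256": "d29a8e6f072e3bfa4839c7409bdd0d6d6e603043268be74ba3cb087ad580d879"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.14415", "kind": "arxiv", "version": 1}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches page:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON a mirror happens to serve.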