Pith Number

pith:WRJXYT43

pith:2026:WRJXYT43L5CHOHEVXRH55FQBDG

not attested · not anchored · not stored · refs resolved

SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle

Hao Guan, Kangning Zhang, Lingyue Fu, Lin Qiu, Shao Zhang, Weinan Zhang, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Yaoming Zhu, Yong Yu

Code agents show sharply lower success rates when handling complete issue resolution autonomously versus in isolated subtasks.

arxiv:2605.13139 v1 · 2026-05-13 · cs.SE


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{WRJXYT43L5CHOHEVXRH55FQBDG}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
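To make the idea of a deterministic merge concrete, here is a minimal sketch of one way a mirror could fold a set of events into the same state regardless of the order in which it received them. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions chosen for illustration; the page does not describe the actual Pith event schema or merge rules.

```python
import random
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict in a canonical order (hypothetical schema)."""
    state: dict = {}
    seen: set = set()
    # Sort by (timestamp, event id) so that arrival order never affects the result.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        if event["id"] in seen:
            # Duplicate events (for example from overlapping mirrors) are ignored.
            continue
        seen.add(event["id"])
        # Last writer in canonical order wins for each field.
        state[event["field"]] = event["value"]
    return state


# Hypothetical events; the field names and values are illustrative only.
sample_events = [
    {"id": "e1", "ts": "2026-05-18T03:08:57Z", "field": "refs", "value": "resolved"},
    {"id": "e2", "ts": "2026-05-18T04:00:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": "2026-05-19T00:00:00Z", "field": "author_claim", "value": "claimed"},
]

shuffled = sample_events + [sample_events[0]]  # include a duplicate delivery
random.shuffle(shuffled)

merged_in_order = merge_events(sample_events)
merged_shuffled = merge_events(shuffled)
print(merged_in_order == merged_shuffled)
```

The property that matters is that the result depends only on the set of events, not on delivery order or duplication, which is what lets independent mirrors agree on the current state.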

Claims

C1 · strongest claim

The results reveal a sharp drop in solve rates when transitioning from isolated tasks to FullCycle execution, exposing critical bottlenecks in handling cross-phase dependencies and maintaining code quality.

C2 · weakest assumption

The 489 rigorously filtered instances and the SWE-Judge evaluation accurately capture practical autonomy without introducing selection bias or verification errors that would change the observed performance drop.

C3 · one line summary

SWE-Cycle benchmark shows sharp drops in code agent success rates from isolated tasks to full autonomous issue resolution, highlighting cross-phase dependency issues.

References

66 extracted · 66 resolved · 12 Pith anchors

[1] Claude 4.6 sonnet system card

[2] URL https://assets.anthropic.com/m/785e231869ea8b3b/original/Claude-4-6-Sonnet-System-Card.pdf

[3] Anthropic. Claude code, 2025. URL https://github.com/anthropics/claude-code 2025

[4] Introducing claude opus 4.5 2025

[5] Why Do Multi-Agent LLM Systems Fail? 2025 · arXiv:2503.13657

Receipt and verification

First computed	2026-05-18T03:08:57.494442Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

b4537c4f9b5f44771c95bc4fde9601198855fb927d0086718ea87b4890ea082b

Aliases

arxiv: 2605.13139 · arxiv_version: 2605.13139v1 · doi: 10.48550/arxiv.2605.13139 · pith_short_12: WRJXYT43L5CH · pith_short_16: WRJXYT43L5CHOHEV · pith_short_8: WRJXYT43
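The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the Base32 encoding of the canonical hash, and each short alias is a prefix of the Pith Number. The sketch below reproduces that relationship using only the values shown here. The rule is inferred from this one record and is not taken from a Pith specification, so treat the function names and the derivation as assumptions.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "b4537c4f9b5f44771c95bc4fde9601198855fb927d0086718ea87b4890ea082b"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Inferred from the values on this page, not from a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the 8-, 12- and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith_number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_number)

print(pith_number)  # WRJXYT43L5CHOHEVXRH55FQBDG
print(aliases["pith_short_8"], aliases["pith_short_12"], aliases["pith_short_16"])
```

Because every alias is a prefix of the same string, a resolver can match a short form against the full identifier with a plain prefix comparison.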

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/WRJXYT43L5CHOHEVXRH55FQBDG \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b4537c4f9b5f44771c95bc4fde9601198855fb927d0086718ea87b4890ea082b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "a632e2b0355406cda9c6b3bf42793c472e422d164019278a193ecf76af82506e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-13T08:05:16Z",
    "title_canon_sha256": "b6ab07ec35f275f76d2d1c83327c48eedca8f2c143c8260e94e88a166c7db4bb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13139",
    "kind": "arxiv",
    "version": 1
  }
}
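
The hashing step of the verification command can also be run offline against the record printed above. This is a minimal sketch that applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). It assumes the JSON shown on this page is identical to what the resolver returns under `canonical_record`; the function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "a632e2b0355406cda9c6b3bf42793c472e422d164019278a193ecf76af82506e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-13T08:05:16Z",
    "title_canon_sha256": "b6ab07ec35f275f76d2d1c83327c48eedca8f2c143c8260e94e88a166c7db4bb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13139",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators removes the whitespace and key-order freedom that JSON otherwise allows, so two parties holding the same record obtain the same bytes and therefore the same digest.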