Pith Number

pith:P3SW5TQD

pith:2026:P3SW5TQDWJ66SP67MI52MELCU3
not attested · not anchored · not stored · refs pending

Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?

Honghua Dong, Miaosen Chai, Robin Jia, Shangshang Wang, Song Bian, Wang Bill Zhu, Willie Neiswanger, Yejia Liu

Frontier LLMs pass debugging tests above 76 percent yet edit with precision below 45 percent, often regenerating entire solutions instead of making targeted fixes.

arxiv:2604.17338 v4 · 2026-04-19 · cs.SE · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{P3SW5TQDWJ66SP67MI52MELCU3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
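
The record does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic merge over signed events could look like. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for illustration, not the Pith schema, and signature checking is omitted.

```python
from typing import Iterable


def merge_events(*event_sets: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Assumes an id uniquely identifies an event's content.
    """
    # Union all events by id so duplicates across mirrors collapse.
    by_id = {}
    for events in event_sets:
        for ev in events:
            by_id[ev["id"]] = ev

    # Impose a total order: timestamp first, id as the tie-breaker.
    ordered = sorted(by_id.values(), key=lambda ev: (ev["ts"], ev["id"]))

    # Last writer wins per field under that fixed order.
    state = {}
    for ev in ordered:
        state[ev["field"]] = ev["value"]
    return state


mirror_a = [
    {"id": "e1", "ts": "2026-05-20T00:02:11Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-21T10:00:00Z", "field": "author_claim", "value": "open"},
]
mirror_b = [
    {"id": "e3", "ts": "2026-05-22T08:30:00Z", "field": "citations", "value": "closed"},
    {"id": "e1", "ts": "2026-05-20T00:02:11Z", "field": "citations", "value": "open"},
]

# Both orders yield the same state.
print(merge_events(mirror_a, mirror_b) == merge_events(mirror_b, mirror_a))
```

Sorting on a (timestamp, id) pair is what makes the result independent of which mirror supplied which event; any other total order would serve the same purpose.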

Claims

C1 strongest claim

frontier models, such as GPT-5.1-Codex and DeepSeek-V3.2-Thinking, achieve unit-test pass rates above 76% but exhibit precision below 45%, even when explicitly instructed to perform minimal debugging.

C2 weakest assumption

The automatically synthesized verified atomic bugs and their compositions accurately represent real-world debugging scenarios, and the new edit-level precision and bug-level recall metrics validly measure 'precise debugging' behavior.

C3 one line summary

Frontier LLMs pass unit tests over 76% of the time on debugging tasks but achieve edit precision below 45%, indicating regeneration rather than precise debugging.
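
To make the gap between pass rate and precision concrete, here is a toy sketch of line-based edit precision and bug recall. This record does not give the paper's formulas, so the definitions below (precision as the share of edited lines that belong to a known bug, recall as the share of bugs whose lines were touched) are assumptions for illustration, and the function names are hypothetical.

```python
def edit_precision(edited_lines: set, bug_lines: set) -> float:
    """Fraction of edited lines that coincide with known buggy lines (assumed definition)."""
    if not edited_lines:
        return 0.0
    return len(edited_lines & bug_lines) / len(edited_lines)


def bug_recall(edited_lines: set, bugs: list) -> float:
    """Fraction of bugs with at least one buggy line edited (assumed definition).

    Touching a line is used as a crude stand-in for fixing the bug.
    """
    if not bugs:
        return 0.0
    touched = sum(1 for bug in bugs if bug & edited_lines)
    return touched / len(bugs)


# A 10-line program with two single-line bugs, on lines 3 and 7.
bugs = [{3}, {7}]
all_bug_lines = set().union(*bugs)

regenerated = set(range(1, 11))   # model rewrites every line
targeted = {3, 7}                 # model edits only the buggy lines

print(edit_precision(regenerated, all_bug_lines), bug_recall(regenerated, bugs))
print(edit_precision(targeted, all_bug_lines), bug_recall(targeted, bugs))
```

Under these assumed definitions, the regenerated solution reaches full recall while its precision drops to 0.2, which is the pattern the claims above describe: tests can pass even when most edits were unnecessary.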

Receipt and verification
First computed 2026-05-20T00:02:11.746673Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

7ee56ece03b27de93fdf623ba61162a6eebf7b5c39e66ab2415d64668e99915c

Aliases

arxiv: 2604.17338 · arxiv_version: 2604.17338v4 · doi: 10.48550/arxiv.2604.17338 · pith_short_12: P3SW5TQDWJ66 · pith_short_16: P3SW5TQDWJ66SP67 · pith_short_8: P3SW5TQD
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/P3SW5TQDWJ66SP67MI52MELCU3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7ee56ece03b27de93fdf623ba61162a6eebf7b5c39e66ab2415d64668e99915c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "603fc203afb649abde113a1836b1896540aab56d93223a4ca49f3caeb2df3143",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-04-19T09:08:23Z",
    "title_canon_sha256": "ae186650e23b68441935990920cd0cedd02e5a1ea8d2f2523644cc1cb669b4ac"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.17338",
    "kind": "arxiv",
    "version": 4
  }
}
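
The verification one-liner above can also be written out as a small Python function. This sketch applies the same canonicalization as the command (sorted keys, compact separators, non-ASCII preserved, UTF-8 encoding) to the record shown here; the function name `canonical_hash` is a label chosen for this example, and whether the digest matches the published hash depends on the served record being identical to the JSON above.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over the canonical JSON form used by the verification command."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "603fc203afb649abde113a1836b1896540aab56d93223a4ca49f3caeb2df3143",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-04-19T09:08:23Z",
        "title_canon_sha256": "ae186650e23b68441935990920cd0cedd02e5a1ea8d2f2523644cc1cb669b4ac",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.17338", "kind": "arxiv", "version": 4},
}

# Compare this digest with the value listed under "Canonical hash".
print(canonical_hash(record))
```

Because keys are sorted before hashing, the digest is the same no matter how the JSON object was ordered when it was fetched.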