Pith Number

pith:2XD7AD2D

pith:2026:2XD7AD2DV4LE6DKOARAWLZULHA
not attested · not anchored · not stored · refs resolved

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Benjamin Steenhoek, Gaurav Mittal, Pingping Lin, Priyam Sahoo, Shengjie Ma, Xiaomin Li, Yu Hu

Binary pass rates in SWE-agent tests equate chaotic trial-and-error successes with systematic ones.

arxiv:2605.12925 v1 · 2026-05-13 · cs.SE · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2XD7AD2DV4LE6DKOARAWLZULHA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
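One way a merge can be made deterministic is to deduplicate events, sort them by a total order, and fold them into a state. The sketch below illustrates that general idea only. The event fields (`id`, `ts`, `field`, `value`) and the last-write-wins rule are hypothetical and are not taken from the Pith bundle format, and signature checking is omitted.

```python
import json


def merge_events(*event_logs):
    """Fold several event logs into one state, independent of input order.

    Hypothetical event shape: {"id": str, "ts": int, "field": str, "value": str}.
    """
    unique = {}
    for log in event_logs:
        for event in log:
            # Key each event by its canonical JSON so exact duplicates collapse.
            key = json.dumps(event, sort_keys=True)
            unique[key] = event

    # Total order: timestamp, then event id, then canonical JSON as a tie-breaker.
    ordered = sorted(
        unique.items(),
        key=lambda kv: (kv[1]["ts"], kv[1]["id"], kv[0]),
    )

    state = {}
    for _, event in ordered:
        # Last write wins under the total order above (an assumed policy).
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, for illustration only.
e1 = {"id": "e1", "ts": 1, "field": "status", "value": "draft"}
e2 = {"id": "e2", "ts": 2, "field": "status", "value": "final"}

# Two mirrors that received the events in different orders reach the same state.
print(merge_events([e1, e2]) == merge_events([e2], [e1, e1]))
```

Because the ordering depends only on event content, any mirror holding the same set of events would compute the same state regardless of arrival order.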

Claims

C1 strongest claim

Among passing trajectories in this subset, 10.7% exhibit behavior we call a Lucky Pass: regression cycles, blind retries, missing verification, or temporally disordered exploration, implementation, and verification.

C2 weakest assumption

That Prefix Tree Acceptor references built by merging multiple passing solutions accurately represent principled behavior without incorporating lucky elements from the source trajectories.

C3 one line summary

10.7% of passing SWE-agent trajectories are Lucky Passes with chaotic behaviors, and a quality score based on process references changes model rankings across eight backends.
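For readers unfamiliar with the term in C2, a Prefix Tree Acceptor is a trie over symbol sequences whose accepting states mark the ends of the sequences it was built from. The sketch below is a generic illustration, assuming each trajectory has already been abstracted into a sequence of phase labels. The labels, the helper names, and the acceptance check are hypothetical and are not the paper's reference construction or quality score.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One state of the prefix tree."""
    children: dict = field(default_factory=dict)
    accepting: bool = False


def build_pta(sequences):
    """Merge several label sequences into a single prefix tree."""
    root = Node()
    for seq in sequences:
        node = root
        for symbol in seq:
            # Shared prefixes reuse existing states; new symbols add a branch.
            node = node.children.setdefault(symbol, Node())
        node.accepting = True
    return root


def accepts(root, seq):
    """Return True if seq is exactly one of the merged sequences."""
    node = root
    for symbol in seq:
        node = node.children.get(symbol)
        if node is None:
            return False
    return node.accepting


# Hypothetical phase labels for two passing trajectories.
passing = [
    ["explore", "implement", "verify"],
    ["explore", "explore", "implement", "verify"],
]
pta = build_pta(passing)

print(accepts(pta, ["explore", "implement", "verify"]))   # True
print(accepts(pta, ["implement", "explore", "verify"]))   # False
print(accepts(pta, ["explore", "implement"]))             # False
```

The sketch also makes the concern in C2 concrete: whatever sequences are merged become accepted behavior, so a chaotic source trajectory would be accepted too.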

References

61 extracted · 61 resolved · 14 Pith anchors

Receipt and verification
First computed: 2026-05-18T03:09:10.100969Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d

Aliases

arxiv: 2605.12925 · arxiv_version: 2605.12925v1 · doi: 10.48550/arxiv.2605.12925 · pith_short_12: 2XD7AD2DV4LE · pith_short_16: 2XD7AD2DV4LE6DKO · pith_short_8: 2XD7AD2D
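The identifiers on this page appear to be related. The short aliases are prefixes of the full Pith Number, and the Pith Number matches the unpadded Base32 (RFC 4648) encoding of the first 16 bytes of the canonical hash. This is an inference from the values shown here, not a documented rule of the pith-number/v1.0 schema, and the function names are illustrative.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d"
PITH_NUMBER = "2XD7AD2DV4LE6DKOARAWLZULHA"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and strip '=' padding.

    Assumption: 16 bytes is inferred from the 26-character number on this page.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Return the leading `length` characters, as the pith_short_* aliases do."""
    return pith_number[:length]


derived = pith_from_hash(CANONICAL_HASH)
print(derived)                 # 2XD7AD2DV4LE6DKOARAWLZULHA
print(derived == PITH_NUMBER)  # True
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(derived, n)}")
```

If the relationship holds in general, a verified canonical hash is enough to recompute the Pith Number and its short forms offline.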
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2XD7AD2DV4LE6DKOARAWLZULHA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "803d155a77ded41b300fe34dced984f3eb46a0af65c306e31432ddf0c8f2c16a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-13T03:00:57Z",
    "title_canon_sha256": "07872682c0cfb5124101ff197f31db38f83c886e616dae8057e2bc39abffda39"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12925",
    "kind": "arxiv",
    "version": 1
  }
}
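The one-line verification command can also be written as a short script, which makes the canonicalization steps easier to read: sort the keys, use compact separators, keep non-ASCII characters unescaped, encode as UTF-8, then take the SHA-256. The sketch below applies those same steps to the record shown above, pasted in as a literal rather than fetched. The function name `canonical_hash` is illustrative.

```python
import hashlib
import json

# Hash published on this page for the canonical record.
EXPECTED = "d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d"

# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "803d155a77ded41b300fe34dced984f3eb46a0af65c306e31432ddf0c8f2c16a",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-13T03:00:57Z",
        "title_canon_sha256": "07872682c0cfb5124101ff197f31db38f83c886e616dae8057e2bc39abffda39",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12925", "kind": "arxiv", "version": 1},
}


def canonical_hash(obj) -> str:
    """SHA-256 over JSON with sorted keys and compact separators, as in the one-liner."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys before hashing means the order in which a client stores or prints the fields does not affect the result.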