pith:2XD7AD2D
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
Binary pass rates in SWE-agent tests equate chaotic trial-and-error successes with systematic ones.
arxiv:2605.12925 v1 · 2026-05-13 · cs.SE · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2XD7AD2DV4LE6DKOARAWLZULHA}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Among passing trajectories in this subset, 10.7% exhibit behavior we call a Lucky Pass: regression cycles, blind retries, missing verification, or temporally disordered exploration, implementation, and verification.
The approach assumes that Prefix Tree Acceptor references, built by merging multiple passing solutions, accurately represent principled behavior without incorporating lucky elements from the source trajectories.
10.7% of passing SWE-agent trajectories are Lucky Passes with chaotic behaviors, and a quality score based on process references changes model rankings across eight backends.
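To make the Prefix Tree Acceptor idea in the claims above concrete, here is a minimal sketch of the general data structure: several passing action sequences are merged into one tree by sharing their common prefixes, and a new trajectory is then checked against that tree. This is not the paper's implementation. The names `PTANode`, `build_pta`, `accepts`, and `matched_prefix_len`, and the action labels `explore`, `implement`, and `verify`, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PTANode:
    """One state of the prefix tree; outgoing edges are labelled by action name."""
    children: dict[str, "PTANode"] = field(default_factory=dict)
    accepting: bool = False  # True if some reference sequence ends at this state


def build_pta(sequences: list[list[str]]) -> PTANode:
    """Merge reference sequences into one tree by sharing common prefixes."""
    root = PTANode()
    for seq in sequences:
        node = root
        for action in seq:
            node = node.children.setdefault(action, PTANode())
        node.accepting = True
    return root


def accepts(root: PTANode, seq: list[str]) -> bool:
    """True only if seq follows existing edges and ends in an accepting state."""
    node = root
    for action in seq:
        node = node.children.get(action)
        if node is None:
            return False
    return node.accepting


def matched_prefix_len(root: PTANode, seq: list[str]) -> int:
    """Count how many leading actions of seq follow a path in the tree."""
    node, matched = root, 0
    for action in seq:
        nxt = node.children.get(action)
        if nxt is None:
            break
        node, matched = nxt, matched + 1
    return matched


# Hypothetical reference trajectories (illustrative labels, not from the paper).
references = [
    ["explore", "implement", "verify"],
    ["explore", "implement", "implement", "verify"],
]
pta = build_pta(references)
in_order = accepts(pta, ["explore", "implement", "verify"])
out_of_order = accepts(pta, ["implement", "explore", "verify"])
partial = matched_prefix_len(pta, ["explore", "implement", "retry"])
```

A plain acceptor gives only a yes/no answer, so `matched_prefix_len` is included as one possible way to get a graded signal. How the paper turns its references into a quality score is not stated on this page.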
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:10.100969Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2XD7AD2DV4LE6DKOARAWLZULHA \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d5c7f00f43af164f0d4e044165e68b380302144e136f8c90ba9bcc3473f5877d
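The Python one-liner in the command above can be unpacked into a small function, shown here as a sketch of the same canonicalisation step: serialise the record with sorted keys, compact separators, and unescaped non-ASCII characters, then take the SHA-256 of the UTF-8 bytes. The function names `canonical_sha256` and `matches` and the toy record are hypothetical. The sketch makes no network request, and it assumes the record has already been fetched and parsed into a `dict`.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same json.dumps settings as the one-liner above."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is, encoded as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest with a published hex digest, ignoring case."""
    return canonical_sha256(record) == expected_hex.strip().lower()


# Toy records (hypothetical) showing that key order does not change the digest.
digest_a = canonical_sha256({"schema_version": "1.0", "source": {"id": "x", "version": 1}})
digest_b = canonical_sha256({"source": {"version": 1, "id": "x"}, "schema_version": "1.0"})
```

To check a real record, pass the parsed `canonical_record` object and the published canonical hash to `matches`.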
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "803d155a77ded41b300fe34dced984f3eb46a0af65c306e31432ddf0c8f2c16a",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-13T03:00:57Z",
    "title_canon_sha256": "07872682c0cfb5124101ff197f31db38f83c886e616dae8057e2bc39abffda39"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12925",
    "kind": "arxiv",
    "version": 1
  }
}