pith:6ONVFNPI
Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
Three observability pillars let coding-agent harnesses evolve autonomously to beat human designs and transfer across benchmarks.
arxiv:2604.25850 v4 · 2026-04-28 · cs.CL · cs.SE
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6ONVFNPI3ZGS76LMCWLCTYR5MJ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Ten AHE iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, surpassing the human-designed harness Codex-CLI (71.9%) and the self-evolving baselines ACE and TF-GRPO. The frozen harness transfers without re-evolution: on SWE-bench-verified it tops aggregate success at 12% fewer tokens than the seed, and on Terminal-Bench 2 it yields +5.1 to +10.1pp cross-family gains across three alternate model families.
These results rest on the assumption that the three observability pillars sufficiently constrain the action space and provide actionable signal, so that the evolution loop produces generalizable improvements rather than benchmark-specific overfitting or noise-driven changes.
AHE automates coding-agent harness evolution via component, experience, and decision observability, raising Terminal-Bench 2 pass@1 from 69.7% to 77.0% with transfer gains across models and benchmarks.
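The headline margins follow directly from the pass@1 figures quoted in the claims. The short sketch below only redoes that arithmetic; the variable names are illustrative and not taken from the paper, and `Decimal` is used so the one-decimal percentages subtract exactly.

```python
from decimal import Decimal

# Pass@1 figures (percent) on Terminal-Bench 2, as quoted in the claims.
seed = Decimal("69.7")       # seed harness before evolution
evolved = Decimal("77.0")    # after ten AHE iterations
codex_cli = Decimal("71.9")  # human-designed Codex-CLI harness

# Absolute differences are in percentage points (pp).
gain_over_seed = evolved - seed
margin_over_codex = evolved - codex_cli

# Relative improvement over the seed, rounded to one decimal place.
relative_gain = (gain_over_seed / seed * 100).quantize(Decimal("0.1"))

print(f"gain over seed:        +{gain_over_seed}pp")
print(f"margin over Codex-CLI: +{margin_over_codex}pp")
print(f"relative gain:         {relative_gain}%")
```

Under these figures the evolved harness gains 7.3pp over the seed (about 10.5% relative) and leads Codex-CLI by 5.1pp.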
Receipt and verification
| First computed | 2026-05-20T00:05:45.401710Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6ONVFNPI3ZGS76LMCWLCTYR5MJ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5bbdee89bffc3cd0212d2d5031c2ed5765e647efb58f9c1cd03725e88a9fcd4e",
    "cross_cats_sorted": [
      "cs.SE"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-28T16:55:02Z",
    "title_canon_sha256": "c7b83e7ad798a3955b2ec3441ac7ba8f62c56efb704cc3f89b7bf7fc8e01046f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.25850",
    "kind": "arxiv",
    "version": 4
  }
}
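The same check can be run offline against the record shown above. The sketch below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes); the function name `canonical_sha256` is hypothetical, and whether the printed comparison is `True` depends on the record matching what the service actually serves, which is not asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the shell one-liner does: dump the JSON with
    sorted keys and no extra whitespace, encode as UTF-8, then SHA-256."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5bbdee89bffc3cd0212d2d5031c2ed5765e647efb58f9c1cd03725e88a9fcd4e",
        "cross_cats_sorted": ["cs.SE"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-04-28T16:55:02Z",
        "title_canon_sha256": "c7b83e7ad798a3955b2ec3441ac7ba8f62c56efb704cc3f89b7bf7fc8e01046f",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.25850", "kind": "arxiv", "version": 4},
}

# Canonical hash as listed on this page.
expected = "f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef"

digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest; any change to a value does.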