Pith Number

pith:6ONVFNPI

pith:2026:6ONVFNPI3ZGS76LMCWLCTYR5MJ
not attested · not anchored · not stored · refs pending

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Chengjun Pan, Hang Yan, Jiahang Lin, Lizhi Lin, Shichun Liu, Shihan Dou, Tao Gui, Xuanjing Huang, Yu-Gang Jiang, Zhenhua Han, Zhiheng Xi

Three observability pillars let coding-agent harnesses evolve autonomously to beat human designs and transfer across benchmarks.

arxiv:2604.25850 v4 · 2026-04-28 · cs.CL · cs.SE

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{6ONVFNPI3ZGS76LMCWLCTYR5MJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Ten AHE iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, surpassing the human-designed harness Codex-CLI (71.9%) and the self-evolving baselines ACE and TF-GRPO. The frozen harness transfers without re-evolution: on SWE-bench-verified it tops aggregate success while using 12% fewer tokens than the seed, and on Terminal-Bench 2 it yields cross-family gains of +5.1 to +10.1 pp across three alternate model families.

C2 · weakest assumption

That the three observability pillars sufficiently constrain the action space and provide actionable signal so that the evolution loop produces generalizable improvements rather than benchmark-specific overfitting or noise-driven changes.

C3 · one-line summary

AHE automates coding-agent harness evolution via component, experience, and decision observability, raising Terminal-Bench 2 pass@1 from 69.7% to 77.0% with transfer gains across models and benchmarks.

Cited by

3 papers in Pith

Receipt and verification
First computed: 2026-05-20T00:05:45.401710Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef

Aliases

arxiv: 2604.25850 · arxiv_version: 2604.25850v4 · doi: 10.48550/arxiv.2604.25850 · pith_short_12: 6ONVFNPI3ZGS · pith_short_16: 6ONVFNPI3ZGS76LM · pith_short_8: 6ONVFNPI
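
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each pith_short_N alias is its first N characters. This is inferred from the values shown here, not from a published specification, so treat the sketch below as an assumption. The function name pith_number_from_hash is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the raw digest bytes and truncate."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# Assumed alias rule: each short form is a prefix of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

For the hash on this page, the truncated encoding reproduces 6ONVFNPI3ZGS76LMCWLCTYR5MJ and the three short aliases listed above. Whether the same rule holds for other records is not confirmed here.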
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/6ONVFNPI3ZGS76LMCWLCTYR5MJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5bbdee89bffc3cd0212d2d5031c2ed5765e647efb58f9c1cd03725e88a9fcd4e",
    "cross_cats_sorted": [
      "cs.SE"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-28T16:55:02Z",
    "title_canon_sha256": "c7b83e7ad798a3955b2ec3441ac7ba8f62c56efb704cc3f89b7bf7fc8e01046f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.25850",
    "kind": "arxiv",
    "version": 4
  }
}
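
The shell pipeline under "Verify this Pith Number yourself" can also be run offline against a saved copy of the record. The following is a minimal sketch that applies the same serialization settings (sorted keys, compact separators, ensure_ascii=False) to the record as printed on this page. The helper name canonical_sha256 is hypothetical, and whether the result matches the published hash depends on the page showing the record exactly as the API serves it, which is an assumption.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5bbdee89bffc3cd0212d2d5031c2ed5765e647efb58f9c1cd03725e88a9fcd4e",
        "cross_cats_sorted": ["cs.SE"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-04-28T16:55:02Z",
        "title_canon_sha256": "c7b83e7ad798a3955b2ec3441ac7ba8f62c56efb704cc3f89b7bf7fc8e01046f",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.25850", "kind": "arxiv", "version": 4},
}

EXPECTED = "f39b52b5e8de4d2ff96c159629e23d625e1dc1ec86997f97438faaf942ee54ef"


def canonical_sha256(obj: dict) -> str:
    """Serialize with the same json.dumps options as the pipeline, then hash."""
    blob = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which a mirror stores the fields does not affect the digest; only the values and key names do.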