Pith Number

pith:IYJKHMX7

pith:2026:IYJKHMX7J73ZG5LO4MMAPPN655
not attested · not anchored · not stored · refs resolved

ContractBench: Can LLM Agents Preserve Observation Contracts?

Arkaprava De, Hanwen Xing, Hao Chen, Jicheng Wang, Yifeng He, Zili Wang

LLM agents must preserve observation contracts such as tokens and presigned URLs, yet current models routinely fail at this distinct capability.

arxiv:2605.17281 v1 · 2026-05-17 · cs.SE · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IYJKHMX7J73ZG5LO4MMAPPN655}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

no evaluated model clears 80%, with Claude-Opus-4.6 leading at 77.8%, revealing that current frontier models still fail to comply with observation contracts

C2 · weakest assumption

The 33 dual-axis tasks and their failure labels drawn from real-world API specifications sufficiently capture the observation-contract compliance problem that arises in deployed tool-augmented agents.

C3 · one-line summary

ContractBench shows that LLM agents frequently violate observation contracts by using expired artifacts or corrupting their byte integrity, with no model exceeding 80% success and notable scaling irregularities across families.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:03:49.580728Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

4612a3b2ff4ff793756ee31807bdbeef45529d1249e54c0aff62f318755c7218

Aliases

arxiv: 2605.17281 · arxiv_version: 2605.17281v1 · doi: 10.48550/arxiv.2605.17281 · pith_short_12: IYJKHMX7J73Z · pith_short_16: IYJKHMX7J73ZG5LO · pith_short_8: IYJKHMX7
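
The aliases and the canonical hash on this record appear to be related: the 26-character Pith Number matches the first 26 characters of the Base32 (RFC 4648 alphabet) encoding of the SHA-256 canonical hash, and the short aliases are prefixes of that string. This is an observation from the values shown on this page, not a documented rule of the schema, and the function names below are hypothetical. A minimal Python sketch under that assumption:

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: inferred from the identifiers on this record, not from a spec.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """Derive the 8-, 12-, and 16-character aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


canonical_hash = "4612a3b2ff4ff793756ee31807bdbeef45529d1249e54c0aff62f318755c7218"
number = pith_number_from_hash(canonical_hash)
print(number)  # IYJKHMX7J73ZG5LO4MMAPPN655
print(short_aliases(number))
```

If the relationship holds for other records, a client could check that an alias and a canonical hash belong together without a network call.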
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IYJKHMX7J73ZG5LO4MMAPPN655 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4612a3b2ff4ff793756ee31807bdbeef45529d1249e54c0aff62f318755c7218
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a41d80bf262fe501d30e568d92f73f98eb919970804ba764aba2c6acf257fb86",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-17T06:37:04Z",
    "title_canon_sha256": "29507e0cd5028ab233f6d828aa1c66393e13166fb034674024c2886cf2f0971d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17281",
    "kind": "arxiv",
    "version": 1
  }
}
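
The verification one-liner above can also be run offline against the canonical record JSON. The sketch below repeats the same canonicalization (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes) and hashes the result with SHA-256. The function name `canonical_sha256` is hypothetical, and the record literal is copied from this page on the assumption that it is exactly what the service hashes:

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a41d80bf262fe501d30e568d92f73f98eb919970804ba764aba2c6acf257fb86",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-05-17T06:37:04Z",
        "title_canon_sha256": "29507e0cd5028ab233f6d828aa1c66393e13166fb034674024c2886cf2f0971d",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17281", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this record
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the fields.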