pith:DDI4OEIX
TERMS-Bench: Diagnosing LLM Negotiation Agents Beyond Deal Rate
A Bayesian-game testbed diagnoses LLM agents in price negotiation by measuring surplus extraction, cue use, and belief calibration rather than deal rate alone.
arxiv:2605.13909 v1 · 2026-05-13 · cs.GT · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DDI4OEIXPI7Y2SG3WSQ6CL2URP}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Evaluating 13 LLM agents, including frontier systems, TERMS-Bench shows that frontier models saturate deal rate yet diverge in surplus extraction, cue use, belief calibration, and compliance, revealing agent-specific bargaining bottlenecks that prior benchmarks mask.
The simulator policy and payoff structure chosen for the bilateral price negotiation accurately capture the strategic and informational features that matter in real human negotiations, so that observed gaps can be attributed to the agent rather than to an unrealistic environment.
TERMS-Bench is a diagnostic benchmark for LLM negotiation agents that uses hidden-type simulators as oracles to reveal agent-specific strategic failures that deal rate alone does not show.
Receipt and verification
| First computed | 2026-05-17T23:39:18.840890Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
18d1c711177a3f8d48dbb4a1e12f548bc0efe1a77db62b56eabf834c869b52c9
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DDI4OEIXPI7Y2SG3WSQ6CL2URP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 18d1c711177a3f8d48dbb4a1e12f548bc0efe1a77db62b56eabf834c869b52c9
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "cf73b9f7a06c72b9592ace0545de9a6a6dc3c1d6ed96695ce3e8caa82db7e40a",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.GT",
"submitted_at": "2026-05-13T06:22:50Z",
"title_canon_sha256": "89383b75c682bf5dd51e91c235c8d967cdbabfae06b7a96beb8335f37b90112b"
},
"schema_version": "1.0",
"source": {
"id": "2605.13909",
"kind": "arxiv",
"version": 1
}
}
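
The shell one-liner above can also be unpacked into a short script, which makes the canonicalization step easier to inspect. The following is a minimal sketch, assuming the record listed on this page is equivalent to what the API returns under `canonical_record`; the names `RECORD_JSON`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith tooling. It applies the same serialization settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the SHA-256 digest with the published canonical hash.

```python
import hashlib
import json

# Canonical record copied from the listing on this page (assumed to match
# the `canonical_record` field served by the API).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "cf73b9f7a06c72b9592ace0545de9a6a6dc3c1d6ed96695ce3e8caa82db7e40a",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.GT",
    "submitted_at": "2026-05-13T06:22:50Z",
    "title_canon_sha256": "89383b75c682bf5dd51e91c235c8d967cdbabfae06b7a96beb8335f37b90112b"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.13909", "kind": "arxiv", "version": 1}
}
"""

# Canonical hash published on this page.
EXPECTED = "18d1c711177a3f8d48dbb4a1e12f548bc0efe1a77db62b56eabf834c869b52c9"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8.

    These are the same json.dumps settings used by the shell one-liner.
    """
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    record = json.loads(RECORD_JSON)
    digest = canonical_hash(record)
    print("computed:", digest)
    print("expected:", EXPECTED)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is dropped before hashing, the digest does not depend on how the JSON is indented or on the order in which keys appear in the listing. A mismatch would suggest that the copied record differs from the one served by the API, not necessarily that the record is invalid.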