pith:WQ66NAS7
FutureSim: Replaying World Events to Evaluate Adaptive Agents
FutureSim evaluates AI agents by replaying real historical events in chronological order, and shows that even the best agent achieves only 25% accuracy on future predictions.
arxiv:2605.15188 v1 · 2026-05-14 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WQ66NAS7QRJQWP4HPYLRQ2N4ST}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
FutureSim reveals a clear separation in agents' capabilities: the best agent's accuracy is 25%, and many agents have a worse Brier skill score than making no prediction at all.
Replaying real historical events chronologically, without leaking future knowledge, accurately measures an agent's adaptive capabilities in open-ended real-world settings.
FutureSim is a benchmark that replays real news from January to March 2026 for AI agents to forecast events, with top accuracy at 25% and some agents worse than no-prediction baselines on Brier skill score.
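The claims above compare agents to a no-prediction baseline using the Brier skill score. As a rough illustration of why a score can fall below that baseline, here is a minimal sketch using the standard binary definition, BSS = 1 - BS / BS_ref. The constant 0.5 reference forecast, the function names, and the toy data are assumptions made for this example; the record does not state how the paper defines its baseline or scores open-ended questions.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_skill_score(
    forecasts: Sequence[float],
    outcomes: Sequence[int],
    reference: Sequence[float],
) -> float:
    """Standard skill score: 1 - BS / BS_ref. Negative means worse than the reference."""
    bs_ref = brier_score(reference, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - brier_score(forecasts, outcomes) / bs_ref


# Toy data (hypothetical, not taken from the paper).
outcomes = [1, 0, 0, 1]
reference = [0.5] * len(outcomes)        # assumed stand-in for "no prediction"
cautious = [0.8, 0.2, 0.3, 0.7]          # leans the right way on every question
overconfident = [0.1, 0.9, 0.8, 0.2]     # confidently wrong on every question

print("cautious BSS:", brier_skill_score(cautious, outcomes, reference))
print("overconfident BSS:", brier_skill_score(overconfident, outcomes, reference))
```

Under these assumptions, a forecaster that is confidently wrong is penalised more heavily than one that stays at the reference probability, which is how an agent can score below a no-prediction baseline.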
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T21:40:25.093331Z |
| Last reissued | 2026-05-17T21:57:18.480573Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | unsigned_v0 |
| Schema | pith-number/v1.0 |
Canonical hash
b43de6825f84530b3f877e171869bc94eab8d04bd392bd457dcb524b154deb46
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WQ66NAS7QRJQWP4HPYLRQ2N4ST \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b43de6825f84530b3f877e171869bc94eab8d04bd392bd457dcb524b154deb46
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f79f3e8bfe083d3301c4d1dae9a620b2c49a6264f13323f074fd97ad4e825d76",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T17:59:28Z",
    "title_canon_sha256": "131de9b90c4210166213f7230b50e3513bf7fc6742b5a6d98d95edbdd3897002"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15188",
    "kind": "arxiv",
    "version": 1
  }
}
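The verification one-liner above can also be written as a small offline script. This is a minimal sketch that applies the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to the record shown here. The function name `canonical_sha256` is hypothetical, and the sketch assumes the record printed on this page is identical to the `canonical_record` field returned by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical JSON settings as the shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f79f3e8bfe083d3301c4d1dae9a620b2c49a6264f13323f074fd97ad4e825d76",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T17:59:28Z",
        "title_canon_sha256": "131de9b90c4210166213f7230b50e3513bf7fc6742b5a6d98d95edbdd3897002",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15188", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye against the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields appear in the downloaded JSON; only the field values and nesting matter.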