Pith Number

pith:ZRRBVFRC

pith:2026:ZRRBVFRC3IY2Q7TFZQ5I6WPDGE

not attested · not anchored · not stored · refs resolved

FutureWorld: A Live Reinforcement Learning Environment for Predictive Agents with Real-World Outcome Rewards

Chuyang Wei, Haoxiang Guan, Jian Li, Jiyan He, Kefei Chen, Maohang Gao, Mengting Hu, Shuxin Zheng, Xiawei Yue, Yanzhi Zhang, Yitong Duan, Yu Shi, Yu Zhuang, Zhixin Han

Delayed real-world outcome feedback serves as an effective reinforcement learning signal for predictive agents.

arxiv:2604.26733 v4 · 2026-04-29 · cs.AI · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{ZRRBVFRC3IY2Q7TFZQ5I6WPDGE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across three open-source agents, successive FutureWorld training rounds lead to consistent improvements in prediction accuracy, probabilistic scoring, and calibration, demonstrating that delayed real-world outcome feedback can serve as an effective reinforcement learning signal.

C2 · weakest assumption

The method assumes that real-world outcomes can be obtained, accurately matched to specific stored predictions, and converted into unbiased reward signals without significant delays, selection effects, or data leakage that would distort the policy updates.

C3 · one-line summary

FutureWorld is a modified verl-tool framework that enables delayed real-world outcome rewards for training LLM-based predictive agents, yielding consistent gains in accuracy, scoring, and calibration across three open-source models.

References

3 extracted · 3 resolved · 1 Pith anchor

[1] arXiv preprint arXiv:2502.01600 2023

[2] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2024 · doi:10.1609/aaai.v34i05.6297

[3] VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks 2005 · doi:10.18653/v1/2024.acl-long.50

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:00:39.697935Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

cc621a9622da31a87e65cc3a8f59e3311c915d9a30e9206d91718f9861cc1ea4

Aliases

arxiv: 2604.26733 · arxiv_version: 2604.26733v4 · doi: 10.48550/arxiv.2604.26733 · pith_short_12: ZRRBVFRC3IY2 · pith_short_16: ZRRBVFRC3IY2Q7TF · pith_short_8: ZRRBVFRC
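The identifiers above are related in a way that can be checked locally. For this record, the short aliases are prefixes of the full 26-character identifier, and that identifier coincides with the unpadded Base32 encoding of the first 16 bytes of the canonical hash. This is an observation from the values shown on this page, not a documented rule of the Pith scheme, and the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "cc621a9622da31a87e65cc3a8f59e3311c915d9a30e9206d91718f9861cc1ea4"
PITH_ID = "ZRRBVFRC3IY2Q7TFZQ5I6WPDGE"


def base32_prefix(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest, dropping '=' padding.

    Assumption: the 16-byte truncation is inferred from this one record.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


derived = base32_prefix(CANONICAL_HASH)
print(derived == PITH_ID)

# Short aliases as prefixes of the full identifier.
aliases = {n: PITH_ID[:n] for n in (8, 12, 16)}
print(aliases)
```

If the pattern holds for other records, a mirror could recompute an identifier from a canonical hash alone; treat that as a hypothesis to test rather than a guarantee.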

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/ZRRBVFRC3IY2Q7TFZQ5I6WPDGE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cc621a9622da31a87e65cc3a8f59e3311c915d9a30e9206d91718f9861cc1ea4
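
The canonicalization step in the command above can also be written as a standalone script. The following is a minimal sketch: it serializes a parsed record with sorted keys, compact separators, and non-ASCII characters left unescaped, then hashes the UTF-8 bytes with SHA-256, mirroring the `json.dumps` options in the one-liner. The function names `canonical_bytes` and `canonical_sha256` are illustrative and not part of any Pith tooling, and the two sample dictionaries are made up for the demonstration.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record with the same json.dumps options as the one-liner."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order no longer depends on the input
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return text.encode()         # str.encode() defaults to UTF-8


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Two sample inputs with the same content but different key order.
a = {"source": {"kind": "arxiv", "id": "2604.26733"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"id": "2604.26733", "kind": "arxiv"}}

print(canonical_bytes(a).decode())
print(canonical_sha256(a) == canonical_sha256(b))
```

Sorting keys and fixing separators is what makes the digest reproducible: any mirror that parses the same record gets the same bytes regardless of how the JSON was originally laid out.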

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "1c4e0d12eda73b31fa140cf80883a3a49419f5ba59c327ee23eb38c750e27922",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-04-29T14:34:45Z",
    "title_canon_sha256": "85f9c5b8136842b83f482a6354a7bed6fa2d6eb53d74624fd9ee0ea57e87eba1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.26733",
    "kind": "arxiv",
    "version": 4
  }
}
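
The `source` block of the record lines up with the arXiv aliases listed earlier on this page. As a small sketch, assuming the alias strings are formed exactly as they appear for this record (the function name `arxiv_aliases` is hypothetical, and the patterns are inferred from this one entry rather than from a specification):

```python
def arxiv_aliases(source: dict) -> dict:
    """Build arXiv alias strings from a record's `source` block.

    Assumption: the patterns below are read off this single record.
    """
    if source.get("kind") != "arxiv":
        raise ValueError("this sketch only handles arXiv sources")
    arxiv_id = source["id"]
    return {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{source['version']}",
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }


# The `source` block from the canonical record above.
source = {"id": "2604.26733", "kind": "arxiv", "version": 4}
result = arxiv_aliases(source)
print(result)
```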