Pith Number

pith:RYVQEAUD

pith:2025:RYVQEAUDGW65ZGD55QAOJNHSIW

not attested · not anchored · not stored · refs resolved

Agentic Reinforced Policy Optimization

Fuzheng Zhang, Guanting Dong, Guorui Zhou, Hangyu Mao, Huiyang Wang, Jiazhen Du, Ji-Rong Wen, Kai Ma, Licheng Bao, Yifei Chen, Yutao Zhu, Zhicheng Dou, Zhongxia Chen, Zhongyuan Wang

ARPO improves LLM agent performance on long-horizon tasks by sampling more at high-entropy steps right after each tool call.

arxiv:2507.19849 v1 · 2025-07-26 · cs.LG · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{RYVQEAUDGW65ZGD55QAOJNHSIW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
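The page does not describe the merge algorithm itself, so the following is only a minimal sketch of what "deterministic" means here: any mirror that holds the same set of events reaches the same state, regardless of the order in which the events arrived. The event fields (`at`, `field`, `value`), the ordering rule (timestamp, then content hash), and the last-writer-wins fold are all assumptions made for illustration. Signature verification is omitted.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event, used as a tie-breaker (hypothetical rule)."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge(record: dict, events: list[dict]) -> dict:
    """Fold events into a state using a total order independent of arrival order."""
    state = {"record": record, "fields": {}}
    # Keying by content hash drops exact duplicate events.
    unique = {event_id(e): e for e in events}
    # Assumed order: timestamp first, content hash second.
    for eid in sorted(unique, key=lambda i: (unique[i]["at"], i)):
        e = unique[eid]
        state["fields"][e["field"]] = e["value"]  # last writer wins
    return state


record = {"source": {"id": "2507.19849", "kind": "arxiv", "version": 1}}
events = [
    {"at": "2026-05-17T00:00:00Z", "field": "anchor", "value": "pending"},
    {"at": "2026-05-18T00:00:00Z", "field": "anchor", "value": "done"},
]
state_a = merge(record, events)
state_b = merge(record, list(reversed(events)))
```

Here `state_a` and `state_b` are equal because the fold order depends only on the event contents, which is the property a mirror needs in order to recompute the same current state.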

Claims

C1 · strongest claim

ARPO achieves improved performance using only half of the tool-use budget required by existing methods, offering a scalable solution for aligning LLM-based agents with real-time dynamic environments.

C2 · weakest assumption

The paper assumes that its preliminary observation, that LLMs behave with high uncertainty (increased entropy) immediately after tool interactions, is general enough to guide adaptive sampling across tasks, and that this mechanism reliably improves long-horizon performance.

C3 · one-line summary

ARPO adds entropy-based adaptive rollouts and stepwise advantage attribution to RL for LLM agents, outperforming prior trajectory-level methods on 13 benchmarks with half the tool budget.
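To make "entropy-based adaptive rollouts" concrete, here is a minimal sketch of the general idea: measure the entropy of the next-token distribution right after a tool call and spend extra samples only when it has risen noticeably above a baseline. The function names, the `threshold` and `max_branches` parameters, and the simple cutoff rule are illustrative assumptions, not the paper's exact formulation.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def extra_rollouts(baseline_entropy, post_tool_entropy, budget_left,
                   threshold=0.2, max_branches=2):
    """Number of extra branches to sample at this step (illustrative rule).

    Branch only when entropy after the tool call exceeds the baseline by more
    than `threshold`, and never spend more than the remaining budget.
    """
    gain = post_tool_entropy - baseline_entropy
    if gain <= threshold or budget_left <= 0:
        return 0
    return min(max_branches, budget_left)


# A confident step versus an uncertain step right after a tool response.
confident = token_entropy([0.97, 0.01, 0.01, 0.01])
uncertain = token_entropy([0.25, 0.25, 0.25, 0.25])
branches = extra_rollouts(confident, uncertain, budget_left=5)
```

Gating extra samples on the entropy gain is what lets such a scheme concentrate the rollout budget on uncertain steps instead of branching uniformly along the whole trajectory.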

References

11 extracted · 11 resolved · 2 Pith anchors

[1] REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization 2020 · doi:10.18653/v1/2020.coling-main.580

[3] Prabha, D., Aswini, J., Maheswari, B., Subramanian, R 2023 · doi:10.18653/v1/2023.findings-emnlp.378

[5] Scaling Relationship on Learning Mathematical Reasoning with Large Language Models 2025 · doi:10.48550/arxiv

[6] thinking while doing 2024

[7] Each interaction response length is capped at 4096 tokens

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Why Semantic Entropy Fails: Geometry-Aware and Calibrated Uncertainty for Policy Optimization

ClinQueryAgent: A Conversational Agent for Population Health Management

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models

Receipt and verification

First computed	2026-05-17T23:38:15.333245Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

8e2b02028335bddc987dec00e4b4f2459a73c968a690cb2f56fc3280e364f4d7

Aliases

arxiv: 2507.19849 · arxiv_version: 2507.19849v1 · doi: 10.48550/arxiv.2507.19849 · pith_short_12: RYVQEAUDGW65 · pith_short_16: RYVQEAUDGW65ZGD5 · pith_short_8: RYVQEAUD
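The identifier and its short aliases appear to be related to the canonical hash in a simple way: for the values on this page, the Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is the first N characters of that string. This is an observation from the values shown here, not a documented rule, and the helper name `pith_number_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "8e2b02028335bddc987dec00e4b4f2459a73c968a690cb2f56fc3280e364f4d7"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)
print(aliases)
```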


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/RYVQEAUDGW65ZGD55QAOJNHSIW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8e2b02028335bddc987dec00e4b4f2459a73c968a690cb2f56fc3280e364f4d7
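The hashing step of that pipeline can also be written as a small offline function, which is convenient when the canonical record is already on disk. The sketch below uses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`); the function name `canonical_sha256` and the sample dictionaries are illustrative only.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the key-sorted, whitespace-free JSON serialization."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(blob).hexdigest()


# Key order in the input does not affect the digest.
a = {"schema_version": "1.0", "source": {"id": "2507.19849", "kind": "arxiv", "version": 1}}
b = {"source": {"version": 1, "kind": "arxiv", "id": "2507.19849"}, "schema_version": "1.0"}
digest_a = canonical_sha256(a)
digest_b = canonical_sha256(b)
```

Sorting keys and fixing the separators is what makes the digest reproducible across JSON serializers that would otherwise differ in key order or whitespace.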

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "2d063dcb52d9088260070a91f280b9064b4539cd1d082dfcb0de4de283df80a3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-07-26T07:53:11Z",
    "title_canon_sha256": "c6efe2ebcc3ed7ebb55512d4066b4de04d544066275cf02ab776cb1a95f4a0df"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2507.19849",
    "kind": "arxiv",
    "version": 1
  }
}