pith. machine review for the scientific record.
Pith Number

pith:LLYFYPQ7

pith:2026:LLYFYPQ7KKAXCZ5CC5ET4AMJQ5
not attested · not anchored · not stored · refs pending

Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning

Jianshu Zhang, Jinzong Dong, Nanyang Ye, Qinying Gu, Wei Huang, Xinzhe Yuan, Zhaohui Jiang, Zhuo Chen

Proximal action replacement overcomes the imitation ceiling in BC-regularized actor-critic by substituting suboptimal dataset actions with value-guided improvements.

arxiv:2602.07441 v2 · 2026-02-07 · cs.LG · cs.AI

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 strongest claim

PAR consistently improves performance across offline RL benchmarks and approaches state-of-the-art results when simply combined with basic TD3+BC.

C2 weakest assumption

That actions generated by the stable target policy, guided by local ascent of the action-value function and bounded by value uncertainty, can be substituted without destabilizing training or introducing new bias when dataset actions are suboptimal.

C3 one line summary

Proximal action replacement breaks the imitation ceiling in BC-regularized offline RL actor-critic by substituting suboptimal dataset actions with value-guided improvements from a stable target policy.
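
The replacement idea in the claims above can be illustrated with a toy one-dimensional sketch. This is one possible reading of the summary, not the paper's algorithm: the critic `q_value`, its gradient `q_grad`, the function `proximal_replace`, and the parameters `step` and `margin` are all hypothetical, and `margin` is only a stand-in for a value-uncertainty bound.

```python
def q_value(action: float) -> float:
    """Toy critic (assumption): value peaks at action = 1.0."""
    return -(action - 1.0) ** 2


def q_grad(action: float) -> float:
    """Gradient of the toy critic with respect to the action."""
    return -2.0 * (action - 1.0)


def proximal_replace(dataset_action: float, target_action: float,
                     step: float = 0.1, margin: float = 0.05) -> float:
    """Pick the action the behavior-cloning term should imitate.

    The candidate is the target policy's action nudged one small step
    uphill on the critic. It replaces the dataset action only when its
    value beats the dataset action's value by more than `margin`
    (hypothetical proxy for value uncertainty).
    """
    candidate = target_action + step * q_grad(target_action)
    if q_value(candidate) > q_value(dataset_action) + margin:
        return candidate
    return dataset_action


# A poor dataset action is replaced; a near-optimal one is kept.
replaced = proximal_replace(dataset_action=0.0, target_action=0.6)
kept = proximal_replace(dataset_action=0.95, target_action=0.6)
print(replaced, kept)
```

In this toy setting the behavior-cloning target only moves away from the dataset when the critic is confidently better there, which is the intuition behind overcoming the imitation ceiling without discarding good data.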

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-17T23:39:00.026294Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

5af05c3e1f52817167a217493e018987643dfe7037bdd563807d036431926018

Aliases

arxiv: 2602.07441 · arxiv_version: 2602.07441v2 · doi: 10.48550/arxiv.2602.07441
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LLYFYPQ7KKAXCZ5CC5ET4AMJQ5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5af05c3e1f52817167a217493e018987643dfe7037bdd563807d036431926018
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "dd379b69b869ef4149d7ec98a96f6edfb06f271348c99998106e1a120b549e75",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-07T08:44:27Z",
    "title_canon_sha256": "11ceda1decf65b89cb4513c31144ecd8ad1132fdbf83a6e4a8002b631649229e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.07441",
    "kind": "arxiv",
    "version": 2
  }
}
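
The same check can be run offline against the record shown above. The following is a minimal sketch that mirrors the serialization in the shell one-liner (sorted keys, compact separators, UTF-8 without ASCII escaping); the names `canonical_sha256` and `record` are illustrative, and it assumes the JSON displayed here is the complete canonical record returned by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" section.
record = {
    "metadata": {
        "abstract_canon_sha256": "dd379b69b869ef4149d7ec98a96f6edfb06f271348c99998106e1a120b549e75",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-07T08:44:27Z",
        "title_canon_sha256": "11ceda1decf65b89cb4513c31144ecd8ad1132fdbf83a6e4a8002b631649229e",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.07441", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, the order in which fields appear in the source JSON does not affect the digest; any change to a value does.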