Pith Number

pith:U7ZC24LC

pith:2026:U7ZC24LCQ2KKXB5MFGVXKHMLNJ
not attested · not anchored · not stored · refs resolved

Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation

Changshun Wu, Jinwei Hu, Qisong He, Xiaowei Huang, Xinmiao Huang, Yi Dong, Zhuoyun Li

CVaR-constrained training produces robot navigation policies with larger obstacle margins that formal reachability verification confirms at higher rates.

arxiv:2605.14174 v1 · 2026-05-13 · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{U7ZC24LCQ2KKXB5MFGVXKHMLNJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
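
The record does not spell out the merge algorithm, so the following is only a minimal sketch of why a deterministic merge lets any mirror arrive at the same state. It assumes events carry a timestamp and that ties are broken by a content hash; the field names `ts`, `field`, and `value`, the toy events, and the last-writer-wins rule are all hypothetical, and signature checking is omitted.

```python
import hashlib
import json


def event_key(event: dict) -> tuple:
    """Total order for events: timestamp first, then a content hash as tie-breaker."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return (event["ts"], hashlib.sha256(body).hexdigest())


def merge(record: dict, events: list) -> dict:
    """Fold events into the record in a fixed order (hypothetical last-writer-wins)."""
    state = dict(record)
    for event in sorted(events, key=event_key):
        state[event["field"]] = event["value"]
    return state


# Toy data, not taken from any real bundle.
record = {"a": 0, "b": 0}
events = [
    {"ts": "2026-05-18T00:00:00Z", "field": "a", "value": 1},
    {"ts": "2026-05-19T00:00:00Z", "field": "a", "value": 2},
    {"ts": "2026-05-18T12:00:00Z", "field": "b", "value": 3},
]

state_a = merge(record, events)
state_b = merge(record, list(reversed(events)))
print(state_a == state_b)  # arrival order does not affect the merged state
```

Because events are sorted by a total order before being applied, two mirrors that received the same events in different orders compute identical states.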

Claims

C1 · strongest claim

A key finding is that policies trained with CVaR constraints maintain larger safety margins from obstacles across evaluated states. This makes them significantly more amenable to formal reachability verification.

C2 · weakest assumption

The approach assumes that bounded observation uncertainty can be modeled accurately and that Taylor Model analysis yields reachable sets tight enough for a meaningful safety-rate computation.

C3 · one-line summary

CVaR-constrained TD3 policies for robot navigation show larger safety margins and higher post-training reachability verification rates than average-cost baselines across simulated scenarios and real-robot tests.
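
The claims contrast a CVaR constraint with an average-cost baseline. As a rough illustration of the difference, the sketch below computes the standard empirical CVaR, the mean of the worst (1 − α) fraction of sampled costs, next to the plain mean. The function name `empirical_cvar`, the cost samples, and the level `alpha=0.8` are made up for the example; the record does not give the paper's exact formulation.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of the sampled costs."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    # Round before ceil so floating-point noise does not add an extra sample.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Hypothetical per-episode safety costs: mostly zero, with one rare large violation.
costs = [0, 0, 0, 0, 0, 0, 0, 0, 1, 9]

mean_cost = sum(costs) / len(costs)
tail_cost = empirical_cvar(costs, alpha=0.8)
print(mean_cost, tail_cost)  # 1.0 5.0
```

The average hides the rare large violation, while the tail measure is dominated by it. A constraint on the tail therefore pushes a policy away from rare near-collisions, which is consistent with the larger obstacle margins the claims describe.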

References

31 extracted · 31 resolved · 2 Pith anchors

[1] Altman, Constrained Markov Decision Processes, 2021.
[2] J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” in International Conference on Machine Learning. PMLR, 2017, pp. 22–31.
[3] Reward constrained policy optimization, 2018. arXiv:1805.11074
[4] Learning to walk in the real world with minimal human effort, 2020.
[5] Benchmarking Batch Deep Reinforcement Learning Algorithms, 2019. arXiv:1910.01708

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:11.321214Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

a7f22d71628694ab87ac29ab751d8b6a4dcbe55c5ab1128023f7cc811b75f1a9

Aliases

arxiv: 2605.14174 · arxiv_version: 2605.14174v1 · doi: 10.48550/arxiv.2605.14174 · pith_short_12: U7ZC24LCQ2KK · pith_short_16: U7ZC24LCQ2KKXB5M · pith_short_8: U7ZC24LC

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a7f22d71628694ab87ac29ab751d8b6a4dcbe55c5ab1128023f7cc811b75f1a9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0b0be7802aeba1ca833695cc858141d72379007655c1bab8fad3d01522642489",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-05-13T22:53:47Z",
    "title_canon_sha256": "6ab7e6657de1a6f15f7fc08002c2eeaf504c17a80c6409a44bd8241cd1fdfb87"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14174",
    "kind": "arxiv",
    "version": 1
  }
}
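
The same canonicalization as the shell one-liner (sorted keys, compact separators, UTF-8, SHA-256) can be run offline. The sketch below assumes that the JSON displayed above is exactly the `canonical_record` returned by the API; the helper name `canonical_sha256` is an assumption for illustration. Compare the printed digest with the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "0b0be7802aeba1ca833695cc858141d72379007655c1bab8fad3d01522642489",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2026-05-13T22:53:47Z",
        "title_canon_sha256": "6ab7e6657de1a6f15f7fc08002c2eeaf504c17a80c6409a44bd8241cd1fdfb87",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14174", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys before hashing means the digest does not depend on the order in which a mirror happens to store the fields.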