{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:U7ZC24LCQ2KKXB5MFGVXKHMLNJ","short_pith_number":"pith:U7ZC24LC","schema_version":"1.0","canonical_sha256":"a7f22d71628694ab87ac29ab751d8b6a4dcbe55c5ab1128023f7cc811b75f1a9","source":{"kind":"arxiv","id":"2605.14174","version":1},"attestation_state":"computed","paper":{"title":"Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"CVaR-constrained training produces robot navigation policies with larger obstacle margins that formal reachability verification confirms at higher rates.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Changshun Wu, Jinwei Hu, Qisong He, Xiaowei Huang, Xinmiao Huang, Yi Dong, Zhuoyun Li","submitted_at":"2026-05-13T22:53:47Z","abstract_excerpt":"Safe navigation for mobile robots demands policies that remain reliable under the high-consequence perception uncertainty of cluttered environments. Yet most existing safe reinforcement learning (RL) methods assess safety through average cumulative cost. Such metrics can mask dangerous tail-risk behaviors. To address this, we propose a framework that trains risk-sensitive policies through Conditional Value-at-Risk (CVaR) constrained optimization on an off-policy TD3 backbone and evaluates their safety margins post-training through neural network reachability verification. During training, the "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.14174","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.RO","submitted_at":"2026-05-13T22:53:47Z","cross_cats_sorted":[],"title_canon_sha256":"6ab7e6657de1a6f15f7fc08002c2eeaf504c17a80c6409a44bd8241cd1fdfb87","abstract_canon_sha256":"0b0be7802aeba1ca833695cc858141d72379007655c1bab8fad3d01522642489"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:11.321934Z","signature_b64":"LL5tlWL+t+QjVWl+j5LFjutsZYuvPNvGq8ATTyoptn3FUomn+YE7ZCoKT0swsvVqp4eSmnDBzJUua1pHqpxOAg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"a7f22d71628694ab87ac29ab751d8b6a4dcbe55c5ab1128023f7cc811b75f1a9","last_reissued_at":"2026-05-17T23:39:11.321214Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:11.321214Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"CVaR-constrained training produces robot navigation policies with larger obstacle margins that formal reachability verification confirms at higher rates.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Changshun Wu, Jinwei Hu, Qisong He, Xiaowei Huang, Xinmiao Huang, Yi Dong, Zhuoyun Li","submitted_at":"2026-05-13T22:53:47Z","abstract_excerpt":"Safe navigation for mobile robots demands policies that remain reliable under the high-consequence perception uncertainty of cluttered environments. Yet most existing safe reinforcement learning (RL) methods assess safety through average cumulative cost. Such metrics can mask dangerous tail-risk behaviors. To address this, we propose a framework that trains risk-sensitive policies through Conditional Value-at-Risk (CVaR) constrained optimization on an off-policy TD3 backbone and evaluates their safety margins post-training through neural network reachability verification. During training, the "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"A key finding is that policies trained with CVaR constraints maintain larger safety margins from obstacles across evaluated states. This makes them significantly more amenable to formal reachability verification.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that bounded observation uncertainty can be accurately modeled and that Taylor Model analysis yields sufficiently tight reachable sets for meaningful safety rate computation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"CVaR-constrained TD3 policies for robot navigation show larger safety margins and higher post-training reachability verification rates than average-cost baselines across simulated scenarios and real-robot tests.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"CVaR-constrained training produces robot navigation policies with larger obstacle margins that formal reachability verification confirms at higher rates.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"dda45f0955487e9b0d8aac5e96a3d07285309cb0e1a991c5f976c524fc7087df"},"source":{"id":"2605.14174","kind":"arxiv","version":1},"verdict":{"id":"b8da1dcf-4446-49e9-9a4a-e18ad2f7c8d9","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:50:05.442166Z","strongest_claim":"A key finding is that policies trained with CVaR constraints maintain larger safety margins from obstacles across evaluated states. This makes them significantly more amenable to formal reachability verification.","one_line_summary":"CVaR-constrained TD3 policies for robot navigation show larger safety margins and higher post-training reachability verification rates than average-cost baselines across simulated scenarios and real-robot tests.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that bounded observation uncertainty can be accurately modeled and that Taylor Model analysis yields sufficiently tight reachable sets for meaningful safety rate computation.","pith_extraction_headline":"CVaR-constrained training produces robot navigation policies with larger obstacle margins that formal reachability verification confirms at higher rates."},"references":{"count":31,"sample":[{"doi":"","year":2021,"title":"Altman,Constrained Markov decision processes","work_id":"4b75b0e7-2f33-475c-9d5d-51dfb37ae21b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” inInternational conference on machine learning. Pmlr, 2017, pp. 22–31","work_id":"204df451-d9a5-4111-a0c9-7b71030f345e","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Reward constrained policy optimization","work_id":"c4fdaea7-11ae-432a-8a0c-0b650e87b855","ref_index":3,"cited_arxiv_id":"1805.11074","is_internal_anchor":true},{"doi":"","year":2020,"title":"Learning to walk in the real world with minimal human effort,","work_id":"dbf0682e-ef18-4752-81c1-b60e982e7541","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1910,"title":"Benchmarking Batch Deep Reinforcement Learning Algorithms","work_id":"399c3bf3-740c-41a8-bb6b-dfe1ea43e56d","ref_index":5,"cited_arxiv_id":"1910.01708","is_internal_anchor":true}],"resolved_work":31,"snapshot_sha256":"d7039e30ca8f23f86a4cf65fee8e05cf2786ab0fb06f2bd968e15b663add1538","internal_anchors":2},"formal_canon":{"evidence_count":2,"snapshot_sha256":"af09edae8261ab59e1b492c11adc909bd687294d886f40e54334aeb460f0eafe"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14174","created_at":"2026-05-17T23:39:11.321331+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14174v1","created_at":"2026-05-17T23:39:11.321331+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14174","created_at":"2026-05-17T23:39:11.321331+00:00"},{"alias_kind":"pith_short_12","alias_value":"U7ZC24LCQ2KK","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"U7ZC24LCQ2KKXB5M","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"U7ZC24LC","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ","json":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ.json","graph_json":"https://pith.science/api/pith-number/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/graph.json","events_json":"https://pith.science/api/pith-number/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/events.json","paper":"https://pith.science/paper/U7ZC24LC"},"agent_actions":{"view_html":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ","download_json":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ.json","view_paper":"https://pith.science/paper/U7ZC24LC","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14174&json=true","fetch_graph":"https://pith.science/api/pith-number/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/graph.json","fetch_events":"https://pith.science/api/pith-number/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/action/storage_attestation","attest_author":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/action/author_attestation","sign_citation":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/action/citation_signature","submit_replication":"https://pith.science/pith/U7ZC24LCQ2KKXB5MFGVXKHMLNJ/action/replication_record"}},"created_at":"2026-05-17T23:39:11.321331+00:00","updated_at":"2026-05-17T23:39:11.321331+00:00"}