Pith Number

pith:22SKRVBN

pith:2026:22SKRVBN6JPMXE7TRXT7KERWRX
not attested · not anchored · not stored · refs resolved

Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

Heng Huang, Tao Wang, Xugui Zhou, Yanfu Zhang, Yin-Jen Chen, Yushen Liu, Ziyi Chen

A learned action-conditioned predictor of near-term safety violations gates value estimates to approximate risk-sensitive control under partial observability.

arxiv:2605.14246 v1 · 2026-05-14 · cs.LG · cs.AI · cs.SY · eess.SY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{22SKRVBN6JPMXE7TRXT7KERWRX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

The method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline; on Safety-Gym it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines.

C2 weakest assumption

Assumes that a compact finite-history proxy state, together with a learned action-conditioned predictor of near-term safety violations, is sufficient to produce effective risk-sensitive decisions under partial observability.

C3 one-line summary

Action-conditioned near-term risk prediction gates optimistic and conservative value estimates in RL to approximate risk-sensitive POMDP control, yielding better safety-performance tradeoffs with lower runtime than belief-space planning baselines.
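To make claims C2 and C3 concrete, here is a minimal Python sketch of the two pieces they name: a finite-history proxy state and a risk-gated choice between optimistic and conservative value estimates. It assumes the history window holds only the last k observations (zero-padded at episode start), that the learned predictor returns a probability of a near-term violation for each candidate action, and that the gate is a convex combination of the two estimates. These forms and all names (FiniteHistory, gated_q, select_action) are illustrative assumptions, not the paper's implementation.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


class FiniteHistory:
    """Hypothetical proxy state: the last `k` observations, zero-padded.

    Assumption: observations are fixed-length float vectors; actions are
    not stored in the window in this sketch.
    """

    def __init__(self, k: int, obs_dim: int) -> None:
        self.k = k
        self.obs_dim = obs_dim
        # deque(maxlen=k) drops the oldest observation once the window is full
        self.window: Deque[List[float]] = deque(maxlen=k)

    def push(self, obs: Sequence[float]) -> None:
        self.window.append(list(obs))

    def vector(self) -> List[float]:
        """Flatten oldest-first, left-padding with zeros until k steps are seen."""
        out = [0.0] * (self.obs_dim * (self.k - len(self.window)))
        for obs in self.window:
            out.extend(obs)
        return out


def gated_q(q_opt: float, q_cons: float, p_violation: float) -> float:
    """Blend optimistic and conservative estimates by predicted near-term risk.

    Assumed gate: high risk shifts weight toward the conservative estimate.
    """
    if not 0.0 <= p_violation <= 1.0:
        raise ValueError("p_violation must lie in [0, 1]")
    return (1.0 - p_violation) * q_opt + p_violation * q_cons


# Hypothetical signature shared by both value estimators and the risk predictor
Estimator = Callable[[List[float], int], float]


def select_action(
    state: List[float],
    actions: Sequence[int],
    q_opt: Estimator,
    q_cons: Estimator,
    risk: Estimator,
) -> int:
    """Greedy action under the gated value; `risk` plays the action-conditioned predictor."""
    return max(
        actions,
        key=lambda a: gated_q(q_opt(state, a), q_cons(state, a), risk(state, a)),
    )
```

When an action's predicted violation probability is high, its gated value falls toward the conservative estimate, so a lower-reward but safer action can win. A hard risk threshold or a learned gate would be equally plausible readings of "gates" in the summary above.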

References

43 extracted · 43 resolved · 3 Pith anchors

Receipt and verification
First computed: 2026-05-17T23:39:10.608575Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

d6a4a8d42df25ecb93f38de7f512368de2cb8bbc79db99e0eeb77cc4f1fb830c

Aliases

arxiv: 2605.14246 · arxiv_version: 2605.14246v1 · doi: 10.48550/arxiv.2605.14246 · pith_short_12: 22SKRVBN6JPM · pith_short_16: 22SKRVBN6JPMXE7T · pith_short_8: 22SKRVBN
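The hash and aliases on this page are consistent with the Pith number being the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 canonical hash, with each pith_short_N alias being its N-character prefix. This rule is inferred from the values shown here, not taken from a published specification, so treat the sketch below as an assumption to check against the pith-number/v1.0 schema. The function names are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Inferred rule (not a published spec): base32 of the digest, first `length` chars."""
    digest = bytes.fromhex(canonical_hash_hex)
    # b32encode uses the RFC 4648 alphabet A-Z, 2-7 and pads the tail with '='
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict:
    """pith_short_N aliases as they appear on this page: N-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


h = "d6a4a8d42df25ecb93f38de7f512368de2cb8bbc79db99e0eeb77cc4f1fb830c"
num = pith_number_from_hash(h)
print(num)  # 22SKRVBN6JPMXE7TRXT7KERWRX
print(short_aliases(num))
```

Since 26 base32 characters carry 130 bits, the number covers the first 16 bytes of the digest plus two bits of the seventeenth.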
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/22SKRVBN6JPMXE7TRXT7KERWRX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d6a4a8d42df25ecb93f38de7f512368de2cb8bbc79db99e0eeb77cc4f1fb830c
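Where jq is unavailable, the same check can be sketched in Python with only the standard library. The serialization (sorted keys, compact separators, ensure_ascii=False, UTF-8) mirrors the one-liner above, and the endpoint and canonical_record field are taken from that command; the function names are hypothetical. Because the one-liner re-parses jq's output before hashing, jq's own formatting does not affect the digest, and the same holds here.

```python
import hashlib
import json
import urllib.request

# Endpoint copied from the curl command above
PITH_URL = "https://pith.science/pith/22SKRVBN6JPMXE7TRXT7KERWRX"


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record, serialized like the shell one-liner."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """GET the JSON-LD view and return its `canonical_record` object (network call)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

If the served record is unchanged, print(canonical_hash(fetch_canonical_record())) should print the canonical hash listed above.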
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3f200740c745531f54a7d7846d87a31d780daab9d32338fab196a8ac00834794",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.SY",
      "eess.SY"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T01:23:09Z",
    "title_canon_sha256": "ea214f43eddd2a9618adef28a0e04e8a22aa30150bcb7982081b855fad359598"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14246",
    "kind": "arxiv",
    "version": 1
  }
}