Pith Number

pith:XUUK4HNO

pith:2026:XUUK4HNOI77GCX52SJSPFQUD5H
not attested · not anchored · not stored · refs resolved

Bellman Value Decomposition for Task Logic in Safe Optimal Control

Chuchu Fan, Dylan Hirsch, Oswin So, Sylvia Herbert, William Sharpless

The Bellman value for temporal logic tasks decomposes into a graph of simpler values connected by reach-avoid, avoid, and reach-avoid-loop equations.

arxiv:2602.19532 v2 · 2026-02-23 · cs.RO · cs.SY · eess.SY

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XUUK4HNOI77GCX52SJSPFQUD5H}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We prove the Bellman Value for a complex task defined in temporal logic can be decomposed into a graph of Bellman Values, connected by a set of well-known Bellman equations (BEs): the Reach-Avoid BE, the Avoid BE, and a novel type, the Reach-Avoid-Loop BE.

C2 weakest assumption

The innate structure of the Bellman value naturally organizes temporal logic tasks so that the decomposed graph can be embedded in a two-layer neural net that bootstraps implicit dependencies without additional manual tuning or post-hoc adjustments.

C3 one line summary

Bellman values for temporal logic tasks decompose into a graph of reach-avoid, avoid, and reach-avoid-loop equations solved by embedding the graph in a two-layer neural net (VDPPO) for safe high-dimensional control.
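
As a rough illustration of the Reach-Avoid BE named in the claims, the sketch below runs undiscounted value iteration on a toy one-dimensional chain. Everything in it is an assumption made for illustration and is not taken from the paper: the chain dynamics, the margin functions `l` (positive inside the target) and `g` (negative inside the obstacle), the sign convention, and the function name `reach_avoid_values`. The update used is V(x) = min{ g(x), max{ l(x), max_u V(f(x, u)) } }, one common discrete-time form of the reach-avoid equation.

```python
def reach_avoid_values(n_states, target, obstacle, max_iters=100):
    """Toy reach-avoid value iteration on a chain of states 0..n_states-1.

    Assumed convention (not from the paper): V(x) > 0 means the target can
    be reached from x without ever entering the obstacle.
    """
    # Target margin: positive inside the target set, negative outside.
    l = [1.0 if x in target else -1.0 for x in range(n_states)]
    # Safety margin: negative inside the obstacle set, positive outside.
    g = [-1.0 if x in obstacle else 1.0 for x in range(n_states)]

    def step(x, u):
        # Deterministic dynamics: move by u, clipped to the chain ends.
        return min(max(x + u, 0), n_states - 1)

    # Start from the value of stopping immediately.
    v = [min(gx, lx) for gx, lx in zip(g, l)]
    for _ in range(max_iters):
        new_v = [
            min(g[x], max(l[x], max(v[step(x, u)] for u in (-1, 0, 1))))
            for x in range(n_states)
        ]
        if new_v == v:  # fixed point of the reach-avoid update
            break
        v = new_v
    return v


values = reach_avoid_values(7, target={5}, obstacle={2})
print(values)  # -> [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
```

States 0 and 1 get a negative value because every path to the target at state 5 passes through the obstacle at state 2. States 3 to 6 can reach the target safely and get a positive value.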

References

97 extracted · 97 resolved · 2 Pith anchors

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018
[2] LTL and beyond: Formal languages for reward function specification in reinforcement learning, 2019
[3] A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games, 2005
[4] Reach-avoid problems with time-varying dynamics, targets and constraints, 2015
[5] Dual-objective reinforcement learning with novel Hamilton-Jacobi-Bellman formulations, 2025

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-17T23:39:16.014036Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

bd28ae1dae47fe615fba9264f2c283e9fffb760ae7ffd66cbbf6eaeed9cc33e2

Aliases

arxiv: 2602.19532 · arxiv_version: 2602.19532v2 · doi: 10.48550/arxiv.2602.19532 · pith_short_12: XUUK4HNOI77G · pith_short_16: XUUK4HNOI77GCX52 · pith_short_8: XUUK4HNO
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XUUK4HNOI77GCX52SJSPFQUD5H \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bd28ae1dae47fe615fba9264f2c283e9fffb760ae7ffd66cbbf6eaeed9cc33e2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5f00bec0ef5597893b606682b95cc33568db4b311aac8f2d4f65b8fa8a71f962",
    "cross_cats_sorted": [
      "cs.SY",
      "eess.SY"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-02-23T05:48:58Z",
    "title_canon_sha256": "33efe4b725c0d887804372495e33b7214738738197a66c3dbdc1c1ad8c4a8b0f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.19532",
    "kind": "arxiv",
    "version": 2
  }
}
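
The shell pipeline under "Verify this Pith Number yourself" can also be written as a standalone Python sketch that needs no network access. It copies the canonical record shown above into a dict and applies the same `json.dumps` settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`) before hashing. The names `CANONICAL_RECORD`, `EXPECTED`, and `canonical_sha256` are hypothetical. Whether the record as displayed on this page is byte-for-byte identical to the `canonical_record` field served by the API is an assumption, so a mismatch here would not by itself show that the record is invalid.

```python
import hashlib
import json

# Canonical record as displayed on the page (assumed to equal the API's
# `canonical_record` field; this is not verified here).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "5f00bec0ef5597893b606682b95cc33568db4b311aac8f2d4f65b8fa8a71f962",
        "cross_cats_sorted": ["cs.SY", "eess.SY"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2026-02-23T05:48:58Z",
        "title_canon_sha256": "33efe4b725c0d887804372495e33b7214738738197a66c3dbdc1c1ad8c4a8b0f",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.19532", "kind": "arxiv", "version": 2},
}

# Canonical hash listed on the page.
EXPECTED = "bd28ae1dae47fe615fba9264f2c283e9fffb760ae7ffd66cbbf6eaeed9cc33e2"


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then SHA-256 it."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON object's keys were written.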