pith.
Pith Number

pith:56GWHSSX

pith:2026:56GWHSSX22L4FBE6S7D3WLCK2W
Status: not attested · not anchored · not stored · refs pending

SafeManip: A Property-Driven Benchmark for Temporal Safety Evaluation in Robotic Manipulation

Chengyue Huang, Khang Vo Huynh, Lu Feng, Sebastian Elbaum, Zsolt Kira

A new benchmark shows that robotic manipulation policies often violate temporal safety rules even on tasks they complete successfully.

arxiv:2605.12386 v2 · 2026-05-12 · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{56GWHSSX22L4FBE6S7D3WLCK2W}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Results show that even strong models often behave unsafely. Task-success gains do not reliably translate into safer execution: many successful rollouts remain unsafe, while longer-horizon or more complex tasks expose more violations.

C2 · weakest assumption

The LTLf (linear temporal logic over finite traces) safety templates and the mapping from observed rollouts to symbolic predicate traces accurately capture all relevant temporal safety properties, without introducing false positives or missing critical real-world constraints.

C3 · one-line summary

SafeManip is a new benchmark that applies LTLf monitors to assess temporal safety properties across eight categories in robotic manipulation, demonstrating that task success frequently fails to ensure safe execution in vision-language-action policies.
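The summary says SafeManip checks rollouts by running LTLf monitors over symbolic predicate traces. The sketch below is a minimal, hypothetical LTLf evaluator, not the benchmark's code. The predicate names (`holding`, `collision`, `released`, `above_bin`, `in_bin`), the helper `holds`, and the three properties are illustrative assumptions. It uses standard finite-trace semantics (strong next, so X φ is false at the last step). It shows how one rollout can satisfy a task goal while violating a safety property, which is the pattern claim C1 describes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union

# A symbolic predicate trace: one set of true predicates per time step.
Trace = List[FrozenSet[str]]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:  # strong next: false at the final step
    sub: "Formula"


@dataclass(frozen=True)
class Always:  # G
    sub: "Formula"


@dataclass(frozen=True)
class Eventually:  # F
    sub: "Formula"


@dataclass(frozen=True)
class Until:  # left U right
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, Implies, Next, Always, Eventually, Until]


def holds(f: Formula, trace: Trace, i: int = 0) -> bool:
    """Evaluate f at position i of a non-empty finite trace (LTLf semantics)."""
    if not trace:
        raise ValueError("LTLf traces must be non-empty")
    n = len(trace)
    if isinstance(f, Atom):
        return f.name in trace[i]
    if isinstance(f, Not):
        return not holds(f.sub, trace, i)
    if isinstance(f, Implies):
        return (not holds(f.left, trace, i)) or holds(f.right, trace, i)
    if isinstance(f, Next):
        return i + 1 < n and holds(f.sub, trace, i + 1)
    if isinstance(f, Always):
        return all(holds(f.sub, trace, j) for j in range(i, n))
    if isinstance(f, Eventually):
        return any(holds(f.sub, trace, j) for j in range(i, n))
    if isinstance(f, Until):
        # right must hold at some j >= i, with left holding at every step before j
        for j in range(i, n):
            if holds(f.right, trace, j):
                return True
            if not holds(f.left, trace, j):
                return False
        return False
    raise TypeError(f"unknown formula: {f!r}")


# Hypothetical task goal and safety rules.
task_done = Eventually(Atom("in_bin"))
no_collision = Always(Not(Atom("collision")))
release_over_bin = Always(Implies(Atom("released"), Atom("above_bin")))

# Hypothetical rollout: reaches the goal but collides along the way.
rollout: Trace = [
    frozenset({"holding"}),
    frozenset({"holding", "collision"}),
    frozenset({"released", "above_bin"}),
    frozenset({"in_bin"}),
]

if __name__ == "__main__":
    print("success:", holds(task_done, rollout))
    print("no_collision:", holds(no_collision, rollout))
    print("release_over_bin:", holds(release_over_bin, rollout))
```

Run as a script, this prints `success: True`, `no_collision: False` and `release_over_bin: True`: the rollout counts as a task success but fails one safety monitor. A real benchmark would also need the rollout-to-predicate mapping, which claim C2 identifies as the weakest assumption.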

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-06-11T02:09:30.756267Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

ef8d63ca57d697c2849e97c7bb2c4ad5be7da783abc59f60446a80065f37b7c0

Aliases

arxiv: 2605.12386 · arxiv_version: 2605.12386v2 · doi: 10.48550/arxiv.2605.12386 · pith_short_12: 56GWHSSX22L4 · pith_short_16: 56GWHSSX22L4FBE6 · pith_short_8: 56GWHSSX
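On inspection, the three `pith_short_N` aliases are the first 8, 12 and 16 characters of the full Pith Number, and the header form `pith:56GWHSSX` uses the 8-character prefix. The snippet below reproduces them. Treating aliases as plain prefixes is an inference from the listed values, not a documented rule, and `short_aliases` is a hypothetical helper.

```python
# Pith Number as listed on this record.
PITH_NUMBER = "56GWHSSX22L4FBE6S7D3WLCK2W"


def short_aliases(pith_number: str) -> dict:
    """Derive pith_short_N aliases as N-character prefixes (inferred, not documented)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    for key, value in short_aliases(PITH_NUMBER).items():
        print(f"{key}: {value}")
    # Compact header form shown at the top of the page.
    print(f"pith:{PITH_NUMBER[:8]}")
```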
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/56GWHSSX22L4FBE6S7D3WLCK2W \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ef8d63ca57d697c2849e97c7bb2c4ad5be7da783abc59f60446a80065f37b7c0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e48b1d6e7014fefe724fb74a1bce032a262a5f70c8fb634c1c9da864a8b24a10",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.RO",
    "submitted_at": "2026-05-12T16:49:28Z",
    "title_canon_sha256": "c33e6927696d000caf790804a5cdb6c955cff6d8f248b7bd48ea53f3c7078a90"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12386",
    "kind": "arxiv",
    "version": 2
  }
}
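For readers who prefer Python to the shell pipeline, the function below applies the same canonicalization as the one-liner under "Verify this Pith Number yourself": sorted keys, compact `(',', ':')` separators, `ensure_ascii=False`, UTF-8 encoding, then SHA-256. `canonical_sha256`, `CANONICAL_RECORD` and `PUBLISHED_HASH` are illustrative names. The digest will match the published canonical hash only if the JSON shown above is exactly the `canonical_record` served by the API. That is assumed here, not checked.

```python
import hashlib
import json


def canonical_sha256(record: object) -> str:
    """SHA-256 of a JSON value serialized with sorted keys and no whitespace."""
    encoded = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Copy of the canonical record JSON shown on this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "e48b1d6e7014fefe724fb74a1bce032a262a5f70c8fb634c1c9da864a8b24a10",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.RO",
        "submitted_at": "2026-05-12T16:49:28Z",
        "title_canon_sha256": "c33e6927696d000caf790804a5cdb6c955cff6d8f248b7bd48ea53f3c7078a90",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12386", "kind": "arxiv", "version": 2},
}

PUBLISHED_HASH = "ef8d63ca57d697c2849e97c7bb2c4ad5be7da783abc59f60446a80065f37b7c0"

if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror happens to store the record.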