Pith Number

pith:OBCY4VAN

pith:2026:OBCY4VANEY2NYRJSKCHRUP4AZG
not attested · not anchored · not stored · refs pending

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Deyue Zhang, Dongdong Yang, Hao Peng, Quanchen Zou, Wenxin Zhang, Xiangzheng Zhang, Zhe Liu, Zonghao Ying

SafeHarbor uses hierarchical memory to extract and evolve context-aware rules that let LLM agents refuse harmful tool use while handling ambiguous benign tasks.

arxiv:2605.05704 v2 · 2026-05-07 · cs.CR · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OBCY4VANEY2NYRJSKCHRUP4AZG}

The package prints a linked badge after your title, injects PDF metadata, and compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
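The merge algorithm itself is not specified on this page. As a minimal sketch of what "deterministic" can mean, assume (hypothetically) that each signed event carries a fixed-format ISO-8601 UTC timestamp and that ties are broken by a content hash: sorting on that key gives every mirror the same application order, whatever order the events arrived in. The `Event` fields, the `merge` helper, and the append-per-kind state update are illustrative assumptions, not Pith's bundle format, and signature checking is left out.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical signed-event shape; Pith's real bundle schema is not shown here."""
    timestamp: str                      # fixed-format ISO-8601 UTC strings sort chronologically
    kind: str                           # illustrative kinds, e.g. "citation", "replication"
    payload: dict = field(default_factory=dict)

    def digest(self) -> str:
        # Content hash of the event, using sorted-key, whitespace-free JSON.
        body = json.dumps(
            {"timestamp": self.timestamp, "kind": self.kind, "payload": self.payload},
            sort_keys=True, separators=(",", ":"), ensure_ascii=False,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def merge(record: dict, events: list) -> dict:
    """Fold events into a copy of the record in a total, content-defined order.

    Sorting by (timestamp, digest) makes the result independent of arrival
    order, and duplicate events are applied only once. Signature verification
    is assumed to have happened before this step.
    """
    state = copy.deepcopy(record)       # never mutate the canonical record itself
    applied = set()
    for ev in sorted(events, key=lambda e: (e.timestamp, e.digest())):
        d = ev.digest()
        if d in applied:
            continue
        applied.add(d)
        state.setdefault(ev.kind, []).append(ev.payload)
    return state
```

Because the sort key depends only on event content, merging `[a, b, a]` and `[b, a]` yields the same state; a real implementation would also reject events whose signatures fail to verify before merging.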

Claims

C1 · strongest claim

SafeHarbor achieves state-of-the-art performance on both ambiguous benign tasks and explicit malicious attacks, notably attaining a peak benign utility of 63.6% on GPT-4o while maintaining a robust refusal rate exceeding 93% against harmful requests.

C2 · weakest assumption

That context-aware rules extracted via enhanced adversarial generation plus entropy-based node splitting and merging will maintain precise decision boundaries across unseen tasks and models without introducing new failure modes or requiring per-deployment tuning.

C3 · one-line summary

SafeHarbor uses hierarchical memory with adversarial rule extraction and entropy-driven self-evolution to achieve over 93% refusal on harmful requests while reaching 63.6% benign utility on GPT-4o.

Receipt and verification
First computed: 2026-05-25T02:01:21.940352Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

70458e540d2634dc4532508f1a3f80c9b248a688a922bd61ebf1b1f9b90f6362
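For this record, the Pith Number appears to be the first 26 characters of the RFC 4648 Base32 encoding (alphabet A-Z, 2-7) of the canonical hash's raw bytes, i.e. its leading 130 bits. The sketch below checks that observation with Python's standard `base64` module; the function name and the general rule are assumptions inferred from this single example, not documented Pith behavior.

```python
import base64

CANONICAL_HASH = "70458e540d2634dc4532508f1a3f80c9b248a688a922bd61ebf1b1f9b90f6362"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Hypothetical helper: the mapping is inferred from this one record.
    """
    raw = bytes.fromhex(hex_digest)            # 32 bytes of SHA-256 output
    return base64.b32encode(raw).decode("ascii")[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # → OBCY4VANEY2NYRJSKCHRUP4AZG
```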

Aliases

arxiv: 2605.05704 · arxiv_version: 2605.05704v2 · doi: 10.48550/arxiv.2605.05704 · pith_short_12: OBCY4VANEY2N · pith_short_16: OBCY4VANEY2NYRJS · pith_short_8: OBCY4VAN
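The short aliases are plain prefixes of the full 26-character Pith Number, and the DOI follows arXiv's DataCite prefix `10.48550`. A small sketch that rebuilds the alias list from the arXiv identifier, version, and full Pith Number; `build_aliases` is a hypothetical helper, not part of any Pith API.

```python
PITH_NUMBER = "OBCY4VANEY2NYRJSKCHRUP4AZG"


def build_aliases(arxiv_id: str, version: int, pith: str) -> dict:
    """Hypothetical helper reproducing the alias list shown above."""
    aliases = {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        # arXiv DOIs live under the DataCite prefix 10.48550; DOI names are case-insensitive.
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    for n in (8, 12, 16):
        # Each short form is the first n characters of the full Pith Number.
        aliases[f"pith_short_{n}"] = pith[:n]
    return aliases


print(build_aliases("2605.05704", 2, PITH_NUMBER))
```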
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OBCY4VANEY2NYRJSKCHRUP4AZG \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 70458e540d2634dc4532508f1a3f80c9b248a688a922bd61ebf1b1f9b90f6362
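The same check can be done without `jq`. This sketch mirrors the shell pipeline step by step: request the JSON-LD representation, take its `canonical_record` member, serialize it with sorted keys, no insignificant whitespace, and UTF-8 output, then hash with SHA-256. The endpoint and the `canonical_record` key are the ones the curl command uses; the helper names are illustrative.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/OBCY4VANEY2NYRJSKCHRUP4AZG"


def canonical_sha256(obj) -> str:
    """SHA-256 over the canonical JSON the shell pipeline produces:
    sorted keys, compact separators, non-ASCII kept as UTF-8."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def fetch_canonical_record(url: str = PITH_URL) -> dict:
    """Request the JSON-LD representation and return its `canonical_record` member."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

Calling `canonical_sha256(fetch_canonical_record())` should then return the same value as the `# expect:` line. Keeping the hashing separate from the network call also lets you hash a locally saved copy of the record.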
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fdc8601bab076071299419f12107c07906e0c1fe3cbdf46bcea5e206d4efb7d3",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-07T05:50:45Z",
    "title_canon_sha256": "d2a4aac0c28d5323ed860c47914c6dafa55c7d5e7a5f3e1071b5a73749d29459"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.05704",
    "kind": "arxiv",
    "version": 2
  }
}