pith:OBCY4VAN
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
SafeHarbor uses hierarchical memory to extract and evolve context-aware rules that let LLM agents refuse harmful tool use while handling ambiguous benign tasks.
arxiv:2605.05704 v2 · 2026-05-07 · cs.CR · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OBCY4VANEY2NYRJSKCHRUP4AZG}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
SafeHarbor achieves state-of-the-art performance on both ambiguous benign tasks and explicit malicious attacks, notably attaining a peak benign utility of 63.6% on GPT-4o while maintaining a robust refusal rate exceeding 93% against harmful requests.
These claims rest on the assumption that context-aware rules, extracted via enhanced adversarial generation and refined by entropy-based node splitting and merging, will maintain precise decision boundaries across unseen tasks and models without introducing new failure modes or requiring per-deployment tuning.
SafeHarbor uses hierarchical memory with adversarial rule extraction and entropy-driven self-evolution to achieve over 93% refusal on harmful requests while reaching 63.6% benign utility on GPT-4o.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-25T02:01:21.940352Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
70458e540d2634dc4532508f1a3f80c9b248a688a922bd61ebf1b1f9b90f6362
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OBCY4VANEY2NYRJSKCHRUP4AZG \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 70458e540d2634dc4532508f1a3f80c9b248a688a922bd61ebf1b1f9b90f6362
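The one-liner above packs the canonicalization and hashing into a single `python3 -c` call. The same step can be sketched as a small script, which makes the serialization rules easier to read: keys sorted, no whitespace between tokens, non-ASCII characters left unescaped, UTF-8 bytes hashed with SHA-256. The function names `canonical_bytes`, `canonical_hash`, and `verify` are hypothetical helpers chosen for illustration, not part of any Pith tooling, and the toy record below is only a fragment of the real one, so its hash is not the canonical hash shown on this page.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record the same way the one-liner does:
    sorted keys, compact separators, no ASCII escaping, UTF-8 encoded."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the lowercase hex SHA-256 digest of the canonical bytes."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest against a published hex digest."""
    return canonical_hash(record) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Toy fragment for illustration only; key order differs between a and b.
    a = {"schema_version": "1.0",
         "source": {"kind": "arxiv", "id": "2605.05704", "version": 2}}
    b = {"source": {"version": 2, "id": "2605.05704", "kind": "arxiv"},
         "schema_version": "1.0"}

    print(canonical_bytes(a).decode("utf-8"))
    # Key order in the input does not affect the digest.
    print(canonical_hash(a) == canonical_hash(b))
```

To check the real record, one would pass the parsed `canonical_record` object fetched by the `curl` command to `verify` together with the canonical hash listed above.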
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fdc8601bab076071299419f12107c07906e0c1fe3cbdf46bcea5e206d4efb7d3",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-07T05:50:45Z",
    "title_canon_sha256": "d2a4aac0c28d5323ed860c47914c6dafa55c7d5e7a5f3e1071b5a73749d29459"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.05704",
    "kind": "arxiv",
    "version": 2
  }
}