pith:ENBAVMJH
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions
A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.
arxiv:2605.13825 v1 · 2026-05-13 · cs.AI · cs.CV
Claims
Under a neutral system prompt, the strongest aligned models almost never pick the unsafe option, but a single added sentence, 'stay consistent with the strategy shown in the prior history', flips them to picking it at 91-98% rates, and the flipped models often escalate beyond mere continuation.
The 100 scenarios and forced harmful priors are representative of real agent trajectories, and model outputs can be cleanly interpreted as deliberate choices rather than prompt artifacts.
Receipt and verification
| First computed | 2026-05-18T02:44:15.176278Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ENBAVMJH5V45SHCEVXKCW4CLP6 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "fb340834afdd7a7186a67b5788137043769fbbd168a207d3c015fb67d79c1823",
"cross_cats_sorted": [
"cs.CV"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.AI",
"submitted_at": "2026-05-13T17:50:27Z",
"title_canon_sha256": "12c9716a8e648335eebda4facfc0ba685366eb9f0a978221de3b808ecdc890f5"
},
"schema_version": "1.0",
"source": {
"id": "2605.13825",
"kind": "arxiv",
"version": 1
}
}
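The shell one-liner under "Verify this Pith Number yourself" can be unpacked into a short Python sketch. It follows the same steps as the command: request the JSON-LD document, take its `canonical_record` field, re-serialize it with sorted keys and compact separators, and hash the UTF-8 bytes with SHA-256. The function names `canonical_sha256` and `fetch_canonical_record` are hypothetical, chosen here for illustration, and `RECORD` is a hand copy of the JSON shown above, so whether the printed digest equals the published hash depends on that copy matching the served record exactly.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does."""
    # Sorted keys and compact separators make the byte sequence independent
    # of key order and whitespace in the served JSON.
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fetch_canonical_record(url: str) -> dict:
    """Fetch the JSON-LD document and return its 'canonical_record' field.

    Mirrors the curl call (Accept: application/ld+json) plus the jq filter.
    Not called below, so the sketch runs without network access.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["canonical_record"]


# Hand copy of the canonical record displayed on this page (illustrative).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "fb340834afdd7a7186a67b5788137043769fbbd168a207d3c015fb67d79c1823",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T17:50:27Z",
        "title_canon_sha256": "12c9716a8e648335eebda4facfc0ba685366eb9f0a978221de3b808ecdc890f5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13825", "kind": "arxiv", "version": 1},
}

# Canonical hash as published in the receipt above.
EXPECTED = "23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6"

digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys before hashing is what lets two parties agree on a digest even if their JSON libraries emit fields in different orders; `ensure_ascii=False` keeps any non-ASCII characters as raw UTF-8 rather than `\u` escapes, which would otherwise change the bytes being hashed.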