pith. machine review for the scientific record.
Pith Number

pith:ENBAVMJH

pith:2026:ENBAVMJH5V45SHCEVXKCW4CLP6
not attested · not anchored · not stored · refs resolved

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.

arxiv:2605.13825 v1 · 2026-05-13 · cs.AI · cs.CV

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Under a neutral system prompt, the strongest aligned models almost never pick the unsafe option, but a single added sentence, 'stay consistent with the strategy shown in the prior history', flips them to picking it 91-98% of the time, and the flipped models often escalate beyond continuation.

C2 weakest assumption

The 100 scenarios and forced harmful priors are assumed to be representative of real agent trajectories, and model outputs are assumed to be cleanly interpretable as deliberate choices rather than prompt artifacts.

C3 one-line summary

A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.

References

56 extracted · 56 resolved · 4 Pith anchors

Receipt and verification
First computed 2026-05-18T02:44:15.176278Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6

Aliases

arxiv: 2605.13825 · arxiv_version: 2605.13825v1 · doi: 10.48550/arxiv.2605.13825 · pith_short_12: ENBAVMJH5V45 · pith_short_16: ENBAVMJH5V45SHCE · pith_short_8: ENBAVMJH
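The identifiers on this page appear to be related to the canonical hash: Base32-encoding the SHA-256 digest shown under "Canonical hash" and keeping the first 26 characters reproduces the full Pith Number, and the `pith_short_*` aliases are its 8, 12 and 16 character prefixes. The Python sketch below illustrates that relationship. It is inferred from the values shown here, not taken from a published Pith specification, and the function names `pith_number` and `short_aliases` are hypothetical.

```python
import base64

# Assumption inferred from this page: the full identifier is the first
# 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest.
FULL_LEN = 26


def pith_number(canonical_hash_hex: str) -> str:
    """Base32-encode a hex digest and truncate it to the identifier length."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:FULL_LEN]


def short_aliases(pith_id: str, lengths=(8, 12, 16)) -> dict:
    """Derive the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


canonical_hash = "23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6"
pid = pith_number(canonical_hash)
print(pid)  # ENBAVMJH5V45SHCEVXKCW4CLP6
for name, value in short_aliases(pid).items():
    print(name, value)
```

If the relationship holds in general, any alias can be checked against a canonical hash without contacting the service. Treat that as a convenience check rather than a guarantee.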
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ENBAVMJH5V45SHCEVXKCW4CLP6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fb340834afdd7a7186a67b5788137043769fbbd168a207d3c015fb67d79c1823",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T17:50:27Z",
    "title_canon_sha256": "12c9716a8e648335eebda4facfc0ba685366eb9f0a978221de3b808ecdc890f5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13825",
    "kind": "arxiv",
    "version": 1
  }
}
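
The verification one-liner above can also be written as a small script for checking a saved copy of the record. The Python sketch below mirrors the same canonicalization (sorted keys, compact separators, UTF-8, no ASCII escaping) and compares the digest with an expected value. It assumes the one-liner reflects the service's canonical form; the helper names `canonical_sha256` and `matches` are hypothetical, and whether the record shown here hashes to the published value has not been checked in this example.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the one-liner above."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # compact form, no extra whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Return True if the record's digest equals the expected hex string."""
    return canonical_sha256(record) == expected_hex.strip().lower()


# Usage with a saved copy of the canonical record (file name is illustrative):
# with open("canonical_record.json", encoding="utf-8") as fh:
#     record = json.load(fh)
# print(matches(record, "23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6"))
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happened to be formatted when it was downloaded.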