Pith Number

pith:ENBAVMJH

pith:2026:ENBAVMJH5V45SHCEVXKCW4CLP6

not attested · not anchored · not stored · refs resolved

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

A single consistency instruction, combined with harmful prior actions in the history, causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.

arxiv:2605.13825 v1 · 2026-05-13 · cs.AI · cs.CV


Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Under a neutral system prompt, the strongest aligned models almost never pick the unsafe option, but a single added sentence, 'stay consistent with the strategy shown in the prior history', flips them to 91-98% unsafe selection rates, and the flipped models often escalate beyond continuation.

C2 weakest assumption

That the 100 scenarios and forced harmful priors are representative of real agent trajectories, and that model outputs can be cleanly interpreted as deliberate choices rather than prompt artifacts.

C3 one line summary

References

56 extracted · 56 resolved · 4 Pith anchors


Receipt and verification

First computed	2026-05-18T02:44:15.176278Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6

Aliases

arxiv: 2605.13825 · arxiv_version: 2605.13825v1 · doi: 10.48550/arxiv.2605.13825 · pith_short_12: ENBAVMJH5V45 · pith_short_16: ENBAVMJH5V45SHCE · pith_short_8: ENBAVMJH
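The relationship between the identifiers on this page can be checked with a short sketch. In this record, the short aliases are plain prefixes of the 26-character identifier, and that identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. This is an observation about this one record, not a documented rule of the scheme; the helper names `base32_prefix` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6"
PITH_ID = "ENBAVMJH5V45SHCEVXKCW4CLP6"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the RFC 4648 base32 form of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str, lengths=(8, 12, 16)) -> dict:
    """Derive the short aliases as prefixes of the full identifier (assumed rule)."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


derived_id = base32_prefix(CANONICAL_HASH, len(PITH_ID))
print(derived_id == PITH_ID)
print(short_aliases(PITH_ID))
```

If the prefix relationship holds more generally, a client could resolve a short alias by prefix match against known identifiers, though shorter prefixes are more likely to collide.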

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/ENBAVMJH5V45SHCEVXKCW4CLP6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "fb340834afdd7a7186a67b5788137043769fbbd168a207d3c015fb67d79c1823",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T17:50:27Z",
    "title_canon_sha256": "12c9716a8e648335eebda4facfc0ba685366eb9f0a978221de3b808ecdc890f5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13825",
    "kind": "arxiv",
    "version": 1
  }
}
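The shell one-liner under "Verify this Pith Number yourself" can be unpacked into a small offline Python sketch. It applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the record printed above and compares the SHA-256 digest with the published canonical hash. It assumes the JSON shown on this page is exactly the `canonical_record` field returned by the resolver; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Published canonical hash, copied from this record.
EXPECTED = "23420ab127ed79d91c44add42b704b7fb828b46e78a59703bef7df00136b7fb6"

# Canonical record as printed on this page (assumed to equal `.canonical_record`).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "fb340834afdd7a7186a67b5788137043769fbbd168a207d3c015fb67d79c1823",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T17:50:27Z",
        "title_canon_sha256": "12c9716a8e648335eebda4facfc0ba685366eb9f0a978221de3b808ecdc890f5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13825", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(RECORD)
print("match" if digest == EXPECTED else "mismatch", digest)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.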