Pith Number

pith:L33RNGUY

pith:2026:L33RNGUYBWS576LT36D4GTW3WL
not attested · not anchored · not stored · refs pending

Most Current Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives

Dan Wilhelm, Luca Baroni, Mohammed Abu Baker

Perplexity differencing on completions from random prefills surfaces finetuning objectives in most model organisms.

arxiv:2605.00994 v2 · 2026-05-01 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{L33RNGUYBWS576LT36D4GTW3WL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

A simple perplexity-based method can surface finetuning objectives from model organisms by exploiting their tendency to overgeneralize finetuned behaviors beyond the intended context.

C2 weakest assumption

Finetuned models reliably overgeneralize their training objectives to unrelated contexts generated from random prefills, producing detectable perplexity gaps against a reference model.

C3 one-line summary

Perplexity gaps between finetuned and reference models on random-prefill completions often reveal the original finetuning objectives across diverse model organisms.
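
The idea of a perplexity gap can be illustrated with a minimal sketch. It assumes you already have per-token natural-log probabilities for the same completion scored under both the finetuned and the reference model. The function names, the input layout, and the choice of a plain difference (rather than, say, a ratio) are illustrative assumptions; this page does not state the paper's exact scoring rule.

```python
import math
from typing import List, Sequence, Tuple

def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def perplexity_gap(finetuned_logprobs: Sequence[float],
                   reference_logprobs: Sequence[float]) -> float:
    """Reference perplexity minus finetuned perplexity for the same tokens.

    A large positive gap means the finetuned model finds the completion far
    less surprising than the reference model does (assumed scoring rule).
    """
    if len(finetuned_logprobs) != len(reference_logprobs):
        raise ValueError("both models must score the same token sequence")
    return perplexity(reference_logprobs) - perplexity(finetuned_logprobs)

# Hypothetical record layout: (completion text, finetuned logprobs, reference logprobs)
Scored = Tuple[str, Sequence[float], Sequence[float]]

def rank_by_gap(completions: List[Scored]) -> List[Scored]:
    """Sort completions so those with the largest gap come first."""
    return sorted(completions,
                  key=lambda c: perplexity_gap(c[1], c[2]),
                  reverse=True)
```

Under these assumptions, the completions at the top of the ranking are the ones a reader would inspect first for traces of the finetuning objective.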

Receipt and verification
First computed: 2026-06-30T02:18:08.784645Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

5ef7169a980da5dff973df87c34edbb2f1730eddd9597ab8b78e8106ed54ac93

Aliases

arxiv: 2605.00994 · arxiv_version: 2605.00994v2 · doi: 10.48550/arxiv.2605.00994 · pith_short_12: L33RNGUYBWS5 · pith_short_16: L33RNGUYBWS576LT · pith_short_8: L33RNGUY
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/L33RNGUYBWS576LT36D4GTW3WL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5ef7169a980da5dff973df87c34edbb2f1730eddd9597ab8b78e8106ed54ac93
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "98133f29ab394c6fc1eb1e7453dce3e825395a16bc5f801aae67f197ba48a623",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-01T18:00:55Z",
    "title_canon_sha256": "bbeb7847e2c0218b82d28e816a5d6a57d3dc09a8f6399c5e4101c304f0ba6784"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.00994",
    "kind": "arxiv",
    "version": 2
  }
}
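
The one-line command under "Verify this Pith Number yourself" can be unpacked into a more readable sketch. It applies the same steps as the inline Python there: serialize the record with sorted keys and compact separators, encode as UTF-8, and take the SHA-256 digest. The function name `canonical_sha256` is hypothetical, and the sketch assumes the record has already been fetched and parsed into a dictionary.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Hash a record the way the page's verification one-liner does."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Passing the parsed canonical record to this function and comparing the result with the canonical hash listed on this page is the same check the shell pipeline performs.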