Pith Number

pith:TESXBW47

pith:2026:TESXBW475SPYDRJSCOXPFS7VHO
Status: not attested · not anchored · not stored · refs resolved

Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

Anupam Nayak, Baris Askin, Carlee Joe-Wong, Gauri Joshi, Guannan Qu, Muhammed Ustaomeroglu

Harmful fine-tuning induces emergent misalignment via data structure interactions rather than isolated examples.

arxiv:2605.12798 v1 · 2026-05-12 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TESXBW475SPYDRJSCOXPFS7VHO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
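A minimal sketch of what a deterministic merge can look like, assuming a hypothetical event shape (`Event` with `event_id`, `timestamp`, `key`, `value`) and a last-writer-wins rule; the real bundle format and merge algorithm are not specified on this page. The property that matters is that events are deduplicated and put into one fixed total order before they are applied, so any mirror holding the same events reaches the same state.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the actual bundle schema is an assumption here.
    event_id: str   # assumed content-derived, so equal ids imply equal events
    timestamp: str  # ISO 8601 UTC in one fixed format, so lexical order == time order
    key: str        # which part of the state this event updates
    value: Any


def merge(canonical: Dict[str, Any], events: Iterable[Event]) -> Dict[str, Any]:
    """Fold events onto a copy of the canonical record in a fixed total order."""
    # Deduplicate by id so an event seen twice (e.g. from two mirrors) counts once.
    unique = {e.event_id: e for e in events}
    # Sort by (timestamp, event_id): the id breaks timestamp ties, so the order
    # never depends on the order in which events arrived.
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    state = dict(canonical)  # never mutate the canonical record itself
    for e in ordered:
        state[e.key] = e.value  # last writer wins
    return state
```

Ordering by `(timestamp, event_id)` rather than by arrival order is what makes the result independent of which mirror served the events; any tie-breaker works as long as every mirror applies the same one.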

Claims

C1 · strongest claim

Misalignment appears more readily when fine-tuning and evaluation prompts share underlying functional structure, when prompts leave more room for coherent harmful completions, and when the target behavior has been more reliably learned by the model.

C2 · weakest assumption

That the observed differences in misalignment are caused by data-mediated transfer mechanisms rather than by uncontrolled differences in model capacity, optimization dynamics, or evaluation-prompt difficulty.

C3 · one-line summary

Emergent and subliminal misalignment in LLMs arise from interactions in data structure and transfer via benign distillation data, with stronger effects under shared functional structure and in on-policy settings.

References

34 extracted · 34 resolved · 12 Pith anchors

[1] Accessed 2026-05-04
[2] Persona Vectors: Monitoring and Controlling Character Traits in Language Models · arXiv:2507.21509
[3] arXiv preprint · arXiv:2506.13206
[4] arXiv preprint · arXiv:2507.14805
[5] Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities · arXiv:2507.06261

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-18T03:09:12.759719Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110
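Comparing the values on this page, the Pith Number TESXBW475SPYDRJSCOXPFS7VHO matches the first 26 characters of the RFC 4648 base32 encoding of this canonical hash (26 characters × 5 bits = the first 130 of the digest's 256 bits). The sketch below checks that relationship for this record; that the same rule holds for every Pith Number, and the function name, are assumptions.

```python
import base64

CANONICAL_HASH = "992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters.

    The 26-character default mirrors this record; applying it to other
    records is an assumption.
    """
    raw = bytes.fromhex(hex_digest)
    # RFC 4648 alphabet (A-Z, 2-7). A 32-byte digest encodes to 52 characters
    # plus 4 '=' padding characters, so a 26-character prefix never includes padding.
    return base64.b32encode(raw).decode("ascii")[:length]
```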

Aliases

arxiv: 2605.12798 · arxiv_version: 2605.12798v1 · doi: 10.48550/arxiv.2605.12798 · pith_short_12: TESXBW475SPY · pith_short_16: TESXBW475SPYDRJS · pith_short_8: TESXBW47
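Each `pith_short_N` alias listed here is the first N characters of the full Pith Number. The sketch below derives the aliases that way and adds a hypothetical `resolve` helper that expands a short alias (optionally written with the `pith:` prefix, as in pith:TESXBW47) against a list of known numbers and refuses ambiguous prefixes; the helper and its behavior are assumptions, not a documented Pith API.

```python
from typing import Dict, Iterable, Sequence

PITH_NUMBER = "TESXBW475SPYDRJSCOXPFS7VHO"


def short_aliases(pith_number: str, lengths: Sequence[int] = (8, 12, 16)) -> Dict[str, str]:
    """Map pith_short_N to the first N characters of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def resolve(alias: str, known: Iterable[str]) -> str:
    """Hypothetical helper: expand a short alias to exactly one full number."""
    alias = alias.strip()
    if alias.lower().startswith("pith:"):
        alias = alias[len("pith:"):]  # accept the "pith:XXXX" display form
    alias = alias.upper()
    matches = sorted({p for p in known if p.startswith(alias)})
    if not matches:
        raise LookupError(f"no Pith Number starts with {alias!r}")
    if len(matches) > 1:
        raise LookupError(f"{alias!r} is ambiguous: {len(matches)} matches")
    return matches[0]
```

Longer aliases lower the chance that two records share a prefix, which is presumably why several lengths are offered.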
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TESXBW475SPYDRJSCOXPFS7VHO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110
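The same check without curl or jq, as a sketch in Python. `canonical_sha256` reproduces the serialization used in the one-liner above (sorted keys, `(',', ':')` separators, `ensure_ascii=False`, UTF-8 bytes). `fetch_canonical_record` assumes the JSON-LD response carries the record under `canonical_record`, as the jq filter implies; both function names are illustrative.

```python
import hashlib
import json
import urllib.request
from typing import Any


def canonical_sha256(record: Any) -> str:
    """SHA-256 hex digest of a JSON value, serialized like the shell pipeline."""
    encoded = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify(record: Any, expected_hex: str) -> bool:
    """True if the record's canonical hash equals the published hash."""
    return canonical_sha256(record) == expected_hex.strip().lower()


def fetch_canonical_record(pith_number: str) -> Any:
    """Fetch the JSON-LD document and return its canonical_record field (network call)."""
    req = urllib.request.Request(
        f"https://pith.science/pith/{pith_number}",
        headers={"Accept": "application/ld+json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

Usage: `verify(fetch_canonical_record("TESXBW475SPYDRJSCOXPFS7VHO"), "992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110")` is expected to return True when the served record matches the published hash. Because keys are sorted before hashing, the result does not depend on the key order in which the server emits the record.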
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "eab5810efdaac7f3af5cc8243ac56d189900d12f19f59f00ab9c15909bd6c9f4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T22:27:32Z",
    "title_canon_sha256": "3c067641f8f329a95a6ea8f80ad9a1b697163b2ffe9c6a70c9c762f3007ceca7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12798",
    "kind": "arxiv",
    "version": 1
  }
}
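A sketch of basic sanity checks on a record shaped like the one above. The rules (64-digit lowercase hex hashes, a `cross_cats_sorted` list that is actually sorted, an ISO 8601 `submitted_at`, a new-style arXiv `id`, a positive `version`) are inferred from the field names and values in this single record; they are assumptions, not the published pith-number/v1.0 schema.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

HEX64 = re.compile(r"[0-9a-f]{64}")
ARXIV_NEW_ID = re.compile(r"\d{4}\.\d{4,5}")


def check_record(rec: Dict[str, Any]) -> List[str]:
    """Return the problems found in a canonical record (empty list if none)."""
    problems: List[str] = []
    meta = rec.get("metadata", {})
    src = rec.get("source", {})
    for key in ("abstract_canon_sha256", "title_canon_sha256"):
        if not HEX64.fullmatch(str(meta.get(key, ""))):
            problems.append(f"metadata.{key} is not 64 lowercase hex digits")
    cats = list(meta.get("cross_cats_sorted", []))
    if cats != sorted(cats):
        problems.append("metadata.cross_cats_sorted is not sorted")
    try:
        # fromisoformat() does not accept a trailing 'Z' before Python 3.11.
        datetime.fromisoformat(str(meta.get("submitted_at", "")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("metadata.submitted_at is not ISO 8601")
    if src.get("kind") == "arxiv" and not ARXIV_NEW_ID.fullmatch(str(src.get("id", ""))):
        problems.append("source.id is not a new-style arXiv identifier")
    version = src.get("version")
    if not isinstance(version, int) or version < 1:
        problems.append("source.version must be a positive integer")
    return problems
```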