pith:TESXBW47
Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer
Harmful fine-tuning induces emergent misalignment via data structure interactions rather than isolated examples.
arxiv:2605.12798 v1 · 2026-05-12 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TESXBW475SPYDRJSCOXPFS7VHO}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Misalignment appears more readily when fine-tuning and evaluation prompts share a similar underlying functional structure, when prompts leave more room for coherent harmful completions, and when the model has learned the target behavior more reliably.
The observed misalignment differences are attributed to data-mediated transfer mechanisms rather than to uncontrolled differences in model capacity, optimization dynamics, or evaluation prompt difficulty.
Emergent and subliminal misalignment in LLMs arise from data structure interactions and transfer via benign distillation data, with stronger effects under shared functional structure and on-policy settings.
Receipt and verification
| First computed | 2026-05-18T03:09:12.759719Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TESXBW475SPYDRJSCOXPFS7VHO \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 992570db9fec9f81c53213aef2cbf53b9c222b7196cf5f74b8d7367e70eab110
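The hashing step in the one-liner above can also be written as a small Python function, which may be easier to read and reuse. This is a minimal sketch that assumes the canonical form is exactly what the one-liner produces: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoded. The function name `canonical_sha256` and the toy records are hypothetical and exist only for illustration.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record the way the one-liner does, then return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical toy records: same content, different key order.
a = {"schema_version": "1.0", "source": {"id": "2605.12798", "kind": "arxiv", "version": 1}}
b = {"source": {"version": 1, "kind": "arxiv", "id": "2605.12798"}, "schema_version": "1.0"}

print(canonical_sha256(a) == canonical_sha256(b))  # True
```

Because the keys are sorted before hashing, two copies of a record that differ only in key order or whitespace produce the same digest. To verify this page's record, pass the parsed `canonical_record` object to the function and compare the result with the canonical hash listed above.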
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "eab5810efdaac7f3af5cc8243ac56d189900d12f19f59f00ab9c15909bd6c9f4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-12T22:27:32Z",
    "title_canon_sha256": "3c067641f8f329a95a6ea8f80ad9a1b697163b2ffe9c6a70c9c762f3007ceca7"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12798",
    "kind": "arxiv",
    "version": 1
  }
}