Pith Number

pith:G5AADYN4

pith:2026:G5AADYN46YJRDMDAOMT4DUHU4S
not attested · not anchored · not stored · refs resolved

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Ahmed Salem, Andrew Paverd, Jun Sakuma, Mark Russinovich, Rui Wen

LLM backdoors can activate on input length alone by exploiting positional encodings without any text changes.

arxiv:2605.15172 v1 · 2026-05-14 · cs.CR · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{G5AADYN46YJRDMDAOMT4DUHU4S}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
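
The merge algorithm itself is not specified on this page. As a minimal sketch of what "deterministic" could mean here, the Python below assumes hypothetical event dictionaries with `at`, `field`, and `value` keys, deduplicates them by content hash, and applies them in a fixed total order, so any mirror reaches the same state regardless of the order in which it received the events. None of these names come from the Pith schema.

```python
import hashlib
import json


def event_id(event):
    """Content hash of an event; identical events collapse to one id."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def merge(*event_sets):
    """Fold any number of event collections into one state dict.

    Assumption: each event is {"at": ISO-8601 UTC string, "field": str, "value": ...}.
    """
    unique = {event_id(e): e for events in event_sets for e in events}
    # Total order: timestamp first, content hash as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda item: (item[1]["at"], item[0]))
    state = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # later events overwrite earlier ones
    return state


# Hypothetical events, for illustration only.
e1 = {"at": "2026-05-17T21:40:25Z", "field": "author_claim", "value": "open"}
e2 = {"at": "2026-05-17T21:57:18Z", "field": "citations", "value": "open"}
e3 = {"at": "2026-05-18T00:00:00Z", "field": "author_claim", "value": "claimed"}

mirror_a = [e1, e2, e3]
mirror_b = [e3, e1, e1, e2]  # different arrival order, one duplicate
state_a = merge(mirror_a)
state_b = merge(mirror_b)
print(state_a == state_b)
```

Sorting on a key that never ties (timestamp plus content hash) is what makes the result independent of arrival order; a real implementation would also need to verify each event's signature before applying it.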

Claims

C1 · strongest claim

even a simple length-based positional trigger is sufficient to activate stealthy backdoors... a backdoored LLM can be induced to disclose sensitive internal information, including proprietary system prompts, once a length condition is satisfied.

C2 · weakest assumption

That the model's internal representations of positional structure can be reliably shaped during training to create a stable, stealthy trigger without affecting normal behavior on non-trigger lengths.

C3 · one-line summary

MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.

References

49 extracted · 49 resolved · 8 Pith anchors

[1] BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain 2017 · arXiv:1708.06733
[2] Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning 2017 · arXiv:1712.05526
[3] PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning 2022
[4] NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models 2023
[5] Training-free Lexical Backdoor Attacks on Language Models 2023
Receipt and verification
First computed 2026-05-17T21:40:25.278762Z
Last reissued 2026-05-17T21:57:18.616145Z
Builder pith-number-builder-2026-05-17-v1
Signature unsigned_v0
Schema pith-number/v1.0

Canonical hash

374001e1bcf61311b0607327c1d0f4e4895093a5923e52cd9c352fcfdf77f19b

Aliases

arxiv: 2605.15172 · arxiv_version: 2605.15172v1 · pith_short_12: G5AADYN46YJR · pith_short_16: G5AADYN46YJRDMDA · pith_short_8: G5AADYN4
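
In this record, the identifier and its aliases appear to be derived from the canonical hash: Base32-encoding the 32 hash bytes (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and each `pith_short_N` alias is its first N characters. The sketch below checks that relationship for the values shown on this page only; it is an observation about this record, not a documented rule of the scheme.

```python
import base64

# Values copied from this record.
canonical_hash = "374001e1bcf61311b0607327c1d0f4e4895093a5923e52cd9c352fcfdf77f19b"
full_pith = "G5AADYN46YJRDMDAOMT4DUHU4S"

# Base32-encode the raw hash bytes, then truncate to the identifier length.
encoded = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")
derived = encoded[: len(full_pith)]

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": derived[:n] for n in (8, 12, 16)}

print(derived == full_pith)
print(aliases)
```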
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/G5AADYN46YJRDMDAOMT4DUHU4S \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 374001e1bcf61311b0607327c1d0f4e4895093a5923e52cd9c352fcfdf77f19b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7c5ca57f2172a267780a357cf476fa7071ea12d4024ca4f17c95a8152dd2cca4",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-14T17:56:22Z",
    "title_canon_sha256": "d7641841524c5cc050e515bb9a715569da76192a4f4e4a57b6cfb2740d976ee4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15172",
    "kind": "arxiv",
    "version": 1
  }
}
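
The shell one-liner above can also be written as a standalone Python script that works offline on the record shown here. This is a sketch that mirrors the same serialization settings (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes); the function name `canonical_sha256` is illustrative, and whether the digest matches the published hash depends on the record above being byte-for-byte what the server signs.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    data = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7c5ca57f2172a267780a357cf476fa7071ea12d4024ca4f17c95a8152dd2cca4",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-05-14T17:56:22Z",
        "title_canon_sha256": "d7641841524c5cc050e515bb9a715569da76192a4f4e4a57b6cfb2740d976ee4",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15172", "kind": "arxiv", "version": 1},
}

expected = "374001e1bcf61311b0607327c1d0f4e4895093a5923e52cd9c352fcfdf77f19b"
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which a client happens to parse or store the fields does not affect the digest.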