Pith Number

pith:NCTT23H2

pith:2026:NCTT23H2SLTJGYAMII3FZVBGCV
not attested · not anchored · not stored · refs pending

Prompt Injection as Role Confusion

Charles Ye, Dylan Hadfield-Menell, Jasmine Cui

Language models fall for prompt injection because they judge text by its sound rather than its actual source.

arxiv:2603.12277 v5 · 2026-02-22 · cs.CL · cs.AI · cs.CR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NCTT23H2SLTJGYAMII3FZVBGCV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
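The page does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic merge" can mean in practice: the result depends on the set of events, not on the order in which a mirror received them. The event fields (`timestamp`, `field`, `value`), the `event_id` helper, and the last-writer-wins rule are all hypothetical choices made for illustration, and signature checking is left out entirely.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event; used to deduplicate and to break ordering ties."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge(*event_sets: list) -> dict:
    """Fold events into a state that is independent of arrival order.

    Hypothetical rule: sort by (timestamp, content hash), then let later
    events overwrite earlier ones for the same field.
    """
    unique = {event_id(e): e for events in event_sets for e in events}
    ordered = sorted(unique.items(), key=lambda item: (item[1]["timestamp"], item[0]))
    state = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]
    return state


# Illustrative events only; these are not taken from any real bundle.
e1 = {"timestamp": "2026-06-01T00:00:00Z", "field": "author_claim", "value": "open"}
e2 = {"timestamp": "2026-06-02T00:00:00Z", "field": "author_claim", "value": "claimed"}
e3 = {"timestamp": "2026-06-02T00:00:00Z", "field": "citations", "value": "open"}

mirror_a = [e1, e2]
mirror_b = [e3, e2, e1]  # different order, overlapping content

state_ab = merge(mirror_a, mirror_b)
state_ba = merge(mirror_b, mirror_a)
print(state_ab == state_ba)
```

Sorting on a total order (timestamp plus content hash) before folding is what makes two mirrors converge on the same state regardless of how they obtained the events.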

Claims

C1 · strongest claim

We trace this failure to role confusion: models infer the source of text based on how it sounds, not where it actually comes from... the degree of role confusion strongly predicts attack success... introducing a unifying framework that reframes prompt injection not as an ad-hoc exploit but as a measurable consequence of how models represent role.

C2 · weakest assumption

The paper assumes that role probes accurately measure a model's internal role perception, and that this perception causally drives prompt injection success rather than merely correlating with it.

C3 · one line summary

Language models infer the role of text from how it sounds rather than from its true source, which makes prompt injection possible; role probes measure this confusion, and its degree predicts attack success.

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-06-01T01:02:36.888342Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84

Aliases

arxiv: 2603.12277 · arxiv_version: 2603.12277v5 · doi: 10.48550/arxiv.2603.12277 · pith_short_12: NCTT23H2SLTJ · pith_short_16: NCTT23H2SLTJGYAM · pith_short_8: NCTT23H2
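The identifiers on this page appear to be related in a simple way. The three `pith_short_*` aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The second relation is an observation from the values shown here, not a rule the page states, so treat the sketch below as a consistency check rather than a specification. The helper name `base32_prefix` is introduced for illustration.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84"
PITH_NUMBER = "NCTT23H2SLTJGYAMII3FZVBGCV"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Observed relation (an assumption, not documented on this page):
# the Pith Number equals the leading Base32 characters of the hash.
derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))
print(derived == PITH_NUMBER)

# The short aliases are plain prefixes of the full Pith Number.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}
print(short_aliases)
```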
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NCTT23H2SLTJGYAMII3FZVBGCV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1fd173ffe76f4c1fdacae62c0b259dcddab24491ce607557d4de3b39cc718967",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CR"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-22T18:43:34Z",
    "title_canon_sha256": "b797b173eb31f7e2e55d66f702ce648ba2e7efc582da7d634a5eede42cfbb69b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.12277",
    "kind": "arxiv",
    "version": 5
  }
}
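The shell pipeline under "Verify this Pith Number yourself" can also be mirrored offline. The sketch below applies the same canonicalization as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record shown above and prints a SHA-256 digest to compare with the canonical hash listed on this page. The function name `canonical_sha256` is introduced for illustration; the record literal is copied from the JSON above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonical JSON form as the shell one-liner."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1fd173ffe76f4c1fdacae62c0b259dcddab24491ce607557d4de3b39cc718967",
        "cross_cats_sorted": ["cs.AI", "cs.CR"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-22T18:43:34Z",
        "title_canon_sha256": "b797b173eb31f7e2e55d66f702ce648ba2e7efc582da7d634a5eede42cfbb69b",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.12277", "kind": "arxiv", "version": 5},
}

digest = canonical_sha256(record)
print(digest)  # compare with the value under "Canonical hash"
```

Because keys are sorted and whitespace is fixed before hashing, the digest does not depend on how the JSON was formatted or in what order its keys were written.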