pith:NCTT23H2
Prompt Injection as Role Confusion
Language models fall for prompt injection because they judge text by its sound rather than its actual source.
arxiv:2603.12277 v5 · 2026-02-22 · cs.CL · cs.AI · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NCTT23H2SLTJGYAMII3FZVBGCV}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We trace this failure to role confusion: models infer the source of text based on how it sounds, not where it actually comes from... the degree of role confusion strongly predicts attack success... introducing a unifying framework that reframes prompt injection not as an ad-hoc exploit but as a measurable consequence of how models represent role.
The argument rests on two assumptions: that role probes accurately measure a model's internal role perception, and that this perception causally drives prompt injection success rather than merely correlating with it.
Language models infer roles from how text sounds rather than from its true source, which makes prompt injection possible; role probes measure this confusion and predict attack success rates.
Receipt and verification
| First computed | 2026-06-01T01:02:36.888342Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NCTT23H2SLTJGYAMII3FZVBGCV \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1fd173ffe76f4c1fdacae62c0b259dcddab24491ce607557d4de3b39cc718967",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CR"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-02-22T18:43:34Z",
    "title_canon_sha256": "b797b173eb31f7e2e55d66f702ce648ba2e7efc582da7d634a5eede42cfbb69b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.12277",
    "kind": "arxiv",
    "version": 5
  }
}
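The verification one-liner above can also be run offline against the record shown on this page. The following is a minimal sketch that mirrors the same canonicalization step (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding) before hashing with SHA-256. The names `RECORD`, `EXPECTED`, and `canonical_hash` are illustrative and not part of any Pith tooling; the sketch assumes the record is copied exactly as displayed here.

```python
import hashlib
import json

# Canonical record copied from this page (assumed to be an exact copy).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1fd173ffe76f4c1fdacae62c0b259dcddab24491ce607557d4de3b39cc718967",
        "cross_cats_sorted": ["cs.AI", "cs.CR"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-02-22T18:43:34Z",
        "title_canon_sha256": "b797b173eb31f7e2e55d66f702ce648ba2e7efc582da7d634a5eede42cfbb69b",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.12277", "kind": "arxiv", "version": 5},
}

# Canonical hash listed on this page.
EXPECTED = "68a73d6cfa92e693600c42365cd426156bc5a66b75aacc328a6a6900cd2a6b84"


def canonical_hash(record: dict) -> str:
    """Serialize with the same options as the one-liner, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches expected:", digest == EXPECTED)
```

Because the keys are sorted before serialization, the order in which they appear in the pasted record does not matter; any change to a value, however small, produces a different digest.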