pith. sign in
Pith Number

pith:F3FPAIIF

pith:2024:F3FPAIIFWWF3BMZ5TRBYHCPUNI
not attested not anchored not stored refs resolved

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Emre Kiciman, Federico Zarfati, Gary Lopez, Keegan Hines, Matthew Hall, Yonatan Zunger

Spotlighting uses input transformations to mark data origins, letting LLMs ignore embedded adversarial instructions and cutting indirect prompt injection success from over 50% to under 2%.

arxiv:2403.14720 v1 · 2024-03-20 · cs.CR · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{F3FPAIIFWWF3BMZ5TRBYHCPUNI}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1strongest claim

spotlighting reduces the attack success rate from greater than 50% to below 2% in our experiments with minimal impact on task efficacy.

C2weakest assumption

That the chosen input transformations create a reliable, continuous provenance signal that LLMs will consistently interpret and follow without being bypassed by new attack variants.

C3one line summary

Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task performance.

References

22 extracted · 22 resolved · 12 Pith anchors

[1] Code Llama: Open Foundation Models for Code 2023 · arXiv:2308.12950
[2] Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models 2023
[3] SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems 1905 · arXiv:1905.00537
[4] SQuAD: 100,000+ Questions for Machine Comprehension of Text 2016 · arXiv:1606.05250
[5] Learning Word Vectors for Sentiment Analysis, 2011

Formal links

3 machine-checked theorem links

Cited by

34 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:21.453581Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2ecaf02105b58bb0b33d9c438389f46a3d996bf390d91fe8aa9b2b65415e8c1f

Aliases

arxiv: 2403.14720 · arxiv_version: 2403.14720v1 · doi: 10.48550/arxiv.2403.14720 · pith_short_12: F3FPAIIFWWF3 · pith_short_16: F3FPAIIFWWF3BMZ5 · pith_short_8: F3FPAIIF
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/F3FPAIIFWWF3BMZ5TRBYHCPUNI \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2ecaf02105b58bb0b33d9c438389f46a3d996bf390d91fe8aa9b2b65415e8c1f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fea25809dd0b1a10541abd53367f0aa00aebbf135846fabc969247d4eaeb46eb",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2024-03-20T15:26:23Z",
    "title_canon_sha256": "8301859de849661297e773cbeb44b0110dbd106073897754b6d4c5439be2684a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2403.14720",
    "kind": "arxiv",
    "version": 1
  }
}