{"paper":{"title":"Defending Against Indirect Prompt Injection Attacks With Spotlighting","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Spotlighting uses input transformations to mark data origins, letting LLMs ignore embedded adversarial instructions and cutting indirect prompt injection success from over 50% to under 2%.","cross_cats":["cs.CL","cs.LG"],"primary_cat":"cs.CR","authors_text":"Emre Kiciman, Federico Zarfati, Gary Lopez, Keegan Hines, Matthew Hall, Yonatan Zunger","submitted_at":"2024-03-20T15:26:23Z","abstract_excerpt":"Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands. 
Often, the LLM will mistake the adversarial instructions as user commands to be followed, creating a security vu"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"spotlighting reduces the attack success rate from greater than 50% to below 2% in our experiments with minimal impact on task efficacy.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the chosen input transformations create a reliable, continuous provenance signal that LLMs will consistently interpret and follow without being bypassed by new attack variants.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task performance.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Spotlighting uses input transformations to mark data origins, letting LLMs ignore embedded adversarial instructions and cutting indirect prompt injection success from over 50% to under 2%.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bc51889fa24ca1cf7b4e57d628d840e1c47756b4ac996cbffe9967773bf2043e"},"source":{"id":"2403.14720","kind":"arxiv","version":1},"verdict":{"id":"fd214ed0-7e32-4f65-a188-42294243632d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T22:24:16.816058Z","strongest_claim":"spotlighting reduces the attack success rate from greater than 50% to below 2% in our experiments with minimal impact on task efficacy.","one_line_summary":"Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task 
performance.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the chosen input transformations create a reliable, continuous provenance signal that LLMs will consistently interpret and follow without being bypassed by new attack variants.","pith_extraction_headline":"Spotlighting uses input transformations to mark data origins, letting LLMs ignore embedded adversarial instructions and cutting indirect prompt injection success from over 50% to under 2%."},"references":{"count":22,"sample":[{"doi":"","year":2023,"title":"Code Llama: Open Foundation Models for Code","work_id":"e73bffa4-7620-47ac-9327-259a60db52ca","ref_index":1,"cited_arxiv_id":"2308.12950","is_internal_anchor":true},{"doi":"","year":2023,"title":"Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models","work_id":"0a458c42-fb17-4655-82ad-c93057550c76","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems","work_id":"54fdcd2d-ade5-4d5e-9b37-8d75abcbaae2","ref_index":3,"cited_arxiv_id":"1905.00537","is_internal_anchor":true},{"doi":"","year":2016,"title":"SQuAD: 100,000+ Questions for Machine Comprehension of Text","work_id":"0492dd16-26e8-48d9-874c-3dd90cae7b85","ref_index":4,"cited_arxiv_id":"1606.05250","is_internal_anchor":true},{"doi":"","year":2011,"title":"Learning Word Vectors for Sentiment 
Analysis","work_id":"9c87ba77-b4a4-434a-aa36-0dee865dcd6a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":22,"snapshot_sha256":"c3c754cd10427b708c32358ac5785c656dd9cc722656ca3fd8f1278595c0df64","internal_anchors":12},"formal_canon":{"evidence_count":3,"snapshot_sha256":"2a65519579dab32c2b384b1f257eb88b225e03efc862d06133f48eb9860002e6"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}