pith. sign in

arxiv: 2605.28017 · v2 · pith:RNZQOB6Snew · submitted 2026-05-27 · 💻 cs.CR · cs.IR

Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings

Pith reviewed 2026-06-29 11:53 UTC · model grok-4.3

classification 💻 cs.CR cs.IR
keywords prompt injectionRAGgenerative engine optimisationattack survivalretrieval augmented generationLLM securityGEO attacks
0
0 comments X

The pith

In realistic RAG pipelines, most prompt-injection attacks do not survive retrieval and reranking to reach the generator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests seven GEO attacks inside a full three-stage RAG pipeline of retriever, LLM reranker, and LLM generator rather than assuming the attacked document is always passed directly to the generator. Gradient-based and instruction-override attacks largely disappear before they reach the generator, while only LLM-driven prompt injections remain effective through the entire process. Earlier reports of roughly 80 percent success therefore rest on an unrealistic shortcut. The same study shows that a lightweight guard model finetuned on a small attack dataset already catches every tested attack.

Core claim

When GEO attacks are evaluated end-to-end through a retriever, LLM reranker, and LLM generator, gradient-based and instruction override attacks largely collapse before reaching the generator, and only LLM-driven prompt injections remain effective end-to-end.

What carries the argument

The three-stage pipeline of retriever followed by LLM reranker followed by LLM generator, which filters out many modified documents before they reach the final model.

If this is right

  • Gradient-based attacks lose most of their effectiveness once document retrieval and reranking are required.
  • Instruction override attacks also fail to survive the full pipeline at high rates.
  • LLM-driven prompt injections maintain their ability to influence the generator.
  • A lightweight finetuned prompt-injection guard detects every tested attack.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Production RAG systems may gain more protection by hardening the retrieval and reranking stages than by hardening the generator alone.
  • Success rates measured in direct-injection tests should be treated as upper bounds when estimating risk for live deployments.
  • Repeating the survival measurements on additional retriever and reranker architectures would test how general the collapse pattern is.

Load-bearing premise

The specific retriever, reranker, and generator models, datasets, and attack implementations chosen for the three-stage pipeline are representative of deployed production RAG systems.

What would settle it

Re-running the seven attacks through a different combination of retriever, reranker, and generator models and checking whether gradient-based and instruction-override attacks still largely fail to reach the generator.

Figures

Figures reproduced from arXiv: 2605.28017 by Bevan Koopman, Guido Zuccon, Shuai Wang, Yu Yin.

Figure 1
Figure 1. Figure 1: Overview of the end-to-end evaluation pipeline. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Attack strength when attacking the document [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Stage-specific position preference (retriever [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Distribution of product counts per query in the [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Per-label end-to-end ∆nDCG@5 (E2E − clean) by attack and ESCI label at positions 6 and 10. For E-label targets, positive ∆ indicates the attack pro￾motes a relevant product without degrading ranking quality. For S/C/I-label targets, negative ∆ indicates the attack promotes a less relevant product into the top-5, degrading overall ranking quality. In both cases, larger magnitude reflects stronger attack inf… view at source ↗
Figure 6
Figure 6. Figure 6: Retriever comparison (BM25 vs. dense) across [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
read the original abstract

Recent generative engine optimisation (GEO) research has shown that prompt-injection attacks can push a target product to the top of an LLM's recommendation list, with the strongest attacks reporting around $80\%$ success and raising serious security concerns about RAG-based recommendation. However, these results assume the attacked document is always fed directly to the generator, bypassing the retriever and reranker. This is unrealistic: in deployed RAG systems, the attack modifies the document content, which can in turn change whether the document is retrieved and reranked highly enough to reach the generator at all. In this paper, we re-evaluate seven GEO attacks under a realistic three-stage pipeline (retriever\,$\to$\,LLM reranker\,$\to$\,LLM generator). We find that prior protocols substantially overstate attack effectiveness: gradient-based and instruction override attacks largely collapse before reaching the generator, and only LLM-driven prompt injections remain effective end-to-end. Our analysis further reveals that current GEO attacks are easily detectable: a lightweight prompt-injection guard finetuned on a small attack dataset already detects every attack. Our code and data are available at https://github.com/ielab/geo_injection_rag_survival.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that prior GEO research overstates the effectiveness of prompt-injection attacks by assuming attacked documents are directly passed to the generator. Re-evaluating seven attacks in a realistic three-stage RAG pipeline (retriever → LLM reranker → LLM generator) shows that gradient-based and instruction-override attacks largely fail to reach the generator, while only LLM-driven prompt injections remain effective end-to-end; the attacks are also easily detectable by a lightweight finetuned guard. Code and data are released.

Significance. If the results hold, the work underscores the gap between simplified attack evaluations and deployed RAG systems, indicating that security concerns from earlier studies may be overstated. The open release of code and data is a clear strength that supports reproducibility and further testing of the survival-rate findings.

major comments (2)
  1. [Abstract] The central claim that gradient-based and instruction-override attacks 'largely collapse' before reaching the generator is measured only on one fixed retriever/reranker/generator pipeline (Abstract); without ablations on alternative dense retrievers, cross-encoders, or production rerankers, the result is tied to this specific configuration and its generalizability to other deployed systems is unclear.
  2. [Abstract] The abstract states clear experimental outcomes on attack survival rates but provides no details on statistical tests, exact model sizes, dataset sizes, or controls for confounding factors, preventing full assessment of support for the reported differences between attack types.
minor comments (1)
  1. [Abstract] The 80% success rate cited from prior work lacks a specific reference to the protocols being compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We respond to each major comment below and outline revisions to address the concerns raised.

read point-by-point responses
  1. Referee: [Abstract] The central claim that gradient-based and instruction-override attacks 'largely collapse' before reaching the generator is measured only on one fixed retriever/reranker/generator pipeline (Abstract); without ablations on alternative dense retrievers, cross-encoders, or production rerankers, the result is tied to this specific configuration and its generalizability to other deployed systems is unclear.

    Authors: We chose a representative three-stage pipeline (dense retriever, LLM reranker, LLM generator) to reflect common deployed RAG setups and highlight the survival problem. We agree that additional ablations would strengthen claims of generalizability. In the revision we will add results using an alternative dense retriever and a cross-encoder reranker, reporting attack survival rates across these configurations. This will allow readers to assess whether the observed collapse is configuration-specific. revision: yes

  2. Referee: [Abstract] The abstract states clear experimental outcomes on attack survival rates but provides no details on statistical tests, exact model sizes, dataset sizes, or controls for confounding factors, preventing full assessment of support for the reported differences between attack types.

    Authors: The abstract is space-constrained, but we accept that key parameters should be stated to support the claims. We will revise the abstract to include dataset size (1000 queries), model sizes (7B reranker and generator), and note that statistical significance was assessed via paired tests. Full details on controls (fixed seeds, identical templates) and exact statistical procedures already appear in Section 3 and the released code; we will add an explicit cross-reference in the abstract. revision: partial

Circularity Check

0 steps flagged

No circularity; purely empirical attack survival measurements

full rationale

The paper conducts an empirical re-evaluation of seven GEO attacks by measuring their end-to-end survival rates through an explicit three-stage RAG pipeline (retriever → reranker → generator). No equations, fitted parameters, self-citations used as load-bearing premises, or derivations are present in the provided text. All claims reduce directly to observed success rates on the chosen models and datasets, with no reduction by construction or renaming of prior results. This is a standard empirical comparison and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical security evaluation with no free parameters, axioms, or invented entities beyond standard assumptions of machine-learning evaluation.

pith-pipeline@v0.9.1-grok · 5754 in / 997 out tokens · 31432 ms · 2026-06-29T11:53:18.017215+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 11 canonical work pages · 3 internal anchors

  1. [1]

    Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, and Ameet Deshpande. 2024. Geo: Generative engine optimization. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pages 5--16

  2. [2]

    Amazon News . 2025. https://www.aboutamazon.com/news/retail/amazon-rufus-ai-assistant-personalized-shopping-features Amazon's next-gen ai assistant for shopping is now even smarter, more capable, and more helpful . Accessed: 2026-05-08

  3. [3]

    Ricardo Baeza-Yates, Berthier Ribeiro-Neto, and 1 others. 1999. Modern information retrieval, volume 463. ACM press New York

  4. [4]

    Mahe Chen, Xiaoxuan Wang, Kaiwen Chen, and Nick Koudas. 2025. Generative engine optimization: How to dominate ai search. arXiv preprint arXiv:2509.08919

  5. [5]

    Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, and Dongyan Zhao. 2024. xrag: Extreme context compression for retrieval-augmented generation with one token. Advances in Neural Information Processing Systems, 37:109487--109516

  6. [6]

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, Haofen Wang, and 1 others. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2(1):32

  7. [7]

    Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, and Wayne Xin Zhao. 2024. Large language models are zero-shot rankers for recommender systems. In European conference on information retrieval, pages 364--381. Springer

  8. [8]

    Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles LA Clarke, Shuai Wang, Chuhan Wu, and Min Zhang. 2025. Beyond utility: Evaluating llm as recommender. In Proceedings of the ACM on Web Conference 2025, pages 3850--3862

  9. [9]

    Haibo Jin, Ruoxi Chen, Peiyan Zhang, Yifeng Luo, Huimin Zeng, Man Luo, and Haohan Wang. 2026. Controlling output rankings in generative engines for llm-based search. arXiv preprint arXiv:2602.03608

  10. [10]

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 6769--6781

  11. [11]

    Aounon Kumar and Himabindu Lakkaraju. 2024. Manipulating large language models to increase product visibility. arXiv preprint arXiv:2404.07981

  12. [12]

    Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the association for computational linguistics, 12:157--173

  13. [13]

    Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, and Jimmy Lin. 2024. Fine-tuning llama for multi-stage text retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2421--2425

  14. [14]

    Xueguang Ma, Xinyu Zhang, Ronak Pradeep, and Jimmy Lin. 2023. Zero-shot listwise document reranking with a large language model. arXiv preprint arXiv:2305.02156

  15. [15]

    Meta . 2025 a . Llama guard 4. https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-4/. Accessed: 2026-05-10

  16. [16]

    Meta . 2025 b . Prompt guard 2. https://www.llama.com/docs/model-cards-and-prompt-formats/prompt-guard/. Accessed: 2026-05-10

  17. [17]

    Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pretrained sequence-to-sequence model. In Findings of the association for computational linguistics: EMNLP 2020, pages 708--718

  18. [18]

    Samuel Pfrommer, Yatong Bai, Tanmay Gautam, and Somayeh Sojoudi. 2024. Ranking manipulation for conversational search engines. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9523--9552

  19. [19]

    Yaoyao Qian, Yifan Zeng, Yuchao Jiang, Chelsi Jain, and Huazheng Wang. 2025. The ranking blind spot: Decision hijacking in llm-based text ranking. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 21969--21979

  20. [20]

    Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, and 1 others. 2024. Large language models are effective text rankers with pairwise ranking prompting. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1504--1518

  21. [21]

    David Rau, Herv \'e D \'e jean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, St \'e phane Clinchant, and Vassilina Nikoulina. 2024. Bergen: A benchmarking library for retrieval-augmented generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7640--7663

  22. [22]

    Chandan K Reddy, Llu \' s M \`a rquez, Fran Valero, Nikhil Rao, Hugo Zaragoza, Sambaran Bandyopadhyay, Arnab Biswas, Anlu Xing, and Karthik Subbian. 2022. Shopping queries dataset: A large-scale esci benchmark for improving product search. arXiv preprint arXiv:2206.06588

  23. [23]

    Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond, volume 4. Now Publishers Inc

  24. [24]

    Stephen E Robertson and K Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information science, 27(3):129--146

  25. [25]

    Devendra Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, and Luke Zettlemoyer. 2022. Improving passage retrieval with zero-shot question generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3781--3797

  26. [26]

    Hinrich Sch \"u tze, Christopher D Manning, and Prabhakar Raghavan. 2008. Introduction to information retrieval, volume 39. Cambridge University Press Cambridge

  27. [27]

    Dave Smith. 2025. https://fortune.com/2025/11/02/amazon-rufus-ai-shopping-assistant-chatbot-10-billion-sales-monetization/ Amazon says its ai shopping assistant rufus is so effective it's on pace to pull in an extra \ 10 billion in sales . Accessed: 2026-05-08

  28. [28]

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is chatgpt good at search? investigating large language models as re-ranking agents. In Proceedings of the 2023 conference on empirical methods in natural language processing, pages 14918--14937

  29. [29]

    Raphael Tang, Crystina Zhang, Xueguang Ma, Jimmy Lin, and Ferhan T \"u re. 2024. Found in the middle: Permutation self-consistency improves listwise ranking in large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pag...

  30. [30]

    Yiming Tang, Yi Fan, Chenxiao Yu, Tiankai Yang, Yue Zhao, and Xiyang Hu. 2025. Stealthrank: Llm ranking manipulation via stealthy prompt optimization. arXiv preprint arXiv:2504.05804

  31. [31]

    Han Wang, Archiki Prasad, Elias Stengel-Eskin, and Mohit Bansal. 2025. Retrieval-augmented generation with conflicting evidence. arXiv preprint arXiv:2504.13079

  32. [32]

    Lidan Wang, Jimmy Lin, and Donald Metzler. 2011. A cascade ranking model for efficient ranked retrieval. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 105--114

  33. [33]

    Tiancheng Xing, Jerry Li, Yixuan Du, and Xiyang Hu. 2025. Are llms reliable rankers? rank manipulation via two-stage token optimization. arXiv preprint arXiv:2510.06732

  34. [34]

    Yu Yin, Shuai Wang, Bevan Koopman, and Guido Zuccon. 2026. The vulnerability of llm rankers to prompt injection attacks. arXiv preprint arXiv:2602.16752

  35. [35]

    Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, and Bryan Catanzaro. 2024. Rankrag: Unifying context ranking with retrieval-augmented generation in llms. Advances in Neural Information Processing Systems, 37:121156--121184

  36. [36]

    Haiquan Zhao, Chenhan Yuan, Fei Huang, Xiaomeng Hu, Yichang Zhang, An Yang, Bowen Yu, Dayiheng Liu, Jingren Zhou, Junyang Lin, and 1 others. 2025. Qwen3guard technical report. arXiv preprint arXiv:2510.14276

  37. [37]

    Tao Zhou and Songtao Li. 2026. Understanding user switch of information seeking: From search engines to generative ai. Journal of librarianship and information science, 58(1):696--708

  38. [38]

    Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, and Guido Zuccon. 2024. A setwise approach for effective and highly efficient zero-shot ranking with large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 38--47

  39. [39]

    Guido Zuccon, Shengyao Zhuang, and Xueguang Ma. 2025. R2llms: Retrieval and ranking with llms. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4106--4109

  40. [40]

    online" 'onlinestring :=

    ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

  41. [41]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...