AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning
Pith reviewed 2026-05-10 16:17 UTC · model grok-4.3
The pith
A single adversarial document can significantly degrade large language model reasoning in retrieval-augmented generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AdversarialCoT is a query-specific attack that poisons only a single document. It first extracts the LLM's reasoning framework to construct an initial adversarial chain-of-thought, then iteratively refines the document through direct interactions with the LLM, progressively exposing and exploiting critical reasoning vulnerabilities. The result is substantially reduced accuracy whenever the retriever surfaces the poisoned document.
What carries the argument
AdversarialCoT, the iterative refinement of a single query-specific adversarial chain-of-thought document that guides the LLM toward flawed reasoning once retrieved.
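The mechanism described above can be sketched as a black-box refinement loop. This is a toy illustration under stated assumptions, not the paper's implementation: `craft`, `revise`, and the model itself are hypothetical stand-ins supplied by the caller.

```python
def refine_adversarial_doc(query, correct_answer, llm, craft, revise, max_iters=10):
    """Iteratively revise one poisoned document until the LLM, reasoning
    over it, no longer produces the correct answer (toy sketch)."""
    doc = craft(query)                      # initial adversarial chain-of-thought
    for _ in range(max_iters):
        answer = llm(query, doc)            # black-box query to the target model
        if answer != correct_answer:        # attack succeeded
            return doc, True
        doc = revise(doc, answer)           # exploit the observed reasoning step
    return doc, False

# Toy stand-ins: a "model" that flips its answer once the document
# contains a misleading step, plus trivial craft/revise functions.
def toy_llm(query, doc):
    return "wrong" if "misleading step" in doc else "right"

doc, success = refine_adversarial_doc(
    "Q", "right", toy_llm,
    craft=lambda q: "plausible chain-of-thought",
    revise=lambda d, a: d + " misleading step",
)
print(success)  # True after one revision
```

The loop terminates either on a successful flip or after a fixed interaction budget, which is exactly the iteration count the referee asks the authors to report.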
If this is right
- RAG systems become vulnerable to targeted poisoning attacks that require injecting only one document.
- LLM reasoning accuracy drops substantially once the crafted adversarial content is retrieved and incorporated.
- Iterative interactions allow progressive identification and exploitation of subtle weaknesses in the model's reasoning process.
- The approach exposes security risks and offers insights for building more robust LLM reasoning pipelines.
Where Pith is reading between the lines
- Retrieval systems may need additional checks to detect documents that mimic or distort reasoning chains.
- This single-document technique could extend to other retrieval-dependent AI applications that rely on external knowledge.
- Defenses might benefit from requiring multiple independent documents or verifying consistency across retrieved sources.
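The consistency defense mooted in the last bullet can be sketched as a majority vote over answers derived independently from each retrieved document. A toy illustration, not a vetted defense; the answering function is a caller-supplied assumption:

```python
from collections import Counter

def consistency_answer(query, docs, answer_fn, min_agreement=2):
    """Answer the query from each retrieved document independently and
    accept only an answer supported by at least `min_agreement` docs."""
    answers = [answer_fn(query, d) for d in docs]
    best, count = Counter(answers).most_common(1)[0]
    return best if count >= min_agreement else None

# One poisoned document among two benign ones is outvoted.
docs = ["benign A", "benign B", "poisoned"]
ans = consistency_answer(
    "q", docs,
    lambda q, d: "wrong" if d == "poisoned" else "right",
)
print(ans)  # "right"
```

The single-document attack is potent precisely because standard RAG pipelines concatenate rather than cross-check retrieved context; a vote like this forces the attacker back toward multi-document poisoning.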
Load-bearing premise
The attack requires the adversary to interact iteratively with the target LLM to refine the poisoned document and assumes the retriever will reliably return that single document for the chosen query.
What would settle it
Measure reasoning accuracy on standard benchmarks when a single refined adversarial document is the only one retrieved and consumed during inference, and compare it against the clean-retrieval baseline; the claim stands only if accuracy drops substantially.
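That test is a paired accuracy comparison; a minimal sketch, with a toy model standing in for the LLM and `doc_for` standing in for the retriever:

```python
def accuracy(examples, llm, doc_for):
    """Fraction of (query, gold) pairs answered correctly when the
    single retrieved document is supplied by doc_for (toy sketch)."""
    return sum(llm(q, doc_for(q)) == gold for q, gold in examples) / len(examples)

examples = [("q1", "a1"), ("q2", "a2")]
# Toy model: answers correctly unless the context is the adversarial doc.
toy = lambda q, d: "bad" if d == "adv" else {"q1": "a1", "q2": "a2"}[q]

clean = accuracy(examples, toy, lambda q: "benign")
poisoned = accuracy(examples, toy, lambda q: "adv")
print(clean - poisoned)  # 1.0
```

The interesting quantity is the delta between the two conditions on identical queries, not either accuracy in isolation.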
Original abstract
Retrieval-augmented generation (RAG) enhances large language model (LLM) reasoning by retrieving external documents, but also opens up new attack surfaces. We study knowledge-base poisoning attacks in RAG, where an attacker injects malicious content into the retrieval corpus, which is then naturally surfaced by the retriever and consumed by the LLM during reasoning. Unlike prior work that floods the corpus with poisoned documents, we propose AdversarialCoT, a query-specific attack that poisons only a single document in the corpus. AdversarialCoT first extracts the target LLM's reasoning framework to guide the construction of an initial adversarial chain-of-thought (CoT). The adversarial document is iteratively refined through interactions with the LLM, progressively exposing and exploiting critical reasoning vulnerabilities. Experiments on benchmark LLMs show that a single adversarial document can significantly degrade reasoning accuracy, revealing subtle yet impactful weaknesses. This study exposes security risks in RAG systems and provides actionable insights for designing more robust LLM reasoning pipelines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AdversarialCoT, a query-specific single-document poisoning attack on RAG systems. It extracts the target LLM's reasoning framework to construct an initial adversarial chain-of-thought, then iteratively refines the poisoned document via interactions with the LLM to exploit reasoning vulnerabilities. The central claim is that injecting only this one document into the corpus causes the retriever to naturally surface it and significantly degrade LLM reasoning accuracy on benchmark tasks.
Significance. If the results hold under realistic retrieval conditions, the work would be significant for RAG security and LLM reasoning robustness, demonstrating that minimal poisoning (one document) can compromise systems where prior attacks required corpus flooding. It exposes subtle integration weaknesses between retrieval and reasoning steps and offers insights for defenses. The empirical iterative refinement process is a strength, as it is presented as a direct construction without free parameters or circular derivations.
major comments (2)
- [Experimental Evaluation] The experimental evaluation reports accuracy degradation from the single adversarial document but supplies no retrieval metrics (e.g., rank, hit rate at k=1, or success rate of surfacing the document for target queries). This is load-bearing for the central claim, as the attack presupposes that the retriever will naturally surface the single poisoned document in the RAG pipeline; without these metrics, the causal connection between injection and observed degradation cannot be verified, especially in large corpora with approximate nearest-neighbor search.
- [AdversarialCoT Construction] The AdversarialCoT construction (§3) relies on iterative interactions with the target LLM to refine the document and expose vulnerabilities. The paper should explicitly state the threat model (e.g., black-box query access only) and typical iteration counts, as this assumption directly affects whether the attack is practical outside controlled settings.
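The retrieval metrics the first major comment asks for (rank of the poisoned document, hit rate at k=1) can be computed directly from per-query similarity scores. A minimal sketch, assuming exact rather than approximate nearest-neighbor search:

```python
def retrieval_metrics(scores, poisoned_idx_per_query):
    """scores: one similarity array over the corpus per query.
    poisoned_idx_per_query: index of the poisoned doc for each query.
    Returns (average 1-based rank of the poisoned doc, hit rate at k=1)."""
    ranks = []
    for s, p in zip(scores, poisoned_idx_per_query):
        order = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
        ranks.append(order.index(p) + 1)
    avg_rank = sum(ranks) / len(ranks)
    hit_at_1 = sum(r == 1 for r in ranks) / len(ranks)
    return avg_rank, hit_at_1

avg, h1 = retrieval_metrics(
    [[0.2, 0.9, 0.1], [0.8, 0.3, 0.7]],  # two queries, three docs
    [1, 2],                              # poisoned doc index per query
)
print(avg, h1)  # 1.5 0.5
```

Under approximate nearest-neighbor search the poisoned document may be pruned before ranking, which is why the comment singles out large-corpus settings.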
minor comments (2)
- [Abstract] The abstract asserts significant accuracy degradation without any numerical results, baselines, or error bars; including a concise quantitative summary would improve completeness.
- Define notation for the adversarial document and CoT components consistently upon first use to prevent ambiguity in the method description.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of empirical validation and threat model clarity that will strengthen the manuscript. We address each point below and will incorporate the suggested revisions.
Point-by-point responses
Referee: [Experimental Evaluation] The experimental evaluation reports accuracy degradation from the single adversarial document but supplies no retrieval metrics (e.g., rank, hit rate at k=1, or success rate of surfacing the document for target queries). This is load-bearing for the central claim, as the attack presupposes that the retriever will naturally surface the single poisoned document in the RAG pipeline; without these metrics, the causal connection between injection and observed degradation cannot be verified, especially in large corpora with approximate nearest-neighbor search.
Authors: We agree that retrieval metrics are essential to verify the attack's mechanism and establish the causal link between document injection and reasoning degradation. In the revised manuscript, we will add a dedicated analysis subsection reporting the average retrieval rank of the adversarial document, hit rate at k=1, and surfacing success rates across queries, models, and varying corpus sizes (including approximate nearest-neighbor settings). These metrics will be computed on the same experimental setups as the accuracy results to directly support the central claim.
Revision: yes
Referee: [AdversarialCoT Construction] The AdversarialCoT construction (§3) relies on iterative interactions with the target LLM to refine the document and expose vulnerabilities. The paper should explicitly state the threat model (e.g., black-box query access only) and typical iteration counts, as this assumption directly affects whether the attack is practical outside controlled settings.
Authors: We concur that explicit threat model details and iteration statistics improve reproducibility and practicality assessment. We will revise Section 3 to clearly state the black-box threat model (query-only access to the target LLM for construction and refinement, with no access to retrieval internals, corpus, or model weights). We will also report the observed iteration counts from our experiments, including averages and ranges per model and task, to quantify the construction effort.
Revision: yes
Circularity Check
No circularity detected in empirical attack construction
full rationale
The paper presents AdversarialCoT as an iterative, query-specific empirical procedure for building one poisoned document via LLM interactions to expose reasoning flaws. No mathematical derivation, fitted parameters renamed as predictions, or self-citation chains are invoked to support the central claim. The reported accuracy degradation on benchmark LLMs stands as an external experimental outcome rather than a tautology or reduction to inputs by construction. The method is self-contained and does not rely on any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The retriever will surface the single poisoned document for the target query.
- domain assumption: Iterative interaction with the target LLM is available to the attacker for refinement.
invented entities (1)
- AdversarialCoT document (no independent evidence)