BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation
Pith reviewed 2026-05-20 03:41 UTC · model grok-4.3
The pith
BiRD defends RAG by spotting poisoned documents through unusually strong alignment between their backward rankings and the query's forward ranking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking than benign ones do. The authors build BiRD on a dual-signal framework that uses forward ranking to judge semantic relevance and backward ranking to measure ranking context consistency, directly addressing the prior focus on content alone. Experiments across three datasets, three retrievers, three LLMs, and two attack scenarios show that BiRD reduces the PoisonedRAG attack success rate by up to 54% while raising task accuracy by up to 56%, with under one second of added latency on average.
What carries the argument
The bidirectional ranking defense mechanism that pairs forward ranking for semantic content relevance with backward ranking for ranking context consistency.
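To make the two rankings concrete, here is a minimal sketch of one plausible reading of the mechanism. It assumes that the forward ranking orders documents by similarity to the query, that a document's backward ranking orders the other documents by similarity to that document, and that alignment is the Spearman correlation between the two orderings. The function names, the toy 2-D embeddings, and the choice to drop the document itself from both lists are illustrative assumptions, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def ranking(anchor_vec, corpus, exclude=()):
    """Doc ids sorted from most to least similar to anchor_vec."""
    ids = [d for d in corpus if d not in exclude]
    return sorted(ids, key=lambda d: cosine(anchor_vec, corpus[d]), reverse=True)

def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    pos_b = {doc: i for i, doc in enumerate(order_b)}
    d2 = sum((i - pos_b[doc]) ** 2 for i, doc in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def alignment(doc_id, query_vec, corpus):
    """Agreement between the query's forward ranking and doc_id's backward ranking."""
    forward = ranking(query_vec, corpus, exclude=(doc_id,))
    backward = ranking(corpus[doc_id], corpus, exclude=(doc_id,))
    return spearman(forward, backward)

# Hypothetical embeddings: "p" is crafted to sit almost on top of the query,
# while "b1".."b4" stand in for ordinary documents.
QUERY = (1.0, 0.0)
CORPUS = {
    "p": (0.99, 0.05),
    "b1": (0.9, 0.4),
    "b2": (0.7, 0.7),
    "b3": (0.3, 0.9),
    "b4": (0.8, -0.6),
}

if __name__ == "__main__":
    for doc_id in CORPUS:
        print(doc_id, round(alignment(doc_id, QUERY, CORPUS), 3))
```

In this toy geometry, a document that mimics the query sees the rest of the corpus almost exactly as the query does, so its alignment is high. That is only an intuition for why the pattern could arise; whether it holds on real corpora is the empirical question the paper addresses.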
Load-bearing premise
Poisoned documents will reliably show stronger alignment between their backward rankings and the query's forward ranking across varied datasets, retrievers, models, and attacks.
What would settle it
A test on a fresh dataset or attack method where poisoned documents no longer display the claimed stronger backward-forward alignment would falsify the central pattern.
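One way such a check could be run is a one-sided exact permutation test on per-document alignment scores. The sketch below assumes the scores have already been computed for a small labeled sample; the function name and the example values are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_pvalue(poisoned, benign):
    """One-sided p-value for mean(poisoned) > mean(benign).

    Enumerates every way to relabel the pooled scores, so it is only
    practical for small samples.
    """
    pooled = list(poisoned) + list(benign)
    k = len(poisoned)
    observed = mean(poisoned) - mean(benign)
    total = 0
    at_least = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise on exact ties.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            at_least += 1
    return at_least / total

if __name__ == "__main__":
    # Placeholder alignment scores, invented for illustration only.
    poisoned_scores = [0.90, 0.80, 0.85, 0.95]
    benign_scores = [0.10, 0.20, 0.30, 0.00]
    print(exact_permutation_pvalue(poisoned_scores, benign_scores))
```

A large p-value on a fresh dataset or attack would indicate that the alignment gap is not detectable there, which is the outcome that would undercut the central pattern.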
Figures

Original abstract
The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic analysis or voting, face a trade-off between high computational cost and limited robustness under strong poisoning attacks. Their fundamental limitation is the exclusive focus on semantic content relevance, while neglecting the retrieval context that is critically defined by ranking structures. To this end, we investigate the bidirectional ranking behavior of poisoned and benign documents, and discover a key discriminative pattern: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking. Capitalizing on this, we propose BiRD, a bidirectional ranking defense mechanism built upon a dual-signal framework that leverages forward ranking to assess semantic content relevance and backward ranking to quantify ranking context consistency. This design directly addresses the fundamental limitation of prior approaches, enabling simultaneous efficiency and robustness. Extensive evaluation across 3 datasets with 3 retrievers and 3 LLMs under 2 attack scenarios validates BiRD's effectiveness. Notably, BiRD reduces the attack success rate of PoisonedRAG by up to 54% while simultaneously improving task accuracy by up to 56%, with average additional latency under 1 second.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to have identified a key discriminative pattern in RAG systems: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking. Building on this observation, it proposes BiRD, a bidirectional ranking defense that uses a dual-signal framework—forward ranking to assess semantic content relevance and backward ranking to quantify ranking context consistency. The approach is evaluated across 3 datasets, 3 retrievers, and 3 LLMs under 2 attack scenarios, reporting up to 54% reduction in PoisonedRAG attack success rate, up to 56% improvement in task accuracy, and average added latency under 1 second.
Significance. If the bidirectional ranking alignment pattern is shown to be a robust, generalizable property of poisoning rather than tied to specific attack constructions or retriever choices, BiRD would represent a meaningful advance by resolving the efficiency-robustness trade-off in prior semantic or voting-based defenses. The multi-configuration evaluation across datasets/retrievers/LLMs is a strength that supports broader applicability claims.
major comments (3)
- [Abstract] The central performance claims (up to 54% ASR reduction and 56% accuracy improvement) are stated as maxima without identifying the exact dataset/retriever/LLM configuration, reporting variance across runs, or including statistical significance tests; this directly affects whether the data support the claimed effectiveness of the dual-signal mechanism.
- [Method, dual-signal framework] The defense rests on the assumption that stronger backward-forward ranking alignment is intrinsic to poisoned documents and a reliable discriminator; without an ablation that isolates or removes the backward-ranking signal, it remains unclear whether reported gains derive from the proposed mechanism or from incidental top-k filtering effects.
- [Experiments] Results are shown across 3 retrievers, but the evaluation does not test adapted variants of PoisonedRAG or alternative embedding models (dense vs. sparse); this leaves open whether the alignment pattern persists when the attack or retrieval setup changes, which is load-bearing for the generalizability of the defense.
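The reporting asked for in the first comment can be sketched as follows: name the configuration that attains the maximum and attach the spread across runs. The configuration labels and numbers are hypothetical placeholders, and the helper name is an assumption made for illustration.

```python
from statistics import mean, stdev

def summarize(runs_by_config):
    """Return the best configuration and a (mean, std) pair per configuration.

    runs_by_config maps (dataset, retriever, llm) to a list of per-run
    values of one metric, such as ASR reduction.
    """
    summary = {
        config: (mean(values), stdev(values))
        for config, values in runs_by_config.items()
    }
    best = max(summary, key=lambda config: summary[config][0])
    return best, summary

if __name__ == "__main__":
    # Placeholder values, not taken from the paper.
    runs = {
        ("dataset_A", "retriever_X", "llm_1"): [0.50, 0.54, 0.52],
        ("dataset_B", "retriever_Y", "llm_2"): [0.30, 0.35, 0.25],
    }
    best, summary = summarize(runs)
    avg, spread = summary[best]
    print(f"max at {best}: {avg:.2f} +/- {spread:.2f} over {len(runs[best])} runs")
```

Reporting the maximum this way lets a reader see whether an "up to" figure is typical of the grid or an outlier from a single configuration.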
minor comments (2)
- [Method] Notation for forward and backward rankings could be introduced with a small illustrative example early in the method section to improve readability.
- [Abstract] The abstract mentions '2 attack scenarios' but does not name them; adding the names (e.g., PoisonedRAG and the second scenario) would aid quick assessment.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, indicating where we agree and the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] The central performance claims (up to 54% ASR reduction and 56% accuracy improvement) are stated as maxima without identifying the exact dataset/retriever/LLM configuration, reporting variance across runs, or including statistical significance tests; this directly affects whether the data support the claimed effectiveness of the dual-signal mechanism.
  Authors: We agree that the abstract would benefit from greater specificity. In the revised version, we will update the abstract to identify the exact dataset/retriever/LLM configuration that achieves the reported maxima. We will also ensure the experimental results section reports variance across runs and includes statistical significance tests to more rigorously support the effectiveness of the dual-signal mechanism.
  Revision: yes
- Referee: [Method, dual-signal framework] The defense rests on the assumption that stronger backward-forward ranking alignment is intrinsic to poisoned documents and a reliable discriminator; without an ablation that isolates or removes the backward-ranking signal, it remains unclear whether reported gains derive from the proposed mechanism or from incidental top-k filtering effects.
  Authors: This is a fair and important point. To clarify that the gains stem from the bidirectional mechanism rather than incidental top-k effects, we will add an ablation study in the revised manuscript. The ablation will compare the full BiRD dual-signal framework against a forward-ranking-only variant, thereby isolating the contribution of the backward-ranking signal.
  Revision: yes
- Referee: [Experiments] Results are shown across 3 retrievers, but the evaluation does not test adapted variants of PoisonedRAG or alternative embedding models (dense vs. sparse); this leaves open whether the alignment pattern persists when the attack or retrieval setup changes, which is load-bearing for the generalizability of the defense.
  Authors: We acknowledge the value of broader testing for generalizability. While our evaluation already spans three retrievers under two attack scenarios, we agree that adapted PoisonedRAG variants and explicit dense-versus-sparse comparisons would provide stronger evidence. We will revise the discussion section to explicitly address this limitation and outline it as future work; however, we cannot perform these additional experiments within the current revision timeline.
  Revision: partial
Circularity Check
No significant circularity; defense rests on empirical pattern discovery and direct implementation.
The paper derives BiRD from an empirical observation of bidirectional ranking alignment in poisoned documents, discovered via investigation across datasets and setups, then applies a dual-signal framework using forward and backward rankings. No equations reduce performance claims to fitted parameters by construction, no self-citations form load-bearing premises, and no ansatzes or uniqueness theorems are imported from prior author work. The central claim remains an independent empirical finding applied to defense design, with results reported across multiple retrievers and LLMs without reducing to self-referential inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: S(dq_i) = r_i_cr / (1 - r_i_cc) with Spearman rank correlation
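The quoted scoring passage can be read as a ratio of two signals. The sketch below assumes that r_i_cr is the content-relevance signal from the forward ranking and r_i_cc is the context-consistency signal (a Spearman correlation) from the backward ranking; that reading is inferred from the abstract's wording and is not confirmed by the passage itself. The function name and the eps guard against division by zero when the correlation equals 1 are additions for illustration.

```python
def ratio_score(r_cr, r_cc, eps=1e-6):
    """Score of the form r_cr / (1 - r_cc).

    r_cr: assumed content-relevance signal for one retrieved document.
    r_cc: assumed context-consistency signal, a rank correlation in [-1, 1].
    eps:  illustrative floor on the denominator, since r_cc = 1 would
          otherwise divide by zero.
    """
    return r_cr / max(1.0 - r_cc, eps)

if __name__ == "__main__":
    # With relevance held fixed, the score grows as consistency approaches 1.
    for r_cc in (0.0, 0.5, 0.9, 1.0):
        print(r_cc, ratio_score(0.8, r_cc))
```

Under this reading the score is most sensitive when the correlation is close to 1, which is the regime where the paper reports that poisoned documents concentrate. How the score is thresholded or combined with other steps is not stated in the passage.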
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.