arxiv: 2605.20123 · v1 · pith:OTTCEHA4new · submitted 2026-05-19 · 💻 cs.CR · cs.IR

BiRD: A Bidirectional Ranking Defense Mechanism for Retrieval Augmented Generation

Pith reviewed 2026-05-20 03:41 UTC · model grok-4.3

classification 💻 cs.CR cs.IR
keywords retrieval augmented generation · adversarial defense · poisoning attacks · bidirectional ranking · RAG security · ranking structures · adversarial robustness

The pith

BiRD defends RAG by spotting poisoned documents through unusually strong alignment between their backward rankings and the query's forward ranking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that poisoned documents in retrieval-augmented generation (RAG) systems display a consistent ranking pattern that benign documents lack, so that a lightweight defense can filter them without heavy semantic analysis. This matters because existing defenses either demand high computation or lose effectiveness against strong poisoning attacks, which limits reliable use of RAG. By combining forward ranking for content relevance with backward ranking for context consistency, the approach aims to cut attack success while raising task accuracy and keeping added latency under one second on average. If the pattern holds, RAG deployments could become safer in practice without trading performance for security.

Core claim

The central claim is that poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking than benign ones do. The authors build BiRD on a dual-signal framework that uses forward ranking to judge semantic relevance and backward ranking to measure ranking context consistency, directly addressing the prior focus on content alone. Experiments across three datasets, three retrievers, three LLMs, and two attack scenarios show this reduces PoisonedRAG attack success by up to 54 percent while lifting task accuracy by up to 56 percent with under one second of extra latency on average.

What carries the argument

The argument rests on the bidirectional ranking defense mechanism, which pairs forward ranking (semantic content relevance) with backward ranking (ranking context consistency).
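
As a rough illustration of the dual-signal idea, the sketch below scores one candidate document by comparing the query's forward top-k list with the document's own backward top-k list. Everything specific here is an assumption and not the paper's formulation: the backward ranking is taken to be the ranking obtained when the document itself is used as the query, alignment is measured with Jaccard overlap, and the names `cosine`, `top_k`, `alignment`, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(anchor, corpus, k, exclude=None):
    """Ids of the k corpus vectors most similar to `anchor`, best first."""
    scored = [
        (cosine(anchor, vec), doc_id)
        for doc_id, vec in corpus.items()
        if doc_id != exclude
    ]
    # Sort by descending similarity; break ties by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def alignment(query_vec, doc_id, corpus, k):
    """Jaccard overlap of forward and backward top-k sets (assumed measure)."""
    # Forward ranking: what the query retrieves, ignoring the document itself.
    forward = set(top_k(query_vec, corpus, k, exclude=doc_id))
    # Backward ranking (assumption): what the document retrieves as a query.
    backward = set(top_k(corpus[doc_id], corpus, k, exclude=doc_id))
    union = forward | backward
    return len(forward & backward) / len(union) if union else 0.0


# Toy 2-D corpus (hypothetical): a, b, c sit near the query; d, e do not.
corpus = {
    "a": (1.0, 0.1),
    "b": (1.0, -0.1),
    "c": (1.0, 0.2),
    "d": (0.0, 1.0),
    "e": (0.1, 1.0),
}
query = (1.0, 0.0)
for doc_id in corpus:
    print(doc_id, alignment(query, doc_id, corpus, k=2))
```

Under these assumptions, a document whose neighbourhood largely coincides with the query's retrieved set scores near 1, and a document in an unrelated region scores near 0. The paper's actual alignment measure and cut-off may differ.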

Load-bearing premise

Poisoned documents will reliably show stronger alignment between their backward rankings and the query's forward ranking across varied datasets, retrievers, models, and attacks.

What would settle it

A test on a fresh dataset or attack method where poisoned documents no longer display the claimed stronger backward-forward alignment would falsify the central pattern.
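
One way such a check could be run, sketched here under assumptions: given per-document alignment scores for known poisoned and benign documents on the new dataset or attack, compute the probability that a random poisoned document outscores a random benign one (the AUC, equivalent to a normalised Mann-Whitney U statistic). The function name `separation_auc` and the example scores are hypothetical and are not taken from the paper.

```python
def separation_auc(poisoned_scores, benign_scores):
    """Probability that a poisoned score exceeds a benign score.

    Ties count as half a win. A value near 1.0 is consistent with the
    claimed pattern; a value near 0.5 means alignment does not separate
    the two groups.
    """
    if not poisoned_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in poisoned_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(poisoned_scores) * len(benign_scores))


# Illustrative scores only (not measurements from the paper).
poisoned = [0.9, 0.8, 0.7]
benign = [0.2, 0.3, 0.75]
print(separation_auc(poisoned, benign))
```

An AUC that stays close to 0.5 on a fresh setup would be the kind of evidence this section describes; how far below 1.0 counts as a failure is a judgment the sketch does not settle.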

Figures

Figures reproduced from arXiv: 2605.20123 by Chao Liang, Chengcai Gao, Qiufeng Wang, Xiaochuan Shi, Zhihong Sun.

Figure 1: Bidirectional Ranking Defense (BiRD) framework.
Figure 2: Rank Position Poison Frequency Statistical...
Figure 3: Rank position poison frequency comparison: (Top) different retrievers on HotpotQA; (Middle) Contriever across...
Figure 4: t-SNE visualization of textual embeddings for be...
Figure 5: Overview of BiRD method. The process is mainly divided into three stages: forward retrieval, backward retrieval, and...
Figure 6: The variation of ASR and ACC across different datasets in relation to the parameter...
Figure 7: ASR and ACC versus the filtering threshold...
Figure 9: Comprehensive comparison of rank position poisoned frequency heatmaps across three datasets (rows) and three...
Figure 10: Rank position poison frequency for Topic-Flip attacks: PRO strategy (top row) and CON strategy (bottom row) across...
Original abstract

The growing adoption of Retrieval-Augmented Generation (RAG) has led to a rise in adversarial attacks. Existing defenses, relying on semantic analysis or voting, face a trade-off between high computational cost and limited robustness under strong poisoning attacks. Their fundamental limitation is the exclusive focus on semantic content relevance, while neglecting the retrieval context that is critically defined by ranking structures. To this end, we investigate the bidirectional ranking behavior of poisoned and benign documents, and discover a key discriminative pattern: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking. Capitalizing on this, we propose BiRD, a bidirectional ranking defense mechanism built upon a dual-signal framework that leverages forward ranking to assess semantic content relevance and backward ranking to quantify ranking context consistency. This design directly addresses the fundamental limitation of prior approaches, enabling simultaneous efficiency and robustness. Extensive evaluation across 3 datasets with 3 retrievers and 3 LLMs under 2 attack scenarios validates BiRD's effectiveness. Notably, BiRD reduces the attack success rate of PoisonedRAG by up to 54% while simultaneously improving task accuracy by up to 56%, with average additional latency under 1 second.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims to have identified a key discriminative pattern in RAG systems: poisoned documents exhibit significantly stronger alignment between their backward rankings and the query's forward ranking. Building on this observation, it proposes BiRD, a bidirectional ranking defense that uses a dual-signal framework—forward ranking to assess semantic content relevance and backward ranking to quantify ranking context consistency. The approach is evaluated across 3 datasets, 3 retrievers, and 3 LLMs under 2 attack scenarios, reporting up to 54% reduction in PoisonedRAG attack success rate, up to 56% improvement in task accuracy, and average added latency under 1 second.

Significance. If the bidirectional ranking alignment pattern is shown to be a robust, generalizable property of poisoning rather than tied to specific attack constructions or retriever choices, BiRD would represent a meaningful advance by resolving the efficiency-robustness trade-off in prior semantic or voting-based defenses. The multi-configuration evaluation across datasets/retrievers/LLMs is a strength that supports broader applicability claims.

major comments (3)
  1. [Abstract] The central performance claims (up to 54% ASR reduction and 56% accuracy improvement) are stated as maxima without identifying the exact dataset/retriever/LLM configuration, reporting variance across runs, or including statistical significance tests; this directly affects whether the data support the claimed effectiveness of the dual-signal mechanism.
  2. [Method, dual-signal framework] The defense rests on the assumption that stronger backward-forward ranking alignment is intrinsic to poisoned documents and a reliable discriminator; without an ablation that isolates or removes the backward-ranking signal, it remains unclear whether reported gains derive from the proposed mechanism or from incidental top-k filtering effects.
  3. [Experiments] Results are shown across 3 retrievers, but the evaluation does not test adapted variants of PoisonedRAG or alternative embedding models (dense vs. sparse); this leaves open whether the alignment pattern persists when the attack or retrieval setup changes, which is load-bearing for the generalizability of the defense.
minor comments (2)
  1. [Method] Notation for forward and backward rankings could be introduced with a small illustrative example early in the method section to improve readability.
  2. [Abstract] The abstract mentions '2 attack scenarios' but does not name them; adding the names (e.g., PoisonedRAG and the second scenario) would aid quick assessment.
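
Major comment 1 asks for variance and significance reporting alongside the headline maxima. One conventional way to supply it, offered as a sketch and not as the authors' procedure, is a paired percentile bootstrap over per-query attack outcomes. The sketch assumes each query yields a 0/1 attack-success flag with and without the defense; `bootstrap_asr_reduction` and the example data are hypothetical.

```python
import random


def asr(outcomes):
    """Attack success rate: fraction of queries where the attack succeeded."""
    return sum(outcomes) / len(outcomes)


def bootstrap_asr_reduction(undefended, defended, n_resamples=2000,
                            alpha=0.05, seed=0):
    """Point estimate and percentile CI for ASR(undefended) - ASR(defended).

    Both inputs are 0/1 lists aligned by query, so queries are resampled
    in pairs.
    """
    if len(undefended) != len(defended) or not undefended:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(undefended)
    point = asr(undefended) - asr(defended)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(undefended[i] - defended[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lower, upper


# Illustrative outcomes only (1 = attack succeeded on that query).
without_defense = [1, 1, 0, 1, 0, 1, 1, 0]
with_defense = [0, 1, 0, 0, 0, 1, 0, 0]
print(bootstrap_asr_reduction(without_defense, with_defense))
```

A confidence interval that excludes zero for a named dataset, retriever, and LLM configuration would address the referee's concern more directly than an unqualified "up to" figure.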

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, indicating where we agree and the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central performance claims (up to 54% ASR reduction and 56% accuracy improvement) are stated as maxima without identifying the exact dataset/retriever/LLM configuration, reporting variance across runs, or including statistical significance tests; this directly affects whether the data support the claimed effectiveness of the dual-signal mechanism.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised version, we will update the abstract to identify the exact dataset/retriever/LLM configuration that achieves the reported maxima. We will also ensure the experimental results section reports variance across runs and includes statistical significance tests to more rigorously support the effectiveness of the dual-signal mechanism.

    revision: yes

  2. Referee: [Method, dual-signal framework] The defense rests on the assumption that stronger backward-forward ranking alignment is intrinsic to poisoned documents and a reliable discriminator; without an ablation that isolates or removes the backward-ranking signal, it remains unclear whether reported gains derive from the proposed mechanism or from incidental top-k filtering effects.

    Authors: This is a fair and important point. To clarify that the gains stem from the bidirectional mechanism rather than incidental top-k effects, we will add an ablation study in the revised manuscript. The ablation will compare the full BiRD dual-signal framework against a forward-ranking-only variant, thereby isolating the contribution of the backward-ranking signal.

    revision: yes

  3. Referee: [Experiments] Results are shown across 3 retrievers, but the evaluation does not test adapted variants of PoisonedRAG or alternative embedding models (dense vs. sparse); this leaves open whether the alignment pattern persists when the attack or retrieval setup changes, which is load-bearing for the generalizability of the defense.

    Authors: We acknowledge the value of broader testing for generalizability. While our evaluation already spans three retrievers under two attack scenarios, we agree that adapted PoisonedRAG variants and explicit dense-versus-sparse comparisons would provide stronger evidence. We will revise the discussion section to explicitly address this limitation and outline it as future work; however, we cannot perform these additional experiments within the current revision timeline.

    revision: partial
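
The ablation promised in response 2 can be pictured with a small harness. The sketch below is an assumption-laden illustration, not the authors' experimental code: each candidate carries a precomputed forward score and an alignment score, the forward-only variant keeps the top-k by forward score, and the dual-signal variant first drops candidates whose alignment exceeds a threshold `tau`. The field names, `tau`, and the toy candidates are hypothetical.

```python
def forward_only(candidates, k):
    """Keep the k candidates with the highest forward (relevance) score."""
    ranked = sorted(candidates, key=lambda d: -d["forward_score"])
    return [d["id"] for d in ranked[:k]]


def dual_signal(candidates, k, tau):
    """Drop candidates with alignment above tau, then keep the top k."""
    kept = [d for d in candidates if d["alignment"] <= tau]
    return forward_only(kept, k)


def poison_rate(selected_ids, poisoned_ids):
    """Fraction of selected documents that are known to be poisoned."""
    if not selected_ids:
        return 0.0
    return sum(1 for i in selected_ids if i in poisoned_ids) / len(selected_ids)


# Toy candidates (hypothetical): p* are poisoned, b* are benign.
candidates = [
    {"id": "p1", "forward_score": 0.95, "alignment": 0.9},
    {"id": "b1", "forward_score": 0.90, "alignment": 0.2},
    {"id": "p2", "forward_score": 0.85, "alignment": 0.8},
    {"id": "b2", "forward_score": 0.80, "alignment": 0.3},
    {"id": "b3", "forward_score": 0.70, "alignment": 0.1},
]
poisoned = {"p1", "p2"}

baseline = forward_only(candidates, k=3)
filtered = dual_signal(candidates, k=3, tau=0.5)
print(baseline, poison_rate(baseline, poisoned))
print(filtered, poison_rate(filtered, poisoned))
```

Holding `k` fixed across both variants is the point of the design: any change in the poison rate is then attributable to the alignment filter and not to a different number of retrieved documents, which is the confound the referee raises.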

Circularity Check

0 steps flagged

No significant circularity; defense rests on empirical pattern discovery and direct implementation.

Full rationale

The paper derives BiRD from an empirical observation of bidirectional ranking alignment in poisoned documents, discovered via investigation across datasets and setups, then applies a dual-signal framework using forward and backward rankings. No equations reduce performance claims to fitted parameters by construction, no self-citations form load-bearing premises, and no ansatzes or uniqueness theorems are imported from prior author work. The central claim remains an independent empirical finding applied to defense design, with results reported across multiple retrievers and LLMs without reducing to self-referential inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach relies on an empirical discovery of ranking behavior differences and standard retrieval ranking mechanics; no free parameters, domain axioms beyond ordinary IR assumptions, or new invented entities are introduced.

