Poisoning retrieval corpora by injecting adversarial passages

Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen · 2023 · arXiv 2310.19156

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Embedding Inference Attack

cs.CR · 2026-07-01 · unverdicted · novelty 7.0

Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models

cs.IR · 2026-05-02 · unverdicted · novelty 7.0

CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture

Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

RAG models exhibit a monitoring-control gap: they acknowledge epistemic conflicts in accumulating documents yet fail to constrain unsafe recommendations, with single-turn tests overestimating safety.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

ToE: A Hierarchical and Explainable Claim Verification Framework with Dynamic Multi-source Evidence Retrieval and Aggregation

cs.AI · 2026-06-26 · unverdicted · novelty 5.0

ToE is a hierarchical claim verification framework using RL-driven multi-source retrieval, evidence evaluation, and tree aggregation that reports 4-24 point gains over baselines especially on poisoned inputs.

Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG

cs.CR · 2025-06-04 · unverdicted · novelty 5.0

Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

citing papers explorer

Embedding Inference Attack

cs.CR · 2026-07-01 · unverdicted · none · ref 40

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

cs.CR · 2026-05-03 · unverdicted · none · ref 66

Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models

cs.IR · 2026-05-02 · unverdicted · none · ref 60

Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval

cs.CR · 2026-06-10 · unverdicted · none · ref 20

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

cs.AI · 2026-05-26 · unverdicted · none · ref 5

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · none · ref 73

ToE: A Hierarchical and Explainable Claim Verification Framework with Dynamic Multi-source Evidence Retrieval and Aggregation

cs.AI · 2026-06-26 · unverdicted · none · ref 5

Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG

cs.CR · 2025-06-04 · unverdicted · none · ref 31

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · none · ref 68
