Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.
Poisoning retrieval corpora by injecting adversarial passages
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9roles
background 3polarities
background 3representative citing papers
RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.
CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture
Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.
RAG models exhibit a monitoring-control gap: they acknowledge epistemic conflicts in accumulating documents yet fail to constrain unsafe recommendations, with single-turn tests overestimating safety.
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
ToE is a hierarchical claim verification framework using RL-driven multi-source retrieval, evidence evaluation, and tree aggregation that reports 4-24 point gains over baselines especially on poisoned inputs.
Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
citing papers explorer
-
Embedding Inference Attack
Tailored queries enable identification of the embedding model used by a black-box IR system from the unordered set of retrieved documents, even when a reranker is present.
-
Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence
RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.
-
Led to Mislead: Adversarial Content Injection for Attacks on Neural Ranking Models
CRAFT is a supervised LLM framework using retrieval-augmented generation, self-refinement, fine-tuning, and preference optimization to create fluent adversarial content that boosts target ranks in neural ranking models, outperforming baselines on MS MARCO and TREC benchmarks with cross-architecture
-
Selection Integrity for LLM Graph Memory: An Accumulability Criterion for Information-Flow-Blind Retrieval
Provenance checks in graph memory are blind to structural attacks that reallocate top-k membership; authselect prevents this by enforcing selection on the authenticated subgraph only.
-
Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
RAG models exhibit a monitoring-control gap: they acknowledge epistemic conflicts in accumulating documents yet fail to constrain unsafe recommendations, with single-turn tests overestimating safety.
-
AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
-
ToE: A Hierarchical and Explainable Claim Verification Framework with Dynamic Multi-source Evidence Retrieval and Aggregation
ToE is a hierarchical claim verification framework using RL-driven multi-source retrieval, evidence evaluation, and tree aggregation that reports 4-24 point gains over baselines especially on poisoned inputs.
-
Through the Stealth Lens: Attention-Aware Defenses Against Poisoning in RAG
Introduces NPAS and AV Filter using LLM attention weights to defend RAG against poisoning, reporting up to 20% accuracy gains while adaptive attacks reach 35% success.
-
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.