arXiv preprint arXiv:2402.08416 (2024)

Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu · 2024 · arXiv 2402.08416

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 2

representative citing papers

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing

cs.SE · 2026-02-02 · unverdicted · novelty 7.0

RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

Web retrieval degrades safety alignment in LLM agents, with relevance activating vulnerabilities including a Safe Source Paradox where oppositional content increases harmful compliance.

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

RAG models exhibit a monitoring-control gap: they acknowledge epistemic conflicts in accumulating documents yet fail to constrain unsafe recommendations, with single-turn tests overestimating safety.

Uncovering Logit Suppression Vulnerabilities in LLM Safety Alignment

cs.CR · 2024-05-20 · unverdicted · novelty 6.0

SSAG bypasses logit suppression in five LLMs to produce harmful responses at a 95% success rate and with 86% lower latency; VulMine reaches 77% attack success against defenses.

Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP

cs.CR · 2026-02-11 · unverdicted · novelty 5.0

The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.

Retrieval-Augmented Generation with Graphs (GraphRAG)

cs.IR · 2024-12-31 · unverdicted · novelty 5.0

A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.

Jailbreak Attacks and Defenses Against Large Language Models: A Survey

cs.CR · 2024-07-05 · accept · novelty 4.0

A survey that creates taxonomies for jailbreak attacks and defenses on LLMs, subdivides them into sub-classes, and compares evaluation approaches.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence
cs.CR · 2026-05-03 · unverdicted · none · ref 17

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
cs.SE · 2026-02-02 · unverdicted · none · ref 19

Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents
cs.CL · 2026-05-28 · unverdicted · none · ref 6

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs
cs.AI · 2026-05-26 · unverdicted · none · ref 23

Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
cs.CR · 2026-02-11 · unverdicted · none · ref 73
