LLM lies: Hallucinations are not bugs, but features as adversarial examples
12 Pith papers cite this work.
citing papers explorer
- Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem
  This paper defines a new Parasitic Toolchain Attack pattern (MCP-UPD) that assembles legitimate tools into privacy-exfiltrating workflows and reports the first large-scale scan of 12230 MCP tools across 1360 servers revealing systemic vulnerabilities from missing isolation and least-privilege in the
- When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
  CREST-Search is a red-teaming framework that crafts seemingly benign search queries to induce unsafe citations from web-augmented LLMs, backed by a new WebSearch-Harm dataset for fine-tuning a specialized attacker model.
- Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs
  TIGRAG constructs token co-occurrence graphs for scalable graph-augmented RAG and uses iterative entity-driven retrieval to improve multi-hop QA performance over dense and prior graph methods.
- REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
  REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.
- Principled Detection of Hallucinations in Large Language Models via Multiple Testing
  The method aggregates multiple hallucination evaluation scores via conformal p-values to enable calibrated detection with controlled false alarm rates across LLMs and datasets.
- MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
  MolReFlect introduces a teacher-student framework that automatically creates fine-grained molecule-text alignments to achieve SOTA results on molecule-caption translation.
- Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
  An A*-inspired multi-agent framework with hierarchical rewriting and a dynamic gamma parameter generates obfuscated prompts that achieve higher LLM attack success rates with fewer attempts than exhaustive search.
- LLM-EDT: Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training
  LLM-EDT improves cross-domain sequential recommendation by using LLMs for transferable item augmentation, dual-phase training to handle domain transitions, and domain-aware profiling to build user profiles.
- Hybrid Adversarial Defence for Natural Language Understanding Tasks
  Hybrid entropy-uncertainty-geometric defence improves clean accuracy by up to 43% and adversarial robustness by up to 65% on NLU and security benchmarks.