LLM lies: Hallucinations are not bugs, but features as adversarial examples

Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan · 2023 · arXiv 2310.01469

12 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem

cs.CR · 2025-09-08 · unverdicted · novelty 8.0

This paper defines a new Parasitic Toolchain Attack pattern (MCP-UPD) that assembles legitimate tools into privacy-exfiltrating workflows and reports the first large-scale scan of 12230 MCP tools across 1360 servers revealing systemic vulnerabilities from missing isolation and least-privilege in the

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

cs.CR · 2025-10-09 · unverdicted · novelty 7.0

CREST-Search is a red-teaming framework that crafts seemingly benign search queries to induce unsafe citations from web-augmented LLMs, backed by a new WebSearch-Harm dataset for fine-tuning a specialized attacker model.

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

TIGRAG constructs token co-occurrence graphs for scalable graph-augmented RAG and uses iterative entity-driven retrieval to improve multi-hop QA performance over dense and prior graph methods.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

cs.CL · 2025-08-25 · unverdicted · novelty 6.0

The method aggregates multiple hallucination evaluation scores via conformal p-values to enable calibrated detection with controlled false alarm rates across LLMs and datasets.

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

cs.CL · 2024-11-22 · unverdicted · novelty 6.0

MolReFlect introduces a teacher-student framework that automatically creates fine-grained molecule-text alignments to achieve SOTA results on molecule-caption translation.

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts

cs.AI · 2026-05-31 · unverdicted · novelty 5.0

An A*-inspired multi-agent framework with hierarchical rewriting and a dynamic gamma parameter generates obfuscated prompts that achieve higher LLM attack success rates with fewer attempts than exhaustive search.

LLM-EDT: Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training

cs.IR · 2025-11-25 · unverdicted · novelty 5.0

LLM-EDT improves cross-domain sequential recommendation by using LLMs for transferable item augmentation, dual-phase training to handle domain transitions, and domain-aware profiling to build user profiles.

Hybrid Adversarial Defence for Natural Language Understanding Tasks

cs.CL · 2026-06-03 · unverdicted · novelty 4.0

Hybrid entropy-uncertainty-geometric defence improves clean accuracy by up to 43% and adversarial robustness by up to 65% on NLU and security benchmarks.

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

cs.CR · 2026-04-11

SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits

cs.CR · 2026-04-01

Efficient Black-Box Fault Localization for System-Level Test Code Using Large Language Models

cs.SE · 2025-06-23

citing papers explorer

Showing 9 of 9 citing papers after filters.

Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem
cs.CR · 2025-09-08 · unverdicted · none · ref 52

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
cs.CR · 2025-10-09 · unverdicted · none · ref 43

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs
cs.CL · 2026-06-29 · unverdicted · none · ref 47

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL · 2026-05-12 · unverdicted · none · ref 183

Principled Detection of Hallucinations in Large Language Models via Multiple Testing
cs.CL · 2025-08-25 · unverdicted · none · ref 25

MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts
cs.CL · 2024-11-22 · unverdicted · none · ref 42

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts
cs.AI · 2026-05-31 · unverdicted · none · ref 39

LLM-EDT: Large Language Model Enhanced Cross-domain Sequential Recommendation with Dual-phase Training
cs.IR · 2025-11-25 · unverdicted · none · ref 31

Hybrid Adversarial Defence for Natural Language Understanding Tasks
cs.CL · 2026-06-03 · unverdicted · none · ref 48
