(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Canonical reference. 86% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background 86% of classified citations

citation-role summary: background 7

citation-polarity summary: background 6, unclear 1

representative citing papers

Cross-Modal Backdoors in Multimodal Large Language Models

cs.CR · 2026-05-08 · unverdicted · novelty 8.0

Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

cs.AI · 2026-06-04 · conditional · novelty 6.0

Relabeling an identical erroneous claim from the model's own thought role to an external chat role increases explicit correction rates by 23-93 percentage points across 13 model-domain cells, indicating a chat-template artifact rather than a cognitive deficit.

The Surface You Test Is Not the Surface That Breaks

cs.CR · 2026-05-28 · unverdicted · novelty 6.0

Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving zero unsafe executions in tested tasks.

Semantic Denial of Service in LLM-controlled robots

cs.CR · 2026-04-25 · unverdicted · novelty 6.0

Injecting brief safety-plausible phrases into robot audio triggers LLM safety halts, enabling semantic denial-of-service attacks where prompt defenses trade attack suppression for impaired genuine hazard detection.

Whispers in the Machine: Confidentiality in Agentic Systems

cs.CR · 2024-02-10 · unverdicted · novelty 6.0

Systematic testing of ten LLM agents across 20 tool scenarios and 14 attacks finds universal vulnerability to prompt injection enabling data exfiltration, with tooling amplifying leakage.

Laundering AI Authority with Adversarial Examples

cs.CR · 2026-05-05 · unverdicted · novelty 5.0

Adversarial examples enable AI authority laundering by causing production VLMs to give authoritative but wrong responses on subtly perturbed images, with success rates of 22-100% using decade-old attack methods.
