ALU uses public data to suppress unlearning cost quadratically while characterizing distribution mismatch effects, enabling mass unlearning with maintained utility.
Stealing part of a production language model.arXiv preprint arXiv:2403.06634
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
support 1representative citing papers
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.
Prediction agreement between open and closed LLMs substantially overstates agreement on attributions and causal reasons.
Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.
citing papers explorer
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Fingerprinting LLMs via Prompt Injection
LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.
-
The Surface You Test Is Not the Surface That Breaks
Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.
-
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.