Stealing part of a production language model.arXiv preprint arXiv:2403.06634

Stealing part of a production language model , author= · 2024 · arXiv 2403.06634

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

ALU uses public data to suppress unlearning cost quadratically while characterizing distribution mismatch effects, enabling mass unlearning with maintained utility.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Fingerprinting LLMs via Prompt Injection

cs.CR · 2025-09-29 · conditional · novelty 7.0

LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.

Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

Prediction agreement between open and closed LLMs substantially overstates agreement on attributions and causal reasons.

The Surface You Test Is Not the Surface That Breaks

cs.CR · 2026-05-28 · unverdicted · novelty 6.0

Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.

citing papers explorer

Showing 4 of 4 citing papers after filters.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 65
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Fingerprinting LLMs via Prompt Injection cs.CR · 2025-09-29 · conditional · none · ref 2
LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.
The Surface You Test Is Not the Surface That Breaks cs.CR · 2026-05-28 · unverdicted · none · ref 7
Prompt injection vulnerability in tool-augmented LLMs is a model-surface interaction rather than a fixed channel property; the same payload inverts success rates across models, and adaptive attack rate exceeds single-surface baselines by 9.1 pp on average.
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 163
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

Stealing part of a production language model.arXiv preprint arXiv:2403.06634

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer