Stealing part of a production language model.arXiv preprint arXiv:2403.06634,

Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, et al · 2024 · arXiv 2403.06634

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Fingerprinting LLMs via Prompt Injection

cs.CR · 2025-09-29 · conditional · novelty 7.0

LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

cs.CR · 2026-05-06 · conditional · novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.

LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.

citing papers explorer

Showing 5 of 5 citing papers.

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data cs.LG · 2026-05-11 · unverdicted · none · ref 85
Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 65
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Fingerprinting LLMs via Prompt Injection cs.CR · 2025-09-29 · conditional · none · ref 2
LLMPrint generates unique, post-processing-robust fingerprints for base LLMs and their variants via optimized prompt injection with statistical verification for gray-box and black-box settings.
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference cs.CR · 2026-05-06 · conditional · none · ref 163
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users cs.CL · 2025-07-03 · unverdicted · none · ref 14
A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.

Stealing part of a production language model.arXiv preprint arXiv:2403.06634,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer