Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Membership inference attacks from first principles
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
CONDITIONAL 3roles
background 1polarities
background 1representative citing papers
TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
citing papers explorer
-
Detecting Pretraining Data from Large Language Models
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
-
TOFU: A Task of Fictitious Unlearning for LLMs
TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
-
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.