Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
arXiv preprint arXiv:2304.06929
2 Pith papers cite this work.
citing papers explorer
- Detecting Pretraining Data from Large Language Models
  Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
- TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems
  TADP-RME adapts the privacy budget via inverse trust scores in [0,1] and uses reverse manifold embedding to reduce inference attack success rates by up to 3.1% while preserving formal differential privacy guarantees.