TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
Are Large Pre-Trained Language Models Leaking Your Personal Information?
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
-
TOFU: A Task of Fictitious Unlearning for LLMs
TOFU is a new benchmark with synthetic profiles and metrics demonstrating that existing unlearning algorithms for LLMs fail to achieve effective forgetting of targeted information.
-
Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
-
Benchmark Data Contamination of Large Language Models: A Survey
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.