A new extraction technique applied to 200 books and 14 LLMs finds that memorization of full books is rare except in specific high-capacity models where entire texts can be recovered verbatim.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2025 3roles
background 1polarities
background 1representative citing papers
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
citing papers explorer
-
Extracting memorized pieces of (copyrighted) books from open-weight language models
A new extraction technique applied to 200 books and 14 LLMs finds that memorization of full books is rare except in specific high-capacity models where entire texts can be recovered verbatim.
-
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.
-
Towards the Anonymization of the Language Modeling
Authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.