Necessary and sufficient watermark for large language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks
  RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.
- Optimal Multi-bit Generative Watermarking Schemes Under Worst-Case False-Alarm Constraints
  Two new constructions for multi-bit generative watermarking attain the established lower bound on miss-detection probability under worst-case false-alarm constraints, fully characterizing optimal performance via linear programming.
- Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
  A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
- PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
- Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking