RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.
Watermark stealing in large language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.
TimeMark is a trustworthy time watermarking framework that achieves exact generation-time recovery from AI-generated content with theoretically perfect accuracy by using time-dependent cryptographic keys, random non-stored bit sequences, and two-stage encoding with error-correcting codes.
ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.
citing papers explorer
-
RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks
RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.
-
RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.
-
TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC
TimeMark is a trustworthy time watermarking framework that achieves exact generation-time recovery from AI-generated content with theoretically perfect accuracy by using time-dependent cryptographic keys, random non-stored bit sequences, and two-stage encoding with error-correcting codes.
-
Towards Robust Content Watermarking Against Removal and Forgery Attacks
ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.
-
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.