Discovering spoofing attempts on language model watermarks
3 Pith papers cite this work.
citing papers explorer
- RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience
  RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at a 62% success rate, far exceeding baselines trained on up to 10,000 samples.
- Watermarking Should Be Treated as a Monitoring Primitive
  Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.
- Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
  Randomized per-query key selection with single-key detection acceptance bounds the forgery success rate independently of collected samples while preserving model utility.