Discovering spoofing attempts on language model watermarks

Gloaguen, T · 2024 · arXiv 2410.02693

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.

Watermarking Should Be Treated as a Monitoring Primitive

cs.CR · 2026-05-13 · conditional · novelty 6.0 · 2 refs

Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

cs.CR · 2025-07-10 · unverdicted · novelty 5.0

Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.

citing papers explorer

Showing 3 of 3 citing papers.

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience cs.CR · 2026-04-13 · unverdicted · none · ref 16
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.
Watermarking Should Be Treated as a Monitoring Primitive cs.CR · 2026-05-13 · conditional · none · ref 11 · 2 links
Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection cs.CR · 2025-07-10 · unverdicted · none · ref 13
Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.

Discovering spoofing attempts on language model watermarks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer