Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas

Classification: cs.CR, cs.AI, cs.LG

Keywords: content, watermark, forgery, attacker, defense, degrade, detected

Abstract

Watermarking enables GenAI providers to verify whether content was generated by their models. A watermark is a hidden signal in the content whose presence can be detected using a secret watermark key. Forgery attacks are a core security threat: adversaries insert the provider's watermark into content not produced by the provider, potentially damaging the provider's reputation and undermining trust. Existing defenses resist forgery by embedding many watermarks with multiple keys into the same content, which can degrade model utility. Even so, forgery remains a threat when attackers can collect sufficiently many watermarked samples. We propose a defense that is provably forgery-resistant independent of the amount of watermarked content collected by the attacker, provided the attacker cannot easily distinguish watermarks from different keys. Our scheme does not further degrade model utility. We randomize the watermark key selection for each query and accept content as genuine only if a watermark is detected by exactly one key. We focus on the image and text modalities, but our defense is modality-agnostic, since it treats the underlying watermarking method as a black box. Our method provably bounds the attacker's success rate, and we empirically observe a reduction in attack success rate from near-perfect to only 2%, at negligible computational overhead.
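The abstract fixes only two rules of the defense: a watermark key is drawn at random for every query, and content is accepted as genuine only if exactly one key detects a watermark, with the watermarking method itself treated as a black box. The Python sketch below illustrates those two rules on a toy stand-in. Everything else is an assumption for illustration and not the authors' construction: the additive ±1 correlation watermark (`key_pattern`, `correlation`), the Gaussian "content" vectors, the parameters `DIM`, `STRENGTH` and `THRESHOLD`, the class `RandomizedKeyProvider`, and the naive forger `averaging_forgery`, which cannot tell which key produced which sample. It is meant to convey the intuition, not to reproduce the paper's proof or its reported numbers.

```python
import random
from typing import List

# Toy parameters (assumptions for illustration, not taken from the paper).
DIM = 2048        # length of a toy "content" vector
STRENGTH = 0.5    # embedding strength of the toy watermark
THRESHOLD = 0.25  # detection threshold on normalized correlation


def key_pattern(key: int) -> List[float]:
    """Toy black-box watermark: a pseudorandom +/-1 pattern derived from a secret key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(DIM)]


def correlation(content: List[float], pattern: List[float]) -> float:
    """Normalized correlation between content and a key's pattern."""
    return sum(c * p for c, p in zip(content, pattern)) / DIM


class RandomizedKeyProvider:
    """Hypothetical provider: one random key per query, exactly-one-key verification."""

    def __init__(self, keys: List[int], seed: int = 0) -> None:
        self.keys = list(keys)
        self.patterns = {k: key_pattern(k) for k in self.keys}
        self.rng = random.Random(seed)

    def generate(self) -> List[float]:
        # Stand-in for a model output: zero-mean Gaussian "content".
        content = [self.rng.gauss(0.0, 1.0) for _ in range(DIM)]
        key = self.rng.choice(self.keys)  # randomized key selection for this query
        return [c + STRENGTH * p for c, p in zip(content, self.patterns[key])]

    def detecting_keys(self, content: List[float]) -> List[int]:
        return [k for k in self.keys if correlation(content, self.patterns[k]) > THRESHOLD]

    def verify(self, content: List[float]) -> bool:
        # Accept as genuine only if exactly one key detects a watermark.
        return len(self.detecting_keys(content)) == 1


def averaging_forgery(provider: RandomizedKeyProvider, num_samples: int,
                      gain: float, seed: int) -> List[float]:
    """Hypothetical naive forger: average watermarked outputs to estimate the
    watermark, then paste the estimate (scaled by gain) onto its own content."""
    samples = [provider.generate() for _ in range(num_samples)]
    estimate = [sum(column) / num_samples for column in zip(*samples)]
    rng = random.Random(seed)
    own_content = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    return [c + gain * e for c, e in zip(own_content, estimate)]


single = RandomizedKeyProvider(keys=[11], seed=1)
multi = RandomizedKeyProvider(keys=[11, 22, 33, 44], seed=2)
results = {
    "genuine, 4 keys": multi.verify(multi.generate()),
    "forgery, 1 key": single.verify(averaging_forgery(single, 100, gain=1.0, seed=3)),
    "forgery, 4 keys, gain 1": multi.verify(averaging_forgery(multi, 100, gain=1.0, seed=4)),
    "forgery, 4 keys, gain 4": multi.verify(averaging_forgery(multi, 100, gain=4.0, seed=5)),
}
for name, accepted in results.items():
    print(f"{name}: accepted={accepted}")
```

In this toy setting, with a single fixed key the averaged samples converge to that key's pattern, so the pasted estimate typically passes detection. With four randomized keys, the same estimate is a blend of all four patterns: at gain 1 each key sees only about a quarter of the signal and none fires, while amplifying the estimate makes several keys fire at once, which the exactly-one-key rule rejects. An attacker who could sort samples by key would defeat this toy, which mirrors the abstract's indistinguishability assumption.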


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Watermarking Should Be Treated as a Monitoring Primitive
cs.CR 2026-05 unverdicted novelty 6.0

Watermarking enables entity-level attribution and monitoring via signal aggregation across outputs, even in zero-bit designs, revealing a fundamental tension with attribution goals.
Watermarking Should Be Treated as a Monitoring Primitive
cs.CR 2026-05 conditional novelty 6.0

Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.