pith. machine review for the scientific record.

arxiv: 2507.07871 · v4 · submitted 2025-07-10 · 💻 cs.CR · cs.AI · cs.LG

Recognition: unknown

Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

classification 💻 cs.CR cs.AI cs.LG
keywords content, watermark, forgery, attacker, defense, degrade, detected

Watermarking enables GenAI providers to verify whether content was generated by their models. A watermark is a hidden signal in the content whose presence can be detected using a secret watermark key. A core security threat is the forgery attack, in which an adversary inserts the provider's watermark into content not produced by the provider, potentially damaging the provider's reputation and undermining trust. Existing defenses resist forgery by embedding many watermarks with multiple keys into the same content, which can degrade model utility. Even then, forgery remains a threat when attackers can collect sufficiently many watermarked samples. We propose a defense that is provably forgery-resistant independent of the amount of watermarked content collected by the attacker, provided they cannot easily distinguish watermarks from different keys. Our scheme does not further degrade model utility. We randomize the watermark key selection for each query and accept content as genuine only if a watermark is detected by exactly one key. We focus on the image and text modalities, but our defense is modality-agnostic, since it treats the underlying watermarking method as a black box. Our method provably bounds the attacker's success rate, and we empirically observe a reduction from near-perfect success rates to only 2% at negligible computational overhead.
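
The two rules stated in the abstract (pick one key at random per query, accept only if exactly one key detects a watermark) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The watermark here is a toy stand-in for the black-box method, and every name (`Content`, `toy_embed`, `toy_detect`, `generate`, `is_genuine`, `blend_forgery`) is hypothetical. The forger is assumed to blend collected samples without being able to separate them by key, which mirrors the condition the abstract places on its guarantee.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    """Toy content: a payload plus the set of hidden marks it carries."""
    payload: str
    marks: frozenset = frozenset()


def toy_mark(key: bytes) -> str:
    # Hypothetical stand-in for a key-dependent hidden signal.
    return hmac.new(key, b"toy-watermark", hashlib.sha256).hexdigest()


def toy_embed(payload: str, key: bytes) -> Content:
    # Black-box embedding: the content carries the mark of exactly one key.
    return Content(payload, frozenset({toy_mark(key)}))


def toy_detect(content: Content, key: bytes) -> bool:
    # Black-box detection: is this key's mark present in the content?
    return toy_mark(key) in content.marks


def generate(payload: str, keys: list) -> Content:
    # Rule 1: choose the watermark key uniformly at random for each query.
    return toy_embed(payload, secrets.choice(keys))


def is_genuine(content: Content, keys: list) -> bool:
    # Rule 2: accept only if exactly one key detects a watermark.
    hits = sum(toy_detect(content, key) for key in keys)
    return hits == 1


def blend_forgery(samples: list, payload: str) -> Content:
    # Assumed attacker model: marks from all collected samples get mixed
    # into the forged content because the attacker cannot tell keys apart.
    marks = frozenset().union(*(s.marks for s in samples))
    return Content(payload, marks)


if __name__ == "__main__":
    keys = [secrets.token_bytes(32) for _ in range(4)]
    genuine = generate("provider output", keys)
    collected = [toy_embed("a", keys[0]), toy_embed("b", keys[1])]
    forged = blend_forgery(collected, "not from the provider")
    print(is_genuine(genuine, keys), is_genuine(forged, keys))
```

In this toy model, a forgery blended from samples that carry two or more different keys triggers more than one detector and is rejected, while unwatermarked content triggers none. A forger who happens to collect samples from a single key would still pass, which is why the sketch says nothing about the actual bound on the attacker's success rate.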

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Watermarking Should Be Treated as a Monitoring Primitive

    cs.CR 2026-05 unverdicted novelty 6.0

    Watermarking enables entity-level attribution and monitoring via signal aggregation across outputs, even in zero-bit designs, revealing a fundamental tension with attribution goals.

  2. Watermarking Should Be Treated as a Monitoring Primitive

    cs.CR 2026-05 conditional novelty 6.0

    Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.