Watermark stealing in large language models

Martin Vechev · 2024 · arXiv 2402.19361

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at a 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at a 62% success rate, far exceeding baselines trained on up to 10,000 samples.

TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC

cs.CR · 2026-04-14 · unverdicted · novelty 6.0

TimeMark is a trustworthy time watermarking framework that achieves exact generation-time recovery from AI-generated content with theoretically perfect accuracy by using time-dependent cryptographic keys, random non-stored bit sequences, and two-stage encoding with error-correcting codes.

Towards Robust Content Watermarking Against Removal and Forgery Attacks

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption

cs.CR · 2025-10-21 · unverdicted · novelty 4.0

LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.

citing papers explorer

Showing 5 of 5 citing papers.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks · cs.CR · 2025-09-25 · conditional · none · ref 11
RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience · cs.CR · 2026-04-13 · unverdicted · none · ref 11
TimeMark: A Trustworthy Time Watermarking Framework for Exact Generation-Time Recovery from AIGC · cs.CR · 2026-04-14 · unverdicted · none · ref 24
Towards Robust Content Watermarking Against Removal and Forgery Attacks · cs.CV · 2026-04-08 · unverdicted · none · ref 28
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption · cs.CR · 2025-10-21 · unverdicted · none · ref 32
