Canonical reference

Robust distortion-free watermarks for language models

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang · 2023 · arXiv 2307.15593

Canonical reference. 100% of citing Pith papers cite this work as background.

20 Pith papers citing it

Background: 100% of classified citations

citation-role summary: background 5

citation-polarity summary: background 5

representative citing papers

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

cs.CR · 2026-04-06 · unverdicted · novelty 8.0

AI agents can conduct undetectable covert conversations using a new pseudorandom noise-resilient key exchange that works without shared keys and with only constant min-entropy in messages.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks with a 98.5% success rate using minimal data and generalizes across ten schemes and multiple model sizes.

SWAN: Semantic Watermarking with Abstract Meaning Representation

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.

Can we Watermark Low-Entropy LLM Outputs?

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

The authors give constructions for provably undetectable watermarking of constant-entropy LLM outputs that are robust to random substitutions (under subexponential LPN) and to substitutions plus random deletions (under an additional heuristic or pseudorandom ECC).

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience

cs.CR · 2026-04-13 · unverdicted · novelty 7.0

RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks with a 62% success rate, far exceeding baselines trained on up to 10,000 samples.

Optimal Multi-bit Generative Watermarking Schemes Under Worst-Case False-Alarm Constraints

cs.IT · 2026-04-09 · unverdicted · novelty 7.0

Two new constructions for multi-bit generative watermarking attain the established lower bound on miss-detection probability under worst-case false-alarm constraints, fully characterizing optimal performance via linear programming.

Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

Topic-Based Watermarks for Large Language Models

cs.CR · 2024-04-02 · unverdicted · novelty 7.0

A topic-guided watermarking scheme partitions the LLM vocabulary into topic-aligned token subsets and green-lists relevant tokens based on the input prompt to embed detectable marks while preserving text quality and improving robustness to attacks.

Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains while creating tail risks.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

Detecting Verbatim LLM Copy-Paste in Homework

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.

Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.

Towards Robust Content Watermarking Against Removal and Forgery Attacks

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.

ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport

cs.LG · 2026-02-06 · unverdicted · novelty 6.0

ArcMark is a multi-byte LLM watermark that achieves distortion-free embedding of several bytes per few hundred tokens by treating generation as a channel coding problem and using optimal transport to match distributions.

SWaRL: Safeguard Code Watermarking via Reinforcement Learning

cs.CR · 2026-01-05 · unverdicted · novelty 6.0

SWaRL trains code LLMs with RL using compiler correctness signals and a confidential verifier reward to embed robust, functionality-preserving watermarks that resist refactoring attacks.

Can AI-Generated Text be Reliably Detected?

cs.CL · 2023-03-17 · unverdicted · novelty 6.0

Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.

Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

cs.CR · 2026-05-20 · unverdicted · novelty 5.0

Develops a five-tier threat model for generative AI content, releases a 12,000-item multi-modal benchmark with laundering tests, evaluates four schemes, and maps detection metrics to legal sufficiency thresholds for the law of armed conflict, domestic procedure, and the EU AI Act.

Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes

cs.IT · 2026-05-09 · unverdicted · novelty 5.0

Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of stationary ergodic stochastic processes.

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption

cs.CR · 2025-10-21 · unverdicted · novelty 4.0

LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.

citing papers explorer

Showing 20 of 20 citing papers.

SLAM: Structural Linguistic Activation Marking for Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 11
SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange cs.CR · 2026-04-06 · unverdicted · none · ref 5
AI agents can conduct undetectable covert conversations using a new pseudorandom noise-resilient key exchange that works without shared keys and with only constant min-entropy in messages.
RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks cs.CR · 2025-09-25 · conditional · none · ref 16
RLCracker is a reinforcement learning attack that erases LLM watermarks with a 98.5% success rate using minimal data and generalizes across ten schemes and multiple model sizes.
SWAN: Semantic Watermarking with Abstract Meaning Representation cs.CL · 2026-05-05 · unverdicted · none · ref 54
SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.
Can we Watermark Low-Entropy LLM Outputs? cs.CR · 2026-04-13 · unverdicted · none · ref 8
The authors give constructions for provably undetectable watermarking of constant-entropy LLM outputs that are robust to random substitutions (under subexponential LPN) and to substitutions plus random deletions (under an additional heuristic or pseudorandom ECC).
RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience cs.CR · 2026-04-13 · unverdicted · none · ref 35
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks with a 62% success rate, far exceeding baselines trained on up to 10,000 samples.
Optimal Multi-bit Generative Watermarking Schemes Under Worst-Case False-Alarm Constraints cs.IT · 2026-04-09 · unverdicted · none · ref 13
Two new constructions for multi-bit generative watermarking attain the established lower bound on miss-detection probability under worst-case false-alarm constraints, fully characterizing optimal performance via linear programming.
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends cs.CR · 2025-08-15 · accept · none · ref 78
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
Topic-Based Watermarks for Large Language Models cs.CR · 2024-04-02 · unverdicted · none · ref 24
A topic-guided watermarking scheme partitions the LLM vocabulary into topic-aligned token subsets and green-lists relevant tokens based on the input prompt to embed detectable marks while preserving text quality and improving robustness to attacks.
Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents cs.LG · 2026-05-09 · unverdicted · none · ref 93
The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains while creating tail risks.
Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 102
Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
Detecting Verbatim LLM Copy-Paste in Homework cs.CR · 2026-05-07 · unverdicted · none · ref 15
SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking cs.CR · 2026-05-01 · unverdicted · none · ref 60
BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.
Towards Robust Content Watermarking Against Removal and Forgery Attacks cs.CV · 2026-04-08 · unverdicted · none · ref 32
ISTS watermarking dynamically controls injection based on prompt semantics and uses two-sided detection to resist removal and forgery attacks in diffusion models.
ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport cs.LG · 2026-02-06 · unverdicted · none · ref 8
ArcMark is a multi-byte LLM watermark that achieves distortion-free embedding of several bytes per few hundred tokens by treating generation as a channel coding problem and using optimal transport to match distributions.
SWaRL: Safeguard Code Watermarking via Reinforcement Learning cs.CR · 2026-01-05 · unverdicted · none · ref 27
SWaRL trains code LLMs with RL using compiler correctness signals and a confidential verifier reward to embed robust, functionality-preserving watermarks that resist refactoring attacks.
Can AI-Generated Text be Reliably Detected? cs.CL · 2023-03-17 · unverdicted · none · ref 28
Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.
Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts cs.CR · 2026-05-20 · unverdicted · none · ref 10
Develops a five-tier threat model for generative AI content, releases a 12,000-item multi-modal benchmark with laundering tests, evaluates four schemes, and maps detection metrics to legal sufficiency thresholds for the law of armed conflict, domestic procedure, and the EU AI Act.
Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes cs.IT · 2026-05-09 · unverdicted · none · ref 11
Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of stationary ergodic stochastic processes.
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption cs.CR · 2025-10-21 · unverdicted · none · ref 34
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.
