
A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein · 2023 · arXiv 2301.10226

13 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 1 · method: 1

citation-polarity summary

support: 1 · use method: 1

representative citing papers

SWAN: Semantic Watermarking with Abstract Meaning Representation

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

Asking Back: Interaction-Layer Antidistillation Watermarks

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

Interaction-layer antidistillation watermarks use system-prompt-induced behavioral markers like explicit follow-up questions that transfer to distilled student models at 45-89% relative fidelity and can be audited via black-box LLM-as-judge queries.

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

cs.CR · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

TextSeal provides a localized, distortion-free LLM watermark that outperforms baselines in detection strength, remains effective in mixed human-AI text, preserves model performance, and transfers through distillation for provenance tracking.

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

Existing LLM watermarking schemes can be defeated by semantic-preserving attacks including lexical changes, machine translation, and neural paraphrasing.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

cs.CR · 2025-07-02 · unverdicted · novelty 6.0

Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.

Can AI-Generated Text be Reliably Detected?

cs.CL · 2023-03-17 · unverdicted · novelty 6.0

Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.

SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

SCI-Defense combines perplexity detection, semantic integrity scoring across four manipulation dimensions, and inter-candidate detection to counter GEO attacks, reporting perfect precision on Amazon product data but domain-limited recall on web passages.

Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

cs.CL · 2026-05-06 · unverdicted · novelty 5.0

Chained rewrites by open-weight LLMs reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps across multiple styles and models.

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

cs.SE · 2026-04-17 · unverdicted · novelty 4.0

LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.

Findings of the Counter Turing Test: AI-Generated Text Detection

cs.CL · 2026-05-20

citing papers explorer


SWAN: Semantic Watermarking with Abstract Meaning Representation

cs.CL · 2026-05-05 · unverdicted · none · ref 51

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · none · ref 80

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · none · ref 19

Asking Back: Interaction-Layer Antidistillation Watermarks

cs.CR · 2026-05-15 · unverdicted · none · ref 15

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

cs.CR · 2026-05-12 · unverdicted · none · ref 12 · 2 links

Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs

cs.CR · 2026-05-08 · unverdicted · none · ref 10

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · none · ref 98

Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

cs.CR · 2025-07-02 · unverdicted · none · ref 6

Can AI-Generated Text be Reliably Detected?

cs.CL · 2023-03-17 · unverdicted · none · ref 64

SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization

cs.LG · 2026-05-21 · unverdicted · none · ref 12

Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

cs.CL · 2026-05-06 · unverdicted · none · ref 5

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

cs.SE · 2026-04-17 · unverdicted · none · ref 12

Findings of the Counter Turing Test: AI-Generated Text Detection

cs.CL · 2026-05-20 · unreviewed · ref 17
