SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
Semstamp: A semantic watermark with paraphrastic robustness for text generation.arXiv preprint arXiv:2310.03991
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.
TextSeal provides a localized, distortion-free LLM watermark that outperforms baselines in detection strength, remains effective in mixed human-AI text, preserves model performance, and transfers through distillation for provenance tracking.
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.
citing papers explorer
-
SLAM: Structural Linguistic Activation Marking for Language Models
SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.
-
SWAN: Semantic Watermarking with Abstract Meaning Representation
SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.
-
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
-
Watermarking Should Be Treated as a Monitoring Primitive
Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.
-
TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
TextSeal provides a localized, distortion-free LLM watermark that outperforms baselines in detection strength, remains effective in mixed human-AI text, preserves model performance, and transfers through distillation for provenance tracking.
-
Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.