DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack

Fang Fang; Hao Li; Li Guo; Shi Wang; Yanan Cao; Yingjie Li; Yubing Ren

arXiv: 2512.16182 · v2 · submitted 2025-12-18 · cs.CR · cs.CL

Hao Li, Yubing Ren, Yanan Cao, Yingjie Li, Fang Fang, Shi Wang, Li Guo

Pith reviewed 2026-05-16 21:57 UTC · model grok-4.3

keywords: LLM watermarking · paraphrase attack · spoofing attack · dual-stream mechanism · watermark detection · text quality · attack traceability · model abuse prevention

The pith

DualGuard uses dual-stream watermarking to defend LLM outputs against both paraphrase and spoofing attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DualGuard to fix a gap in existing LLM watermarking that handles paraphrase attacks but ignores spoofing attacks where harmful content gets injected to break attribution. It introduces an adaptive dual-stream mechanism that injects two complementary watermark signals dynamically according to the text's semantic content. This setup detects watermarked text after paraphrasing and traces the origin of spoofed content. Tests across datasets and models show strong results in detection, robustness to attacks, traceability, and text quality preservation. The method aims to make watermarking practical for stopping model abuse on cloud platforms.

Core claim

DualGuard is the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks by employing an adaptive dual-stream watermarking mechanism in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection.

What carries the argument

Adaptive dual-stream watermarking mechanism that dynamically injects two complementary watermark signals based on semantic content to enable both detection and tracing.
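The mechanism is described only at this level of detail, so the following is a toy illustration of the general idea and not DualGuard itself. It assumes a KGW-style green-list watermark, two hypothetical secret keys (one per stream), and a per-token boolean flag standing in for whatever semantic selector the paper uses. The names `is_green`, `generate`, `z_score`, and the value of `GAMMA` are all illustrative choices, and the "model" is just a uniform random sampler.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed expected fraction of "green" tokens per stream


def is_green(key: str, prev_tok: int, tok: int) -> bool:
    """Keyed pseudo-random green/red decision for a (previous, current) token pair."""
    digest = hashlib.sha256(f"{key}:{prev_tok}:{tok}".encode()).digest()
    return digest[0] < 256 * GAMMA


def generate(flags, key_a, key_b, vocab=1000, seed=0):
    """Toy generator: each flag picks the stream whose green list constrains that token.

    `flags` is a hypothetical stand-in for a semantic selector
    (True -> stream A, False -> stream B).
    """
    rng = random.Random(seed)
    tokens = [0]  # fixed start token so the first hash has a predecessor
    for use_a in flags:
        key = key_a if use_a else key_b
        while True:  # rejection-sample until the token is green for the chosen stream
            tok = rng.randrange(vocab)
            if is_green(key, tokens[-1], tok):
                break
        tokens.append(tok)
    return tokens


def z_score(tokens, key):
    """One-proportion z-test on the green-token count under a single key."""
    t = len(tokens) - 1
    hits = sum(is_green(key, p, c) for p, c in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    flags = [i % 2 == 0 for i in range(400)]  # alternate streams for the demo
    text = generate(flags, "key-a", "key-b")
    print("stream A z:", round(z_score(text, "key-a"), 2))
    print("stream B z:", round(z_score(text, "key-b"), 2))
```

In this toy setup both streams score well above chance on watermarked tokens, while text generated without either key scores near zero under both. How DualGuard actually selects, combines, and separates its streams is exactly what the summary leaves open.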

If this is right

Watermarked LLM text stays detectable even after attackers paraphrase it.
Spoofing attempts that add harmful content can be detected and traced.
The approach maintains text quality while delivering high detectability and robustness.
Watermarking becomes viable for real-world protection of model outputs against abuse.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Multiple signal streams could become a standard pattern for handling wider attack varieties in future systems.
This design might support better attribution in public LLM services by reducing false negatives from combined attacks.
Extending the semantic-based injection logic to other text transformations could be tested directly.

Load-bearing premise

Injecting two complementary watermark signals based on semantic content will not degrade text quality or detectability while providing traceability against spoofing attacks.

What would settle it

An experiment where spoofed text evades tracing or watermarked output shows clear quality loss compared to the original unwatermarked text.
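One way the first outcome could be scored is sketched below, under the assumption that the tracer outputs a single numeric score per text (a hypothetical interface, not one described in the paper). The threshold is set so that at most a chosen fraction of benign texts are flagged, and the function then reports the share of spoofed texts that exceed it. The function name `tpr_at_fpr` and its parameters are illustrative.

```python
def tpr_at_fpr(neg_scores, pos_scores, fpr=0.01):
    """Return (true-positive rate, threshold) at a fixed false-positive budget.

    neg_scores: scores for benign texts (should not be flagged).
    pos_scores: scores for spoofed texts (should be flagged).
    A text is flagged when its score is strictly above the threshold.
    """
    ranked = sorted(neg_scores, reverse=True)
    # Number of benign texts allowed above the threshold (small epsilon guards float error).
    k = min(int(fpr * len(ranked) + 1e-9), len(ranked) - 1)
    threshold = ranked[k]  # at most k benign scores are strictly greater than this
    flagged = sum(s > threshold for s in pos_scores)
    return flagged / len(pos_scores), threshold


if __name__ == "__main__":
    benign = list(range(100))        # placeholder scores, not real data
    spoofed = [98, 99, 150, 200]     # placeholder scores, not real data
    print(tpr_at_fpr(benign, spoofed, fpr=0.01))
```

A low rate at a tight false-positive budget would count as evidence that spoofed text evades tracing. The quality half of the test would need a separate comparison, such as perplexity against unwatermarked output.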

Figures

Figures reproduced from arXiv: 2512.16182 by Fang Fang, Hao Li, Li Guo, Shi Wang, Yanan Cao, Yingjie Li, Yubing Ren.

**Figure 2.** Overall framework of our watermarking method DualGuard. Gray indicates un-watermarked tokens, while blue and …

**Figure 3.** Experimental results of different attack models on RealNewsLike and RealToxicityPrompts dataset.

**Figure 4.** The impact of Dual-stream Selection.

**Figure 5.** PPL on RealNewsLike and BookSum datasets. Axis: text perplexity. Methods compared: Un-watermarked, KGW, Unbiased, AAR, SynthID, EWD, SWEET, DIPmark, SIR, XSIR, DualGuard.

**Figure 7.** Experimental results of GPT-4.1 on RealNewsLike

**Figure 8.** Experimental results of watermark detectability on RealNewsLike and BookSum datasets.

**Figure 9.** Paraphrase attack prompt.

**Figure 10.** Piggyback spoofing attack prompt. Prompt text (truncated in the source): "Task Description: Maliciously rewrite the following text to express more negative, pessimistic, harmful, hateful, or other negative content. Ensure that the output meets the following criteria: 1. Modification Scope: Restrict modification to the sentence level, rewriting them whenever feasible. 2. Modification Requirement: Maliciously rewrite the s…"

Original abstract

With the rapid development of cloud-based services, large language models have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs the adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DualGuard's dual-stream watermarking idea targets a real gap in spoofing defense but the paper never shows how the two signals are built or separated, so the central claim stays unverified.

The paper's main contribution is a dual-stream watermarking scheme that injects two complementary signals chosen adaptively from semantic content, with the goal of detecting paraphrase attacks while also tracing which signal was altered in a spoofing attempt. That direction is new relative to single-stream methods that ignore piggyback attacks, and the authors correctly identify the threat model where an attacker adds harmful content without destroying the original watermark. The experiments section reports results across several datasets and models on the usual metrics of detectability, robustness, traceability, and text quality, which is the right set to check and gives the work a practical orientation. Credit is due for running those tests instead of stopping at the idea stage.

The soft spot is exactly where the stress-test note points: there are no equations, no pseudocode, and no explicit detection rule for how the streams are generated, combined at embedding, or disentangled at verification. Without that, it is impossible to assess whether the signals interfere, whether the semantic choice creates new artifacts, or why spoofing one stream leaves the other intact. The reported performance numbers are given at summary level only, with no ablations or concrete implementation details that would allow reproduction.

This paper is for researchers already working on LLM watermarking and IP protection who want to see an initial attempt at multi-signal defense. A reader looking for a concrete, implementable algorithm will come away disappointed, but someone scanning for new threat models and high-level directions can extract the core idea. It deserves peer review because the problem is timely and the proposed direction is worth testing, though any referee would need to insist on a full technical description of the dual mechanism and the raw experimental data before the claims can be taken seriously.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces DualGuard, the first LLM watermarking algorithm claimed to defend against both paraphrase attacks and piggyback spoofing attacks. It employs an adaptive dual-stream mechanism in which two complementary watermark signals are dynamically injected based on the semantic content of the generated text. This design is asserted to enable not only reliable detection under paraphrasing but also traceability of spoofing attempts by identifying which signal was altered. Extensive experiments across multiple datasets and language models are reported to demonstrate strong detectability, robustness, traceability, and preservation of text quality.

Significance. If the dual-stream complementarity can be shown to preserve independent detectability after paraphrasing while enabling spoofing traceability without quality degradation, the result would meaningfully extend single-stream watermarking methods by addressing a practical attack vector that compromises attribution reliability in deployed LLM services. The work targets a real gap in robustness for cloud-based LLM platforms.

major comments (1)

[Method section (likely §3)] The description of the adaptive dual-stream mechanism provides no equations, pseudocode, or formal detection rule for signal generation, embedding combination, or verification-time separation of the two streams. Without these details it is impossible to verify that the signals remain independently detectable after paraphrase or that alteration of one enables reliable tracing of spoofing without destroying detectability of the other.

minor comments (1)

[Abstract] The abstract asserts 'excellent' performance without reporting any quantitative metrics, baseline comparisons, or specific attack strengths, which hinders immediate assessment of the claimed advances.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for acknowledging the practical importance of defending against both paraphrase and spoofing attacks. We address the major comment below.

Referee: [Method section (likely §3)] The description of the adaptive dual-stream mechanism provides no equations, pseudocode, or formal detection rule for signal generation, embedding combination, or verification-time separation of the two streams. Without these details it is impossible to verify that the signals remain independently detectable after paraphrase or that alteration of one enables reliable tracing of spoofing without destroying detectability of the other.

Authors: We agree that the current textual description in Section 3 lacks the formal specifications needed for independent verification. In the revised manuscript we will add explicit equations for (i) adaptive generation of the two complementary watermark signals conditioned on semantic content, (ii) the combination rule used during embedding, and (iii) the verification-time separation and detection statistics for each stream. We will also include pseudocode for the embedding and detection procedures to demonstrate how the streams preserve independent detectability after paraphrasing and how selective alteration of one stream enables reliable spoofing traceability without compromising the other. These additions will directly resolve the verifiability issue.

Revision: yes
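As a reference point for item (iii), the detection statistic of the single-stream KGW scheme fits in one line. A per-stream analogue might look like the block below. This is an assumption for illustration, not a formula from the manuscript. Here $T_k$ is the number of tokens scored under stream $k$, $G_k$ the number of those that fall in that stream's green list, $\gamma$ the expected green fraction, and $\tau$ a decision threshold, all of them notation introduced here.

```latex
z_k \;=\; \frac{G_k - \gamma\, T_k}{\sqrt{T_k\, \gamma\, (1 - \gamma)}},
\qquad k \in \{1, 2\},
\qquad \text{stream } k \text{ is declared present iff } z_k > \tau .
```

What the revision would still need to state is how the pair $(z_1, z_2)$ maps to the outcomes "watermarked", "paraphrased", and "spoofed", since that mapping is what makes tracing verifiable.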

Circularity Check

0 steps flagged

No significant circularity; proposal is self-contained with no derivation chain or self-citation reduction

The paper introduces DualGuard as a novel adaptive dual-stream watermarking scheme but presents no mathematical derivations, equations, or closed-form proofs that reduce the central claim to fitted inputs or prior self-citations. The abstract and available description frame the contribution as an empirical design validated by experiments across datasets and models, with no load-bearing uniqueness theorem, ansatz smuggling, or renaming of known results. No self-citation is invoked to justify the complementarity of the two streams or their spoofing traceability. The result is therefore not forced by construction and stands as an independent proposal whose validity rests on external experimental outcomes rather than internal definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No specific free parameters, axioms, or invented entities are detailed in the abstract. The method relies on the unstated assumption that the dual-stream injection is feasible and effective.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 8 internal anchors

[1]

Aaronson and H

S. Aaronson and H. Kirchner. 2022. Watermarking GPT outputs. https://www. scottaaronson.com/talks/watermark.ppt

work page 2022
[2]

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Floren- cia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report.arXiv preprint arXiv:2303.08774 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

Li An, Yujian Liu, Yepeng Liu, Yang Zhang, Yuheng Bu, and Shiyu Chang

work page
[4]

InSecond Conference on Language Modeling

Defending LLM Watermarking Against Spoofing Attacks with Con- trastive Representation Learning. InSecond Conference on Language Modeling. https://openreview.net/forum?id=n5hmtkdl7k

work page
[5]

Sachin Chanchani and Ruihong Huang. 2023. Composition-contrastive Learn- ing for Sentence Embeddings. InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 15836–15848. doi:10....

work page doi:10.18653/v1/2023.acl-long.882 2023
[6]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374(2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[7]

Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable watermarks for language models. InThe Thirty Seventh Annual Conference on Learning Theory. PMLR, 1125–1139

work page 2024
[8]

Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, et al. 2024. Scalable watermarking for identifying large language model outputs.Nature634, 8035 (2024), 818–823

work page 2024
[9]

Adrian de Wynter, Ishaan Watts, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Nektar Ege Altıntoprak, Lena Baur, Samantha Claudet, Pavel Gajdušek, Qilong Gu, et al. 2025. Rtp-lx: Can llms evaluate toxicity in multilingual scenarios?. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 27940– 27950

work page 2025
[10]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy...

work page 2019
[11]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Alexander Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir Radev. 2019. Multi- News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Lingui...

work page doi:10.18653/v1/p19-1102 2019
[13]

Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long Form Question Answering. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 3558–3567. doi:10.186...

work page doi:10.18653/v1/p19-1346 2019
[14]

Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Lan- guage Models. InFindings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 3356–3369. doi:10.18653/v...

work page doi:10.18653/v1/2020.findings-emnlp.301 2020
[15]

Guerreiro, Duarte M

Nuno M. Guerreiro, Duarte M. Alves, Jonas Waldendorf, Barry Haddow, Alexan- dra Birch, Pierre Colombo, and André F. T. Martins. 2023. Hallucinations in Large Multilingual Translation Models.Transactions of the Association for Computa- tional Linguistics11 (2023), 1500–1517. doi:10.1162/tacl_a_00615

work page doi:10.1162/tacl_a_00615 2023
[16]

Jochen Hartmann, Mark Heitmann, Christian Siebert, and Christina Schamp

work page
[17]

Fries, L

More than a Feeling: Accuracy and Application of Sentiment Analysis. International Journal of Research in Marketing40, 1 (2023), 75–87. doi:10.1016/j. ijresmar.2022.05.005

work page doi:10.1016/j 2023
[18]

Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. 2022. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. InProceedings of the 60th Annual Meeting of the Association of Computational Linguistics

work page 2022
[19]

Zhiwei He, Binglin Zhou, Hongkun Hao, Aiwei Liu, Xing Wang, Zhaopeng Tu, Zhuosheng Zhang, and Rui Wang. 2024. Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre ...

work page doi:10.18653/v1/2024.acl-long.226 2024
[20]

Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hong- wei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computational Linguisti...

work page doi:10.18653/v1/2024.naacl-long.226 2024
[21]

Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine- Generated Text. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 17...

work page doi:10.18653/v1/2024 2024
[22]

Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. 2024. Unbiased Watermark for Large Language Models. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id= uWVC5FVidc

work page 2024
[23]

Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. InProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing(Dallas, Texas, USA)(STOC ’98). Association for Computing Machinery, New York, NY, USA, 604–613. doi:10.1145/276698. 276876

work page doi:10.1145/276698 1998
[24]

Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, and Douglas Eck

work page
[25]

In: Zong, C., Xia, F., Li, W., Navigli, R

Automatic Detection of Generated Text is Easiest when Humans are Fooled. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 1808–1822. doi:10.18653/v1/ 2020.acl-main.164

work page doi:10.18653/v1/ 2020
[26]

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. InInternational Conference on Machine Learning. PMLR, 17061–17084

work page 2023
[27]

Wojciech Kryscinski, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, and Dragomir Radev. 2022. BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization. InFindings of the Association for Computational Lin- guistics: EMNLP 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, Uni...

work page doi:10.18653/v1/2022.findings-emnlp.488 2022
[28]

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. Robust Distortion-free Watermarks for Language Models.Trans. Mach. Learn. Res.2024 (2024). https://openreview.net/forum?id=FpaCL1MO2C

work page 2024
[29]

Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. 2024. Waterfall: Scalable Framework for Robust Text Watermarking and Provenance for LLMs. InProceedings of the 2024 Con- ference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association ...

work page doi:10.18653/v1/2024.emnlp-main.1138 2024
[30]

Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. 2024. Who Wrote this Code? Watermarking for Code Generation. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computa...

work page doi:10.18653/v1/2024.acl-long.268 2024
[31]

Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. 2024. A Semantic Invariant Robust Watermark for Large Language Models. InThe Twelfth Interna- tional Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=6p8lpe4MNf

work page 2024
[32]

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. 2024. A survey of text watermarking in the era of large language models.Comput. Surveys57, 2 (2024), 1–36

work page 2024
[33]

Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (Vienna, Austria)(ICML’24). JMLR.org, Article 1238, 20 pages

work page 2024
[34]

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. InPro- ceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Association, USA, Article 103, 17 pages

work page 2024
[35]

Stuart Lloyd. 1982. Least squares quantization in PCM.IEEE transactions on information theory28, 2 (1982), 129–137

work page 1982
[36]

Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, and Alexander Panchenko. 2022. ParaDetox: Detoxification with Parallel Data. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dub...

work page 2022
[37]

Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, and Irwin King. 2024. An Entropy- based Text Watermarking Detection Method. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Compu- tational Linguistics, Bangkok, Thailand, 1...

work page doi:10.18653/v1/2024.acl- 2024
[38]

David Megías, Minoru Kuribayashi, Andrea Rosales, and Wojciech Mazurczyk

work page
[39]

InProceedings of the 16th International Conference on A vailability, Reliability and Security(Vienna, Austria)(ARES ’21)

DISSIMILAR: Towards fake news detection using information hiding, signal processing and machine learning. InProceedings of the 16th International Conference on A vailability, Reliability and Security(Vienna, Austria)(ARES ’21). Association for Computing Machinery, New York, NY, USA, Article 66, 9 pages. doi:10.1145/3465481.3470088

work page doi:10.1145/3465481.3470088
[40]

George A. Miller. 1995. WordNet: a lexical database for English.Commun. ACM 38, 11 (Nov. 1995), 39–41. doi:10.1145/219717.219748

work page doi:10.1145/219717.219748 1995
[41]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2022. MTEB: Massive Text Embedding Benchmark.arXiv preprint arXiv:2210.07316(2022). doi:10.48550/ARXIV.2210.07316

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2210.07316 2022
[42]

Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. 2024. MarkLLM: An Open-Source Toolkit for LLM Watermarking. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Delia Irazu Hernandez Farias, Tom ...

work page 2024
[43]

Qi Pang, Shengyuan Hu, Wenting Zheng, and Virginia Smith. 2024. No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices. InAdvances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester Mackey, D...

work page 2024
[44]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer.J. Mach. Learn. Res.21, 1, Article 140 (Jan. 2020), 67 pages

work page 2020
[45]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. Dis- tilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.ArXiv abs/1910.01108 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[46]

Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, et al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.arXiv preprint arXiv:2403.05530(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[47]

Shangqing Tu, Yuliang Sun, Yushi Bai, Jifan Yu, Lei Hou, and Juanzi Li. 2024. WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models. InProceedings of the 62nd Annual Meeting of the Association for Computa- tional Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computation...

work page doi:10.18653/v1/2024.acl-long.83 2024
[48]

Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang

work page
[49]

InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24)

A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24). JMLR.org, Article 2190, 28 pages

work page
[50]

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report.arXiv preprint arXiv:2505.09388(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[51]

Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Kaifeng Yun, Linlu GONG, Nianyi Lin, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei H...

work page 2024
[52]

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[53]

non-distortionary

Xuandong Zhao, Yu-Xiang Wang, and Lei Li. 2023. Protecting language genera- tion models via invisible watermarking. InProceedings of the 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)(ICML’23). JMLR.org, Article 1774, 13 pages. DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack...

work page 2023
[55]

Modification Requirement: Rewrite the sentences by employing different vocabulary and syntactic constructions to the greatest extent possible, while preserving logical coherence and consistency

work page
[56]

Preserve Meaning: Rewrite should convey the same core idea and without omitting or distorting information

work page
[57]

Fluency and Grammar: Rewrite must be natural, grammatically correct, and well-structured

work page
[61]

Just provide the text after rewriting, without any introductory or concluding phrases

Retains Nuances: Preserve connotations, implied meanings, and idiomatic expressions where appropriate. Just provide the text after rewriting, without any introductory or concluding phrases. Original Text: {Original Text} Figure 9: Paraphrase attack prompt. Piggyback Spoofing Attack Prompt Task Description: Maliciously rewrite the following text to express...

• Modification Scope: Restrict modification to the sentence level, rewriting them whenever feasible
• Modification Requirement: Maliciously rewrite the sentences by employing different vocabulary and syntactic constructions to the greatest extent possible, and ensure that the modified text has obvious negative and pessimistic content
• Fluency and Grammar: Malicious rewrite must be natural, grammatically correct, and well-structured
• Appropriate Length: Maintain a similar length unless a slight adjustment improves clarity
• Consistency with Context: Retain the original tone and formality (e.g., academic, casual, professional)
• Minimal Redundancy: Avoid unnecessary repetition while keeping essential details
• Retains Nuances: Preserve connotations, implied meanings, and idiomatic expressions where appropriate.

Just provide the text after malicious rewriting, without any introductory or concluding phrases.

Original Text: {Original Text}

Figure 10: Piggyback spoofing attack prompt.

metric, and the max_new_tokens parameter for model generation is set to 16. • S...
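As a rough illustration of how the paraphrase attack prompt of Figure 9 might be assembled before being sent to a rewriting model, the sketch below joins the requirement lines that are visible in this excerpt and substitutes the `{Original Text}` placeholder. The names `build_template` and `fill_prompt` are hypothetical helpers introduced here, not part of the paper's code, and prompt items that are missing or truncated in this excerpt (including the task description) are deliberately left out rather than guessed.

```python
# Placeholder string exactly as it appears in the prompt figures.
PLACEHOLDER = "{Original Text}"

# Requirement lines copied from the visible part of the Figure 9 prompt.
# Items that are elided in the excerpt are intentionally not reconstructed.
PARAPHRASE_REQUIREMENTS = [
    "Modification Requirement: Rewrite the sentences by employing different "
    "vocabulary and syntactic constructions to the greatest extent possible, "
    "while preserving logical coherence and consistency",
    "Preserve Meaning: Rewrite should convey the same core idea and without "
    "omitting or distorting information",
    "Fluency and Grammar: Rewrite must be natural, grammatically correct, "
    "and well-structured",
    "Retains Nuances: Preserve connotations, implied meanings, and idiomatic "
    "expressions where appropriate.",
]

CLOSING_INSTRUCTION = (
    "Just provide the text after rewriting, "
    "without any introductory or concluding phrases."
)


def build_template(requirements, closing):
    """Join requirement lines, the closing instruction, and the text slot."""
    lines = list(requirements) + [closing, f"Original Text: {PLACEHOLDER}"]
    return "\n".join(lines)


def fill_prompt(template, original_text):
    """Substitute the text to be rewritten into the single placeholder slot."""
    # str.format cannot be used because the placeholder name contains a space.
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(PLACEHOLDER, original_text)


if __name__ == "__main__":
    template = build_template(PARAPHRASE_REQUIREMENTS, CLOSING_INSTRUCTION)
    print(fill_prompt(template, "Watermarks help attribute generated text."))
```

The same two helpers would work for the piggyback spoofing prompt of Figure 10 by passing its own requirement lines and closing instruction; plain string replacement is used so that braces inside the original text are never interpreted as format fields.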

[1] S. Aaronson and H. Kirchner. 2022. Watermarking GPT outputs. https://www.scottaaronson.com/talks/watermark.ppt

[2] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023).

[3] Li An, Yujian Liu, Yepeng Liu, Yang Zhang, Yuheng Bu, and Shiyu Chang. Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning. In Second Conference on Language Modeling. https://openreview.net/forum?id=n5hmtkdl7k

[5] Sachin Chanchani and Ruihong Huang. 2023. Composition-contrastive Learning for Sentence Embeddings. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 15836–15848. doi:10.18653/v1/2023.acl-long.882

[6] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).

[7] Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable watermarks for language models. In The Thirty Seventh Annual Conference on Learning Theory. PMLR, 1125–1139.

[8] Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, et al. 2024. Scalable watermarking for identifying large language model outputs. Nature 634, 8035 (2024), 818–823.

[9] Adrian de Wynter, Ishaan Watts, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Nektar Ege Altıntoprak, Lena Baur, Samantha Claudet, Pavel Gajdušek, Qilong Gu, et al. 2025. Rtp-lx: Can llms evaluate toxicity in multilingual scenarios? In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 27940–27950.

[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy...

[11] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024).

[12] Alexander Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir Radev. 2019. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Lingui... doi:10.18653/v1/p19-1102

[13] Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long Form Question Answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Korhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 3558–3567. doi:10.18653/v1/p19-1346

[14] Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2020, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, Online, 3356–3369. doi:10.18653/v1/2020.findings-emnlp.301

[15] Nuno M. Guerreiro, Duarte M. Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre Colombo, and André F. T. Martins. 2023. Hallucinations in Large Multilingual Translation Models. Transactions of the Association for Computational Linguistics 11 (2023), 1500–1517. doi:10.1162/tacl_a_00615

[16] Jochen Hartmann, Mark Heitmann, Christian Siebert, and Christina Schamp. 2023. More than a Feeling: Accuracy and Application of Sentiment Analysis. International Journal of Research in Marketing 40, 1 (2023), 75–87. doi:10.1016/j.ijresmar.2022.05.005

[18] Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. 2022. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection. In Proceedings of the 60th Annual Meeting of the Association of Computational Linguistics.

[19] Zhiwei He, Binglin Zhou, Hongkun Hao, Aiwei Liu, Xing Wang, Zhaopeng Tu, Zhuosheng Zhang, and Rui Wang. 2024. Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre ... doi:10.18653/v1/2024.acl-long.226

[20] Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguisti... doi:10.18653/v1/2024.naacl-long.226

[21] Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text. In Findings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 17...

[22] Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. 2024. Unbiased Watermark for Large Language Models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=uWVC5FVidc

[23] Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (Dallas, Texas, USA) (STOC ’98). Association for Computing Machinery, New York, NY, USA, 604–613. doi:10.1145/276698.276876

[24] Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, and Douglas Eck. 2020. Automatic Detection of Generated Text is Easiest when Humans are Fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault (Eds.). Association for Computational Linguistics, Online, 1808–1822. doi:10.18653/v1/2020.acl-main.164

[26] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. In International Conference on Machine Learning. PMLR, 17061–17084.

[27] Wojciech Kryscinski, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, and Dragomir Radev. 2022. BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, Uni... doi:10.18653/v1/2022.findings-emnlp.488

[28] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. Robust Distortion-free Watermarks for Language Models. Trans. Mach. Learn. Res. 2024 (2024). https://openreview.net/forum?id=FpaCL1MO2C

[29] Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. 2024. Waterfall: Scalable Framework for Robust Text Watermarking and Provenance for LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association ... doi:10.18653/v1/2024.emnlp-main.1138

[30] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. 2024. Who Wrote this Code? Watermarking for Code Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computa... doi:10.18653/v1/2024.acl-long.268

[31] Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. 2024. A Semantic Invariant Robust Watermark for Large Language Models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net. https://openreview.net/forum?id=6p8lpe4MNf

[32] Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. 2024. A survey of text watermarking in the era of large language models. Comput. Surveys 57, 2 (2024), 1–36.

[33] Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML’24). JMLR.org, Article 1238, 20 pages.

[34] Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In Proceedings of the 33rd USENIX Conference on Security Symposium (Philadelphia, PA, USA) (SEC ’24). USENIX Association, USA, Article 103, 17 pages.

[35] Stuart Lloyd. 1982. Least squares quantization in PCM. IEEE transactions on information theory 28, 2 (1982), 129–137.

[36] Varvara Logacheva, Daryna Dementieva, Sergey Ustyantsev, Daniil Moskovskiy, David Dale, Irina Krotova, Nikita Semenov, and Alexander Panchenko. 2022. ParaDetox: Detoxification with Parallel Data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dub...

[37] Yijian Lu, Aiwei Liu, Dianzhi Yu, Jingjing Li, and Irwin King. 2024. An Entropy-based Text Watermarking Detection Method. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 1...

[38] David Megías, Minoru Kuribayashi, Andrea Rosales, and Wojciech Mazurczyk. 2021. DISSIMILAR: Towards fake news detection using information hiding, signal processing and machine learning. In Proceedings of the 16th International Conference on Availability, Reliability and Security (Vienna, Austria) (ARES ’21). Association for Computing Machinery, New York, NY, USA, Article 66, 9 pages. doi:10.1145/3465481.3470088

[40] George A. Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (Nov. 1995), 39–41. doi:10.1145/219717.219748

[41] Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2022. MTEB: Massive Text Embedding Benchmark. arXiv preprint arXiv:2210.07316 (2022). doi:10.48550/ARXIV.2210.07316

[42] Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. 2024. MarkLLM: An Open-Source Toolkit for LLM Watermarking. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Delia Irazu Hernandez Farias, Tom ...

[43] Qi Pang, Shengyuan Hu, Wenting Zheng, and Virginia Smith. 2024. No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester Mackey, D...

[44] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1, Article 140 (Jan. 2020), 67 pages.

[45] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv abs/1910.01108 (2019).

[46] Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, et al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 (2024).

[47] Shangqing Tu, Yuliang Sun, Yushi Bai, Jifan Yu, Lei Hou, and Juanzi Li. 2024. WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computation... doi:10.18653/v1/2024.acl-long.83

[48] Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. 2024. A resilient and accessible distribution-preserving watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML’24). JMLR.org, Article 2190, 28 pages.

[50] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).

[51] Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Kaifeng Yun, Linlu GONG, Nianyi Lin, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei H...

[52] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022).
