DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack
Pith reviewed 2026-05-16 21:57 UTC · model grok-4.3
The pith
DualGuard uses dual-stream watermarking to defend LLM outputs against both paraphrase and spoofing attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DualGuard is the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks by employing an adaptive dual-stream watermarking mechanism in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection.
What carries the argument
Adaptive dual-stream watermarking mechanism that dynamically injects two complementary watermark signals based on semantic content to enable both detection and tracing.
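This page gives no formal definition of the mechanism, so the following is only a toy sketch of what "two complementary streams" could look like. It is not the authors' method. It assumes a keyed green-list watermark in the style of Kirchenbauer et al. (2023) for each stream, and it replaces the semantic selector with simple alternation by token position. All names (`is_green`, `detect`, `generate`, `KEY_A`, `KEY_B`) and the value of `GAMMA` are hypothetical.

```python
import hashlib
import math
import random

GAMMA = 0.5                              # assumed green-list fraction
KEY_A, KEY_B = "stream-a", "stream-b"    # hypothetical secret keys, one per stream


def is_green(key: str, prev_token: str, token: str) -> bool:
    """Keyed pseudo-random green/red assignment for `token` given its predecessor."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def z_score(green: int, total: int) -> float:
    """One-proportion z-test: how far the green count sits above chance."""
    if total == 0:
        return 0.0
    return (green - GAMMA * total) / math.sqrt(total * GAMMA * (1 - GAMMA))


def detect(tokens: list[str]) -> dict[str, float]:
    """Score the same token sequence independently under each stream's key."""
    pairs = list(zip(tokens, tokens[1:]))
    return {
        key: z_score(sum(is_green(key, p, t) for p, t in pairs), len(pairs))
        for key in (KEY_A, KEY_B)
    }


def generate(vocab: list[str], length: int, key_at, seed: int = 0) -> list[str]:
    """Toy 'model': at step i, emit a random vocab word that is green under key_at(i)."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for i in range(length):
        candidates = rng.sample(vocab, len(vocab))  # shuffled copy of the vocabulary
        key = key_at(i)
        green = [w for w in candidates if is_green(key, tokens[-1], w)]
        tokens.append(green[0] if green else candidates[0])
    return tokens


vocab = [f"w{i}" for i in range(50)]
# Stand-in for the semantic selector (an assumption): alternate streams by position.
dual = generate(vocab, 200, lambda i: KEY_A if i % 2 == 0 else KEY_B)
scores = detect(dual)
```

On the `dual` sequence both entries of `scores` should sit well above chance, because each key controls half of the positions. The sketch says nothing about how the streams survive paraphrasing or how tracing works in DualGuard itself.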
If this is right
- Watermarked LLM text stays detectable even after attackers paraphrase it.
- Spoofing attempts that add harmful content can be detected and traced.
- The approach maintains text quality while delivering high detectability and robustness.
- Watermarking becomes viable for real-world protection of model outputs against abuse.
Where Pith is reading between the lines
- Multiple signal streams could become a standard pattern for handling wider attack varieties in future systems.
- This design might support better attribution in public LLM services by reducing false negatives from combined attacks.
- Extending the semantic-based injection logic to other text transformations could be tested directly.
Load-bearing premise
Injecting two complementary watermark signals based on semantic content will not degrade text quality or detectability while providing traceability against spoofing attacks.
What would settle it
An experiment in which spoofed text evades tracing, or in which watermarked output shows a clear quality loss relative to unwatermarked output from the same model.
Original abstract
With the rapid development of cloud-based services, large language models have become increasingly accessible through various web platforms. However, this accessibility has also led to growing risks of model abuse. LLM watermarking has emerged as an effective approach to mitigate such misuse and protect intellectual property. Existing watermarking algorithms, however, primarily focus on defending against paraphrase attacks while overlooking piggyback spoofing attacks, which can inject harmful content, compromise watermark reliability, and undermine trust in attribution. To address this limitation, we propose DualGuard, the first watermarking algorithm capable of defending against both paraphrase and spoofing attacks. DualGuard employs the adaptive dual-stream watermarking mechanism, in which two complementary watermark signals are dynamically injected based on the semantic content. This design enables DualGuard not only to detect but also to trace spoofing attacks, thereby ensuring reliable and trustworthy watermark detection. Extensive experiments conducted across multiple datasets and language models demonstrate that DualGuard achieves excellent detectability, robustness, traceability, and text quality, effectively advancing the state of LLM watermarking for real-world applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DualGuard, the first LLM watermarking algorithm claimed to defend against both paraphrase attacks and piggyback spoofing attacks. It employs an adaptive dual-stream mechanism in which two complementary watermark signals are dynamically injected based on the semantic content of the generated text. This design is asserted to enable not only reliable detection under paraphrasing but also traceability of spoofing attempts by identifying which signal was altered. Extensive experiments across multiple datasets and language models are reported to demonstrate strong detectability, robustness, traceability, and preservation of text quality.
Significance. If the dual-stream complementarity can be shown to preserve independent detectability after paraphrasing while enabling spoofing traceability without quality degradation, the result would meaningfully extend single-stream watermarking methods by addressing a practical attack vector that compromises attribution reliability in deployed LLM services. The work targets a real gap in robustness for cloud-based LLM platforms.
major comments (1)
- [Method section (likely §3)] The description of the adaptive dual-stream mechanism provides no equations, pseudocode, or formal detection rule for signal generation, embedding combination, or verification-time separation of the two streams. Without these details it is impossible to verify that the signals remain independently detectable after paraphrase or that alteration of one enables reliable tracing of spoofing without destroying detectability of the other.
minor comments (1)
- [Abstract] The abstract asserts 'excellent' performance without reporting any quantitative metrics, baseline comparisons, or specific attack strengths, which hinders immediate assessment of the claimed advances.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for acknowledging the practical importance of defending against both paraphrase and spoofing attacks. We address the major comment below.
Referee: [Method section (likely §3)] The description of the adaptive dual-stream mechanism provides no equations, pseudocode, or formal detection rule for signal generation, embedding combination, or verification-time separation of the two streams. Without these details it is impossible to verify that the signals remain independently detectable after paraphrase or that alteration of one enables reliable tracing of spoofing without destroying detectability of the other.
Authors: We agree that the current textual description in Section 3 lacks the formal specifications needed for independent verification. In the revised manuscript we will add explicit equations for (i) adaptive generation of the two complementary watermark signals conditioned on semantic content, (ii) the combination rule used during embedding, and (iii) the verification-time separation and detection statistics for each stream. We will also include pseudocode for the embedding and detection procedures to demonstrate how the streams preserve independent detectability after paraphrasing and how selective alteration of one stream enables reliable spoofing traceability without compromising the other. These additions will directly resolve the verifiability issue.
Revision: yes
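For orientation, this is the form a detection rule takes in the single-stream green-list watermark of Kirchenbauer et al. (2023), followed by one way it could be extended to two streams. The two-stream decision rule is an assumption made here for illustration only. The symbols $z^{(1)}$, $z^{(2)}$ and the threshold $\tau$ are not taken from the DualGuard paper.

```latex
% Single-stream test: T scored tokens, green-list fraction \gamma,
% |s|_G = number of green tokens observed in the text.
z = \frac{|s|_G - \gamma T}{\sqrt{T\,\gamma\,(1-\gamma)}},
\qquad \text{flag as watermarked if } z > \tau .

% Hypothetical two-stream rule (illustration only, not the paper's rule):
\text{decision}\bigl(z^{(1)}, z^{(2)}\bigr) =
\begin{cases}
\text{watermarked, intact} & z^{(1)} > \tau \ \text{and}\ z^{(2)} > \tau,\\
\text{watermarked, possibly spoofed} & \text{exactly one of } z^{(1)}, z^{(2)} \text{ exceeds } \tau,\\
\text{not watermarked} & \text{otherwise.}
\end{cases}
```

A rule of this shape would make the referee's question concrete: what has to be shown is that paraphrasing leaves both statistics above $\tau$, while spoofing pushes only one of them below it.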
Circularity Check
No significant circularity; proposal is self-contained with no derivation chain or self-citation reduction
The paper introduces DualGuard as a novel adaptive dual-stream watermarking scheme but presents no mathematical derivations, equations, or closed-form proofs that reduce the central claim to fitted inputs or prior self-citations. The abstract and available description frame the contribution as an empirical design validated by experiments across datasets and models, with no load-bearing uniqueness theorem, ansatz smuggling, or renaming of known results. No self-citation is invoked to justify the complementarity of the two streams or their spoofing traceability. The result is therefore not forced by construction and stands as an independent proposal whose validity rests on external experimental outcomes rather than internal definitional equivalence.
...
- Modification Requirement: Rewrite the sentences by employing different vocabulary and syntactic constructions to the greatest extent possible, while preserving logical coherence and consistency.
- Preserve Meaning: Rewrite should convey the same core idea and without omitting or distorting information.
- Fluency and Grammar: Rewrite must be natural, grammatically correct, and well-structured.
...
- Retains Nuances: Preserve connotations, implied meanings, and idiomatic expressions where appropriate.
Just provide the text after rewriting, without any introductory or concluding phrases.
Original Text: {Original Text}
Figure 9: Paraphrase attack prompt.
Piggyback Spoofing Attack Prompt
Task Description: Maliciously rewrite the following text to express...
- Modification Scope: Restrict modification to the sentence level, rewriting them whenever feasible.
- Modification Requirement: Maliciously rewrite the sentences by employing different vocabulary and syntactic constructions to the greatest extent possible, and ensure that the modified text has obvious negative and pessimistic content.
- Fluency and Grammar: Malicious rewrite must be natural, grammatically correct, and well-structured.
- Appropriate Length: Maintain a similar length unless a slight adjustment improves clarity.
- Consistency with Context: Retain the original tone and formality (e.g., academic, casual, professional).
- Minimal Redundancy: Avoid unnecessary repetition while keeping essential details.
- Retains Nuances: Preserve connotations, implied meanings, and idiomatic expressions where appropriate.
Just provide the text after malicious rewriting, without any introductory or concluding phrases.
Original Text: {Original Text}
Figure 10: Piggyback spoofing attack prompt.
... metric, and the max_new_tokens parameter for model generation is set to 16. • S...
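Both attack prompts are templates that end in an `{Original Text}` placeholder. A minimal sketch of filling such a template before it is sent to a rewriting model follows. The helper name `fill_prompt` and the abbreviated template string are assumptions for illustration, and no model call is shown.

```python
PLACEHOLDER = "{Original Text}"


def fill_prompt(template: str, original_text: str) -> str:
    """Substitute the text under attack into a prompt template."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no {Original Text} placeholder")
    return template.replace(PLACEHOLDER, original_text)


# Abbreviated stand-in for the paraphrase prompt; the full wording is in Figure 9.
template = (
    "Just provide the text after rewriting, without any introductory "
    "or concluding phrases. Original Text: {Original Text}"
)
prompt = fill_prompt(template, "The sky is clear today.")
```

Plain `str.replace` is used instead of `str.format` so that any other braces in a template are not interpreted as format fields.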