arxiv: 2604.27666 · v1 · submitted 2026-04-30 · 💻 cs.CR

Recognition: unknown

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Xiaokun Luan , Yihao Zhang , Pengcheng Su , Feiran Lei , Meng Sun

Authors on Pith no claims yet

Pith reviewed 2026-05-07 05:42 UTC · model grok-4.3

classification 💻 cs.CR

keywords LLM watermarkingverifiable watermark detectionoblivious pseudorandom functionprivacy-preserving detectiontwo-party computationtext provenancesecure watermarkingVOPRF

0 comments

The pith

VOW lets users detect LLM watermarks without revealing their text while verifying the provider's result.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VOW as a protocol that turns watermark detection into a secure two-party computation. It uses a Verifiable Oblivious Pseudorandom Function to let a user and provider jointly check for watermarks without the user sending the text and without the user having to trust the result. Current watermark methods require exposing potentially sensitive content or provide no cryptographic check on the outcome, which limits their use for private documents or regulated content. If the approach holds, watermarking becomes viable in settings where privacy matters, and detection can happen on short texts that dominate real-world queries. The work also reevaluates how well existing watermarks survive paraphrasing when examined under this private protocol.

Core claim

VOW formulates watermark detection as a secure two-party computation problem and instantiates the watermark's core logic with a Verifiable Oblivious Pseudorandom Function. This construction lets the user and provider perform detection such that the user's text is never revealed to the provider and the provider's output is cryptographically verifiable, while the protocol remains efficient enough for short texts.

What carries the argument

Verifiable Oblivious Pseudorandom Function (VOPRF) that encodes the watermark detection logic inside a secure two-party computation protocol

If this is right

Users can obtain detection results without disclosing their text to the provider.
The provider's detection outcome can be checked cryptographically by the user.
The protocol supports short texts at practical speeds.
Watermark robustness can be reassessed in a setting that does not expose the examined text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-party structure could apply to provenance checks for other AI-generated media where input privacy is required.
Providers would need to expose only the VOPRF interface rather than the full watermarking details, changing deployment requirements.
If the efficiency claims hold, watermarking could move from optional research feature to default capability in privacy-sensitive LLM services.

Load-bearing premise

A practical and secure VOPRF can be built for the exact watermark detection logic and will stay efficient and robust on short texts even against modern paraphrasing attacks.

What would settle it

An implementation or proof showing that the VOPRF protocol leaks information about the input text or fails to produce verifiable correct outputs on short paraphrased texts would disprove the privacy, verifiability, and practicality claims.

Figures

Figures reproduced from arXiv: 2604.27666 by Feiran Lei, Meng Sun, Pengcheng Su, Xiaokun Luan, Yihao Zhang.

**Figure 1.** Figure 1: True positive rate (TPR) of different watermarking view at source ↗

**Figure 2.** Figure 2: Calibration of 𝑝-values on non-watermarked samples. The dashed diagonal line represents the ideal uniform distribution under the null hypothesis. and 0.992 at 150 tokens, confirming its suitability for short texts. While RDF’s curve rises faster, this is achieved at a relaxed FPR of 10−2 (1000 times less strict than other methods) on only 200 samples versus 500 for others. Notably, RDF’s TPR drops from 0.… view at source ↗

**Figure 4.** Figure 4: AUC (top) and TPR (bottom) of different watermarking schemes under synonym replacement and paraphrasing view at source ↗

**Figure 5.** Figure 5: Distribution of 𝑝-values for detection results on negative samples and watermarked texts after paraphrasing by GPT-5.1. texts yield higher AUC and TPR than longer texts, contradicting the expected benefit of increased evidence. This suggests that its learning-based detection mechanism fails to generalize robustly after paraphrasing. To visualize the extent of this failure, view at source ↗

**Figure 6.** Figure 6: Trade-off between perplexity and true positive rate at view at source ↗

**Figure 8.** Figure 8: Theoretical false positive rate (FPR) and the ratio view at source ↗

**Figure 9.** Figure 9: End-to-end throughput of VOW under different view at source ↗

**Figure 11.** Figure 11: Total communication overhead of the VOW detec view at source ↗

read the original abstract

Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

VOW tries to turn LLM watermark detection into a private verifiable two-party protocol via VOPRF, but the paper gives no explicit mapping or reduction showing the original detection properties survive the change.

read the letter

The main point on this paper is that it takes the standard LLM watermark detection step and recasts it as a VOPRF-based secure two-party computation so the user never sends the text and the provider's output can be checked. That framing is the actual novelty relative to the asymmetric schemes the abstract criticizes for either failing on short texts or lacking formal ties between insertion and detection. The paper does a clear job stating the centralized trust problem and why obliviousness plus verifiability would help in practice. It also flags a reassessment of watermark robustness under modern paraphrasing, which is worth doing even if the numbers are not yet visible here. Those are the useful pieces. The soft spot is exactly the one the stress-test note flags. Watermark detection works by running a keyed PRF over tokens or n-grams and then applying a statistical test whose false-positive rate and robustness are calibrated to that specific distribution. Replacing the PRF call with a VOPRF requires showing that the blinded inputs, the verifiable output, and the final decision preserve the same score distribution and the same resistance to paraphrasing. The abstract asserts an efficient instantiation exists and that the evaluation confirms practicality for short texts, but it supplies no algorithm, no theorem, and no argument that the cryptographic layer does not shift the error rates or open new attack surfaces. If the VOPRF primitive produces outputs whose statistics differ from the native watermark PRF, the claimed robustness reassessment does not automatically carry over. The full text apparently contains the protocol and evaluation, yet the absence of a reduction or even a high-level argument that the mapping is faithful leaves the central claim unanchored. This paper is for people working on AI provenance and cryptographic protocols for ML. A reader who already follows watermarking papers will see the high-level idea and the gap it targets. It is worth sending to peer review because the problem is concrete and the direction is plausible, but any referee will need the concrete VOPRF instantiation, the security argument, and the raw detection numbers before the practicality claim can be assessed.

Referee Report

3 major / 2 minor

Summary. The paper proposes VOW, a new protocol for LLM watermark detection that formulates the task as a secure two-party computation. It instantiates the core watermark logic using a Verifiable Oblivious Pseudorandom Function (VOPRF) so that detection can be performed without the user revealing their text to the provider and so that the provider's output is cryptographically verifiable. The authors claim the construction is practical for short texts, achieves both obliviousness and verifiability, and includes an evaluation that reassesses watermark robustness against modern paraphrasing attacks.

Significance. If the VOPRF-based reduction is sound and the efficiency claims hold, the work would be significant for privacy-preserving and verifiable provenance in LLMs. It directly addresses the centralized trust model that currently forces users to reveal potentially sensitive text, and the cryptographic framing could provide formal guarantees that prior asymmetric schemes lack. The reassessment of paraphrasing robustness is also a useful contribution if supported by concrete data.

major comments (3)

[§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.
[§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.
[§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.

minor comments (2)

[Abstract] The abstract states 'high efficiency' and 'practical for short texts' without any concrete metrics (runtime, communication, or accuracy numbers); these should be quantified in the introduction or evaluation summary.
[§2] Notation for the watermark key, VOPRF inputs/outputs, and the detection threshold should be introduced consistently in §2 before being used in the protocol description.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments. Below we provide point-by-point responses to the major comments and outline the revisions we will make to the manuscript.

read point-by-point responses

Referee: [§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.

Authors: We appreciate the referee's emphasis on formal security. The VOW protocol in §4 is constructed so that the VOPRF directly computes the PRF evaluations required for the statistical test, thereby preserving the exact detection logic. We concede that an explicit reduction was not detailed in the submission. In the revised manuscript, we will add a formal security analysis subsection to §4. This will include a reduction showing that VOW's obliviousness and verifiability properties are secure under the VOPRF assumption and the security of the base watermarking scheme, with no degradation to the false-positive rate or robustness for short texts. The proof will be based on the standard simulation paradigm for oblivious PRFs. revision: yes
Referee: [§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.

Authors: The evaluation in §5 does contain quantitative results demonstrating practicality for short texts and robustness reassessment. However, we agree that direct references to tables and figures comparing VOW to the non-oblivious baseline were not prominently included in the narrative. We will revise §5 to explicitly reference and discuss the relevant tables (efficiency metrics) and figures (robustness curves), including side-by-side comparisons of accuracy and overhead. This will better support the practicality claims with visible data. revision: yes
Referee: [§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.

Authors: We selected an EC-based VOPRF instantiation in §3 because its output distribution is statistically indistinguishable from a random function, matching the requirements of the original watermark's PRF. Nevertheless, the manuscript does not explicitly analyze the impact on the statistical test. We will update §3 with a paragraph explaining that the VOPRF output is pseudorandom and uniformly distributed, preserving the p-value calculations and thus the false-positive rate and robustness. If necessary, we will note any negligible statistical distance and its effect. revision: yes

Circularity Check

0 steps flagged

No circularity in VOW protocol derivation

full rationale

The paper proposes a new protocol VOW by formulating detection as secure two-party computation using VOPRF. This is a constructive approach that does not rely on self-definitional loops, fitted predictions, or load-bearing self-citations. No equations or claims in the provided abstract reduce the result to its inputs by construction. The evaluation of practicality for short texts is presented as empirical, not tautological. Thus, the derivation chain is self-contained without circular elements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the cryptographic security and efficiency of VOPRF for the watermark logic, plus the assumption that detection can be securely formulated as two-party computation without revealing text.

axioms (1)

domain assumption VOPRF provides obliviousness (hides input from evaluator) and verifiability (allows proof of correct output) under standard cryptographic assumptions.
Invoked when the abstract states the protocol instantiates watermark logic with VOPRF to achieve privacy and verifiability.

pith-pipeline@v0.9.0 · 5479 in / 1243 out tokens · 26128 ms · 2026-05-07T05:42:13.474962+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 25 canonical work pages · 7 internal anchors

[1]

Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, and Chirag Shah. 2024. AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach(CIKM ’24). Association for Computing Machinery, New York, NY, USA, 5174–5179. doi:10.1145/3627673.3679222

work page doi:10.1145/3627673.3679222 2024
[2]

Dan Boneh. 1998. The Decision Diffie–Hellman problem. InAlgorithmic Number Theory, Vol. 1423. Springer, 48–63. doi:10.1007/BFb0054851

work page doi:10.1007/bfb0054851 1998
[3]

Will Cai, Tianneng Shi, Xuandong Zhao, and Dawn Song. 2025. Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs. InNeurIPS 2025 Workshop on Regulatable ML. https://openreview.net/forum?id=thhrtv9P0s

2025
[4]

Silvia Casacuberta, Julia Hesse, and Anja Lehmann. 2022. SoK: Oblivious Pseu- dorandom Functions. In2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE Computer Society, Los Alamitos, CA, USA, 625–646. doi:10.1109/EuroSP53844.2022.00045

work page doi:10.1109/eurosp53844.2022.00045 2022
[5]

Pedersen

David Chaum and Torben P. Pedersen. 1992. Wallet Databases with Observers. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology. Springer, 89–105

1992
[6]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

work page internal anchor Pith review arXiv 2021
[7]

Miranda Christ and Sam Gunn. 2024. Pseudorandom Error-Correcting Codes. InAdvances in Cryptology – CRYPTO 2024, Leonid Reyzin and Douglas Stebila (Eds.). Springer Nature Switzerland, Cham, 325–347

2024
[8]

Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable Watermarks for Language Models. InProceedings of Thirty Seventh Conference on Learning Theory (Proceedings of Machine Learning Research, Vol. 247), Shipra Agrawal and Aaron Roth (Eds.). PMLR, 1125–1139. https://proceedings.mlr.press/v247/christ24a.html

2024
[9]

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv:2110.14168 [cs.LG] https://arxiv.org/abs/2110.14168

work page internal anchor Pith review arXiv 2021
[10]

Xinyue Cui, Johnny Wei, Swabha Swayamdipta, and Robin Jia. 2025. Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge. In Findings of the Association for Computational Linguistics: ACL 2025, Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Association for Computational Linguistics, Vienna, Austri...

2025
[11]

Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis, and Pu...
[12]

Scalable watermarking for identifying large language model outputs

Scalable watermarking for identifying large language model outputs. Nature634 (Oct. 2024), 818–823. doi:10.1038/s41586-024-08025-4

work page doi:10.1038/s41586-024-08025-4 2024
[13]

Alex Davidson, Armando Faz-Hernandez, Nick Sullivan, and Christopher A. Wood. 2023. Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order Groups. RFC 9497. doi:10.17487/RFC9497

work page doi:10.17487/rfc9497 2023
[14]

Alex Davidson, Ian Goldberg, Nick Sullivan, George Tankersley, and Filippo Valsorda. 2018. Privacy Pass: Bypassing Internet Challenges Anonymously. Proceedings on Privacy Enhancing Technologies2018 (2018), 164 – 180. doi:10. 1515/POPETS-2018-0026

2018
[15]

European Parliament and Council. 2024. The EU AI Act. Official Journal of the European Union, L 2024/1689. https://artificialintelligenceact.eu/the-act/ Official full title: Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/20...

2024
[16]

Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang. 2023. Publicly-Detectable Watermarking for Language Models. Cryptology ePrint Archive, Paper 2023/1661. https: //eprint.iacr.org/2023/1661

2023
[17]

Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long Form Question Answering. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Ko- rhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 3558–3567. doi:10.1...

work page doi:10.18653/v1/p19-1346 2019
[18]

Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. 2023. Three Bricks to Consolidate Watermarks for Large Language Models. In2023 IEEE International Workshop on Information Forensics and Security (WIFS). 1–6. doi:10.1109/WIFS58808.2023.10374576

work page doi:10.1109/wifs58808.2023.10374576 2023
[19]

Shafi Goldwasser, Silvio Micali, and Charles Rackoff. 1985. The knowledge complexity of interactive proof-systems. InProceedings of the 17th Annual ACM Symposium on Theory of Computing. Association for Computing Machinery, 291–304. doi:10.1145/22145.22178

work page doi:10.1145/22145.22178 1985
[20]

Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, and Tatsunori Hashimoto. 2025. Auditing Prompt Caching in Language Model APIs. InForty- second International Conference on Machine Learning. https://openreview.net/ forum?id=gUj2fxQcLZ

2025
[21]

Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, and Bo Li. 2023. Domain watermark: effective and harmless dataset copyright pro- tection is closed at hand. InProceedings of the 37th International Conference on Neural Information Processing Systems(New Orleans, LA, USA)(NIPS ’23). Curran Associates Inc., Red Hook, NY, USA, Article 237...

2023
[22]

Le Quan Ha, E. I. Sicilia-Garcia, Ji Ming, and F. J. Smith. 2002. Extension of Zipf’s law to words and phrases. InProceedings of the 19th International Conference on Computational Linguistics - Volume 1(Taipei, Taiwan)(COLING ’02). Association for Computational Linguistics, USA, 1–6. doi:10.3115/1072228.1072345

work page doi:10.3115/1072228.1072345 2002
[23]

Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hong- wei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computational Linguisti...

work page doi:10.18653/v1/2024.naacl-long.226 2024
[24]

Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine- Generated Text. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 17...

work page doi:10.18653/v1/2024 2024
[25]

Stanislaw Jarecki, Aggelos Kiayias, and Hugo Krawczyk. 2014. Round-Optimal Password-Protected Secret Sharing and T-PAKE in the Password-Only Model. In Advances in Cryptology – ASIACRYPT 2014, Palash Sarkar and Tetsu Iwata (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 233–253

2014
[26]

Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun. 2022. Scaling up Trustless DNN Inference with Zero-Knowledge Proofs. arXiv:2210.08674 [cs.CR] https://arxiv.org/abs/2210.08674

work page arXiv 2022
[27]

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A Watermark for Large Language Models. InProceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Sca...

2023
[28]

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein
[29]

InThe Twelfth International Conference on Learning Representations

On the Reliability of Watermarks for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview. net/forum?id=DEJIDCmWOz
[30]

Lea Kissner and Dawn Xiaodong Song. 2005. Privacy-Preserving Set Operations. InAdvances in Cryptology – CRYPTO 2005, Vol. 3621. Springer, 241–257. doi:10. 1007/11535218_15

2005
[31]

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. Robust Distortion-free Watermarks for Language Models.Transactions on Ma- chine Learning Research(2024). https://openreview.net/forum?id=FpaCL1MO2C

2024
[32]

Zilong Lin, Jian Cui, Xiaojing Liao, and XiaoFeng Wang. 2024. Malla: Demysti- fying Real-world Large Language Model Integrated Malicious Services. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 4693–4710. https://www.usenix.org/conference/usenixsecurity24/ presentation/lin-zilong

2024
[33]

Aiwei Liu, Leyi Pan, Xuming Hu, Shuang Li, Lijie Wen, Irwin King, and Philip S. Yu. 2024. An Unforgeable Publicly Verifiable Watermark for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=gMLQwKDY3N

2024
[34]

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. 2024. A Survey of Text Watermarking in the Era of Large Language Models.ACM Comput. Surv.57, 2, Article 47 (Nov. 2024), 36 pages. doi:10.1145/3691626

work page doi:10.1145/3691626 2024
[35]

Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (Vienna, Austria)(ICML’24). JMLR.org, Article 1238, 20 pages

2024
[36]

Meta. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288. arXiv:2307.09288 [cs.CL] https://arxiv.org/abs/2307.09288

work page internal anchor Pith review arXiv 2023
[37]

OpenAI. 2024. GPT-4 Technical Report. arXiv:2303.08774. arXiv:2303.08774 [cs.CL] https://arxiv.org/abs/2303.08774

work page internal anchor Pith review arXiv 2024
[38]

Qwen. 2025. Qwen2.5 Technical Report. arXiv:2412.15115. arXiv:2412.15115 [cs.CL] https://arxiv.org/abs/2412.15115

work page internal anchor Pith review arXiv 2025
[39]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer.J. Mach. Learn. Res.21, 1, Article 140 (Jan. 2020), 67 pages

2020
[40]

Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. 2024. A Robust Semantics-based Watermark for Large Language Model against Paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City,...

2024
[41]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL] https://arxiv.org/abs/1910.01108

work page internal anchor Pith review arXiv 2020
[42]

Machine behaviour

Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross J. Ander- son, and Yarin Gal. 2024. AI models collapse when trained on recursively gener- ated data.Nature631, 8022 (July 2024), 755–759. https://doi.org/10.1038/s41586- 024-07566-y

work page doi:10.1038/s41586- 2024
[43]

Yanshen Sun, Jianfeng He, Limeng Cui, Shuo Lei, and Chang-Tien Lu. 2024. Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real- World Detection Challenges. arXiv:2403.18249 [cs.CL] https://arxiv.org/abs/ 2403.18249

work page arXiv 2024
[44]

The White House. 2023. Executive Order 14110: Safe, Secure, and Trustworthy De- velopment and Use of Artificial Intelligence. Federal Register, 75191–75226 pages. https://www.federalregister.gov/d/2023-24283 Signed on October 30, 2023

2023
[45]

Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang
[46]

InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24)

A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24). JMLR.org, Article 2190, 28 pages
[47]

Yixin Wu, Ziqing Yang, Yun Shen, Michael Backes, and Yang Zhang. 2025. Syn- thetic artifact auditing: tracing LLM-generated synthetic data usage in down- stream applications. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 88, 20 pages

2025
[48]

Andrew C. Yao. 1982. Protocols for secure computations. InProceedings of the 23rd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 160–164. doi:10.1109/SFCS.1982.38

work page doi:10.1109/sfcs.1982.38 1982
[49]

Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushan- far. 2024. REMARK-LLM: a robust and efficient watermarking framework for generative large language models. InProceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Association, USA, Article 102, 18 pages

2024
[50]

Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. 2024. Provable Robust Watermarking for AI-Generated Text. InThe Twelfth Interna- tional Conference on Learning Representations. https://openreview.net/forum? id=SsmT8aO45L

2024
[51]

Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, and Dawn Song. 2025. SoK: Watermarking for AI-Generated Content . In2025 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 2621–2639. ...

work page doi:10.1109/sp61157 2025
[52]

Xuandong Zhao, Yu-Xiang Wang, and Lei Li. 2023. Protecting language genera- tion models via invisible watermarking. InProceedings of the 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)(ICML’23). JMLR.org, Article 1774, 13 pages

2023
[53]

Chaoyi Zhu, Jeroen Galjaard, Pin-Yu Chen, and Lydia Chen. 2024. Duwak: Dual Watermarks in Large Language Models. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 11416–11436. doi:10.18653/v1/2024.findings-acl.678

work page doi:10.18653/v1/2024.findings-acl.678 2024
[54]

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, and Willie Neiswanger. 2025. Auditing Black- Box LLM APIs with a Rank-Based Uniformity Test. arXiv:2506.06975 [cs.CR] https://arxiv.org/abs/2506.06975 A Proof of Correctness for Unbiased Sampling We prove the described rejection sampling algorithm ...

work page internal anchor Pith review Pith/arXiv arXiv 2025