pith. machine review for the scientific record. sign in

arxiv: 2604.27666 · v1 · submitted 2026-04-30 · 💻 cs.CR

Recognition: unknown

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

Authors on Pith no claims yet

Pith reviewed 2026-05-07 05:42 UTC · model grok-4.3

classification 💻 cs.CR
keywords LLM watermarkingverifiable watermark detectionoblivious pseudorandom functionprivacy-preserving detectiontwo-party computationtext provenancesecure watermarkingVOPRF
0
0 comments X

The pith

VOW lets users detect LLM watermarks without revealing their text while verifying the provider's result.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VOW as a protocol that turns watermark detection into a secure two-party computation. It uses a Verifiable Oblivious Pseudorandom Function to let a user and provider jointly check for watermarks without the user sending the text and without the user having to trust the result. Current watermark methods require exposing potentially sensitive content or provide no cryptographic check on the outcome, which limits their use for private documents or regulated content. If the approach holds, watermarking becomes viable in settings where privacy matters, and detection can happen on short texts that dominate real-world queries. The work also reevaluates how well existing watermarks survive paraphrasing when examined under this private protocol.

Core claim

VOW formulates watermark detection as a secure two-party computation problem and instantiates the watermark's core logic with a Verifiable Oblivious Pseudorandom Function. This construction lets the user and provider perform detection such that the user's text is never revealed to the provider and the provider's output is cryptographically verifiable, while the protocol remains efficient enough for short texts.

What carries the argument

Verifiable Oblivious Pseudorandom Function (VOPRF) that encodes the watermark detection logic inside a secure two-party computation protocol

If this is right

  • Users can obtain detection results without disclosing their text to the provider.
  • The provider's detection outcome can be checked cryptographically by the user.
  • The protocol supports short texts at practical speeds.
  • Watermark robustness can be reassessed in a setting that does not expose the examined text.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-party structure could apply to provenance checks for other AI-generated media where input privacy is required.
  • Providers would need to expose only the VOPRF interface rather than the full watermarking details, changing deployment requirements.
  • If the efficiency claims hold, watermarking could move from optional research feature to default capability in privacy-sensitive LLM services.

Load-bearing premise

A practical and secure VOPRF can be built for the exact watermark detection logic and will stay efficient and robust on short texts even against modern paraphrasing attacks.

What would settle it

An implementation or proof showing that the VOPRF protocol leaks information about the input text or fails to produce verifiable correct outputs on short paraphrased texts would disprove the privacy, verifiability, and practicality claims.

Figures

Figures reproduced from arXiv: 2604.27666 by Feiran Lei, Meng Sun, Pengcheng Su, Xiaokun Luan, Yihao Zhang.

Figure 1
Figure 1. Figure 1: True positive rate (TPR) of different watermarking view at source ↗
Figure 2
Figure 2. Figure 2: Calibration of 𝑝-values on non-watermarked sam￾ples. The dashed diagonal line represents the ideal uniform distribution under the null hypothesis. and 0.992 at 150 tokens, confirming its suitability for short texts. While RDF’s curve rises faster, this is achieved at a relaxed FPR of 10−2 (1000 times less strict than other methods) on only 200 samples versus 500 for others. Notably, RDF’s TPR drops from 0.… view at source ↗
Figure 4
Figure 4. Figure 4: AUC (top) and TPR (bottom) of different watermarking schemes under synonym replacement and paraphrasing view at source ↗
Figure 5
Figure 5. Figure 5: Distribution of 𝑝-values for detection results on negative samples and watermarked texts after paraphrasing by GPT-5.1. texts yield higher AUC and TPR than longer texts, contradicting the expected benefit of increased evidence. This suggests that its learning-based detection mechanism fails to generalize robustly after paraphrasing. To visualize the extent of this failure, view at source ↗
Figure 6
Figure 6. Figure 6: Trade-off between perplexity and true positive rate at view at source ↗
Figure 8
Figure 8. Figure 8: Theoretical false positive rate (FPR) and the ratio view at source ↗
Figure 9
Figure 9. Figure 9: End-to-end throughput of VOW under different view at source ↗
Figure 11
Figure 11. Figure 11: Total communication overhead of the VOW detec view at source ↗
read the original abstract

Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes VOW, a new protocol for LLM watermark detection that formulates the task as a secure two-party computation. It instantiates the core watermark logic using a Verifiable Oblivious Pseudorandom Function (VOPRF) so that detection can be performed without the user revealing their text to the provider and so that the provider's output is cryptographically verifiable. The authors claim the construction is practical for short texts, achieves both obliviousness and verifiability, and includes an evaluation that reassesses watermark robustness against modern paraphrasing attacks.

Significance. If the VOPRF-based reduction is sound and the efficiency claims hold, the work would be significant for privacy-preserving and verifiable provenance in LLMs. It directly addresses the centralized trust model that currently forces users to reveal potentially sensitive text, and the cryptographic framing could provide formal guarantees that prior asymmetric schemes lack. The reassessment of paraphrasing robustness is also a useful contribution if supported by concrete data.

major comments (3)
  1. [§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.
  2. [§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.
  3. [§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.
minor comments (2)
  1. [Abstract] The abstract states 'high efficiency' and 'practical for short texts' without any concrete metrics (runtime, communication, or accuracy numbers); these should be quantified in the introduction or evaluation summary.
  2. [§2] Notation for the watermark key, VOPRF inputs/outputs, and the detection threshold should be introduced consistently in §2 before being used in the protocol description.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments. Below we provide point-by-point responses to the major comments and outline the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.

    Authors: We appreciate the referee's emphasis on formal security. The VOW protocol in §4 is constructed so that the VOPRF directly computes the PRF evaluations required for the statistical test, thereby preserving the exact detection logic. We concede that an explicit reduction was not detailed in the submission. In the revised manuscript, we will add a formal security analysis subsection to §4. This will include a reduction showing that VOW's obliviousness and verifiability properties are secure under the VOPRF assumption and the security of the base watermarking scheme, with no degradation to the false-positive rate or robustness for short texts. The proof will be based on the standard simulation paradigm for oblivious PRFs. revision: yes

  2. Referee: [§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.

    Authors: The evaluation in §5 does contain quantitative results demonstrating practicality for short texts and robustness reassessment. However, we agree that direct references to tables and figures comparing VOW to the non-oblivious baseline were not prominently included in the narrative. We will revise §5 to explicitly reference and discuss the relevant tables (efficiency metrics) and figures (robustness curves), including side-by-side comparisons of accuracy and overhead. This will better support the practicality claims with visible data. revision: yes

  3. Referee: [§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.

    Authors: We selected an EC-based VOPRF instantiation in §3 because its output distribution is statistically indistinguishable from a random function, matching the requirements of the original watermark's PRF. Nevertheless, the manuscript does not explicitly analyze the impact on the statistical test. We will update §3 with a paragraph explaining that the VOPRF output is pseudorandom and uniformly distributed, preserving the p-value calculations and thus the false-positive rate and robustness. If necessary, we will note any negligible statistical distance and its effect. revision: yes

Circularity Check

0 steps flagged

No circularity in VOW protocol derivation

full rationale

The paper proposes a new protocol VOW by formulating detection as secure two-party computation using VOPRF. This is a constructive approach that does not rely on self-definitional loops, fitted predictions, or load-bearing self-citations. No equations or claims in the provided abstract reduce the result to its inputs by construction. The evaluation of practicality for short texts is presented as empirical, not tautological. Thus, the derivation chain is self-contained without circular elements.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the cryptographic security and efficiency of VOPRF for the watermark logic, plus the assumption that detection can be securely formulated as two-party computation without revealing text.

axioms (1)
  • domain assumption VOPRF provides obliviousness (hides input from evaluator) and verifiability (allows proof of correct output) under standard cryptographic assumptions.
    Invoked when the abstract states the protocol instantiates watermark logic with VOPRF to achieve privacy and verifiability.

pith-pipeline@v0.9.0 · 5479 in / 1243 out tokens · 26128 ms · 2026-05-07T05:42:13.474962+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 25 canonical work pages · 7 internal anchors

  1. [1]

    Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, and Chirag Shah. 2024. AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach(CIKM ’24). Association for Computing Machinery, New York, NY, USA, 5174–5179. doi:10.1145/3627673.3679222

  2. [2]

    Dan Boneh. 1998. The Decision Diffie–Hellman problem. InAlgorithmic Number Theory, Vol. 1423. Springer, 48–63. doi:10.1007/BFb0054851

  3. [3]

    Will Cai, Tianneng Shi, Xuandong Zhao, and Dawn Song. 2025. Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs. InNeurIPS 2025 Workshop on Regulatable ML. https://openreview.net/forum?id=thhrtv9P0s

  4. [4]

    Silvia Casacuberta, Julia Hesse, and Anja Lehmann. 2022. SoK: Oblivious Pseu- dorandom Functions. In2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE Computer Society, Los Alamitos, CA, USA, 625–646. doi:10.1109/EuroSP53844.2022.00045

  5. [5]

    Pedersen

    David Chaum and Torben P. Pedersen. 1992. Wallet Databases with Observers. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology. Springer, 89–105

  6. [6]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...

  7. [7]

    Miranda Christ and Sam Gunn. 2024. Pseudorandom Error-Correcting Codes. InAdvances in Cryptology – CRYPTO 2024, Leonid Reyzin and Douglas Stebila (Eds.). Springer Nature Switzerland, Cham, 325–347

  8. [8]

    Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable Watermarks for Language Models. InProceedings of Thirty Seventh Conference on Learning Theory (Proceedings of Machine Learning Research, Vol. 247), Shipra Agrawal and Aaron Roth (Eds.). PMLR, 1125–1139. https://proceedings.mlr.press/v247/christ24a.html

  9. [9]

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv:2110.14168 [cs.LG] https://arxiv.org/abs/2110.14168

  10. [10]

    Xinyue Cui, Johnny Wei, Swabha Swayamdipta, and Robin Jia. 2025. Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge. In Findings of the Association for Computational Linguistics: ACL 2025, Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Association for Computational Linguistics, Vienna, Austri...

  11. [11]

    Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis, and Pu...

  12. [12]

    Scalable watermarking for identifying large language model outputs

    Scalable watermarking for identifying large language model outputs. Nature634 (Oct. 2024), 818–823. doi:10.1038/s41586-024-08025-4

  13. [13]

    Alex Davidson, Armando Faz-Hernandez, Nick Sullivan, and Christopher A. Wood. 2023. Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order Groups. RFC 9497. doi:10.17487/RFC9497

  14. [14]

    Alex Davidson, Ian Goldberg, Nick Sullivan, George Tankersley, and Filippo Valsorda. 2018. Privacy Pass: Bypassing Internet Challenges Anonymously. Proceedings on Privacy Enhancing Technologies2018 (2018), 164 – 180. doi:10. 1515/POPETS-2018-0026

  15. [15]

    European Parliament and Council. 2024. The EU AI Act. Official Journal of the European Union, L 2024/1689. https://artificialintelligenceact.eu/the-act/ Official full title: Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/20...

  16. [16]

    Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang. 2023. Publicly-Detectable Watermarking for Language Models. Cryptology ePrint Archive, Paper 2023/1661. https: //eprint.iacr.org/2023/1661

  17. [17]

    Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long Form Question Answering. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Ko- rhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 3558–3567. doi:10.1...

  18. [18]

    Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. 2023. Three Bricks to Consolidate Watermarks for Large Language Models. In2023 IEEE International Workshop on Information Forensics and Security (WIFS). 1–6. doi:10.1109/WIFS58808.2023.10374576

  19. [19]

    Shafi Goldwasser, Silvio Micali, and Charles Rackoff. 1985. The knowledge complexity of interactive proof-systems. InProceedings of the 17th Annual ACM Symposium on Theory of Computing. Association for Computing Machinery, 291–304. doi:10.1145/22145.22178

  20. [20]

    Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, and Tatsunori Hashimoto. 2025. Auditing Prompt Caching in Language Model APIs. InForty- second International Conference on Machine Learning. https://openreview.net/ forum?id=gUj2fxQcLZ

  21. [21]

    Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, and Bo Li. 2023. Domain watermark: effective and harmless dataset copyright pro- tection is closed at hand. InProceedings of the 37th International Conference on Neural Information Processing Systems(New Orleans, LA, USA)(NIPS ’23). Curran Associates Inc., Red Hook, NY, USA, Article 237...

  22. [22]

    Le Quan Ha, E. I. Sicilia-Garcia, Ji Ming, and F. J. Smith. 2002. Extension of Zipf’s law to words and phrases. InProceedings of the 19th International Conference on Computational Linguistics - Volume 1(Taipei, Taiwan)(COLING ’02). Association for Computational Linguistics, USA, 1–6. doi:10.3115/1072228.1072345

  23. [23]

    Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hong- wei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computational Linguisti...

  24. [24]

    Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine- Generated Text. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 17...

  25. [25]

    Stanislaw Jarecki, Aggelos Kiayias, and Hugo Krawczyk. 2014. Round-Optimal Password-Protected Secret Sharing and T-PAKE in the Password-Only Model. In Advances in Cryptology – ASIACRYPT 2014, Palash Sarkar and Tetsu Iwata (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 233–253

  26. [26]

    Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun. 2022. Scaling up Trustless DNN Inference with Zero-Knowledge Proofs. arXiv:2210.08674 [cs.CR] https://arxiv.org/abs/2210.08674

  27. [27]

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A Watermark for Large Language Models. InProceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Sca...

  28. [28]

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein

  29. [29]

    InThe Twelfth International Conference on Learning Representations

    On the Reliability of Watermarks for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview. net/forum?id=DEJIDCmWOz

  30. [30]

    Lea Kissner and Dawn Xiaodong Song. 2005. Privacy-Preserving Set Operations. InAdvances in Cryptology – CRYPTO 2005, Vol. 3621. Springer, 241–257. doi:10. 1007/11535218_15

  31. [31]

    Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. Robust Distortion-free Watermarks for Language Models.Transactions on Ma- chine Learning Research(2024). https://openreview.net/forum?id=FpaCL1MO2C

  32. [32]

    Zilong Lin, Jian Cui, Xiaojing Liao, and XiaoFeng Wang. 2024. Malla: Demysti- fying Real-world Large Language Model Integrated Malicious Services. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 4693–4710. https://www.usenix.org/conference/usenixsecurity24/ presentation/lin-zilong

  33. [33]

    Aiwei Liu, Leyi Pan, Xuming Hu, Shuang Li, Lijie Wen, Irwin King, and Philip S. Yu. 2024. An Unforgeable Publicly Verifiable Watermark for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=gMLQwKDY3N

  34. [34]

    Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. 2024. A Survey of Text Watermarking in the Era of Large Language Models.ACM Comput. Surv.57, 2, Article 47 (Nov. 2024), 36 pages. doi:10.1145/3691626

  35. [35]

    Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (Vienna, Austria)(ICML’24). JMLR.org, Article 1238, 20 pages

  36. [36]

    Meta. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288. arXiv:2307.09288 [cs.CL] https://arxiv.org/abs/2307.09288

  37. [37]

    OpenAI. 2024. GPT-4 Technical Report. arXiv:2303.08774. arXiv:2303.08774 [cs.CL] https://arxiv.org/abs/2303.08774

  38. [38]

    Qwen. 2025. Qwen2.5 Technical Report. arXiv:2412.15115. arXiv:2412.15115 [cs.CL] https://arxiv.org/abs/2412.15115

  39. [39]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer.J. Mach. Learn. Res.21, 1, Article 140 (Jan. 2020), 67 pages

  40. [40]

    Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. 2024. A Robust Semantics-based Watermark for Large Language Model against Paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City,...

  41. [41]

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL] https://arxiv.org/abs/1910.01108

  42. [42]

    Machine behaviour

    Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross J. Ander- son, and Yarin Gal. 2024. AI models collapse when trained on recursively gener- ated data.Nature631, 8022 (July 2024), 755–759. https://doi.org/10.1038/s41586- 024-07566-y

  43. [43]

    Yanshen Sun, Jianfeng He, Limeng Cui, Shuo Lei, and Chang-Tien Lu. 2024. Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real- World Detection Challenges. arXiv:2403.18249 [cs.CL] https://arxiv.org/abs/ 2403.18249

  44. [44]

    The White House. 2023. Executive Order 14110: Safe, Secure, and Trustworthy De- velopment and Use of Artificial Intelligence. Federal Register, 75191–75226 pages. https://www.federalregister.gov/d/2023-24283 Signed on October 30, 2023

  45. [45]

    Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang

  46. [46]

    InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24)

    A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24). JMLR.org, Article 2190, 28 pages

  47. [47]

    Yixin Wu, Ziqing Yang, Yun Shen, Michael Backes, and Yang Zhang. 2025. Syn- thetic artifact auditing: tracing LLM-generated synthetic data usage in down- stream applications. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 88, 20 pages

  48. [48]

    Andrew C. Yao. 1982. Protocols for secure computations. InProceedings of the 23rd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 160–164. doi:10.1109/SFCS.1982.38

  49. [49]

    Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushan- far. 2024. REMARK-LLM: a robust and efficient watermarking framework for generative large language models. InProceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Association, USA, Article 102, 18 pages

  50. [50]

    Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. 2024. Provable Robust Watermarking for AI-Generated Text. InThe Twelfth Interna- tional Conference on Learning Representations. https://openreview.net/forum? id=SsmT8aO45L

  51. [51]

    Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, and Dawn Song. 2025. SoK: Watermarking for AI-Generated Content . In2025 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 2621–2639. ...

  52. [52]

    Xuandong Zhao, Yu-Xiang Wang, and Lei Li. 2023. Protecting language genera- tion models via invisible watermarking. InProceedings of the 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)(ICML’23). JMLR.org, Article 1774, 13 pages

  53. [53]

    Chaoyi Zhu, Jeroen Galjaard, Pin-Yu Chen, and Lydia Chen. 2024. Duwak: Dual Watermarks in Large Language Models. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 11416–11436. doi:10.18653/v1/2024.findings-acl.678

  54. [54]

    Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

    Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, and Willie Neiswanger. 2025. Auditing Black- Box LLM APIs with a Rank-Based Uniformity Test. arXiv:2506.06975 [cs.CR] https://arxiv.org/abs/2506.06975 A Proof of Correctness for Unbiased Sampling We prove the described rejection sampling algorithm ...