Recognition: unknown
VOW: Verifiable and Oblivious Watermark Detection for Large Language Models
Pith reviewed 2026-05-07 05:42 UTC · model grok-4.3
The pith
VOW lets users detect LLM watermarks without revealing their text while verifying the provider's result.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VOW formulates watermark detection as a secure two-party computation problem and instantiates the watermark's core logic with a Verifiable Oblivious Pseudorandom Function. This construction lets the user and provider perform detection such that the user's text is never revealed to the provider and the provider's output is cryptographically verifiable, while the protocol remains efficient enough for short texts.
What carries the argument
Verifiable Oblivious Pseudorandom Function (VOPRF) that encodes the watermark detection logic inside a secure two-party computation protocol
If this is right
- Users can obtain detection results without disclosing their text to the provider.
- The provider's detection outcome can be checked cryptographically by the user.
- The protocol supports short texts at practical speeds.
- Watermark robustness can be reassessed in a setting that does not expose the examined text.
Where Pith is reading between the lines
- The same two-party structure could apply to provenance checks for other AI-generated media where input privacy is required.
- Providers would need to expose only the VOPRF interface rather than the full watermarking details, changing deployment requirements.
- If the efficiency claims hold, watermarking could move from optional research feature to default capability in privacy-sensitive LLM services.
Load-bearing premise
A practical and secure VOPRF can be built for the exact watermark detection logic and will stay efficient and robust on short texts even against modern paraphrasing attacks.
What would settle it
An implementation or proof showing that the VOPRF protocol leaks information about the input text or fails to produce verifiable correct outputs on short paraphrased texts would disprove the privacy, verifiability, and practicality claims.
Figures
read the original abstract
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes VOW, a new protocol for LLM watermark detection that formulates the task as a secure two-party computation. It instantiates the core watermark logic using a Verifiable Oblivious Pseudorandom Function (VOPRF) so that detection can be performed without the user revealing their text to the provider and so that the provider's output is cryptographically verifiable. The authors claim the construction is practical for short texts, achieves both obliviousness and verifiability, and includes an evaluation that reassesses watermark robustness against modern paraphrasing attacks.
Significance. If the VOPRF-based reduction is sound and the efficiency claims hold, the work would be significant for privacy-preserving and verifiable provenance in LLMs. It directly addresses the centralized trust model that currently forces users to reveal potentially sensitive text, and the cryptographic framing could provide formal guarantees that prior asymmetric schemes lack. The reassessment of paraphrasing robustness is also a useful contribution if supported by concrete data.
major comments (3)
- [§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.
- [§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.
- [§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.
minor comments (2)
- [Abstract] The abstract states 'high efficiency' and 'practical for short texts' without any concrete metrics (runtime, communication, or accuracy numbers); these should be quantified in the introduction or evaluation summary.
- [§2] Notation for the watermark key, VOPRF inputs/outputs, and the detection threshold should be introduced consistently in §2 before being used in the protocol description.
Simulated Author's Rebuttal
We thank the referee for their insightful comments. Below we provide point-by-point responses to the major comments and outline the revisions we will make to the manuscript.
read point-by-point responses
-
Referee: [§4] §4 (Protocol Construction): The central claim that the watermark detection logic (statistical test over PRF-evaluated tokens) can be exactly expressed as a VOPRF evaluation lacks an explicit security reduction to the VOPRF assumption together with the base watermark security. Without this reduction it is impossible to verify that obliviousness and verifiability are achieved without degrading the original false-positive rate or robustness for short texts.
Authors: We appreciate the referee's emphasis on formal security. The VOW protocol in §4 is constructed so that the VOPRF directly computes the PRF evaluations required for the statistical test, thereby preserving the exact detection logic. We concede that an explicit reduction was not detailed in the submission. In the revised manuscript, we will add a formal security analysis subsection to §4. This will include a reduction showing that VOW's obliviousness and verifiability properties are secure under the VOPRF assumption and the security of the base watermarking scheme, with no degradation to the false-positive rate or robustness for short texts. The proof will be based on the standard simulation paradigm for oblivious PRFs. revision: yes
-
Referee: [§5] §5 (Evaluation): The paper asserts that VOW remains practical for short texts and provides a reassessment of robustness against paraphrasing attacks, yet no tables, figures, or quantitative comparison to the non-oblivious baseline are referenced. This is load-bearing for the practicality claim.
Authors: The evaluation in §5 does contain quantitative results demonstrating practicality for short texts and robustness reassessment. However, we agree that direct references to tables and figures comparing VOW to the non-oblivious baseline were not prominently included in the narrative. We will revise §5 to explicitly reference and discuss the relevant tables (efficiency metrics) and figures (robustness curves), including side-by-side comparisons of accuracy and overhead. This will better support the practicality claims with visible data. revision: yes
-
Referee: [§3] §3 (VOPRF Instantiation): The description of how the watermark's native PRF is mapped to the chosen VOPRF primitive (RSA-OPRF, EC-based, etc.) does not address whether the output distribution preserves the statistical test used for detection; any mismatch would invalidate the claimed robustness guarantees.
Authors: We selected an EC-based VOPRF instantiation in §3 because its output distribution is statistically indistinguishable from a random function, matching the requirements of the original watermark's PRF. Nevertheless, the manuscript does not explicitly analyze the impact on the statistical test. We will update §3 with a paragraph explaining that the VOPRF output is pseudorandom and uniformly distributed, preserving the p-value calculations and thus the false-positive rate and robustness. If necessary, we will note any negligible statistical distance and its effect. revision: yes
Circularity Check
No circularity in VOW protocol derivation
full rationale
The paper proposes a new protocol VOW by formulating detection as secure two-party computation using VOPRF. This is a constructive approach that does not rely on self-definitional loops, fitted predictions, or load-bearing self-citations. No equations or claims in the provided abstract reduce the result to its inputs by construction. The evaluation of practicality for short texts is presented as empirical, not tautological. Thus, the derivation chain is self-contained without circular elements.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption VOPRF provides obliviousness (hides input from evaluator) and verifiability (allows proof of correct output) under standard cryptographic assumptions.
Reference graph
Works this paper leans on
-
[1]
Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, and Chirag Shah. 2024. AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach(CIKM ’24). Association for Computing Machinery, New York, NY, USA, 5174–5179. doi:10.1145/3627673.3679222
-
[2]
Dan Boneh. 1998. The Decision Diffie–Hellman problem. InAlgorithmic Number Theory, Vol. 1423. Springer, 48–63. doi:10.1007/BFb0054851
-
[3]
Will Cai, Tianneng Shi, Xuandong Zhao, and Dawn Song. 2025. Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs. InNeurIPS 2025 Workshop on Regulatable ML. https://openreview.net/forum?id=thhrtv9P0s
2025
-
[4]
Silvia Casacuberta, Julia Hesse, and Anja Lehmann. 2022. SoK: Oblivious Pseu- dorandom Functions. In2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE Computer Society, Los Alamitos, CA, USA, 625–646. doi:10.1109/EuroSP53844.2022.00045
-
[5]
Pedersen
David Chaum and Torben P. Pedersen. 1992. Wallet Databases with Observers. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology. Springer, 89–105
1992
-
[6]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
work page internal anchor Pith review arXiv 2021
-
[7]
Miranda Christ and Sam Gunn. 2024. Pseudorandom Error-Correcting Codes. InAdvances in Cryptology – CRYPTO 2024, Leonid Reyzin and Douglas Stebila (Eds.). Springer Nature Switzerland, Cham, 325–347
2024
-
[8]
Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable Watermarks for Language Models. InProceedings of Thirty Seventh Conference on Learning Theory (Proceedings of Machine Learning Research, Vol. 247), Shipra Agrawal and Aaron Roth (Eds.). PMLR, 1125–1139. https://proceedings.mlr.press/v247/christ24a.html
2024
-
[9]
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. 2021. Training Verifiers to Solve Math Word Problems. arXiv:2110.14168 [cs.LG] https://arxiv.org/abs/2110.14168
work page internal anchor Pith review arXiv 2021
-
[10]
Xinyue Cui, Johnny Wei, Swabha Swayamdipta, and Robin Jia. 2025. Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge. In Findings of the Association for Computational Linguistics: ACL 2025, Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Association for Computational Linguistics, Vienna, Austri...
2025
-
[11]
Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, Jamie Hayes, Nidhi Vyas, Majd Al Merey, Jonah Brown-Cohen, Rudy Bunel, Borja Balle, Taylan Cemgil, Zahra Ahmed, Kitty Stacpoole, Ilia Shumailov, Ciprian Baetu, Sven Gowal, Demis Hassabis, and Pu...
-
[12]
Scalable watermarking for identifying large language model outputs
Scalable watermarking for identifying large language model outputs. Nature634 (Oct. 2024), 818–823. doi:10.1038/s41586-024-08025-4
-
[13]
Alex Davidson, Armando Faz-Hernandez, Nick Sullivan, and Christopher A. Wood. 2023. Oblivious Pseudorandom Functions (OPRFs) Using Prime-Order Groups. RFC 9497. doi:10.17487/RFC9497
-
[14]
Alex Davidson, Ian Goldberg, Nick Sullivan, George Tankersley, and Filippo Valsorda. 2018. Privacy Pass: Bypassing Internet Challenges Anonymously. Proceedings on Privacy Enhancing Technologies2018 (2018), 164 – 180. doi:10. 1515/POPETS-2018-0026
2018
-
[15]
European Parliament and Council. 2024. The EU AI Act. Official Journal of the European Union, L 2024/1689. https://artificialintelligenceact.eu/the-act/ Official full title: Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/20...
2024
-
[16]
Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, and Mingyuan Wang. 2023. Publicly-Detectable Watermarking for Language Models. Cryptology ePrint Archive, Paper 2023/1661. https: //eprint.iacr.org/2023/1661
2023
-
[17]
Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, and Michael Auli. 2019. ELI5: Long Form Question Answering. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Anna Ko- rhonen, David Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, Florence, Italy, 3558–3567. doi:10.1...
-
[18]
Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. 2023. Three Bricks to Consolidate Watermarks for Large Language Models. In2023 IEEE International Workshop on Information Forensics and Security (WIFS). 1–6. doi:10.1109/WIFS58808.2023.10374576
-
[19]
Shafi Goldwasser, Silvio Micali, and Charles Rackoff. 1985. The knowledge complexity of interactive proof-systems. InProceedings of the 17th Annual ACM Symposium on Theory of Computing. Association for Computing Machinery, 291–304. doi:10.1145/22145.22178
-
[20]
Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, and Tatsunori Hashimoto. 2025. Auditing Prompt Caching in Language Model APIs. InForty- second International Conference on Machine Learning. https://openreview.net/ forum?id=gUj2fxQcLZ
2025
-
[21]
Junfeng Guo, Yiming Li, Lixu Wang, Shu-Tao Xia, Heng Huang, Cong Liu, and Bo Li. 2023. Domain watermark: effective and harmless dataset copyright pro- tection is closed at hand. InProceedings of the 37th International Conference on Neural Information Processing Systems(New Orleans, LA, USA)(NIPS ’23). Curran Associates Inc., Red Hook, NY, USA, Article 237...
2023
-
[22]
Le Quan Ha, E. I. Sicilia-Garcia, Ji Ming, and F. J. Smith. 2002. Extension of Zipf’s law to words and phrases. InProceedings of the 19th International Conference on Computational Linguistics - Volume 1(Taipei, Taiwan)(COLING ’02). Association for Computational Linguistics, USA, 1–6. doi:10.3115/1072228.1072345
-
[23]
Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hong- wei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024. SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation. InProceedings of the 2024 Conference of the North Ameri- can Chapter of the Association for Computational Linguisti...
-
[24]
Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024. k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine- Generated Text. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 17...
-
[25]
Stanislaw Jarecki, Aggelos Kiayias, and Hugo Krawczyk. 2014. Round-Optimal Password-Protected Secret Sharing and T-PAKE in the Password-Only Model. In Advances in Cryptology – ASIACRYPT 2014, Palash Sarkar and Tetsu Iwata (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 233–253
2014
- [26]
-
[27]
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A Watermark for Large Language Models. InProceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202), Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Sca...
2023
-
[28]
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein
-
[29]
InThe Twelfth International Conference on Learning Representations
On the Reliability of Watermarks for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview. net/forum?id=DEJIDCmWOz
-
[30]
Lea Kissner and Dawn Xiaodong Song. 2005. Privacy-Preserving Set Operations. InAdvances in Cryptology – CRYPTO 2005, Vol. 3621. Springer, 241–257. doi:10. 1007/11535218_15
2005
-
[31]
Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. Robust Distortion-free Watermarks for Language Models.Transactions on Ma- chine Learning Research(2024). https://openreview.net/forum?id=FpaCL1MO2C
2024
-
[32]
Zilong Lin, Jian Cui, Xiaojing Liao, and XiaoFeng Wang. 2024. Malla: Demysti- fying Real-world Large Language Model Integrated Malicious Services. In33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadel- phia, PA, 4693–4710. https://www.usenix.org/conference/usenixsecurity24/ presentation/lin-zilong
2024
-
[33]
Aiwei Liu, Leyi Pan, Xuming Hu, Shuang Li, Lijie Wen, Irwin King, and Philip S. Yu. 2024. An Unforgeable Publicly Verifiable Watermark for Large Language Models. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=gMLQwKDY3N
2024
-
[34]
Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. 2024. A Survey of Text Watermarking in the Era of Large Language Models.ACM Comput. Surv.57, 2, Article 47 (Nov. 2024), 36 pages. doi:10.1145/3691626
-
[35]
Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. InProceedings of the 41st International Conference on Machine Learning (Vienna, Austria)(ICML’24). JMLR.org, Article 1238, 20 pages
2024
-
[36]
Meta. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288. arXiv:2307.09288 [cs.CL] https://arxiv.org/abs/2307.09288
work page internal anchor Pith review arXiv 2023
-
[37]
OpenAI. 2024. GPT-4 Technical Report. arXiv:2303.08774. arXiv:2303.08774 [cs.CL] https://arxiv.org/abs/2303.08774
work page internal anchor Pith review arXiv 2024
-
[38]
Qwen. 2025. Qwen2.5 Technical Report. arXiv:2412.15115. arXiv:2412.15115 [cs.CL] https://arxiv.org/abs/2412.15115
work page internal anchor Pith review arXiv 2025
-
[39]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer.J. Mach. Learn. Res.21, 1, Article 140 (Jan. 2020), 67 pages
2020
-
[40]
Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. 2024. A Robust Semantics-based Watermark for Large Language Model against Paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2024, Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City,...
2024
-
[41]
Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 [cs.CL] https://arxiv.org/abs/1910.01108
work page internal anchor Pith review arXiv 2020
-
[42]
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross J. Ander- son, and Yarin Gal. 2024. AI models collapse when trained on recursively gener- ated data.Nature631, 8022 (July 2024), 755–759. https://doi.org/10.1038/s41586- 024-07566-y
- [43]
-
[44]
The White House. 2023. Executive Order 14110: Safe, Secure, and Trustworthy De- velopment and Use of Artificial Intelligence. Federal Register, 75191–75226 pages. https://www.federalregister.gov/d/2023-24283 Signed on October 30, 2023
2023
-
[45]
Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang
-
[46]
InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24)
A resilient and accessible distribution-preserving watermark for large language models. InProceedings of the 41st International Conference on Machine Learning(Vienna, Austria)(ICML’24). JMLR.org, Article 2190, 28 pages
-
[47]
Yixin Wu, Ziqing Yang, Yun Shen, Michael Backes, and Yang Zhang. 2025. Syn- thetic artifact auditing: tracing LLM-generated synthetic data usage in down- stream applications. InProceedings of the 34th USENIX Conference on Security Symposium(Seattle, WA, USA)(SEC ’25). USENIX Association, USA, Article 88, 20 pages
2025
-
[48]
Andrew C. Yao. 1982. Protocols for secure computations. InProceedings of the 23rd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 160–164. doi:10.1109/SFCS.1982.38
-
[49]
Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushan- far. 2024. REMARK-LLM: a robust and efficient watermarking framework for generative large language models. InProceedings of the 33rd USENIX Conference on Security Symposium(Philadelphia, PA, USA)(SEC ’24). USENIX Association, USA, Article 102, 18 pages
2024
-
[50]
Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. 2024. Provable Robust Watermarking for AI-Generated Text. InThe Twelfth Interna- tional Conference on Learning Representations. https://openreview.net/forum? id=SsmT8aO45L
2024
-
[51]
Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Carlini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, Somesh Jha, Lei Li, Yu-Xiang Wang, and Dawn Song. 2025. SoK: Watermarking for AI-Generated Content . In2025 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 2621–2639. ...
-
[52]
Xuandong Zhao, Yu-Xiang Wang, and Lei Li. 2023. Protecting language genera- tion models via invisible watermarking. InProceedings of the 40th International Conference on Machine Learning(Honolulu, Hawaii, USA)(ICML’23). JMLR.org, Article 1774, 13 pages
2023
-
[53]
Chaoyi Zhu, Jeroen Galjaard, Pin-Yu Chen, and Lydia Chen. 2024. Duwak: Dual Watermarks in Large Language Models. InFindings of the Association for Computational Linguistics: ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand, 11416–11436. doi:10.18653/v1/2024.findings-acl.678
-
[54]
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, and Willie Neiswanger. 2025. Auditing Black- Box LLM APIs with a Rank-Based Uniformity Test. arXiv:2506.06975 [cs.CR] https://arxiv.org/abs/2506.06975 A Proof of Correctness for Unbiased Sampling We prove the described rejection sampling algorithm ...
work page internal anchor Pith review Pith/arXiv arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.