Sample Complexities of Estimating Gumbel--Max Watermark Proportions with and without Reduction to Pivotal Statistics

Qiaosen Wang; Shuwen Chai

arxiv: 2607.00224 · v2 · pith:42VNWSJAnew · submitted 2026-06-30 · 🧮 math.ST · cs.IT· cs.LG· math.IT· stat.ML· stat.TH

Sample Complexities of Estimating Gumbel--Max Watermark Proportions with and without Reduction to Pivotal Statistics

Shuwen Chai , Qiaosen Wang This is my paper

Pith reviewed 2026-07-03 21:53 UTC · model grok-4.3

classification 🧮 math.ST cs.ITcs.LGmath.ITstat.MLstat.TH

keywords watermark proportion estimationGumbel-max watermarksample complexitypivotal statisticsstatistical estimationLLM watermarking

0 comments

The pith

Full observation of tokens and vectors yields lower sample complexity than pivotal reduction for estimating watermark proportions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies estimation of the proportion of text generated by a Gumbel-max watermarked LLM when next-token distributions are unknown nuisance parameters. It contrasts the full-observation regime, where both the pseudorandom vector and chosen token are recorded, against the reduced regime that retains only a scalar pivot drawn from a Uniform-Beta mixture. An event-counting estimator together with a matching lower bound is derived for the full regime, while a Laguerre-polynomial estimator and matching lower bound are derived for the reduced regime. The resulting sample-complexity comparison shows that the prevalent reduction to pivots is strictly less efficient for proportion estimation. A reader cares because mixed human-LLM documents are the realistic case and the number of tokens needed determines whether the estimate is feasible on short texts.

Core claim

Under the Gumbel-max watermark, the minimal number of tokens required to estimate the watermark proportion to fixed accuracy is substantially smaller when the full pseudorandom vector and selected token are observed at each position than when only the one-dimensional pivot is retained; explicit estimators achieve the information-theoretic lower bounds in each regime.

What carries the argument

The distinction between full observation of the Gumbel-max pseudorandom vector and token versus reduction to the scalar Uniform-Beta pivot, together with the event-counting and Laguerre-polynomial estimators derived for each case.

If this is right

The event-counting estimator is optimal for the full-observation regime.
The Laguerre-polynomial estimator is optimal for the pivotal-reduction regime.
Reduction to pivots sacrifices statistical efficiency when the target is a continuous proportion rather than binary detection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Watermark designs intended for proportion estimation could retain vector information instead of discarding it after pivot computation.
The same efficiency comparison may apply to other mixture-proportion problems that involve high-dimensional nuisance parameters.
Short documents may become estimable only under the full-observation regime.

Load-bearing premise

Next-token prediction distributions are unknown and arbitrary but obey a non-degeneracy condition.

What would settle it

An explicit construction or numerical check showing that the sample complexity under pivotal reduction is no larger than under full observation would refute the claimed efficiency gap.

read the original abstract

Watermarking promises statistical traceability of large language model (LLM) uses, but real documents rarely arrive as purely human-written or purely LLM-generated. This motivates a quantitative question beyond detection: what proportion of a document is generated from a pre-specified watermarked LLM? We study this watermark proportion estimation problem under the Gumbel--max watermarking mechanism, treating the next-token prediction distributions as unknown and arbitrary nuisance parameters subject to a non-degeneracy condition. We compare two observation regimes: in the full observation regime, the estimator observes the pseudorandom vector and the selected token at each position; in the more prevalent setting of pivotal reduction, it observes only a scalar pivot, which follows a one-dimensional Uniform--Beta mixture distribution. Under pivotal reduction, we develop a Laguerre-polynomial estimator and establish a matching information-theoretic lower bound for the sample complexity. For full observation, we introduce an event-counting estimator and show a matching lower bound, yielding a substantially smaller sample complexity. As our results imply, although reducing to pivotal statistics is an elegant and prevalent choice, it is not always sample-efficient for estimating the proportion of watermarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows full observation beats pivotal reduction on sample complexity for Gumbel-max watermark proportion estimation, with new estimators and matching bounds under a non-degeneracy condition.

read the letter

The core result is that full observation of the pseudorandom vectors and tokens gives substantially lower sample complexity than the common pivotal reduction for estimating the watermark proportion. They build a Laguerre-polynomial estimator for the pivotal case and an event-counting estimator for full observation, each paired with an information-theoretic lower bound that matches the upper bound.

This is useful concrete progress on the estimation side. The comparison is clean because they treat the next-token distributions as arbitrary nuisances under one non-degeneracy condition, which lets the bounds apply broadly without fixing the LLM output distributions. The abstract states the full-observation regime wins by a clear margin, and the stress-test note finds no internal inconsistency once the bounds are granted.

The main soft spot is that the letter rests entirely on those bounds being correct, and the abstract gives no derivations or assumption checks. A referee would need to verify the polynomial construction and the lower-bound arguments hold without hidden looseness. The non-degeneracy condition itself could be the practical limiter in real LLM settings where token probabilities sometimes get close to degenerate.

This paper is for people already working on statistical watermarking or on mixture proportion estimation with partial observations. A reader who cares about sample-efficiency trade-offs in detection problems will find the comparison worth seeing. It deserves a serious referee because the claims are specific, falsifiable, and address a real gap between the elegant pivotal approach and what the data actually provide.

Referee Report

0 major / 3 minor

Summary. The manuscript studies estimation of the watermark proportion (fraction of tokens generated by a Gumbel-max watermarked LLM) in mixed documents, treating next-token distributions as unknown arbitrary nuisance parameters subject to a non-degeneracy condition. It compares the full-observation regime (pseudorandom vector and chosen token observed at each step) against the pivotal-reduction regime (only the scalar Uniform-Beta mixture pivot observed). For pivotal reduction the authors construct a Laguerre-polynomial estimator and prove a matching information-theoretic lower bound; for full observation they construct an event-counting estimator and likewise prove a matching lower bound, establishing that the full-observation sample complexity is substantially smaller.

Significance. If the stated matching upper and lower bounds hold, the work supplies a complete information-theoretic characterization of sample complexity in both observation models and shows that reduction to the pivotal statistic, while prevalent, is not always sample-efficient. The explicit construction of estimators together with the matching bounds constitutes a clear technical contribution to the statistical analysis of LLM watermarking.

minor comments (3)

The non-degeneracy condition on the next-token distributions is invoked repeatedly but is stated only at a high level in the abstract; a precise formulation (e.g., a lower bound on the minimal probability mass or on the separation of the Gumbel-max logits) should appear in the main text before the first theorem that relies on it.
Notation for the watermark proportion parameter and for the two observation regimes is introduced without a consolidated table or diagram; adding a short notation table would improve readability.
The Laguerre-polynomial estimator is described only at the level of its functional form; an explicit statement of the truncation degree or the coefficient estimation procedure would help readers reproduce the construction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment of its contributions to the information-theoretic analysis of watermark proportion estimation. The referee's summary correctly identifies the key comparison between full-observation and pivotal-reduction regimes, as well as the matching upper and lower bounds we derive in each case.

Circularity Check

0 steps flagged

No significant circularity; theoretical bounds derived independently

full rationale

The paper derives matching upper and lower bounds on sample complexity for watermark proportion estimation under two observation regimes, treating next-token distributions as arbitrary nuisance parameters subject only to a non-degeneracy condition. The central results (Laguerre-polynomial estimator under pivotal reduction and event-counting estimator under full observation) follow from standard information-theoretic arguments and explicit construction of estimators, without any reduction of predictions to fitted parameters, self-definitional loops, or load-bearing self-citations. The comparison between regimes is obtained directly from the derived bounds rather than by construction or renaming. This is a self-contained theoretical analysis with no evident circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only abstract available; main domain assumption identified is the non-degeneracy condition on token distributions.

axioms (1)

domain assumption next-token prediction distributions are unknown and arbitrary nuisance parameters subject to a non-degeneracy condition
Explicitly stated in abstract as the setting for the estimation problem.

pith-pipeline@v0.9.1-grok · 5751 in / 987 out tokens · 35490 ms · 2026-07-03T21:53:58.264100+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 6 canonical work pages · 3 internal anchors

[1]

Watermarking of large language models

Scott Aaronson and H Kirchner. Watermarking of large language models. In Large Language Models and Transformers Workshop at Simons Institute for the Theory of Computing , 2023

2023
[2]

Handbook of mathematical functions, with formulas, graphs, and mathematical tables, 1966

Milton Abramowitz, Irene A Stegun, and Jacques E Romain. Handbook of mathematical functions, with formulas, graphs, and mathematical tables, 1966

1966
[3]

Minimax estimation of linear and quadratic functionals on sparsity classes

Olivier Collier, La \"e titia Comminges, and Alexandre B Tsybakov. Minimax estimation of linear and quadratic functionals on sparsity classes. The Annals of Statistics , pages 923--958, 2017

2017
[4]

On estimation of nonsmooth functionals of sparse normal means

Olivier Collier, La \"e titia Comminges, and Alexandre B Tsybakov. On estimation of nonsmooth functionals of sparse normal means. Bernoulli , 2020

2020
[5]

Undetectable watermarks for language models

Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In The Thirty Seventh Annual Conference on Learning Theory , pages 1125--1139. PMLR, 2024

2024
[6]

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

T Tony Cai and Jiashun Jin. Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing. The Annals of Statistics , 38(304):100--145, 2010

2010
[7]

Optimal detection for language watermarks with pseudorandom collision

T Tony Cai, Xiang Li, Qi Long, Weijie J Su, and Garrett G Wen. Optimal detection for language watermarks with pseudorandom collision. arXiv preprint arXiv:2510.22007 , 2025

work page arXiv 2025
[8]

Sharp regret-Hellinger bounds for Gaussian empirical Bayes via polynomial approximation

Jiafeng Chen and Yihong Wu. Sharp regret- Hellinger bounds for Gaussian empirical Bayes via polynomial approximation. arXiv preprint arXiv:2605.02070 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[9]

Sub- Gaussian mean estimators

Luc Devroye, Matthieu Lerasle, Gabor Lugosi, and Roberto I Oliveira. Sub- Gaussian mean estimators. The Annals of Statistics , pages 2695--2725, 2016

2016
[10]

Optimal estimation of high-dimensional Gaussian location mixtures

Natalie Doss, Yihong Wu, Pengkun Yang, and Harrison H Zhou. Optimal estimation of high-dimensional Gaussian location mixtures. The Annals of Statistics , 51(1):62--95, 2023

2023
[11]

Statistical theory of extreme values and some practical applications: a series of lectures , volume 33

Emil Julius Gumbel. Statistical theory of extreme values and some practical applications: a series of lectures , volume 33. US Government Printing Office, 1954

1954
[12]

Unbiased watermark for large language models

Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. Unbiased watermark for large language models. In International Conference on Learning Representations , volume 2024, pages 45408--45436, 2024

2024
[13]

Minimax rate-optimal estimation of KL divergence between discrete distributions

Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of KL divergence between discrete distributions. In 2016 International Symposium on Information Theory and Its Applications (ISITA) , pages 256--260. IEEE, 2016

2016
[14]

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. In Conference On Learning Theory , pages 3189--3221. PMLR, 2018

2018
[15]

Optimal prediction of the number of unseen species with multiplicity

Yi Hao and Ping Li. Optimal prediction of the number of unseen species with multiplicity. Advances in Neural Information Processing Systems , 33:8553--8564, 2020

2020
[16]

On the empirical power of goodness-of-fit tests in watermark detection

Weiqing He, Xiang Li, Tianqi Shang, Li Shen, Weijie Su, and Qi Long. On the empirical power of goodness-of-fit tests in watermark detection. Advances in neural information processing systems , 38:18761--18793, 2026

2026
[17]

Data amplification: Instance-optimal property estimation

Yi Hao and Alon Orlitsky. Data amplification: Instance-optimal property estimation. In International Conference on Machine Learning , pages 4049--4059. PMLR, 2020

2020
[18]

Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures

Joonhyuk Jung and Chao Gao. Sharp inequalities between total variation and Hellinger distances for Gaussian mixtures. arXiv preprint arXiv:2602.03202 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[19]

Categorical reparameterization with Gumbel -softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel -softmax. In International Conference on Learning Representations , 2017

2017
[20]

Minimax estimation of functionals of discrete distributions

Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory , 61(5):2835--2885, 2015

2015
[21]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In International conference on machine learning , pages 17061--17084. PMLR, 2023

2023
[22]

Robust distortion-free watermarks for language models

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. Transactions on Machine Learning Research , 2024

2024
[23]

Refined detection for Gumbel watermarking

Tor Lattimore. Refined detection for Gumbel watermarking. arXiv preprint arXiv:2603.30017 , 2026

work page arXiv 2026
[24]

On a problem of adaptive estimation in Gaussian white noise

OV Lepskii. On a problem of adaptive estimation in Gaussian white noise. Theory of Probability & Its Applications , 35(3):454--466, 1991

1991
[25]

Asymptotically minimax adaptive estimation

OV Lepskii. Asymptotically minimax adaptive estimation. I : Upper bounds. optimally adaptive estimates. Theory of Probability & Its Applications , 36(4):682--697, 1992

1992
[26]

A likelihood based approach for watermark detection

Xingchi Li, Guanxun Li, and Xianyang Zhang. A likelihood based approach for watermark detection. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics , volume 258, pages 1675--1683, 2025

2025
[27]

Mean estimation and regression under heavy-tailed distributions: A survey

G \'a bor Lugosi and Shahar Mendelson. Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics , 19(5):1145--1190, 2019

2019
[28]

A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J Su. A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules. The Annals of Statistics , 53(1):322--351, 2025

2025
[29]

Robust detection of watermarks for large language models under human edits

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J Su. Robust detection of watermarks for large language models under human edits. Journal of the Royal Statistical Society Series B: Statistical Methodology , 88(2):491--515, 2026

2026
[30]

Optimal estimation of watermark proportions in hybrid AI -human texts

Xiang Li, Garrett Wen, Weiqing He, Jiayuan Wu, Qi Long, and Weijie J Su. Optimal estimation of watermark proportions in hybrid AI -human texts. arXiv preprint arXiv:2506.22343 , 2025

work page arXiv 2025
[31]

Optimal estimation of simultaneous signals using absolute inner product with applications to integrative genomics

Rong Ma, T Tony Cai, and Hongzhe Li. Optimal estimation of simultaneous signals using absolute inner product with applications to integrative genomics. Statistica Sinica , 32(2):1027--1048, 2022

2022
[32]

A* sampling

Chris J Maddison, Daniel Tarlow, and Tom Minka. A* sampling. Advances in neural information processing systems , 27, 2014

2014
[33]

On the best approximation by finite Gaussian mixtures

Yun Ma, Yihong Wu, and Pengkun Yang. On the best approximation by finite Gaussian mixtures. IEEE Transactions on Information Theory , 2025

2025
[34]

Optimal prediction of the number of unseen species

Alon Orlitsky, Ananda Theertha Suresh, and Yihong Wu. Optimal prediction of the number of unseen species. Proceedings of the National Academy of Sciences , 113(47):13283--13288, 2016

2016
[35]

Dualizing Le Cam ’s method for functional estimation I : General theory

Yury Polyanskiy and Yihong Wu. Dualizing Le Cam ’s method for functional estimation I : General theory. The Annals of Statistics , 54(1):1--24, 2026

2026
[36]

Complex analysis , volume 2

Elias M Stein and Rami Shakarchi. Complex analysis , volume 2. Princeton University Press, 2010

2010
[37]

Adaptive Confidence Intervals in Efron's Gaussian Two-Groups Model

Qiaosen Wang, Shuwen Chai, and Chao Gao. Adaptive confidence intervals in Efron 's Gaussian two-groups model. arXiv preprint arXiv:2604.26992 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[38]

Minimax rates of entropy estimation on large alphabets via best polynomial approximation

Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory , 62(6):3702--3720, 2016

2016
[39]

Chebyshev polynomials, moment matching, and optimal estimation of the unseen

Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics , 47(2):857--883, 2019

2019
[40]

Optimal estimation of Gaussian mixtures via denoised method of moments

Yihong Wu and Pengkun Yang. Optimal estimation of Gaussian mixtures via denoised method of moments. Annals of Statistics , 48(4), 2020

2020
[41]

Polynomial methods in statistical inference: Theory and practice

Yihong Wu and Pengkun Yang. Polynomial methods in statistical inference: Theory and practice. Foundations and Trends in Communications and Information Theory , 17(4):402--585, 2020

2020
[42]

Provable robust watermarking for AI -generated text

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI -generated text. In B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, editors, International Conference on Learning Representations , volume 2024, pages 43738--43772, 2024

2024

[1] [1]

Watermarking of large language models

Scott Aaronson and H Kirchner. Watermarking of large language models. In Large Language Models and Transformers Workshop at Simons Institute for the Theory of Computing , 2023

2023

[2] [2]

Handbook of mathematical functions, with formulas, graphs, and mathematical tables, 1966

Milton Abramowitz, Irene A Stegun, and Jacques E Romain. Handbook of mathematical functions, with formulas, graphs, and mathematical tables, 1966

1966

[3] [3]

Minimax estimation of linear and quadratic functionals on sparsity classes

Olivier Collier, La \"e titia Comminges, and Alexandre B Tsybakov. Minimax estimation of linear and quadratic functionals on sparsity classes. The Annals of Statistics , pages 923--958, 2017

2017

[4] [4]

On estimation of nonsmooth functionals of sparse normal means

Olivier Collier, La \"e titia Comminges, and Alexandre B Tsybakov. On estimation of nonsmooth functionals of sparse normal means. Bernoulli , 2020

2020

[5] [5]

Undetectable watermarks for language models

Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In The Thirty Seventh Annual Conference on Learning Theory , pages 1125--1139. PMLR, 2024

2024

[6] [6]

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

T Tony Cai and Jiashun Jin. Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing. The Annals of Statistics , 38(304):100--145, 2010

2010

[7] [7]

Optimal detection for language watermarks with pseudorandom collision

T Tony Cai, Xiang Li, Qi Long, Weijie J Su, and Garrett G Wen. Optimal detection for language watermarks with pseudorandom collision. arXiv preprint arXiv:2510.22007 , 2025

work page arXiv 2025

[8] [8]

Sharp regret-Hellinger bounds for Gaussian empirical Bayes via polynomial approximation

Jiafeng Chen and Yihong Wu. Sharp regret- Hellinger bounds for Gaussian empirical Bayes via polynomial approximation. arXiv preprint arXiv:2605.02070 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[9] [9]

Sub- Gaussian mean estimators

Luc Devroye, Matthieu Lerasle, Gabor Lugosi, and Roberto I Oliveira. Sub- Gaussian mean estimators. The Annals of Statistics , pages 2695--2725, 2016

2016

[10] [10]

Optimal estimation of high-dimensional Gaussian location mixtures

Natalie Doss, Yihong Wu, Pengkun Yang, and Harrison H Zhou. Optimal estimation of high-dimensional Gaussian location mixtures. The Annals of Statistics , 51(1):62--95, 2023

2023

[11] [11]

Statistical theory of extreme values and some practical applications: a series of lectures , volume 33

Emil Julius Gumbel. Statistical theory of extreme values and some practical applications: a series of lectures , volume 33. US Government Printing Office, 1954

1954

[12] [12]

Unbiased watermark for large language models

Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. Unbiased watermark for large language models. In International Conference on Learning Representations , volume 2024, pages 45408--45436, 2024

2024

[13] [13]

Minimax rate-optimal estimation of KL divergence between discrete distributions

Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of KL divergence between discrete distributions. In 2016 International Symposium on Information Theory and Its Applications (ISITA) , pages 256--260. IEEE, 2016

2016

[14] [14]

Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. In Conference On Learning Theory , pages 3189--3221. PMLR, 2018

2018

[15] [15]

Optimal prediction of the number of unseen species with multiplicity

Yi Hao and Ping Li. Optimal prediction of the number of unseen species with multiplicity. Advances in Neural Information Processing Systems , 33:8553--8564, 2020

2020

[16] [16]

On the empirical power of goodness-of-fit tests in watermark detection

Weiqing He, Xiang Li, Tianqi Shang, Li Shen, Weijie Su, and Qi Long. On the empirical power of goodness-of-fit tests in watermark detection. Advances in neural information processing systems , 38:18761--18793, 2026

2026

[17] [17]

Data amplification: Instance-optimal property estimation

Yi Hao and Alon Orlitsky. Data amplification: Instance-optimal property estimation. In International Conference on Machine Learning , pages 4049--4059. PMLR, 2020

2020

[18] [18]

Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures

Joonhyuk Jung and Chao Gao. Sharp inequalities between total variation and Hellinger distances for Gaussian mixtures. arXiv preprint arXiv:2602.03202 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[19] [19]

Categorical reparameterization with Gumbel -softmax

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel -softmax. In International Conference on Learning Representations , 2017

2017

[20] [20]

Minimax estimation of functionals of discrete distributions

Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory , 61(5):2835--2885, 2015

2015

[21] [21]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In International conference on machine learning , pages 17061--17084. PMLR, 2023

2023

[22] [22]

Robust distortion-free watermarks for language models

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. Transactions on Machine Learning Research , 2024

2024

[23] [23]

Refined detection for Gumbel watermarking

Tor Lattimore. Refined detection for Gumbel watermarking. arXiv preprint arXiv:2603.30017 , 2026

work page arXiv 2026

[24] [24]

On a problem of adaptive estimation in Gaussian white noise

OV Lepskii. On a problem of adaptive estimation in Gaussian white noise. Theory of Probability & Its Applications , 35(3):454--466, 1991

1991

[25] [25]

Asymptotically minimax adaptive estimation

OV Lepskii. Asymptotically minimax adaptive estimation. I : Upper bounds. optimally adaptive estimates. Theory of Probability & Its Applications , 36(4):682--697, 1992

1992

[26] [26]

A likelihood based approach for watermark detection

Xingchi Li, Guanxun Li, and Xianyang Zhang. A likelihood based approach for watermark detection. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics , volume 258, pages 1675--1683, 2025

2025

[27] [27]

Mean estimation and regression under heavy-tailed distributions: A survey

G \'a bor Lugosi and Shahar Mendelson. Mean estimation and regression under heavy-tailed distributions: A survey. Foundations of Computational Mathematics , 19(5):1145--1190, 2019

2019

[28] [28]

A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J Su. A statistical framework of watermarks for large language models: Pivot, detection efficiency and optimal rules. The Annals of Statistics , 53(1):322--351, 2025

2025

[29] [29]

Robust detection of watermarks for large language models under human edits

Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J Su. Robust detection of watermarks for large language models under human edits. Journal of the Royal Statistical Society Series B: Statistical Methodology , 88(2):491--515, 2026

2026

[30] [30]

Optimal estimation of watermark proportions in hybrid AI -human texts

Xiang Li, Garrett Wen, Weiqing He, Jiayuan Wu, Qi Long, and Weijie J Su. Optimal estimation of watermark proportions in hybrid AI -human texts. arXiv preprint arXiv:2506.22343 , 2025

work page arXiv 2025

[31] [31]

Optimal estimation of simultaneous signals using absolute inner product with applications to integrative genomics

Rong Ma, T Tony Cai, and Hongzhe Li. Optimal estimation of simultaneous signals using absolute inner product with applications to integrative genomics. Statistica Sinica , 32(2):1027--1048, 2022

2022

[32] [32]

A* sampling

Chris J Maddison, Daniel Tarlow, and Tom Minka. A* sampling. Advances in neural information processing systems , 27, 2014

2014

[33] [33]

On the best approximation by finite Gaussian mixtures

Yun Ma, Yihong Wu, and Pengkun Yang. On the best approximation by finite Gaussian mixtures. IEEE Transactions on Information Theory , 2025

2025

[34] [34]

Optimal prediction of the number of unseen species

Alon Orlitsky, Ananda Theertha Suresh, and Yihong Wu. Optimal prediction of the number of unseen species. Proceedings of the National Academy of Sciences , 113(47):13283--13288, 2016

2016

[35] [35]

Dualizing Le Cam ’s method for functional estimation I : General theory

Yury Polyanskiy and Yihong Wu. Dualizing Le Cam ’s method for functional estimation I : General theory. The Annals of Statistics , 54(1):1--24, 2026

2026

[36] [36]

Complex analysis , volume 2

Elias M Stein and Rami Shakarchi. Complex analysis , volume 2. Princeton University Press, 2010

2010

[37] [37]

Adaptive Confidence Intervals in Efron's Gaussian Two-Groups Model

Qiaosen Wang, Shuwen Chai, and Chao Gao. Adaptive confidence intervals in Efron 's Gaussian two-groups model. arXiv preprint arXiv:2604.26992 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[38] [38]

Minimax rates of entropy estimation on large alphabets via best polynomial approximation

Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory , 62(6):3702--3720, 2016

2016

[39] [39]

Chebyshev polynomials, moment matching, and optimal estimation of the unseen

Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics , 47(2):857--883, 2019

2019

[40] [40]

Optimal estimation of Gaussian mixtures via denoised method of moments

Yihong Wu and Pengkun Yang. Optimal estimation of Gaussian mixtures via denoised method of moments. Annals of Statistics , 48(4), 2020

2020

[41] [41]

Polynomial methods in statistical inference: Theory and practice

Yihong Wu and Pengkun Yang. Polynomial methods in statistical inference: Theory and practice. Foundations and Trends in Communications and Information Theory , 17(4):402--585, 2020

2020

[42] [42]

Provable robust watermarking for AI -generated text

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI -generated text. In B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, editors, International Conference on Learning Representations , volume 2024, pages 43738--43772, 2024

2024