Growing Alphabets in Canonical Shuffle Experiments: Likelihood-Ratio Laws, Estimation Bounds, and Low-Budget Equivariant Design

Alex Shvets

arxiv: 2603.18080 · v2 · submitted 2026-03-18 · 💻 cs.IT · math.IT

Growing Alphabets in Canonical Shuffle Experiments: Likelihood-Ratio Laws, Estimation Bounds, and Low-Budget Equivariant Design

Alex Shvets This is my paper

Pith reviewed 2026-05-15 09:05 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords shuffle modellocal differential privacyalphabet growthlikelihood ratio lawsfrequency estimationequivariant designchi-squared divergenceAssouad bounds

0 comments

The pith

Alphabet growth improves canonical shuffled privacy iff the worst pairwise law collapses to delta_1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how growing the output alphabet affects privacy in canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels, with analysis under a pairwise chi-squared budget for frequency estimation. It shows that privacy strengthens with larger alphabets precisely when the worst pairwise likelihood-ratio law nu_ab,d collapses to a point mass at 1, making the shuffled histogram experiment equivalent to a quotient multinomial model. Sharp endpoint principles, saturating obstruction families, and a diluting versus persistent dichotomy with hockey-stick bounds are derived to characterize the behavior. These findings clarify mechanism design choices, including that augmented GRR is optimal among permutation-equivariant channels in low-budget regimes while calibrated GRR is not.

Core claim

The pairwise likelihood-ratio law nu_ab,d, the pushforward of the row ratio under the null row, is the governing invariant that renders the canonical shuffled histogram experiment equivalent to the quotient multinomial experiment. Alphabet growth improves canonical shuffled privacy if and only if the worst pairwise law collapses to delta_1. A sharp pure-LDP endpoint principle holds for the pairwise chi-squared, full-support obstruction families saturate it, and a diluting/persistent dichotomy is established with explicit finite-n hockey-stick bounds. The worst-case pairwise budget chi_*(W) governs a two-regime Assouad lower bound for estimators, symmetrization reduces the uniform-pointFisher

What carries the argument

The pairwise likelihood-ratio law nu_ab,d (pushforward of the row ratio under the null row), which generates the equivalent quotient multinomial experiment and determines whether alphabet growth improves privacy.

If this is right

Privacy improves with alphabet growth exactly when the worst pairwise law collapses to delta_1.
The worst-case pairwise budget chi_*(W) yields sharp two-regime Assouad lower bounds on estimation error.
Explicit finite-n hockey-stick bounds separate diluting and persistent privacy behaviors.
Augmented GRR is optimal among permutation-equivariant channels in the low-budget regime.
Symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channel design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Mechanism designers could deliberately choose alphabets that enforce the collapse condition to strengthen privacy in deployed shuffle systems.
The obstruction families provide explicit worst-case constructions that may transfer to related models such as approximate LDP or multi-step shuffles.
The diluting/persistent dichotomy offers a practical test for selecting output alphabets in finite-sample frequency estimation tasks.
These invariants could inform budget allocation when combining shuffle privacy with other mechanisms like aggregation.

Load-bearing premise

The setup assumes canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels along growing alphabets, governed by the pairwise chi-squared budget.

What would settle it

A concrete channel family where alphabet growth improves the privacy parameters even though the worst pairwise likelihood-ratio law fails to collapse to delta_1, or a numerical check showing violation of the hockey-stick bounds for small n and specific d.

read the original abstract

We study canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels along growing alphabets, with frequency estimation and mechanism design under a pairwise chi-squared budget. The pairwise likelihood-ratio law nu_{ab,d} (pushforward of the row ratio under the null row) is the governing invariant: the canonical shuffled histogram experiment is exactly equivalent to the quotient multinomial experiment generated by nu_{ab,d}. Alphabet growth improves canonical shuffled privacy iff the worst pairwise law collapses to delta_1. We prove a sharp pure-LDP endpoint principle for the pairwise chi-squared, construct full-support obstruction families saturating it, and establish a diluting/persistent dichotomy with explicit finite-n hockey-stick bounds. The worst-case pairwise budget chi_*(W) governs a two-regime Assouad lower bound for arbitrary estimators in the i.i.d. multinomial model. Symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channels. Calibrated GRR is not optimal; in the low-budget regime, augmented GRR is optimal among permutation-equivariant channels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper pins down a likelihood-ratio invariant that equates the canonical shuffle histogram to a quotient multinomial and yields a sharp endpoint plus optimality for augmented GRR in the low-budget regime.

read the letter

The main new piece is the reduction of the canonical one-step neighboring shuffle experiment to the quotient multinomial generated by the pairwise law nu_ab,d. Once that equivalence is in hand, the paper shows that alphabet growth improves privacy exactly when the worst nu collapses to delta_1, proves a sharp pure-LDP endpoint for the pairwise chi-squared budget, and splits the behavior into diluting and persistent regimes with explicit finite-n hockey-stick bounds. The optimality of augmented GRR among permutation-equivariant channels in the low-budget case follows from the symmetrized Fisher criterion and is a clear step beyond standard GRR. The two-regime Assouad lower bound for estimators is a direct consequence and looks usable for mechanism design under a chi-star budget. These statements are more precise than the usual asymptotic shuffle results in the cited literature. The obstruction families that saturate the endpoint are a good way to establish sharpness without extra parameters. The soft spot is uniformity in d. The stress-test note is on target: if the full-support families have d-dependent tails or if the canonical neighboring relation changes measure class with alphabet size, the iff claim and the dichotomy may not hold uniformly for large d. The paper needs to state explicit support control or stability conditions to close this. The derivations otherwise look formally grounded from the abstract. This is for people already working on shuffle-model LDP and frequency estimation who want explicit bounds rather than high-level comparisons. A reader who knows the multinomial experiment and equivariant channels will get the most out of it. It deserves a serious referee because the central claims are checkable and the technical core is self-contained.

Referee Report

2 major / 1 minor

Summary. The manuscript examines canonical one-step neighboring shuffle experiments for finite-output ε₀-LDP d-ary channels with growing alphabets, under a pairwise chi-squared budget for frequency estimation and mechanism design. The governing invariant is the pairwise likelihood-ratio law ν_{ab,d}, under which the canonical shuffled histogram experiment is exactly equivalent to the quotient multinomial experiment generated by ν_{ab,d}. The central claim is that alphabet growth improves canonical shuffled privacy if and only if the worst pairwise law collapses to δ₁. The paper proves a sharp pure-LDP endpoint principle for the pairwise chi-squared, constructs full-support obstruction families that saturate it, and establishes a diluting/persistent dichotomy with explicit finite-n hockey-stick bounds. The worst-case budget χ_*(W) governs a two-regime Assouad lower bound; symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channels; and augmented GRR is shown to be optimal among such channels in the low-budget regime.

Significance. If the central derivations hold, the work supplies a sharp, invariant-based characterization of privacy-utility tradeoffs in shuffle models as output alphabets grow, which is directly relevant to local DP mechanism design and frequency estimation. The explicit construction of saturating obstruction families, the parameter-free endpoint principle, and the finite-n hockey-stick bounds are concrete strengths. The reduction to permutation-equivariant channels and the identification of augmented GRR optimality in the low-budget regime offer actionable design guidance.

major comments (2)

[Equivalence statement (abstract and the section establishing the governing invariant)] The equivalence between the canonical shuffled histogram and the quotient multinomial generated by ν_{ab,d} is asserted to be exact, yet the manuscript must verify uniformity in d. Without explicit tail control on the full-support obstruction families, the pushforward ν_{ab,d} may acquire d-dependent support or tails, which would undermine the sharpness of the iff claim that alphabet growth improves privacy precisely when the worst law collapses to δ₁ and would weaken the diluting/persistent dichotomy for large d.
[Endpoint principle and obstruction families (abstract and the section containing the pure-LDP endpoint)] The sharp pure-LDP endpoint principle for the pairwise chi-squared and the construction of saturating obstruction families presuppose that the canonical one-step neighboring relation preserves measure class as d grows. If the neighboring relation changes measure class with alphabet size, the endpoint saturation and the subsequent Assouad lower bound governed by χ_*(W) lose uniformity; the manuscript should supply an explicit uniformity argument or counterexample family.

minor comments (1)

[Notation and definitions] Clarify at first use whether χ_*(W) is defined as the supremum over all pairs or only the worst pair; the current notation risks ambiguity when the budget is invoked for the two-regime Assouad bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for explicit uniformity arguments with respect to alphabet size d. We address each major comment below and will incorporate the requested clarifications and supporting lemmas in the revised manuscript.

read point-by-point responses

Referee: [Equivalence statement (abstract and the section establishing the governing invariant)] The equivalence between the canonical shuffled histogram and the quotient multinomial generated by ν_{ab,d} is asserted to be exact, yet the manuscript must verify uniformity in d. Without explicit tail control on the full-support obstruction families, the pushforward ν_{ab,d} may acquire d-dependent support or tails, which would undermine the sharpness of the iff claim that alphabet growth improves privacy precisely when the worst law collapses to δ₁ and would weaken the diluting/persistent dichotomy for large d.

Authors: The equivalence is exact for each fixed d by construction of the pushforward of the row ratio under the null row, which maps the canonical shuffled histogram directly onto the quotient multinomial generated by ν_{ab,d}. To address uniformity, the revision will add a dedicated lemma establishing that the full-support obstruction families admit tail bounds controlled uniformly by the pairwise chi-squared budget; this bound is independent of d and ensures that support remains full and tails do not introduce d-dependent artifacts. Consequently the iff characterization and the diluting/persistent dichotomy remain sharp for all d. revision: yes
Referee: [Endpoint principle and obstruction families (abstract and the section containing the pure-LDP endpoint)] The sharp pure-LDP endpoint principle for the pairwise chi-squared and the construction of saturating obstruction families presuppose that the canonical one-step neighboring relation preserves measure class as d grows. If the neighboring relation changes measure class with alphabet size, the endpoint saturation and the subsequent Assouad lower bound governed by χ_*(W) lose uniformity; the manuscript should supply an explicit uniformity argument or counterexample family.

Authors: The full-support obstruction families are constructed so that every row places strictly positive mass on the entire alphabet for every d; the one-step neighboring relation therefore preserves measure class independently of d. The revision will insert a short uniformity remark immediately after the endpoint principle, showing that the pure-LDP saturation and the subsequent χ_*(W)-governed Assouad bound hold uniformly because the chi-squared budget controls the likelihood ratios uniformly across d. No counterexample family is needed under this construction. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no load-bearing reductions to inputs

full rationale

The paper defines the canonical one-step neighboring shuffle model for ε₀-LDP d-ary channels and introduces the pairwise likelihood-ratio law ν_{ab,d} as the pushforward invariant under the null row. It states the equivalence of the shuffled histogram experiment to the quotient multinomial generated by ν_{ab,d} as a direct modeling fact, then derives the iff condition on alphabet growth (worst law collapsing to δ₁), the pure-LDP endpoint for pairwise chi-squared, obstruction families, and finite-n hockey-stick bounds as consequences. No equations reduce a claimed prediction or result to a fitted parameter by construction, no self-citations are used for uniqueness or ansatz justification, and no known empirical pattern is merely renamed. The central claims rest on explicit constructions and bounds that do not presuppose their own conclusions, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard definitions of epsilon_0-LDP and multinomial models plus the canonical shuffle construction; no explicit free parameters or invented entities are stated in the abstract, but the pairwise chi-squared budget functions as a derived quantity that may embed fitting choices.

axioms (2)

domain assumption Canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels
Invoked throughout the abstract as the experimental model.
domain assumption Pairwise chi-squared budget governs the two-regime Assouad lower bound
Used to derive estimation bounds in the i.i.d. multinomial model.

pith-pipeline@v0.9.0 · 5492 in / 1362 out tokens · 44133 ms · 2026-05-15T09:05:32.310404+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel; dAlembert_to_ODE_general matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

Universal extremal χ² bound: χ²(Wd(·|b)∥Wd(·|a)) ≤ (e^ε0−1)²/e^ε0, sharp iff the likelihood ratio is two-point at endpoints
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff; J_uniquely_calibrated_via_higher_derivative echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

LR-quotient compression: full histogram experiment equivalent to multinomial on level sets of w_ab,d; invariant is pushforward ν_ab,d on [e^{-ε0},e^ε0]
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection; RCLCombiner_isCoupling_iff matches

?

matches
MATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.

Diluting/persistent dichotomy: family diluting iff worst ν_adbd,d ⇒ δ1 (equivalently I*→0); persistent otherwise, no extra d-gain

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

Balle, J

B. Balle, J. Bell, A. Gascón, and K. Nissim,The Privacy Blanket of the Shuffle Model, inAdvances in Cryptology—CRYPTO 2019, Lecture Notes in Comput. Sci.11692, Springer, 2019, pp. 638–667

work page 2019
[2]

Bhatia and C

R. Bhatia and C. Davis,A Better Bound on the Variance, Amer. Math. Monthly107 (2000), no. 4, 353–357

work page 2000
[3]

Bittau, Ú

A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnes, and B. Seefeld,PROCHLO: Strong Privacy for Analytics in the Crowd, inProceedings of the 26th Symposium on Operating Systems Principles, ACM, 2017, pp. 441–459

work page 2017
[4]

A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev,Distributed Differential Privacy via Shuffling, inAdvances in Cryptology—EUROCRYPT 2019, Lecture Notes in Comput. Sci.11476, Springer, 2019, pp. 375–403

work page 2019
[5]

Erlingsson, V

Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta, Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity, inProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2019, pp. 2468–2479

work page 2019
[6]

Feldman, A

V. Feldman, A. McMillan, and K. Talwar,Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling, in62nd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2021), IEEE, 2022, pp. 954–964

work page 2021
[7]

Feldman, A

V. Feldman, A. McMillan, and K. Talwar,Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy, inProceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2023, pp. 4966–4981. 40 ALEX SHVETS

work page 2023
[8]

Feller,An Introduction to Probability Theory and Its Applications, Vol

W. Feller,An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed., Wiley, New York, 1971

work page 1971
[9]

Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986

L. Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986

work page 1986
[10]

A. W. Marshall, I. Olkin, and B. C. Arnold,Inequalities: Theory of Majorization and Its Applications, 2nd ed., Springer, New York, 2011

work page 2011
[11]

V. V. Petrov,Sums of Independent Random Variables, Springer, Berlin, 1975

work page 1975
[12]

Shvets,Universal Shuffle Asymptotics: Sharp Privacy Analysis in the Gaussian Regime, arXiv:2602.09029, 2026

A. Shvets,Universal Shuffle Asymptotics: Sharp Privacy Analysis in the Gaussian Regime, arXiv:2602.09029, 2026

work page arXiv 2026
[13]

Shvets,Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy—Poisson, Skellam, and Compound-Poisson Regimes, arXiv:2603.10073, 2026

A. Shvets,Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy—Poisson, Skellam, and Compound-Poisson Regimes, arXiv:2603.10073, 2026

work page arXiv 2026
[14]

A.Shvets,Universal Shuffle Asymptotics, Part III: Dominant-Block Quotient Geometry and Hybrid Gaussian–Compound-Poisson Limits in Finite-Alphabet Shuffle Privacy, arXiv:2603.13407, 2026

work page arXiv 2026
[15]

Takagi and C

S. Takagi and C. Liew,Analysis of Shuffling Beyond Pure Local Differential Privacy, arXiv:2601.19154, 2026; accepted to PODS 2026. Independent Researcher, Haifa, Israel Email address:alt178332@gmail.com

work page arXiv 2026

[1] [1]

Balle, J

B. Balle, J. Bell, A. Gascón, and K. Nissim,The Privacy Blanket of the Shuffle Model, inAdvances in Cryptology—CRYPTO 2019, Lecture Notes in Comput. Sci.11692, Springer, 2019, pp. 638–667

work page 2019

[2] [2]

Bhatia and C

R. Bhatia and C. Davis,A Better Bound on the Variance, Amer. Math. Monthly107 (2000), no. 4, 353–357

work page 2000

[3] [3]

Bittau, Ú

A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnes, and B. Seefeld,PROCHLO: Strong Privacy for Analytics in the Crowd, inProceedings of the 26th Symposium on Operating Systems Principles, ACM, 2017, pp. 441–459

work page 2017

[4] [4]

A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev,Distributed Differential Privacy via Shuffling, inAdvances in Cryptology—EUROCRYPT 2019, Lecture Notes in Comput. Sci.11476, Springer, 2019, pp. 375–403

work page 2019

[5] [5]

Erlingsson, V

Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta, Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity, inProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2019, pp. 2468–2479

work page 2019

[6] [6]

Feldman, A

V. Feldman, A. McMillan, and K. Talwar,Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling, in62nd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2021), IEEE, 2022, pp. 954–964

work page 2021

[7] [7]

Feldman, A

V. Feldman, A. McMillan, and K. Talwar,Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy, inProceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2023, pp. 4966–4981. 40 ALEX SHVETS

work page 2023

[8] [8]

Feller,An Introduction to Probability Theory and Its Applications, Vol

W. Feller,An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed., Wiley, New York, 1971

work page 1971

[9] [9]

Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986

L. Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986

work page 1986

[10] [10]

A. W. Marshall, I. Olkin, and B. C. Arnold,Inequalities: Theory of Majorization and Its Applications, 2nd ed., Springer, New York, 2011

work page 2011

[11] [11]

V. V. Petrov,Sums of Independent Random Variables, Springer, Berlin, 1975

work page 1975

[12] [12]

Shvets,Universal Shuffle Asymptotics: Sharp Privacy Analysis in the Gaussian Regime, arXiv:2602.09029, 2026

A. Shvets,Universal Shuffle Asymptotics: Sharp Privacy Analysis in the Gaussian Regime, arXiv:2602.09029, 2026

work page arXiv 2026

[13] [13]

Shvets,Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy—Poisson, Skellam, and Compound-Poisson Regimes, arXiv:2603.10073, 2026

A. Shvets,Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy—Poisson, Skellam, and Compound-Poisson Regimes, arXiv:2603.10073, 2026

work page arXiv 2026

[14] [14]

A.Shvets,Universal Shuffle Asymptotics, Part III: Dominant-Block Quotient Geometry and Hybrid Gaussian–Compound-Poisson Limits in Finite-Alphabet Shuffle Privacy, arXiv:2603.13407, 2026

work page arXiv 2026

[15] [15]

Takagi and C

S. Takagi and C. Liew,Analysis of Shuffling Beyond Pure Local Differential Privacy, arXiv:2601.19154, 2026; accepted to PODS 2026. Independent Researcher, Haifa, Israel Email address:alt178332@gmail.com

work page arXiv 2026