Growing Alphabets in Canonical Shuffle Experiments: Likelihood-Ratio Laws, Estimation Bounds, and Low-Budget Equivariant Design
Pith reviewed 2026-05-15 09:05 UTC · model grok-4.3
The pith
Alphabet growth improves canonical shuffled privacy iff the worst pairwise law collapses to delta_1.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The pairwise likelihood-ratio law nu_ab,d, the pushforward of the row ratio under the null row, is the governing invariant that renders the canonical shuffled histogram experiment equivalent to the quotient multinomial experiment. Alphabet growth improves canonical shuffled privacy if and only if the worst pairwise law collapses to delta_1. A sharp pure-LDP endpoint principle holds for the pairwise chi-squared, full-support obstruction families saturate it, and a diluting/persistent dichotomy is established with explicit finite-n hockey-stick bounds. The worst-case pairwise budget chi_*(W) governs a two-regime Assouad lower bound for estimators, symmetrization reduces the uniform-pointFisher
What carries the argument
The pairwise likelihood-ratio law nu_ab,d (pushforward of the row ratio under the null row), which generates the equivalent quotient multinomial experiment and determines whether alphabet growth improves privacy.
If this is right
- Privacy improves with alphabet growth exactly when the worst pairwise law collapses to delta_1.
- The worst-case pairwise budget chi_*(W) yields sharp two-regime Assouad lower bounds on estimation error.
- Explicit finite-n hockey-stick bounds separate diluting and persistent privacy behaviors.
- Augmented GRR is optimal among permutation-equivariant channels in the low-budget regime.
- Symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channel design.
Where Pith is reading between the lines
- Mechanism designers could deliberately choose alphabets that enforce the collapse condition to strengthen privacy in deployed shuffle systems.
- The obstruction families provide explicit worst-case constructions that may transfer to related models such as approximate LDP or multi-step shuffles.
- The diluting/persistent dichotomy offers a practical test for selecting output alphabets in finite-sample frequency estimation tasks.
- These invariants could inform budget allocation when combining shuffle privacy with other mechanisms like aggregation.
Load-bearing premise
The setup assumes canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels along growing alphabets, governed by the pairwise chi-squared budget.
What would settle it
A concrete channel family where alphabet growth improves the privacy parameters even though the worst pairwise likelihood-ratio law fails to collapse to delta_1, or a numerical check showing violation of the hockey-stick bounds for small n and specific d.
read the original abstract
We study canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels along growing alphabets, with frequency estimation and mechanism design under a pairwise chi-squared budget. The pairwise likelihood-ratio law nu_{ab,d} (pushforward of the row ratio under the null row) is the governing invariant: the canonical shuffled histogram experiment is exactly equivalent to the quotient multinomial experiment generated by nu_{ab,d}. Alphabet growth improves canonical shuffled privacy iff the worst pairwise law collapses to delta_1. We prove a sharp pure-LDP endpoint principle for the pairwise chi-squared, construct full-support obstruction families saturating it, and establish a diluting/persistent dichotomy with explicit finite-n hockey-stick bounds. The worst-case pairwise budget chi_*(W) governs a two-regime Assouad lower bound for arbitrary estimators in the i.i.d. multinomial model. Symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channels. Calibrated GRR is not optimal; in the low-budget regime, augmented GRR is optimal among permutation-equivariant channels.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines canonical one-step neighboring shuffle experiments for finite-output ε₀-LDP d-ary channels with growing alphabets, under a pairwise chi-squared budget for frequency estimation and mechanism design. The governing invariant is the pairwise likelihood-ratio law ν_{ab,d}, under which the canonical shuffled histogram experiment is exactly equivalent to the quotient multinomial experiment generated by ν_{ab,d}. The central claim is that alphabet growth improves canonical shuffled privacy if and only if the worst pairwise law collapses to δ₁. The paper proves a sharp pure-LDP endpoint principle for the pairwise chi-squared, constructs full-support obstruction families that saturate it, and establishes a diluting/persistent dichotomy with explicit finite-n hockey-stick bounds. The worst-case budget χ_*(W) governs a two-regime Assouad lower bound; symmetrization reduces the uniform-point Fisher criterion to permutation-equivariant channels; and augmented GRR is shown to be optimal among such channels in the low-budget regime.
Significance. If the central derivations hold, the work supplies a sharp, invariant-based characterization of privacy-utility tradeoffs in shuffle models as output alphabets grow, which is directly relevant to local DP mechanism design and frequency estimation. The explicit construction of saturating obstruction families, the parameter-free endpoint principle, and the finite-n hockey-stick bounds are concrete strengths. The reduction to permutation-equivariant channels and the identification of augmented GRR optimality in the low-budget regime offer actionable design guidance.
major comments (2)
- [Equivalence statement (abstract and the section establishing the governing invariant)] The equivalence between the canonical shuffled histogram and the quotient multinomial generated by ν_{ab,d} is asserted to be exact, yet the manuscript must verify uniformity in d. Without explicit tail control on the full-support obstruction families, the pushforward ν_{ab,d} may acquire d-dependent support or tails, which would undermine the sharpness of the iff claim that alphabet growth improves privacy precisely when the worst law collapses to δ₁ and would weaken the diluting/persistent dichotomy for large d.
- [Endpoint principle and obstruction families (abstract and the section containing the pure-LDP endpoint)] The sharp pure-LDP endpoint principle for the pairwise chi-squared and the construction of saturating obstruction families presuppose that the canonical one-step neighboring relation preserves measure class as d grows. If the neighboring relation changes measure class with alphabet size, the endpoint saturation and the subsequent Assouad lower bound governed by χ_*(W) lose uniformity; the manuscript should supply an explicit uniformity argument or counterexample family.
minor comments (1)
- [Notation and definitions] Clarify at first use whether χ_*(W) is defined as the supremum over all pairs or only the worst pair; the current notation risks ambiguity when the budget is invoked for the two-regime Assouad bound.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for explicit uniformity arguments with respect to alphabet size d. We address each major comment below and will incorporate the requested clarifications and supporting lemmas in the revised manuscript.
read point-by-point responses
-
Referee: [Equivalence statement (abstract and the section establishing the governing invariant)] The equivalence between the canonical shuffled histogram and the quotient multinomial generated by ν_{ab,d} is asserted to be exact, yet the manuscript must verify uniformity in d. Without explicit tail control on the full-support obstruction families, the pushforward ν_{ab,d} may acquire d-dependent support or tails, which would undermine the sharpness of the iff claim that alphabet growth improves privacy precisely when the worst law collapses to δ₁ and would weaken the diluting/persistent dichotomy for large d.
Authors: The equivalence is exact for each fixed d by construction of the pushforward of the row ratio under the null row, which maps the canonical shuffled histogram directly onto the quotient multinomial generated by ν_{ab,d}. To address uniformity, the revision will add a dedicated lemma establishing that the full-support obstruction families admit tail bounds controlled uniformly by the pairwise chi-squared budget; this bound is independent of d and ensures that support remains full and tails do not introduce d-dependent artifacts. Consequently the iff characterization and the diluting/persistent dichotomy remain sharp for all d. revision: yes
-
Referee: [Endpoint principle and obstruction families (abstract and the section containing the pure-LDP endpoint)] The sharp pure-LDP endpoint principle for the pairwise chi-squared and the construction of saturating obstruction families presuppose that the canonical one-step neighboring relation preserves measure class as d grows. If the neighboring relation changes measure class with alphabet size, the endpoint saturation and the subsequent Assouad lower bound governed by χ_*(W) lose uniformity; the manuscript should supply an explicit uniformity argument or counterexample family.
Authors: The full-support obstruction families are constructed so that every row places strictly positive mass on the entire alphabet for every d; the one-step neighboring relation therefore preserves measure class independently of d. The revision will insert a short uniformity remark immediately after the endpoint principle, showing that the pure-LDP saturation and the subsequent χ_*(W)-governed Assouad bound hold uniformly because the chi-squared budget controls the likelihood ratios uniformly across d. No counterexample family is needed under this construction. revision: yes
Circularity Check
Derivation chain is self-contained with no load-bearing reductions to inputs
full rationale
The paper defines the canonical one-step neighboring shuffle model for ε₀-LDP d-ary channels and introduces the pairwise likelihood-ratio law ν_{ab,d} as the pushforward invariant under the null row. It states the equivalence of the shuffled histogram experiment to the quotient multinomial generated by ν_{ab,d} as a direct modeling fact, then derives the iff condition on alphabet growth (worst law collapsing to δ₁), the pure-LDP endpoint for pairwise chi-squared, obstruction families, and finite-n hockey-stick bounds as consequences. No equations reduce a claimed prediction or result to a fitted parameter by construction, no self-citations are used for uniqueness or ansatz justification, and no known empirical pattern is merely renamed. The central claims rest on explicit constructions and bounds that do not presuppose their own conclusions, rendering the derivation self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Canonical one-step neighboring shuffle experiments for finite-output epsilon_0-LDP d-ary channels
- domain assumption Pairwise chi-squared budget governs the two-regime Assouad lower bound
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel; dAlembert_to_ODE_general matches?
matchesMATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
Universal extremal χ² bound: χ²(Wd(·|b)∥Wd(·|a)) ≤ (e^ε0−1)²/e^ε0, sharp iff the likelihood ratio is two-point at endpoints
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leancostAlphaLog_high_calibrated_iff; J_uniquely_calibrated_via_higher_derivative echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
LR-quotient compression: full histogram experiment equivalent to multinomial on level sets of w_ab,d; invariant is pushforward ν_ab,d on [e^{-ε0},e^ε0]
-
IndisputableMonolith/Foundation/BranchSelection.leanbranch_selection; RCLCombiner_isCoupling_iff matches?
matchesMATCHES: this paper passage directly uses, restates, or depends on the cited Recognition theorem or module.
Diluting/persistent dichotomy: family diluting iff worst ν_adbd,d ⇒ δ1 (equivalently I*→0); persistent otherwise, no extra d-gain
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
R. Bhatia and C. Davis,A Better Bound on the Variance, Amer. Math. Monthly107 (2000), no. 4, 353–357
work page 2000
- [3]
-
[4]
A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev,Distributed Differential Privacy via Shuffling, inAdvances in Cryptology—EUROCRYPT 2019, Lecture Notes in Comput. Sci.11476, Springer, 2019, pp. 375–403
work page 2019
-
[5]
Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta, Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity, inProceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2019, pp. 2468–2479
work page 2019
-
[6]
V. Feldman, A. McMillan, and K. Talwar,Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling, in62nd IEEE Annual Symposium on Foundations of Computer Science (FOCS 2021), IEEE, 2022, pp. 954–964
work page 2021
-
[7]
V. Feldman, A. McMillan, and K. Talwar,Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy, inProceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2023, pp. 4966–4981. 40 ALEX SHVETS
work page 2023
-
[8]
Feller,An Introduction to Probability Theory and Its Applications, Vol
W. Feller,An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed., Wiley, New York, 1971
work page 1971
-
[9]
Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986
L. Le Cam,Asymptotic Methods in Statistical Decision Theory, Springer, New York, 1986
work page 1986
-
[10]
A. W. Marshall, I. Olkin, and B. C. Arnold,Inequalities: Theory of Majorization and Its Applications, 2nd ed., Springer, New York, 2011
work page 2011
-
[11]
V. V. Petrov,Sums of Independent Random Variables, Springer, Berlin, 1975
work page 1975
-
[12]
A. Shvets,Universal Shuffle Asymptotics: Sharp Privacy Analysis in the Gaussian Regime, arXiv:2602.09029, 2026
-
[13]
A. Shvets,Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy—Poisson, Skellam, and Compound-Poisson Regimes, arXiv:2603.10073, 2026
- [14]
-
[15]
S. Takagi and C. Liew,Analysis of Shuffling Beyond Pure Local Differential Privacy, arXiv:2601.19154, 2026; accepted to PODS 2026. Independent Researcher, Haifa, Israel Email address:alt178332@gmail.com
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.