Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

Cl\'ement L. Canonne; Himanshu Tyagi; Jayadev Acharya; Yanjun Han; Ziteng Sun

arxiv: 1907.08743 · v1 · pith:FDRNGE7Lnew · submitted 2019-07-20 · 💻 cs.DS · cs.DM· cs.IT· cs.LG· math.IT· math.ST· stat.TH

Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

Jayadev Acharya , Cl\'ement L. Canonne , Yanjun Han , Ziteng Sun , Himanshu Tyagi This is my paper

Pith reviewed 2026-05-24 19:08 UTC · model grok-4.3

classification 💻 cs.DS cs.DMcs.ITcs.LGmath.ITmath.STstat.TH

keywords distributed goodness-of-fitdomain compressionshared randomnesssample complexitycommunication constraintslocal privacyinformation constraints

0 comments

The pith

Domain compression enables distributed goodness-of-fit protocols whose sample complexity interpolates between private-coin and public-coin regimes based on the amount of shared randomness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a tight trade-off among shared randomness, information constraints such as communication or local privacy, and sample complexity for testing whether samples held by multiple users come from a known discrete distribution. It introduces a domain compression primitive that maps distributions to a much smaller domain while preserving pairwise distances, and uses it to build a protocol that smoothly varies the number of samples required as shared randomness increases. Matching lower bounds are obtained from a general framework that applies to arbitrary information constraints and yields optimality when specialized to communication and local privacy.

Core claim

The authors provide a general distributed goodness-of-fit protocol that as a function of the amount of shared randomness interpolates smoothly between the private- and public-coin sample complexities. This upper bound is complemented by a general framework to prove lower bounds on sample complexity under limited shared randomness; when instantiated for communication and local privacy the resulting bounds are optimal as a function of all parameters including the amount of shared randomness. A key component is the new primitive of domain compression, which maps distributions to a much smaller domain size while preserving their pairwise distances using a limited amount of randomness.

What carries the argument

domain compression, a primitive that maps distributions over a large domain to a much smaller domain while preserving pairwise distances using limited randomness

If this is right

Sample complexity is optimal as a function of shared randomness, communication budget, and privacy parameter for the communication-constrained case.
Sample complexity is optimal as a function of shared randomness, privacy parameter, and communication for the locally private case.
The lower-bound framework produces matching bounds once instantiated for any fixed information constraint.
The protocol provides a continuous interpolation of sample complexity between the private-coin and public-coin extremes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Domain compression may reduce computational cost in non-distributed testing or estimation tasks by shrinking the effective support size.
The same trade-off analysis could be applied to other distributed inference problems such as closeness testing or property estimation.
Partial shared randomness could be treated as a tunable resource in practical systems where full public randomness is expensive to coordinate.

Load-bearing premise

The lower-bound framework is general enough that no tighter protocol exists under the analyzed information constraints and shared-randomness regimes.

What would settle it

An explicit protocol achieving the public-coin sample complexity while using strictly less shared randomness than the amount required by the upper-bound construction, or a concrete instance where the lower-bound framework fails to match an achievable upper bound for communication or local privacy.

read the original abstract

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., shared randomness) leads to a significant reduction in the sample complexity of this problem. In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. We provide a general distributed goodness-of-fit protocol that as a function of the amount of shared randomness interpolates smoothly between the private- and public-coin sample complexities. We complement our upper bound with a general framework to prove lower bounds on the sample complexity of this testing problems under limited shared randomness. Finally, we instantiate our bounds for the two archetypal information constraints of communication and local privacy, and show that our sample complexity bounds are optimal as a function of all the parameters of the problem, including the amount of shared randomness. A key component of our upper bounds is a new primitive of domain compression, a tool that allows us to map distributions to a much smaller domain size while preserving their pairwise distances, using a limited amount of randomness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies distributed goodness-of-fit testing for discrete distributions under information constraints (communication and local privacy), characterizing the sample complexity as a function of the amount of shared randomness. It introduces a domain compression primitive that maps distributions to smaller domains while preserving distances using limited randomness, provides a general protocol interpolating between private- and public-coin regimes, and develops a lower-bound framework claimed to apply to arbitrary constraints and to yield matching bounds when instantiated for the two archetypal settings.

Significance. If the matching upper and lower bounds hold and the lower-bound framework is indeed general, the work would give a tight, parameter-complete characterization of randomness-sample trade-offs in distributed testing, strengthening the understanding of how shared randomness interacts with local constraints. The domain compression primitive is presented as reusable and could apply beyond this problem.

major comments (2)

[Lower-bound framework (instantiations for communication and local privacy)] The lower-bound framework (developed to prove optimality as a function of shared randomness) is asserted to apply to arbitrary information constraints and to be tight once instantiated, but the reduction technique and the precise information-theoretic quantity bounded are not shown to avoid implicit reliance on convexity or product structure that may hold only for communication and local privacy; this is load-bearing for the central claim that the bounds are optimal for all parameters including shared randomness.
[Domain compression primitive and upper-bound protocol] The domain compression primitive is central to the upper bounds, yet the manuscript does not provide an explicit statement of the randomness cost or the distance-preservation guarantee (e.g., in terms of total variation or Hellinger) that would allow independent verification that the protocol indeed interpolates smoothly between the private- and public-coin sample complexities.

minor comments (2)

[Introduction and preliminaries] Notation for the amount of shared randomness (e.g., the parameter r) should be introduced once with a clear definition before its use in the trade-off statements.
[Experimental or illustrative figures] Figure captions and axis labels for the sample-complexity plots should explicitly indicate the dependence on the shared-randomness parameter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below and will make revisions to clarify the generality of the lower-bound framework and to explicitly state the properties of domain compression.

read point-by-point responses

Referee: [Lower-bound framework (instantiations for communication and local privacy)] The lower-bound framework (developed to prove optimality as a function of shared randomness) is asserted to apply to arbitrary information constraints and to be tight once instantiated, but the reduction technique and the precise information-theoretic quantity bounded are not shown to avoid implicit reliance on convexity or product structure that may hold only for communication and local privacy; this is load-bearing for the central claim that the bounds are optimal for all parameters including shared randomness.

Authors: We thank the referee for this observation. The lower-bound framework is designed to apply to arbitrary local information constraints via a minimax reduction over distributions and shared randomness that bounds a suitable information-theoretic quantity (a divergence between the induced message distributions). The proof does not rely on convexity or product structure beyond the definition of the constraint itself. To make this explicit, we will add a subsection in the revision that states the precise assumptions and sketches why the argument holds generally, without implicit dependence on the specific structures of communication or privacy channels. This addresses the concern while preserving the central claim. revision: yes
Referee: [Domain compression primitive and upper-bound protocol] The domain compression primitive is central to the upper bounds, yet the manuscript does not provide an explicit statement of the randomness cost or the distance-preservation guarantee (e.g., in terms of total variation or Hellinger) that would allow independent verification that the protocol indeed interpolates smoothly between the private- and public-coin sample complexities.

Authors: We agree that an explicit formal statement would aid verification. The domain compression primitive uses O(log n) bits of randomness (where n is the original domain size) and preserves total variation distance up to a multiplicative factor depending on the compression parameter k, which directly yields the smooth interpolation between private- and public-coin regimes. In the revised manuscript we will add a dedicated theorem statement with these guarantees (randomness cost and distance preservation in TV) together with a short proof sketch, allowing independent checking of the interpolation property. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via new primitives and framework

full rationale

The paper introduces a new domain compression primitive for the upper bound protocol and a new general lower-bound framework, then instantiates matching bounds for communication and local privacy. The sole self-reference is to prior qualitative work on shared randomness by a subset of authors, which is not used to justify the quantitative trade-off or optimality claims. No equation reduces to a fitted input by construction, no uniqueness theorem is imported from overlapping authors, and the central matching bounds are presented as derived within this work rather than presupposed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Ledger populated from abstract only; no explicit free parameters, axioms, or invented entities beyond the new primitive are stated.

axioms (1)

standard math Standard assumptions of discrete finite-support distributions and information-theoretic constraints (communication or local privacy).
The problem statement presupposes these background notions from statistics and information theory.

invented entities (1)

domain compression no independent evidence
purpose: Map distributions to a much smaller domain while preserving pairwise distances using limited shared randomness.
Explicitly introduced as the key new primitive enabling the upper bounds.

pith-pipeline@v0.9.0 · 5809 in / 1280 out tokens · 22577 ms · 2026-05-24T19:08:18.642543+00:00 · methodology

Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)