Order-Optimal Sequential 1-Bit Mean Estimation in General Tail Regimes

Ivan Lau; Jonathan Scarlett

arxiv: 2604.07796 · v2 · pith:CXKREENAnew · submitted 2026-04-09 · 📊 stat.ML · cs.IT· cs.LG· math.IT· math.ST· stat.TH

Order-Optimal Sequential 1-Bit Mean Estimation in General Tail Regimes

Ivan Lau , Jonathan Scarlett This is my paper

Pith reviewed 2026-05-25 06:44 UTC · model grok-4.3

classification 📊 stat.ML cs.ITcs.LGmath.ITmath.STstat.TH

keywords mean estimation1-bit quantizationadaptive samplingPAC estimationmoment boundssample complexityinformation-theoretic lower boundstail regimes

0 comments

The pith

An adaptive sequence of randomized 1-bit threshold queries estimates the mean to accuracy ε with sample complexity matching the unquantized rate plus small factors for any fixed moment bound k>1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a sequential estimator that selects randomized thresholds adaptively after each 1-bit outcome and proves it is (ε, δ)-PAC for any distribution whose mean lies in a known interval and whose k-th central moment is bounded by σ^k. The sample complexity is order-optimal for every k>1: when k≠2 it recovers the best known unquantized rate together with an unavoidable localization cost of order log(λ/σ), while for k=2 it incurs an extra multiplicative log(σ/ε) factor that the authors show is information-theoretically necessary under 1-bit constraints. The same analysis yields a large adaptivity gap, showing that any non-adaptive procedure must pay a linear factor in λ/σ. Algorithmic extensions handle unknown sample budgets and unknown scale parameters while preserving the order-optimal guarantees.

Core claim

There exists a single adaptive procedure based on randomized threshold queries that produces an (ε, δ)-PAC mean estimate whose sample complexity is order-optimal simultaneously for every tail regime indexed by k>1. For k≠2 the complexity equals the unquantized minimax rate plus an O(log(λ/σ)) localization term; for finite variance (k=2) an extra O(log(σ/ε)) factor appears and is shown to be unavoidable by a new information-theoretic lower bound. Non-adaptive estimators, even with general interval queries, require sample complexity linear in λ/σ.

What carries the argument

The adaptive, sequential choice of randomized thresholds for 1-bit queries that both localizes the mean and refines the estimate in one pass.

If this is right

For every k≠2 the estimator matches the best unquantized sample complexity up to the localization cost log(λ/σ).
For k=2 the extra log(σ/ε) multiplicative penalty is fundamental and cannot be removed by any 1-bit method.
Any non-adaptive procedure, even with interval queries, must scale linearly with λ/σ.
Simple two-stage and multi-sample variants retain order optimality while relaxing the number of adaptive rounds or reducing total bits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The necessity of the log(σ/ε) term under 1-bit constraints may extend to other location estimation problems with quantization.
The demonstrated adaptivity gap suggests that many existing non-adaptive quantization schemes in sensor networks are fundamentally inefficient for mean estimation.
The two-stage variant offers a practical middle ground when full sequential adaptivity is costly to implement.

Load-bearing premise

Thresholds must be allowed to depend on all previous 1-bit outcomes and may be randomized.

What would settle it

An explicit 1-bit estimator (adaptive or not) for a finite-variance distribution that achieves (ε, δ)-PAC accuracy with sample complexity free of the extra log(σ/ε) factor, or a matching lower-bound proof that removes the factor.

read the original abstract

In this paper, we study the problem of mean estimation under 1-bit communication constraints. We propose a novel adaptive mean estimator based solely on randomized threshold queries, where each 1-bit outcome indicates whether a given sample exceeds a sequentially chosen threshold. Our estimator is $(\epsilon, \delta)$-PAC for any distribution with a bounded mean $\mu \in [-\lambda, \lambda]$ and a bounded $k$-th central moment $\mathbb{E}[|X-\mu|^k] \le \sigma^k$ for any fixed $k > 1$. Moreover, our sample complexity is order-optimal in all such tail regimes, i.e., for every such $k$ value. For $k \neq 2$, our estimator's sample complexity matches the unquantized minimax lower bounds plus an unavoidable $O(\log(\lambda/\sigma))$ localization cost. For the finite-variance case ($k=2$), our estimator's sample complexity has an extra multiplicative $O(\log(\sigma/\epsilon))$ penalty, and we establish a novel information-theoretic lower bound showing that this penalty is a fundamental limit of 1-bit quantization. We also establish a significant adaptivity gap: for both threshold queries and more general interval queries, the sample complexity of any non-adaptive estimator must scale linearly with the search space parameter $\lambda/\sigma$, rendering it vastly less sample efficient than our adaptive approach. Finally, we present algorithmic variants that (i) handle an unknown sampling budget, (ii) adapt to an unknown scale parameter $\sigma$ given (possibly loose) bounds, (iii) require only two stages of adaptivity to achieve order-optimal sample complexity at the expense of more general 1-bit queries, and (iv) leverage multiple local samples per 1-bit query to proportionally reduce communication costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives matching upper and lower bounds for adaptive 1-bit mean estimation under k-moment tails, with a new lower bound showing the extra log factor is necessary when k=2.

read the letter

The core contribution is an adaptive estimator that uses randomized thresholds chosen sequentially from previous 1-bit outcomes. It achieves the unquantized minimax rate plus a localization term for k ≠ 2, and for k=2 it incurs an extra O(log(σ/ε)) factor that the authors lower-bound as unavoidable under this query model. They also prove a clear adaptivity gap: non-adaptive schemes must pay linear in λ/σ. That gap and the k=2 lower bound are the genuinely new pieces relative to prior work on quantized estimation. The constructions look technically careful on the upper-bound side, and the variants for unknown budget or unknown σ are useful practical extensions. The main limitation is that everything is derived inside the randomized adaptive threshold model. The abstract is explicit about this, but the “fundamental limit of 1-bit quantization” phrasing only holds under those assumptions; deterministic or non-adaptive thresholds change the rates. Without the full proofs it is hard to judge tightness for general k, but the abstract claims and the stress-test note line up on the modeling choice. This is for people working on communication-constrained or quantized statistical estimation. Anyone thinking about information-theoretic limits under moment conditions will find the lower bound and the adaptivity comparison worth reading. I would send it to peer review; the new lower bound and the clean separation of adaptive versus non-adaptive regimes are substantive enough to justify referee time even if some modeling caveats need tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes an adaptive estimator for 1-bit mean estimation based on sequential randomized threshold queries. It claims that the estimator is (ε, δ)-PAC for distributions with mean μ ∈ [-λ, λ] and bounded k-th central moment E[|X-μ|^k] ≤ σ^k for any fixed k > 1, with sample complexity that is order-optimal in all such regimes: matching the unquantized minimax rate plus O(log(λ/σ)) localization for k ≠ 2, and incurring an extra O(log(σ/ε)) factor for k=2 that is shown necessary by a new information-theoretic lower bound. The work also establishes an adaptivity gap (non-adaptive estimators scale linearly with λ/σ) and provides variants for unknown budgets, unknown σ, two-stage adaptivity, and multi-sample queries.

Significance. If the upper and lower bounds hold, the paper makes a solid contribution to communication-constrained estimation by giving order-optimal algorithms that work uniformly across tail regimes (k > 1) and by isolating a genuine information-theoretic penalty that appears only for finite variance. The explicit treatment of adaptivity, the novel k=2 lower bound, and the algorithmic variants for unknown parameters are all positive features. The results are relevant to both theoretical statistics and practical distributed learning settings.

major comments (2)

[§2 and §4] §2 (Query Model) and §4 (Upper Bound Construction): The order-optimality claims, including the matching lower bound for the extra O(log(σ/ε)) factor when k=2, are derived under the adaptive randomized-threshold model. The manuscript should state more explicitly whether the same rates (or the necessity of the log penalty) continue to hold if thresholds must be deterministic or chosen non-adaptively; this modeling choice is load-bearing for the “fundamental limit of 1-bit quantization” statement.
[§5] §5 (Lower Bound for k=2): The novel information-theoretic lower bound establishing necessity of the extra logarithmic factor is central to the finite-variance claim. The proof should be expanded to show precisely how the mutual-information or Fano-style argument accounts for the adaptive, randomized choice of thresholds; without that detail the lower-bound argument cannot be verified as tight.

minor comments (2)

The notation for the localization cost O(log(λ/σ)) is used in several places; a single displayed equation collecting all the sample-complexity expressions (upper and lower) for each regime of k would improve readability.
Figure 1 (or the corresponding plot of sample complexity vs. λ/σ) would benefit from an additional curve showing the non-adaptive baseline to visually illustrate the adaptivity gap claimed in §6.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. We address each major comment below.

read point-by-point responses

Referee: [§2 and §4] §2 (Query Model) and §4 (Upper Bound Construction): The order-optimality claims, including the matching lower bound for the extra O(log(σ/ε)) factor when k=2, are derived under the adaptive randomized-threshold model. The manuscript should state more explicitly whether the same rates (or the necessity of the log penalty) continue to hold if thresholds must be deterministic or chosen non-adaptively; this modeling choice is load-bearing for the “fundamental limit of 1-bit quantization” statement.

Authors: Our results, including both the upper bounds in Section 4 and the k=2 lower bound in Section 5, are derived specifically for the adaptive randomized-threshold query model defined in Section 2. The paper makes no claim that the stated rates or the necessity of the log(σ/ε) penalty extend to deterministic or non-adaptive thresholds. Section 6 already establishes an adaptivity gap showing linear dependence on λ/σ for non-adaptive estimators under both threshold and interval queries. We will add an explicit clarifying sentence in the introduction and at the end of Section 2 stating that all optimality claims are with respect to the adaptive randomized model and that the deterministic case is left open. revision: yes
Referee: [§5] §5 (Lower Bound for k=2): The novel information-theoretic lower bound establishing necessity of the extra logarithmic factor is central to the finite-variance claim. The proof should be expanded to show precisely how the mutual-information or Fano-style argument accounts for the adaptive, randomized choice of thresholds; without that detail the lower-bound argument cannot be verified as tight.

Authors: We agree that greater detail on this point will improve verifiability. The appendix proof applies Fano’s inequality to a carefully constructed packing of distributions and bounds the mutual information by the worst-case information revealed by any adaptive randomized threshold sequence. We will expand the high-level argument in Section 5 and insert a new paragraph in the appendix that explicitly walks through how the supremum over adaptive randomized policies is handled in the mutual-information term, making the accounting for adaptivity and randomization fully transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained new constructions and bounds.

full rationale

The paper derives its central claims via a novel adaptive estimator construction (upper bounds) and a new information-theoretic lower bound for the k=2 case, both obtained directly from analysis of the adaptive randomized-threshold query model. These steps do not reduce to fitted parameters renamed as predictions, self-citations as load-bearing premises, or any of the enumerated circular patterns. The order-optimality statements follow from explicit matching of independently derived upper and lower bounds rather than by construction from inputs. The modeling assumptions (adaptive randomized thresholds) are stated explicitly and do not create definitional circularity within the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions of bounded mean and k-th central moment together with the randomized threshold query model; no free parameters are fitted because the work derives worst-case sample-complexity bounds rather than data-dependent fits.

axioms (2)

domain assumption The unknown distribution has mean μ bounded in [−λ, λ] and finite k-th central moment bounded by σ^k
Invoked to obtain the (ε, δ)-PAC guarantee and the stated sample complexities
domain assumption Randomized threshold queries are allowed and thresholds may be chosen adaptively
The query model that enables the claimed adaptivity and order-optimality

pith-pipeline@v0.9.0 · 5875 in / 1445 out tokens · 38005 ms · 2026-05-25T06:44:39.962714+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

[1]

F X pL`σq ą0.49. 8For ease of analysis, we assume thatλis an integer multiple ofσ. 19 On the other hand, by the “k-moment

PMLR. Paes Leme, R., Sivan, B., Teng, Y ., and Worah, P. (2023). Pricing query complexity of revenue maximization. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 399–415. SIAM. Polyanskiy, Y . and Wu, Y . (2025).Information Theory: From Coding to Learning. Cambridge University Press. Ribeiro, A. and Giannakis, G. B. (2006a). Bandwidth-constrai...

work page arXiv 2023
[2]

IsX i,j ěa i?

Ask the agentn i threshold queries “IsX i,j ěa i?” forj“1, . . . , n i, wheren i is the regional sample budget determined in Step 5

work page
[3]

Generate independent random variablesT i,j „Unifpa i, biqforj“n i `1, . . . ,2n i

work page
[4]

IsX i,j ěT i,j?

Ask the agentn i randomized threshold queries “IsX i,j ěT i,j?” forj“n i `1, . . . ,2n i

work page
[5]

IsX i,j ďb i?

Compute the empirical averages based on the 1-bit feedback and perform the subtraction according to (2). The learner forms the corresponding estimateˆp bi using an analogous procedure, utilizing2n i fresh samples with queries “IsX i,j ďb i?” and “IsX i,j ďT i,j?”. We summarize the empirical estimates as ˆpai “ 1 ni ˜ niÿ j“1 1pXi,j ěa iq ¸ ´ 1 ni ˜ 2niÿ j...

work page 2020
[6]

“IsXP rα t, βts?

For a fixed distribution pair (indexed byj), an interval queryQ““IsXP rα t, βts?” is informative for distinguishing betweenD j,´ andD j,` only ifrα t, βtscontains exactly one of the two support pointstc j ˘σ{2u, i.e., ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1

work page
[7]

1 ĎDpλ, σqconstructed in the proof of Theorem 9, whereN“λ{σ´1. We will again establish a lower bound for this “hard subset

There are at most two indicesjfor which ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1. Fact 1 can be verified by analyzing the binary feedbackB“1tXP rα t, βtsufor all cases ofrα t, βts X tcj ˘σ{2u: ˇˇrαt, βts X tcj ˘σ{2u ˇˇ P t0,2u ù ñPr X„D j,´ pB“1q “Pr X„D j,` pB“1q ù ñQis uninformative, and ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1ù ñ ˇˇ Pr X„D j,´ pB“1q ´Pr X„D j,` pB“1q ˇˇ “ 2ϵ ...

work page 2020

[1] [1]

F X pL`σq ą0.49. 8For ease of analysis, we assume thatλis an integer multiple ofσ. 19 On the other hand, by the “k-moment

PMLR. Paes Leme, R., Sivan, B., Teng, Y ., and Worah, P. (2023). Pricing query complexity of revenue maximization. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 399–415. SIAM. Polyanskiy, Y . and Wu, Y . (2025).Information Theory: From Coding to Learning. Cambridge University Press. Ribeiro, A. and Giannakis, G. B. (2006a). Bandwidth-constrai...

work page arXiv 2023

[2] [2]

IsX i,j ěa i?

Ask the agentn i threshold queries “IsX i,j ěa i?” forj“1, . . . , n i, wheren i is the regional sample budget determined in Step 5

work page

[3] [3]

Generate independent random variablesT i,j „Unifpa i, biqforj“n i `1, . . . ,2n i

work page

[4] [4]

IsX i,j ěT i,j?

Ask the agentn i randomized threshold queries “IsX i,j ěT i,j?” forj“n i `1, . . . ,2n i

work page

[5] [5]

IsX i,j ďb i?

Compute the empirical averages based on the 1-bit feedback and perform the subtraction according to (2). The learner forms the corresponding estimateˆp bi using an analogous procedure, utilizing2n i fresh samples with queries “IsX i,j ďb i?” and “IsX i,j ďT i,j?”. We summarize the empirical estimates as ˆpai “ 1 ni ˜ niÿ j“1 1pXi,j ěa iq ¸ ´ 1 ni ˜ 2niÿ j...

work page 2020

[6] [6]

“IsXP rα t, βts?

For a fixed distribution pair (indexed byj), an interval queryQ““IsXP rα t, βts?” is informative for distinguishing betweenD j,´ andD j,` only ifrα t, βtscontains exactly one of the two support pointstc j ˘σ{2u, i.e., ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1

work page

[7] [7]

1 ĎDpλ, σqconstructed in the proof of Theorem 9, whereN“λ{σ´1. We will again establish a lower bound for this “hard subset

There are at most two indicesjfor which ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1. Fact 1 can be verified by analyzing the binary feedbackB“1tXP rα t, βtsufor all cases ofrα t, βts X tcj ˘σ{2u: ˇˇrαt, βts X tcj ˘σ{2u ˇˇ P t0,2u ù ñPr X„D j,´ pB“1q “Pr X„D j,` pB“1q ù ñQis uninformative, and ˇˇrαt, βts X tcj ˘σ{2u ˇˇ “1ù ñ ˇˇ Pr X„D j,´ pB“1q ´Pr X„D j,` pB“1q ˇˇ “ 2ϵ ...

work page 2020