Sequential 1-bit Mean Estimation with Near-Optimal Sample Complexity
Pith reviewed 2026-05-18 13:31 UTC · model grok-4.3
The pith
Sequential 1-bit interval queries can estimate the mean of bounded-mean bounded-variance distributions with sample complexity matching the unquantized minimax rate up to logarithmic factors plus an unavoidable log term.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors prove that their adaptive interval-query estimator is (ε, δ)-PAC for any distribution satisfying |E[X]| ≤ λ and Var(X) ≤ σ² using Õ(σ²/ε² log(1/δ) + log(λ/σ)) samples. This bound matches the minimax lower bound for the unquantized (full-information) setting up to logarithmic factors, with the extra log(λ/σ) term shown to be necessary even for interval queries. They further establish an adaptivity gap, where the best non-adaptive mean estimator requires substantially more samples for large λ/σ.
What carries the argument
Sequentially chosen randomized interval queries, where each 1-bit response indicates whether the sample lies in the adaptively selected interval, allowing progressive refinement of the mean estimate with minimal communication.
If this is right
- The estimator works for all distributions with the given bounds on mean and variance.
- It achieves near-optimality compared to full-precision sampling.
- Non-adaptive interval-query estimators are provably worse when λ/σ is large.
- Variants exist for unknown sampling budget and unknown variance within bounds.
- Stronger tail decay allows tightened bounds.
Where Pith is reading between the lines
- Such methods could enable efficient distributed mean estimation in resource-constrained networks where only 1 bit per sample can be sent.
- Extending this to multi-dimensional means or other statistics might follow similar adaptive query strategies.
- The two-stage variant suggests limited adaptivity suffices for near-optimal performance.
- Practical tests with real data varying λ/σ would show where the extra log term becomes noticeable.
Load-bearing premise
The bounds λ on the mean and σ on the standard deviation are known in advance and used to design the sequence of interval queries.
What would settle it
A calculation showing a distribution with |mean| ≤ λ and Var ≤ σ² that requires strictly more than O(log λ/σ) extra samples beyond the main σ²/ε² term, or an explicit non-adaptive estimator that matches the adaptive rate even for large λ/σ.
read the original abstract
In this paper, we study the problem of distributed mean estimation with 1-bit communication constraints. We propose a mean estimator that is based on (randomized and sequentially-chosen) interval queries, whose 1-bit outcome indicates whether the given sample lies in the specified interval. Our estimator is $(\epsilon, \delta)$-PAC for all distributions with bounded mean ($-\lambda \le \mathbb{E}(X) \le \lambda $) and variance ($\mathrm{Var}(X) \le \sigma^2$) for some known parameters $\lambda$ and $\sigma$. We derive a sample complexity bound $\widetilde{O}\big( \frac{\sigma^2}{\epsilon^2}\log\frac{1}{\delta} + \log\frac{\lambda}{\sigma}\big)$, which matches the minimax lower bound for the unquantized setting up to logarithmic factors and the additional $\log\frac{\lambda}{\sigma}$ term that we show to be unavoidable. We also establish an adaptivity gap for interval-query based estimators: the best non-adaptive mean estimator is considerably worse than our adaptive mean estimator for large $\frac{\lambda}{\sigma}$. Finally, we give tightened sample complexity bounds for distributions with stronger tail decay, and present additional variants that (i) handle an unknown sampling budget (ii) adapt to the unknown true variance given (possibly loose) upper and lower bounds on the variance, and (iii) use only two stages of adaptivity at the expense of more complicated (non-interval) queries.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a sequential adaptive mean estimator using randomized interval queries under 1-bit communication. For distributions with known bounds λ on the mean and σ on the standard deviation, it achieves (ε, δ)-PAC estimation with sample complexity Õ(σ²/ε² log(1/δ) + log(λ/σ)). This nearly matches the minimax lower bound from the unquantized setting, and the paper provides a matching lower bound showing the extra log term is unavoidable for interval-query estimators. It further establishes an adaptivity gap, provides bounds for sub-Gaussian tails, and includes variants for unknown parameters and limited adaptivity.
Significance. This contribution is significant for the field of distributed and communication-constrained statistical estimation. By achieving near-optimal sample complexity with minimal communication via adaptive queries, it shows that adaptivity can mitigate the cost of quantization. The explicit lower bound for the adaptivity gap and the extensions to unknown variance make the results more robust and applicable. Credit is due for the parameter-free aspects in the core bound and the reproducible theoretical analysis if proofs are complete.
major comments (2)
- §4.1, Eq. (8): The recursive definition of the interval centers in the adaptive procedure assumes prior knowledge of σ; the variance reduction step should be shown to hold uniformly over all possible distributions satisfying the variance bound.
- §6, Theorem 3: In the proof of the lower bound, the information-theoretic argument uses KL divergence between specific two-point distributions; confirm that this extends to the general case with bounded variance without additional tail assumptions.
minor comments (3)
- Abstract: The notation for the sample complexity bound uses Õ; ensure consistency with the definition provided in Section 2.
- Section 3: The description of the randomized interval query could benefit from a pseudocode algorithm to improve clarity.
- References: Several recent works on 1-bit quantization in estimation (e.g., from 2023-2024) are missing; consider adding them for completeness.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive comments on our work. We address each major comment below in detail. The manuscript has been revised to incorporate clarifications where appropriate, strengthening the presentation without altering the core results.
read point-by-point responses
-
Referee: §4.1, Eq. (8): The recursive definition of the interval centers in the adaptive procedure assumes prior knowledge of σ; the variance reduction step should be shown to hold uniformly over all possible distributions satisfying the variance bound.
Authors: We appreciate this observation. As stated in the problem setup (Section 2), σ is a known upper bound on the variance. The recursive construction in Eq. (8) employs this known σ to determine interval widths and centers. The variance reduction argument in the analysis of the adaptive procedure relies solely on the bound Var(X) ≤ σ² together with the mean bound |E[X]| ≤ λ; it does not require knowledge of the exact variance or any further distributional properties. Consequently, the guarantees hold uniformly over the entire class of distributions satisfying these moment bounds. We have added a clarifying remark immediately following Eq. (8) to make this uniformity explicit. revision: yes
-
Referee: §6, Theorem 3: In the proof of the lower bound, the information-theoretic argument uses KL divergence between specific two-point distributions; confirm that this extends to the general case with bounded variance without additional tail assumptions.
Authors: The lower bound of Theorem 3 applies to the minimax risk over all distributions with |E[X]| ≤ λ and Var(X) ≤ σ². The proof proceeds by exhibiting a pair of two-point distributions that lie inside this class (each with variance exactly equal to σ²) and applying the standard KL-based information-theoretic argument to this pair. Because the minimax quantity is at least as large as the risk on any subclass, the resulting lower bound carries over directly to the full class. No tail assumptions beyond the variance bound are invoked. We have inserted a short paragraph at the beginning of the proof of Theorem 3 to emphasize that the constructed distributions satisfy the problem constraints and that the argument requires no additional assumptions. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper proposes an adaptive estimator using randomized sequential interval queries and derives its sample complexity upper bound through direct analysis of the estimator's performance under the stated bounded-mean and bounded-variance assumptions. This upper bound is then compared to an external minimax lower bound known for the unquantized mean estimation problem, with the extra log(λ/σ) term justified by a separate lower-bound argument for interval-query estimators. No steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the derivation remains self-contained against external benchmarks and does not rename or smuggle in prior results via citation chains.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard concentration inequalities suffice to analyze the (ε, δ)-PAC guarantee of the sequential estimator.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We derive a sample complexity bound Õ(σ²/ε² log 1/δ + log λ/σ), which matches the minimax lower bound for the unquantized setting up to logarithmic factors and the additional log λ/σ term
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our estimator is (ε, δ)-PAC for all distributions with bounded mean (−λ ≤ E(X) ≤ λ) and variance (Var(X) ≤ σ²)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Acharya, J., Canonne, C., Liu, Y., Sun, Z., and Tyagi, H. (2021a). Distributed estimation with multiple samples per user: Sharp rates and phase transition. In Advances in Neural Information Processing Systems , volume 34, pages 18920--18931
-
[2]
L., Liu, Y., Sun, Z., and Tyagi, H
Acharya, J., Canonne, C. L., Liu, Y., Sun, Z., and Tyagi, H. (2021b). Interactive inference under information constraints. IEEE Transactions on Information Theory , 68(1):502--516
-
[3]
Acharya, J., Canonne, C. L., Singh, A. V., and Tyagi, H. (2021c). Optimal rates for nonparametric density estimation under communication constraints. CoRR , abs/2107.10078
-
[4]
Acharya, J., Canonne, C. L., Sun, Z., and Tyagi, H. (2023). Unified lower bounds for interactive high-dimensional estimation under information constraints. Advances in Neural Information Processing Systems , 36:51133--51165
work page 2023
-
[5]
Acharya, J., Canonne, C. L., and Tyagi, H. (2020a). Inference under information constraints I : L ower bounds from chi-square contraction. IEEE Transactions on Information Theory , 66(12):7835--7855
-
[6]
Acharya, J., Canonne, C. L., and Tyagi, H. (2020b). Inference under information constraints II: communication constraints and shared randomness. IEEE Transactions on Information Theory , 66(12):7856--7877
-
[7]
Acharya, J., Kairouz, P., Liu, Y., and Sun, Z. (2021d). Estimating sparse discrete distributions under privacy and communication constraints. In Algorithmic Learning Theory (ALT) , pages 79--98. PMLR
-
[8]
Babu, N. S., Kumar, R., and Vatedka, S. (2025). Unbiased quantization of the l_1 ball for communication-efficient distributed mean estimation. In International Conference on Artificial Intelligence and Statistics (AISTATS) , volume 258, pages 1270--1278. PMLR
work page 2025
-
[9]
P., Han, Y., and \" O zg \" u r, A
Barnes, L. P., Han, Y., and \" O zg \" u r, A. (2019). F isher information for distributed estimation under a blackboard communication protocol. In IEEE International Symposium on Information Theory (ISIT) , pages 2704--2708. IEEE
work page 2019
-
[10]
P., Han, Y., and \" O zg\" u r, A
Barnes, L. P., Han, Y., and \" O zg\" u r, A. (2020). Lower bounds for learning distributions under communication constraints via F isher information. Journal of Machine Learning Research , 21:1--30
work page 2020
-
[11]
Ben-Basat, R., Vargaftik, S., Portnoy, A., Einziger, G., Ben-Itzhak, Y., and Mitzenmacher, M. (2024). Accelerating federated learning with quick distributed mean estimation. In International Conference on Machine Learning (ICML) , volume 235, pages 3410--3442. PMLR
work page 2024
-
[12]
Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence . Oxford University Press
work page 2013
-
[13]
Braverman, M., Garg, A., Ma, T., Nguyen, H. L., and Woodruff, D. P. (2016). Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Symposium on Theory of Computing Conference, STOC'16 , pages 1011--1020. ACM
work page 2016
-
[14]
Cai, T. T. and Wei, H. (2022a). Distributed adaptive G aussian mean estimation with unknown variance: Interactive protocol helps adaptation. The Annals of Statistics , 50(4):1992--2020
work page 1992
-
[15]
Cai, T. T. and Wei, H. (2022b). Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation. The Annals of Statistics , 50(2):698--725
-
[16]
Cai, T. T. and Wei, H. (2024). Distributed G aussian mean estimation under communication constraints: O ptimal rates and communication-efficient algorithms. Journal of Machine Learning Research , 25(37):1--63
work page 2024
-
[17]
Cherapanamjeri, Y., Tripuraneni, N., Bartlett, P., and Jordan, M. (2022). Optimal mean estimation without a variance. In Conference on Learning Theory , pages 356--357. PMLR
work page 2022
-
[18]
Dang, T., Lee, J., Song, M., and Valiant, P. (2023). Optimality in mean estimation: Beyond worst-case, beyond sub- G aussian, and beyond 1+ moments. In Advances in Neural Information Processing Systems , volume 36, pages 4150--4176
work page 2023
-
[19]
Davies, P., Gurunanthan, V., Moshrefi, N., Ashkboos, S., and Alistarh, D. (2021). New bounds for distributed mean estimation and variance reduction. In International Conference on Learning Representations, (ICLR)
work page 2021
-
[20]
Garg, A., Ma, T., and Nguyen, H. L. (2014). On communication cost of distributed statistical estimation and dimensionality. In Advances in Neural Information Processing Systems 27 , pages 2726--2734
work page 2014
-
[21]
Gretta, L. and Price, E. (2024). Sharp Noisy Binary Search with Monotonic Probabilities . In 51st International Colloquium on Automata, Languages, and Programming (ICALP) , volume 297, pages 75:1--75:19
work page 2024
-
[22]
Gupta, S., Hopkins, S., and Price, E. (2024). Beyond catoni: Sharper rates for heavy-tailed and robust mean estimation. In Conference on Learning Theory (COLT) , pages 2232--2269. PMLR
work page 2024
-
[23]
Han, Y., Mukherjee, P., \"O zg \"u r , A., and Weissman, T. (2018a). Distributed statistical estimation of high-dimensional and non-parametric distributions. In IEEE International Symposium on Information Theory (ISIT) , pages 506--510
-
[24]
Han, Y., \" O zg \" u r, A., and Weissman, T. (2018b). Geometric lower bounds for distributed parameter estimation under communication constraints. In Conference on Learning Theory (COLT) , volume 75, pages 3163--3188. PMLR
-
[25]
Hanna, O. A., Yang, L., and Fragouli, C. (2022). Solving Multi-Arm Bandit Using a Few Bits of Communication . In International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 11215--11236
work page 2022
-
[26]
He, S., Shin, H.-S., Xu, S., and Tsourdos, A. (2020). Distributed estimation over a low-cost sensor network: A review of state-of-the-art. Information Fusion , 54:21--43
work page 2020
-
[27]
Kipnis, A. and Duchi, J. C. (2022). Mean estimation from one-bit measurements. IEEE Transactions on Information Theory , 68(9):6276--6296
work page 2022
-
[28]
Kone c n \`y , J. and Richt \'a rik, P. (2018). Randomized distributed mean estimation: Accuracy vs. communication. Frontiers in Applied Mathematics and Statistics , 4:62
work page 2018
-
[29]
Kumar, R. and Vatedka, S. (2025). One-bit distributed mean estimation with unknown variance. arXiv preprint arXiv:2501.18502
-
[30]
Lau, I. and Scarlett, J. (2025). Quantile multi-armed bandits with 1-bit feedback. In Algorithmic Learning Theory , pages 664--699. PMLR
work page 2025
-
[31]
Lee, J. (2020). CSCI 1951-W Sublinear Algorithms for Big Data, Lecture 11 . https://cs.brown.edu/courses/csci1951-w/lec/lec Accessed: 2025-07-01
work page 2020
-
[32]
Lee, J. C. and Valiant, P. (2022). Optimal sub- G aussian mean estimation in R . In IEEE Annual Symposium on Foundations of Computer Science (FOCS) , pages 672--683. IEEE
work page 2022
-
[33]
Lugosi, G. and Mendelson, S. (2019). Sub- G aussian estimators of the mean of a random vector. The Annals of Statistics , 47(2):783 -- 794
work page 2019
-
[34]
Luo, Z.-Q. (2005). Universal decentralized estimation in a bandwidth constrained sensor network. IEEE Transactions on Information Theory , 51(6):2210--2219
work page 2005
-
[35]
Mayekar, P., Scarlett, J., and Tan, V. Y. (2023). Communication-constrained bandits under additive G aussian noise . In International Conference on Machine Learning (ICML) , pages 24236--24250
work page 2023
-
[36]
Mayekar, P., Suresh, A. T., and Tyagi, H. (2021). Wyner-ziv estimators: Efficient distributed mean estimation with side-information. In International Conference on Artificial Intelligence and Statistics (AISTATS) , pages 3502--3510. PMLR
work page 2021
-
[37]
Minsker, S. (2023). Efficient median of means estimator. In Conference on Learning Theory (COLT) , pages 5925--5933. PMLR
work page 2023
- [38]
-
[39]
Ribeiro, A. and Giannakis, G. B. (2006a). Bandwidth-constrained distributed estimation for wireless sensor networks-part i: G aussian case. IEEE Transactions on Signal Processing , 54(3):1131--1143
-
[40]
Ribeiro, A. and Giannakis, G. B. (2006b). Bandwidth-constrained distributed estimation for wireless sensor networks-part ii: U nknown probability density function. IEEE Transactions on Signal Processing , 54(7):2784--2796
-
[41]
Shah, J., Cardone, M., Rush, C., and Dytso, A. (2025). Generalized linear models with 1-bit measurements: Asymptotics of the maximum likelihood estimator. In International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages 1--5. IEEE
work page 2025
-
[42]
Shamir, O. (2014). Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems 27 , pages 163--171
work page 2014
-
[43]
Suresh, A. T., Felix, X. Y., Kumar, S., and McMahan, H. B. (2017). Distributed mean estimation with limited communication. In International Conference on Machine Learning (ICML) , pages 3329--3337. PMLR
work page 2017
-
[44]
T., Sun, Z., Ro, J., and Yu, F
Suresh, A. T., Sun, Z., Ro, J., and Yu, F. (2022). Correlated quantization for distributed mean estimation and optimization. In International Conference on Machine Learning , pages 20856--20876. PMLR
work page 2022
-
[45]
Szab \'o , B. and van Zanten, H. (2018). Adaptive distributed methods under communication constraints. The Annals of Statistics
work page 2018
-
[46]
Szab \'o , B. and van Zanten, H. (2020). Distributed function estimation: Adaptation using minimal communication. Mathematical Statistics and Learning
work page 2020
-
[47]
B., Portnoy, A., Mendelson, G., Itzhak, Y
Vargaftik, S., Basat, R. B., Portnoy, A., Mendelson, G., Itzhak, Y. B., and Mitzenmacher, M. (2022). EDEN : Communication-efficient and robust distributed mean estimation for federated learning. In International Conference on Machine Learning (ICML) , volume 162, pages 21984--22014. PMLR
work page 2022
-
[48]
Vargaftik, S., Ben-Basat, R., Portnoy, A., Mendelson, G., Ben-Itzhak, Y., and Mitzenmacher, M. (2021). Drive: One-bit distributed mean estimation. In Advances in Neural Information Processing Systems , volume 34, pages 362--377
work page 2021
-
[49]
Varshney, P. K. (2012). Distributed detection and data fusion . Springer Science & Business Media
work page 2012
-
[50]
Veeravalli, V. V. and Varshney, P. K. (2012). Distributed inference in wireless sensor networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences , 370(1958):100--117
work page 2012
-
[51]
Vershynin, R. (2026). High-Dimensional Probability: An Introduction with Applications in Data Science . Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2 edition
work page 2026
-
[52]
Xiao, J.-J., Ribeiro, A., Luo, Z.-Q., and Giannakis, G. B. (2006). Distributed compression-estimation using wireless sensor networks. IEEE Signal Processing Magazine , 23(4):27--41
work page 2006
-
[53]
Xu, A. and Raginsky, M. (2017). Information-theoretic lower bounds on Bayes risk in decentralized estimation . IEEE Transactions on Information Theory , 63(3):1580--1600
work page 2017
-
[54]
Zaman, A. and Szab \'o , B. (2022). Distributed nonparametric estimation under communication constraints. arXiv preprint arXiv:2204.10373
-
[55]
Zhang, Y., Duchi, J., Jordan, M. I., and Wainwright, M. J. (2013). Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In Advances in Neural Information Processing Systems 26 , pages 2328--2336
work page 2013
-
[56]
Zhu, Y. and Lafferty, J. (2018). Distributed nonparametric regression under communication constraints. In International Conference on Machine Learning (ICML) , pages 6009--6017. PMLR
work page 2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.