Exact and Efficient Sampling from Dynamic Discrete Distributions with Finite-Precision Weights
Pith reviewed 2026-05-19 08:43 UTC · model grok-4.3
The pith
EBUS delivers exact sampling from dynamic distributions with finite-precision weights in constant time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EBUS is exact by construction: every returned index has probability exactly proportional to its represented weight, with all performance bounds holding in the word RAM model with bounded exponent range.
What carries the argument
Exact BUcket Sampling (EBUS), which uses a bucket-based structure to enforce exact proportionality without infinite-precision real arithmetic.
If this is right
- Dynamic updates remain efficient without full table rebuilds required by static methods such as the Alias Method.
- Applications needing precise probabilities can use finite-precision weights safely.
- The method scales linearly in space and construction while keeping per-operation costs constant in expectation.
Where Pith is reading between the lines
- The approach may generalize to other bounded-precision numeric types beyond IEEE floats.
- It could reduce silent errors in sampling-based algorithms in simulation and optimization.
- Implementation in probabilistic programming systems would allow direct use of changing finite-precision distributions.
Load-bearing premise
The analysis assumes a word RAM computational model with bounded exponent range.
What would settle it
Run the 64-bit floating-point implementation on hardware and measure whether observed selection frequencies match the exact represented weights within floating-point representation error.
read the original abstract
Sampling from a dynamic discrete distribution means drawing an index with probability proportional to a mutable set of weights. Classical constant-time techniques such as the Alias Method are well suited to static distributions, but become expensive in dynamic settings because updates require rebuilding auxiliary tables. Existing dynamic approaches, including Forest of Trees and BUcket Sampling (BUS), achieve reasonable practical performance but require infinite precision real arithmetic to be correct and produce meaningfully incorrect results when implemented on real hardware. We present EBUS (Exact BUcket Sampling), a dynamic sampler for finite-precision weights that is exact by construction: every returned index has probability exactly proportional to its represented weight. Our guarantees are proved in a word RAM model with bounded exponent range. In that model, our method supports $O(1)$ worst-case expected sampling time, $O(1)$ amortized time to update a single weight, $O(n)$ space, and $O(n)$ construction. We also provide an implementation for IEEE 64-bit floating-point weights and show experimentally that it is competitive with, and often faster than, several implementations of previous inexact methods while avoiding their numerical failure modes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EBUS (Exact BUcket Sampling), a dynamic sampler for discrete distributions with finite-precision weights. It claims that EBUS is exact by construction—every returned index has probability exactly proportional to its represented weight—when proved in a word RAM model with bounded exponent range. The method achieves O(1) worst-case expected sampling time, O(1) amortized update time per weight, O(n) space, and O(n) construction time. An IEEE 64-bit floating-point implementation is provided along with experiments showing competitiveness with prior inexact methods while avoiding their numerical failures.
Significance. If the model-specific proof and implementation hold, the result is significant: it supplies the first dynamic sampler that is both asymptotically efficient and exactly correct under finite-precision arithmetic, directly addressing the gap between theoretical methods (Alias, Forest of Trees, BUS) that require infinite precision and practical hardware implementations. The provision of a working IEEE-64 implementation and experimental comparison constitutes a concrete, reproducible strength.
major comments (2)
- [Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).
- [Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.
minor comments (2)
- [Algorithm description] Notation for bucket construction and the update procedure would benefit from a small worked example (n=4, explicit weight changes) to make the O(1) amortized claim easier to follow.
- [Experiments] The experimental plots should include error bars or at least the number of trials used to measure sampling and update times.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable feedback on our paper. We address each of the major comments below, indicating how we plan to revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).
Authors: We agree with the referee that the bounded exponent range is essential for the exactness proof. In our word RAM model, the assumption ensures that all intermediate sums and normalizations can be performed exactly without exponent overflow or loss of precision in the mantissa. When this bound is violated, such as through a sequence of updates that cause the total weight sum to exceed the maximum exponent, floating-point operations can result in rounding errors or infinities, violating the proportionality invariant. We will add a concise explanation and a counter-example (e.g., with n=2 and extreme weight updates leading to overflow) in the model section to illustrate this failure mode. revision: yes
-
Referee: [Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.
Authors: We acknowledge the importance of empirical validation for the implementation. While our experiments demonstrate competitiveness and avoidance of obvious failures, we did not include a direct statistical verification against the exact probabilities. We will revise the experimental section to include such a verification: for small instances (n ≤ 10), we will run a large number of samples (e.g., 10^8) and use a chi-squared test or direct frequency comparison against the represented weights computed in higher precision to confirm agreement to machine precision. revision: yes
Circularity Check
No significant circularity; derivation self-contained in explicit model
full rationale
The paper defines EBUS and proves exactness, O(1) sampling, and update bounds directly within a stated word RAM model with bounded exponent range. No load-bearing step reduces by construction to fitted parameters, self-cited uniqueness theorems, or renamed prior results. The central exactness claim is established via the algorithm design and model-specific proof rather than by re-using the target property as an input. This is a standard self-contained algorithmic contribution.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Maintaining discrete probabil- ity distributions in practice
Daniel Allendorf. Maintaining discrete probabil- ity distributions in practice. In Proceedings of the Symposium on Algorithm Engineering and Exper- iments, 2024
work page 2024
-
[2]
Parallel and i/o-efficient algorithms for non-linear preferential attachment
Daniel Allendorf, Ulrich Meyer, Manuel Pen- schuck, and Hung Tran. Parallel and i/o-efficient algorithms for non-linear preferential attachment. In 2023 Proceedings of the Symposium on Algo- rithm Engineering and Experiments (ALENEX) , pages 65–76. SIAM, 2023
work page 2023
-
[3]
Joshua Colvin, Michael I. Monine, Ryan N. Gutenkunst, William S. Hlavacek, Daniel D. Von Hoff, and Richard G. Posner. Rulemonkey: software for stochastic simulation of rule-based models. BMC bioinformatics , 11:1–14, 2010
work page 2010
-
[4]
Federico D’Ambrosio, Hans L. Bodlaender, and Gerard T. Barkema. Dynamic sampling from a discrete probability distribution with a known dis- tribution of rates. Computational Statistics, 2022
work page 2022
-
[5]
Ex- act sublinear binomial sampling
Mart´ ın Farach-Colton and Meng-Tsung Tsai. Ex- act sublinear binomial sampling. Algorithmica, 73:637–651, 2015
work page 2015
-
[6]
Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025
Aldo Glielmo, Mitja Devetak, Adriano Meligrana, and Sebastian Poledna. Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025. 7
work page 2025
-
[7]
Solomon Kullback and Richard A. Leibler. On information and sufficiency. The annals of math- ematical statistics, 22(1):79–86, 1951
work page 1951
-
[8]
Fast random integer generation in an interval
Daniel Lemire. Fast random integer generation in an interval. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 29(1):1– 12, 2019
work page 2019
-
[9]
Divergence measures based on the shannon entropy
Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Informa- tion theory, 37(1):145–151, 2002
work page 2002
-
[10]
Dynamic generation of discrete random variates
Yossi Matias, Jeffrey Vitter, and Ni Wen-Chun. Dynamic generation of discrete random variates. Theory of Computing Systems , 2008
work page 2008
-
[11]
Alexander Slepoy, Aidan Thompson, and Steven Plimpton. A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks. The Journal of chemical physics , 2008
work page 2008
-
[12]
Alastair J. Walker. An efficient method for gener- ating discrete random variables with general dis- tributions. ACM Transactions on Mathematical Software, 1977
work page 1977
-
[13]
C. K. Wong and M. C. Easton. An efficient method for weighted sampling without replace- ment. SIAM Journal on Computing , 9(1):111–113, 1980
work page 1980
-
[14]
Phillips, Michael Matheny, and Feifei Li
Dong Xie, Jeff M. Phillips, Michael Matheny, and Feifei Li. Spatial independent range sampling. In Proceedings of the 2021 International Conference on Management of Data , pages 2023–2035, 2021
work page 2021
-
[15]
Efficient dynamic weighted set sampling and its extension
Fangyuan Zhang, Mengxu Jiang, and Sibo Wang. Efficient dynamic weighted set sampling and its extension. Proceedings of the VLDB Endowment , 2023. 8
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.