Exact and Efficient Sampling from Dynamic Discrete Distributions with Finite-Precision Weights

Adriano Meligrana; Lilith Orion Hafner

arxiv: 2506.14062 · v2 · submitted 2025-06-16 · 💻 cs.DS · math.PR· stat.CO

Exact and Efficient Sampling from Dynamic Discrete Distributions with Finite-Precision Weights

Lilith Orion Hafner , Adriano Meligrana This is my paper

Pith reviewed 2026-05-19 08:43 UTC · model grok-4.3

classification 💻 cs.DS math.PRstat.CO

keywords dynamic samplingdiscrete distributionsfinite precisionexact samplingbucket samplingdata structuresword RAM

0 comments

The pith

EBUS delivers exact sampling from dynamic distributions with finite-precision weights in constant time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EBUS as a dynamic sampler that maintains exact proportionality between weights and selection probabilities even when weights use finite precision such as 64-bit floats. Previous dynamic methods like BUS achieve speed but rely on infinite-precision arithmetic and produce incorrect results on actual hardware. EBUS proves its guarantees in a word RAM model with bounded exponent range, delivering O(1) worst-case expected sampling time, O(1) amortized single-weight updates, O(n) space, and O(n) construction time. Experiments show the IEEE 64-bit implementation runs competitively with or faster than inexact alternatives while eliminating their numerical failures.

Core claim

EBUS is exact by construction: every returned index has probability exactly proportional to its represented weight, with all performance bounds holding in the word RAM model with bounded exponent range.

What carries the argument

Exact BUcket Sampling (EBUS), which uses a bucket-based structure to enforce exact proportionality without infinite-precision real arithmetic.

If this is right

Dynamic updates remain efficient without full table rebuilds required by static methods such as the Alias Method.
Applications needing precise probabilities can use finite-precision weights safely.
The method scales linearly in space and construction while keeping per-operation costs constant in expectation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other bounded-precision numeric types beyond IEEE floats.
It could reduce silent errors in sampling-based algorithms in simulation and optimization.
Implementation in probabilistic programming systems would allow direct use of changing finite-precision distributions.

Load-bearing premise

The analysis assumes a word RAM computational model with bounded exponent range.

What would settle it

Run the 64-bit floating-point implementation on hardware and measure whether observed selection frequencies match the exact represented weights within floating-point representation error.

read the original abstract

Sampling from a dynamic discrete distribution means drawing an index with probability proportional to a mutable set of weights. Classical constant-time techniques such as the Alias Method are well suited to static distributions, but become expensive in dynamic settings because updates require rebuilding auxiliary tables. Existing dynamic approaches, including Forest of Trees and BUcket Sampling (BUS), achieve reasonable practical performance but require infinite precision real arithmetic to be correct and produce meaningfully incorrect results when implemented on real hardware. We present EBUS (Exact BUcket Sampling), a dynamic sampler for finite-precision weights that is exact by construction: every returned index has probability exactly proportional to its represented weight. Our guarantees are proved in a word RAM model with bounded exponent range. In that model, our method supports $O(1)$ worst-case expected sampling time, $O(1)$ amortized time to update a single weight, $O(n)$ space, and $O(n)$ construction. We also provide an implementation for IEEE 64-bit floating-point weights and show experimentally that it is competitive with, and often faster than, several implementations of previous inexact methods while avoiding their numerical failure modes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

EBUS gives exact dynamic sampling for finite-precision weights in a bounded-exponent word RAM model and ships a competitive IEEE-64 implementation.

read the letter

The central point is that EBUS constructs a bucket structure so that sampling returns an index with probability exactly matching its finite-precision weight, something prior dynamic methods could not guarantee on real hardware. The authors prove this in a word RAM model where exponents stay within a fixed range, which keeps all arithmetic representable and preserves exact ratios. They also give O(1) expected sampling, O(1) amortized updates, linear space, and an IEEE 64-bit floating-point version that runs at least as fast as earlier inexact samplers while avoiding their documented numerical failures.

Referee Report

2 major / 2 minor

Summary. The paper introduces EBUS (Exact BUcket Sampling), a dynamic sampler for discrete distributions with finite-precision weights. It claims that EBUS is exact by construction—every returned index has probability exactly proportional to its represented weight—when proved in a word RAM model with bounded exponent range. The method achieves O(1) worst-case expected sampling time, O(1) amortized update time per weight, O(n) space, and O(n) construction time. An IEEE 64-bit floating-point implementation is provided along with experiments showing competitiveness with prior inexact methods while avoiding their numerical failures.

Significance. If the model-specific proof and implementation hold, the result is significant: it supplies the first dynamic sampler that is both asymptotically efficient and exactly correct under finite-precision arithmetic, directly addressing the gap between theoretical methods (Alias, Forest of Trees, BUS) that require infinite precision and practical hardware implementations. The provision of a working IEEE-64 implementation and experimental comparison constitutes a concrete, reproducible strength.

major comments (2)

[Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).
[Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.

minor comments (2)

[Algorithm description] Notation for bucket construction and the update procedure would benefit from a small worked example (n=4, explicit weight changes) to make the O(1) amortized claim easier to follow.
[Experiments] The experimental plots should include error bars or at least the number of trials used to measure sampling and update times.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our paper. We address each of the major comments below, indicating how we plan to revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).

Authors: We agree with the referee that the bounded exponent range is essential for the exactness proof. In our word RAM model, the assumption ensures that all intermediate sums and normalizations can be performed exactly without exponent overflow or loss of precision in the mantissa. When this bound is violated, such as through a sequence of updates that cause the total weight sum to exceed the maximum exponent, floating-point operations can result in rounding errors or infinities, violating the proportionality invariant. We will add a concise explanation and a counter-example (e.g., with n=2 and extreme weight updates leading to overflow) in the model section to illustrate this failure mode. revision: yes
Referee: [Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.

Authors: We acknowledge the importance of empirical validation for the implementation. While our experiments demonstrate competitiveness and avoidance of obvious failures, we did not include a direct statistical verification against the exact probabilities. We will revise the experimental section to include such a verification: for small instances (n ≤ 10), we will run a large number of samples (e.g., 10^8) and use a chi-squared test or direct frequency comparison against the represented weights computed in higher precision to confirm agreement to machine precision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained in explicit model

full rationale

The paper defines EBUS and proves exactness, O(1) sampling, and update bounds directly within a stated word RAM model with bounded exponent range. No load-bearing step reduces by construction to fitted parameters, self-cited uniqueness theorems, or renamed prior results. The central exactness claim is established via the algorithm design and model-specific proof rather than by re-using the target property as an input. This is a standard self-contained algorithmic contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms are described in the abstract; the work relies on the standard word RAM model with bounded exponent range as the setting for its proofs.

pith-pipeline@v0.9.0 · 5728 in / 1153 out tokens · 32436 ms · 2026-05-19T08:43:45.114425+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

[1]

Maintaining discrete probabil- ity distributions in practice

Daniel Allendorf. Maintaining discrete probabil- ity distributions in practice. In Proceedings of the Symposium on Algorithm Engineering and Exper- iments, 2024

work page 2024
[2]

Parallel and i/o-efficient algorithms for non-linear preferential attachment

Daniel Allendorf, Ulrich Meyer, Manuel Pen- schuck, and Hung Tran. Parallel and i/o-efficient algorithms for non-linear preferential attachment. In 2023 Proceedings of the Symposium on Algo- rithm Engineering and Experiments (ALENEX) , pages 65–76. SIAM, 2023

work page 2023
[3]

Monine, Ryan N

Joshua Colvin, Michael I. Monine, Ryan N. Gutenkunst, William S. Hlavacek, Daniel D. Von Hoff, and Richard G. Posner. Rulemonkey: software for stochastic simulation of rule-based models. BMC bioinformatics , 11:1–14, 2010

work page 2010
[4]

Bodlaender, and Gerard T

Federico D’Ambrosio, Hans L. Bodlaender, and Gerard T. Barkema. Dynamic sampling from a discrete probability distribution with a known dis- tribution of rates. Computational Statistics, 2022

work page 2022
[5]

Ex- act sublinear binomial sampling

Mart´ ın Farach-Colton and Meng-Tsung Tsai. Ex- act sublinear binomial sampling. Algorithmica, 73:637–651, 2015

work page 2015
[6]

Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025

Aldo Glielmo, Mitja Devetak, Adriano Meligrana, and Sebastian Poledna. Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025. 7

work page 2025
[7]

Solomon Kullback and Richard A. Leibler. On information and sufficiency. The annals of math- ematical statistics, 22(1):79–86, 1951

work page 1951
[8]

Fast random integer generation in an interval

Daniel Lemire. Fast random integer generation in an interval. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 29(1):1– 12, 2019

work page 2019
[9]

Divergence measures based on the shannon entropy

Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Informa- tion theory, 37(1):145–151, 2002

work page 2002
[10]

Dynamic generation of discrete random variates

Yossi Matias, Jeffrey Vitter, and Ni Wen-Chun. Dynamic generation of discrete random variates. Theory of Computing Systems , 2008

work page 2008
[11]

A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks

Alexander Slepoy, Aidan Thompson, and Steven Plimpton. A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks. The Journal of chemical physics , 2008

work page 2008
[12]

Alastair J. Walker. An efficient method for gener- ating discrete random variables with general dis- tributions. ACM Transactions on Mathematical Software, 1977

work page 1977
[13]

C. K. Wong and M. C. Easton. An efficient method for weighted sampling without replace- ment. SIAM Journal on Computing , 9(1):111–113, 1980

work page 1980
[14]

Phillips, Michael Matheny, and Feifei Li

Dong Xie, Jeff M. Phillips, Michael Matheny, and Feifei Li. Spatial independent range sampling. In Proceedings of the 2021 International Conference on Management of Data , pages 2023–2035, 2021

work page 2021
[15]

Efficient dynamic weighted set sampling and its extension

Fangyuan Zhang, Mengxu Jiang, and Sibo Wang. Efficient dynamic weighted set sampling and its extension. Proceedings of the VLDB Endowment , 2023. 8

work page 2023

[1] [1]

Maintaining discrete probabil- ity distributions in practice

Daniel Allendorf. Maintaining discrete probabil- ity distributions in practice. In Proceedings of the Symposium on Algorithm Engineering and Exper- iments, 2024

work page 2024

[2] [2]

Parallel and i/o-efficient algorithms for non-linear preferential attachment

Daniel Allendorf, Ulrich Meyer, Manuel Pen- schuck, and Hung Tran. Parallel and i/o-efficient algorithms for non-linear preferential attachment. In 2023 Proceedings of the Symposium on Algo- rithm Engineering and Experiments (ALENEX) , pages 65–76. SIAM, 2023

work page 2023

[3] [3]

Monine, Ryan N

Joshua Colvin, Michael I. Monine, Ryan N. Gutenkunst, William S. Hlavacek, Daniel D. Von Hoff, and Richard G. Posner. Rulemonkey: software for stochastic simulation of rule-based models. BMC bioinformatics , 11:1–14, 2010

work page 2010

[4] [4]

Bodlaender, and Gerard T

Federico D’Ambrosio, Hans L. Bodlaender, and Gerard T. Barkema. Dynamic sampling from a discrete probability distribution with a known dis- tribution of rates. Computational Statistics, 2022

work page 2022

[5] [5]

Ex- act sublinear binomial sampling

Mart´ ın Farach-Colton and Meng-Tsung Tsai. Ex- act sublinear binomial sampling. Algorithmica, 73:637–651, 2015

work page 2015

[6] [6]

Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025

Aldo Glielmo, Mitja Devetak, Adriano Meligrana, and Sebastian Poledna. Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025. 7

work page 2025

[7] [7]

Solomon Kullback and Richard A. Leibler. On information and sufficiency. The annals of math- ematical statistics, 22(1):79–86, 1951

work page 1951

[8] [8]

Fast random integer generation in an interval

Daniel Lemire. Fast random integer generation in an interval. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 29(1):1– 12, 2019

work page 2019

[9] [9]

Divergence measures based on the shannon entropy

Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Informa- tion theory, 37(1):145–151, 2002

work page 2002

[10] [10]

Dynamic generation of discrete random variates

Yossi Matias, Jeffrey Vitter, and Ni Wen-Chun. Dynamic generation of discrete random variates. Theory of Computing Systems , 2008

work page 2008

[11] [11]

A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks

Alexander Slepoy, Aidan Thompson, and Steven Plimpton. A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks. The Journal of chemical physics , 2008

work page 2008

[12] [12]

Alastair J. Walker. An efficient method for gener- ating discrete random variables with general dis- tributions. ACM Transactions on Mathematical Software, 1977

work page 1977

[13] [13]

C. K. Wong and M. C. Easton. An efficient method for weighted sampling without replace- ment. SIAM Journal on Computing , 9(1):111–113, 1980

work page 1980

[14] [14]

Phillips, Michael Matheny, and Feifei Li

Dong Xie, Jeff M. Phillips, Michael Matheny, and Feifei Li. Spatial independent range sampling. In Proceedings of the 2021 International Conference on Management of Data , pages 2023–2035, 2021

work page 2021

[15] [15]

Efficient dynamic weighted set sampling and its extension

Fangyuan Zhang, Mengxu Jiang, and Sibo Wang. Efficient dynamic weighted set sampling and its extension. Proceedings of the VLDB Endowment , 2023. 8

work page 2023