pith. sign in

arxiv: 2506.14062 · v2 · submitted 2025-06-16 · 💻 cs.DS · math.PR· stat.CO

Exact and Efficient Sampling from Dynamic Discrete Distributions with Finite-Precision Weights

Pith reviewed 2026-05-19 08:43 UTC · model grok-4.3

classification 💻 cs.DS math.PRstat.CO
keywords dynamic samplingdiscrete distributionsfinite precisionexact samplingbucket samplingdata structuresword RAM
0
0 comments X

The pith

EBUS delivers exact sampling from dynamic distributions with finite-precision weights in constant time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EBUS as a dynamic sampler that maintains exact proportionality between weights and selection probabilities even when weights use finite precision such as 64-bit floats. Previous dynamic methods like BUS achieve speed but rely on infinite-precision arithmetic and produce incorrect results on actual hardware. EBUS proves its guarantees in a word RAM model with bounded exponent range, delivering O(1) worst-case expected sampling time, O(1) amortized single-weight updates, O(n) space, and O(n) construction time. Experiments show the IEEE 64-bit implementation runs competitively with or faster than inexact alternatives while eliminating their numerical failures.

Core claim

EBUS is exact by construction: every returned index has probability exactly proportional to its represented weight, with all performance bounds holding in the word RAM model with bounded exponent range.

What carries the argument

Exact BUcket Sampling (EBUS), which uses a bucket-based structure to enforce exact proportionality without infinite-precision real arithmetic.

If this is right

  • Dynamic updates remain efficient without full table rebuilds required by static methods such as the Alias Method.
  • Applications needing precise probabilities can use finite-precision weights safely.
  • The method scales linearly in space and construction while keeping per-operation costs constant in expectation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may generalize to other bounded-precision numeric types beyond IEEE floats.
  • It could reduce silent errors in sampling-based algorithms in simulation and optimization.
  • Implementation in probabilistic programming systems would allow direct use of changing finite-precision distributions.

Load-bearing premise

The analysis assumes a word RAM computational model with bounded exponent range.

What would settle it

Run the 64-bit floating-point implementation on hardware and measure whether observed selection frequencies match the exact represented weights within floating-point representation error.

read the original abstract

Sampling from a dynamic discrete distribution means drawing an index with probability proportional to a mutable set of weights. Classical constant-time techniques such as the Alias Method are well suited to static distributions, but become expensive in dynamic settings because updates require rebuilding auxiliary tables. Existing dynamic approaches, including Forest of Trees and BUcket Sampling (BUS), achieve reasonable practical performance but require infinite precision real arithmetic to be correct and produce meaningfully incorrect results when implemented on real hardware. We present EBUS (Exact BUcket Sampling), a dynamic sampler for finite-precision weights that is exact by construction: every returned index has probability exactly proportional to its represented weight. Our guarantees are proved in a word RAM model with bounded exponent range. In that model, our method supports $O(1)$ worst-case expected sampling time, $O(1)$ amortized time to update a single weight, $O(n)$ space, and $O(n)$ construction. We also provide an implementation for IEEE 64-bit floating-point weights and show experimentally that it is competitive with, and often faster than, several implementations of previous inexact methods while avoiding their numerical failure modes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces EBUS (Exact BUcket Sampling), a dynamic sampler for discrete distributions with finite-precision weights. It claims that EBUS is exact by construction—every returned index has probability exactly proportional to its represented weight—when proved in a word RAM model with bounded exponent range. The method achieves O(1) worst-case expected sampling time, O(1) amortized update time per weight, O(n) space, and O(n) construction time. An IEEE 64-bit floating-point implementation is provided along with experiments showing competitiveness with prior inexact methods while avoiding their numerical failures.

Significance. If the model-specific proof and implementation hold, the result is significant: it supplies the first dynamic sampler that is both asymptotically efficient and exactly correct under finite-precision arithmetic, directly addressing the gap between theoretical methods (Alias, Forest of Trees, BUS) that require infinite precision and practical hardware implementations. The provision of a working IEEE-64 implementation and experimental comparison constitutes a concrete, reproducible strength.

major comments (2)
  1. [Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).
  2. [Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.
minor comments (2)
  1. [Algorithm description] Notation for bucket construction and the update procedure would benefit from a small worked example (n=4, explicit weight changes) to make the O(1) amortized claim easier to follow.
  2. [Experiments] The experimental plots should include error bars or at least the number of trials used to measure sampling and update times.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our paper. We address each of the major comments below, indicating how we plan to revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract and model section] Abstract and § on model and guarantees: the central exactness claim is proved only inside the word RAM model with bounded exponent range. This assumption is load-bearing; the manuscript should supply a short but explicit argument or counter-example showing where the proportionality invariant fails once the exponent bound is violated (e.g., after a sequence of updates that force normalization outside the assumed range).

    Authors: We agree with the referee that the bounded exponent range is essential for the exactness proof. In our word RAM model, the assumption ensures that all intermediate sums and normalizations can be performed exactly without exponent overflow or loss of precision in the mantissa. When this bound is violated, such as through a sequence of updates that cause the total weight sum to exceed the maximum exponent, floating-point operations can result in rounding errors or infinities, violating the proportionality invariant. We will add a concise explanation and a counter-example (e.g., with n=2 and extreme weight updates leading to overflow) in the model section to illustrate this failure mode. revision: yes

  2. Referee: [Implementation and experiments] Implementation and experimental sections: the IEEE-64 version is presented as avoiding the numerical failure modes of prior methods, yet the paper does not report a direct verification (e.g., exhaustive enumeration on small n or statistical test against the exact multinomial) that the observed frequencies match the represented weights to machine precision. Such a check is needed to confirm that the floating-point realization inherits the model guarantee.

    Authors: We acknowledge the importance of empirical validation for the implementation. While our experiments demonstrate competitiveness and avoidance of obvious failures, we did not include a direct statistical verification against the exact probabilities. We will revise the experimental section to include such a verification: for small instances (n ≤ 10), we will run a large number of samples (e.g., 10^8) and use a chi-squared test or direct frequency comparison against the represented weights computed in higher precision to confirm agreement to machine precision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained in explicit model

full rationale

The paper defines EBUS and proves exactness, O(1) sampling, and update bounds directly within a stated word RAM model with bounded exponent range. No load-bearing step reduces by construction to fitted parameters, self-cited uniqueness theorems, or renamed prior results. The central exactness claim is established via the algorithm design and model-specific proof rather than by re-using the target property as an input. This is a standard self-contained algorithmic contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, invented entities, or non-standard axioms are described in the abstract; the work relies on the standard word RAM model with bounded exponent range as the setting for its proofs.

pith-pipeline@v0.9.0 · 5728 in / 1153 out tokens · 32436 ms · 2026-05-19T08:43:45.114425+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    Maintaining discrete probabil- ity distributions in practice

    Daniel Allendorf. Maintaining discrete probabil- ity distributions in practice. In Proceedings of the Symposium on Algorithm Engineering and Exper- iments, 2024

  2. [2]

    Parallel and i/o-efficient algorithms for non-linear preferential attachment

    Daniel Allendorf, Ulrich Meyer, Manuel Pen- schuck, and Hung Tran. Parallel and i/o-efficient algorithms for non-linear preferential attachment. In 2023 Proceedings of the Symposium on Algo- rithm Engineering and Experiments (ALENEX) , pages 65–76. SIAM, 2023

  3. [3]

    Monine, Ryan N

    Joshua Colvin, Michael I. Monine, Ryan N. Gutenkunst, William S. Hlavacek, Daniel D. Von Hoff, and Richard G. Posner. Rulemonkey: software for stochastic simulation of rule-based models. BMC bioinformatics , 11:1–14, 2010

  4. [4]

    Bodlaender, and Gerard T

    Federico D’Ambrosio, Hans L. Bodlaender, and Gerard T. Barkema. Dynamic sampling from a discrete probability distribution with a known dis- tribution of rates. Computational Statistics, 2022

  5. [5]

    Ex- act sublinear binomial sampling

    Mart´ ın Farach-Colton and Meng-Tsung Tsai. Ex- act sublinear binomial sampling. Algorithmica, 73:637–651, 2015

  6. [6]

    Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025

    Aldo Glielmo, Mitja Devetak, Adriano Meligrana, and Sebastian Poledna. Beforeit.jl: High- performance agent-based macroeconomics made easy, 2025. 7

  7. [7]

    Solomon Kullback and Richard A. Leibler. On information and sufficiency. The annals of math- ematical statistics, 22(1):79–86, 1951

  8. [8]

    Fast random integer generation in an interval

    Daniel Lemire. Fast random integer generation in an interval. ACM Transactions on Modeling and Computer Simulation (TOMACS) , 29(1):1– 12, 2019

  9. [9]

    Divergence measures based on the shannon entropy

    Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Informa- tion theory, 37(1):145–151, 2002

  10. [10]

    Dynamic generation of discrete random variates

    Yossi Matias, Jeffrey Vitter, and Ni Wen-Chun. Dynamic generation of discrete random variates. Theory of Computing Systems , 2008

  11. [11]

    A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks

    Alexander Slepoy, Aidan Thompson, and Steven Plimpton. A constant-time kinetic monte carlo algorithm for simulation of large biochemical re- action networks. The Journal of chemical physics , 2008

  12. [12]

    Alastair J. Walker. An efficient method for gener- ating discrete random variables with general dis- tributions. ACM Transactions on Mathematical Software, 1977

  13. [13]

    C. K. Wong and M. C. Easton. An efficient method for weighted sampling without replace- ment. SIAM Journal on Computing , 9(1):111–113, 1980

  14. [14]

    Phillips, Michael Matheny, and Feifei Li

    Dong Xie, Jeff M. Phillips, Michael Matheny, and Feifei Li. Spatial independent range sampling. In Proceedings of the 2021 International Conference on Management of Data , pages 2023–2035, 2021

  15. [15]

    Efficient dynamic weighted set sampling and its extension

    Fangyuan Zhang, Mengxu Jiang, and Sibo Wang. Efficient dynamic weighted set sampling and its extension. Proceedings of the VLDB Endowment , 2023. 8