pith. sign in

arxiv: 2604.06055 · v1 · submitted 2026-04-07 · 💻 cs.IT · math.IT

Singular Relative Entropy Coding with Bits-Back Rejection Sampling

Pith reviewed 2026-05-10 18:08 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords relative entropy codingbits-back codingrejection samplingsingular channelsredundancy boundsmutual informationasymptotic efficiency
0
0 comments X

The pith

The bits-back rejection sampler achieves sub-logarithmic redundancy for singular channels with simpler analysis and practical implementation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper constructs the bits-back rejection sampler, or BBRS, as a relative entropy code for singular channels by combining bits-back coding with greedy rejection sampling. It establishes that BBRS matches the asymptotic efficiency of a prior complex sampler, including sub-logarithmic redundancy, while offering simpler analysis, better constants, and the ability to use standard relative entropy coding tools. A sympathetic reader would care because relative entropy coding seeks to approach the mutual information lower bound, and reducing the extra cost below a logarithmic factor improves efficiency when encoding conditional samples. The practical implementability distinguishes this approach from earlier theoretical constructions.

Core claim

We construct the bits-back rejection sampler (BBRS), a relative entropy code that combines ideas from bits-back coding and (greedy) rejection sampling. Our analysis of BBRS reveals that the algorithm achieves the same asymptotic efficiency as Sriramu and Wagner's sampler, but with much simpler analysis and better constants. Moreover, BBRS can be implemented using standard relative entropy coding methods.

What carries the argument

Bits-back rejection sampler (BBRS), which merges bits-back coding with greedy rejection sampling to produce a relative entropy code for singular channels.

If this is right

  • BBRS attains the same asymptotic efficiency as the prior sampler for singular channels.
  • The redundancy of BBRS is sub-logarithmic with better constants than previous work.
  • BBRS admits a simpler analysis than the construction by Sriramu and Wagner.
  • BBRS can be realized using only standard relative entropy coding methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • BBRS could support practical coding schemes in settings like distributed source coding where singular conditional distributions appear.
  • The combination technique might generalize to reduce redundancy in other channel families if suitable rejection steps can be defined.
  • Better constants may yield measurable gains in finite-blocklength performance for stochastic encoders.
  • Empirical tests on concrete singular channel examples could quantify the actual overhead of BBRS implementations.

Load-bearing premise

The target channels must satisfy the singularity condition, and combining bits-back coding with greedy rejection sampling must not create additional overhead that exceeds the sub-logarithmic redundancy bounds.

What would settle it

A proof or simulation for a specific singular channel showing that BBRS redundancy grows faster than sub-logarithmically, or that it requires non-standard relative entropy coding primitives, would disprove the central efficiency claim.

read the original abstract

A relative entropy code for a source $X \sim P_X$ is a stochastic code that encodes random samples from a prescribed $P_{Y \mid X}$ using as few bits as possible. A generalisation of entropy coding, it is a standard result that the minimum number of bits required to achieve this is at least the mutual information $I[X\,\Vert\,Y]$. However, a particularly fascinating feature of relative entropy coding compared to entropy coding is that, in general, this lower bound is only achievable to within an additional logarithmic factor. As such, an important research direction is to identify channels where we can reduce this gap. Sriramu and Wagner achieved such success by exhibiting a relative entropy code for so-called singular channels with sub-logarithmic asymptotic redundancy. However, their code is quite involved and, sadly, cannot be implemented in practice. In this paper, we construct the bits-back rejection sampler (BBRS), a relative entropy code that combines ideas from bits-back coding and (greedy) rejection sampling. Our analysis of BBRS reveals that the algorithm achieves the same asymptotic efficiency as Sriramu and Wagner's sampler, but with much simpler analysis and better constants. Moreover, BBRS can be implemented using standard relative entropy coding methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the bits-back rejection sampler (BBRS), a relative entropy code for singular channels that combines bits-back coding with greedy rejection sampling. It claims that BBRS matches the sub-logarithmic asymptotic redundancy of the Sriramu-Wagner sampler, while offering simpler analysis, improved constants, and practical implementation via standard relative entropy coding primitives.

Significance. If the central claims hold, the work would advance relative entropy coding by providing an implementable method to achieve near-optimal performance (sub-logarithmic redundancy) for singular channels, where prior constructions were either non-implementable or more complex. The emphasis on simpler analysis and standard primitives is a clear strength that could broaden accessibility in information theory applications.

major comments (2)
  1. [Analysis of BBRS (post-definition)] The central claim that BBRS achieves the Sriramu-Wagner sub-logarithmic redundancy bound (up to better constants) without extra overhead rests on the unverified interaction between the bits-back auxiliary distribution and the rejection threshold. The manuscript must provide an explicit asymptotic expansion (likely in the analysis section following the BBRS definition) demonstrating that the acceptance probability for singular channels and the bits-back reuse introduce no additive Ω(1) or Ω(log n) term in the redundancy.
  2. [Implementation description] The abstract asserts implementability with standard relative entropy coding methods, but the manuscript should include a concrete integration step (e.g., pseudocode or a numbered algorithm) showing how the rejection sampling step interfaces with the bits-back mechanism without reintroducing a logarithmic gap; this is load-bearing for the practicality claim.
minor comments (1)
  1. The abstract would benefit from a one-sentence reminder of the definition of singular channels (as used in the cited prior work) to improve accessibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading, positive assessment of the work's significance, and constructive suggestions. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and additions.

read point-by-point responses
  1. Referee: The central claim that BBRS achieves the Sriramu-Wagner sub-logarithmic redundancy bound (up to better constants) without extra overhead rests on the unverified interaction between the bits-back auxiliary distribution and the rejection threshold. The manuscript must provide an explicit asymptotic expansion (likely in the analysis section following the BBRS definition) demonstrating that the acceptance probability for singular channels and the bits-back reuse introduce no additive Ω(1) or Ω(log n) term in the redundancy.

    Authors: We thank the referee for highlighting the need for explicit verification of this interaction. In the analysis of BBRS (Section 3, following the definition), the redundancy is bounded by first computing the expected number of bits from the bits-back coding step and then accounting for the geometric number of rejection trials. For singular channels, the acceptance probability is shown to be 1 - o(1/n^ε) for some ε>0, which contributes only a sub-logarithmic term; the bits-back auxiliary samples are independent of the rejection threshold and reuse exactly the entropy deficit without introducing additive constants or log n terms. To address the request directly, we have added a new lemma (Lemma 3.4) that provides the full asymptotic expansion of the total redundancy, explicitly confirming the absence of Ω(1) or Ω(log n) overheads beyond the sub-logarithmic term that matches the Sriramu-Wagner bound (with improved constants). revision: yes

  2. Referee: The abstract asserts implementability with standard relative entropy coding methods, but the manuscript should include a concrete integration step (e.g., pseudocode or a numbered algorithm) showing how the rejection sampling step interfaces with the bits-back mechanism without reintroducing a logarithmic gap; this is load-bearing for the practicality claim.

    Authors: We agree that an explicit integration description strengthens the practicality claim. We have added a new subsection (Section 4.2) and Algorithm 1 that details the interface: the bits-back coder first generates the auxiliary sample Y' ~ Q_{Y|X} using a standard relative entropy code; the rejection decision is then made by comparing the likelihood ratio P(Y|X)/Q(Y|X) against a uniform [0,1] variate also generated via bits-back (exploiting the same auxiliary entropy); accepted samples are output and any unused bits are returned to the pool. This construction ensures the uniform variates incur no extra logarithmic cost because they are drawn from the bits-back surplus. The algorithm uses only standard primitives (relative entropy coding for the auxiliary and a single comparison), confirming no reintroduction of a logarithmic gap. revision: yes

Circularity Check

0 steps flagged

No significant circularity; BBRS analysis is self-contained

full rationale

The paper presents a new construction (bits-back rejection sampler) that combines existing ideas from bits-back coding and greedy rejection sampling, then provides its own analysis showing asymptotic equivalence to the externally cited Sriramu-Wagner sampler (with improved constants and simpler proof). The implementation claim rests on using standard relative-entropy coding primitives rather than any self-referential fit or redefinition. No equations or steps in the abstract or described derivation reduce the claimed sub-logarithmic redundancy bound to a fitted parameter, a self-citation chain, or a renaming of the input; the central efficiency result is positioned as the output of the paper's analysis rather than an input by construction. This matches the default expectation of a non-circular algorithmic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on the definition of singular channels from prior work and introduces the BBRS algorithm as a new entity without additional fitted parameters visible in the abstract.

axioms (1)
  • domain assumption The target channels are singular in the sense used by Sriramu and Wagner.
    The performance claims are stated specifically for singular channels.
invented entities (1)
  • bits-back rejection sampler (BBRS) no independent evidence
    purpose: A practical relative entropy code achieving sub-logarithmic redundancy on singular channels.
    New algorithm introduced and analyzed in the paper.

pith-pipeline@v0.9.0 · 5517 in / 1189 out tokens · 43628 ms · 2026-05-10T18:08:29.951249+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rejection Sampling is Optimal for Relative Entropy Coding

    cs.IT 2026-04 unverdicted novelty 7.0

    Rejection sampling achieves the functional information lower bound for relative entropy coding within log e bits, providing the tightest known one-shot bounds.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper

  1. [1]

    Optimal redundancy in exact channel synthesis,

    S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” in2024 International Symposium on Information Theory (ISIT), pp. 1913–1918, IEEE, 2024

  2. [2]

    Data compression with stochastic codes,

    G. Flamich and D. G ¨und¨uz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026

  3. [3]

    The commu- nication complexity of correlation,

    P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010

  4. [4]

    Channel simulation: Theory and applications to lossy com- pression and differential privacy,

    C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024

  5. [5]

    Strong functional representation lemma and applications to coding theorems,

    C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018

  6. [6]

    Public vs private coin in bounded-round information,

    M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014

  7. [7]

    Flamich,Data Compression with Relative Entropy Coding

    G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024

  8. [8]

    The redundancy of non- singular channel simulation,

    G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” in2025 IEEE International Symposium on Information Theory (ISIT), pp. 1–6, IEEE, 2025

  9. [9]

    Refinement of the random coding bound,

    Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014

  10. [10]

    Adaptive greedy rejection sampling,

    G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” in2023 IEEE International Symposium on Information Theory (ISIT), pp. 454– 459, IEEE, 2023

  11. [11]

    Keeping the neural networks simple by minimizing the description length of the weights,

    G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” inProceedings of the sixth annual conference on Computational learning theory, pp. 5–13, 1993

  12. [12]

    Practical lossless compression with latent variables using bits back coding,

    J. Townsend, T. Bird, and D. Barber, “Practical lossless compression with latent variables using bits back coding,” inInternational Conference on Learning Representations, 2019

  13. [13]

    Arithmetic coding for data compression,

    I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,”Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987

  14. [14]

    The use of asymmetric numeral systems as an accurate replacement for huffman coding,

    J. Duda, K. Tahboub, N. J. Gadgil, and E. J. Delp, “The use of asymmetric numeral systems as an accurate replacement for huffman coding,” in2015 Picture Coding Symposium (PCS), pp. 65–69, IEEE, 2015

  15. [15]

    Understanding entropy coding with asymmetric nu- meral systems (ans): a statistician’s perspective,

    R. Bamler, “Understanding entropy coding with asymmetric nu- meral systems (ans): a statistician’s perspective,”arXiv preprint arXiv:2201.01741, 2022

  16. [16]

    T. M. Cover and J. A. Thomas,Elements of Information Theory. John Wiley & Sons, 2012

  17. [17]

    The complexity of nonuniform random number generation,

    D. Knuth, “The complexity of nonuniform random number generation,” Algorithm and Complexity, New Directions and Results, 1976

  18. [18]

    Entropy coding of unordered data structures,

    J. Kunze, D. Severo, G. Zani, J.-W. van de Meent, and J. Townsend, “Entropy coding of unordered data structures,” inThe Twelfth Interna- tional Conference on Learning Representations, 2024

  19. [19]

    A unified framework for one-shot achievability via the poisson matching lemma,

    C. T. Li and V . Anantharam, “A unified framework for one-shot achievability via the poisson matching lemma,”IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2624–2651, 2021. APPENDIXA MINORTECHNICALDETAILS FORBBRS Here, we show thatM(X,Γ)is a valid upper bound on the density ratio of rejection sampling in the second stage. This verifies the c...