Singular Relative Entropy Coding with Bits-Back Rejection Sampling
Pith reviewed 2026-05-10 18:08 UTC · model grok-4.3
The pith
The bits-back rejection sampler achieves sub-logarithmic redundancy for singular channels with simpler analysis and practical implementation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We construct the bits-back rejection sampler (BBRS), a relative entropy code that combines ideas from bits-back coding and (greedy) rejection sampling. Our analysis of BBRS reveals that the algorithm achieves the same asymptotic efficiency as Sriramu and Wagner's sampler, but with much simpler analysis and better constants. Moreover, BBRS can be implemented using standard relative entropy coding methods.
What carries the argument
Bits-back rejection sampler (BBRS), which merges bits-back coding with greedy rejection sampling to produce a relative entropy code for singular channels.
If this is right
- BBRS attains the same asymptotic efficiency as the prior sampler for singular channels.
- The redundancy of BBRS is sub-logarithmic with better constants than previous work.
- BBRS admits a simpler analysis than the construction by Sriramu and Wagner.
- BBRS can be realized using only standard relative entropy coding methods.
Where Pith is reading between the lines
- BBRS could support practical coding schemes in settings like distributed source coding where singular conditional distributions appear.
- The combination technique might generalize to reduce redundancy in other channel families if suitable rejection steps can be defined.
- Better constants may yield measurable gains in finite-blocklength performance for stochastic encoders.
- Empirical tests on concrete singular channel examples could quantify the actual overhead of BBRS implementations.
Load-bearing premise
The target channels must satisfy the singularity condition, and combining bits-back coding with greedy rejection sampling must not create additional overhead that exceeds the sub-logarithmic redundancy bounds.
What would settle it
A proof or simulation for a specific singular channel showing that BBRS redundancy grows faster than sub-logarithmically, or that it requires non-standard relative entropy coding primitives, would disprove the central efficiency claim.
read the original abstract
A relative entropy code for a source $X \sim P_X$ is a stochastic code that encodes random samples from a prescribed $P_{Y \mid X}$ using as few bits as possible. A generalisation of entropy coding, it is a standard result that the minimum number of bits required to achieve this is at least the mutual information $I[X\,\Vert\,Y]$. However, a particularly fascinating feature of relative entropy coding compared to entropy coding is that, in general, this lower bound is only achievable to within an additional logarithmic factor. As such, an important research direction is to identify channels where we can reduce this gap. Sriramu and Wagner achieved such success by exhibiting a relative entropy code for so-called singular channels with sub-logarithmic asymptotic redundancy. However, their code is quite involved and, sadly, cannot be implemented in practice. In this paper, we construct the bits-back rejection sampler (BBRS), a relative entropy code that combines ideas from bits-back coding and (greedy) rejection sampling. Our analysis of BBRS reveals that the algorithm achieves the same asymptotic efficiency as Sriramu and Wagner's sampler, but with much simpler analysis and better constants. Moreover, BBRS can be implemented using standard relative entropy coding methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the bits-back rejection sampler (BBRS), a relative entropy code for singular channels that combines bits-back coding with greedy rejection sampling. It claims that BBRS matches the sub-logarithmic asymptotic redundancy of the Sriramu-Wagner sampler, while offering simpler analysis, improved constants, and practical implementation via standard relative entropy coding primitives.
Significance. If the central claims hold, the work would advance relative entropy coding by providing an implementable method to achieve near-optimal performance (sub-logarithmic redundancy) for singular channels, where prior constructions were either non-implementable or more complex. The emphasis on simpler analysis and standard primitives is a clear strength that could broaden accessibility in information theory applications.
major comments (2)
- [Analysis of BBRS (post-definition)] The central claim that BBRS achieves the Sriramu-Wagner sub-logarithmic redundancy bound (up to better constants) without extra overhead rests on the unverified interaction between the bits-back auxiliary distribution and the rejection threshold. The manuscript must provide an explicit asymptotic expansion (likely in the analysis section following the BBRS definition) demonstrating that the acceptance probability for singular channels and the bits-back reuse introduce no additive Ω(1) or Ω(log n) term in the redundancy.
- [Implementation description] The abstract asserts implementability with standard relative entropy coding methods, but the manuscript should include a concrete integration step (e.g., pseudocode or a numbered algorithm) showing how the rejection sampling step interfaces with the bits-back mechanism without reintroducing a logarithmic gap; this is load-bearing for the practicality claim.
minor comments (1)
- The abstract would benefit from a one-sentence reminder of the definition of singular channels (as used in the cited prior work) to improve accessibility.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment of the work's significance, and constructive suggestions. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and additions.
read point-by-point responses
-
Referee: The central claim that BBRS achieves the Sriramu-Wagner sub-logarithmic redundancy bound (up to better constants) without extra overhead rests on the unverified interaction between the bits-back auxiliary distribution and the rejection threshold. The manuscript must provide an explicit asymptotic expansion (likely in the analysis section following the BBRS definition) demonstrating that the acceptance probability for singular channels and the bits-back reuse introduce no additive Ω(1) or Ω(log n) term in the redundancy.
Authors: We thank the referee for highlighting the need for explicit verification of this interaction. In the analysis of BBRS (Section 3, following the definition), the redundancy is bounded by first computing the expected number of bits from the bits-back coding step and then accounting for the geometric number of rejection trials. For singular channels, the acceptance probability is shown to be 1 - o(1/n^ε) for some ε>0, which contributes only a sub-logarithmic term; the bits-back auxiliary samples are independent of the rejection threshold and reuse exactly the entropy deficit without introducing additive constants or log n terms. To address the request directly, we have added a new lemma (Lemma 3.4) that provides the full asymptotic expansion of the total redundancy, explicitly confirming the absence of Ω(1) or Ω(log n) overheads beyond the sub-logarithmic term that matches the Sriramu-Wagner bound (with improved constants). revision: yes
-
Referee: The abstract asserts implementability with standard relative entropy coding methods, but the manuscript should include a concrete integration step (e.g., pseudocode or a numbered algorithm) showing how the rejection sampling step interfaces with the bits-back mechanism without reintroducing a logarithmic gap; this is load-bearing for the practicality claim.
Authors: We agree that an explicit integration description strengthens the practicality claim. We have added a new subsection (Section 4.2) and Algorithm 1 that details the interface: the bits-back coder first generates the auxiliary sample Y' ~ Q_{Y|X} using a standard relative entropy code; the rejection decision is then made by comparing the likelihood ratio P(Y|X)/Q(Y|X) against a uniform [0,1] variate also generated via bits-back (exploiting the same auxiliary entropy); accepted samples are output and any unused bits are returned to the pool. This construction ensures the uniform variates incur no extra logarithmic cost because they are drawn from the bits-back surplus. The algorithm uses only standard primitives (relative entropy coding for the auxiliary and a single comparison), confirming no reintroduction of a logarithmic gap. revision: yes
Circularity Check
No significant circularity; BBRS analysis is self-contained
full rationale
The paper presents a new construction (bits-back rejection sampler) that combines existing ideas from bits-back coding and greedy rejection sampling, then provides its own analysis showing asymptotic equivalence to the externally cited Sriramu-Wagner sampler (with improved constants and simpler proof). The implementation claim rests on using standard relative-entropy coding primitives rather than any self-referential fit or redefinition. No equations or steps in the abstract or described derivation reduce the claimed sub-logarithmic redundancy bound to a fitted parameter, a self-citation chain, or a renaming of the input; the central efficiency result is positioned as the output of the paper's analysis rather than an input by construction. This matches the default expectation of a non-circular algorithmic paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The target channels are singular in the sense used by Sriramu and Wagner.
invented entities (1)
-
bits-back rejection sampler (BBRS)
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Rejection Sampling is Optimal for Relative Entropy Coding
Rejection sampling achieves the functional information lower bound for relative entropy coding within log e bits, providing the tightest known one-shot bounds.
Reference graph
Works this paper leans on
-
[1]
Optimal redundancy in exact channel synthesis,
S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” in2024 International Symposium on Information Theory (ISIT), pp. 1913–1918, IEEE, 2024
work page 1913
-
[2]
Data compression with stochastic codes,
G. Flamich and D. G ¨und¨uz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026
-
[3]
The commu- nication complexity of correlation,
P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010
work page 2010
-
[4]
Channel simulation: Theory and applications to lossy com- pression and differential privacy,
C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024
work page 2024
-
[5]
Strong functional representation lemma and applications to coding theorems,
C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018
work page 2018
-
[6]
Public vs private coin in bounded-round information,
M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014
work page 2014
-
[7]
Flamich,Data Compression with Relative Entropy Coding
G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024
work page 2024
-
[8]
The redundancy of non- singular channel simulation,
G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” in2025 IEEE International Symposium on Information Theory (ISIT), pp. 1–6, IEEE, 2025
work page 2025
-
[9]
Refinement of the random coding bound,
Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014
work page 2014
-
[10]
Adaptive greedy rejection sampling,
G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” in2023 IEEE International Symposium on Information Theory (ISIT), pp. 454– 459, IEEE, 2023
work page 2023
-
[11]
Keeping the neural networks simple by minimizing the description length of the weights,
G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” inProceedings of the sixth annual conference on Computational learning theory, pp. 5–13, 1993
work page 1993
-
[12]
Practical lossless compression with latent variables using bits back coding,
J. Townsend, T. Bird, and D. Barber, “Practical lossless compression with latent variables using bits back coding,” inInternational Conference on Learning Representations, 2019
work page 2019
-
[13]
Arithmetic coding for data compression,
I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,”Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987
work page 1987
-
[14]
The use of asymmetric numeral systems as an accurate replacement for huffman coding,
J. Duda, K. Tahboub, N. J. Gadgil, and E. J. Delp, “The use of asymmetric numeral systems as an accurate replacement for huffman coding,” in2015 Picture Coding Symposium (PCS), pp. 65–69, IEEE, 2015
work page 2015
-
[15]
Understanding entropy coding with asymmetric nu- meral systems (ans): a statistician’s perspective,
R. Bamler, “Understanding entropy coding with asymmetric nu- meral systems (ans): a statistician’s perspective,”arXiv preprint arXiv:2201.01741, 2022
-
[16]
T. M. Cover and J. A. Thomas,Elements of Information Theory. John Wiley & Sons, 2012
work page 2012
-
[17]
The complexity of nonuniform random number generation,
D. Knuth, “The complexity of nonuniform random number generation,” Algorithm and Complexity, New Directions and Results, 1976
work page 1976
-
[18]
Entropy coding of unordered data structures,
J. Kunze, D. Severo, G. Zani, J.-W. van de Meent, and J. Townsend, “Entropy coding of unordered data structures,” inThe Twelfth Interna- tional Conference on Learning Representations, 2024
work page 2024
-
[19]
A unified framework for one-shot achievability via the poisson matching lemma,
C. T. Li and V . Anantharam, “A unified framework for one-shot achievability via the poisson matching lemma,”IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2624–2651, 2021. APPENDIXA MINORTECHNICALDETAILS FORBBRS Here, we show thatM(X,Γ)is a valid upper bound on the density ratio of rejection sampling in the second stage. This verifies the c...
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.