Rejection Sampling is Optimal for Relative Entropy Coding
Pith reviewed 2026-05-08 07:21 UTC · model grok-4.3
The pith
The ring toss code achieves the functional information lower bound for relative entropy coding within log e bits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We construct the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits. This shows that the lower bound of functional information is tight. For the trivial channel Y = X, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within log(I(X; Y) + 1) + 2.45 bits in general and within 1.45 bits for singular channels. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.
What carries the argument
The ring toss code, a stochastic encoding procedure for rejection sampling that attains the functional information bound.
If this is right
- For the channel where Y equals X, the construction recovers the noiseless source coding theorem up to a small constant.
- The mutual information lower bound becomes achievable within log(I(X; Y) + 1) + 2.45 bits for general channels.
- For singular channels the gap to mutual information shrinks to 1.45 bits.
- The one-shot bounds imply the known second-order asymptotic redundancy results for relative entropy coding.
Where Pith is reading between the lines
- Functional information may serve as the natural rate measure in other one-shot sampling or generation tasks beyond relative entropy coding.
- The ring toss construction could be adapted to continuous alphabets or multiple independent uses to obtain finite-blocklength improvements.
- The small remaining gap suggests that exact functional-information achievability might hold with only a vanishing additive term in the asymptotic regime.
Load-bearing premise
Functional information is a valid lower bound on the number of bits needed, and the ring toss code construction applies to arbitrary channels and distributions.
What would settle it
A concrete channel P_{Y|X} and input distribution where every possible code requires more than I_F(X → Y) + log e bits on average, or where the ring toss code itself exceeds this quantity.
Figures
read the original abstract
In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. For the trivial channel $Y = X$, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to resolve the optimality of the functional information lower bound for relative entropy coding by introducing the ring toss code, a rejection sampling encoding that uses at most I_F(X → Y) + log e bits. This construction is shown to be tight, recovers the noiseless source coding theorem for the identity channel within a small constant, provides improved bounds over mutual information (log(I(X;Y) + 1) + 2.45 bits generally and 1.45 bits for singular channels), and recovers asymptotic second-order results.
Significance. This is a significant contribution to information theory if the construction holds, as it provides the first explicit achievability result matching the functional information bound up to a small additive constant, establishing it as the appropriate measure for one-shot relative entropy coding. The new construction is independent of prior lower bounds and offers the tightest known bounds, with the recovery of classical results adding to its strength.
major comments (2)
- [Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.
- [Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.
minor comments (2)
- [Abstract] The abstract refers to 'Sriramu and Wagner's asymptotic results'; adding a citation or brief recap in the introduction would improve clarity for readers unfamiliar with the reference.
- [Notation] The use of I_F(X → Y) for functional information is clear from context, but ensure consistent definition and comparison to I(X;Y) throughout the paper.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and will incorporate clarifications in the revised version.
read point-by-point responses
-
Referee: [Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.
Authors: We appreciate the referee highlighting the need for explicit detail here. The ring toss code in Section 3 is a rejection sampling procedure defined for general channels P_{Y|X} (including continuous and singular cases) by selecting a proposal distribution that dominates the target, as required to define the functional information I_F(X → Y) itself. The expected number of trials is finite and at most 2^{I_F} by construction, since the acceptance probability is lower-bounded using the definition of I_F. The ring toss mechanism then encodes the (random) number of trials with an overhead of at most log_2 e bits, yielding the claimed bound without further restrictions on the channel beyond those needed for I_F to be well-defined and finite. To address the comment directly, we will add a clarifying paragraph in the construction section explicitly stating the role of the dominating measure and confirming the bound holds under standard measurability assumptions. This strengthens the presentation but does not change the main result. revision: yes
-
Referee: [Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.
Authors: The constants are derived directly from the main achievability theorem without additional assumptions. From the relation I_F(X → Y) ≤ log_2(I(X; Y) + 1) (which follows from the definitions of functional and mutual information), adding the log_2 e ≈ 1.4427-bit overhead from the ring toss code gives log_2(I(X; Y) + 1) + 1.4427 bits; the reported +2.45 incorporates an extra additive term of roughly 1 bit arising from the precise overhead analysis in the general (non-singular) case. For singular channels the I_F-to-I(X; Y) relation is stricter, reducing the total additive constant to approximately 1.45 bits. These steps appear in the implications section and rely solely on the ring toss construction and standard inequalities. We will revise the manuscript to include an explicit derivation of the numerical values (e.g., as a short remark or appendix) to make the connection transparent. revision: yes
Circularity Check
Minor self-citation of functional information lower bound; new ring toss code construction is independent
specific steps
-
self citation load bearing
[Abstract]
"A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of I_F(X → Y), a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits."
The claim that the lower bound is tight (i.e., that the functional information is the correct measure and the construction is optimal) depends on invoking the I_F lower bound from the authors' own prior work; without that citation the construction only gives an upper bound, so the optimality conclusion reduces to the self-cited result.
full rationale
The paper cites the I_F(X → Y) lower bound from Flamich et al. (2025) (overlapping authors) to establish tightness once their new ring toss code construction is shown to achieve I_F + log e. This is a standard self-citation for context and does not reduce the central derivation (the explicit encoding construction for rejection sampling) to a tautology or fitted input. The construction itself is presented as a novel method that works for general channels and recovers known results like the noiseless source coding theorem as a special case. No self-definitional quantities, fitted parameters renamed as predictions, or ansatzes smuggled via citation appear in the provided derivation chain. The result remains externally falsifiable via the explicit code construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Functional information I_F(X → Y) is a valid lower bound on the rate of relative entropy coding
- standard math Standard properties of entropy, mutual information, and rejection sampling hold for general probability distributions
invented entities (1)
-
ring toss code
no independent evidence
Reference graph
Works this paper leans on
-
[1]
The redundancy of non- singular channel simulation,
G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025
work page 2025
-
[2]
Data compression with stochastic codes,
G. Flamich and D. Gündüz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026
-
[3]
The commu- nication complexity of correlation,
P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010
work page 2010
-
[4]
Strong functional representation lemma and applications to coding theorems,
C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018
work page 2018
-
[5]
Public vs private coin in bounded-round information,
M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014
work page 2014
-
[6]
Adaptive greedy rejection sampling,
G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” inProc. International Symposium on Information Theory (ISIT), pp. 454–459, 2023
work page 2023
-
[7]
Greedy poisson rejection sampling,
G. Flamich, “Greedy poisson rejection sampling,”Advances in Neural Information Processing Systems, vol. 36, pp. 37089–37127, 2023
work page 2023
-
[8]
Rejection-sampled linear codes for lossy compres- sion and channel simulation,
J. Zhao and C. T. Li, “Rejection-sampled linear codes for lossy compres- sion and channel simulation,”arXiv preprint arXiv:2506.09239, 2025
-
[9]
Refinement of the random coding bound,
Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014
work page 2014
-
[10]
Optimal redundancy in exact channel synthesis,
S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1913–1918, 2024
work page 1913
-
[11]
Flamich,Data Compression with Relative Entropy Coding
G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024
work page 2024
-
[12]
On channel simulation with causal rejection samplers,
D. Goc and G. Flamich, “On channel simulation with causal rejection samplers,” inProc. IEEE International Symposium on Information Theory (ISIT), 2024
work page 2024
-
[13]
Channel simulation: Theory and applications to lossy com- pression and differential privacy,
C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024
work page 2024
-
[14]
Channel simulation and distributed compression with ensemble rejection sampling,
B. Phan and A. Khisti, “Channel simulation and distributed compression with ensemble rejection sampling,”arXiv preprint arXiv:2510.05552, 2025
-
[15]
C. T. Li, “Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025
work page 2025
-
[16]
Singular Relative Entropy Coding with Bits-Back Rejection Sampling
G. Flamich and S. Hill, “Singular relative entropy coding with bits-back rejection sampling,”arXiv preprint arXiv:2604.06055, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.