Rejection Sampling is Optimal for Relative Entropy Coding

Fady Alajaji; Gergely Flamich; Spencer Hill; Tam\'as Linder

arxiv: 2604.23076 · v2 · submitted 2026-04-25 · 💻 cs.IT · math.IT

Rejection Sampling is Optimal for Relative Entropy Coding

Spencer Hill , Fady Alajaji , Tam\'as Linder , Gergely Flamich This is my paper

Pith reviewed 2026-05-08 07:21 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords relative entropy codingrejection samplingfunctional informationring toss codeone-shot codingsource codinginformation theory

0 comments

The pith

The ring toss code achieves the functional information lower bound for relative entropy coding within log e bits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that functional information gives the right rate for relative entropy coding, where a sender encodes an input X so a receiver can produce a sample from the conditional distribution of Y given X. Earlier work established a lower bound using functional information but left open whether it could be met. By building the ring toss code for rejection sampling, the authors prove the bound is tight up to a small additive constant. This matters because it replaces the looser mutual information measure with a tighter one and recovers known source coding results as special cases.

Core claim

We construct the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits. This shows that the lower bound of functional information is tight. For the trivial channel Y = X, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within log(I(X; Y) + 1) + 2.45 bits in general and within 1.45 bits for singular channels. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

What carries the argument

The ring toss code, a stochastic encoding procedure for rejection sampling that attains the functional information bound.

If this is right

For the channel where Y equals X, the construction recovers the noiseless source coding theorem up to a small constant.
The mutual information lower bound becomes achievable within log(I(X; Y) + 1) + 2.45 bits for general channels.
For singular channels the gap to mutual information shrinks to 1.45 bits.
The one-shot bounds imply the known second-order asymptotic redundancy results for relative entropy coding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Functional information may serve as the natural rate measure in other one-shot sampling or generation tasks beyond relative entropy coding.
The ring toss construction could be adapted to continuous alphabets or multiple independent uses to obtain finite-blocklength improvements.
The small remaining gap suggests that exact functional-information achievability might hold with only a vanishing additive term in the asymptotic regime.

Load-bearing premise

Functional information is a valid lower bound on the number of bits needed, and the ring toss code construction applies to arbitrary channels and distributions.

What would settle it

A concrete channel P_{Y|X} and input distribution where every possible code requires more than I_F(X → Y) + log e bits on average, or where the ring toss code itself exceeds this quantity.

Figures

Figures reproduced from arXiv: 2604.23076 by Fady Alajaji, Gergely Flamich, Spencer Hill, Tam\'as Linder.

**Figure 1.** Figure 1: Visualization of the width function when view at source ↗

**Figure 2.** Figure 2: A visualization of the acceptance region in rejection sampling for an additive white Gaussian noise channel Y = X+N with X ∼ N (0, 0.75) and N ∼ N (0, 0.25), where X and Y are truncated to a large but finite range to ensure a bounded density ratio. Here, x = −0.5 and the two proposal samples are Y1 = 0.3, Y2 = −0.4 with heights L1 = 1.5 and L2 = 1.1, where we use the shorthand Li ≜ MUi. We see that x ̸∈ S1… view at source ↗

**Figure 3.** Figure 3: Illustration of the ring toss code for X, Y ∼ PX,Y when the channel X → Y is singular. Given the common randomness Z, upon the sender observing the source X ∼ PX, they compute the filtered set of indices S from (27). Denote the the filtered samples {Y˜ i}i≥1. Now, for each Y˜ i, the encoder computes the support Si (shaded green and blue subsets of X in the figure) of the function x 7→ dPY |X dPY (Y˜ i | x)… view at source ↗

read the original abstract

In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. For the trivial channel $Y = X$, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The ring toss code gives a concrete way to encode rejection sampling so it meets the functional information lower bound within log e bits.

read the letter

The main point is that this paper gives an explicit construction, the ring toss code, that makes the functional information lower bound tight for relative entropy coding, within log e bits. They build on the lower bound from Flamich et al. and show how to encode the rejection sampling procedure so the receiver gets exact samples from the conditional with that rate. For the case where Y equals X it lines up with standard source coding. For general channels they get improved gaps over the mutual information bound: log(I(X;Y)+1) plus about 2.45 bits, and even better 1.45 bits when the channel is singular. It also matches some known asymptotic second-order results. The construction looks like a genuine advance because it is not just an existence argument but a specific method. The numbers are concrete and the claims are stated clearly in the abstract. One thing to watch is whether the ring toss code really works for completely arbitrary channels, including continuous or singular ones, without needing a dominating measure or discretization that might add hidden costs. Rejection sampling often needs careful choice of proposal to keep the expected trials bounded, and the encoding of which trials to accept has to be done efficiently. The paper says it handles general cases, but the details on how they avoid extra overhead would be worth checking in the full proof. This is for people working on one-shot information theory and coding with side information or conditional sampling. A reader who cares about tightening bounds beyond mutual information will get something useful out of it. I think it deserves a serious referee. The construction is new enough and the improvements are specific enough that it should go through review rather than desk reject. I would send it out for peer review.

Referee Report

2 major / 2 minor

Summary. The paper claims to resolve the optimality of the functional information lower bound for relative entropy coding by introducing the ring toss code, a rejection sampling encoding that uses at most I_F(X → Y) + log e bits. This construction is shown to be tight, recovers the noiseless source coding theorem for the identity channel within a small constant, provides improved bounds over mutual information (log(I(X;Y) + 1) + 2.45 bits generally and 1.45 bits for singular channels), and recovers asymptotic second-order results.

Significance. This is a significant contribution to information theory if the construction holds, as it provides the first explicit achievability result matching the functional information bound up to a small additive constant, establishing it as the appropriate measure for one-shot relative entropy coding. The new construction is independent of prior lower bounds and offers the tightest known bounds, with the recovery of classical results adding to its strength.

major comments (2)

[Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.
[Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.

minor comments (2)

[Abstract] The abstract refers to 'Sriramu and Wagner's asymptotic results'; adding a citation or brief recap in the introduction would improve clarity for readers unfamiliar with the reference.
[Notation] The use of I_F(X → Y) for functional information is clear from context, but ensure consistent definition and comparison to I(X;Y) throughout the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and will incorporate clarifications in the revised version.

read point-by-point responses

Referee: [Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.

Authors: We appreciate the referee highlighting the need for explicit detail here. The ring toss code in Section 3 is a rejection sampling procedure defined for general channels P_{Y|X} (including continuous and singular cases) by selecting a proposal distribution that dominates the target, as required to define the functional information I_F(X → Y) itself. The expected number of trials is finite and at most 2^{I_F} by construction, since the acceptance probability is lower-bounded using the definition of I_F. The ring toss mechanism then encodes the (random) number of trials with an overhead of at most log_2 e bits, yielding the claimed bound without further restrictions on the channel beyond those needed for I_F to be well-defined and finite. To address the comment directly, we will add a clarifying paragraph in the construction section explicitly stating the role of the dominating measure and confirming the bound holds under standard measurability assumptions. This strengthens the presentation but does not change the main result. revision: yes
Referee: [Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.

Authors: The constants are derived directly from the main achievability theorem without additional assumptions. From the relation I_F(X → Y) ≤ log_2(I(X; Y) + 1) (which follows from the definitions of functional and mutual information), adding the log_2 e ≈ 1.4427-bit overhead from the ring toss code gives log_2(I(X; Y) + 1) + 1.4427 bits; the reported +2.45 incorporates an extra additive term of roughly 1 bit arising from the precise overhead analysis in the general (non-singular) case. For singular channels the I_F-to-I(X; Y) relation is stricter, reducing the total additive constant to approximately 1.45 bits. These steps appear in the implications section and rely solely on the ring toss construction and standard inequalities. We will revise the manuscript to include an explicit derivation of the numerical values (e.g., as a short remark or appendix) to make the connection transparent. revision: yes

Circularity Check

1 steps flagged

Minor self-citation of functional information lower bound; new ring toss code construction is independent

specific steps

self citation load bearing [Abstract]
"A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of I_F(X → Y), a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits."

The claim that the lower bound is tight (i.e., that the functional information is the correct measure and the construction is optimal) depends on invoking the I_F lower bound from the authors' own prior work; without that citation the construction only gives an upper bound, so the optimality conclusion reduces to the self-cited result.

full rationale

The paper cites the I_F(X → Y) lower bound from Flamich et al. (2025) (overlapping authors) to establish tightness once their new ring toss code construction is shown to achieve I_F + log e. This is a standard self-citation for context and does not reduce the central derivation (the explicit encoding construction for rejection sampling) to a tautology or fitted input. The construction itself is presented as a novel method that works for general channels and recovers known results like the noiseless source coding theorem as a special case. No self-definitional quantities, fitted parameters renamed as predictions, or ansatzes smuggled via citation appear in the provided derivation chain. The result remains externally falsifiable via the explicit code construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the validity of the functional information lower bound from prior work and standard information-theoretic axioms; the ring toss code is a new construction.

axioms (2)

domain assumption Functional information I_F(X → Y) is a valid lower bound on the rate of relative entropy coding
Invoked directly from Flamich et al. (2025) as the starting point for the tightness proof
standard math Standard properties of entropy, mutual information, and rejection sampling hold for general probability distributions
Used implicitly throughout the information-theoretic arguments

invented entities (1)

ring toss code no independent evidence
purpose: Explicit encoding scheme realizing rejection sampling that meets the functional information bound
Newly introduced construction whose existence is asserted to prove tightness

pith-pipeline@v0.9.0 · 5585 in / 1331 out tokens · 56500 ms · 2026-05-08T07:21:29.977633+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

The redundancy of non- singular channel simulation,

G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

work page 2025
[2]

Data compression with stochastic codes,

G. Flamich and D. Gündüz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026

work page arXiv 2026
[3]

The commu- nication complexity of correlation,

P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010

work page 2010
[4]

Strong functional representation lemma and applications to coding theorems,

C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018

work page 2018
[5]

Public vs private coin in bounded-round information,

M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014

work page 2014
[6]

Adaptive greedy rejection sampling,

G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” inProc. International Symposium on Information Theory (ISIT), pp. 454–459, 2023

work page 2023
[7]

Greedy poisson rejection sampling,

G. Flamich, “Greedy poisson rejection sampling,”Advances in Neural Information Processing Systems, vol. 36, pp. 37089–37127, 2023

work page 2023
[8]

Rejection-sampled linear codes for lossy compres- sion and channel simulation,

J. Zhao and C. T. Li, “Rejection-sampled linear codes for lossy compres- sion and channel simulation,”arXiv preprint arXiv:2506.09239, 2025

work page arXiv 2025
[9]

Refinement of the random coding bound,

Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014

work page 2014
[10]

Optimal redundancy in exact channel synthesis,

S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1913–1918, 2024

work page 1913
[11]

Flamich,Data Compression with Relative Entropy Coding

G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024

work page 2024
[12]

On channel simulation with causal rejection samplers,

D. Goc and G. Flamich, “On channel simulation with causal rejection samplers,” inProc. IEEE International Symposium on Information Theory (ISIT), 2024

work page 2024
[13]

Channel simulation: Theory and applications to lossy com- pression and differential privacy,

C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024

work page 2024
[14]

Channel simulation and distributed compression with ensemble rejection sampling,

B. Phan and A. Khisti, “Channel simulation and distributed compression with ensemble rejection sampling,”arXiv preprint arXiv:2510.05552, 2025

work page arXiv 2025
[15]

Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,

C. T. Li, “Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

work page 2025
[16]

Singular Relative Entropy Coding with Bits-Back Rejection Sampling

G. Flamich and S. Hill, “Singular relative entropy coding with bits-back rejection sampling,”arXiv preprint arXiv:2604.06055, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[1] [1]

The redundancy of non- singular channel simulation,

G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

work page 2025

[2] [2]

Data compression with stochastic codes,

G. Flamich and D. Gündüz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026

work page arXiv 2026

[3] [3]

The commu- nication complexity of correlation,

P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010

work page 2010

[4] [4]

Strong functional representation lemma and applications to coding theorems,

C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018

work page 2018

[5] [5]

Public vs private coin in bounded-round information,

M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014

work page 2014

[6] [6]

Adaptive greedy rejection sampling,

G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” inProc. International Symposium on Information Theory (ISIT), pp. 454–459, 2023

work page 2023

[7] [7]

Greedy poisson rejection sampling,

G. Flamich, “Greedy poisson rejection sampling,”Advances in Neural Information Processing Systems, vol. 36, pp. 37089–37127, 2023

work page 2023

[8] [8]

Rejection-sampled linear codes for lossy compres- sion and channel simulation,

J. Zhao and C. T. Li, “Rejection-sampled linear codes for lossy compres- sion and channel simulation,”arXiv preprint arXiv:2506.09239, 2025

work page arXiv 2025

[9] [9]

Refinement of the random coding bound,

Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014

work page 2014

[10] [10]

Optimal redundancy in exact channel synthesis,

S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1913–1918, 2024

work page 1913

[11] [11]

Flamich,Data Compression with Relative Entropy Coding

G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024

work page 2024

[12] [12]

On channel simulation with causal rejection samplers,

D. Goc and G. Flamich, “On channel simulation with causal rejection samplers,” inProc. IEEE International Symposium on Information Theory (ISIT), 2024

work page 2024

[13] [13]

Channel simulation: Theory and applications to lossy com- pression and differential privacy,

C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024

work page 2024

[14] [14]

Channel simulation and distributed compression with ensemble rejection sampling,

B. Phan and A. Khisti, “Channel simulation and distributed compression with ensemble rejection sampling,”arXiv preprint arXiv:2510.05552, 2025

work page arXiv 2025

[15] [15]

Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,

C. T. Li, “Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

work page 2025

[16] [16]

Singular Relative Entropy Coding with Bits-Back Rejection Sampling

G. Flamich and S. Hill, “Singular relative entropy coding with bits-back rejection sampling,”arXiv preprint arXiv:2604.06055, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026