pith. sign in

arxiv: 2604.23076 · v2 · submitted 2026-04-25 · 💻 cs.IT · math.IT

Rejection Sampling is Optimal for Relative Entropy Coding

Pith reviewed 2026-05-08 07:21 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords relative entropy codingrejection samplingfunctional informationring toss codeone-shot codingsource codinginformation theory
0
0 comments X

The pith

The ring toss code achieves the functional information lower bound for relative entropy coding within log e bits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that functional information gives the right rate for relative entropy coding, where a sender encodes an input X so a receiver can produce a sample from the conditional distribution of Y given X. Earlier work established a lower bound using functional information but left open whether it could be met. By building the ring toss code for rejection sampling, the authors prove the bound is tight up to a small additive constant. This matters because it replaces the looser mutual information measure with a tighter one and recovers known source coding results as special cases.

Core claim

We construct the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits. This shows that the lower bound of functional information is tight. For the trivial channel Y = X, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within log(I(X; Y) + 1) + 2.45 bits in general and within 1.45 bits for singular channels. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

What carries the argument

The ring toss code, a stochastic encoding procedure for rejection sampling that attains the functional information bound.

If this is right

  • For the channel where Y equals X, the construction recovers the noiseless source coding theorem up to a small constant.
  • The mutual information lower bound becomes achievable within log(I(X; Y) + 1) + 2.45 bits for general channels.
  • For singular channels the gap to mutual information shrinks to 1.45 bits.
  • The one-shot bounds imply the known second-order asymptotic redundancy results for relative entropy coding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Functional information may serve as the natural rate measure in other one-shot sampling or generation tasks beyond relative entropy coding.
  • The ring toss construction could be adapted to continuous alphabets or multiple independent uses to obtain finite-blocklength improvements.
  • The small remaining gap suggests that exact functional-information achievability might hold with only a vanishing additive term in the asymptotic regime.

Load-bearing premise

Functional information is a valid lower bound on the number of bits needed, and the ring toss code construction applies to arbitrary channels and distributions.

What would settle it

A concrete channel P_{Y|X} and input distribution where every possible code requires more than I_F(X → Y) + log e bits on average, or where the ring toss code itself exceeds this quantity.

Figures

Figures reproduced from arXiv: 2604.23076 by Fady Alajaji, Gergely Flamich, Spencer Hill, Tam\'as Linder.

Figure 1
Figure 1. Figure 1: Visualization of the width function when view at source ↗
Figure 2
Figure 2. Figure 2: A visualization of the acceptance region in rejection sampling for an additive white Gaussian noise channel Y = X+N with X ∼ N (0, 0.75) and N ∼ N (0, 0.25), where X and Y are truncated to a large but finite range to ensure a bounded density ratio. Here, x = −0.5 and the two proposal samples are Y1 = 0.3, Y2 = −0.4 with heights L1 = 1.5 and L2 = 1.1, where we use the shorthand Li ≜ MUi. We see that x ̸∈ S1… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the ring toss code for X, Y ∼ PX,Y when the channel X → Y is singular. Given the common randomness Z, upon the sender observing the source X ∼ PX, they compute the filtered set of indices S from (27). Denote the the filtered samples {Y˜ i}i≥1. Now, for each Y˜ i, the encoder computes the support Si (shaded green and blue subsets of X in the figure) of the function x 7→ dPY |X dPY (Y˜ i | x)… view at source ↗
read the original abstract

In relative entropy coding, a sender aims to design a stochastic code such that, on input $X \sim P_X$, the receiver can generate a sample $Y \sim P_{Y \mid X}$. It is a standard result that (1) this requires at least $I(X; Y)$ bits, (2) the lower bound is achievable within a logarithmic gap, and (3) this gap cannot be reduced in general. The necessity of the gap suggests that the mutual information is not the correct information measure to quantify the rate of relative entropy coding. A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of $I_F(X \to Y)$, a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most $I_F(X \to Y) + \log e$ bits. For the trivial channel $Y = X$, our result recovers the noiseless source coding theorem within a small constant. For a general channel, it implies that the classical mutual information lower bound is achievable within $\log(I(X; Y) + 1) + 2.45$ bits in general and within $1.45$ bits for singular channels, which are both the tightest bounds of their kind to date. Moreover, our one-shot result also recovers Sriramu and Wagner's asymptotic results on the second-order redundancy of relative entropy codes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to resolve the optimality of the functional information lower bound for relative entropy coding by introducing the ring toss code, a rejection sampling encoding that uses at most I_F(X → Y) + log e bits. This construction is shown to be tight, recovers the noiseless source coding theorem for the identity channel within a small constant, provides improved bounds over mutual information (log(I(X;Y) + 1) + 2.45 bits generally and 1.45 bits for singular channels), and recovers asymptotic second-order results.

Significance. This is a significant contribution to information theory if the construction holds, as it provides the first explicit achievability result matching the functional information bound up to a small additive constant, establishing it as the appropriate measure for one-shot relative entropy coding. The new construction is independent of prior lower bounds and offers the tightest known bounds, with the recovery of classical results adding to its strength.

major comments (2)
  1. [Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.
  2. [Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.
minor comments (2)
  1. [Abstract] The abstract refers to 'Sriramu and Wagner's asymptotic results'; adding a citation or brief recap in the introduction would improve clarity for readers unfamiliar with the reference.
  2. [Notation] The use of I_F(X → Y) for functional information is clear from context, but ensure consistent definition and comparison to I(X;Y) throughout the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below and will incorporate clarifications in the revised version.

read point-by-point responses
  1. Referee: [Ring toss code construction] The ring toss code is claimed to achieve the bound for arbitrary channels. However, rejection sampling for general (continuous or singular) distributions requires a dominating measure and finite expected trials. The manuscript should specify in the construction section how the encoding procedure ensures the bit bound holds without additional restrictions on the channel, as this is central to the tightness claim for general P_{Y|X}.

    Authors: We appreciate the referee highlighting the need for explicit detail here. The ring toss code in Section 3 is a rejection sampling procedure defined for general channels P_{Y|X} (including continuous and singular cases) by selecting a proposal distribution that dominates the target, as required to define the functional information I_F(X → Y) itself. The expected number of trials is finite and at most 2^{I_F} by construction, since the acceptance probability is lower-bounded using the definition of I_F. The ring toss mechanism then encodes the (random) number of trials with an overhead of at most log_2 e bits, yielding the claimed bound without further restrictions on the channel beyond those needed for I_F to be well-defined and finite. To address the comment directly, we will add a clarifying paragraph in the construction section explicitly stating the role of the dominating measure and confirming the bound holds under standard measurability assumptions. This strengthens the presentation but does not change the main result. revision: yes

  2. Referee: [Implications for mutual information bounds] The derived bounds of log(I(X;Y) + 1) + 2.45 bits and 1.45 bits for singular channels are presented as the tightest. Please clarify the derivation of these constants and confirm they follow directly from the main result without hidden assumptions.

    Authors: The constants are derived directly from the main achievability theorem without additional assumptions. From the relation I_F(X → Y) ≤ log_2(I(X; Y) + 1) (which follows from the definitions of functional and mutual information), adding the log_2 e ≈ 1.4427-bit overhead from the ring toss code gives log_2(I(X; Y) + 1) + 1.4427 bits; the reported +2.45 incorporates an extra additive term of roughly 1 bit arising from the precise overhead analysis in the general (non-singular) case. For singular channels the I_F-to-I(X; Y) relation is stricter, reducing the total additive constant to approximately 1.45 bits. These steps appear in the implications section and rely solely on the ring toss construction and standard inequalities. We will revise the manuscript to include an explicit derivation of the numerical values (e.g., as a short remark or appendix) to make the connection transparent. revision: yes

Circularity Check

1 steps flagged

Minor self-citation of functional information lower bound; new ring toss code construction is independent

specific steps
  1. self citation load bearing [Abstract]
    "A potential alternative emerged in the work of Flamich et al. (2025), who proved a tighter lower bound of I_F(X → Y), a quantity we call the functional information. In this paper, we show that this lower bound is tight by constructing the ring toss code, an encoding method for rejection sampling which uses at most I_F(X → Y) + log e bits."

    The claim that the lower bound is tight (i.e., that the functional information is the correct measure and the construction is optimal) depends on invoking the I_F lower bound from the authors' own prior work; without that citation the construction only gives an upper bound, so the optimality conclusion reduces to the self-cited result.

full rationale

The paper cites the I_F(X → Y) lower bound from Flamich et al. (2025) (overlapping authors) to establish tightness once their new ring toss code construction is shown to achieve I_F + log e. This is a standard self-citation for context and does not reduce the central derivation (the explicit encoding construction for rejection sampling) to a tautology or fitted input. The construction itself is presented as a novel method that works for general channels and recovers known results like the noiseless source coding theorem as a special case. No self-definitional quantities, fitted parameters renamed as predictions, or ansatzes smuggled via citation appear in the provided derivation chain. The result remains externally falsifiable via the explicit code construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on the validity of the functional information lower bound from prior work and standard information-theoretic axioms; the ring toss code is a new construction.

axioms (2)
  • domain assumption Functional information I_F(X → Y) is a valid lower bound on the rate of relative entropy coding
    Invoked directly from Flamich et al. (2025) as the starting point for the tightness proof
  • standard math Standard properties of entropy, mutual information, and rejection sampling hold for general probability distributions
    Used implicitly throughout the information-theoretic arguments
invented entities (1)
  • ring toss code no independent evidence
    purpose: Explicit encoding scheme realizing rejection sampling that meets the functional information bound
    Newly introduced construction whose existence is asserted to prove tightness

pith-pipeline@v0.9.0 · 5585 in / 1331 out tokens · 56500 ms · 2026-05-08T07:21:29.977633+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    The redundancy of non- singular channel simulation,

    G. Flamich, S. M. Sriramu, and A. B. Wagner, “The redundancy of non- singular channel simulation,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

  2. [2]

    Data compression with stochastic codes,

    G. Flamich and D. Gündüz, “Data compression with stochastic codes,” arXiv preprint arXiv:2602.07635, 2026

  3. [3]

    The commu- nication complexity of correlation,

    P. Harsha, R. Jain, D. McAllester, and J. Radhakrishnan, “The commu- nication complexity of correlation,”IEEE Transactions on Information Theory, vol. 56, no. 1, pp. 438–449, 2010

  4. [4]

    Strong functional representation lemma and applications to coding theorems,

    C. T. Li and A. El Gamal, “Strong functional representation lemma and applications to coding theorems,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018

  5. [5]

    Public vs private coin in bounded-round information,

    M. Braverman and A. Garg, “Public vs private coin in bounded-round information,” inInternational Colloquium on Automata, Languages, and Programming, pp. 502–513, Springer, 2014

  6. [6]

    Adaptive greedy rejection sampling,

    G. Flamich and L. Theis, “Adaptive greedy rejection sampling,” inProc. International Symposium on Information Theory (ISIT), pp. 454–459, 2023

  7. [7]

    Greedy poisson rejection sampling,

    G. Flamich, “Greedy poisson rejection sampling,”Advances in Neural Information Processing Systems, vol. 36, pp. 37089–37127, 2023

  8. [8]

    Rejection-sampled linear codes for lossy compres- sion and channel simulation,

    J. Zhao and C. T. Li, “Rejection-sampled linear codes for lossy compres- sion and channel simulation,”arXiv preprint arXiv:2506.09239, 2025

  9. [9]

    Refinement of the random coding bound,

    Y . Altu ˘g and A. B. Wagner, “Refinement of the random coding bound,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6005– 6023, 2014

  10. [10]

    Optimal redundancy in exact channel synthesis,

    S. M. Sriramu and A. B. Wagner, “Optimal redundancy in exact channel synthesis,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1913–1918, 2024

  11. [11]

    Flamich,Data Compression with Relative Entropy Coding

    G. Flamich,Data Compression with Relative Entropy Coding. PhD thesis, University of Cambridge (United Kingdom), 2024

  12. [12]

    On channel simulation with causal rejection samplers,

    D. Goc and G. Flamich, “On channel simulation with causal rejection samplers,” inProc. IEEE International Symposium on Information Theory (ISIT), 2024

  13. [13]

    Channel simulation: Theory and applications to lossy com- pression and differential privacy,

    C. T. Li, “Channel simulation: Theory and applications to lossy com- pression and differential privacy,”Foundations and Trends® in Commu- nications and Information Theory, vol. 21, no. 6, pp. 847–1106, 2024

  14. [14]

    Channel simulation and distributed compression with ensemble rejection sampling,

    B. Phan and A. Khisti, “Channel simulation and distributed compression with ensemble rejection sampling,”arXiv preprint arXiv:2510.05552, 2025

  15. [15]

    Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,

    C. T. Li, “Discrete layered entropy, conditional compression and a tighter strong functional representation lemma,” inProc. IEEE International Symposium on Information Theory (ISIT), pp. 1–6, 2025

  16. [16]

    Singular Relative Entropy Coding with Bits-Back Rejection Sampling

    G. Flamich and S. Hill, “Singular relative entropy coding with bits-back rejection sampling,”arXiv preprint arXiv:2604.06055, 2026