Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral

Jingbo Liu

arxiv: 2604.14061 · v1 · submitted 2026-04-15 · 💻 cs.IT · math.IT· math.PR· stat.ML

Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral

Jingbo Liu This is my paper

Pith reviewed 2026-05-10 11:47 UTC · model grok-4.3

classification 💻 cs.IT math.ITmath.PRstat.ML

keywords entropic optimal transportrate-distortion functionmutual information constraintGaussian processmajorizing measure theoreminformation inequalitiesoptimal transport bounds

0 comments

The pith

The maximum expected inner product between a random vector and the standard normal under a mutual information constraint equals a truncated rate-distortion integral up to universal constants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the largest possible expected inner product of a random vector with a standard normal vector, when taken over couplings whose mutual information is bounded or regularized, is equivalent to a truncated integral of the rate-distortion function. This equivalence is two-sided and holds with multiplicative constants that are universal across distributions. A sympathetic reader would care because the result converts an optimization problem arising in entropic optimal transport into a quantity already studied in rate-distortion theory, thereby supplying explicit bounds that can be evaluated or approximated by existing methods. The argument proceeds by lifting the original coupling to a Gaussian process indexed by a random subset of the type class and then invoking the majorizing measure theorem.

Core claim

The maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applying a form of the majorizing measure theorem.

What carries the argument

A lifting technique that constructs a Gaussian process indexed by a random subset of the type class of the probability distribution, to which a form of the majorizing measure theorem is applied.

If this is right

Two-sided bounds follow for quantities that arise in entropic optimal transport.
The mutual-information-constrained inner product can be approximated or bounded using standard rate-distortion calculations.
The equivalence holds with the same constants for any finite alphabet and any distribution on it.
Regularized transport problems become accessible to information-theoretic tools that already compute rate-distortion functions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lifting argument could be adapted to bound other linear functionals of the coupling beyond the inner product with a Gaussian vector.
Numerical algorithms that compute rate-distortion functions might be repurposed to estimate entropic optimal transport values without solving the transport problem directly.
The technique suggests that similar Gaussian-process embeddings could unify additional information inequalities with distortion theory.

Load-bearing premise

The lifting technique that constructs a Gaussian process indexed by a random subset of the type class, together with the applicability of the majorizing measure theorem to that process, yields the stated two-sided bounds.

What would settle it

Direct numerical evaluation of the maximum expected inner product for a discrete uniform distribution on a growing finite alphabet, compared against the corresponding truncated rate-distortion integral, to check whether their ratio remains bounded by fixed universal constants independent of alphabet size.

read the original abstract

We show that the maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applying a form of the majorizing measure theorem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper establishes a two-sided equivalence between a mutual-information constrained inner-product problem and a truncated rate-distortion integral using a lifting-plus-majorizing-measure argument that holds up on inspection.

read the letter

The main takeaway is that the max expected inner product between a random vector and a standard normal, under a mutual information constraint, matches a truncated integral of the rate-distortion function up to universal constants. The proof lifts the problem to a Gaussian process indexed by a random subset of the type class and applies the majorizing measure theorem to get matching bounds. That reduction is the actual new piece; it is not just a restatement of known variational formulas. The argument is clean and stays within standard information-theoretic and probabilistic tools, with no obvious circularity or hidden fitting. The stress-test confirms the canonical metric on the process lines up with the distortion measure and that measurability and integrability issues are handled. Minor soft spots are that the constants are universal but not claimed to be sharp, and the result is stated for general distributions without extensive numerical checks on how tight the bounds become in concrete high-dimensional cases. The citation pattern is light and focused on the relevant rate-distortion and optimal transport literature. This is the kind of technical note that information theorists working on entropic OT or high-dimensional statistics will want to see. It is not revolutionary for the broader field, but the technique is useful and the derivation is honest. I would bring it to a reading group for the proof details and would send it to peer review without hesitation.

Referee Report

0 major / 2 minor

Summary. The manuscript establishes two-sided bounds showing that the maximum expected inner product between a random vector and the standard normal vector, taken over all couplings subject to a mutual information constraint or regularization, is equivalent up to universal multiplicative constants to a truncated integral involving the rate-distortion function. The proof constructs a Gaussian process indexed by a random subset of the type class of the underlying distribution via a lifting technique and applies the majorizing measure theorem to obtain matching upper and lower bounds on the expected supremum of the process.

Significance. If the result holds, it furnishes a precise quantitative link between entropic optimal transport and rate-distortion theory, with the universal constants providing robustness independent of specific distributions. The lifting construction combined with majorizing measures is a technically strong approach that yields explicit two-sided bounds rather than one-sided estimates, which is a clear strength of the work.

minor comments (2)

The abstract refers to 'a form of the majorizing measure theorem' without naming the precise version or reference; adding a short parenthetical citation would improve immediate clarity for readers.
Notation for the truncated integral in the main statement could be introduced with a brief display equation in the introduction to make the equivalence more immediately visible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of the manuscript, the recognition of its technical contributions, and the recommendation to accept.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external probabilistic tools

full rationale

The central claim equates a variational problem (max expected inner product under mutual information constraint) to a truncated rate-distortion integral via a lifting construction that produces a Gaussian process on a random subset of the type class, followed by an application of the majorizing measure theorem. Both the lifting and the majorizing measure theorem are invoked as standard, independent results from probability theory; the paper does not define them in terms of the target equivalence or fit parameters to the output. No self-citation is load-bearing for the uniqueness or the bounds, and no step renames a fitted quantity as a prediction. The argument is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on standard information-theoretic definitions (mutual information, rate-distortion function) and the majorizing measure theorem from probability theory; no free parameters or invented entities are visible in the abstract.

axioms (2)

standard math Standard properties of mutual information and rate-distortion function hold for the distributions under consideration
Invoked implicitly when equating the constrained optimization to the rate-distortion integral
domain assumption The majorizing measure theorem applies to the constructed Gaussian process indexed by random type-class subsets
Central to obtaining the two-sided bounds from the lifting construction

pith-pipeline@v0.9.0 · 5367 in / 1289 out tokens · 27021 ms · 2026-05-10T11:47:50.337495+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

[1]

Simple and Sharp Generalization Bounds via Lifting

J. Liu, “Simple and sharp generalization bounds via lifting,” arXiv:2508.18682, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

VN Sudakov’s work on expected suprema of Gaussian processes,

R. M. Dudley, “VN Sudakov’s work on expected suprema of Gaussian processes,” inHigh Dimensional Probability VII: The Carg `ese Volume. Springer, 2016, pp. 37–43

work page 2016
[3]

Talagrand,Upper and Lower Bounds for Stochastic Processes

M. Talagrand,Upper and Lower Bounds for Stochastic Processes. Springer, 2014, vol. 60

work page 2014
[4]

Probability in high dimension,

R. Van Handel, “Probability in high dimension,” 2014, https://web.math.princeton.edu/ rvan/APC550.pdf Accessed: 2025- 08-09

work page 2014
[5]

On the subgaussian comparison theorem,

R. van Handel, “On the subgaussian comparison theorem,”arXiv preprint arXiv:2512.18588, 2025

work page arXiv 2025
[6]

Peyr ´e and M

G. Peyr ´e and M. Cuturi,Computational optimal transport: With appli- cations to data science. Now Foundations and Trends, 2019

work page 2019
[7]

The capacity of the relay channel: Solution to Cover’s problem in the Gaussian case,

X. Wu, L. P. Barnes, and A. ¨Ozg¨ur, “The capacity of the relay channel: Solution to Cover’s problem in the Gaussian case,”IEEE Transactions on Information Theory, vol. 65, no. 1, pp. 255–275, 2018

work page 2018
[8]

Information constrained optimal trans- port: From talagrand, to marton, to cover,

Y . Bai, X. Wu, and A. ¨Ozg¨ur, “Information constrained optimal trans- port: From talagrand, to marton, to cover,”IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2059–2073, 2023

work page 2059
[9]

Minoration via mixed volumes and Cover’s problem for general channels,

J. Liu, “Minoration via mixed volumes and Cover’s problem for general channels,”Probability Theory and Related Fields, vol. 183, no. 1, pp. 315–357, 2022

work page 2022
[10]

From soft-minoration to information-constrained optimal trans- port and spiked tensor models,

——, “From soft-minoration to information-constrained optimal trans- port and spiked tensor models,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 666–671

work page 2023
[11]

Talagrand meets talagrand: upper and lower bounds on expected soft maxima of Gaussian processes with finite index sets,

Y . Chu and M. Raginsky, “Talagrand meets talagrand: upper and lower bounds on expected soft maxima of Gaussian processes with finite index sets,”arXiv preprint arXiv:2502.06709, 2025

work page arXiv 2025
[12]

T. M. Cover and J. A. Thomas,Elements of Information Theory. Wiley- Interscience, 2006

work page 2006
[13]

R ´egularit´e des trajectoires des fonctions al ´eatoires gaussi- ennes,

X. Fernique, “R ´egularit´e des trajectoires des fonctions al ´eatoires gaussi- ennes,” in ´Ecole d’ ´Et´e de Probabilit ´es de Saint-Flour IV – 1974, ser. Lecture Notes in Mathematics. Berlin: Springer, 1975, vol. 480, pp. 1–96

work page 1974
[14]

Regularity of Gaussian processes,

M. Talagrand, “Regularity of Gaussian processes,”Acta Mathematica, vol. 159, no. 1, pp. 99–149, 1987

work page 1987
[15]

Fifty years ago, a theorem by Xavier Fernique,

B. Maurey, “Fifty years ago, a theorem by Xavier Fernique,” https://webusers.imj-prg.fr/ bernard.maurey/articles, p. 68, 2024, ac- cessed: 2025-06-09

work page 2024
[16]

A simple proof of the majorizing measure theorem,

M. Talagrand, “A simple proof of the majorizing measure theorem,” Geometric & Functional Analysis GAFA, vol. 2, no. 1, pp. 118–125, 1992

work page 1992
[17]

Csisz ´ar and J

I. Csisz ´ar and J. K ¨orner,Information Theory: Coding Theorems for Discrete Memoryless Systems, 1st ed. New York: Academic Press, 1981

work page 1981
[18]

Dembo,Large Deviations Techniques and Applications

A. Dembo,Large Deviations Techniques and Applications. Springer, 2009

work page 2009
[19]

Vershynin,High-Dimensional Probability: An Introduction with Ap- plications in Data Science, 2nd ed

R. Vershynin,High-Dimensional Probability: An Introduction with Ap- plications in Data Science, 2nd ed. Cambridge University Press

work page
[20]

Constructions of majorizing measures Bernoulli pro- cesses and cotype,

M. Talagrand, “Constructions of majorizing measures Bernoulli pro- cesses and cotype,”Geometric & Functional Analysis GAFA, vol. 4, pp. 660–717, 1994

work page 1994

[1] [1]

Simple and Sharp Generalization Bounds via Lifting

J. Liu, “Simple and sharp generalization bounds via lifting,” arXiv:2508.18682, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

VN Sudakov’s work on expected suprema of Gaussian processes,

R. M. Dudley, “VN Sudakov’s work on expected suprema of Gaussian processes,” inHigh Dimensional Probability VII: The Carg `ese Volume. Springer, 2016, pp. 37–43

work page 2016

[3] [3]

Talagrand,Upper and Lower Bounds for Stochastic Processes

M. Talagrand,Upper and Lower Bounds for Stochastic Processes. Springer, 2014, vol. 60

work page 2014

[4] [4]

Probability in high dimension,

R. Van Handel, “Probability in high dimension,” 2014, https://web.math.princeton.edu/ rvan/APC550.pdf Accessed: 2025- 08-09

work page 2014

[5] [5]

On the subgaussian comparison theorem,

R. van Handel, “On the subgaussian comparison theorem,”arXiv preprint arXiv:2512.18588, 2025

work page arXiv 2025

[6] [6]

Peyr ´e and M

G. Peyr ´e and M. Cuturi,Computational optimal transport: With appli- cations to data science. Now Foundations and Trends, 2019

work page 2019

[7] [7]

The capacity of the relay channel: Solution to Cover’s problem in the Gaussian case,

X. Wu, L. P. Barnes, and A. ¨Ozg¨ur, “The capacity of the relay channel: Solution to Cover’s problem in the Gaussian case,”IEEE Transactions on Information Theory, vol. 65, no. 1, pp. 255–275, 2018

work page 2018

[8] [8]

Information constrained optimal trans- port: From talagrand, to marton, to cover,

Y . Bai, X. Wu, and A. ¨Ozg¨ur, “Information constrained optimal trans- port: From talagrand, to marton, to cover,”IEEE Transactions on Information Theory, vol. 69, no. 4, pp. 2059–2073, 2023

work page 2059

[9] [9]

Minoration via mixed volumes and Cover’s problem for general channels,

J. Liu, “Minoration via mixed volumes and Cover’s problem for general channels,”Probability Theory and Related Fields, vol. 183, no. 1, pp. 315–357, 2022

work page 2022

[10] [10]

From soft-minoration to information-constrained optimal trans- port and spiked tensor models,

——, “From soft-minoration to information-constrained optimal trans- port and spiked tensor models,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 666–671

work page 2023

[11] [11]

Talagrand meets talagrand: upper and lower bounds on expected soft maxima of Gaussian processes with finite index sets,

Y . Chu and M. Raginsky, “Talagrand meets talagrand: upper and lower bounds on expected soft maxima of Gaussian processes with finite index sets,”arXiv preprint arXiv:2502.06709, 2025

work page arXiv 2025

[12] [12]

T. M. Cover and J. A. Thomas,Elements of Information Theory. Wiley- Interscience, 2006

work page 2006

[13] [13]

R ´egularit´e des trajectoires des fonctions al ´eatoires gaussi- ennes,

X. Fernique, “R ´egularit´e des trajectoires des fonctions al ´eatoires gaussi- ennes,” in ´Ecole d’ ´Et´e de Probabilit ´es de Saint-Flour IV – 1974, ser. Lecture Notes in Mathematics. Berlin: Springer, 1975, vol. 480, pp. 1–96

work page 1974

[14] [14]

Regularity of Gaussian processes,

M. Talagrand, “Regularity of Gaussian processes,”Acta Mathematica, vol. 159, no. 1, pp. 99–149, 1987

work page 1987

[15] [15]

Fifty years ago, a theorem by Xavier Fernique,

B. Maurey, “Fifty years ago, a theorem by Xavier Fernique,” https://webusers.imj-prg.fr/ bernard.maurey/articles, p. 68, 2024, ac- cessed: 2025-06-09

work page 2024

[16] [16]

A simple proof of the majorizing measure theorem,

M. Talagrand, “A simple proof of the majorizing measure theorem,” Geometric & Functional Analysis GAFA, vol. 2, no. 1, pp. 118–125, 1992

work page 1992

[17] [17]

Csisz ´ar and J

I. Csisz ´ar and J. K ¨orner,Information Theory: Coding Theorems for Discrete Memoryless Systems, 1st ed. New York: Academic Press, 1981

work page 1981

[18] [18]

Dembo,Large Deviations Techniques and Applications

A. Dembo,Large Deviations Techniques and Applications. Springer, 2009

work page 2009

[19] [19]

Vershynin,High-Dimensional Probability: An Introduction with Ap- plications in Data Science, 2nd ed

R. Vershynin,High-Dimensional Probability: An Introduction with Ap- plications in Data Science, 2nd ed. Cambridge University Press

work page

[20] [20]

Constructions of majorizing measures Bernoulli pro- cesses and cotype,

M. Talagrand, “Constructions of majorizing measures Bernoulli pro- cesses and cotype,”Geometric & Functional Analysis GAFA, vol. 4, pp. 660–717, 1994

work page 1994