Change of measure through the Legendre transform

Antoine Picard-Weibel; Benjamin Guedj

arXiv: 2202.05568 · v2 · pith:OIYVSIDXnew · submitted 2022-02-11 · stat.ML · cs.IT · cs.LG · math.IT · math.PR · math.ST · stat.TH

Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3

keywords: PAC-Bayes · f-divergences · change of measure · Legendre transform · Fenchel-Young inequality · generalization bounds · learning theory · concentration inequalities

The pith

f-divergence change-of-measure inequalities derived from the Legendre transform extend PAC-Bayes bounds to new empirical risk conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops change-of-measure inequalities for arbitrary f-divergences by using the Legendre transform of f together with the Fenchel-Young inequality. These inequalities move concentration properties from a fixed reference measure to any posterior measure. The key advantage is that the required conditions on the empirical risk can be chosen to match the divergence, rather than always needing bounded exponential moments as in the classical Donsker-Varadhan result. This construction is applied to obtain new PAC-Bayes generalization bounds that hold under a broader set of assumptions on the loss function. A sympathetic reader would care because it widens the range of learning problems for which PAC-Bayesian guarantees are available.
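The mechanism described above can be written out in a few lines. The following is a generic sketch rather than the paper's exact statement: it assumes a posterior Q absolutely continuous with respect to the reference P, a convex f with f(1) = 0, a measurable function φ for which the expectations are well defined, and writes f* for the Legendre transform of f. The symbols P, Q and φ are notation chosen here for illustration.

```latex
\begin{align*}
f^*(y) &= \sup_{t}\,\{\, t y - f(t) \,\}, \qquad
t y \le f(t) + f^*(y) \quad \text{(Fenchel--Young)}, \\
\mathbb{E}_Q[\phi]
  &= \mathbb{E}_P\!\left[\frac{dQ}{dP}\,\phi\right]
  \le \mathbb{E}_P\!\left[f\!\left(\frac{dQ}{dP}\right)\right] + \mathbb{E}_P\!\left[f^*(\phi)\right]
  = D_f(Q \,\|\, P) + \mathbb{E}_P\!\left[f^*(\phi)\right].
\end{align*}
```

The only requirement on φ is that f*(φ) be integrable under P, which is how the choice of f dictates the assumption placed on the empirical risk.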

Core claim

We study change-of-measure inequalities based on f-divergences, obtained by combining the Legendre transform of f with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.

What carries the argument

f-divergence change-of-measure inequality derived from the Legendre transform of f and the Fenchel-Young inequality

If this is right

PAC-Bayes bounds become available when the empirical risk has finite moments of order p > 1 for suitable f.
The method covers cases where the loss can take large values with small probability without requiring exponential integrability.
Different f choices produce bounds adapted to sub-Gaussian, sub-exponential, or other tail behaviors.
Generalization results apply to a larger family of posterior distributions in statistical learning.
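As a concrete instance of the moment-based case in the list above, the sketch below checks a chi-square style change of measure on randomly drawn discrete distributions. It assumes the generator f(t) = t^2 - 1, whose conjugate over the real line is f*(y) = y^2/4 + 1, so the bound only needs a second moment of φ under the reference measure. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def chi2_divergence(q, p):
    """D_f(Q||P) for f(t) = t**2 - 1, i.e. E_P[(dQ/dP)**2] - 1."""
    ratio = q / p
    return float(np.sum(p * ratio**2) - 1.0)

def chi2_change_of_measure_bound(phi, q, p):
    """Right-hand side D_f(Q||P) + E_P[f*(phi)] with f*(y) = y**2 / 4 + 1."""
    conjugate_term = float(np.sum(p * (phi**2 / 4.0 + 1.0)))
    return chi2_divergence(q, p) + conjugate_term

rng = np.random.default_rng(0)
n_atoms = 50
p = rng.dirichlet(np.ones(n_atoms))        # reference measure (assumed synthetic)
q = rng.dirichlet(np.ones(n_atoms))        # posterior measure (assumed synthetic)
phi = rng.standard_t(df=3, size=n_atoms)   # heavy-tailed test function

lhs = float(np.sum(q * phi))               # E_Q[phi]
rhs = chi2_change_of_measure_bound(phi, q, p)
print(f"E_Q[phi] = {lhs:.4f} <= bound = {rhs:.4f}")
```

The unscaled bound is loose. Applying it to λφ, dividing by λ > 0 and minimising over λ gives sqrt(E_P[(dQ/dP)^2] · E_P[φ^2]), which is the Cauchy-Schwarz inequality; whether the paper optimises in this way is not stated in the abstract.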

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same technique could generate concentration inequalities for other functionals in probability beyond PAC-Bayes.
It might simplify proofs in robust statistics where moment conditions are natural.
Numerical verification on synthetic data with heavy-tailed losses would test whether the new bounds are non-vacuous.

Load-bearing premise

The derived f-divergence inequalities must preserve the concentration properties without imposing extra conditions that would make the PAC-Bayes application invalid.

What would settle it

Finding a loss random variable with finite p-moment but infinite exponential moment, and checking whether the PAC-Bayes bound derived from the corresponding f still provides a non-trivial guarantee on the generalization gap.
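A candidate for such a test is a Pareto-distributed loss. The sketch below assumes a Pareto law with shape 3 on [1, ∞), which has second moment 3 but no finite exponential moment, and shows numerically that truncated second moments stabilise while truncated exponential moments grow without bound. The distribution, the rate 0.1 and the helper name are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def truncated_expectation(g, upper, alpha=3.0, n_grid=200_001):
    """Approximate E[g(X); X <= upper] for X ~ Pareto(alpha) on [1, inf)."""
    x = np.geomspace(1.0, upper, n_grid)
    density = alpha * x ** (-alpha - 1.0)
    y = g(x) * density
    # Trapezoidal rule written out by hand on the log-spaced grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

cutoffs = [1e1, 1e2, 1e3]
second_moments = [truncated_expectation(lambda x: x**2, T) for T in cutoffs]
exp_moments = [truncated_expectation(lambda x: np.exp(0.1 * x), T) for T in cutoffs]

for T, m2, mexp in zip(cutoffs, second_moments, exp_moments):
    print(f"cutoff {T:>7.0f}: E[X^2; X<=T] = {m2:.4f}, E[exp(0.1 X); X<=T] = {mexp:.3e}")
```

A loss of this kind satisfies the moment condition attached to a quadratic f but falls outside the exponential-moment setting, so it separates the two regimes; it does not by itself say whether the resulting bound is non-vacuous.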

Original abstract

PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker--Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on \(f\)-divergences, obtained by combining the Legendre transform of \(f\) with the Fenchel--Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a convex-analytic construction for f-divergence change-of-measure inequalities that extends the assumptions under which PAC-Bayes bounds can be derived.

The core contribution is a systematic way to obtain change-of-measure inequalities from any f-divergence by taking the Legendre transform of f and applying the Fenchel-Young inequality. This produces PAC-Bayes generalization bounds that rest on tailored conditions on the empirical risk instead of the bounded exponential moments required by the classical Donsker-Varadhan route. That is the main new piece: it moves beyond the single KL case and gives a family of inequalities tied to different choices of f.

The construction looks clean and rests on standard convex-analysis facts, with no obvious circularity or hidden self-reference in the abstract's description. The transfer step from reference measure to posterior appears direct, which is what allows the usable assumptions to be extended. The paper does a reasonable job of stating the motivation and the contrast with prior work.

The main limitation is that this assessment rests on the abstract alone; without the actual derivations or examples it is impossible to judge whether the resulting bounds are non-vacuous, easy to compute, or meaningfully tighter in applications. If the inequalities turn out to be loose or require strong conditions on f, the practical gain would be small. Still, the logic as described contains no internal contradictions.

The paper is aimed at researchers working on PAC-Bayes bounds who want more flexibility in the loss assumptions. A reader already comfortable with f-divergences and convex duality would get the most out of it. The claim is specific and the method is in principle checkable, so the paper deserves a serious referee even if revisions are needed to show concrete bounds and examples.

Referee Report

0 major / 3 minor

Summary. The paper derives change-of-measure inequalities based on f-divergences by combining the Legendre transform of a convex function f with the Fenchel-Young inequality. These inequalities are then applied to obtain PAC-Bayes generalization bounds that rely on tailored assumptions on the empirical risk (rather than the bounded exponential moments required by the classical Donsker-Varadhan theorem), thereby extending the range of conditions under which PAC-Bayesian guarantees hold.

Significance. If the derivations are correct, the work provides a convex-analytic framework for generating families of change-of-measure inequalities indexed by f, which in turn yield PAC-Bayes bounds under correspondingly tailored risk assumptions. This could meaningfully broaden the applicability of PAC-Bayes theory to learning settings where exponential-moment conditions fail but other moment or tail conditions (matched to f) hold. The approach is parameter-free in the sense that it directly invokes standard convex duality without additional fitted quantities.

minor comments (3)

Abstract, paragraph 2: the phrase 'tailored assumptions on the empirical risk' is used without an immediate concrete example; a one-sentence illustration (e.g., for f(t)=t log t or f(t)=t^2) would clarify the claim for readers.
The manuscript should explicitly state whether the derived inequalities recover the classical Donsker-Varadhan bound as a special case when f(t) = t log t (the KL generator, whose Legendre transform is exponential), to make the extension transparent.
Notation for the reference and posterior measures should be introduced once and used consistently; the current abstract alternates between 'reference measure' and 'posterior measures' without a single defining sentence.
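The question about the Donsker-Varadhan special case can be checked by hand. The derivation below is a sketch under stated assumptions: it uses the common normalisation f(t) = t log t - t + 1 for the KL generator, assumes the exponential moment of φ under P is finite, and introduces a free shift c that is not mentioned in the abstract. Whether the paper proceeds this way is not confirmed by the text reviewed here.

```latex
\begin{align*}
f(t) &= t\log t - t + 1, \qquad f^*(y) = e^{y} - 1, \\
\mathbb{E}_Q[\phi - c] &\le \mathrm{KL}(Q \,\|\, P) + e^{-c}\,\mathbb{E}_P\!\left[e^{\phi}\right] - 1
  \quad \text{for every } c \in \mathbb{R}, \\
\mathbb{E}_Q[\phi] &\le \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_P\!\left[e^{\phi}\right]
  \quad \text{at the minimiser } c = \log \mathbb{E}_P\!\left[e^{\phi}\right].
\end{align*}
```

The last line is the usual Donsker-Varadhan change of measure, so under these assumptions the Fenchel-Young route recovers it once the shift is optimised.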

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity identified

The derivation begins from the standard Legendre transform of the generator f of an f-divergence, combined with the Fenchel-Young inequality, to produce change-of-measure bounds; it then applies those bounds to PAC-Bayes under tailored empirical-risk assumptions. No step reduces by construction to a fitted parameter, a self-definition, or a load-bearing self-citation; the central inequalities follow directly from classical convex-analysis identities whose validity does not depend on the target PAC-Bayes result. The argument is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of applying the Legendre transform and Fenchel-Young inequality to produce usable change-of-measure bounds; these are standard convex-analysis facts rather than new postulates.

axioms (1)

standard math · Fenchel-Young inequality holds for proper convex lower-semicontinuous functions
Invoked to combine Legendre transform of f with the divergence to obtain the change-of-measure inequality.
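The axiom can be sanity-checked numerically. The sketch below approximates the Legendre transform of the KL generator f(t) = t log t - t + 1 by a maximum over a grid, compares it with the known closed form e^y - 1, and verifies the Fenchel-Young inequality at every grid pair. The grid ranges and helper names are assumptions made for illustration.

```python
import numpy as np

def f_kl(t):
    """Generator of the KL divergence, f(t) = t log t - t + 1 for t > 0."""
    return t * np.log(t) - t + 1.0

def conjugate_on_grid(f, t_grid, y):
    """Approximate f*(y) = sup_t { t*y - f(t) } by a maximum over t_grid."""
    return np.max(np.outer(y, t_grid) - f(t_grid)[None, :], axis=1)

t_grid = np.linspace(1e-9, 20.0, 20_001)   # covers the maximiser t = exp(y) for |y| <= 2
y = np.linspace(-2.0, 2.0, 21)

numeric = conjugate_on_grid(f_kl, t_grid, y)
closed_form = np.exp(y) - 1.0              # known conjugate of t log t - t + 1

max_gap = float(np.max(np.abs(numeric - closed_form)))
# Fenchel-Young slack f(t) + f*(y) - t*y, which should be non-negative everywhere
fy_slack = f_kl(t_grid)[None, :] + closed_form[:, None] - np.outer(y, t_grid)
print(f"max |numeric - closed form| = {max_gap:.2e}, min slack = {fy_slack.min():.2e}")
```

The same grid routine can be pointed at other generators to see which function of φ must be integrable under the reference measure.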

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities
cs.IT · 2026-02 · conditional · novelty 7.0

A unified data-processing framework produces tighter change-of-measure inequalities that improve information-theoretic generalization bounds across learning theory and privacy.

Density-Ratio Losses for Post-Hoc Learning to Defer
stat.ML · 2026-05 · unverdicted · novelty 6.0

Post-hoc learning to defer is cast as density-ratio learning between model and expert ideal distributions, producing DR CPE losses that recover Chow's rule for KL-based ideals and support adjustable deferral via thresholding.
