
arXiv: 2202.05568 · v2 · pith:OIYVSIDXnew · submitted 2022-02-11 · 📊 stat.ML · cs.IT · cs.LG · math.IT · math.PR · math.ST · stat.TH

Change of measure through the Legendre transform

Pith reviewed 2026-05-24 12:22 UTC · model grok-4.3

classification 📊 stat.ML · cs.IT · cs.LG · math.IT · math.PR · math.ST · stat.TH
keywords PAC-Bayes · f-divergences · change of measure · Legendre transform · Fenchel-Young inequality · generalization bounds · learning theory · concentration inequalities

The pith

f-divergence change-of-measure inequalities derived from the Legendre transform extend PAC-Bayes bounds to new empirical risk conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops change-of-measure inequalities for arbitrary f-divergences by using the Legendre transform of f together with the Fenchel-Young inequality. These inequalities move concentration properties from a fixed reference measure to any posterior measure. The key advantage is that the required conditions on the empirical risk can be chosen to match the divergence, rather than always needing bounded exponential moments as in the classical Donsker-Varadhan result. This construction is applied to obtain new PAC-Bayes generalization bounds that hold under a broader set of assumptions on the loss function. A sympathetic reader would care because it widens the range of learning problems for which PAC-Bayesian guarantees are available.

Core claim

We study change-of-measure inequalities based on f-divergences, obtained by combining the Legendre transform of f with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.

What carries the argument

f-divergence change-of-measure inequality derived from the Legendre transform of f and the Fenchel-Young inequality
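
For orientation, the mechanism can be sketched in a few lines. This is the standard Fenchel-Young argument written under assumed notation that does not appear on this page: π is the reference measure, ρ a posterior with ρ absolutely continuous with respect to π, h a measurable function, and f* the convex conjugate of f. The paper's precise statement and regularity conditions may differ.

```latex
% Convex conjugate (Legendre transform) of f, and the Fenchel-Young inequality:
f^*(y) = \sup_{x \ge 0} \{ xy - f(x) \}, \qquad xy \le f(x) + f^*(y).

% Apply it pointwise with x = d\rho/d\pi and y = h, then integrate against \pi:
\mathbb{E}_{\rho}[h]
  = \mathbb{E}_{\pi}\!\left[ \frac{d\rho}{d\pi}\, h \right]
  \le \mathbb{E}_{\pi}\!\left[ f\!\left( \frac{d\rho}{d\pi} \right) \right]
    + \mathbb{E}_{\pi}\!\left[ f^*(h) \right]
  = D_f(\rho \,\|\, \pi) + \mathbb{E}_{\pi}\!\left[ f^*(h) \right].
```

The right-hand side separates a term that depends only on ρ (the divergence) from a term that depends only on π, which is where an assumption on h under the reference measure would enter.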

If this is right

  • PAC-Bayes bounds become available when the empirical risk has finite moments of order p > 1 for suitable f.
  • The method covers cases where the loss can take large values with small probability without requiring exponential integrability.
  • Different f choices produce bounds adapted to sub-Gaussian, sub-exponential, or other tail behaviors.
  • Generalization results apply to a larger family of posterior distributions in statistical learning.
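
To make the dependence on f concrete, here is a minimal numerical sketch that checks the Fenchel-Young change of measure, E_ρ[h] ≤ D_f(ρ‖π) + E_π[f*(h)], on a three-point space. The two (f, f*) pairs, the distributions, and all function names are illustrative assumptions rather than the paper's own choices: f(t) = t log t has an exponential conjugate, while f(t) = (t − 1)² has a quadratic one and so only involves second moments of h.

```python
import math

def f_divergence(rho, pi, f):
    """D_f(rho || pi) = sum_i pi_i * f(rho_i / pi_i); assumes every pi_i > 0."""
    return sum(q * f(p / q) for p, q in zip(rho, pi))

def expectation(weights, values):
    """Expectation of `values` under the discrete distribution `weights`."""
    return sum(w * v for w, v in zip(weights, values))

# Two (f, f*) pairs chosen for illustration (hypothetical helper names).
def f_kl(t):
    return t * math.log(t) if t > 0 else 0.0  # KL generator, with 0 log 0 = 0

def f_kl_conj(y):
    return math.exp(y - 1.0)  # involves an exponential moment of h

def f_chi2(t):
    return (t - 1.0) ** 2  # chi-square generator

def f_chi2_conj(y):
    # sup over t >= 0 of t*y - (t - 1)^2; involves a second moment of h
    return y + y * y / 4.0 if y >= -2.0 else -1.0

def change_of_measure_gap(rho, pi, h, f, f_conj):
    """Right-hand side minus left-hand side of E_rho[h] <= D_f + E_pi[f*(h)]."""
    lhs = expectation(rho, h)
    rhs = f_divergence(rho, pi, f) + expectation(pi, [f_conj(v) for v in h])
    return rhs - lhs

pi = [0.5, 0.3, 0.2]   # reference distribution
rho = [0.2, 0.3, 0.5]  # posterior distribution
h = [0.1, 1.0, 4.0]    # stand-in for a risk-like quantity

gap_kl = change_of_measure_gap(rho, pi, h, f_kl, f_kl_conj)
gap_chi2 = change_of_measure_gap(rho, pi, h, f_chi2, f_chi2_conj)
print(gap_kl >= 0, gap_chi2 >= 0)
```

Both gaps are non-negative for any valid inputs, since the inequality holds pointwise before integration. Which pair gives the smaller gap depends on h and on how far ρ is from π.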

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same technique could generate concentration inequalities for other functionals in probability beyond PAC-Bayes.
  • It might simplify proofs in robust statistics where moment conditions are natural.
  • Numerical verification on synthetic data with heavy-tailed losses would test whether the new bounds are non-vacuous.

Load-bearing premise

The derived f-divergence inequalities must preserve the concentration properties without imposing extra conditions that would make the PAC-Bayes application invalid.

What would settle it

Finding a loss random variable with finite p-moment but infinite exponential moment, and checking whether the PAC-Bayes bound derived from the corresponding f still provides a non-trivial guarantee on the generalization gap.
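
One candidate for such a variable is a Pareto law, used here purely as an assumed example. With tail index α on [1, ∞), it has a finite p-th moment for every p < α, while its exponential moment is infinite for every λ > 0 because exp(λx) outgrows the polynomial tail. The sketch below compares truncated versions of both quantities as the cutoff grows. It only illustrates the first half of the test, not the resulting PAC-Bayes bound, and the parameter values and function names are hypothetical.

```python
import math

def truncated_moment(p, alpha, T):
    """E[X^p ; X <= T] for a Pareto(alpha) variable on [1, inf), valid for p != alpha."""
    return alpha / (alpha - p) * (1.0 - T ** (p - alpha))

def truncated_exp_moment(lam, alpha, T, n=200_000):
    """Midpoint-rule estimate of E[exp(lam * X) ; X <= T] for the same variable."""
    step = (T - 1.0) / n
    total = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) * step
        total += math.exp(lam * x) * alpha * x ** (-alpha - 1.0)
    return total * step

alpha, p, lam = 3.0, 2.0, 0.1  # illustrative values, with p < alpha
cutoffs = [10.0, 100.0, 1000.0]

moments = [truncated_moment(p, alpha, T) for T in cutoffs]
exp_moments = [truncated_exp_moment(lam, alpha, T) for T in cutoffs]

for T, m, e in zip(cutoffs, moments, exp_moments):
    print(f"T={T:7.0f}  E[X^p; X<=T]={m:.4f}  E[exp(lam X); X<=T]={e:.3e}")
```

The truncated p-th moment settles near α/(α − p) while the truncated exponential moment keeps growing with the cutoff, which is the regime where a moment-based f would be needed in place of the KL divergence.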

Original abstract

PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker-Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on f-divergences, obtained by combining the Legendre transform of f with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper derives change-of-measure inequalities based on f-divergences by combining the Legendre transform of a convex function f with the Fenchel-Young inequality. These inequalities are then applied to obtain PAC-Bayes generalization bounds that rely on tailored assumptions on the empirical risk (rather than the bounded exponential moments required by the classical Donsker-Varadhan theorem), thereby extending the range of conditions under which PAC-Bayesian guarantees hold.

Significance. If the derivations are correct, the work provides a convex-analytic framework for generating families of change-of-measure inequalities indexed by f, which in turn yield PAC-Bayes bounds under correspondingly tailored risk assumptions. This could meaningfully broaden the applicability of PAC-Bayes theory to learning settings where exponential-moment conditions fail but other moment or tail conditions (matched to f) hold. The approach is parameter-free in the sense that it directly invokes standard convex duality without additional fitted quantities.

minor comments (3)
  1. Abstract, paragraph 2: the phrase 'tailored assumptions on the empirical risk' is used without an immediate concrete example; a one-sentence illustration (e.g., for f(t)=t log t or f(t)=t^2) would clarify the claim for readers.
  2. The manuscript should explicitly state whether the derived inequalities recover the classical Donsker-Varadhan bound as a special case when f(t) = t log t, the generator of the KL divergence (whose convex conjugate is exponential), to make the extension transparent.
  3. Notation for the reference and posterior measures should be introduced once and used consistently; the current abstract alternates between 'reference measure' and 'posterior measures' without a single defining sentence.
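
On the Donsker-Varadhan question in comment 2, a standard calculation suggests what such a recovery could look like. It is a sketch under assumed notation (π the reference measure, ρ a posterior, h a measurable function) and is not taken from the paper: plain Fenchel-Young for the KL generator f(t) = t log t gives a slightly weaker bound, and optimising over an additive shift of h closes the gap.

```latex
% f(t) = t \log t has conjugate f^*(y) = e^{y-1}, so Fenchel-Young gives
\mathbb{E}_{\rho}[h] \le \mathrm{KL}(\rho \,\|\, \pi) + \mathbb{E}_{\pi}\!\left[ e^{h-1} \right].

% Apply this to h - c and add c back (valid for every real c):
\mathbb{E}_{\rho}[h] \le \mathrm{KL}(\rho \,\|\, \pi) + c + e^{-c}\, \mathbb{E}_{\pi}\!\left[ e^{h-1} \right].

% The minimiser c = \log \mathbb{E}_{\pi}[e^{h}] - 1 yields the Donsker-Varadhan form:
\mathbb{E}_{\rho}[h] \le \mathrm{KL}(\rho \,\|\, \pi) + \log \mathbb{E}_{\pi}\!\left[ e^{h} \right].
```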

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation begins from the standard Legendre transform of f, combined with the Fenchel-Young inequality (an external result), to produce change-of-measure bounds, then applies those bounds to PAC-Bayes under tailored empirical-risk assumptions. No step reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the central inequalities are obtained directly from classical convex-analysis identities whose validity does not depend on the target PAC-Bayes result. The paper therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of applying the Legendre transform and Fenchel-Young inequality to produce usable change-of-measure bounds; these are standard convex-analysis facts rather than new postulates.

axioms (1)
  • [standard math] The Fenchel-Young inequality holds for proper convex lower-semicontinuous functions.
    Invoked to combine Legendre transform of f with the divergence to obtain the change-of-measure inequality.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities

    cs.IT 2026-02 conditional novelty 7.0

    A unified data-processing framework produces tighter change-of-measure inequalities that improve information-theoretic generalization bounds across learning theory and privacy.

  2. Density-Ratio Losses for Post-Hoc Learning to Defer

    stat.ML 2026-05 unverdicted novelty 6.0

    Post-hoc learning to defer is cast as density-ratio learning between model and expert ideal distributions, producing DR CPE losses that recover Chow's rule for KL-based ideals and support adjustable deferral via thresholding.
