
arxiv: 2605.19024 · v1 · pith:AFZPADBUnew · submitted 2026-05-18 · 📊 stat.ML · cs.LG · stat.ME

Conformal Prediction via Transported Beta Laws

Pith reviewed 2026-05-20 07:36 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords conformal prediction · beta distribution · Wasserstein distance · conditional coverage · non-exchangeable data · marginal coverage · finite-sample bounds · order statistics

The pith

The calibration-conditional coverage in split conformal prediction follows an exact Beta law under continuous i.i.d. data; departures from that law, measured in Wasserstein distance, bound coverage gaps when exchangeability fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Split conformal prediction delivers marginal coverage guarantees that average over the random calibration sample under exchangeability. The paper instead examines the full law of coverage conditional on a realized calibration set. Under continuous i.i.d. observations this law is exactly Beta(k, n+1-k), so the familiar marginal guarantee is simply its mean. The authors adopt this Beta distribution as a finite-sample reference and quantify how non-i.i.d. structure deforms it, using Wasserstein distance on the unit interval. The resulting bounds separate test-point shifts, which act through a coverage-scale transport map, from calibration-set dependence, which alters the underlying order statistics, and they yield explicit or approximate characterizations in scale-shift, clustered, and mixing regimes.

Core claim

In the continuous i.i.d. setting the law of the calibration-conditional coverage is exactly Beta(k, n+1-k). This Beta law serves as a finite-sample reference object. Departures from it are quantified using Wasserstein distances on [0,1], yielding direct bounds on marginal coverage gaps and on bad-calibration probabilities. Different sources of non-i.i.d. behavior deform the reference in distinct ways: test-side shift acts through a transport map on the coverage scale while calibration dependence alters the order-statistic law itself. The framework is instantiated in scale-shift, clustered, and stationary mixing settings, where the deformations are characterized explicitly or via Berry-Esseen approximations.
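One standard way to see where the Beta law comes from is the probability integral transform. The sketch below assumes calibration scores $S_1,\dots,S_n$ and a test score $S_{n+1}$ drawn i.i.d. from a continuous CDF $F$, with the conformal threshold taken as the $k$-th smallest calibration score $S_{(k)}$. The symbols $S_i$, $F$, and $U_i$ are notation introduced here for illustration; $D_{n,k}$ follows the figure captions.

```latex
\[
D_{n,k}
  = \mathbb{P}\bigl(S_{n+1} \le S_{(k)} \mid S_1,\dots,S_n\bigr)
  = F\bigl(S_{(k)}\bigr)
  = U_{(k)},
\qquad
U_i = F(S_i) \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}(0,1),
\]
\[
U_{(k)} \sim \mathrm{Beta}(k,\, n+1-k),
\qquad
\mathbb{E}[D_{n,k}] = \frac{k}{n+1}.
\]
```

With $k = \lceil \gamma (n+1) \rceil$ this mean is at least $\gamma$, which is the familiar marginal guarantee.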

What carries the argument

The transported Beta law, formed by taking the exact Beta(k, n+1-k) reference for i.i.d. data and deforming it either by a transport map (for test-side shifts) or by a changed order-statistic distribution (for calibration dependence) to produce Wasserstein bounds on coverage error.
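To make the transport-map idea concrete, here is a minimal Python sketch for a half-normal scale shift, the setting named in the Figure 3 caption. The closed form used for `h_r` is an assumption derived here, not quoted from the paper: it supposes calibration scores distributed as |N(0, s²)| and a test score distributed as |N(0, (r·s)²)|. The function name and the quantile-grid approximation of W1 are likewise illustrative choices.

```python
import numpy as np
from scipy import stats

def h_r(u, r):
    """Coverage-scale transport map for a half-normal scale shift (assumed form).

    If calibration scores are |N(0, s^2)| and the test score is |N(0, (r*s)^2)|,
    a threshold with calibration-scale coverage u has test coverage h_r(u).
    """
    z = stats.norm.ppf((1.0 + u) / 2.0)      # threshold in units of s
    return 2.0 * stats.norm.cdf(z / r) - 1.0

n, k, gamma, eta = 30, 28, 0.9, 0.05         # values quoted in the Figure 3 caption
m = 100_000
levels = (np.arange(m) + 0.5) / m
b = stats.beta.ppf(levels, k, n + 1 - k)     # quantiles of the Beta(28, 3) reference

for r in (0.8, 1.0, 1.5):
    d = h_r(b, r)                            # h_r is increasing, so these are quantiles of h_r(B)
    w1 = np.mean(np.abs(d - b))              # W1 via the monotone quantile coupling
    gap = abs(d.mean() - k / (n + 1))        # marginal coverage gap
    bad = np.mean(d <= gamma - eta)          # P(coverage <= gamma - eta)
    print(f"r={r}: W1={w1:.4f}  gap={gap:.4f}  P(bad)={bad:.4f}")
```

Under this assumed map, `h_r(u) <= u` for every `u` when `r > 1`, so the transported law sits entirely to the left of the reference and W1 coincides with the coverage gap up to quadrature error. That agrees with the identity stated in the Figure 5 caption.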

If this is right

  • Wasserstein distance to the Beta reference directly bounds the gap between marginal and conditional coverage.
  • The same distance supplies finite-sample bounds on the probability of poor calibration for any fixed threshold.
  • Test-side shifts and calibration dependence produce separable deformations that can be bounded independently.
  • Explicit transport maps or Berry-Esseen approximations are available for scale-shift, clustered, and mixing data.
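The first two bullets can be made concrete with a coupling argument. The following is a sketch of one way such bounds can be obtained, not necessarily the paper's proof. Here $D$ denotes the realized coverage with law $\nu$, $B \sim \beta_{n,k}$ is the Beta reference, and $\eta > 0$ is a tolerance; the second display has the same shape as the bound quoted in the Figure 8 caption.

```latex
% Take any coupling (D, B) of \nu and \beta_{n,k}, then pass to the infimum over couplings.
\[
\Bigl|\mathbb{E}[D] - \tfrac{k}{n+1}\Bigr|
  = \bigl|\mathbb{E}[D - B]\bigr|
  \le \mathbb{E}|D - B|
\quad\Longrightarrow\quad
\Bigl|\mathbb{E}[D] - \tfrac{k}{n+1}\Bigr| \le W_1(\nu, \beta_{n,k}).
\]
% If D <= \gamma - \eta while B > \gamma - \eta/2, then |D - B| > \eta/2; apply Markov.
\[
\mathbb{P}(D \le \gamma - \eta)
  \le \mathbb{P}\bigl(B \le \gamma - \tfrac{\eta}{2}\bigr)
    + \mathbb{P}\bigl(|D - B| \ge \tfrac{\eta}{2}\bigr)
  \le \mathbb{P}\bigl(B \le \gamma - \tfrac{\eta}{2}\bigr)
    + \frac{2\, W_1(\nu, \beta_{n,k})}{\eta}.
\]
```

The price of moving from a mean to a probability is the relaxed threshold $\gamma - \eta/2$ and the Markov term $2W_1/\eta$, which can exceed one when $W_1$ is large relative to $\eta$.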

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be inverted to produce data-driven corrections that adjust the conformal threshold once a deformation has been estimated.
  • Analogous reference laws might be derived for other nonconformity scores or for full conformal prediction.
  • The separation of shift versus dependence effects suggests diagnostic tests that flag which source is dominant in a given dataset.

Load-bearing premise

The exact Beta(k, n+1-k) law for calibration-conditional coverage holds only under continuous i.i.d. observations and exchangeability, which the paper uses as the reference object whose deformations are then studied.

What would settle it

Compute the empirical distribution of calibration-conditional coverage on continuous i.i.d. data and test whether its Wasserstein distance to Beta(k, n+1-k) is near zero, or verify that the observed Wasserstein distance in a stationary mixing process matches the Berry-Esseen approximation to within sampling error at moderate n.
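The first half of this check can be sketched in a few lines of Python. The example below is illustrative only: the exponential score distribution, the sample sizes, and the helper names `conditional_coverages` and `w1_to_beta` are assumptions made here, and the W1 value is a quantile-grid approximation rather than an exact computation.

```python
import numpy as np
from scipy import stats

def conditional_coverages(n, k, reps, rng):
    """Simulate calibration-conditional coverage for i.i.d. continuous scores.

    Scores are standard exponential (an arbitrary continuous choice), so the
    coverage of the threshold q = k-th smallest calibration score is F(q).
    """
    scores = rng.exponential(size=(reps, n))
    q = np.sort(scores, axis=1)[:, k - 1]    # k-th order statistic per replication
    return stats.expon.cdf(q)                # P(new score <= q | calibration set)

def w1_to_beta(sample, a, b):
    """Approximate W1 between an empirical sample and Beta(a, b) via quantiles."""
    m = len(sample)
    levels = (np.arange(m) + 0.5) / m
    return np.mean(np.abs(np.sort(sample) - stats.beta.ppf(levels, a, b)))

rng = np.random.default_rng(0)
n, gamma = 50, 0.9
k = int(np.ceil(gamma * (n + 1)))            # k = 46, matching the Figure 2 caption
cov = conditional_coverages(n, k, reps=20_000, rng=rng)
w1 = w1_to_beta(cov, k, n + 1 - k)

print(f"mean coverage {cov.mean():.4f} vs k/(n+1) = {k / (n + 1):.4f}")
print(f"approximate W1 to Beta({k}, {n + 1 - k}): {w1:.5f}")
```

If the Beta law holds, the reported distance should shrink toward zero as `reps` grows, since only Monte Carlo error remains. Replacing the i.i.d. draws with a dependent process would be the starting point for the second half of the check.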

Figures

Figures reproduced from arXiv: 2605.19024 by Helton Graziadei, Luben M. C. Cabezas, Thiago R. Ramos.

Figure 1. Visualizes this lower-tail effect and its decay with the calibration size.
Figure 2. Wasserstein radius $W_1(\nu_{\pi,c}, \beta_{n,k})$ for the contaminated law $\nu_{\pi,c} = (1-\pi)\beta_{n,k} + \pi\delta_c$, with $n = 50$, $k = 46$, and $\gamma = 0.9$. Each white contour traces the boundary of $B_1(\beta_{n,k}, \rho)$ for a given $\rho$, defining the set of contaminations $(\pi, c)$ compatible with that transport budget. Contamination near $c = \gamma$ allows a larger fraction $\pi$ for the same radius, while contamination away from $\gamma$ forces $\pi$ to be small; the contour shap…
Figure 3. Transported beta laws under half-normal scale shift for $n = 30$, $\gamma = 0.9$, and $k = 28$, so that $B_{n,k} \sim \mathrm{Beta}(28, 3)$. Under the scale ratio $r = \sigma_{Te}/\sigma_{T}$, the realized coverage satisfies $D_{n,k} \overset{d}{=} h_r(B_{n,k})$. As $r$ increases above one, the law is transported to the left, increasing the probability of bad calibration. The vertical lines mark $\gamma = 0.90$ and $\gamma - \eta = 0.85$, with $\eta = 0.05$. so that $\mathrm{Cov}(k) - k/(n+1) = \mathbb{E}[h_r(B_{n,k}$…
Figure 4. Realized-coverage laws $\nu_{n,k}$ for the Gaussian AR(1) model with $n = 50$ and $\gamma = 0.9$, against the i.i.d. beta reference $\beta_{n,k}$ (dashed red). At $\ell = 1$, larger positive values of $a$ produce a more dispersed realized-coverage law, with both a sharper concentration near 1 and a heavier left tail extending below the bad-calibration threshold $\gamma - \eta = 0.85$. As $\ell$ grows, the direct test-calibration dependence weakens and th…
Figure 5. Half-normal scale-shift example with $\gamma = 0.9$. Left: CDFs of $\nu^{(r)}_{n,k} = (h_r)_{\#}\beta_{n,k}$ (solid blue) against $\beta_{n,k}$ (dashed black) for $r \in \{0.5, 0.8, 2.0\}$ and $n \in \{50, 200, 1000\}$; the shaded area equals $W_1(\nu^{(r)}_{n,k}, \beta_{n,k})$ and is essentially constant across $n$. Right: exact identity $W_1(\nu^{(r)}_{n,k}, \beta_{n,k}) = |\mathrm{Cov}(k) - k/(n+1)|$ verified across $r \in [0.3, 3.0]$; the asymmetry around $r = 1$ reflects the nonlinearity of h…
Figure 6. $W_1(\nu_{n,k_\gamma}, \beta_{n,k_\gamma})$ (solid blue) and the asymptotic bound (1) (dashed orange) as functions of $\ell$, for $n = 200$, $\gamma = 0.9$, and $a \in \{0, 0.3, 0.6, 0.9\}$. For $a = 0$, $W_1$ is negligible for all $\ell$; for $a > 0$, both decay geometrically and level off at the Berry–Esseen floor, which depends only on $n$ and the long-run variance $\tau_\gamma^2$.
Figure 7. Chain of bounds $|\mathrm{Cov}(k_\gamma) - k_\gamma/(n+1)| \le W_1(\nu_{n,k_\gamma}, \beta_{n,k_\gamma}) \le |a|^{\ell} + \sqrt{2/(\pi n)}\,\bigl|\tau_\gamma - \sqrt{\gamma(1-\gamma)}\bigr| + O(n^{-1/2})$ as a function of $n$, for $\gamma = 0.9$, $a \in \{0.3, 0.6, 0.9\}$, and $\ell \in \{1, 10, 25\}$. Solid blue: Monte-Carlo $W_1$; dashed red: coverage gap; dotted orange: asymptotic bound. The chain holds throughout the asymptotic regime; for small $n$ with large $a$ and small $\ell$ the bound is conservative, and tightens as $|a|^{\ell}$ decay…
Figure 8. Bad-calibration bound from Theorem 12: $P(D^{(\ell)}_{n,k_\gamma} \le \gamma - \eta)$ (solid blue) against $P(B_{n,k_\gamma} \le \gamma - \eta/2) + 2W_1/\eta$ (dashed green) as a function of $n$, for $\gamma = 0.9$, $\eta = 0.05$, $a \in \{0.3, 0.6, 0.9\}$, and $\ell \in \{5, 10\}$. The bound can exceed one for large $a$ and small $\ell$, where the Markov penalty $2W_1/\eta$ dominates, and tightens as $\ell$ grows and $W_1$ decays.
Original abstract

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $Beta(k,n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper studies the calibration-conditional coverage law in split conformal prediction. Under continuous i.i.d. observations this law is exactly Beta(k, n+1-k); the usual marginal guarantee is its mean. The authors treat this Beta law as a finite-sample reference and quantify departures from it via Wasserstein distances on [0,1]. The framework is claimed to deliver direct bounds on marginal coverage gaps and on bad-calibration probabilities, while separating test-side shift (via a transport map) from calibration dependence (via changes to the order-statistic law). Explicit or Berry-Esseen characterizations are given for scale-shift, clustered, and stationary-mixing regimes, with supporting simulations on dependent processes.

Significance. If the claimed Wasserstein bounds are rigorously justified, the work supplies a geometrically interpretable finite-sample analysis that isolates distinct sources of non-exchangeability. The first-principles construction from order statistics and the explicit transport-map treatment of test-side shift are strengths; the Berry-Esseen route for mixing processes offers a concrete approximation that simulations suggest remains accurate at moderate n.

major comments (1)
  1. [Abstract] Abstract: the claim that Wasserstein distance to the Beta(k, n+1-k) reference 'yields direct bounds ... on bad-calibration probabilities' is not immediate. The indicator 1_{[0,c]} is discontinuous and not Lipschitz, so Kantorovich-Rubinstein duality controls only the mean gap (via the 1-Lipschitz identity map) and supplies no automatic bound on P(X ≤ c). The manuscript must state the auxiliary regularity (e.g., uniform density bounds on the transported measure or Lipschitz constants of the transport map) that closes this gap; without it the probability bound does not follow from W1 alone.
minor comments (2)
  1. The abstract states that simulations 'confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes,' yet provides no numerical values for n, the dependence parameters, or the number of Monte Carlo replications. Adding these details (or a table) would strengthen the empirical support.
  2. [§2] Notation for the transported coverage random variable and the reference Beta law could be introduced with a single displayed equation early in §2 to improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the single major comment below and have revised the manuscript to incorporate the necessary clarifications on regularity conditions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that Wasserstein distance to the Beta(k, n+1-k) reference 'yields direct bounds ... on bad-calibration probabilities' is not immediate. The indicator 1_{[0,c]} is discontinuous and not Lipschitz, so Kantorovich-Rubinstein duality controls only the mean gap (via the 1-Lipschitz identity map) and supplies no automatic bound on P(X ≤ c). The manuscript must state the auxiliary regularity (e.g., uniform density bounds on the transported measure or Lipschitz constants of the transport map) that closes this gap; without it the probability bound does not follow from W1 alone.

    Authors: We agree that the indicator function 1_{[0,c]} is discontinuous and hence not 1-Lipschitz, so the Kantorovich-Rubinstein representation of W_1 directly yields only a bound on the difference of expectations (i.e., the marginal coverage gap). Bounding probabilities of the form P(coverage ≤ c) requires additional regularity to control the modulus of continuity of the CDF. In the revised manuscript we have added an explicit statement of the required auxiliary conditions immediately after the definition of the transported beta law (new paragraph in Section 2): we assume that the transported measures admit densities bounded above and below by positive constants independent of n. Under this uniform-density assumption the CDFs are Lipschitz with constant equal to the density bound, and therefore W_1 controls the Kolmogorov distance, which in turn bounds the bad-calibration probabilities. The same density bounds hold automatically in the scale-shift and stationary-mixing regimes treated in Sections 3 and 5; we have inserted a short remark confirming this fact and have updated the abstract to read “under the auxiliary density bounds stated in Section 2, the framework yields direct bounds…”.
    revision: yes

Circularity Check

0 steps flagged

No circularity; derivation uses standard order statistics and Wasserstein metric from first principles

Full rationale

The paper constructs the Beta(k, n+1-k) reference law directly from the distribution of order statistics under continuous i.i.d. exchangeability, a standard and independently verifiable probabilistic fact. Wasserstein distances are then applied as an external metric to quantify deformations without any reduction of outputs to fitted inputs, self-definitions, or load-bearing self-citations. Explicit transport maps for test-side shifts and Berry-Esseen approximations for dependence are derived in specific regimes without circular reference to the target coverage bounds. The framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard properties of order statistics and beta distributions under exchangeability; no free parameters are introduced, no new entities are postulated, and the non-i.i.d. extensions use explicit transport maps or mixing assumptions.

axioms (2)
  • [standard math] Under continuous i.i.d. sampling, the calibration-conditional coverage follows exactly Beta(k, n+1-k).
    Invoked as the reference law whose mean recovers the usual marginal coverage guarantee.
  • [domain assumption] Wasserstein distance on [0,1] quantifies departures from the beta reference in a way that yields direct bounds on coverage gaps.
    Used to separate test-side shift (transport map) from calibration dependence (order-statistic law change).

pith-pipeline@v0.9.0 · 5722 in / 1610 out tokens · 42763 ms · 2026-05-20T07:36:00.798890+00:00 · methodology

