arxiv: 2603.22000 · v3 · submitted 2026-03-23 · 💻 cs.LG · stat.ML

CRPS-Optimal Binning for Univariate Conformal Regression

Pith reviewed 2026-05-15 00:38 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords conformal prediction · CRPS · binning · prediction intervals · nonparametric estimation · conditional distribution · dynamic programming

The pith

CRPS-optimal contiguous binning yields narrower conformal prediction intervals while preserving finite-sample marginal coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a non-parametric estimator for conditional distributions by sorting observations on the covariate and grouping them into a small number of contiguous bins whose boundaries are chosen to minimize the total leave-one-out CRPS. The optimal partition for any fixed number of bins is recovered exactly by dynamic programming after an $O(n^2 \log n)$ precomputation of the cost matrix. The number of bins itself is selected by K-fold cross-validation on test CRPS, avoiding the optimism that arises from in-sample LOO minimization. The resulting within-bin empirical distributions are then wrapped in a conformal procedure that uses CRPS as the nonconformity score, producing a transductive prediction set that carries a finite-sample marginal coverage guarantee at any chosen level without reserving a hold-out set.
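To make the objective concrete, the following is a minimal sketch of the CRPS of an empirical CDF and of the leave-one-out cost of a single bin. It assumes the standard energy form of the CRPS, E|X − y| − ½E|X − X′|, and evaluates the leave-one-out sum directly from its definition; the function names `crps_ecdf` and `loo_crps_cost` are illustrative and do not come from the paper, whose closed-form expression is not reproduced here.

```python
def crps_ecdf(atoms, y):
    """CRPS of the empirical CDF with the given atoms, evaluated at outcome y.

    Energy form: E|X - y| - 0.5 * E|X - X'|, with X and X' drawn
    independently from the empirical distribution of `atoms`.
    """
    m = len(atoms)
    term1 = sum(abs(a - y) for a in atoms) / m
    term2 = sum(abs(a - b) for a in atoms for b in atoms) / (2 * m * m)
    return term1 - term2


def loo_crps_cost(ys):
    """Total leave-one-out CRPS of one bin (needs at least two points).

    Each response is scored against the ECDF of the remaining responses.
    This direct evaluation is cubic in the bin size and is meant only
    to illustrate the definition.
    """
    return sum(crps_ecdf(ys[:i] + ys[i + 1:], ys[i]) for i in range(len(ys)))


if __name__ == "__main__":
    bin_values = [1.0, 2.0, 4.0, 7.0]  # made-up responses falling in one bin
    print(loo_crps_cost(bin_values))
```

The direct evaluation is far slower than the precomputation the paper describes; it is useful mainly as a reference against which a faster implementation can be checked.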

Core claim

Partitioning covariate-sorted observations into K contiguous bins that globally minimize the sum of within-bin leave-one-out CRPS values, with K chosen by cross-validating test CRPS, supplies an empirical-CDF predictor inside each bin. Equipped with CRPS as the nonconformity measure, that predictor forms a conformal prediction set with finite-sample marginal coverage at any prescribed level, and all observations contribute to both the partition and the p-value calculation.
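A rough sketch of how a CRPS-based conformal p-value could be computed within one bin is given below. It rests on assumptions that the summary above does not spell out: the candidate response is appended to the bin, each point in the augmented bag is scored by the CRPS of the ECDF of the other points, ties count against the candidate, and the prediction set is scanned over a user-supplied grid. The names `conformal_p_value` and `prediction_set` are hypothetical.

```python
def crps_ecdf(atoms, y):
    """CRPS of the ECDF of `atoms` at outcome y (energy form)."""
    m = len(atoms)
    term1 = sum(abs(a - y) for a in atoms) / m
    term2 = sum(abs(a - b) for a in atoms for b in atoms) / (2 * m * m)
    return term1 - term2


def conformal_p_value(bin_ys, y_candidate):
    """Transductive p-value of a candidate response for a test point in this bin.

    Assumed score: alpha_i = CRPS of the ECDF of the other points in the
    augmented bag, evaluated at point i. The p-value is the fraction of
    scores at least as large as the candidate's own score.
    """
    bag = list(bin_ys) + [y_candidate]
    scores = [crps_ecdf(bag[:i] + bag[i + 1:], bag[i]) for i in range(len(bag))]
    alpha_new = scores[-1]
    return sum(s >= alpha_new for s in scores) / len(bag)


def prediction_set(bin_ys, grid, eps):
    """Grid approximation of the set of candidates whose p-value exceeds eps."""
    return [y for y in grid if conformal_p_value(bin_ys, y) > eps]


if __name__ == "__main__":
    ys_in_bin = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up bin contents
    grid = [k / 2 for k in range(-10, 25)]  # candidates from -5.0 to 12.0
    print(prediction_set(ys_in_bin, grid, eps=0.2))
```

With m training points in the bin, the p-value can only take values in {1/(m+1), …, 1}, so a bin with very few points cannot exclude any candidate at small ε.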

What carries the argument

Dynamic programming that recovers the globally optimal K-partition of n sorted points minimizing the sum of within-bin LOO-CRPS costs, after which K is chosen by K-fold CV on test CRPS.
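The dynamic program described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: the cost matrix is filled by evaluating the leave-one-out CRPS directly from its definition (much slower than the precomputation the paper reports), bins with fewer than two points are forbidden, and the names `loo_cost` and `optimal_partition` are hypothetical.

```python
import math


def crps_ecdf(atoms, y):
    """CRPS of the ECDF of `atoms` at outcome y (energy form)."""
    m = len(atoms)
    return (sum(abs(a - y) for a in atoms) / m
            - sum(abs(a - b) for a in atoms for b in atoms) / (2 * m * m))


def loo_cost(ys):
    """Leave-one-out CRPS of one bin; bins with fewer than two points are forbidden."""
    if len(ys) < 2:
        return math.inf
    return sum(crps_ecdf(ys[:i] + ys[i + 1:], ys[i]) for i in range(len(ys)))


def optimal_partition(ys_sorted_by_x, K):
    """Return (total_cost, bins) for the best split into K contiguous bins.

    `bins` is a list of K half-open index ranges (start, end) into the
    covariate-sorted responses.
    """
    n = len(ys_sorted_by_x)
    # cost[i][j]: cost of the bin covering indices i..j-1 (naive precomputation).
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 2, n + 1):
            cost[i][j] = loo_cost(ys_sorted_by_x[i:j])
    # best[k][j]: minimal cost of splitting the first j points into k bins.
    best = [[math.inf] * (n + 1) for _ in range(K + 1)]
    back = [[-1] * (n + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(1, n + 1):
            for i in range(j):
                c = best[k - 1][i] + cost[i][j]
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    if math.isinf(best[K][n]):
        raise ValueError("no feasible partition: need at least 2 points per bin")
    # Walk the back-pointers from the last bin to the first.
    bins, j = [], n
    for k in range(K, 0, -1):
        i = back[k][j]
        bins.append((i, j))
        j = i
    return best[K][n], bins[::-1]


if __name__ == "__main__":
    ys = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # made-up responses, already sorted by x
    print(optimal_partition(ys, 2))
```

The triple loop over `k`, `j`, and `i` is the part that matches the stated $O(n^2 K)$ recovery step; only the cost-matrix construction here is deliberately naive.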

If this is right

  • The construction is transductive and data-efficient: every observation is used both to define the bins and to compute nonconformity scores, with no calibration split required.
  • The method delivers finite-sample marginal coverage of at least 1 − ε at any user-specified significance level ε.
  • On real regression benchmarks the resulting intervals are substantially narrower than those from Gaussian split conformal, CQR, CQR-QRF, and conformalized isotonic distributional regression while coverage remains near nominal.
  • LOO-CRPS admits a closed-form expression with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage, allowing exact optimization for moderate sample sizes.
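One way to see why a closed form is plausible is the short derivation below. It starts from the standard energy form of the CRPS and is a sketch under that assumption; the paper's own expression may differ in notation or normalization. Here $y_1, \dots, y_m$ are the responses in one bin ($m \ge 2$), $\hat{F}_{-i}$ is the ECDF of all of them except $y_i$, and $S$ is shorthand introduced here for the sum of pairwise absolute differences.

```latex
% Energy form of the CRPS for the leave-one-out ECDF:
\mathrm{CRPS}(\hat{F}_{-i}, y_i)
  = \frac{1}{m-1} \sum_{j \ne i} |y_j - y_i|
  - \frac{1}{2(m-1)^2} \sum_{j \ne i} \sum_{k \ne i} |y_j - y_k| .

% Let S = \sum_{j<k} |y_j - y_k| and D_i = \sum_j |y_j - y_i|, so \sum_i D_i = 2S.
% The double sum excluding i equals 2S - 2 D_i. Summing over i:
\sum_{i=1}^{m} \mathrm{CRPS}(\hat{F}_{-i}, y_i)
  = \frac{2S}{m-1} - \frac{(m-2)\, S}{(m-1)^2}
  = \frac{m\, S}{(m-1)^2} .
```

Under this reading the bin cost depends on the data only through the sum of pairwise absolute differences, a quantity that can be updated incrementally as a bin is extended by one point.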

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the partition is fixed once K and the data are given, new test points can be assigned to an existing bin without recomputing the entire dynamic program, though the paper does not explore online or streaming variants.
  • The U-shaped cross-validation curve for K supplies an automatic bias-variance trade-off for distribution estimation that may generalize to other proper scoring rules beyond CRPS.
  • If the covariate is high-dimensional, a preliminary univariate projection or tree-based ordering would be needed before the contiguous-bin dynamic program can be applied.
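Assigning a new test point to an already fitted partition amounts to a binary search over the interior bin boundaries, as the following sketch shows. The boundaries used in the example are the Old Faithful cut points quoted in the Figure 11 caption; the function name `assign_bin` and the convention that a point lying exactly on a boundary goes to the bin on its right are assumptions made for illustration.

```python
from bisect import bisect_right


def assign_bin(x_new, cuts):
    """Index of the bin containing x_new, given sorted interior cut points.

    With K - 1 cut points the result lies in 0..K-1. A point exactly on a
    cut is placed in the bin to its right (an arbitrary convention).
    """
    return bisect_right(cuts, x_new)


if __name__ == "__main__":
    cuts = [63.0, 67.5, 71.5]  # waiting-time boundaries, in minutes
    for waiting_time in (55.0, 65.0, 70.0, 85.0):
        print(waiting_time, "-> bin", assign_bin(waiting_time, cuts))
```

The lookup costs $O(\log K)$ per test point and leaves the fitted partition untouched.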

Load-bearing premise

That sorting observations by a single covariate and restricting to contiguous blocks produces a partition whose within-bin empirical distributions are close enough to the true conditional distributions for the CRPS objective to select useful bins.

What would settle it

The claim would be undermined if, on a synthetic dataset in which the conditional distribution of the response changes smoothly and continuously with the covariate, the CRPS-optimal bins produced intervals no narrower than those from a correctly specified parametric conformal method while empirical coverage fell materially below the nominal level.

Figures

Figures reproduced from arXiv: 2603.22000 by Paolo Toccaceli.

Figure 1: Geometric interpretation of $\mathrm{CRPS}(\hat{F}_m, y)$ as the integral of the squared vertical gap between the predictive CDF $\hat{F}_m$ (blue step function, $m = 4$ atoms) and the step $\mathbb{1}[t \ge y]$ (grey) at the observed outcome $y$. Blue shading marks intervals where $\hat{F}_m(t) > \mathbb{1}[t \ge y]$ (too little forecast mass above $t$); orange marks intervals where $\hat{F}_m(t) < \mathbb{1}[t \ge y]$ (too little mass below $t$). $\mathrm{CRPS} = \int (\hat{F}_m(t) - \mathbb{1}[t \ge y])^2 \, dt$ is larg…
Figure 2: Within-sample LOO-CRPS (left) and cross-validated test CRPS (right) as functions …
Figure 3: Venn prediction band (shaded) and training ECDF …
Figure 4: Conformal p-value $p(y_h)$ as a function of the candidate response $y_h$ at three test points $x^* \in \{0.3, 1.5, 2.7\}$ for $\varepsilon = 0.10$. The shaded region is the prediction set Γ0.10; vertical red lines mark the true 90% interval under $\mathcal{N}(3x^*, (1 + x^*)^2)$. The p-value curve is unimodal (each monotone piece is convex, since the underlying nonconformity score $\alpha(y_h)$ is convex), yielding a single connected prediction s…
Figure 5: Conformal 90% prediction intervals (shaded blue) and true 90% intervals (dashed …
Figure 6: Synthetic bimodal bin ($m = 50$, $\varepsilon = 0.10$, two modes at $\pm 3$). Left: training data distribution (KDE + rug), showing the two clearly separated modes. Centre: the CRPS nonconformity score is convex; the prediction set Γε is a single interval spanning both modes. Right: the 1-NN score has two local minima near the data clusters; the prediction set Γε,(1) consists of two tight intervals, one near each mode, with …
Figure 7: Training scatter with the optimal 6-bin partition. Shaded regions correspond to the …
Figure 8: Distribution of conformal p-values $p(y^*)$ on the 2,000-point test set. Under a valid conformal predictor the p-values are super-uniform (density ≤ 1 everywhere); slight conservatism relative to the uniform baseline reflects the approximate exchangeability within bins of a heteroscedastic process. The p-values are discrete, taking values on $\{1/(m+1), 2/(m+1), \ldots, 1\}$ for the $m$ training points in each …
Figure 9: Conditional coverage per bin on the 2,000-point test set, at three levels $\varepsilon \in \{0.05, 0.10, 0.20\}$. Dashed lines mark the nominal level. Bin 1 ($m = 86$, leftmost) undercovers due to the within-bin shift of the true conditional distribution; the remaining bins are close to their nominal targets.
Figure 10: PIT histograms per bin. The PIT for a test point (…
Figure 11: Old Faithful. Left: scatter of eruption duration vs. waiting time with the optimal 4-bin partition. Boundaries at 63.0, 67.5, and 71.5 minutes resolve the transition between the short-eruption and long-eruption regimes. Right: within-bin empirical CDFs; the outer bins are unimodal, confirming the partition captures the regime structure.
Figure 12: Old Faithful: 90% prediction intervals from five methods. Our CRPS-based …
Figure 13: Motorcycle accident. Left: cross-validated test CRPS as a function of $K$ with the selected $K^* = 6$. Right: scatter with the 6-bin partition; boundaries are concentrated in the high-variance impact phase.
Figure 14: Motorcycle accident: 90% prediction intervals. Gaussian split conformal is …
Original abstract

We propose a method for non-parametric conditional distribution estimation based on partitioning covariate-sorted observations into contiguous bins and using the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage; the globally optimal $K$-partition is recovered by a dynamic programme in $O(n^2 K)$ time. Minimisation of within-sample LOO-CRPS turns out to be inappropriate for selecting $K$ as it results in in-sample optimism. We instead select $K$ by $K$-fold cross-validation of test CRPS, which yields a U-shaped criterion with a well-defined minimum. Having selected $K^*$ and fitted the full-data partition, we form two complementary predictive objects: the Venn prediction band and a conformal prediction set based on CRPS as the nonconformity score, which carries a finite-sample marginal coverage guarantee at any prescribed level $\varepsilon$. The conformal prediction is transductive and data-efficient, as all observations are used for both partitioning and p-value calculation, with no need to reserve a hold-out set. On real benchmarks against split-conformal competitors (Gaussian split conformal, CQR, CQR-QRF, and conformalized isotonic distributional regression), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.
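The selection of the number of bins by cross-validated test CRPS could look roughly like the sketch below. Several details are assumptions rather than the paper's procedure: folds are formed by a seeded random shuffle, a held-out point is assigned to a bin using cut points placed midway between adjacent training covariates, and the bin cost uses an algebraic simplification of the energy-form leave-one-out CRPS ($m$ times the sum of pairwise absolute differences, divided by $(m-1)^2$). All function names are hypothetical, and every candidate K is assumed small enough that each bin can hold at least two training points.

```python
import math
import random
from bisect import bisect_right


def crps_ecdf(atoms, y):
    """CRPS of the ECDF of `atoms` at outcome y (energy form)."""
    m = len(atoms)
    return (sum(abs(a - y) for a in atoms) / m
            - sum(abs(a - b) for a in atoms for b in atoms) / (2 * m * m))


def cost_matrix(ys):
    """cost[i][j]: LOO-CRPS of the bin ys[i:j], computed as m * S / (m - 1)^2,
    where S is the sum of |y_a - y_b| over unordered pairs in the bin."""
    n = len(ys)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        s = 0.0
        for j in range(i + 1, n):  # extend the bin to include ys[j]
            s += sum(abs(ys[j] - ys[t]) for t in range(i, j))
            m = j - i + 1
            cost[i][j + 1] = m * s / (m - 1) ** 2
    return cost


def fit_bins(cost, n, K):
    """Dynamic program over contiguous partitions; returns K half-open ranges."""
    best = [[math.inf] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(1, n + 1):
            for i in range(j):
                c = best[k - 1][i] + cost[i][j]
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i
    bins, j = [], n
    for k in range(K, 0, -1):
        bins.append((back[k][j], j))
        j = back[k][j]
    return bins[::-1]


def select_num_bins(xs, ys, candidates, folds=5, seed=0):
    """Return (best_K, {K: mean held-out CRPS}) over the candidate bin counts."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    totals = {K: 0.0 for K in candidates}
    for f in range(folds):
        test = set(idx[f::folds])
        train = sorted((xs[i], ys[i]) for i in idx if i not in test)
        tx, ty = [p[0] for p in train], [p[1] for p in train]
        cost = cost_matrix(ty)  # reused for every candidate K
        for K in candidates:
            bins = fit_bins(cost, len(ty), K)
            cuts = [(tx[e - 1] + tx[e]) / 2 for _, e in bins[:-1]]
            for i in test:
                s, e = bins[bisect_right(cuts, xs[i])]
                totals[K] += crps_ecdf(ty[s:e], ys[i])
    scores = {K: totals[K] / len(xs) for K in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up data: a step in the response at x = 0.5 plus a small deterministic wiggle.
    xs = [i / 60 for i in range(60)]
    ys = [(0.0 if i < 30 else 10.0) + ((i * 7) % 5) / 10 for i in range(60)]
    print(select_num_bins(xs, ys, candidates=[1, 2, 3, 4]))
```

Scoring on held-out points is what separates this criterion from the in-sample leave-one-out cost, which the abstract reports to be optimistic when used to choose $K$.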

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CRPS-optimal binning for univariate conformal regression. Observations are sorted by a covariate and partitioned into K contiguous bins chosen to minimize the leave-one-out CRPS, which is computed via a closed-form expression and optimized using dynamic programming. K is selected via K-fold cross-validation on test CRPS. The resulting empirical CDFs within bins serve as predictive distributions, from which Venn prediction bands and CRPS-based conformal sets are derived, the latter providing finite-sample marginal coverage guarantees. Empirical results on real benchmarks show narrower prediction intervals compared to split-conformal methods such as Gaussian split conformal, CQR, CQR-QRF, and conformalized isotonic distributional regression, while maintaining near-nominal coverage.

Significance. If the reported efficiency gains hold under proper statistical verification, the approach supplies a computationally tractable nonparametric estimator for conditional distributions that pairs naturally with conformal prediction. The dynamic-programming optimality for the LOO-CRPS objective and the transductive use of all data for both partitioning and p-value computation are concrete strengths that distinguish it from hold-out-based split-conformal baselines.

major comments (2)
  1. [Experimental Results] Experimental Results section: the abstract and benchmark tables claim substantially narrower intervals than CQR, CQR-QRF and conformalized isotonic regression, yet no error bars, standard deviations across random seeds or folds, or paired statistical tests are reported. Without these, the central efficiency claim cannot be assessed for robustness versus dataset-specific effects.
  2. [Method] Method and Assumptions: the contiguous binning after univariate sorting implicitly assumes that the conditional response distribution varies smoothly along the sorted axis. The manuscript provides no diagnostic experiments or discussion for multimodal or non-monotonic conditional laws, where the within-bin empirical CDF would be biased; this directly affects whether the observed width reductions generalize beyond the chosen benchmarks.
minor comments (2)
  1. [Method] The recurrence relation and pre-computation steps for the dynamic program should be stated explicitly with an equation or short pseudocode block to confirm the claimed O(n²K) runtime after O(n² log n) pre-processing.
  2. [Conformal Prediction] Notation for the CRPS nonconformity score and the exact construction of the conformal prediction set (Venn band versus the CRPS-based set) should be unified across sections to avoid ambiguity in the coverage argument.
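For reference, a recurrence of the kind requested in the first minor comment would typically take the following form. This is the generic contiguous-partition recurrence written in notation assumed here, not an equation taken from the manuscript: $C(i, j)$ denotes the LOO-CRPS cost of a bin holding the sorted observations $i, \dots, j$, with $C = +\infty$ for inadmissible bins, and $D_k(j)$ denotes the minimal total cost of splitting the first $j$ observations into $k$ bins.

```latex
D_1(j) = C(1, j), \qquad 1 \le j \le n,

D_k(j) = \min_{k-1 \,\le\, i \,<\, j} \bigl[ D_{k-1}(i) + C(i+1, j) \bigr],
\qquad 2 \le k \le K, \quad k \le j \le n.

% Optimal value: D_K(n). The bin boundaries follow by backtracking the
% minimising index i at each step.
```

Each of the $nK$ states examines $O(n)$ predecessors with constant-time lookups into the precomputed cost matrix, which gives the $O(n^2 K)$ running time quoted in the abstract.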

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.

  1. Referee: Experimental Results section: the abstract and benchmark tables claim substantially narrower intervals than CQR, CQR-QRF and conformalized isotonic regression, yet no error bars, standard deviations across random seeds or folds, or paired statistical tests are reported. Without these, the central efficiency claim cannot be assessed for robustness versus dataset-specific effects.

    Authors: We agree that the lack of variability measures and statistical tests limits the ability to assess robustness. In the revised manuscript we will recompute all benchmark results over 10 independent random seeds for data ordering and model fitting, reporting mean interval widths together with standard deviations. We will also add paired Wilcoxon signed-rank tests on the per-dataset width differences versus each baseline, with p-values, to quantify whether the observed efficiency gains are statistically significant and consistent. revision: yes

  2. Referee: Method and Assumptions: the contiguous binning after univariate sorting implicitly assumes that the conditional response distribution varies smoothly along the sorted axis. The manuscript provides no diagnostic experiments or discussion for multimodal or non-monotonic conditional laws, where the within-bin empirical CDF would be biased; this directly affects whether the observed width reductions generalize beyond the chosen benchmarks.

    Authors: The referee correctly notes an implicit modeling assumption. Sorting by a single covariate and using contiguous bins presupposes that the conditional distribution changes sufficiently smoothly for local empirical CDFs to be informative. We will add an explicit discussion of this assumption and its limitations (including potential bias under strong multimodality or non-monotonicity) to the Method section. In addition, we will include a new synthetic-data experiment with multimodal conditional distributions to illustrate performance degradation relative to the baselines, thereby clarifying the method's scope. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The derivation proceeds by optimizing contiguous bin boundaries on covariate-sorted data to minimize a closed-form LOO-CRPS objective via dynamic programming, then selecting K via external K-fold CV on held-out test CRPS (explicitly to avoid in-sample optimism), and finally constructing a transductive conformal set with CRPS as nonconformity score. The marginal coverage guarantee is invoked from standard conformal prediction theory and does not reduce to any quantity fitted inside the paper. The narrower-interval claim is presented as an empirical benchmark result, not a mathematical identity. No self-citations, self-definitional steps, or fitted-input-renamed-as-prediction patterns appear in the text.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The method rests on standard dynamic-programming optimality for contiguous partitioning and on the finite-sample validity of transductive conformal prediction; no new entities are postulated and the only tunable quantity is K, which is selected externally by cross-validation.

free parameters (1)
  • K
    Number of bins; selected by K-fold cross-validation of test CRPS rather than fitted directly to the training data.
axioms (2)
  • [standard math] Dynamic programming recovers the globally optimal K-partition for the LOO-CRPS cost
    Standard optimality property of DP on contiguous partitions; invoked in the description of the O(n^2 K) recovery step.
  • [domain assumption] Transductive conformal prediction using CRPS as nonconformity score yields finite-sample marginal coverage
    Standard result from conformal prediction theory; the paper applies it without deriving it anew.

pith-pipeline@v0.9.0 · 5561 in / 1390 out tokens · 23181 ms · 2026-05-15T00:38:35.542550+00:00 · methodology

