Conformal Prediction with Macro-Coverage Guarantees

Aabesh Bhattacharyya; Rina Foygel Barber; Tiffany Ding

arxiv: 2606.28598 · v1 · pith:BXE66MJTnew · submitted 2026-06-26 · stat.ME · cs.LG · stat.ML


Pith reviewed 2026-06-30 00:31 UTC · model grok-4.3


keywords: conformal prediction · macro-coverage · prediction sets · class-conditional coverage · imbalanced classification · finite-sample guarantees · label weighting


The pith

Label-weighted conformal prediction produces sets with finite-sample macro-coverage guarantees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that weighting each class in the conformal calibration step yields prediction sets whose average per-class coverage meets a target level in finite samples. Macro-coverage, defined as the unweighted average of the class-conditional coverages, sits between marginal coverage, which can ignore rare classes, and class-conditional coverage, which demands many calibration examples per class. The same weighting approach extends to generalized objectives that first group classes and then average the group-level coverages with arbitrary weights. The authors also give the explicit form of the smallest sets that meet any such objective, along with a matching score function.
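In symbols, the three coverage notions compared above can be written as follows. The notation is assumed here for illustration and is not taken from the paper: C(X) is the prediction set, K the number of classes, and 1 - α the target level.

```latex
\begin{align*}
\text{marginal:} \quad
  & \mathbb{P}\{Y \in C(X)\} \;\ge\; 1-\alpha, \\
\text{class-conditional:} \quad
  & \mathbb{P}\{Y \in C(X) \mid Y = y\} \;\ge\; 1-\alpha
    \quad \text{for every } y \in \{1,\dots,K\}, \\
\text{macro:} \quad
  & \frac{1}{K}\sum_{y=1}^{K} \mathbb{P}\{Y \in C(X) \mid Y = y\} \;\ge\; 1-\alpha .
\end{align*}
```

Since the marginal coverage equals \sum_y \mathbb{P}\{Y = y\}\,\mathbb{P}\{Y \in C(X) \mid Y = y\}, macro-coverage amounts to replacing the class frequencies with uniform weights 1/K. Class-conditional coverage implies macro-coverage, but not the reverse.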

Core claim

Label-weighted conformal prediction produces prediction sets that satisfy a finite-sample guarantee on the macro-coverage objective, defined as the unweighted average of class-conditional coverages, and on a family of generalized macro-coverage objectives that aggregate coverage over arbitrary class groupings.
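To make the generalized objective concrete, here is a minimal sketch of how one might evaluate it empirically. It assumes the objective is a weighted average of group-level coverages, where each group's coverage is the fraction of covered points whose label falls in that group. The function name `generalized_macro_coverage` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def generalized_macro_coverage(labels, covered, class_to_group, group_weights):
    """Weighted average of group-level empirical coverages.

    labels:         (n,) integer class labels
    covered:        (n,) 1 if the true label was in the prediction set, else 0
    class_to_group: (K,) group index assigned to each class (hypothetical encoding)
    group_weights:  (G,) nonnegative weights; normalized to sum to one below
    """
    labels = np.asarray(labels)
    covered = np.asarray(covered, dtype=float)
    groups = np.asarray(class_to_group)[labels]      # group of each data point
    w = np.asarray(group_weights, dtype=float)
    w = w / w.sum()

    total = 0.0
    for g, w_g in enumerate(w):
        mask = groups == g
        if not mask.any():
            if w_g > 0:
                raise ValueError(f"group {g} has positive weight but no data")
            continue
        total += w_g * covered[mask].mean()          # coverage within group g
    return total
```

With every class in its own group and equal weights this returns the plain macro-coverage; with a single group containing all classes it returns the marginal coverage.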

What carries the argument

Label-weighted conformal scores, in which each class receives a weight chosen to make its contribution to the overall coverage objective uniform.
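The mechanism can be sketched as a weighted split-conformal procedure in the style of weighted conformal prediction under label shift. This is an illustration under stated assumptions, not the paper's exact algorithm: the weights are taken to be inverse empirical calibration counts, classes with no calibration points get weight 1, and the function name `label_weighted_sets` is hypothetical. The sketch is not claimed to reproduce the paper's finite-sample guarantee.

```python
import numpy as np

def label_weighted_sets(cal_scores, cal_labels, test_scores, alpha):
    """Boolean (m, K) matrix; entry [j, y] is True if label y is in set j.

    cal_scores:  (n,) nonconformity score of each calibration point at its true label
    cal_labels:  (n,) integer labels in {0, ..., K-1}
    test_scores: (m, K) nonconformity score of every candidate label per test point
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_scores = np.asarray(test_scores, dtype=float)
    n_classes = test_scores.shape[1]

    # ASSUMPTION: weight each class by the inverse of its calibration count,
    # so every observed class carries the same total weight.
    counts = np.bincount(cal_labels, minlength=n_classes)
    class_w = 1.0 / np.maximum(counts, 1)

    order = np.argsort(cal_scores)
    sorted_scores = cal_scores[order]
    cum_w = np.cumsum(class_w[cal_labels[order]])    # cumulative calibration weight

    thresholds = np.full(n_classes, np.inf)
    for y in range(n_classes):
        # The test point, hypothesized to have label y, places weight class_w[y]
        # at +infinity, as in weighted conformal prediction.
        total = cum_w[-1] + class_w[y]
        idx = np.searchsorted(cum_w / total, 1.0 - alpha, side="left")
        if idx < len(sorted_scores):
            thresholds[y] = sorted_scores[idx]       # weighted (1 - alpha) quantile

    return test_scores <= thresholds
```

When the weighted quantile is not reached by any calibration score, the threshold stays at infinity and the candidate label is always included, which is the conservative choice.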

If this is right

The guarantee holds for any finite calibration set size and does not require balanced class counts.
Generalized objectives allow the user to specify coverage targets for user-defined groups of classes rather than individual classes.
The minimal-cardinality prediction sets for a given objective are obtained by thresholding a weighted nonconformity score.
The same construction recovers standard marginal coverage when each class's weight in the objective is set equal to its class frequency.
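The "weighted nonconformity score" in the list above can be illustrated with a standard Neyman-Pearson style argument: adding label y to the set at point x buys coverage proportional to w_y · p(y | x) / p(y) per unit of expected set size, so the smallest sets threshold that ratio. The sketch below assumes estimated class probabilities `probs`, class priors `class_priors`, and objective weights `class_weights`. All names are hypothetical, and the paper's exact score may differ.

```python
import numpy as np

def weighted_score(probs, class_weights, class_priors):
    """Nonconformity score per (point, label); lower means more conforming.

    probs:         (m, K) estimated p(y | x)
    class_weights: (K,) weight of each class in the coverage objective
    class_priors:  (K,) estimated class frequencies p(y), all positive
    """
    probs = np.asarray(probs, dtype=float)
    ratio = np.asarray(class_weights, dtype=float) / np.asarray(class_priors, dtype=float)
    return -probs * ratio

def threshold_sets(probs, class_weights, class_priors, tau):
    """Include label y at point x whenever its weighted score is at most tau."""
    return weighted_score(probs, class_weights, class_priors) <= tau
```

Setting `class_weights` equal to `class_priors` cancels the ratio and reduces the rule to thresholding p(y | x), which is the usual marginal-coverage case. Uniform weights give a prevalence-adjusted score that boosts rare classes.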

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The weighting scheme could be combined with other conformal variants such as adaptive or full conformal methods to retain their additional properties while targeting macro-coverage.
In domains with long-tailed label distributions the method may reduce the practical gap between theoretical coverage promises and observed performance on minority classes.
The explicit characterization of minimal sets suggests a direct optimization route for objectives that mix coverage and set size.

Load-bearing premise

The calibration and test points are exchangeable.

What would settle it

Apply the weighted procedure on exchangeable data and measure the realized unweighted average of per-class coverages on a large hold-out set; systematic shortfall below the nominal level would refute the guarantee.
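A minimal sketch of the hold-out check described above, assuming prediction sets are stored as a boolean matrix with one column per class. The helper name `macro_coverage_check` and the two-standard-error shortfall rule are illustrative choices, not part of the paper.

```python
import numpy as np

def macro_coverage_check(pred_sets, labels, alpha, n_se=2.0):
    """Estimate macro-coverage on a hold-out set and flag a clear shortfall.

    pred_sets: (n, K) boolean matrix, True where a label is in the prediction set
    labels:    (n,) true integer labels
    """
    pred_sets = np.asarray(pred_sets, dtype=bool)
    labels = np.asarray(labels)
    covered = pred_sets[np.arange(len(labels)), labels]   # was the true label covered?

    # Only classes that appear in the hold-out set can be evaluated.
    classes = np.unique(labels)
    n_k = np.array([(labels == k).sum() for k in classes])
    cov_k = np.array([covered[labels == k].mean() for k in classes])

    macro = float(cov_k.mean())
    # Binomial standard error of the unweighted average of per-class coverages.
    se = float(np.sqrt(np.sum(cov_k * (1.0 - cov_k) / n_k)) / len(classes))

    return {
        "per_class": dict(zip(classes.tolist(), cov_k.tolist())),
        "macro": macro,
        "se": se,
        "shortfall": bool(macro + n_se * se < 1.0 - alpha),
    }
```

Conformal guarantees of this kind typically hold on average over the draw of the calibration set, so a single split can fall slightly below the nominal level by chance. Repeating the check over several random calibration/hold-out splits gives a fairer test of systematic shortfall.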

Original abstract

Prediction sets should have high coverage to be useful, but some coverage notions are more practically relevant than others. In the classification setting, class-conditional coverage requires that the prediction set (i.e., the set of candidate labels for a new test point) must achieve the target accuracy level within each class, which may be challenging to satisfy when many classes are rare and have few calibration points. At the other extreme, marginal coverage requires only that coverage holds on average over the distribution of all classes, which can lead to low-probability labels being essentially ignored. To find a middle ground, recent work has introduced macro-coverage, defined as the unweighted average of class-conditional coverages. Macro-coverage offers a compromise between marginal coverage and class-conditional coverage that is particularly appropriate for long-tailed settings. In this work, we show that label-weighted conformal prediction can be used to produce prediction sets with a finite-sample macro-coverage guarantee, and more generally a guarantee on a family of generalized macro-coverage objectives that aggregate coverage at the level of arbitrary class groupings and take a weighted average. We further characterize the form of the smallest prediction sets satisfying a given generalized macro-coverage objective and propose a corresponding conformal score function. We validate our theoretical results on two large-scale image classification datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Label-weighted conformal prediction delivers finite-sample macro-coverage guarantees, a usable middle option for imbalanced classification.


The core contribution is showing that reweighting the nonconformity scores by label frequency lets you run the usual conformal argument and get a finite-sample guarantee on the unweighted average of class-conditional coverages. They also extend it to arbitrary groupings and give the form of the smallest sets that meet the target. That is new relative to the standard marginal and conditional results in the conformal literature.

The paper does the obvious next step cleanly: it states the generalized objective, derives the minimal-set characterization, proposes the matching score, and checks the coverage numbers on two large image datasets. The exchangeability assumption is the standard one, transferred directly to the weighted scores, so the finite-sample claim looks mechanically sound.

The main limitation is that the abstract leaves the explicit derivation and the exact weighting scheme implicit, so a referee would need to see the full proof to confirm there are no hidden dependencies on the number of classes or the calibration set size. The experiments cover only two datasets; that is enough to illustrate the method but not to stress-test edge cases such as very rare classes or distribution shift.

This is for people already working on conformal methods for classification who need something between marginal and fully conditional coverage. It is worth sending to a serious referee because the claim is concrete, the method is simple to implement, and the gap it targets is real.

Referee Report

0 major / 3 minor

Summary. The manuscript claims that label-weighted conformal prediction yields finite-sample guarantees for macro-coverage (the unweighted average of class-conditional coverages) and its generalizations to arbitrary class groupings. It further characterizes the smallest prediction sets satisfying a given generalized macro-coverage objective, proposes a corresponding conformal score, and validates the results empirically on two large-scale image classification datasets.

Significance. If the finite-sample guarantees hold under the standard exchangeability assumption, the work supplies a practically relevant middle ground between marginal and class-conditional coverage for long-tailed classification problems. The characterization of optimal sets and the extension to grouped objectives add theoretical value beyond existing conformal methods.

minor comments (3)

[Abstract] The finite-sample claim would be clearer if the exchangeability assumption between calibration and test points were stated explicitly rather than left implicit.
[Introduction] The description of the generalized macro-coverage objectives would benefit from a short, concrete example of class groupings to illustrate the weighting.
[Experiments] More detail on how the label-weighted nonconformity scores are computed and how macro-coverage is estimated from the test sets would aid reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the practical relevance for long-tailed settings, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified


The derivation adapts the standard exchangeability-based conformal argument to a label-weighted nonconformity score, yielding finite-sample macro-coverage guarantees. This extension relies on the usual calibration/test exchangeability assumption (transferred to the weighted case) rather than any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation chain. The abstract and claim description contain no equations or steps that reduce the macro-coverage result to the paper's own inputs by construction; the result is an independent extension of existing conformal methods.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The result rests on the standard exchangeability assumption of conformal prediction plus the existence of a calibration set; no new free parameters or invented entities are introduced.

axioms (1)

Domain assumption: calibration and test points are exchangeable.
This is required for the finite-sample coverage guarantee to transfer from the usual conformal argument to the weighted version.


