A sharp Sauer inequality for multiclass and list prediction is established in terms of the DS dimension, tight for every alphabet size k, list size ℓ, and dimension value.
hub Canonical reference
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Canonical reference. 83% of citing Pith papers cite this work as background.
abstract
Black-box machine learning models are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Conformal prediction is a user-friendly paradigm for creating statistically rigorous uncertainty sets/intervals for the predictions of such models. Critically, the sets are valid in a distribution-free sense: they possess explicit, non-asymptotic guarantees even without distributional assumptions or model assumptions. One can use conformal prediction with any pre-trained model, such as a neural network, to produce sets that are guaranteed to contain the ground truth with a user-specified probability, such as 90%. It is easy to understand, easy to use, and general, applying naturally to problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on. This hands-on introduction aims to give the reader a working understanding of conformal prediction and related distribution-free uncertainty quantification techniques in one self-contained document. We lead the reader through the practical theory of conformal prediction, with examples, and describe its extensions to complex machine learning tasks involving structured outputs, distribution shift, time series, outliers, models that abstain, and more. Throughout, there are many explanatory illustrations, examples, and code samples in Python. With each code sample comes a Jupyter notebook implementing the method on a real-data example; the notebooks can be accessed and easily run using our codebase.
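The guarantee described in the abstract can be illustrated with a minimal sketch of split conformal prediction for classification. This is not the paper's own code: the synthetic `fake_model` helper, the score `1 - softmax probability of the true label`, and every function and variable name below are assumptions chosen for illustration. The only ingredients are a held-out calibration set and a finite-sample-corrected quantile of the calibration scores.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # threshold (every label included) keeps the guarantee.
        return np.inf
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean mask of shape (n, n_classes); True means the label is in the set."""
    return (1.0 - probs) <= qhat

# Hypothetical stand-in for a pre-trained classifier on exchangeable data.
rng = np.random.default_rng(0)
n_cal, n_test, n_classes, alpha = 1000, 2000, 5, 0.1

def fake_model(n):
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 2.0  # make the true class more likely
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs, labels

cal_probs, cal_labels = fake_model(n_cal)
test_probs, test_labels = fake_model(n_test)

# Conformal score: one minus the probability assigned to the true label.
cal_scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
qhat = conformal_quantile(cal_scores, alpha)

sets = prediction_sets(test_probs, qhat)
coverage = sets[np.arange(n_test), test_labels].mean()
avg_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_size:.2f}")
```

With `alpha = 0.1`, the empirical coverage on the test split should land near or above 90%, up to sampling fluctuation. The guarantee is marginal, that is, averaged over the draw of calibration and test data, and it relies on the calibration and test points being exchangeable.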
representative citing papers
MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.
Proposes a scale-calibrated median-of-means estimator for robust aggregation of distributed PCA estimates on the product of Euclidean space and Grassmann manifold.
The paper derives that calibration-conditional coverage follows a Beta(k, n+1-k) law under continuous i.i.d. exchangeability and quantifies non-i.i.d. departures via Wasserstein distances on transported beta laws, yielding explicit bounds in scale-shift, clustered, and mixing regimes.
GRAPHLCP improves localized conformal prediction on graphs by using feature-aware densification and Personalized PageRank kernels to incorporate topology for better coverage and efficiency.
TRACE creates valid conformal prediction sets for complex generative models by scoring outputs via averaged denoising or velocity errors along stochastic transport paths instead of likelihoods.
Trimming helps conformal prediction under contamination precisely when the anomaly score separates retention probabilities without biasing clean scores, otherwise the retained mixture coefficient prevents substantial decontamination.
PUICL is a transformer pretrained on synthetic PU data from structural causal models that solves positive-unlabeled classification via in-context learning without gradient updates or fitting.
SCALE uses Spectral Graph Conditional Exchangeability (SGCE) and graph wavelets to achieve valid coverage and improved efficiency in conformal prediction for non-exchangeable graph time series by conformalizing high-frequency residuals conditioned on low-frequency embeddings.
SURE-RAG aggregates pair-level claim-evidence relations into interpretable signals for selective RAG answering, reaching 0.9075 Macro-F1 on HotpotQA-RAG v3 while providing auditability and reducing unsafe answers by 37% at 30% coverage.
An intrinsic effective sample size for manifold MCMC is defined via kernel discrepancy as the number of independent draws yielding equivalent expected squared discrepancy to the target.
The profile maximum likelihood estimator for the location in anisotropic hyperbolic wrapped normal models is strongly consistent, asymptotically normal, and attains the Hájek-Le Cam minimax lower bound under squared geodesic loss.
A GNN predicts Gaussians over QAOA parameters to create graph-conditioned trust regions that reduce circuit evaluations for MaxCut from 85-343 down to 45 while keeping approximation ratios within 3 points of heuristics.
A model-agnostic adaptive conformal anomaly detection approach uses weighted quantile bounds learned from past foundation model predictions to deliver interpretable p-value scores with stable calibration under shifts for time series monitoring.
Random team assignments in a professional firm reveal that indirect ties strongly increase new direct tie formation, while effects of degree and local density are smaller and less robust.
Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.
LLM judges display per-document transitivity violations in 33-67% of cases despite low aggregate rates, while conformal prediction set widths serve as reliable indicators of document-level difficulty with cross-judge agreement.
CMRM adds a conformal quantile regularization on prediction margins to any loss, improving noisy-label classification accuracy up to 3.39% across methods and benchmarks while preserving performance at zero noise.
Conformal risk control for bounded non-monotone losses over a grid of size m achieves excess risk of order sqrt(log m / n) with n calibration samples, which is minimax optimal.
PS-DME is a new framework that controls post-selection false coverage rate for distributional KPI estimates via e-values and is provably more sample-efficient than data splitting under explicit conditions.
A model-agnostic Geometric Risk Controller reduces extreme errors in VLM-based OCR by requiring cross-view consensus before accepting outputs.
The work develops an iterative safe planner that adjusts conformal prediction bounds across policy updates via sensitivity analysis to maintain distribution-free safety guarantees despite interaction-induced distribution shifts.
BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.
C-SymmPI reformulates conditional coverage as miscoverage error over a user-specified function class to deliver near-conditional guarantees under group symmetries and distributional invariance.
citing papers explorer
- Uncertainty-Aware Transformers: Conformal Prediction for Language Models
CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.