Conformal-Style Quantile Analyses for Stochastic Bandits

Chengyu Du; Mengfan Xu

arxiv: 2605.07115 · v1 · submitted 2026-05-08 · 💻 cs.LG · stat.ML

Conformal-Style Quantile Analyses for Stochastic Bandits

Chengyu Du , Mengfan Xu This is my paper

Pith reviewed 2026-05-11 00:57 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords stochastic banditsconformal predictionupper-quantile regretadaptive conformal estimationUCB algorithmregret analysisprediction intervals

0 comments

The pith

ACP-UCB1 achieves logarithmic regret for upper-quantile performance in stochastic bandits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most stochastic bandit algorithms minimize regret with respect to mean rewards, yet many applications instead favor arms with strong upper-tail behavior. The paper introduces ACP-UCB1, a policy that pairs an adaptive conformal estimate of the upper endpoint of a central prediction interval with a UCB-style optimism bonus. It proves that the policy attains logarithmic upper-quantile regret, with each arm contributing a term of order log n divided by the gap in the quantile metric. The analysis controls the adaptive conformity scores through reward-quantile concentration, a perturbation argument, and deterministic localization of the level. The work also supplies metric-specific regret decompositions that compare ACP-UCB1 directly to classical UCB1 and validates the approach with numerical experiments.

Core claim

We propose ACP-UCB1, a conformal-style policy that combines an adaptive conformal estimate of the upper endpoint F_j^{-1}(1-α/2) with a UCB-type optimism bonus. The technical challenge that conformity scores are recomputed from evolving empirical quantile estimates and evaluated at an adaptive level is met by reward-quantile concentration, a perturbation argument for the recomputed score quantiles, and deterministic localization of the adaptive level. ACP-UCB1 achieves logarithmic upper-quantile regret with per-arm contribution O(log n / Δ_j^ACP). We also provide metric-specific regret decompositions comparing ACP-UCB1 with UCB1.

What carries the argument

ACP-UCB1, the policy that uses an adaptive conformal estimate of the upper prediction-interval endpoint together with a UCB optimism bonus, with the recomputed scores controlled by concentration, perturbation, and localization arguments.

If this is right

Each suboptimal arm contributes at most O(log n / Δ_j^ACP) to the upper-quantile regret.
Metric-specific regret decompositions allow direct comparison of ACP-UCB1 performance against UCB1 under both mean and upper-quantile criteria.
When the upper-tail target induces an arm ranking different from the mean ranking, ACP-UCB1 selects and exploits a different optimal arm than mean-based policies.
Numerical experiments confirm performance gains precisely in the regime where tail and mean rankings diverge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conformal adaptation mechanism could be applied to lower-tail or other fixed-quantile targets by changing the miscoverage parameter α.
The perturbation argument for recomputed conformity scores may transfer to other online problems that repeatedly update quantile estimates inside a decision loop.
The approach opens a route to risk-sensitive bandits that optimize distributional properties beyond a single quantile level.

Load-bearing premise

Conformity scores recomputed from evolving empirical quantile estimates can be controlled via reward-quantile concentration, a perturbation argument, and deterministic localization of the adaptive level.

What would settle it

An instance in which upper-quantile regret grows faster than logarithmically, for example linearly in n, under reward distributions that violate the quantile concentration bounds used to control the adaptive scores.

Figures

Figures reproduced from arXiv: 2605.07115 by Chengyu Du, Mengfan Xu.

**Figure 2.** Figure 2: Final regret as the nominal level α varies. 0 2000 4000 6000 8000 10000 Round 0 200 400 600 800 Cumulative ACP-Regret Gaussian ACP-UCB1 UCB1 0 2000 4000 6000 8000 10000 Round 0 500 1000 1500 Cumulative ACP-Regret Student-t 0 2000 4000 6000 8000 10000 Round 0 200 400 600 800 1000 1200 Cumulative ACP-Regret Skewed Student-t Distributional robustness: ACP-regret over time [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗

**Figure 3.** Figure 3: Distributional robustness under cumulative ACP-regret. [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

read the original abstract

Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance, which we study herein. For a fixed miscoverage level \(\alpha\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-\alpha/2)\) of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To this end, we propose ACP-UCB1, a conformal-style policy that combines an adaptive conformal estimate of the upper endpoint with a UCB-type optimism bonus. The technical challenge is that the conformity scores used by ACP-UCB1 are recomputed from evolving empirical quantile estimates and evaluated at an adaptive level. We control this endpoint through reward-quantile concentration, a perturbation argument for recomputed score quantiles, and deterministic localization of the adaptive level. ACP-UCB1 achieves logarithmic upper-quantile regret with per-arm contribution \(O(\nicefrac{\log n}{\Delta_j^{\mathrm{ACP}}})\). We also provide metric-specific regret decompositions comparing ACP-UCB1 with UCB1 and use numerical experiments to validate performance and improvement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ACP-UCB1 applies adaptive conformal prediction to target upper-quantile endpoints in stochastic bandits and claims logarithmic regret, but the perturbation step for recomputed scores looks like the weakest link.

read the letter

The paper's main move is to swap the usual mean target in UCB for an adaptive conformal estimate of the upper endpoint of a central prediction interval. This makes sense for settings where tail performance matters more than average reward, and the resulting ACP-UCB1 policy is presented with a per-arm regret term of order log n over the quantile gap. They also supply regret decompositions against plain UCB1 and mention numerical checks. That combination of adaptive conformal machinery with optimism is the concrete new piece; it does not collapse to existing mean-based or fixed-quantile analyses on the surface. The framing of the mismatch between classical objectives and upper-tail goals is clear and useful. The technical route they outline—reward-quantile concentration plus a perturbation argument plus deterministic localization of the adaptive level—is laid out in the abstract and appears to be the route to the bound. If the full proofs close the uniformity gaps, the result is a reasonable extension. The soft spot is exactly the one flagged in the stress test. Because conformity scores are recomputed each round from the current empirical quantile estimates, the perturbation step needs a modulus that holds uniformly over histories where some arms still have o(log t) pulls. The localization argument pins the level only after stabilization, so early-round error could accumulate in the bonus and push the realized regret above the claimed order. The abstract does not spell out how the modulus is controlled in the small-sample regime, which leaves the bound's robustness unclear without the details. This work is aimed at researchers who already care about non-mean criteria in bandits or who want to import conformal tools into sequential settings. A reader who wants a new algorithmic template and a regret statement to build on will get value from it, even if they end up tightening the perturbation argument themselves. It is coherent on its own terms and engages the relevant literature, so it deserves a serious referee rather than a desk reject. The experiments and full derivations will decide how much weight the claim carries.

Referee Report

2 major / 2 minor

Summary. The paper proposes ACP-UCB1, a stochastic bandit algorithm targeting upper-quantile performance rather than mean rewards. It employs an adaptive conformal prediction estimate of the upper endpoint F_j^{-1}(1-α/2) combined with a UCB-style optimism bonus on conformity scores that are recomputed each round from evolving empirical quantile estimates and evaluated at an adaptive miscoverage level. The analysis controls this endpoint via reward-quantile concentration, a perturbation argument on the recomputed score quantiles, and deterministic localization of the adaptive level. The central result is a logarithmic upper-quantile regret bound with per-arm contribution O(log n / Δ_j^ACP), accompanied by metric-specific regret decompositions versus UCB1 and numerical experiments.

Significance. If the control arguments hold, the work provides a principled extension of conformal methods to bandit settings with quantile objectives, which are relevant for risk-sensitive or tail-focused applications where mean-based ranking fails. The achievement of classical logarithmic regret rates under this metric, together with explicit decompositions comparing ACP-UCB1 to UCB1, offers a clean theoretical comparison. The adaptive-level mechanism is a distinctive technical feature.

major comments (2)

[Abstract] Abstract (technical challenge paragraph): the perturbation argument for recomputed conformity-score quantiles must hold uniformly over all histories, including those with o(log t) pulls per arm. The manuscript outline does not specify the continuity modulus or Lipschitz constant used, nor does it verify that the modulus remains controlled before the estimator stabilizes; if it degrades, the accumulated error in the UCB bonus can exceed the claimed O(log n / Δ_j^ACP) per-arm term.
[Proof of the main regret bound] Proof of the main regret bound (technical section following the abstract): the deterministic localization of the adaptive level is invoked only after estimator stabilization, but the early-round contribution to regret is not separately bounded. Without an explicit high-probability control on the perturbation error for t ≪ n when sample counts are still small, the overall logarithmic bound does not follow from the three listed ingredients.

minor comments (2)

[Abstract] The notation Δ_j^ACP is introduced without an explicit definition in the abstract; a one-line reminder of its meaning (gap between the target quantile and the next-best arm) would improve readability.
[Abstract] The abstract states that 'metric-specific regret decompositions' are provided, yet no indication is given of whether these are additive or multiplicative and how they differ from the standard UCB1 decomposition; a brief comparison table or equation reference would clarify the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important points about uniformity and early-round control that require explicit clarification in the manuscript. We address each major comment below and will make the corresponding revisions.

read point-by-point responses

Referee: [Abstract] Abstract (technical challenge paragraph): the perturbation argument for recomputed conformity-score quantiles must hold uniformly over all histories, including those with o(log t) pulls per arm. The manuscript outline does not specify the continuity modulus or Lipschitz constant used, nor does it verify that the modulus remains controlled before the estimator stabilizes; if it degrades, the accumulated error in the UCB bonus can exceed the claimed O(log n / Δ_j^ACP) per-arm term.

Authors: We agree that uniformity of the perturbation argument over all histories must be stated explicitly, including regimes with few pulls per arm. Under the bounded-reward assumption the quantile map is Lipschitz with constant 1/α (which we will record in the revision). We will add a dedicated lemma establishing a uniform high-probability bound on the recomputed-score perturbation that holds for every t ≥ 1 and every history; the bound is obtained by applying the same reward-quantile concentration inequality used later in the proof, together with the deterministic localization of the adaptive level that is active from round 1. The resulting error term is absorbed into the O(log n / Δ_j^ACP) per-arm contribution without inflation. revision: yes
Referee: [Proof of the main regret bound] Proof of the main regret bound (technical section following the abstract): the deterministic localization of the adaptive level is invoked only after estimator stabilization, but the early-round contribution to regret is not separately bounded. Without an explicit high-probability control on the perturbation error for t ≪ n when sample counts are still small, the overall logarithmic bound does not follow from the three listed ingredients.

Authors: The referee is correct that the early-round contribution must be controlled separately before invoking stabilization. In the revised proof we will insert an explicit burn-in analysis: for t ≤ T_0 (where T_0 is a polynomial in log n chosen so that each arm has been pulled Ω(log n) times with high probability), we bound the per-round perturbation error crudely by a constant (using only boundedness of rewards and the fact that the adaptive level is localized deterministically). The total contribution of this finite-length phase is O(log n) and is absorbed into the leading logarithmic term. After T_0 the three ingredients (quantile concentration, perturbation control, and deterministic localization) combine exactly as stated to yield the claimed per-arm bound. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bound derived from external concentration inequalities and adaptive arguments

full rationale

The paper's central claim is that ACP-UCB1 achieves O(log n / Δ_j^ACP) upper-quantile regret by controlling recomputed conformity scores via reward-quantile concentration inequalities, a perturbation argument, and deterministic localization of the adaptive level. These are standard external tools (Hoeffding-type bounds, continuity moduli, and deterministic pinning) that do not reduce the target regret expression to a fitted parameter or self-citation by construction. No equations in the provided abstract or description equate the final bound to an input fit; the derivation chain remains independent of the result itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard concentration properties for stochastic rewards and the validity of adaptive conformal estimation under recomputed scores; no free parameters or invented entities are explicitly introduced.

axioms (2)

domain assumption Reward distributions admit suitable quantile concentration inequalities
Invoked to control the upper-endpoint estimates and the adaptive conformal scores.
domain assumption Perturbation and deterministic localization arguments suffice for recomputed conformity scores
Used to handle the technical challenge of evolving empirical quantiles and adaptive levels.

pith-pipeline@v0.9.0 · 5508 in / 1120 out tokens · 37843 ms · 2026-05-11T00:57:32.430183+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We control this endpoint through reward-quantile concentration, a perturbation argument for recomputed score quantiles, and deterministic localization of the adaptive level. ACP-UCB1 achieves logarithmic upper-quantile regret with per-arm contribution O(log n / Δ_j^ACP).
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 1... local density regularity around the relevant reward and score quantiles

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

[1]

Angelopoulos and Stephen Bates

Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. Foundations and Trends in Machine Learning, 16 0 (4): 0 494--591, 2023

work page 2023
[2]

Angelopoulos, Rina Foygel Barber, and Stephen Bates

Anastasios N. Angelopoulos, Rina Foygel Barber, and Stephen Bates. Online conformal prediction with decaying step sizes. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

work page 2024
[3]

Exploration--exploitation tradeoff using variance estimates in multi-armed bandits

Jean-Yves Audibert, R \'e mi Munos, and Csaba Szepesv \'a ri. Exploration--exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410 0 (19): 0 1876--1902, 2009

work page 1902
[4]

Finite-time analysis of the multiarmed bandit problem

Peter Auer, Nicol \`o Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47 0 (2--3): 0 235--256, 2002

work page 2002
[5]

Cand \`e s, Aaditya Ramdas, and Ryan J

Rina Foygel Barber, Emmanuel J. Cand \`e s, Aaditya Ramdas, and Ryan J. Tibshirani. Conformal prediction beyond exchangeability. The Annals of Statistics, 51 0 (2): 0 816--845, 2023

work page 2023
[6]

Conformal bandits: Bringing statistical validity and reward efficiency to the small-gap regime, 2025

Simone Cuonzo and Nina Deliu. Conformal bandits: Bringing statistical validity and reward efficiency to the small-gap regime, 2025. arXiv:2512.09850

work page arXiv 2025
[7]

Pure exploration for max-quantile bandits

Yahel David and Nahum Shimkin. Pure exploration for max-quantile bandits. In Machine Learning and Knowledge Discovery in Databases, volume 9851 of Lecture Notes in Computer Science, pages 556--571. Springer, 2016

work page 2016
[8]

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator

Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27 0 (3): 0 642--669, 1956

work page 1956
[9]

Exploration vs exploitation vs safety: Risk-aware multi-armed bandits

Nicolas Galichet, Mich \`e le Sebag, and Olivier Teytaud. Exploration vs exploitation vs safety: Risk-aware multi-armed bandits. In Proceedings of the 5th Asian Conference on Machine Learning, volume 29 of Proceedings of Machine Learning Research, pages 245--260. PMLR, 2013

work page 2013
[10]

The KL-UCB algorithm for bounded stochastic bandits and beyond

Aur \'e lien Garivier and Olivier Capp \'e . The KL-UCB algorithm for bounded stochastic bandits and beyond. Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011

work page 2011
[11]

Stochastic online conformal prediction with semi-bandit feedback

Haosen Ge, Hamsa Bastani, and Osbert Bastani. Stochastic online conformal prediction with semi-bandit feedback. In Proceedings of the 42nd International Conference on Machine Learning, 2025. ICML 2025 poster

work page 2025
[12]

Cand \`e s

Isaac Gibbs and Emmanuel J. Cand \`e s. Adaptive conformal inference under distribution shift. In Advances in Neural Information Processing Systems, volume 34, 2021

work page 2021
[13]

Probability inequalities for sums of bounded random variables

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58 0 (301): 0 13--30, 1963

work page 1963
[14]

Asymptotically efficient adaptive allocation rules

Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6 0 (1): 0 4--22, 1985

work page 1985
[15]

Bandit Algorithms

Tor Lattimore and Csaba Szepesv \'a ri. Bandit Algorithms. Cambridge University Press, 2020

work page 2020
[16]

Tibshirani, and Larry Wasserman

Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113 0 (523): 0 1094--1111, 2018

work page 2018
[17]

The tight constant in the D voretzky-- K iefer-- W olfowitz inequality

Pascal Massart. The tight constant in the D voretzky-- K iefer-- W olfowitz inequality. The Annals of Probability, 18 0 (3): 0 1269--1283, 1990

work page 1990
[18]

Nikolakakis, Dionysios S

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet, and Anand D. Sarwate. Quantile multi-armed bandits: Optimal best-arm identification and a differentially private scheme. IEEE Journal on Selected Areas in Information Theory, 2 0 (2): 0 534--548, 2021. doi:10.1109/JSAIT.2021.3081525

work page doi:10.1109/jsait.2021.3081525 2021
[19]

Cand \`e s

Yaniv Romano, Evan Patterson, and Emmanuel J. Cand \`e s. Conformalized quantile regression. In Advances in Neural Information Processing Systems, volume 32, 2019

work page 2019
[20]

Risk-aversion in multi-armed bandits

Amir Sani, Alessandro Lazaric, and R \'e mi Munos. Risk-aversion in multi-armed bandits. In Advances in Neural Information Processing Systems, volume 25, 2012

work page 2012
[21]

Qualitative multi-armed bandits: A quantile-based approach

Balazs Szorenyi, R \'o bert Busa-Fekete, Paul Weng, and Eyke H \"u llermeier. Qualitative multi-armed bandits: A quantile-based approach. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1660--1668. PMLR, 2015

work page 2015
[22]

Thompson

William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25 0 (3--4): 0 285--294, 1933

work page 1933
[23]

Algorithmic Learning in a Random World

Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, 2005

work page 2005
[24]

Adaptive conformal predictions for time series

Margaux Zaffran, Aymeric Dieuleveut, Olivier F \'e ron, Yannig Goude, and Julie Josse. Adaptive conformal predictions for time series. In Proceedings of the 39th International Conference on Machine Learning (ICML), 2022

work page 2022
[25]

2025 , eprint =

Cuonzo, Simone and Deliu, Nina , title =. 2025 , eprint =

work page 2025
[26]

Finite-time Analysis of the Multiarmed Bandit Problem , journal =

Auer, Peter and Cesa-Bianchi, Nicol. Finite-time Analysis of the Multiarmed Bandit Problem , journal =

work page
[27]

Adaptive Conformal Inference under Distribution Shift , booktitle =

Gibbs, Isaac and Cand. Adaptive Conformal Inference under Distribution Shift , booktitle =

work page
[28]

Bandit Algorithms , publisher =

Lattimore, Tor and Szepesv. Bandit Algorithms , publisher =

work page
[29]

Advances in Applied Mathematics , volume =

Lai, Tze Leung and Robbins, Herbert , title =. Advances in Applied Mathematics , volume =

work page
[30]

The Annals of Probability , volume =

Massart, Pascal , title =. The Annals of Probability , volume =

work page
[31]

and Barber, Rina Foygel and Bates, Stephen , title =

Angelopoulos, Anastasios N. and Barber, Rina Foygel and Bates, Stephen , title =. Proceedings of the 41st International Conference on Machine Learning (ICML) , year =

work page
[32]

Garivier, Aur. The. Proceedings of the 24th Annual Conference on Learning Theory (COLT) , year =

work page
[33]

Exploration--Exploitation Tradeoff Using Variance Estimates in Multi-armed Bandits , journal =

Audibert, Jean-Yves and Munos, R. Exploration--Exploitation Tradeoff Using Variance Estimates in Multi-armed Bandits , journal =

work page
[34]

, title =

Thompson, William R. , title =. Biometrika , volume =

work page
[35]

Vovk, Vladimir and Gammerman, Alex and Shafer, Glenn , title =

work page
[36]

and Wasserman, Larry , title =

Lei, Jing and G'Sell, Max and Rinaldo, Alessandro and Tibshirani, Ryan J. and Wasserman, Larry , title =. Journal of the American Statistical Association , volume =

work page
[37]

and Bates, Stephen , title =

Angelopoulos, Anastasios N. and Bates, Stephen , title =. Foundations and Trends in Machine Learning , volume =

work page
[38]

Adaptive Conformal Predictions for Time Series , booktitle =

Zaffran, Margaux and Dieuleveut, Aymeric and F. Adaptive Conformal Predictions for Time Series , booktitle =

work page
[39]

Conformal Prediction Beyond Exchangeability , journal =

Barber, Rina Foygel and Cand. Conformal Prediction Beyond Exchangeability , journal =

work page
[40]

Bandits with Heavy Tail , journal =

Bubeck, S. Bandits with Heavy Tail , journal =

work page
[41]

Risk-Aversion in Multi-armed Bandits , booktitle =

Sani, Amir and Lazaric, Alessandro and Munos, R. Risk-Aversion in Multi-armed Bandits , booktitle =

work page
[42]

Qualitative Multi-Armed Bandits: A Quantile-Based Approach , booktitle =

Szorenyi, Balazs and Busa-Fekete, R. Qualitative Multi-Armed Bandits: A Quantile-Based Approach , booktitle =

work page
[43]

and Kalogerias, Dionysios S

Nikolakakis, Konstantinos E. and Kalogerias, Dionysios S. and Sheffet, Or and Sarwate, Anand D. , title =. IEEE Journal on Selected Areas in Information Theory , volume =. 2021 , doi =

work page 2021
[44]

Conformalized Quantile Regression , booktitle =

Romano, Yaniv and Patterson, Evan and Cand. Conformalized Quantile Regression , booktitle =

work page
[45]

Proceedings of the 42nd International Conference on Machine Learning , year =

Ge, Haosen and Bastani, Hamsa and Bastani, Osbert , title =. Proceedings of the 42nd International Conference on Machine Learning , year =

work page
[46]

Journal of the American Statistical Association , volume=

Probability inequalities for sums of bounded random variables , author=. Journal of the American Statistical Association , volume=

work page
[47]

The Annals of Mathematical Statistics , volume=

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator , author=. The Annals of Mathematical Statistics , volume=

work page
[48]

Machine Learning and Knowledge Discovery in Databases , series =

David, Yahel and Shimkin, Nahum , title =. Machine Learning and Knowledge Discovery in Databases , series =

work page
[49]

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , booktitle =

Galichet, Nicolas and Sebag, Mich. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , booktitle =

work page

[1] [1]

Angelopoulos and Stephen Bates

Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. Foundations and Trends in Machine Learning, 16 0 (4): 0 494--591, 2023

work page 2023

[2] [2]

Angelopoulos, Rina Foygel Barber, and Stephen Bates

Anastasios N. Angelopoulos, Rina Foygel Barber, and Stephen Bates. Online conformal prediction with decaying step sizes. In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

work page 2024

[3] [3]

Exploration--exploitation tradeoff using variance estimates in multi-armed bandits

Jean-Yves Audibert, R \'e mi Munos, and Csaba Szepesv \'a ri. Exploration--exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410 0 (19): 0 1876--1902, 2009

work page 1902

[4] [4]

Finite-time analysis of the multiarmed bandit problem

Peter Auer, Nicol \`o Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47 0 (2--3): 0 235--256, 2002

work page 2002

[5] [5]

Cand \`e s, Aaditya Ramdas, and Ryan J

Rina Foygel Barber, Emmanuel J. Cand \`e s, Aaditya Ramdas, and Ryan J. Tibshirani. Conformal prediction beyond exchangeability. The Annals of Statistics, 51 0 (2): 0 816--845, 2023

work page 2023

[6] [6]

Conformal bandits: Bringing statistical validity and reward efficiency to the small-gap regime, 2025

Simone Cuonzo and Nina Deliu. Conformal bandits: Bringing statistical validity and reward efficiency to the small-gap regime, 2025. arXiv:2512.09850

work page arXiv 2025

[7] [7]

Pure exploration for max-quantile bandits

Yahel David and Nahum Shimkin. Pure exploration for max-quantile bandits. In Machine Learning and Knowledge Discovery in Databases, volume 9851 of Lecture Notes in Computer Science, pages 556--571. Springer, 2016

work page 2016

[8] [8]

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator

Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 27 0 (3): 0 642--669, 1956

work page 1956

[9] [9]

Exploration vs exploitation vs safety: Risk-aware multi-armed bandits

Nicolas Galichet, Mich \`e le Sebag, and Olivier Teytaud. Exploration vs exploitation vs safety: Risk-aware multi-armed bandits. In Proceedings of the 5th Asian Conference on Machine Learning, volume 29 of Proceedings of Machine Learning Research, pages 245--260. PMLR, 2013

work page 2013

[10] [10]

The KL-UCB algorithm for bounded stochastic bandits and beyond

Aur \'e lien Garivier and Olivier Capp \'e . The KL-UCB algorithm for bounded stochastic bandits and beyond. Proceedings of the 24th Annual Conference on Learning Theory (COLT), 2011

work page 2011

[11] [11]

Stochastic online conformal prediction with semi-bandit feedback

Haosen Ge, Hamsa Bastani, and Osbert Bastani. Stochastic online conformal prediction with semi-bandit feedback. In Proceedings of the 42nd International Conference on Machine Learning, 2025. ICML 2025 poster

work page 2025

[12] [12]

Cand \`e s

Isaac Gibbs and Emmanuel J. Cand \`e s. Adaptive conformal inference under distribution shift. In Advances in Neural Information Processing Systems, volume 34, 2021

work page 2021

[13] [13]

Probability inequalities for sums of bounded random variables

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58 0 (301): 0 13--30, 1963

work page 1963

[14] [14]

Asymptotically efficient adaptive allocation rules

Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6 0 (1): 0 4--22, 1985

work page 1985

[15] [15]

Bandit Algorithms

Tor Lattimore and Csaba Szepesv \'a ri. Bandit Algorithms. Cambridge University Press, 2020

work page 2020

[16] [16]

Tibshirani, and Larry Wasserman

Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113 0 (523): 0 1094--1111, 2018

work page 2018

[17] [17]

The tight constant in the D voretzky-- K iefer-- W olfowitz inequality

Pascal Massart. The tight constant in the D voretzky-- K iefer-- W olfowitz inequality. The Annals of Probability, 18 0 (3): 0 1269--1283, 1990

work page 1990

[18] [18]

Nikolakakis, Dionysios S

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet, and Anand D. Sarwate. Quantile multi-armed bandits: Optimal best-arm identification and a differentially private scheme. IEEE Journal on Selected Areas in Information Theory, 2 0 (2): 0 534--548, 2021. doi:10.1109/JSAIT.2021.3081525

work page doi:10.1109/jsait.2021.3081525 2021

[19] [19]

Cand \`e s

Yaniv Romano, Evan Patterson, and Emmanuel J. Cand \`e s. Conformalized quantile regression. In Advances in Neural Information Processing Systems, volume 32, 2019

work page 2019

[20] [20]

Risk-aversion in multi-armed bandits

Amir Sani, Alessandro Lazaric, and R \'e mi Munos. Risk-aversion in multi-armed bandits. In Advances in Neural Information Processing Systems, volume 25, 2012

work page 2012

[21] [21]

Qualitative multi-armed bandits: A quantile-based approach

Balazs Szorenyi, R \'o bert Busa-Fekete, Paul Weng, and Eyke H \"u llermeier. Qualitative multi-armed bandits: A quantile-based approach. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1660--1668. PMLR, 2015

work page 2015

[22] [22]

Thompson

William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25 0 (3--4): 0 285--294, 1933

work page 1933

[23] [23]

Algorithmic Learning in a Random World

Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic Learning in a Random World. Springer, 2005

work page 2005

[24] [24]

Adaptive conformal predictions for time series

Margaux Zaffran, Aymeric Dieuleveut, Olivier F \'e ron, Yannig Goude, and Julie Josse. Adaptive conformal predictions for time series. In Proceedings of the 39th International Conference on Machine Learning (ICML), 2022

work page 2022

[25] [25]

2025 , eprint =

Cuonzo, Simone and Deliu, Nina , title =. 2025 , eprint =

work page 2025

[26] [26]

Finite-time Analysis of the Multiarmed Bandit Problem , journal =

Auer, Peter and Cesa-Bianchi, Nicol. Finite-time Analysis of the Multiarmed Bandit Problem , journal =

work page

[27] [27]

Adaptive Conformal Inference under Distribution Shift , booktitle =

Gibbs, Isaac and Cand. Adaptive Conformal Inference under Distribution Shift , booktitle =

work page

[28] [28]

Bandit Algorithms , publisher =

Lattimore, Tor and Szepesv. Bandit Algorithms , publisher =

work page

[29] [29]

Advances in Applied Mathematics , volume =

Lai, Tze Leung and Robbins, Herbert , title =. Advances in Applied Mathematics , volume =

work page

[30] [30]

The Annals of Probability , volume =

Massart, Pascal , title =. The Annals of Probability , volume =

work page

[31] [31]

and Barber, Rina Foygel and Bates, Stephen , title =

Angelopoulos, Anastasios N. and Barber, Rina Foygel and Bates, Stephen , title =. Proceedings of the 41st International Conference on Machine Learning (ICML) , year =

work page

[32] [32]

Garivier, Aur. The. Proceedings of the 24th Annual Conference on Learning Theory (COLT) , year =

work page

[33] [33]

Exploration--Exploitation Tradeoff Using Variance Estimates in Multi-armed Bandits , journal =

Audibert, Jean-Yves and Munos, R. Exploration--Exploitation Tradeoff Using Variance Estimates in Multi-armed Bandits , journal =

work page

[34] [34]

, title =

Thompson, William R. , title =. Biometrika , volume =

work page

[35] [35]

Vovk, Vladimir and Gammerman, Alex and Shafer, Glenn , title =

work page

[36] [36]

and Wasserman, Larry , title =

Lei, Jing and G'Sell, Max and Rinaldo, Alessandro and Tibshirani, Ryan J. and Wasserman, Larry , title =. Journal of the American Statistical Association , volume =

work page

[37] [37]

and Bates, Stephen , title =

Angelopoulos, Anastasios N. and Bates, Stephen , title =. Foundations and Trends in Machine Learning , volume =

work page

[38] [38]

Adaptive Conformal Predictions for Time Series , booktitle =

Zaffran, Margaux and Dieuleveut, Aymeric and F. Adaptive Conformal Predictions for Time Series , booktitle =

work page

[39] [39]

Conformal Prediction Beyond Exchangeability , journal =

Barber, Rina Foygel and Cand. Conformal Prediction Beyond Exchangeability , journal =

work page

[40] [40]

Bandits with Heavy Tail , journal =

Bubeck, S. Bandits with Heavy Tail , journal =

work page

[41] [41]

Risk-Aversion in Multi-armed Bandits , booktitle =

Sani, Amir and Lazaric, Alessandro and Munos, R. Risk-Aversion in Multi-armed Bandits , booktitle =

work page

[42] [42]

Qualitative Multi-Armed Bandits: A Quantile-Based Approach , booktitle =

Szorenyi, Balazs and Busa-Fekete, R. Qualitative Multi-Armed Bandits: A Quantile-Based Approach , booktitle =

work page

[43] [43]

and Kalogerias, Dionysios S

Nikolakakis, Konstantinos E. and Kalogerias, Dionysios S. and Sheffet, Or and Sarwate, Anand D. , title =. IEEE Journal on Selected Areas in Information Theory , volume =. 2021 , doi =

work page 2021

[44] [44]

Conformalized Quantile Regression , booktitle =

Romano, Yaniv and Patterson, Evan and Cand. Conformalized Quantile Regression , booktitle =

work page

[45] [45]

Proceedings of the 42nd International Conference on Machine Learning , year =

Ge, Haosen and Bastani, Hamsa and Bastani, Osbert , title =. Proceedings of the 42nd International Conference on Machine Learning , year =

work page

[46] [46]

Journal of the American Statistical Association , volume=

Probability inequalities for sums of bounded random variables , author=. Journal of the American Statistical Association , volume=

work page

[47] [47]

The Annals of Mathematical Statistics , volume=

Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator , author=. The Annals of Mathematical Statistics , volume=

work page

[48] [48]

Machine Learning and Knowledge Discovery in Databases , series =

David, Yahel and Shimkin, Nahum , title =. Machine Learning and Knowledge Discovery in Databases , series =

work page

[49] [49]

Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , booktitle =

Galichet, Nicolas and Sebag, Mich. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits , booktitle =

work page