pith. sign in

arxiv: 2605.14953 · v1 · pith:4JUC6Y25new · submitted 2026-05-14 · 💻 cs.LG

Efficient Online Conformal Selection with Limited Feedback

Pith reviewed 2026-06-30 21:46 UTC · model grok-4.3

classification 💻 cs.LG
keywords conformal selectionadaptive conformal inferencebandit feedbackonline learningefficiency regretdistribution-free uncertainty quantificationLyapunov functionssemi-bandit feedback
0
0 comments X

The pith

The ACI update rule applied to the control parameter achieves adversarial validity and stochastic efficiency for online conformal selection under bandit feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a simple update rule from Adaptive Conformal Inference, when used on the right control variable, keeps the average probability of selecting at least one success at the target level for any sequence of inputs. This validity holds even when the only feedback is whether the chosen subset succeeded, without needing full outcome details. The same rule also produces sublinear regret in selection cost compared to a stochastic benchmark when inputs are drawn i.i.d. A reader would care because the approach works under distribution shifts and with far less information than earlier methods, connecting limited-feedback online learning to distribution-free guarantees.

Core claim

We demonstrate that the simple Adaptive Conformal Inference (ACI) update rule, when applied to the appropriate control parameter or dual variable, is both adversarially valid, ensuring the success target is met on average for any input sequence (and hence under distribution shifts), and stochastically efficient, achieving sublinear efficiency regret for i.i.d. inputs against an appropriate stochastic benchmark. We show such guarantees under canonical models capturing bandit and semi-bandit feedback to the agent via a unifying algorithmic technique, and analytic framework involving Lyapunov functions.

What carries the argument

The Adaptive Conformal Inference (ACI) update rule applied to the control parameter or dual variable, analyzed through Lyapunov functions under bandit and semi-bandit feedback models.

If this is right

  • The success probability target is met on average for every possible input sequence, including those with arbitrary distribution shifts.
  • Efficiency regret remains sublinear when inputs are i.i.d., relative to the stochastic benchmark.
  • The guarantees extend to both bandit feedback (only success of the selected set) and semi-bandit feedback.
  • The method applies to more complex selection settings than earlier approaches while using strictly less feedback.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Lyapunov analysis may allow similar ACI-based updates to be derived for other limited-feedback problems in sequential decision making.
  • The bridge between online learning and conformal methods suggests testing the same dual-variable technique on resource allocation tasks outside selection.
  • If the stochastic benchmark can be computed or approximated in practice, the sublinear regret bound offers a concrete way to quantify efficiency gains over naive fixed-threshold rules.

Load-bearing premise

The existence of canonical models for bandit and semi-bandit feedback together with an appropriate stochastic benchmark against which sublinear efficiency regret can be measured.

What would settle it

A sequence of inputs where repeated application of the ACI update on the control variable produces an average success rate below the target phi, or an i.i.d. sequence where the cumulative efficiency regret grows linearly rather than sublinearly.

Figures

Figures reproduced from arXiv: 2605.14953 by Ali Sinop, Kamesh Munagala, Kostas Kollias, Sreenivas Gollapudi.

Figure 1
Figure 1. Figure 1: Plots of validity and efficiency regret. (Left) Cumulative success rate tracking the target [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of projected update vs. boundary rules under an adversarial shift with [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Plots of validity and efficiency regret for the inventory management setting. [PITH_FULL_IMAGE:figures/full_fig_p030_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Plots of validity and efficiency regret for the AC-OG algorithm using the “learning to [PITH_FULL_IMAGE:figures/full_fig_p030_4.png] view at source ↗
read the original abstract

We address the problem of conformal selection, where an agent must select a minimal subset of options to ensure that at least one ``success'' is identified with a pre-specified target probability $\phi$. While traditional online conformal prediction focuses on maintaining validity for the observed sequence, minimizing the resource cost (efficiency) of such selections, especially under limited feedback, remains a significant challenge. In this work, we consider settings with the most limited ``bandit'' feedback, and demonstrate that the simple Adaptive Conformal Inference (ACI) update rule, when applied to the appropriate control parameter or dual variable, is both adversarially valid, ensuring the success target is met on average for any input sequence (and hence under distribution shifts), and stochastically efficient, achieving sublinear efficiency regret for $i.i.d.$ inputs against an appropriate stochastic benchmark. We show such guarantees under canonical models capturing bandit and semi-bandit feedback to the agent via a unifying algorithmic technique, and analytic framework involving Lyapunov functions. Our approach handles more complex settings than prior work, while requiring significantly less feedback, and our results provide a new theoretical bridge between efficient online learning with limited feedback and distribution-free uncertainty quantification.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an approach to online conformal selection under limited feedback (bandit and semi-bandit). It applies the Adaptive Conformal Inference (ACI) update rule to an appropriate control parameter or dual variable. The central claims are that this yields adversarial validity, ensuring the success probability target φ is met on average for arbitrary input sequences (hence under distribution shifts), and stochastic efficiency, with sublinear efficiency regret under i.i.d. inputs against an appropriate stochastic benchmark. The analysis uses Lyapunov functions and is claimed to handle more complex settings than prior work with less feedback.

Significance. If the results hold, the work provides a new theoretical bridge between efficient online learning with limited feedback and distribution-free uncertainty quantification. The use of a unifying algorithmic technique and Lyapunov analytic framework for both validity and efficiency under partial observations is a notable strength. This could enable conformal selection in settings where full feedback is unavailable, extending the applicability of conformal methods.

major comments (2)
  1. [Abstract] Abstract: the stochastic efficiency claim is stated as 'achieving sublinear efficiency regret for i.i.d. inputs against an appropriate stochastic benchmark,' but the abstract supplies no explicit construction of this benchmark (e.g., best fixed threshold, optimal policy in hindsight, or clairvoyant selector) nor the corresponding regret expression under the bandit feedback model. Without this, it is impossible to verify whether the benchmark is observable or well-defined when only partial feedback is available, which is load-bearing for the efficiency half of the central claim.
  2. [Abstract] Abstract: the adversarial validity claim is said to follow from applying the ACI rule to the 'appropriate control parameter or dual variable' via Lyapunov analysis that 'absorbs the partial observation into the update,' yet the abstract gives no drift condition, Lyapunov function definition, or step showing how the bandit observation model is incorporated. This leaves the validity guarantee at a high-level description without verifiable closure of the analysis.
minor comments (1)
  1. The abstract refers to 'canonical models capturing bandit and semi-bandit feedback' without a one-sentence definition or pointer to the section where they are formalized; adding this would improve accessibility for readers unfamiliar with the feedback models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. Both points identify places where the abstract is too high-level; we will revise it to include the requested explicit constructions and analysis sketches while preserving brevity. No changes to the main technical results are needed.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the stochastic efficiency claim is stated as 'achieving sublinear efficiency regret for i.i.d. inputs against an appropriate stochastic benchmark,' but the abstract supplies no explicit construction of this benchmark (e.g., best fixed threshold, optimal policy in hindsight, or clairvoyant selector) nor the corresponding regret expression under the bandit feedback model. Without this, it is impossible to verify whether the benchmark is observable or well-defined when only partial feedback is available, which is load-bearing for the efficiency half of the central claim.

    Authors: We agree the abstract should be explicit. The stochastic benchmark is the optimal fixed-threshold policy that knows the marginal success probability p* under the i.i.d. distribution and selects the minimal number of arms to meet φ exactly in expectation; the efficiency regret is the difference between the algorithm’s cumulative selection cost and that of this benchmark, where cost is measured only on the arms whose outcomes are revealed by the bandit feedback. Because the benchmark depends solely on the known marginal p* (which can be estimated from the same partial observations), it remains well-defined and observable under bandit feedback. We will add a parenthetical sentence stating this construction and the resulting O(√T) regret bound. revision: yes

  2. Referee: [Abstract] Abstract: the adversarial validity claim is said to follow from applying the ACI rule to the 'appropriate control parameter or dual variable' via Lyapunov analysis that 'absorbs the partial observation into the update,' yet the abstract gives no drift condition, Lyapunov function definition, or step showing how the bandit observation model is incorporated. This leaves the validity guarantee at a high-level description without verifiable closure of the analysis.

    Authors: We accept that the abstract omits the key analytic steps. The Lyapunov function is V_t = (1/2) (∑_{s=1}^t (φ - r_s))^2 where r_s is the indicator of success on the selected arm at time s; the ACI update is applied directly to the dual variable λ_t that controls the selection threshold. Under bandit feedback the update uses only the observed success indicator on the chosen arm, which produces a negative drift of -η V_t + O(1) that holds pathwise for any sequence, thereby closing the adversarial validity argument. We will insert a one-sentence outline of this Lyapunov drift in the revised abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ACI applied as external primitive with independent Lyapunov analysis

full rationale

The abstract presents the ACI update rule as an existing primitive applied to a dual variable for conformal selection under bandit feedback. Adversarial validity is claimed to follow from standard ACI properties for arbitrary sequences, while stochastic efficiency uses a Lyapunov framework against a canonical benchmark for i.i.d. inputs. No quoted equations or text indicate that the benchmark is defined in terms of the result itself, that parameters are fitted then renamed as predictions, or that load-bearing steps reduce to self-citations. The derivation is self-contained against external ACI and Lyapunov tools.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The target probability phi is a user-specified input rather than a fitted parameter.

pith-pipeline@v0.9.1-grok · 5743 in / 1057 out tokens · 32371 ms · 2026-06-30T21:46:18.680631+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 11 canonical work pages · 1 internal anchor

  1. [1]

    Shipra Agrawal and Nikhil R. Devanur. Bandits with concave rewards and convex knapsacks. InProceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, page 989–1006, New York, NY, USA, 2014. Association for Computing Machinery. doi: 10.1145/ 2600057.2602844. URLhttps://doi.org/10.1145/2600057.2602844

  2. [2]

    Conformal PID control for time series prediction

    Anastasios Angelopoulos, Emmanuel Candes, and Ryan Tibshirani. Conformal PID control for time series prediction. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 23047–23074. Curran Associates, Inc., 2023. URLhttps://proceedings.neurips.cc/paper_ files/paper...

  3. [3]

    A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

    Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification.arXiv preprint arXiv:2107.07511, 2021. URL https://arxiv.org/abs/2107.07511

  4. [4]

    Angelopoulos, Michael I

    Anastasios N. Angelopoulos, Michael I. Jordan, and Ryan J. Tibshirani. Gradient equilibrium 11 in online learning: Theory and applications.Journal of Machine Learning Research, 26(305): 1–68, 2025. URLhttp://jmlr.org/papers/v26/25-0356.html

  5. [5]

    Online conformal prediction with decaying step sizes

    Anastasios Nikolas Angelopoulos, Rina Barber, and Stephen Bates. Online conformal prediction with decaying step sizes. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machin...

  6. [6]

    Online conformal prediction via online optimization

    Felipe Areces, Christopher Mohri, Tatsunori Hashimoto, and John Duchi. Online conformal prediction via online optimization. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=KwGc2JUIDK

  7. [7]

    P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multi-armed bandit problem.Machine Learning, 47:235–256, 2002. URLhttp://www2.compute.dtu.dk/pubdb/ pubs/2088-full.html

  8. [8]

    Bandits with knapsacks

    Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. J. ACM, 65(3), March 2018. ISSN 0004-5411. doi: 10.1145/3164539. URLhttps://doi.org/ 10.1145/3164539

  9. [9]

    Conformal prediction beyond exchangeability.The Annals of Statistics, 51(2):816–845, 2023

    Rina Foygel Barber, Emmanuel Candès, Aaditya Ramdas, and Ryan Tibshirani. Conformal prediction beyond exchangeability.The Annals of Statistics, 51(2):816–845, 2023

  10. [10]

    A lyapunov-based methodology for constrained optimization with bandit feedback.Proceedings of the AAAI Conference on Artificial Intelligence, 36(4):3716–3723, Jun

    Semih Cayci, Yilin Zheng, and Atilla Eryilmaz. A lyapunov-based methodology for constrained optimization with bandit feedback.Proceedings of the AAAI Conference on Artificial Intelligence, 36(4):3716–3723, Jun. 2022. doi: 10.1609/aaai.v36i4.20285. URL https://ojs.aaai.org/ index.php/AAAI/article/view/20285

  11. [11]

    K. L. Chung. On a stochastic approximation method.The Annals of Mathematical Statistics, 25(3):463–483, 1954. ISSN 00034851. URLhttp://www.jstor.org/stable/2236830

  12. [12]

    Jovanović

    Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, and Mihailo R. Jovanović. Provably efficient safe exploration via primal-dual policy optimization, 2020. URLhttps: //arxiv.org/abs/2003.00534

  13. [13]

    Achieving risk control in online learning settings.Transactions on Machine Learning Research, 2023

    Shai Feldman, Liran Ringel, Stephen Bates, and Yaniv Romano. Achieving risk control in online learning settings.Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/forum?id=5Y04GWvoJu

  14. [14]

    Fisher, G.L

    M.L. Fisher, G.L. Nemhauser, and L.A. Wolsey. An analysis of approximations for maximizing submodular set functions. LIDAM Reprints CORE 341, Universitat catholique de Louvain, Center for Operations Research and Econometrics (CORE), Jan 1978. URLhttps://ideas. repec.org/p/cor/louvrp/341.html

  15. [15]

    Volume optimality in conformal prediction with structured prediction sets

    Chao Gao, Liren Shan, Vaidehi Srinivas, and Aravindan Vijayaraghavan. Volume optimality in conformal prediction with structured prediction sets. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=oNDhnGrD51

  16. [16]

    Stochastic online conformal prediction with semi-bandit feedback

    Haosen Ge, Hamsa Bastani, and Osbert Bastani. Stochastic online conformal prediction with semi-bandit feedback. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=IdRrKDsTZ8. 12

  17. [17]

    Adaptive conformal inference under distribution shift

    Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 1660–1672. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/ file/0d441de...

  18. [18]

    Stochastic constrained contextual bandits via lyapunov optimization based estimation to decision framework

    Hengquan Guo and Xin Liu. Stochastic constrained contextual bandits via lyapunov optimization based estimation to decision framework. In Shipra Agrawal and Aaron Roth, editors,Proceedings of Thirty Seventh Conference on Learning Theory, volume247ofProceedings of Machine Learning Research, pages 2204–2231. PMLR, 30 Jun–03 Jul 2024

  19. [19]

    Pai, and Aaron Roth

    Varun Gupta, Christopher Jung, Georgy Noarov, Mallesh M. Pai, and Aaron Roth. Online Multivalid Learning: Means, Moments, and Prediction Intervals. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022), volume 215 of Leibniz International Proceedings in Informatics (LIPIcs), pages 82:1–82:24, Dagstuhl, Germany,

  20. [20]

    ISBN 978-3-95977-217-4

    Schloss Dagstuhl – Leibniz-Zentrum für Informatik. ISBN 978-3-95977-217-4. doi: 10. 4230/LIPIcs.ITCS.2022.82. URLhttps://drops.dagstuhl.de/entities/document/10.4230/ LIPIcs.ITCS.2022.82

  21. [21]

    A nonparametric asymptotic analysis of inventory planning with censored demand.Mathematics of Operations Research, 34(1):103–123,

    Woonghee Tim Huh and Paat Rusmevichientong. A nonparametric asymptotic analysis of inventory planning with censored demand.Mathematics of Operations Research, 34(1):103–123,

  22. [22]

    URLhttp://www.jstor.org/stable/40538369

    ISSN 0364765X, 15265471. URLhttp://www.jstor.org/stable/40538369

  23. [23]

    Adversarial bandits with knapsacks.J

    Nicole Immorlica, Karthik Sankararaman, Robert Schapire, and Aleksandrs Slivkins. Adversarial bandits with knapsacks.J. ACM, 69(6), November 2022. ISSN 0004-5411. doi: 10.1145/3557045. URLhttps://doi.org/10.1145/3557045

  24. [24]

    The value of knowing a demand curve: Bounds on regret for online posted-price auctions

    Robert Kleinberg and Tom Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. InProceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’03, page 594, USA, 2003. IEEE Computer Society. ISBN 0769520405

  25. [25]

    Kushner and G.G

    H. Kushner and G.G. Yin.Stochastic Approximation and Recursive Algorithms and Applications. Stochastic Modelling and Applied Probability. Springer New York, 2013. ISBN 9781489926968. URLhttps://books.google.com/books?id=sB0GCAAAQBAJ

  26. [26]

    Stochastic online greedy learning with semi-bandit feedbacks

    Tian Lin, Jian Li, and Wei Chen. Stochastic online greedy learning with semi-bandit feedbacks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, ed- itors,Advances in Neural Information Processing Systems, volume 28. Curran Asso- ciates, Inc., 2015. URLhttps://proceedings.neurips.cc/paper_files/paper/2015/file/ 0266e33d3f546cb5436a10798e657d...

  27. [27]

    Trading regret for efficiency: online convex optimization with long term constraints.J

    Mehrdad Mahdavi, Rong Jin, and Tianbao Yang. Trading regret for efficiency: online convex optimization with long term constraints.J. Mach. Learn. Res., 13(1):2503–2528, September

  28. [28]

    Neely.Stochastic Network Optimization with Application to Communication and Queueing Systems

    Michael J. Neely.Stochastic Network Optimization with Application to Communication and Queueing Systems. Synthesis Lectures on Communication Networks. Morgan & Claypool Publishers, 2010. ISBN 978-3-031-79994-5. doi: 10.2200/S00271ED1V01Y201006CNT007. URL https://doi.org/10.2200/S00271ED1V01Y201006CNT007. 13

  29. [29]

    Learning diverse rankings with multi-armed bandits

    Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning diverse rankings with multi-armed bandits. In William W. Cohen, Andrew McCallum, and Sam T. Roweis, editors, Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008), Helsinki, Finland, June 5-9, 2008, volume 307 ofACM International Conference Proceeding Serie...

  30. [30]

    The relationship between no-regret learning and online conformal prediction

    Ramya Ramalingam, Shayan Kiyani, and Aaron Roth. The relationship between no-regret learning and online conformal prediction. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=JAcOYCqNo9

  31. [31]

    Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, and Dylan J. Foster. Con- textual bandits with packing and covering constraints: a modular lagrangian approach via regression.J. Mach. Learn. Res., 25(1), January 2024. ISSN 1532-4435

  32. [32]

    Online conformal prediction with efficiency guarantees, 2025

    Vaidehi Srinivas. Online conformal prediction with efficiency guarantees, 2025. URLhttps: //arxiv.org/abs/2507.02496

  33. [33]

    Curran Associates Inc., Red Hook, NY, USA, 2019

    Ryan Tibshirani, Rina Foygel Barber, Emmanuel Candès, and Aaditya Ramdas.Conformal prediction under covariate shift. Curran Associates Inc., Red Hook, NY, USA, 2019

  34. [34]

    Springer, 2005

    Vladimir Vovk, Alexander Gammerman, and Glenn Shafer.Algorithmic Learning in a Random World. Springer, 2005

  35. [35]

    Efficient online set-valued classification with bandit feedback

    Zhou Wang and Xingye Qiao. Efficient online set-valued classification with bandit feedback. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org, 2024

  36. [36]

    Online conformal prediction with adver- sarial semi-bandit feedback via regret minimization, 2026

    Junyoung Yang, Kyungmin Kim, and Sangdon Park. Online conformal prediction with adver- sarial semi-bandit feedback via regret minimization, 2026. URLhttps://arxiv.org/abs/2604. 17984

  37. [37]

    An optimal algorithm for stochastic and adversar- ial bandits

    Julian Zimmert and Yevgeny Seldin. An optimal algorithm for stochastic and adversar- ial bandits. In Kamalika Chaudhuri and Masashi Sugiyama, editors,Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 ofProceedings of Machine Learning Research, pages 467–475. PMLR, 16–18 Apr 2019. URL https://pro...

  38. [38]

    Beating stochastic and adversarial semi- bandits optimally and simultaneously.CoRR, abs/1901.08779, 2019

    Julian Zimmert, Haipeng Luo, and Chen-Yu Wei. Beating stochastic and adversarial semi- bandits optimally and simultaneously.CoRR, abs/1901.08779, 2019. URLhttp://arxiv.org/ abs/1901.08779

  39. [39]

    bandits with knapsacks

    P. Zipkin.Foundations of Inventory Management. McGraw-Hill Companies,Incorporated, 2000. ISBN 9780256113792. URLhttps://books.google.com/books?id=rjzbkQEACAAJ. 14 Acknowledgment of LLM Use The authors used Gemini 3 Pro for the following tasks: (1) identifying and summarizing prior work; (2) paraphrasing and polishing the text; and (3) generating Python co...

  40. [40]

    That is, the expected success probability is strictly increasing as we increase the threshold beyond the optimal point

    Success Margin:For all τ > τ ∗, r(τ) −ϕ≥c 1(τ−τ ∗). That is, the expected success probability is strictly increasing as we increase the threshold beyond the optimal point

  41. [41]

    non-flatness

    Cost Lipschitz:For all τ > τ ∗, c(τ) −c(τ ∗) ≤c 2(τ−τ ∗). This ensures the cost does not jump discontinuously when we slightly over-provision the threshold, and the reduction in Section 2.6 to convert the problem to apply Theorem 2 also uses the same assumption. The main new assumption over Section 2 is the Success Margin or “non-flatness” assumption on e...

  42. [42]

    At each epocht, expertk chooses arme k,t

    Online Greedy Selection (OG):Maintainn experts {Ek}. At each epocht, expertk chooses arme k,t. The probe set isEt ={e 1,t, . . . , eKt,t}. 2.Expert Update:Each expertk≤K t is updated using its marginal gain: ∆k,t =v t({e1,t, . . . , ek,t})−v t({e1,t, . . . , ek−1,t}). We will use a specific update rule in the efficiency analysis below, while the conformal...

  43. [43]

    learning to rank

    Calibration (ACI):Maintain a continuous state variableθt with θ1 = 0. Update θt+1 based on the reward (or success rate)Yt, and determine the next discrete budgetKt+1 as: θt+1 =θ t +η(ϕ−Y t), Kt+1 = min (n,⌈θt+1⌉). Note that in the ACI step, ifθt < 0, the drift is always positive sinceYt = vt(∅) = 0 < ϕ, while if θt > n, so that all arms are probed, the dr...