pith. sign in

arxiv: 2510.01020 · v2 · submitted 2025-10-01 · 💻 cs.LG · cs.AI· math.ST· stat.ML· stat.TH

The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification

Pith reviewed 2026-05-18 10:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AImath.STstat.MLstat.TH
keywords safe online classificationno-regret learninglogistic regressionsequential testingerror-constrained classificationmedical screeninghigh probability bounds
0
0 comments X

The pith

A method for online logistic classification keeps misclassification below alpha with high probability using only tilde O of square root T extra tests beyond an oracle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines sequential decisions for binary outcomes such as disease status when the underlying risk follows an unknown logistic model. At each step a decision maker can pay for a revealing test or instead predict from observed features, with the aim of using as few tests as possible while ensuring the error rate stays below a preset alpha with probability at least one minus delta. The proposed procedure estimates both the logistic parameter and the feature distribution online and applies a conservative threshold to the logistic score to decide when prediction is safe. A sympathetic reader would care because the guarantee allows medical screening programs to operate with bounded risk and only modest extra testing cost relative to perfect prior knowledge.

Core claim

We study sequential testing for a binary disease outcome when risk follows an unknown logistic model. At each round the decision maker may either pay for a test revealing the true label or predict the outcome based on patient features and past data. The goal is to minimize costly tests while ensuring the misclassification rate stays below alpha with probability at least one minus delta. We propose a method that jointly estimates the logistic parameter theta star and the feature distribution, using a conservative threshold on the logistic score to decide when to test. We prove our procedure achieves the target error with high probability and requires only tilde O of square root T more tests.

What carries the argument

The conservative threshold applied to the online-estimated logistic score, which decides whether to predict or to pay for a label-revealing test while the model parameters and feature distribution are updated jointly.

If this is right

  • The misclassification rate remains below alpha with probability at least one minus delta throughout the sequence.
  • The total number of tests exceeds that of an oracle knowing the true model by only tilde O of square root T.
  • The logistic parameter is estimated accurately enough to support the safe threshold decisions.
  • The procedure applies directly to medical screening tasks that require both safety and efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same online estimation-plus-threshold idea could be tested on other link functions beyond logistic for risk modeling.
  • In deployment the approach might reduce overall screening costs by limiting tests to those near the decision boundary.
  • The high-probability bound could be checked empirically on public medical datasets by measuring excess tests over long horizons.

Load-bearing premise

The risk follows an unknown logistic model and the feature distribution can be jointly estimated online while a conservative threshold on the logistic score suffices to control the misclassification rate.

What would settle it

A long run of patient classifications in which the observed misclassification rate exceeds alpha more often than the allowed delta or in which the number of excess tests grows faster than order square root of T would show the claimed guarantee does not hold.

Figures

Figures reproduced from arXiv: 2510.01020 by Aldo Pacchiano, Spyros Dragazis, Tavor Z. Baharav.

Figure 1
Figure 1. Figure 1: Threshold-based testing policy. When P and θ ⋆ are known, a threshold decision rule is optimal when the safety constraint is imposed only in expectation, as we show in the following proposition. 3 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Pessimistic choice of |τt|. Our testing threshold τt is designed to be systematically pessimistic. We begin with a plug-in estimator of the optimal threshold as τ ⋆ (θ L t , Pˆ t, α). To guarantee safety, we inflate our threshold to account for estimation errors. First, we reduce α to αt = max(0, α − p log(2t 2/δ′)/2t) (discussed in Section D.4) to guarantee (α, δ)-safety, if the true θ ⋆ and P were known.… view at source ↗
Figure 3
Figure 3. Figure 3: Simulation results. First and second row correspond to [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: B(0, 1) ∩ {x : |⟨x, θ⋆ ⟩| ≤ τ ⋆} Before proving Lemma 12 we will first prove an auxiliary lemma that allows us to work with a more convenient vector in the surface of the unit ball instead of θ ⋆ . For more details about orthogonal transformations we refer the reader to [25]. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The CDF of Beta( d+1 2 , 1 2 ) for various values of d. f(z) = z − 1 2 (1 − z) d−1 2 , log (f(z)) = − 1 2 log z + d − 1 2 log(1 − z), log (f(z))′ = − 1 2z − d − 1 2(1 − z) < 0. Then, for all 0 < z < 1, f ′ (z) = log (f(z))′ f(z) < 0, and so F is concave [PITH_FULL_IMAGE:figures/full_fig_p027_5.png] view at source ↗
read the original abstract

We study sequential testing for a binary disease outcome when risk follows an unknown logistic model. At each round, the decision maker may either pay for a test revealing the true label or predict the outcome based on patient features and past data. The goal is to minimize costly tests while ensuring the misclassification rate stays below $\alpha$ with probability at least $1-\delta$. We propose a method that jointly estimates the logistic parameter $\theta^{\star}$ and the feature distribution, using a conservative threshold on the logistic score to decide when to test. We prove our procedure achieves the target error with high probability and requires only $\widetilde O(\sqrt{T})$ more tests than an oracle with full knowledge. This is the first no-regret guarantee for error-constrained logistic testing, with direct applications to medical screening. Simulations corroborate our theoretical results, showing safe classification of patients and efficient estimation of $\theta^{\star}$ with few excess tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies sequential testing for binary disease outcomes under an unknown logistic risk model. At each round the decision maker may test (paying for the true label) or predict from features and past data. The goal is to keep the misclassification rate at most α with probability at least 1-δ while minimizing tests. The proposed algorithm jointly estimates the logistic parameter θ* from tested labels and the feature distribution from all observed x_t, then applies a time-varying conservative threshold on the logistic score to decide testing versus prediction. The central claims are that the procedure meets the target error with high probability and incurs only Õ(√T) excess tests relative to an oracle with full knowledge; this is presented as the first no-regret guarantee for error-constrained logistic testing, with supporting simulations.

Significance. If the high-probability guarantee and sublinear excess-query bound hold, the result would constitute a meaningful advance in safe online classification, supplying the first explicit no-regret analysis under a uniform error constraint for logistic models. The medical-screening motivation is concrete and the combination of joint online estimation with an adaptive threshold is technically interesting. Simulations are reported to corroborate both safety and efficient estimation.

major comments (2)
  1. [§4] §4 (Proof of the main high-probability bound, presumably Theorem 1): the argument must control the supremum deviation of the estimated risk under the adaptive sampling rule induced by the conservative threshold. Standard Hoeffding or Bernstein inequalities on fixed samples do not directly apply because the decision to obtain the next label depends on the current estimate, creating dependence between the sampling process and the estimation error. Without an explicit martingale or self-normalized concentration argument that accounts for the stopping times, the claimed uniform 1-δ guarantee over all rounds is not yet established.
  2. [§3] §3 (Algorithm and threshold definition): the precise functional form of the conservative threshold (how it is computed from the current estimates of θ* and the feature distribution) is not stated with sufficient explicitness to verify that it indeed enforces the misclassification rate ≤ α uniformly. The paper should supply the exact expression and the derivation showing why this choice yields the claimed high-probability control.
minor comments (2)
  1. [Notation] The definition of the excess-test quantity relative to the oracle should be stated formally (e.g., as an expectation or high-probability statement) and its relationship to the fitted parameters clarified.
  2. [Experiments] In the experimental section, report the concrete parameter values (α, δ, T, dimension, and how the logistic model and feature distribution are generated) so that the simulation results can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback on our manuscript. We address each of the major comments below and have made revisions to improve the clarity and rigor of the presentation.

read point-by-point responses
  1. Referee: [§4] §4 (Proof of the main high-probability bound, presumably Theorem 1): the argument must control the supremum deviation of the estimated risk under the adaptive sampling rule induced by the conservative threshold. Standard Hoeffding or Bernstein inequalities on fixed samples do not directly apply because the decision to obtain the next label depends on the current estimate, creating dependence between the sampling process and the estimation error. Without an explicit martingale or self-normalized concentration argument that accounts for the stopping times, the claimed uniform 1-δ guarantee over all rounds is not yet established.

    Authors: We appreciate the referee's point regarding the need for careful handling of the adaptive sampling in the concentration bounds. Upon reflection, the original proof sketch relied on standard concentration but indeed requires extension to account for the dependence. In the revised version, we will provide a complete proof using self-normalized martingale inequalities (such as those based on Freedman's inequality) to bound the supremum deviation of the estimated risk under the adaptive threshold rule. This will rigorously establish the uniform high-probability control. revision: yes

  2. Referee: [§3] §3 (Algorithm and threshold definition): the precise functional form of the conservative threshold (how it is computed from the current estimates of θ* and the feature distribution) is not stated with sufficient explicitness to verify that it indeed enforces the misclassification rate ≤ α uniformly. The paper should supply the exact expression and the derivation showing why this choice yields the claimed high-probability control.

    Authors: We agree that more explicit details on the threshold are necessary for verification. The conservative threshold at time t is defined as the smallest value τ such that the integral over the feature distribution of the logistic probability adjusted by the estimation error bound is at most α. We will include the precise formula for the threshold in terms of the current estimates ˆθ_t and the empirical feature distribution, along with the derivation from the high-probability bound on the logistic score deviation, in the updated Section 3. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained; no circular reductions to inputs or self-citations

full rationale

The paper presents a joint online estimation procedure for θ* and the feature distribution, combined with a conservative logistic-score threshold for deciding when to test. It claims a high-probability guarantee that misclassification rate ≤ α and Õ(√T) excess tests versus an oracle. No equations or steps in the provided abstract or description reduce the excess-test bound or error control to a fitted parameter by construction, nor do they rely on load-bearing self-citations whose content is unverified. The central no-regret result is derived from the adaptive procedure and concentration arguments, which constitute independent mathematical content rather than tautological renaming or self-definition. This is the expected outcome for a theoretical proof paper whose bounds are stated relative to external oracles and standard tools.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the logistic risk model and the feasibility of online joint estimation of parameters and feature distribution; these are domain assumptions rather than derived quantities.

axioms (2)
  • domain assumption Risk follows an unknown logistic model
    Explicitly stated as the generative model for the binary outcome in the abstract.
  • domain assumption Feature distribution can be estimated jointly with the logistic parameter
    Used as the basis for the conservative threshold decision rule.

pith-pipeline@v0.9.0 · 5712 in / 1264 out tokens · 34314 ms · 2026-05-18T10:24:00.548117+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

  1. [1]

    Improved algorithms for linear stochastic bandits.Advances in neural information processing systems, 24, 2011

    Yasin Abbasi-Yadkori, D´ avid P´ al, and Csaba Szepesv´ ari. Improved algorithms for linear stochastic bandits.Advances in neural information processing systems, 24, 2011. 6

  2. [2]

    Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38, 2022

    M Mehdi Afsar, Trafford Crump, and Behrouz Far. Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38, 2022. 1

  3. [3]

    Interpretable operations research for high-stakes decisions: Designing the greek covid-19 testing system.INFORMS Journal on Applied Analytics, 52(5):398–411, 2022

    Hamsa Bastani, Kimon Drakopoulos, Vishal Gupta, Jon Vlachogiannis, Christos Had- jichristodoulou, Pagona Lagiou, Gkikas Magiorkinis, Dimitrios Paraskevis, and Sotirios Tsiodras. Interpretable operations research for high-stakes decisions: Designing the greek covid-19 testing system.INFORMS Journal on Applied Analytics, 52(5):398–411, 2022. 2

  4. [4]

    Probably approximately correct labels

    Emmanuel J Cand` es, Andrew Ilyas, and Tijana Zrnic. Probably approximately correct labels. arXiv preprint arXiv:2506.10908, 2025. 15

  5. [5]

    Cambridge university press, 2006

    Nicolo Cesa-Bianchi and G´ abor Lugosi.Prediction, learning, and games. Cambridge university press, 2006. 6

  6. [6]

    Worst-case analysis of selective sampling for linear classification.Journal of Machine Learning Research, 7(7), 2006

    Nicolo Cesa-Bianchi, Claudio Gentile, Luca Zaniboni, and Manfred Warmuth. Worst-case analysis of selective sampling for linear classification.Journal of Machine Learning Research, 7(7), 2006. 2

  7. [7]

    Concentration inequalities and martingale inequalities: a survey

    Fan Chung and Linyuan Lu. Concentration inequalities and martingale inequalities: a survey. Internet mathematics, 3(1):79–127, 2006. 40

  8. [8]

    Machine learning in drug discovery: a review.Artificial intelligence review, 55(3):1947–1999,

    Suresh Dara, Swetha Dhamercherla, Surender Singh Jadav, CH Madhu Babu, and Mohamed Jawed Ahsan. Machine learning in drug discovery: a review.Artificial intelligence review, 55(3):1947–1999,

  9. [9]

    Analysis of perceptron-based active learning

    Sanjoy Dasgupta, Adam Tauman Kalai, and Claire Monteleoni. Analysis of perceptron-based active learning. InInternational conference on computational learning theory, pages 249–263. Springer, 2005. 15

  10. [10]

    Selective sampling and active learning from single and multiple teachers.The Journal of Machine Learning Research, 13(1):2655–2697, 2012

    Ofer Dekel, Claudio Gentile, and Karthik Sridharan. Selective sampling and active learning from single and multiple teachers.The Journal of Machine Learning Research, 13(1):2655–2697, 2012. 2

  11. [11]

    Efficiently learning halfspaces with tsybakov noise

    Ilias Diakonikolas, Daniel M Kane, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Efficiently learning halfspaces with tsybakov noise. InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 88–101, 2021. 15

  12. [12]

    Online learning of halfspaces with massart noise.arXiv preprint arXiv:2405.12958, 2024

    Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Online learning of halfspaces with massart noise.arXiv preprint arXiv:2405.12958, 2024. 15

  13. [13]

    Towards semi- supervised learning with non-random missing labels

    Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, and Yinghuan Shi. Towards semi- supervised learning with non-random missing labels. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16121–16131, 2023. 2

  14. [14]

    Computation of the regularized incomplete beta function

    Vera Egorova, Amparo Gil, Javier Segura, NM Temme, et al. Computation of the regularized incomplete beta function. 2023. 24 11

  15. [15]

    Improved optimistic algorithms for logistic bandits

    Louis Faury, Marc Abeille, Cl´ ement Calauz` enes, and Olivier Fercoq. Improved optimistic algorithms for logistic bandits. InInternational Conference on Machine Learning, pages 3052–3060. PMLR,

  16. [16]

    John Wiley & Sons,

    Gerald B Folland.Real analysis: modern techniques and their applications. John Wiley & Sons,

  17. [17]

    Selective sampling using the query by committee algorithm.Machine learning, 28:133–168, 1997

    Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm.Machine learning, 28:133–168, 1997. 2

  18. [18]

    Online selective classifica- tion with limited feedback.Advances in Neural Information Processing Systems, 34:14529–14541,

    Aditya Gangrade, Anil Kag, Ashok Cutkosky, and Venkatesh Saligrama. Online selective classifica- tion with limited feedback.Advances in Neural Information Processing Systems, 34:14529–14541,

  19. [19]

    Selective classification via one-sided prediction

    Aditya Gangrade, Anil Kag, and Venkatesh Saligrama. Selective classification via one-sided prediction. InInternational Conference on Artificial Intelligence and Statistics, pages 2179–2187. PMLR, 2021. 2 and 15

  20. [20]

    Safe linear bandits over unknown polytopes

    Aditya Gangrade, Tianrui Chen, and Venkatesh Saligrama. Safe linear bandits over unknown polytopes. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1755–1795. PMLR, 2024. 2

  21. [21]

    Safe machine learning.Statistics, 58(3):473–477, 2024

    Paolo Giudici. Safe machine learning.Statistics, 58(3):473–477, 2024. 1

  22. [22]

    Adversarial resilience in sequential prediction via abstention.Advances in Neural Information Processing Systems, 36:8027–8047,

    Surbhi Goel, Steve Hanneke, Shay Moran, and Abhishek Shetty. Adversarial resilience in sequential prediction via abstention.Advances in Neural Information Processing Systems, 36:8027–8047,

  23. [23]

    arXiv preprint arXiv:2205.10330 , year =

    Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theory and applications.arXiv preprint arXiv:2205.10330, 2022. 1

  24. [24]

    Toward a general theory of online selective sampling: Trading off mistakes and queries

    Steve Hanneke and Liu Yang. Toward a general theory of online selective sampling: Trading off mistakes and queries. InInternational Conference on Artificial Intelligence and Statistics, pages 3997–4005. PMLR, 2021. 2

  25. [25]

    Cambridge university press, 2012

    Roger A Horn and Charles R Johnson.Matrix analysis. Cambridge university press, 2012. 24

  26. [26]

    Volumes of n-dimensional spheres and ellipsoids, 2014

    Michael Jorgensen. Volumes of n-dimensional spheres and ellipsoids, 2014. 25 and 26

  27. [27]

    Conservative contextual linear bandits.Advances in Neural Information Processing Systems, 30,

    Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi Yadkori, and Benjamin Van Roy. Conservative contextual linear bandits.Advances in Neural Information Processing Systems, 30,

  28. [28]

    Cambridge University Press, 2020

    Tor Lattimore and Csaba Szepesv´ ari.Bandit algorithms. Cambridge University Press, 2020. 1, 21, and 39

  29. [29]

    A unified confidence sequence for generalized linear models, with applications to bandits.Advances in Neural Information Processing Systems, 37:124640–124685, 2025

    Junghyun Lee, Se-Young Yun, and Kwang-Sung Jun. A unified confidence sequence for generalized linear models, with applications to bandits.Advances in Neural Information Processing Systems, 37:124640–124685, 2025. 5 and 9

  30. [30]

    Concise formulas for the area and volume of a hyperspherical cap.Asian Journal of Mathematics & Statistics, 4(1):66–70, 2010

    Shengqiao Li. Concise formulas for the area and volume of a hyperspherical cap.Asian Journal of Mathematics & Statistics, 4(1):66–70, 2010. 25

  31. [31]

    Better algorithms for selective sampling

    Francesco Orabona, Nicolo Cesa-Bianchi, et al. Better algorithms for selective sampling. In Proceedings of the 28th international conference on machine learning: Bellevue, Washington, USA, june 28. july 2, 2011, pages 433–440. Omnipress, 2011. 2 and 5 12

  32. [32]

    Stochastic bandits with linear constraints

    Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, and Heinrich Jiang. Stochastic bandits with linear constraints. InInternational conference on artificial intelligence and statistics, pages 2827–2835. PMLR, 2021. 2

  33. [33]

    Machine learning portfolio allocation.The Journal of Finance and Data Science, 8:35–54, 2022

    Michael Pinelis and David Ruppert. Machine learning portfolio allocation.The Journal of Finance and Data Science, 8:35–54, 2022. 1

  34. [34]

    Selective sampling and imitation learning via online regression.Advances in Neural Information Processing Systems, 36:67213–67268,

    Ayush Sekhari, Karthik Sridharan, Wen Sun, and Runzhe Wu. Selective sampling and imitation learning via online regression.Advances in Neural Information Processing Systems, 36:67213–67268,

  35. [35]

    Active learning literature survey

    Burr Settles. Active learning literature survey. 2009. 2

  36. [36]

    Query by committee

    H Sebastian Seung, Manfred Opper, and Haim Sompolinsky. Query by committee. InProceedings of the fifth annual workshop on Computational learning theory, pages 287–294, 1992. 2

  37. [37]

    Dynamic Ad Allocation: Bandits with Budgets

    Aleksandrs Slivkins. Dynamic ad allocation: Bandits with budgets.arXiv preprint arXiv:1306.0155,

  38. [38]

    Reinforcement learning.Journal of Cognitive Neuroscience, 11(1):126–134, 1999

    Richard S Sutton, Andrew G Barto, et al. Reinforcement learning.Journal of Cognitive Neuroscience, 11(1):126–134, 1999. 1

  39. [39]

    Optimal aggregation of classifiers in statistical learning.The Annals of Statistics, 32(1):135–166, 2004

    Alexander B Tsybakov. Optimal aggregation of classifiers in statistical learning.The Annals of Statistics, 32(1):135–166, 2004. 15

  40. [40]

    Applications of machine learning in drug discovery and development.Nature reviews Drug discovery, 18(6):463–477, 2019

    Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer, et al. Applications of machine learning in drug discovery and development.Nature reviews Drug discovery, 18(6):463–477, 2019. 1

  41. [41]

    Cambridge university press, 2018

    Roman Vershynin.High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018. 16, 19, 20, and 41

  42. [42]

    Cambridge university press, 2019

    Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019. 18 and 19

  43. [43]

    Power con- strained bandits

    Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, and Finale Doshi-Velez. Power con- strained bandits. InMachine Learning for Healthcare Conference, pages 209–259. PMLR, 2021. 2

  44. [44]

    expert” model also releases an uncertainty level Ui about its prediction. The algorithmic challenge is to leverage the uncertainty levels to produce “PAC labels

    Zheqing Zhu and Benjamin Van Roy. Scalable neural contextual bandit for recommender sys- tems. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 3636–3646, 2023. 1 13 Appendix Contents 1 Introduction 1 2 Preliminaries 2 2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

  45. [45]

    If ˆYt = 1 thenE(1{ ˆYt ̸=Y t} | ˆYt = 1) = 1−p t

  46. [46]

    sandwich

    Else if ˆYt = 0 thenE(1{ ˆYt ̸=Y t} | ˆYt = 0) =p t. The optimal policy then is to output the prediction with the smallest error. The expected error then is equal to E(1{ ˆYt ̸=Y t})≜min{1−p t, pt}. We denote P(Zt = 0) = ηt. The optimal policy choice is reduced to the following optimization problem. 15 min {ηt} TX t=1 1−η t s.t. 1 T TX t=1 min{1−p t, pt}η...

  47. [47]

    f(z) =z − 1 2 (1−z) d−1 2 , log (f(z)) =− 1 2 logz+ d−1 2 log(1−z), log (f(z))′ =− 1 2z − d−1 2(1−z) <0

    for various values ofd. f(z) =z − 1 2 (1−z) d−1 2 , log (f(z)) =− 1 2 logz+ d−1 2 log(1−z), log (f(z))′ =− 1 2z − d−1 2(1−z) <0. Then, for all 0 < z < 1, f ′(z) = log (f(z))′ f(z) < 0, and so F is concave. Figure 5 illustrates the CDF across various values of parameterd >1. To continue in our analysis, we will need to show that the τ for which we are eval...