The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
Pith reviewed 2026-05-18 10:24 UTC · model grok-4.3
The pith
A method for online logistic classification keeps misclassification below alpha with high probability using only tilde O of square root T extra tests beyond an oracle.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study sequential testing for a binary disease outcome when risk follows an unknown logistic model. At each round the decision maker may either pay for a test revealing the true label or predict the outcome based on patient features and past data. The goal is to minimize costly tests while ensuring the misclassification rate stays below alpha with probability at least one minus delta. We propose a method that jointly estimates the logistic parameter theta star and the feature distribution, using a conservative threshold on the logistic score to decide when to test. We prove our procedure achieves the target error with high probability and requires only tilde O of square root T more tests.
What carries the argument
The conservative threshold applied to the online-estimated logistic score, which decides whether to predict or to pay for a label-revealing test while the model parameters and feature distribution are updated jointly.
If this is right
- The misclassification rate remains below alpha with probability at least one minus delta throughout the sequence.
- The total number of tests exceeds that of an oracle knowing the true model by only tilde O of square root T.
- The logistic parameter is estimated accurately enough to support the safe threshold decisions.
- The procedure applies directly to medical screening tasks that require both safety and efficiency.
Where Pith is reading between the lines
- The same online estimation-plus-threshold idea could be tested on other link functions beyond logistic for risk modeling.
- In deployment the approach might reduce overall screening costs by limiting tests to those near the decision boundary.
- The high-probability bound could be checked empirically on public medical datasets by measuring excess tests over long horizons.
Load-bearing premise
The risk follows an unknown logistic model and the feature distribution can be jointly estimated online while a conservative threshold on the logistic score suffices to control the misclassification rate.
What would settle it
A long run of patient classifications in which the observed misclassification rate exceeds alpha more often than the allowed delta or in which the number of excess tests grows faster than order square root of T would show the claimed guarantee does not hold.
Figures
read the original abstract
We study sequential testing for a binary disease outcome when risk follows an unknown logistic model. At each round, the decision maker may either pay for a test revealing the true label or predict the outcome based on patient features and past data. The goal is to minimize costly tests while ensuring the misclassification rate stays below $\alpha$ with probability at least $1-\delta$. We propose a method that jointly estimates the logistic parameter $\theta^{\star}$ and the feature distribution, using a conservative threshold on the logistic score to decide when to test. We prove our procedure achieves the target error with high probability and requires only $\widetilde O(\sqrt{T})$ more tests than an oracle with full knowledge. This is the first no-regret guarantee for error-constrained logistic testing, with direct applications to medical screening. Simulations corroborate our theoretical results, showing safe classification of patients and efficient estimation of $\theta^{\star}$ with few excess tests.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies sequential testing for binary disease outcomes under an unknown logistic risk model. At each round the decision maker may test (paying for the true label) or predict from features and past data. The goal is to keep the misclassification rate at most α with probability at least 1-δ while minimizing tests. The proposed algorithm jointly estimates the logistic parameter θ* from tested labels and the feature distribution from all observed x_t, then applies a time-varying conservative threshold on the logistic score to decide testing versus prediction. The central claims are that the procedure meets the target error with high probability and incurs only Õ(√T) excess tests relative to an oracle with full knowledge; this is presented as the first no-regret guarantee for error-constrained logistic testing, with supporting simulations.
Significance. If the high-probability guarantee and sublinear excess-query bound hold, the result would constitute a meaningful advance in safe online classification, supplying the first explicit no-regret analysis under a uniform error constraint for logistic models. The medical-screening motivation is concrete and the combination of joint online estimation with an adaptive threshold is technically interesting. Simulations are reported to corroborate both safety and efficient estimation.
major comments (2)
- [§4] §4 (Proof of the main high-probability bound, presumably Theorem 1): the argument must control the supremum deviation of the estimated risk under the adaptive sampling rule induced by the conservative threshold. Standard Hoeffding or Bernstein inequalities on fixed samples do not directly apply because the decision to obtain the next label depends on the current estimate, creating dependence between the sampling process and the estimation error. Without an explicit martingale or self-normalized concentration argument that accounts for the stopping times, the claimed uniform 1-δ guarantee over all rounds is not yet established.
- [§3] §3 (Algorithm and threshold definition): the precise functional form of the conservative threshold (how it is computed from the current estimates of θ* and the feature distribution) is not stated with sufficient explicitness to verify that it indeed enforces the misclassification rate ≤ α uniformly. The paper should supply the exact expression and the derivation showing why this choice yields the claimed high-probability control.
minor comments (2)
- [Notation] The definition of the excess-test quantity relative to the oracle should be stated formally (e.g., as an expectation or high-probability statement) and its relationship to the fitted parameters clarified.
- [Experiments] In the experimental section, report the concrete parameter values (α, δ, T, dimension, and how the logistic model and feature distribution are generated) so that the simulation results can be reproduced.
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable feedback on our manuscript. We address each of the major comments below and have made revisions to improve the clarity and rigor of the presentation.
read point-by-point responses
-
Referee: [§4] §4 (Proof of the main high-probability bound, presumably Theorem 1): the argument must control the supremum deviation of the estimated risk under the adaptive sampling rule induced by the conservative threshold. Standard Hoeffding or Bernstein inequalities on fixed samples do not directly apply because the decision to obtain the next label depends on the current estimate, creating dependence between the sampling process and the estimation error. Without an explicit martingale or self-normalized concentration argument that accounts for the stopping times, the claimed uniform 1-δ guarantee over all rounds is not yet established.
Authors: We appreciate the referee's point regarding the need for careful handling of the adaptive sampling in the concentration bounds. Upon reflection, the original proof sketch relied on standard concentration but indeed requires extension to account for the dependence. In the revised version, we will provide a complete proof using self-normalized martingale inequalities (such as those based on Freedman's inequality) to bound the supremum deviation of the estimated risk under the adaptive threshold rule. This will rigorously establish the uniform high-probability control. revision: yes
-
Referee: [§3] §3 (Algorithm and threshold definition): the precise functional form of the conservative threshold (how it is computed from the current estimates of θ* and the feature distribution) is not stated with sufficient explicitness to verify that it indeed enforces the misclassification rate ≤ α uniformly. The paper should supply the exact expression and the derivation showing why this choice yields the claimed high-probability control.
Authors: We agree that more explicit details on the threshold are necessary for verification. The conservative threshold at time t is defined as the smallest value τ such that the integral over the feature distribution of the logistic probability adjusted by the estimation error bound is at most α. We will include the precise formula for the threshold in terms of the current estimates ˆθ_t and the empirical feature distribution, along with the derivation from the high-probability bound on the logistic score deviation, in the updated Section 3. revision: yes
Circularity Check
Derivation self-contained; no circular reductions to inputs or self-citations
full rationale
The paper presents a joint online estimation procedure for θ* and the feature distribution, combined with a conservative logistic-score threshold for deciding when to test. It claims a high-probability guarantee that misclassification rate ≤ α and Õ(√T) excess tests versus an oracle. No equations or steps in the provided abstract or description reduce the excess-test bound or error control to a fitted parameter by construction, nor do they rely on load-bearing self-citations whose content is unverified. The central no-regret result is derived from the adaptive procedure and concentration arguments, which constitute independent mathematical content rather than tautological renaming or self-definition. This is the expected outcome for a theoretical proof paper whose bounds are stated relative to external oracles and standard tools.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Risk follows an unknown logistic model
- domain assumption Feature distribution can be estimated jointly with the logistic parameter
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose a method that jointly estimates the logistic parameter θ⋆ and the feature distribution, using a conservative threshold on the logistic score to decide when to test. ... τt ≜ τ⋆(θLt, P̂t, αt − ζt − 2Bt/√λtmin − εQ) + 3Bt/√λtmin + εQ
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Definition 1. An algorithm ... satisfies (α,δ)-safety if P(∀T̄, (1/T̄)∑1{Ŷt≠Yt}≤α)≥1−δ
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Yasin Abbasi-Yadkori, D´ avid P´ al, and Csaba Szepesv´ ari. Improved algorithms for linear stochastic bandits.Advances in neural information processing systems, 24, 2011. 6
work page 2011
-
[2]
Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38, 2022
M Mehdi Afsar, Trafford Crump, and Behrouz Far. Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38, 2022. 1
work page 2022
-
[3]
Hamsa Bastani, Kimon Drakopoulos, Vishal Gupta, Jon Vlachogiannis, Christos Had- jichristodoulou, Pagona Lagiou, Gkikas Magiorkinis, Dimitrios Paraskevis, and Sotirios Tsiodras. Interpretable operations research for high-stakes decisions: Designing the greek covid-19 testing system.INFORMS Journal on Applied Analytics, 52(5):398–411, 2022. 2
work page 2022
-
[4]
Probably approximately correct labels
Emmanuel J Cand` es, Andrew Ilyas, and Tijana Zrnic. Probably approximately correct labels. arXiv preprint arXiv:2506.10908, 2025. 15
-
[5]
Cambridge university press, 2006
Nicolo Cesa-Bianchi and G´ abor Lugosi.Prediction, learning, and games. Cambridge university press, 2006. 6
work page 2006
-
[6]
Nicolo Cesa-Bianchi, Claudio Gentile, Luca Zaniboni, and Manfred Warmuth. Worst-case analysis of selective sampling for linear classification.Journal of Machine Learning Research, 7(7), 2006. 2
work page 2006
-
[7]
Concentration inequalities and martingale inequalities: a survey
Fan Chung and Linyuan Lu. Concentration inequalities and martingale inequalities: a survey. Internet mathematics, 3(1):79–127, 2006. 40
work page 2006
-
[8]
Machine learning in drug discovery: a review.Artificial intelligence review, 55(3):1947–1999,
Suresh Dara, Swetha Dhamercherla, Surender Singh Jadav, CH Madhu Babu, and Mohamed Jawed Ahsan. Machine learning in drug discovery: a review.Artificial intelligence review, 55(3):1947–1999,
work page 1947
-
[9]
Analysis of perceptron-based active learning
Sanjoy Dasgupta, Adam Tauman Kalai, and Claire Monteleoni. Analysis of perceptron-based active learning. InInternational conference on computational learning theory, pages 249–263. Springer, 2005. 15
work page 2005
-
[10]
Ofer Dekel, Claudio Gentile, and Karthik Sridharan. Selective sampling and active learning from single and multiple teachers.The Journal of Machine Learning Research, 13(1):2655–2697, 2012. 2
work page 2012
-
[11]
Efficiently learning halfspaces with tsybakov noise
Ilias Diakonikolas, Daniel M Kane, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Efficiently learning halfspaces with tsybakov noise. InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 88–101, 2021. 15
work page 2021
-
[12]
Online learning of halfspaces with massart noise.arXiv preprint arXiv:2405.12958, 2024
Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, and Nikos Zarifis. Online learning of halfspaces with massart noise.arXiv preprint arXiv:2405.12958, 2024. 15
-
[13]
Towards semi- supervised learning with non-random missing labels
Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, and Yinghuan Shi. Towards semi- supervised learning with non-random missing labels. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 16121–16131, 2023. 2
work page 2023
-
[14]
Computation of the regularized incomplete beta function
Vera Egorova, Amparo Gil, Javier Segura, NM Temme, et al. Computation of the regularized incomplete beta function. 2023. 24 11
work page 2023
-
[15]
Improved optimistic algorithms for logistic bandits
Louis Faury, Marc Abeille, Cl´ ement Calauz` enes, and Olivier Fercoq. Improved optimistic algorithms for logistic bandits. InInternational Conference on Machine Learning, pages 3052–3060. PMLR,
-
[16]
Gerald B Folland.Real analysis: modern techniques and their applications. John Wiley & Sons,
-
[17]
Selective sampling using the query by committee algorithm.Machine learning, 28:133–168, 1997
Yoav Freund, H Sebastian Seung, Eli Shamir, and Naftali Tishby. Selective sampling using the query by committee algorithm.Machine learning, 28:133–168, 1997. 2
work page 1997
-
[18]
Aditya Gangrade, Anil Kag, Ashok Cutkosky, and Venkatesh Saligrama. Online selective classifica- tion with limited feedback.Advances in Neural Information Processing Systems, 34:14529–14541,
-
[19]
Selective classification via one-sided prediction
Aditya Gangrade, Anil Kag, and Venkatesh Saligrama. Selective classification via one-sided prediction. InInternational Conference on Artificial Intelligence and Statistics, pages 2179–2187. PMLR, 2021. 2 and 15
work page 2021
-
[20]
Safe linear bandits over unknown polytopes
Aditya Gangrade, Tianrui Chen, and Venkatesh Saligrama. Safe linear bandits over unknown polytopes. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1755–1795. PMLR, 2024. 2
work page 2024
-
[21]
Safe machine learning.Statistics, 58(3):473–477, 2024
Paolo Giudici. Safe machine learning.Statistics, 58(3):473–477, 2024. 1
work page 2024
-
[22]
Surbhi Goel, Steve Hanneke, Shay Moran, and Abhishek Shetty. Adversarial resilience in sequential prediction via abstention.Advances in Neural Information Processing Systems, 36:8027–8047,
-
[23]
arXiv preprint arXiv:2205.10330 , year =
Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, and Alois Knoll. A review of safe reinforcement learning: Methods, theory and applications.arXiv preprint arXiv:2205.10330, 2022. 1
-
[24]
Toward a general theory of online selective sampling: Trading off mistakes and queries
Steve Hanneke and Liu Yang. Toward a general theory of online selective sampling: Trading off mistakes and queries. InInternational Conference on Artificial Intelligence and Statistics, pages 3997–4005. PMLR, 2021. 2
work page 2021
-
[25]
Cambridge university press, 2012
Roger A Horn and Charles R Johnson.Matrix analysis. Cambridge university press, 2012. 24
work page 2012
-
[26]
Volumes of n-dimensional spheres and ellipsoids, 2014
Michael Jorgensen. Volumes of n-dimensional spheres and ellipsoids, 2014. 25 and 26
work page 2014
-
[27]
Conservative contextual linear bandits.Advances in Neural Information Processing Systems, 30,
Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi Yadkori, and Benjamin Van Roy. Conservative contextual linear bandits.Advances in Neural Information Processing Systems, 30,
-
[28]
Cambridge University Press, 2020
Tor Lattimore and Csaba Szepesv´ ari.Bandit algorithms. Cambridge University Press, 2020. 1, 21, and 39
work page 2020
-
[29]
Junghyun Lee, Se-Young Yun, and Kwang-Sung Jun. A unified confidence sequence for generalized linear models, with applications to bandits.Advances in Neural Information Processing Systems, 37:124640–124685, 2025. 5 and 9
work page 2025
-
[30]
Shengqiao Li. Concise formulas for the area and volume of a hyperspherical cap.Asian Journal of Mathematics & Statistics, 4(1):66–70, 2010. 25
work page 2010
-
[31]
Better algorithms for selective sampling
Francesco Orabona, Nicolo Cesa-Bianchi, et al. Better algorithms for selective sampling. In Proceedings of the 28th international conference on machine learning: Bellevue, Washington, USA, june 28. july 2, 2011, pages 433–440. Omnipress, 2011. 2 and 5 12
work page 2011
-
[32]
Stochastic bandits with linear constraints
Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett, and Heinrich Jiang. Stochastic bandits with linear constraints. InInternational conference on artificial intelligence and statistics, pages 2827–2835. PMLR, 2021. 2
work page 2021
-
[33]
Machine learning portfolio allocation.The Journal of Finance and Data Science, 8:35–54, 2022
Michael Pinelis and David Ruppert. Machine learning portfolio allocation.The Journal of Finance and Data Science, 8:35–54, 2022. 1
work page 2022
-
[34]
Ayush Sekhari, Karthik Sridharan, Wen Sun, and Runzhe Wu. Selective sampling and imitation learning via online regression.Advances in Neural Information Processing Systems, 36:67213–67268,
-
[35]
Active learning literature survey
Burr Settles. Active learning literature survey. 2009. 2
work page 2009
-
[36]
H Sebastian Seung, Manfred Opper, and Haim Sompolinsky. Query by committee. InProceedings of the fifth annual workshop on Computational learning theory, pages 287–294, 1992. 2
work page 1992
-
[37]
Dynamic Ad Allocation: Bandits with Budgets
Aleksandrs Slivkins. Dynamic ad allocation: Bandits with budgets.arXiv preprint arXiv:1306.0155,
work page internal anchor Pith review Pith/arXiv arXiv
-
[38]
Reinforcement learning.Journal of Cognitive Neuroscience, 11(1):126–134, 1999
Richard S Sutton, Andrew G Barto, et al. Reinforcement learning.Journal of Cognitive Neuroscience, 11(1):126–134, 1999. 1
work page 1999
-
[39]
Alexander B Tsybakov. Optimal aggregation of classifiers in statistical learning.The Annals of Statistics, 32(1):135–166, 2004. 15
work page 2004
-
[40]
Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer, et al. Applications of machine learning in drug discovery and development.Nature reviews Drug discovery, 18(6):463–477, 2019. 1
work page 2019
-
[41]
Cambridge university press, 2018
Roman Vershynin.High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018. 16, 19, 20, and 41
work page 2018
-
[42]
Cambridge university press, 2019
Martin J Wainwright.High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019. 18 and 19
work page 2019
-
[43]
Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, and Finale Doshi-Velez. Power con- strained bandits. InMachine Learning for Healthcare Conference, pages 209–259. PMLR, 2021. 2
work page 2021
-
[44]
Zheqing Zhu and Benjamin Van Roy. Scalable neural contextual bandit for recommender sys- tems. InProceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 3636–3646, 2023. 1 13 Appendix Contents 1 Introduction 1 2 Preliminaries 2 2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...
work page 2023
-
[45]
If ˆYt = 1 thenE(1{ ˆYt ̸=Y t} | ˆYt = 1) = 1−p t
-
[46]
Else if ˆYt = 0 thenE(1{ ˆYt ̸=Y t} | ˆYt = 0) =p t. The optimal policy then is to output the prediction with the smallest error. The expected error then is equal to E(1{ ˆYt ̸=Y t})≜min{1−p t, pt}. We denote P(Zt = 0) = ηt. The optimal policy choice is reduced to the following optimization problem. 15 min {ηt} TX t=1 1−η t s.t. 1 T TX t=1 min{1−p t, pt}η...
-
[47]
for various values ofd. f(z) =z − 1 2 (1−z) d−1 2 , log (f(z)) =− 1 2 logz+ d−1 2 log(1−z), log (f(z))′ =− 1 2z − d−1 2(1−z) <0. Then, for all 0 < z < 1, f ′(z) = log (f(z))′ f(z) < 0, and so F is concave. Figure 5 illustrates the CDF across various values of parameterd >1. To continue in our analysis, we will need to show that the τ for which we are eval...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.