Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios
Pith reviewed 2026-05-24 04:51 UTC · model grok-4.3
The pith
ERM classifiers for positive-unlabeled data lose performance when the sampling scheme switches from case-control to single-sample unless the empirical risk definition is changed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Classifiers based on empirical risk minimization (ERM) for positive-unlabeled data that are designed for case-control sampling deteriorate when applied to single-sample data, because their behavior depends on the sampling scenario. Accounting for the difference requires only a change in the definition of the empirical risk. The single-sample analogue of the non-negative risk classifier differs significantly in performance from the original, especially when half or more of the positive observations are labeled.
What carries the argument
the definition of the empirical risk inside the ERM objective for positive-unlabeled learning
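To make the object of the argument concrete, the two risk definitions can be written side by side. The following is a reconstruction from standard definitions, not the paper's own notation: the case-control form is the usual non-negative PU risk, and the single-sample form is derived here under the assumption that positives are labeled completely at random with label frequency c. The symbols \pi, c, \ell, g and the \hat R terms are assumptions introduced for illustration, and the paper's exact estimator may differ.

```latex
% Assumed notation (not taken from the paper):
%   \pi = P(Y=1), c = P(S=1 \mid Y=1), \ell = loss, g = scoring function
%   \hat R_p^{\pm}(g): mean of \ell(g(x), \pm 1) over labeled positives
%   \hat R_u^{-}(g):   mean of \ell(g(x), -1) over unlabeled observations
\begin{align*}
\text{case-control:}\quad
  \hat R_{\mathrm{cc}}(g) &= \pi \hat R_p^{+}(g)
    + \max\bigl\{0,\ \hat R_u^{-}(g) - \pi \hat R_p^{-}(g)\bigr\} \\
\text{single-sample:}\quad
  P(x) &= c\pi\, P(x \mid S=1) + (1 - c\pi)\, P(x \mid S=0) \\
  \hat R_{\mathrm{ss}}(g) &= \pi \hat R_p^{+}(g)
    + \max\bigl\{0,\ (1 - c\pi)\hat R_u^{-}(g) - \pi(1 - c)\hat R_p^{-}(g)\bigr\}
\end{align*}
```

In the single-sample case the unlabeled observations follow P(x | S=0) rather than P(x), so the marginal expectation has to be reassembled from the labeled and unlabeled parts. Under these assumptions the two expressions coincide at c = 0 and drift apart as c grows, which is at least consistent with the reported gap when many positives are labeled.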
If this is right
- The non-negative risk classifier designed for case-control data requires a distinct single-sample analogue.
- Performance differences between the two versions are largest when half or more of the positive observations are labeled.
- Applying an ERM minimizer designed for single-sample data to case-control data produces analogous mismatches.
- The scenario mismatch is resolved by redefining the empirical risk rather than by altering other components of the method.
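The "change in the risk definition only" point in the list above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sigmoid loss, and the single-sample formula (derived under the assumption that positives are labeled completely at random with frequency `label_freq`) are all assumptions made for illustration.

```python
import math

def sigmoid_loss(score, label):
    """Sigmoid loss 1 / (1 + exp(label * score)), written via tanh for stability."""
    return 0.5 * (1.0 - math.tanh(0.5 * label * score))

def mean_loss(scores, label):
    """Average loss of a list of classifier scores against a fixed label (+1 or -1)."""
    return sum(sigmoid_loss(s, label) for s in scores) / len(scores)

def nn_risk_case_control(scores_p, scores_u, prior):
    """Non-negative PU risk when the unlabeled sample is drawn from P(x)."""
    pos_part = prior * mean_loss(scores_p, +1)
    neg_part = mean_loss(scores_u, -1) - prior * mean_loss(scores_p, -1)
    return pos_part + max(0.0, neg_part)

def nn_risk_single_sample(scores_p, scores_u, prior, label_freq):
    """Assumed single-sample analogue: the unlabeled sample follows P(x | S=0)."""
    pos_part = prior * mean_loss(scores_p, +1)
    neg_part = ((1.0 - label_freq * prior) * mean_loss(scores_u, -1)
                - prior * (1.0 - label_freq) * mean_loss(scores_p, -1))
    return pos_part + max(0.0, neg_part)

# Hypothetical classifier scores on labeled positives and unlabeled points.
scores_p = [2.0, 1.0]
scores_u = [-1.0, 0.5, -2.0]
risk_cc = nn_risk_case_control(scores_p, scores_u, prior=0.4)
risk_ss = nn_risk_single_sample(scores_p, scores_u, prior=0.4, label_freq=0.5)
print(risk_cc, risk_ss)
```

Both functions share the positive part and the non-negative clipping; only the weights inside the negative part differ, so swapping one for the other leaves the rest of a training loop untouched.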
Where Pith is reading between the lines
- Designers of PU methods may need to state the target sampling scheme explicitly when reporting performance.
- Data collection protocols for PU problems could benefit from recording whether positives and unlabeled examples were drawn jointly or separately.
- The same risk-definition adjustment might be needed when adapting other ERM-based PU techniques to new sampling regimes.
Load-bearing premise
The performance difference between sampling schemes is driven by the definition of the empirical risk estimator rather than by estimation error in class priors or other unmodeled factors.
What would settle it
Train both the original case-control ERM classifier and its single-sample analogue on data generated under each sampling scheme separately, then measure whether the performance gap disappears after the risk definition is adjusted.
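The data-generation half of such an experiment can be sketched as follows. This is a toy setup under stated assumptions, not the paper's protocol: the one-dimensional Gaussian classes, the function names, and the parameter values are hypothetical, and the true labels of unlabeled points are kept only to diagnose the composition of the unlabeled set.

```python
import random

def draw_xy(rng, prior):
    """Draw (x, y): y = 1 with probability `prior`; x ~ N(+1, 1) if y = 1, else N(-1, 1)."""
    y = 1 if rng.random() < prior else 0
    x = rng.gauss(1.0 if y == 1 else -1.0, 1.0)
    return x, y

def case_control_sample(rng, prior, n_pos, n_unl):
    """Positives from P(x | Y=1); unlabeled points drawn independently from P(x)."""
    positives = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    unlabeled = [draw_xy(rng, prior) for _ in range(n_unl)]
    return positives, unlabeled

def single_sample(rng, prior, label_freq, n):
    """One sample from P(x, y); each positive is labeled with probability label_freq."""
    labeled, unlabeled = [], []
    for _ in range(n):
        x, y = draw_xy(rng, prior)
        if y == 1 and rng.random() < label_freq:
            labeled.append(x)
        else:
            unlabeled.append((x, y))
    return labeled, unlabeled

def positive_share(unlabeled):
    """Fraction of true positives hidden in the unlabeled set (diagnostic only)."""
    return sum(y for _, y in unlabeled) / len(unlabeled)

rng = random.Random(0)
prior, label_freq = 0.5, 0.5
_, unl_cc = case_control_sample(rng, prior, 2000, 20000)
_, unl_ss = single_sample(rng, prior, label_freq, 20000)
share_cc = positive_share(unl_cc)
share_ss = positive_share(unl_ss)
# Under these assumptions the single-sample unlabeled set holds fewer positives.
expected_ss = prior * (1 - label_freq) / (1 - label_freq * prior)
print(share_cc, share_ss, expected_ss)
```

In the case-control draw the unlabeled set contains positives at roughly the class prior, while in the single-sample draw labeling removes positives from it. A risk estimator that treats the unlabeled set as a sample from P(x) is therefore mismatched in the second case.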
Original abstract
In the paper we argue that performance of the classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data, which are designed for case-control sampling scheme may significantly deteriorate when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. Also, we introduce a single-sample case analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that the significant differences occur between them, especiall when half or more positive of observations are labeled. The opposite case when ERM minimizer designed for the case-control case is applied for single-sample data is also considered and similar conclusions are drawn. Taking into account difference of scenarios requires a sole, but crucial, change in the definition of the Empirical Risk.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that ERM-based PU classifiers designed for case-control sampling deteriorate when applied to single-sample scenarios. It claims to reveal the scenario dependence, introduces a single-sample analogue of the non-negative risk classifier, and reports significant performance gaps (especially when half or more positives are labeled). The central assertion is that the sampling difference is accounted for by a sole change in the empirical risk definition.
Significance. If the central claim holds after isolating the risk-definition change, the work would clarify an important modeling distinction in PU learning and supply a corrected estimator for the single-sample case. The empirical comparisons could help practitioners avoid mismatched risk estimators. The manuscript does not yet supply the derivations, controlled experiments, or dataset details needed to evaluate whether the reported gaps are attributable to the risk change rather than auxiliary modeling choices.
major comments (2)
- [Abstract (central claim) and experimental sections] The central claim requires that the sole modification to the empirical risk (change in the negative risk term) produces the reported deterioration and improvement. This holds only if class-prior estimation, non-negative risk correction, and unlabeled-data handling are identical when the two risk definitions are compared. The manuscript must demonstrate that these auxiliary procedures are held fixed; otherwise the observed gaps may be artifacts of differing prior estimates or hyper-parameters rather than the risk definition itself.
- [Abstract and experimental results] No derivation details, error analysis, or dataset descriptions are supplied to support the assertion of 'significant differences' and the scenario dependence. Without these, it is not possible to verify whether the performance gap is load-bearing for the risk-definition change or driven by unmodeled factors such as estimation error in class priors.
minor comments (1)
- [Abstract] Typo: 'especiall' should read 'especially'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below, clarifying the experimental controls used and committing to revisions that will make the isolation of the risk-definition effect and supporting details fully explicit.
- Referee: [Abstract (central claim) and experimental sections] The central claim requires that the sole modification to the empirical risk (change in the negative risk term) produces the reported deterioration and improvement. This holds only if class-prior estimation, non-negative risk correction, and unlabeled-data handling are identical when the two risk definitions are compared. The manuscript must demonstrate that these auxiliary procedures are held fixed; otherwise the observed gaps may be artifacts of differing prior estimates or hyper-parameters rather than the risk definition itself.
  Authors: We agree that the central claim is valid only when auxiliary components are held fixed. In the reported experiments the class-prior estimator, the non-negative risk correction, and the unlabeled-data handling procedure were identical for both risk definitions; only the empirical-risk formulation itself was altered, and hyper-parameters were selected under the same protocol. We will revise the experimental section to include an explicit statement and table documenting these fixed components, thereby confirming that the observed gaps arise from the risk-definition change.
  Revision: yes
- Referee: [Abstract and experimental results] No derivation details, error analysis, or dataset descriptions are supplied to support the assertion of 'significant differences' and the scenario dependence. Without these, it is not possible to verify whether the performance gap is load-bearing for the risk-definition change or driven by unmodeled factors such as estimation error in class priors.
  Authors: Section 3 contains the derivation of the single-sample non-negative risk estimator and the theoretical analysis of scenario dependence. Dataset descriptions and preprocessing steps are given in the experimental setup. To improve verifiability we will expand the appendix with complete derivation steps, add a formal error analysis that accounts for class-prior estimation error, and supply additional dataset statistics together with the implementation code.
  Revision: yes
Circularity Check
No significant circularity; central claim follows from explicit redefinition of risk under distinct sampling schemes
The paper distinguishes case-control versus single-sample PU schemes and shows that ERM classifiers designed for one require a modified risk definition to perform under the other. No load-bearing step reduces a claimed prediction or performance gap to a fitted parameter, self-citation chain, or ansatz imported from the authors' prior work. The argument is carried by the direct comparison of the two risk estimators under the respective sampling models, without any equation that equates an output to its input by construction. External benchmarks (synthetic and real-data experiments) are used to illustrate the difference rather than to validate a self-referential claim.