Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

Adam Wawrze\'nczyk; Jan Mielniczuk

arxiv: 2312.02095 · v3 · submitted 2023-12-04 · 💻 cs.LG

Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

Jan Mielniczuk , Adam Wawrze\'nczyk This is my paper

Pith reviewed 2026-05-24 04:51 UTC · model grok-4.3

classification 💻 cs.LG

keywords positive unlabeled learningempirical risk minimizationcase-control samplingsingle-sample samplingPU classificationnon-negative risk

0 comments

The pith

ERM classifiers for positive-unlabeled data lose performance when the sampling scheme switches from case-control to single-sample unless the empirical risk definition is changed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that empirical risk minimization classifiers built for positive-unlabeled data under case-control sampling can degrade when the data instead follows a single-sample scheme. Their behavior depends on the sampling scenario except in very specific cases. The authors create a single-sample version of the popular non-negative risk classifier and show clear performance gaps between the two versions, especially when half or more of the positive observations are labeled. They reach the same conclusion when the mismatch runs in the opposite direction. Only one adjustment to the empirical risk definition is required to match the sampling scheme.

Core claim

Classifiers based on empirical risk minimization for positive-unlabeled data that are designed for case-control sampling deteriorate when applied to single-sample data because their behavior depends on the scenario; accounting for the difference requires only a change in the definition of the empirical risk, and the single-sample analogue of the non-negative risk classifier exhibits significant performance differences from the original especially when half or more positive observations are labeled.

What carries the argument

the definition of the empirical risk inside the ERM objective for positive-unlabeled learning

If this is right

The non-negative risk classifier designed for case-control data requires a distinct single-sample analogue.
Performance differences between the two versions are largest when half or more positive observations are labeled.
Applying an ERM minimizer designed for single-sample data to case-control data produces analogous mismatches.
The scenario mismatch is resolved by redefining the empirical risk rather than by altering other components of the method.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of PU methods may need to state the target sampling scheme explicitly when reporting performance.
Data collection protocols for PU problems could benefit from recording whether positives and unlabeled examples were drawn jointly or separately.
The same risk-definition adjustment might be needed when adapting other ERM-based PU techniques to new sampling regimes.

Load-bearing premise

The performance difference between sampling schemes is driven by the definition of the empirical risk estimator rather than by estimation error in class priors or other unmodeled factors.

What would settle it

Train both the original case-control ERM classifier and its single-sample analogue on data generated under each sampling scheme separately, then measure whether the performance gap disappears after the risk definition is adjusted.

Figures

Figures reproduced from arXiv: 2312.02095 by Adam Wawrze\'nczyk, Jan Mielniczuk.

**Figure 2.** Figure 2: Change of accuracy with label frequency increase for single-sample datasets [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Test accuracy per epoch, selected single-sample datasets, [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Risk components per epoch, Snacks dataset, [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

read the original abstract

In the paper we argue that performance of the classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data, which are designed for case-control sampling scheme may significantly deteriorate when applied to a single-sample scenario. We reveal why their behavior depends, in all but very specific cases, on the scenario. Also, we introduce a single-sample case analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with the original proposal. We show that the significant differences occur between them, especiall when half or more positive of observations are labeled. The opposite case when ERM minimizer designed for the case-control case is applied for single-sample data is also considered and similar conclusions are drawn. Taking into account difference of scenarios requires a sole, but crucial, change in the definition of the Empirical Risk.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The sampling scheme in PU learning affects the right form of the empirical risk, and the paper makes that explicit with a single-sample analogue, though the experiments need tighter controls to confirm the source of the gap.

read the letter

The core claim is that ERM classifiers built for case-control PU sampling can degrade under single-sample data, and the fix is a targeted change in how the negative part of the risk is defined. The authors introduce the single-sample counterpart to the non-negative risk estimator and report noticeable performance differences between the two versions, especially once half or more positives are labeled. They also check the reverse direction. This distinction is not just restated from earlier PU-ERM results; it is spelled out as a scenario-dependent choice that practitioners might otherwise miss. The paper earns credit for keeping the adjustment minimal and for showing that the sampling scheme is not interchangeable in all but very special cases. If the comparisons really isolate the risk definition while holding prior estimation and other choices fixed, the observation is useful for anyone who has to pick or implement a PU risk estimator. The main soft spot is that the abstract supplies no derivation steps, no dataset descriptions, and no error analysis, so it is hard to judge how cleanly the reported gaps trace back to the risk term rather than to differences in how the two setups were coded. The stress-test point lands: without explicit confirmation that class-prior handling and hyper-parameters stayed identical, the performance difference could partly reflect those auxiliary decisions. The work is narrow but directly relevant to people already using or extending PU methods. A reader who needs to match their risk estimator to the actual sampling process would get a concrete takeaway. It is coherent enough on its own terms to go to peer review, mainly so referees can check the experimental controls and the size of the effect in the full text.

Referee Report

2 major / 1 minor

Summary. The paper argues that ERM-based PU classifiers designed for case-control sampling deteriorate when applied to single-sample scenarios. It claims to reveal the scenario dependence, introduces a single-sample analogue of the non-negative risk classifier, and reports significant performance gaps (especially when half or more positives are labeled). The central assertion is that the sampling difference is accounted for by a sole change in the empirical risk definition.

Significance. If the central claim holds after isolating the risk-definition change, the work would clarify an important modeling distinction in PU learning and supply a corrected estimator for the single-sample case. The empirical comparisons could help practitioners avoid mismatched risk estimators. The manuscript does not yet supply the derivations, controlled experiments, or dataset details needed to evaluate whether the reported gaps are attributable to the risk change rather than auxiliary modeling choices.

major comments (2)

[Abstract (central claim) and experimental sections] The central claim requires that the sole modification to the empirical risk (change in the negative risk term) produces the reported deterioration and improvement. This holds only if class-prior estimation, non-negative risk correction, and unlabeled-data handling are identical when the two risk definitions are compared. The manuscript must demonstrate that these auxiliary procedures are held fixed; otherwise the observed gaps may be artifacts of differing prior estimates or hyper-parameters rather than the risk definition itself.
[Abstract and experimental results] No derivation details, error analysis, or dataset descriptions are supplied to support the assertion of 'significant differences' and the scenario dependence. Without these, it is not possible to verify whether the performance gap is load-bearing for the risk-definition change or driven by unmodeled factors such as estimation error in class priors.

minor comments (1)

[Abstract] Typo: 'especiall' should read 'especially'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below, clarifying the experimental controls used and committing to revisions that will make the isolation of the risk-definition effect and supporting details fully explicit.

read point-by-point responses

Referee: [Abstract (central claim) and experimental sections] The central claim requires that the sole modification to the empirical risk (change in the negative risk term) produces the reported deterioration and improvement. This holds only if class-prior estimation, non-negative risk correction, and unlabeled-data handling are identical when the two risk definitions are compared. The manuscript must demonstrate that these auxiliary procedures are held fixed; otherwise the observed gaps may be artifacts of differing prior estimates or hyper-parameters rather than the risk definition itself.

Authors: We agree that the central claim is valid only when auxiliary components are held fixed. In the reported experiments the class-prior estimator, the non-negative risk correction, and the unlabeled-data handling procedure were identical for both risk definitions; only the empirical-risk formulation itself was altered, and hyper-parameters were selected under the same protocol. We will revise the experimental section to include an explicit statement and table documenting these fixed components, thereby confirming that the observed gaps arise from the risk-definition change. revision: yes
Referee: [Abstract and experimental results] No derivation details, error analysis, or dataset descriptions are supplied to support the assertion of 'significant differences' and the scenario dependence. Without these, it is not possible to verify whether the performance gap is load-bearing for the risk-definition change or driven by unmodeled factors such as estimation error in class priors.

Authors: Section 3 contains the derivation of the single-sample non-negative risk estimator and the theoretical analysis of scenario dependence. Dataset descriptions and preprocessing steps are given in the experimental setup. To improve verifiability we will expand the appendix with complete derivation steps, add a formal error analysis that accounts for class-prior estimation error, and supply additional dataset statistics together with the implementation code. revision: yes

Circularity Check

0 steps flagged

No significant circularity; central claim follows from explicit redefinition of risk under distinct sampling schemes

full rationale

The paper distinguishes case-control versus single-sample PU schemes and shows that ERM classifiers designed for one require a modified risk definition to perform under the other. No load-bearing step reduces a claimed prediction or performance gap to a fitted parameter, self-citation chain, or ansatz imported from the authors' prior work. The argument is carried by the direct comparison of the two risk estimators under the respective sampling models, without any equation that equates an output to its input by construction. External benchmarks (synthetic and real-data experiments) are used to illustrate the difference rather than to validate a self-referential claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be extracted from the provided text. The work extends existing ERM methods for PU data without introducing new postulated quantities.

pith-pipeline@v0.9.0 · 5673 in / 1094 out tokens · 25828 ms · 2026-05-24T04:51:47.244616+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1]

Recommendations as Treatments: Debiasing Learning and Evaluation

Schnabel T, Swaminathan A, Singh A, Chandak N, Joachims T. Recommendation as treatments: debiasing learning and evaluation. ICML, 2016. 48:1670–1679. doi:10.48550/arXiv.1602.05352

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1602.05352 2016
[2]

Learning from positive and unlabeled data: a survey

Bekker J, Davis J. Learning from positive and unlabeled data: a survey. Machine Learning , 2020. 109(4):719–760. doi:10.1007/s10994-020-05877-5

work page doi:10.1007/s10994-020-05877-5 2020
[3]

Building high-performance classifiers using positive and unlabeled examples for text classification

Ke T, Yang B, Zhen L, Tan J, Li Y , Jing L. Building high-performance classifiers using positive and unlabeled examples for text classification. In: International Symposium on Neural Networks. Springer, 2012 pp. 187–195. doi:10.1007/978-3-642-31362-2 21

work page doi:10.1007/978-3-642-31362-2 2012
[4]

Maximum likelihood from incomplete data via the EM algorithm

Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) , 1977. 39(1):1–38

work page 1977
[5]

On missing labels, long-tails and propensities in extreme multi-label classification

Shultheis E, Babbar R, Wydmuch M, Dembczy ´nski K. On missing labels, long-tails and propensities in extreme multi-label classification. In: KDD’22. 2022 pp. 1547–1557. doi:10.1145/3534678.3539466

work page doi:10.1145/3534678.3539466 2022
[6]

Beyond the selected completely at random assumption for learning from positive and unlabeled data

Bekker J, Robberechts P, Davis J. Beyond the selected completely at random assumption for learning from positive and unlabeled data. In: Proceedings of ECMLPKDD’2019. Springer, Cham, 2019 pp. 71–

work page 2019
[7]

doi:10.1007/978-3-030-46147-8 5

work page doi:10.1007/978-3-030-46147-8
[8]

Variations and extension of the convex-concave procedure.Optimisation and Engineering,

Lipp T, Boyd S. Variations and extension of the convex-concave procedure.Optimisation and Engineering,

work page
[9]

doi:10.1007/s11081-015-9294-x

17(2):263–287. doi:10.1007/s11081-015-9294-x

work page doi:10.1007/s11081-015-9294-x
[10]

One-class classification approach to variational learning from biased positive unlabeled data

Wawrze ´nczyk A, Mielniczuk J. One-class classification approach to variational learning from biased positive unlabeled data. In: ECAI 2023, to appear

work page 2023
[11]

Learning classifiers from only positive and unlabeled data

Elkan C, Noto K. Learning classifiers from only positive and unlabeled data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008 pp. 213–220

work page 2008
[12]

Deep Generative Positive-Unlabeled Learning under Selection Bias

Na B, Kim H, Song K, Joo W, Kim YY , Moon IC. Deep Generative Positive-Unlabeled Learning under Selection Bias. In: Proceedings of CIKM’20, CIKM ’20. ACM, New York, NY , USA. 2020 pp. 1155–

work page 2020
[13]

Analysis of Learning from Positive and Unlabeled Data

du Plessis MC, Niu G, Sugiyama M. Analysis of Learning from Positive and Unlabeled Data. In: Ghahra- mani Z, Welling M, Cortes C, Lawrence N, Weinberger K (eds.), Advances in Neural Information Pro- cessing Systems, volume 27. Curran Associates, Inc., 2014 pp. 703–711. J. Mielniczuk and A. Wawrze ´nczyk / Single-sample V ersus Case-control Sampling Scheme...

work page 2014
[14]

Positive-Unlabeled Learning with Non-Negative Risk Estimator

Kiryo R, Niu G, du Plessis MC, Sugiyama M. Positive-Unlabeled Learning with Non-Negative Risk Estimator. In: Proceedings of the NIPS’17, NIPS’17. Curran Associates Inc., Red Hook, NY , USA. 2017 pp. 1674–1684. ISBN:9781510860964

work page 2017
[15]

Learning from Positive and Unlabeled Data with a Selection Bias

Kato M, Teshima T, Honda J. Learning from Positive and Unlabeled Data with a Selection Bias. In: ICLR 2019

work page 2019
[16]

MINILM: Deep Self-Attention Distillation for Task- Agnostic Compression of Pre-Trained Transformers

Wang W, Wei F, Dong L, Bao H, Yang N, Zhou M. MINILM: Deep Self-Attention Distillation for Task- Agnostic Compression of Pre-Trained Transformers. In: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20. Curran Associates Inc., Red Hook, NY , USA, 2020 pp. 5776–5788. doi:10.48550/arXiv.2002.10957

work page doi:10.48550/arxiv.2002.10957 2020
[17]

SwiftFormer: Efficient Addi- tive Attention for Transformer-based Real-time Mobile Vision Applications

Shaker A, Maaz M, Abdul Rasheed H, Khan S, Yang MH, Khan F. SwiftFormer: Efficient Addi- tive Attention for Transformer-based Real-time Mobile Vision Applications. In: ICCV 2023, doi: 10.48550/arXiv.2303.15446

work page doi:10.48550/arxiv.2303.15446 2023

[1] [1]

Recommendations as Treatments: Debiasing Learning and Evaluation

Schnabel T, Swaminathan A, Singh A, Chandak N, Joachims T. Recommendation as treatments: debiasing learning and evaluation. ICML, 2016. 48:1670–1679. doi:10.48550/arXiv.1602.05352

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1602.05352 2016

[2] [2]

Learning from positive and unlabeled data: a survey

Bekker J, Davis J. Learning from positive and unlabeled data: a survey. Machine Learning , 2020. 109(4):719–760. doi:10.1007/s10994-020-05877-5

work page doi:10.1007/s10994-020-05877-5 2020

[3] [3]

Building high-performance classifiers using positive and unlabeled examples for text classification

Ke T, Yang B, Zhen L, Tan J, Li Y , Jing L. Building high-performance classifiers using positive and unlabeled examples for text classification. In: International Symposium on Neural Networks. Springer, 2012 pp. 187–195. doi:10.1007/978-3-642-31362-2 21

work page doi:10.1007/978-3-642-31362-2 2012

[4] [4]

Maximum likelihood from incomplete data via the EM algorithm

Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) , 1977. 39(1):1–38

work page 1977

[5] [5]

On missing labels, long-tails and propensities in extreme multi-label classification

Shultheis E, Babbar R, Wydmuch M, Dembczy ´nski K. On missing labels, long-tails and propensities in extreme multi-label classification. In: KDD’22. 2022 pp. 1547–1557. doi:10.1145/3534678.3539466

work page doi:10.1145/3534678.3539466 2022

[6] [6]

Beyond the selected completely at random assumption for learning from positive and unlabeled data

Bekker J, Robberechts P, Davis J. Beyond the selected completely at random assumption for learning from positive and unlabeled data. In: Proceedings of ECMLPKDD’2019. Springer, Cham, 2019 pp. 71–

work page 2019

[7] [7]

doi:10.1007/978-3-030-46147-8 5

work page doi:10.1007/978-3-030-46147-8

[8] [8]

Variations and extension of the convex-concave procedure.Optimisation and Engineering,

Lipp T, Boyd S. Variations and extension of the convex-concave procedure.Optimisation and Engineering,

work page

[9] [9]

doi:10.1007/s11081-015-9294-x

17(2):263–287. doi:10.1007/s11081-015-9294-x

work page doi:10.1007/s11081-015-9294-x

[10] [10]

One-class classification approach to variational learning from biased positive unlabeled data

Wawrze ´nczyk A, Mielniczuk J. One-class classification approach to variational learning from biased positive unlabeled data. In: ECAI 2023, to appear

work page 2023

[11] [11]

Learning classifiers from only positive and unlabeled data

Elkan C, Noto K. Learning classifiers from only positive and unlabeled data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008 pp. 213–220

work page 2008

[12] [12]

Deep Generative Positive-Unlabeled Learning under Selection Bias

Na B, Kim H, Song K, Joo W, Kim YY , Moon IC. Deep Generative Positive-Unlabeled Learning under Selection Bias. In: Proceedings of CIKM’20, CIKM ’20. ACM, New York, NY , USA. 2020 pp. 1155–

work page 2020

[13] [13]

Analysis of Learning from Positive and Unlabeled Data

du Plessis MC, Niu G, Sugiyama M. Analysis of Learning from Positive and Unlabeled Data. In: Ghahra- mani Z, Welling M, Cortes C, Lawrence N, Weinberger K (eds.), Advances in Neural Information Pro- cessing Systems, volume 27. Curran Associates, Inc., 2014 pp. 703–711. J. Mielniczuk and A. Wawrze ´nczyk / Single-sample V ersus Case-control Sampling Scheme...

work page 2014

[14] [14]

Positive-Unlabeled Learning with Non-Negative Risk Estimator

Kiryo R, Niu G, du Plessis MC, Sugiyama M. Positive-Unlabeled Learning with Non-Negative Risk Estimator. In: Proceedings of the NIPS’17, NIPS’17. Curran Associates Inc., Red Hook, NY , USA. 2017 pp. 1674–1684. ISBN:9781510860964

work page 2017

[15] [15]

Learning from Positive and Unlabeled Data with a Selection Bias

Kato M, Teshima T, Honda J. Learning from Positive and Unlabeled Data with a Selection Bias. In: ICLR 2019

work page 2019

[16] [16]

MINILM: Deep Self-Attention Distillation for Task- Agnostic Compression of Pre-Trained Transformers

Wang W, Wei F, Dong L, Bao H, Yang N, Zhou M. MINILM: Deep Self-Attention Distillation for Task- Agnostic Compression of Pre-Trained Transformers. In: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20. Curran Associates Inc., Red Hook, NY , USA, 2020 pp. 5776–5788. doi:10.48550/arXiv.2002.10957

work page doi:10.48550/arxiv.2002.10957 2020

[17] [17]

SwiftFormer: Efficient Addi- tive Attention for Transformer-based Real-time Mobile Vision Applications

Shaker A, Maaz M, Abdul Rasheed H, Khan S, Yang MH, Khan F. SwiftFormer: Efficient Addi- tive Attention for Transformer-based Real-time Mobile Vision Applications. In: ICCV 2023, doi: 10.48550/arXiv.2303.15446

work page doi:10.48550/arxiv.2303.15446 2023