Prior shift estimation for positive unlabeled data through the lens of kernel embedding

Jan Mielniczuk; Paweł Teisseyre; Wojciech Rejchel

arxiv: 2502.21194 · v3 · pith:ZUQSIVDBnew · submitted 2025-02-28 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-23 01:39 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords positive-unlabeled learning · class prior estimation · kernel embedding · distribution matching · prior shift · reproducing kernel Hilbert space · asymptotic consistency

The pith

A direct kernel-embedding estimator recovers the class prior in positive-unlabeled data with prior shift by solving an explicit distribution-matching optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an estimator for the proportion of positive examples in an unlabeled target population whose prior may differ from that of a source population observed only through positive samples and mixed samples. The method matches kernel embeddings of the observed distributions inside a reproducing kernel Hilbert space and obtains the prior as the explicit solution to a resulting optimization problem. Because the procedure never computes posterior probabilities, it sidesteps one common source of error in positive-unlabeled pipelines. The authors prove that the estimator converges to the true prior as sample size grows and supply a concrete, computable finite-sample deviation bound. A reader would care because many downstream positive-unlabeled algorithms depend on an accurate prior; a direct geometric method reduces modeling choices that can otherwise bias the result.

Core claim

The class prior is recovered directly as the explicit solution to a distribution-matching optimization that aligns kernel embeddings of the positive and mixed source samples with the target sample; the resulting estimator is asymptotically consistent and admits an explicit non-asymptotic bound on its deviation from the unknown prior that can be evaluated in practice.

What carries the argument

Kernel embedding distribution matching in a reproducing kernel Hilbert space, which converts the prior-recovery task into an explicit convex optimization whose solution is the estimated mixing proportion.
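The matching idea can be sketched in a few lines. The paper's exact objective is not reproduced on this page, so the code below is only an illustration under stated assumptions: label shift holds (both populations share the same class-conditional distributions), the source prior `source_prior` is known, the kernel is Gaussian with a hand-picked `bandwidth`, and the target prior is read off from the identity `pi' = pi + (1 - pi) * <mu' - mu, mu_+ - mu> / ||mu_+ - mu||^2`, which follows from writing both mixtures in terms of the same two class embeddings. All function names are hypothetical.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def embedding_inner(a, b, bandwidth=1.0):
    """Plug-in estimate of <mu_a, mu_b> in the RKHS: the mean of the Gram matrix."""
    return gaussian_gram(a, b, bandwidth).mean()


def target_prior_sketch(x_pos, x_src, x_tgt, source_prior, bandwidth=1.0):
    """Hypothetical closed-form target prior estimate (assumes a known source prior)."""
    # Estimate of <mu_tgt - mu_src, mu_pos - mu_src>
    num = (
        embedding_inner(x_tgt, x_pos, bandwidth)
        - embedding_inner(x_tgt, x_src, bandwidth)
        - embedding_inner(x_src, x_pos, bandwidth)
        + embedding_inner(x_src, x_src, bandwidth)
    )
    # Estimate of ||mu_pos - mu_src||^2
    den = (
        embedding_inner(x_pos, x_pos, bandwidth)
        - 2.0 * embedding_inner(x_pos, x_src, bandwidth)
        + embedding_inner(x_src, x_src, bandwidth)
    )
    slope = num / den
    # Clip because a finite-sample estimate can fall slightly outside [0, 1].
    return float(np.clip(source_prior + (1.0 - source_prior) * slope, 0.0, 1.0))


def sample_mixture(rng, n, prior):
    """Toy data: positives ~ N(2, 1), negatives ~ N(-2, 1), fixed class counts."""
    n_pos = int(round(prior * n))
    pos = rng.normal(2.0, 1.0, size=(n_pos, 1))
    neg = rng.normal(-2.0, 1.0, size=(n - n_pos, 1))
    return np.vstack([pos, neg])


rng = np.random.default_rng(0)
x_pos = rng.normal(2.0, 1.0, size=(400, 1))       # labeled positives from the source
x_src = sample_mixture(rng, 400, prior=0.5)       # unlabeled source sample
x_tgt = sample_mixture(rng, 400, prior=0.8)       # unlabeled target sample
estimate = target_prior_sketch(x_pos, x_src, x_tgt, source_prior=0.5)
print(f"estimated target prior: {estimate:.3f}")
```

No classifier or posterior model appears anywhere in the sketch; only averages of kernel evaluations are needed, which is the property the page attributes to the method. In practice the source prior would itself have to be estimated, and the bandwidth chosen with some care.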

If this is right

The estimator converges to the true prior as the number of samples increases.
A non-asymptotic, computable bound on the estimation error is available without further modeling.
The estimator exhibits a simple geometric interpretation based on distances between embedded distributions.
On both synthetic and real data the method matches or exceeds the accuracy of existing competitors while avoiding posterior estimation.
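One way to make the geometric reading concrete is the following short derivation. It is a sketch under assumptions that this page does not state explicitly: label shift holds, $\mu_+$, $\mu_-$, $\mu$ and $\mu'$ denote the kernel mean embeddings of the positive class, the negative class, the source mixture and the target mixture, $\pi < 1$ and $\pi'$ are the source and target priors, and $\mu_+ \neq \mu_-$. The notation is illustrative and may differ from the paper's.

```latex
\begin{align*}
\mu  &= \pi\,\mu_+ + (1-\pi)\,\mu_-, &
\mu' &= \pi'\,\mu_+ + (1-\pi')\,\mu_-, \\
\mu' - \mu &= (\pi' - \pi)\,(\mu_+ - \mu_-), &
\mu_+ - \mu &= (1-\pi)\,(\mu_+ - \mu_-), \\
\mu' - \mu &= \frac{\pi' - \pi}{1-\pi}\,(\mu_+ - \mu)
\quad\Longrightarrow &
\pi' &= \pi + (1-\pi)\,
\frac{\langle \mu' - \mu,\ \mu_+ - \mu\rangle_{\mathcal{H}}}
     {\lVert \mu_+ - \mu \rVert_{\mathcal{H}}^{2}} .
\end{align*}
```

Read this way, the target embedding lies on the line through $\mu$ and $\mu_+$, and the prior is determined by how far along that line it sits. The ratio on the right is the coefficient of the orthogonal projection of $\mu' - \mu$ onto $\mu_+ - \mu$.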

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometric formulation may allow the same matching idea to be applied to other forms of label shift without redesigning the loss.
Because the estimator is explicit, it can be plugged into existing positive-unlabeled algorithms as a drop-in prior without retraining auxiliary models.
The finite-sample bound supplies a practical way to decide how much target data is needed before the prior estimate is reliable enough for downstream use.
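The paper's own bound is not reproduced on this page, so the sample-size reasoning can only be illustrated with a stand-in. The sketch below uses the standard McDiarmid-type bound on a single empirical kernel mean embedding: with probability at least 1 − δ, ‖μ̂ − μ‖ ≤ √(M/n) · (1 + √(2 log(1/δ))), where M = sup K(x, x). This controls the embedding error, not the error of the prior estimate itself, and the function names are hypothetical.

```python
import math


def embedding_deviation_bound(n, delta, kernel_sup=1.0):
    """Upper bound on ||mu_hat - mu|| holding with probability >= 1 - delta.

    kernel_sup is M = sup_x K(x, x); it equals 1 for a Gaussian kernel.
    """
    root_term = 1.0 + math.sqrt(2.0 * math.log(1.0 / delta))
    return math.sqrt(kernel_sup / n) * root_term


def samples_needed(eps, delta, kernel_sup=1.0):
    """Smallest n for which the bound above is at most eps."""
    root_term = 1.0 + math.sqrt(2.0 * math.log(1.0 / delta))
    return math.ceil(kernel_sup * root_term**2 / eps**2)


n_required = samples_needed(eps=0.05, delta=0.05)
print(n_required, embedding_deviation_bound(n_required, delta=0.05))
```

The bound shrinks at rate 1/√n and depends on the data only through M, which is what makes it computable without further modeling. A bound on the prior estimate would additionally involve the separation between the embedded distributions.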

Load-bearing premise

The optimization problem that aligns the kernel embeddings has a unique solution that equals the unknown class prior.

What would settle it

In large samples the estimator systematically deviates from the true prior even though the kernel embeddings of the positive, mixed, and target distributions are accurately estimated.

Figures

Figures reproduced from arXiv: 2502.21194 by Jan Mielniczuk, Paweł Teisseyre, Wojciech Rejchel.

**Figure 2.** Visualization of the objective function behavior for …

**Figure 3.** Distribution of estimators (red line indicates the true …

**Figure 4.** Estimation errors with respect to the size of the source data for synthetic dataset and …

**Figure 5.** The impact of π estimation on the performance of the TCPU estimator. The boxplots show estimation errors |π̂′ − π′| for the TCPU target class prior estimator, for different source class prior estimators π̂.

**Figure 6.** Robustness to violation of the Label Shift (LS) assumption (1). The blue vertical line …

**Figure 7.** Distribution of estimators (red line indicates the true …

**Figure 8.** Distribution of estimators (red line indicates the true …

**Figure 9.** Distribution of estimators (red line indicates the true …

Original abstract

We study estimation of a class prior for unlabeled target samples which possibly differs from that of source population. Moreover, it is assumed that the source data is partially observable: only samples from the positive class and from the whole population are available (PU learning scenario). We introduce a novel direct estimator of a class prior which avoids estimation of posterior probabilities in both populations and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite sample behaviour for synthetic and real data and show that the proposal works consistently on par or better than its competitors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The abstract outlines a direct kernel-embedding estimator for class priors in shifted PU data, with claimed consistency and a practical bound, but only the abstract was available for this review, so none of these claims can be checked.

The core claim is a new direct estimator for the class prior of an unlabeled target sample when the source data are positive-unlabeled and the target prior may differ from the source prior. It works by distribution matching in an RKHS, yields an explicit solution to an optimization problem, skips posterior estimation entirely, and comes with asymptotic consistency plus an explicit non-asymptotic deviation bound that the authors describe as computable in practice. The authors also report finite-sample results on synthetic and real data that are competitive with or better than existing methods, and a simple geometric reading of the estimator.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a novel direct estimator for the class prior in positive-unlabeled (PU) data under possible prior shift. The estimator is derived from distribution matching via kernel embeddings in an RKHS, obtained as an explicit solution to an optimization task. It avoids posterior probability estimation in both populations, admits a geometric interpretation, and the authors claim asymptotic consistency together with an explicit non-asymptotic deviation bound that is calculable in practice. Finite-sample performance is reported to be competitive with existing methods on synthetic and real data.

Significance. If the central claims are substantiated, the work would supply a direct, geometrically interpretable alternative to posterior-based prior estimators in PU learning and label-shift settings. The combination of an explicit optimization solution with a practical non-asymptotic bound would be a useful theoretical and computational contribution to kernel methods for distribution matching.

major comments (1)

[Abstract] The claims of asymptotic consistency, of an explicit non-asymptotic bound calculable in practice, and that the optimization yields the class prior without posterior estimation or biasing modeling choices are load-bearing for the central contribution. Yet the full derivations, proofs, assumptions, and experimental details are not available in the provided text, which prevents verification of soundness and of the weakest assumption identified in the reader report.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for highlighting the importance of verifying the central claims. We address the major comment below.

Referee: [Abstract] The claims of asymptotic consistency, of an explicit non-asymptotic bound calculable in practice, and that the optimization yields the class prior without posterior estimation or biasing modeling choices are load-bearing for the central contribution. Yet the full derivations, proofs, assumptions, and experimental details are not available in the provided text, which prevents verification of soundness and of the weakest assumption identified in the reader report.

Authors: The provided excerpt in this review contains only the abstract. The full manuscript (arXiv:2502.21194) contains the complete derivations, proofs under the stated assumptions (including RKHS properties and kernel choice), the explicit non-asymptotic deviation bound, and all experimental details. The estimator is obtained as the closed-form solution to the distribution-matching objective in the RKHS, which directly yields the prior without requiring posterior estimation in either population. We are happy to supply specific proof excerpts or additional clarification if the referee wishes to examine particular steps.

revision: no

Circularity Check

0 steps flagged

No circularity detected from abstract

The abstract presents the estimator as an explicit solution to an optimization task using kernel embedding and distribution matching, with separate claims of asymptotic consistency and a calculable non-asymptotic bound. No equations, self-citations, fitted parameters renamed as predictions, or self-referential definitions are visible in the provided text. The derivation chain cannot be walked beyond the abstract, so no load-bearing step reduces to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, axioms, or invented entities; full paper would be needed to audit these.
