Mixture Proportion Estimation and Weakly-supervised Kernel Test for Conditional Independence

Akito Narahara; Takafumi Kanamori; Yushi Hirose

arxiv: 2604.07191 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.AI

Mixture Proportion Estimation and Weakly-supervised Kernel Test for Conditional Independence

Yushi Hirose , Akito Narahara , Takafumi Kanamori This is my paper

Pith reviewed 2026-05-10 17:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords mixture proportion estimationconditional independenceweakly supervised learningkernel testsmethod of momentsidentifiabilityPU learninglabel noise

0 comments

The pith

Mixture proportion estimation becomes identifiable under conditional independence assumptions given the class label, even when irreducibility fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes new assumptions based on conditional independence between the data and the mixture components given the class label. These assumptions ensure that the mixture proportions can be uniquely determined from unlabeled data without relying on the common irreducibility condition. The authors derive method of moments estimators for these proportions and prove their consistency and asymptotic normality. They also introduce kernel-based tests that can check whether the conditional independence holds using only weak supervision. This approach broadens the applicability of mixture proportion estimation in settings like positive-unlabeled learning and label noise where standard assumptions may not apply.

Core claim

Under the conditional independence of the observed variable and the class-conditional distribution given the label, the mixture proportions are identifiable from the marginal distribution alone, allowing consistent estimation via matching moments of the observed data to the class-conditional distributions estimated from labeled examples.

What carries the argument

Conditional independence assumptions given the class label that replace irreducibility, together with method-of-moments estimators derived from them and weakly-supervised kernel tests for validating the independence.

If this is right

If the assumptions hold, mixture proportion estimates remain consistent even in cases where no irreducible component exists.
The kernel tests can detect violations of conditional independence in weakly supervised settings without requiring full labels.
These estimators can be plugged into PU learning and domain adaptation pipelines to improve performance over irreducibility-based methods.
Asymptotic analysis provides rates for the estimators that depend on the strength of the independence.
The tests control type I and type II errors in finite samples as shown in experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such tests could be adapted to verify fairness constraints or causal structures in unlabeled data.
The framework might extend to multi-class mixtures by generalizing the independence conditions.
Practical implementations would benefit from combining the estimators with robust class-conditional density estimates.
If the independence fails mildly, the tests provide a diagnostic before trusting the proportion estimates.

Load-bearing premise

The data distribution satisfies conditional independence between the features and the mixture proportions given the class label.

What would settle it

A dataset in which the proposed estimators produce biased proportion estimates while a kernel test fails to reject the conditional independence assumption, or conversely where the test rejects but the estimates remain accurate.

read the original abstract

Mixture proportion estimation (MPE) aims to estimate class priors from unlabeled data. This task is a critical component in weakly supervised learning, such as PU learning, learning with label noise, and domain adaptation. Existing MPE methods rely on the \textit{irreducibility} assumption or its variant for identifiability. In this paper, we propose novel assumptions based on conditional independence (CI) given the class label, which ensure identifiability even when irreducibility does not hold. We develop method of moments estimators under these assumptions and analyze their asymptotic properties. Furthermore, we present weakly-supervised kernel tests to validate the CI assumptions, which are of independent interest in applications such as causal discovery and fairness evaluation. Empirically, we demonstrate the improved performance of our estimators compared with existing methods and that our tests successfully control both type I and type II errors.\label{key}

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper swaps irreducibility for conditional independence given the label to identify mixture proportions, then builds matching moment estimators and kernel tests that check the assumption itself.

read the letter

The main advance is the shift to conditional independence assumptions between features and the mixing component given the label. This keeps identifiability even when the standard irreducibility condition fails, and the authors derive method-of-moments estimators plus asymptotic normality results under ordinary kernel regularity conditions. They also supply weakly supervised kernel tests for validating those CI assumptions, which could be used separately in causal or fairness settings. Empirically the estimators improve on existing MPE baselines across the synthetic and real data they report, and the tests keep type I and II error rates under control at the nominal levels. The technical development looks careful with no obvious gaps in the identifiability argument or the moment equations. The main soft spot is that the new CI assumptions still need to hold in the target distribution, and while the paper supplies a test for them, power could drop in regimes where the chosen kernel or feature map is a poor match. The experiments are solid for what they cover but remain tied to the specific setups shown. This is useful for people doing PU learning, label noise, or domain adaptation who hit cases where irreducibility is unrealistic. A reader focused on identifiability questions in mixture models will find concrete tools here. I would send it to referees; the contribution is focused, the derivations are grounded, and the empirical checks line up with the claims.

Referee Report

0 major / 3 minor

Summary. The paper proposes novel conditional independence assumptions given the class label to ensure identifiability of mixture proportions in MPE even when irreducibility fails. It develops method-of-moments estimators under these assumptions, derives their consistency and asymptotic normality, and constructs weakly-supervised kernel tests (via RKHS embeddings) for validating the CI assumptions, with applications to causal discovery and fairness. Empirical results on synthetic and real data show improved estimation accuracy and proper type-I/II error control for the tests.

Significance. If the derivations hold, the work meaningfully advances weakly-supervised learning by providing an alternative identifiability route for MPE that does not require irreducibility, directly benefiting PU learning, label-noise correction, and domain adaptation. The weakly-supervised kernel tests are of independent interest and the manuscript supplies asymptotic guarantees plus reproducible empirical validation, which are clear strengths.

minor comments (3)

[Abstract] Abstract: the final sentence claims the tests 'successfully control both type I and type II errors' but the empirical section should clarify whether type-II power is reported against specific alternatives or only size is controlled.
[Section 4] The method-of-moments derivation assumes the feature maps are bounded; a brief remark on whether this is satisfied by the kernels used in the experiments would improve clarity.
[Section 6] Table captions and axis labels in the experimental figures could be expanded to include the exact kernel bandwidth selection procedure and the number of Monte Carlo repetitions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and the recommendation for minor revision. The referee's description accurately captures the paper's focus on novel conditional independence assumptions for identifiability in mixture proportion estimation, the method-of-moments estimators with asymptotic analysis, and the weakly-supervised kernel tests.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces novel conditional independence assumptions given the class label to ensure identifiability of mixture proportions even without irreducibility. It derives method-of-moments estimators from these assumptions, proves consistency and asymptotic normality under standard regularity conditions on kernels and feature maps, and constructs weakly-supervised kernel tests via RKHS embeddings. None of these steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the identifiability argument and statistical guarantees rest on the proposed assumptions and external regularity conditions rather than circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the validity of the newly proposed conditional independence assumptions for identifiability; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Conditional independence of features given the class label ensures identifiability of mixture proportions even without irreducibility.
This is the key new assumption introduced to replace the standard irreducibility condition for MPE.

pith-pipeline@v0.9.0 · 5454 in / 1286 out tokens · 59245 ms · 2026-05-10T17:38:06.114786+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

[Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm

For all models and algorithms presented, check if you include: Mixture Proportion Estimation and W eakly-supervised Kernel T est for Conditional Independence (a) A clear description of the mathematical set- ting, assumptions, algorithm, and/or model. [Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. [Yes] ...

work page
[2]

[Yes] (b) Complete proofs of all theoretical results

For any theoretical claim, check if you include: (a) Statements of the full set of assumptions of all theoretical results. [Yes] (b) Complete proofs of all theoretical results. [Yes] (c) Clear explanations of any assumptions. [Yes]

work page
[3]

[Yes] (b) All the training details (e.g., data splits, hy- perparameters, how they were chosen)

For all figures and tables that present empirical results, check if you include: (a) The code, data, and instructions needed to re- produce the main experimental results (either in the supplemental material or as a URL). [Yes] (b) All the training details (e.g., data splits, hy- perparameters, how they were chosen). [Yes] (c) A clear definition of the spe...

work page
[4]

[Yes] (b) The license information of the assets, if appli- cable

If you are using existing assets (e.g., code, data, models) or curating/releasing new assets, check if you include: (a) Citations of the creator If your work uses existing assets. [Yes] (b) The license information of the assets, if appli- cable. [Yes] (c) Newassetseitherinthesupplementalmaterial or as a URL, if applicable. [Not Applicable] (d) Information...

work page
[5]

[Not Applicable] (b) Descriptions of potential participant risks, withlinkstoInstitutionalReviewBoard(IRB) approvals if applicable

If you used crowdsourcing or conducted research with human subjects, check if you include: (a) The full text of instructions given to partici- pants and screenshots. [Not Applicable] (b) Descriptions of potential participant risks, withlinkstoInstitutionalReviewBoard(IRB) approvals if applicable. [Not Applicable] (c) The estimated hourly wage paid to part...

work page
[6]

=−(θ−θ ′)2(P1 −N 1)(P2 −N 2) Therefore, a=−(θ−θ ′)2(EP1[g1]−E N1[g1])·(E P2[g2]−E N2[g2]). Mixture Proportion Estimation and W eakly-supervised Kernel T est for Conditional Independence Considering α∗ is one solution ofmCI (α) = 0, if( EP1[g1] −E N1[g1]) · (EP2[g2] −E N2[g2]) ̸= 0, a̸ = 0and there exist real solutions formCI (α) = 0. Proof of Theorem 1.Th...

work page 2024
[7]

= Σ ∞ r=1λ1,rϕ1,r(x1)ϕ1,r(x′ 1)and k2(x2, x′

work page
[8]

= Σ ∞ r=1λ2,rϕ2,r(x2)ϕ2,r(x′ 2)where λ1,r, λ2,r and ϕ1,r, ϕ2,r are eigenvalues and eigenfunctions. Since these expansions are absolutely convergent, applying Fubini-Tonelli theorem, we can write˜k12(x, x′)as follows: ˜k12(x, x′) = k1(x1, x′ 1)−E z1∼F1 k1(x1, z1)−E z1∼F1 k1(x′ 1, z1) +E z1,z′ 1∼F1 k1(z1, z′ 1) k2(x2, x′ 2)−E z2∼F2 k2(x2, z2)−E z2∼F2 k2(x′ ...

work page
[9]

=ϕ 2,r(x′ 2)−E F2 ϕ2,r(z2). Y ushi Hirose, Akito Narahara, T akafumi Kanamori Then the test statisticTCI is written as follows with˜ϕ1,r and ˜ϕ2,r: TCI = E ˆF12 [φ1 ⊗φ 2]−E ˆF1 ˆF2 [φ1 ⊗φ 2] 2 H = E ˆF12 [(φ1 −E F1 φ1)⊗(φ 2 −E F2 φ2)]−E ˆF2 ˆF2 [(φ1 −E F1 φ1)⊗(φ 2 −E F2 φ2)] 2 H =E ˆF12, ˆF12 ˜k12(x, x′)−2E ˆF12, ˆF1 ˆF2 ˜k12(x, x′) +E ˆF1 ˆF2, ˆF1 ˆF2 ˜k...

work page 1981
[10]

since it can be written as follows. TCI = 1 n6n′6 nX i1,...,i6=1 n′ X q1,...,q6=1 hi1,...,i6,q1,...,q6 whereh i1,...,i6,q1,...,q6 is a symmetric function such that hi1,...,i6,q1,...,q6 := 1 6!6! (i1,..,i6)X (j1,...,j6) (q1,..,q6)X (r1,...,r6) ⟨φj1,...,j3,r1,...,r3 , φj4,...,j6,r4,...,r6 ⟩ Y ushi Hirose, Akito Narahara, T akafumi Kanamori and φj1,...,j3,r1...

work page 2007
[11]

− µX1|XS(x′ S)⟩ and ˜k2S(x2S, x′ 2S) = ⟨φ2(x2) −µ X2|XS(xS), φ2(x′

work page
[12]

−µ X2|XS(x′ S)⟩. By Mercer’s theorem, these kernels can be expanded ˜k1S(x1S, x′ 1S) = ∞X r=1 λ1,r(ϕ1,r(x1)−E F [ϕ1,r(x1)|xS])(ϕ1,r(x′ 1)−E F [ϕ1,r(x1)|x′ S]) = ∞X r=1 λ1,r ˜ϕ1,r(x1S)˜ϕ1,r(x′ 1S), ˜k2S(x2S, x′ 2S) = ∞X r=1 λ2,r(ϕ2,r(x2)−E F [ϕ2,r(x2)|xS])(ϕ2,r(x′ 2)−E F [ϕ2,r(x2)|x′ S]) = ∞X r=1 λ2,r ˜ϕ2,r(x2S)˜ϕ2,r(x′ 2S), kS(xS, x′ S) = ∞X r=1 λS,rϕS,r(...

work page 2007
[13]

˜K1S ∈R M×M and ˜K2S ∈R M×M are the Gram matrices associated with ˜k1S and ˜k2S, defined by( ˜K1S)ij = ˜k1S(vx1S ,i,v x1S ,j)and( ˜K2S)ij = ˜k2S(vx2S ,i,v x2S ,j)

−µ X1|XS(x′ S)⟩ and ˜k2S(x2S, x′ 2S) = ⟨φ2(x2)−µ X2|XS(xS), φ2(x′ 2)−µ X2|XS(x′ S)⟩,we can computeTM CI as Y ushi Hirose, Akito Narahara, T akafumi Kanamori TM CI = tr(( ˜K1S ⊙K S)Dα∗ ˜K2SDα∗) where ⊙ denotes the Hadamard product. ˜K1S ∈R M×M and ˜K2S ∈R M×M are the Gram matrices associated with ˜k1S and ˜k2S, defined by( ˜K1S)ij = ˜k1S(vx1S ,i,v x1S ,j)a...

work page 2011
[14]

We first selected a candidate set of features Xi that were discriminative, satisfying |E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 0.5, since a significant mean difference is es- sential for the efficient MPE

work page
[15]

We then applied the HSIC test to all pairs of features from this candidate set to identify those satisfying the CI condition, with a significance level 0.05

work page
[16]

For the MPE task, we setn = n′ = 2000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1,0.5)

For each detected CI feature pair, we ran our CI MPE method 10 times. For the MPE task, we setn = n′ = 2000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1,0.5). D.2.3 MCI MPE with synthetic data We used a regularization parameterλ = 5 × 10−4 and a Gaussian kernel with bandwidthσ = 3.5for all MCI MPE experiments. The search ranges ...

work page
[17]

In the search, we constructed a candidate set of features that satisfies|E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 1, similarly to D.2.2

We searched for feature triplets(X1, X2, XS)satisfying the MCI condition in the negative class by applying the KCI test (Zhang et al., 2011) to all possible triplets with a significance level 0.05. In the search, we constructed a candidate set of features that satisfies|E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 1, similarly to D.2.2. Then we only used f...

work page 2011
[18]

For the MPE task, we setn = n′ = 1000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1, 0.5)

For each detected triplet, we ran our MCI MPE method 5 times and evaluated the estimation error forθ′. For the MPE task, we setn = n′ = 1000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1, 0.5). We used a Gaussian kernel with bandwidthσ = 1.0for KRR, set the regularization parameter toλ = 10−3 and the search rangeIα− = [−1.25,− 0....

work page

[1] [1]

[Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm

For all models and algorithms presented, check if you include: Mixture Proportion Estimation and W eakly-supervised Kernel T est for Conditional Independence (a) A clear description of the mathematical set- ting, assumptions, algorithm, and/or model. [Yes] (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. [Yes] ...

work page

[2] [2]

[Yes] (b) Complete proofs of all theoretical results

For any theoretical claim, check if you include: (a) Statements of the full set of assumptions of all theoretical results. [Yes] (b) Complete proofs of all theoretical results. [Yes] (c) Clear explanations of any assumptions. [Yes]

work page

[3] [3]

[Yes] (b) All the training details (e.g., data splits, hy- perparameters, how they were chosen)

For all figures and tables that present empirical results, check if you include: (a) The code, data, and instructions needed to re- produce the main experimental results (either in the supplemental material or as a URL). [Yes] (b) All the training details (e.g., data splits, hy- perparameters, how they were chosen). [Yes] (c) A clear definition of the spe...

work page

[4] [4]

[Yes] (b) The license information of the assets, if appli- cable

If you are using existing assets (e.g., code, data, models) or curating/releasing new assets, check if you include: (a) Citations of the creator If your work uses existing assets. [Yes] (b) The license information of the assets, if appli- cable. [Yes] (c) Newassetseitherinthesupplementalmaterial or as a URL, if applicable. [Not Applicable] (d) Information...

work page

[5] [5]

[Not Applicable] (b) Descriptions of potential participant risks, withlinkstoInstitutionalReviewBoard(IRB) approvals if applicable

If you used crowdsourcing or conducted research with human subjects, check if you include: (a) The full text of instructions given to partici- pants and screenshots. [Not Applicable] (b) Descriptions of potential participant risks, withlinkstoInstitutionalReviewBoard(IRB) approvals if applicable. [Not Applicable] (c) The estimated hourly wage paid to part...

work page

[6] [6]

=−(θ−θ ′)2(P1 −N 1)(P2 −N 2) Therefore, a=−(θ−θ ′)2(EP1[g1]−E N1[g1])·(E P2[g2]−E N2[g2]). Mixture Proportion Estimation and W eakly-supervised Kernel T est for Conditional Independence Considering α∗ is one solution ofmCI (α) = 0, if( EP1[g1] −E N1[g1]) · (EP2[g2] −E N2[g2]) ̸= 0, a̸ = 0and there exist real solutions formCI (α) = 0. Proof of Theorem 1.Th...

work page 2024

[7] [7]

= Σ ∞ r=1λ1,rϕ1,r(x1)ϕ1,r(x′ 1)and k2(x2, x′

work page

[8] [8]

= Σ ∞ r=1λ2,rϕ2,r(x2)ϕ2,r(x′ 2)where λ1,r, λ2,r and ϕ1,r, ϕ2,r are eigenvalues and eigenfunctions. Since these expansions are absolutely convergent, applying Fubini-Tonelli theorem, we can write˜k12(x, x′)as follows: ˜k12(x, x′) = k1(x1, x′ 1)−E z1∼F1 k1(x1, z1)−E z1∼F1 k1(x′ 1, z1) +E z1,z′ 1∼F1 k1(z1, z′ 1) k2(x2, x′ 2)−E z2∼F2 k2(x2, z2)−E z2∼F2 k2(x′ ...

work page

[9] [9]

=ϕ 2,r(x′ 2)−E F2 ϕ2,r(z2). Y ushi Hirose, Akito Narahara, T akafumi Kanamori Then the test statisticTCI is written as follows with˜ϕ1,r and ˜ϕ2,r: TCI = E ˆF12 [φ1 ⊗φ 2]−E ˆF1 ˆF2 [φ1 ⊗φ 2] 2 H = E ˆF12 [(φ1 −E F1 φ1)⊗(φ 2 −E F2 φ2)]−E ˆF2 ˆF2 [(φ1 −E F1 φ1)⊗(φ 2 −E F2 φ2)] 2 H =E ˆF12, ˆF12 ˜k12(x, x′)−2E ˆF12, ˆF1 ˆF2 ˜k12(x, x′) +E ˆF1 ˆF2, ˆF1 ˆF2 ˜k...

work page 1981

[10] [10]

since it can be written as follows. TCI = 1 n6n′6 nX i1,...,i6=1 n′ X q1,...,q6=1 hi1,...,i6,q1,...,q6 whereh i1,...,i6,q1,...,q6 is a symmetric function such that hi1,...,i6,q1,...,q6 := 1 6!6! (i1,..,i6)X (j1,...,j6) (q1,..,q6)X (r1,...,r6) ⟨φj1,...,j3,r1,...,r3 , φj4,...,j6,r4,...,r6 ⟩ Y ushi Hirose, Akito Narahara, T akafumi Kanamori and φj1,...,j3,r1...

work page 2007

[11] [11]

− µX1|XS(x′ S)⟩ and ˜k2S(x2S, x′ 2S) = ⟨φ2(x2) −µ X2|XS(xS), φ2(x′

work page

[12] [12]

−µ X2|XS(x′ S)⟩. By Mercer’s theorem, these kernels can be expanded ˜k1S(x1S, x′ 1S) = ∞X r=1 λ1,r(ϕ1,r(x1)−E F [ϕ1,r(x1)|xS])(ϕ1,r(x′ 1)−E F [ϕ1,r(x1)|x′ S]) = ∞X r=1 λ1,r ˜ϕ1,r(x1S)˜ϕ1,r(x′ 1S), ˜k2S(x2S, x′ 2S) = ∞X r=1 λ2,r(ϕ2,r(x2)−E F [ϕ2,r(x2)|xS])(ϕ2,r(x′ 2)−E F [ϕ2,r(x2)|x′ S]) = ∞X r=1 λ2,r ˜ϕ2,r(x2S)˜ϕ2,r(x′ 2S), kS(xS, x′ S) = ∞X r=1 λS,rϕS,r(...

work page 2007

[13] [13]

˜K1S ∈R M×M and ˜K2S ∈R M×M are the Gram matrices associated with ˜k1S and ˜k2S, defined by( ˜K1S)ij = ˜k1S(vx1S ,i,v x1S ,j)and( ˜K2S)ij = ˜k2S(vx2S ,i,v x2S ,j)

−µ X1|XS(x′ S)⟩ and ˜k2S(x2S, x′ 2S) = ⟨φ2(x2)−µ X2|XS(xS), φ2(x′ 2)−µ X2|XS(x′ S)⟩,we can computeTM CI as Y ushi Hirose, Akito Narahara, T akafumi Kanamori TM CI = tr(( ˜K1S ⊙K S)Dα∗ ˜K2SDα∗) where ⊙ denotes the Hadamard product. ˜K1S ∈R M×M and ˜K2S ∈R M×M are the Gram matrices associated with ˜k1S and ˜k2S, defined by( ˜K1S)ij = ˜k1S(vx1S ,i,v x1S ,j)a...

work page 2011

[14] [14]

We first selected a candidate set of features Xi that were discriminative, satisfying |E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 0.5, since a significant mean difference is es- sential for the efficient MPE

work page

[15] [15]

We then applied the HSIC test to all pairs of features from this candidate set to identify those satisfying the CI condition, with a significance level 0.05

work page

[16] [16]

For the MPE task, we setn = n′ = 2000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1,0.5)

For each detected CI feature pair, we ran our CI MPE method 10 times. For the MPE task, we setn = n′ = 2000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1,0.5). D.2.3 MCI MPE with synthetic data We used a regularization parameterλ = 5 × 10−4 and a Gaussian kernel with bandwidthσ = 3.5for all MCI MPE experiments. The search ranges ...

work page

[17] [17]

In the search, we constructed a candidate set of features that satisfies|E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 1, similarly to D.2.2

We searched for feature triplets(X1, X2, XS)satisfying the MCI condition in the negative class by applying the KCI test (Zhang et al., 2011) to all possible triplets with a significance level 0.05. In the search, we constructed a candidate set of features that satisfies|E[X i |Y= 1]−E[X i |Y=−1]|/ p V[X i |Y= 1]> 1, similarly to D.2.2. Then we only used f...

work page 2011

[18] [18]

For the MPE task, we setn = n′ = 1000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1, 0.5)

For each detected triplet, we ran our MCI MPE method 5 times and evaluated the estimation error forθ′. For the MPE task, we setn = n′ = 1000and used a Positive-Unlabeled (PU) setting with classpriors (θ, θ′) = (1, 0.5). We used a Gaussian kernel with bandwidthσ = 1.0for KRR, set the regularization parameter toλ = 10−3 and the search rangeIα− = [−1.25,− 0....

work page