Fractionally Supervised Classification with Maxima Nominated Samples

Jingyu Wang; Mohammad Jafari Jozani

arxiv: 2604.25145 · v1 · submitted 2026-04-28 · 📊 stat.ME · cs.LG· stat.ML

Fractionally Supervised Classification with Maxima Nominated Samples

Mohammad Jafari Jozani , Jingyu Wang This is my paper

Pith reviewed 2026-05-07 15:40 UTC · model grok-4.3

classification 📊 stat.ME cs.LGstat.ML

keywords fractionally supervised classificationmaxima nomination samplingEM algorithmweighted likelihoodmixture modelsorder statisticsrare event detectionsemi-supervised learning

0 comments

The pith

A latent representation of the nominated set enables a valid EM algorithm for fractionally supervised classification under maxima nomination sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fractionally supervised classification combines labeled and unlabeled data in mixture models, but existing versions assume simple random sampling. When each observation is instead the maximum from a small set, as occurs in screening rare events or reliability tests, the likelihood changes and the usual EM construction breaks down. The authors introduce a latent representation that tracks both the class of the observed maximum and the unknown composition of the remaining units in the set. This produces a proper EM algorithm together with a weighted-likelihood procedure that respects the sampling design. Simulations on rare-event normal mixtures show clear gains over methods that ignore the nomination structure, and a real-data example illustrates practical performance.

Core claim

We develop FSC for nominated samples by introducing a latent representation that accounts for both the class membership of the observed maximum and the latent composition of the remaining units in the set. The resulting method yields a proper EM algorithm and a coherent weighted-likelihood FSC procedure for NS data.

What carries the argument

Latent representation that models the class membership of the observed maximum together with the latent composition of the remaining units in each nominated set.

If this is right

The method supplies a valid EM algorithm for parameter estimation when data arise from maxima nomination sampling.
It produces a coherent weighted-likelihood version of fractionally supervised classification that incorporates both labeled and unlabeled nominated observations.
Simulations demonstrate substantial accuracy gains relative to the misspecified procedure that treats the data as if they came from simple random sampling.
A real-data analysis confirms the procedure can be applied successfully to rare-event classification problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent-construction idea could be adapted to other order-statistic sampling designs beyond maxima.
Optimal choice of set size in nomination sampling might be informed by the information gain captured in the latent model.
The framework could be combined with existing techniques for biased sampling in semi-supervised settings.

Load-bearing premise

The latent representation correctly models the conditional distribution of the unobserved units given the observed maximum and the class labels under the maxima nomination sampling mechanism.

What would settle it

Generate data from a known mixture model under the maxima nomination mechanism, then check whether the proposed EM algorithm recovers the true parameters; systematic failure to recover them would show the latent representation is misspecified.

Figures

Figures reproduced from arXiv: 2604.25145 by Jingyu Wang, Mohammad Jafari Jozani.

**Figure 1.** Figure 1: Log-likelihoods ℓC(π) (correct, solid blue) and ℓW(π) (misspecified, dashed red) as functions of π with (µ2, σ2) fixed at truth. Components are f1 = N(0, 1) and f2 = N(3.5, 1.2 2 ), with k = 3 and n = 20 NS observations generated with set.seed(2025). Both curves are shifted so their maximum is at zero. The correct MLE ˆπC = 0.54 is close to the truth π0 = 0.40 (green dotted line); the misspecified MLE ˆπW … view at source ↗

**Figure 3.** Figure 3: Performance versus w3 at ρ = 0.85, ε = 0.05, δ = 4, τ = 1.5, and n3 = 200, for set sizes k ∈ {2, 3, 5, 8}. Panel (a) shows average ARI. FSC–NS (solid, circles) remains close to 0.83 across all k and weights, whereas FSC–SRS (dashed, squares) deteriorates sharply for k ≥ 3 as w3 increases, with ARI approaching zero for k = 5 and k = 8 even at moderate weights. Panel (b) shows sensitivity for the rare class.… view at source ↗

**Figure 5.** Figure 5: Average ARI versus w3 at k = 3, n3 = 200, averaged over all (ε, δ, τ ) combinations. Left: FSC-NS; right: FSC-SRS. Curves correspond to ranking levels ρ ∈ {1, 0.85, 0.60}. ARI∈ [−1, 1]; higher is better. The three curves in the left panel nearly coincide, confirming that FSC-NS is robust to ranking quality. FSC-SRS deteriorates with w3 most severely under ρ = 1 view at source ↗

**Figure 6.** Figure 6: Average rare-class F1 score versus w3 under the same averaging scheme as view at source ↗

**Figure 7.** Figure 7: Average RMSE of ˆε versus w3 at k = 3, n3 = 200. Lower is better. FSC-NS stays near zero across w3 and ρ. FSC-SRS grows with w3, most severely under ρ = 1 view at source ↗

read the original abstract

Fractionally supervised classification (FSC) offers a flexible framework for combining labeled and unlabeled data in model-based classification, but existing formulations assume simple random sampling. In many applications, however, the retained observation is an extreme order statistic from a set rather than a randomly selected unit. This is particularly appealing when the target population is rare, since maxima nomination sampling (NS) can enrich the sample with the most informative observations, as in screening, environmental monitoring, repeated testing, and reliability studies. Under such designs, the likelihood function changes fundamentally, and the usual FSC EM construction is no longer valid. We develop FSC for nominated samples by introducing a latent representation that accounts for both the class membership of the observed maximum and the latent composition of the remaining units in the set. The resulting method yields a proper EM algorithm and a coherent weighted-likelihood FSC procedure for NS data. We present the methodology in general form, illustrate it for a rare-event contamination normal mixtures, and show through simulation that it substantially improves on the misspecified alternative by ignoring the extra rank information of such data. A real-data analysis demonstrates its practical value.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adapts fractionally supervised classification to maxima nominated sampling via a latent model for set composition and delivers a working EM algorithm that improves on the naive alternative in simulations.

read the letter

The main contribution is a latent-variable construction that lets them write a proper likelihood for FSC when each observation is the maximum from a nominated set rather than a random draw. This produces a coherent EM algorithm and weighted-likelihood procedure instead of the standard one that would be misspecified under this design. They give the method in general form, work it out for normal mixtures in a rare-event contamination setting, run simulations that show gains from using the rank information, and include a real-data example to illustrate practical use. That is useful work for anyone who already knows FSC and needs to handle non-random sampling common in screening or reliability studies. The simulations appear to be the main evidence that the adjustment matters, and they are presented as showing clear improvement over ignoring the nomination mechanism. The soft spot is the latent representation itself. The E-step expectations will only be correct if the model exactly encodes that every unobserved unit in the set is smaller than the observed maximum, conditional on the class labels and the sampling rule. The abstract says they account for the latent composition, but without the explicit conditional distributions or derivation steps it is not obvious whether the dependence is fully enforced or approximated. Minor gaps like missing error bars or sensitivity checks on the mixture parameters would also be easy to fix but are not mentioned. This is for methodologists working on semi-supervised classification with order-statistic sampling designs. It is coherent enough on its own terms to deserve a serious referee who can check the latent construction and the simulation details.

Referee Report

3 major / 2 minor

Summary. The paper develops an extension of fractionally supervised classification (FSC) to data collected via maxima nomination sampling (NS), where each observation is the maximum from a set of units. By introducing a latent representation that captures the class membership of the observed maximum and the latent composition of the remaining units in each set, the authors derive a proper EM algorithm and a weighted-likelihood procedure for FSC under this sampling design. The method is illustrated for normal mixture models in a rare-event contamination setting, with simulations demonstrating improved performance over methods that ignore the NS mechanism, and a real-data analysis is provided.

Significance. If the proposed latent representation accurately reflects the conditional distributions induced by the maxima nomination mechanism, this work would offer a significant methodological advance for model-based classification in applications involving extreme value sampling, such as environmental monitoring and reliability studies. It addresses a gap in FSC by adapting it to non-i.i.d. sampling schemes common in practice, potentially leading to more accurate classification for rare events.

major comments (3)

[Section 3 (latent representation and EM derivation)] The central claim rests on a latent representation for the class of the observed maximum and the composition of remaining units that yields a proper EM algorithm. However, the manuscript does not demonstrate that this representation correctly encodes the conditional distribution of the unobserved units given the observed maximum (i.e., enforcing that all other units are smaller than the observed max under the NS mechanism). Any mismatch here would render the E-step expectations incorrect and the weighted-likelihood procedure inconsistent.
[Section 3 and Section 4] No explicit form is given for the complete-data likelihood, the observed-data likelihood under NS, or the E-step conditional expectations (e.g., posterior probabilities for class labels and latent compositions). Without these, it is impossible to verify that the procedure is a 'proper EM algorithm' as claimed in the abstract and Section 3.
[Section 5 (simulations)] The simulation study (Section 5) reports substantial improvement over the misspecified alternative, but provides no information on the number of Monte Carlo replications, standard errors or confidence intervals for the reported metrics (e.g., classification error rates), or sensitivity to the set size in the NS design. This leaves the evidence for improvement unquantified and potentially sensitive to design choices.

minor comments (2)

[Abstract] The abstract refers to 'a rare-event contamination normal mixtures' without specifying the mixture parameters, contamination proportion, or how the NS sets are generated; this detail should be added for reproducibility.
[Section 2 and Section 3] Notation for the latent variables (class indicators for the max and composition variables for remaining units) is introduced without a clear table or summary of their joint distribution and the order constraints they must satisfy.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful review and constructive comments, which have helped strengthen the presentation of our work. We address each major comment in turn below, with revisions made to improve clarity and completeness where needed.

read point-by-point responses

Referee: [Section 3 (latent representation and EM derivation)] The central claim rests on a latent representation for the class of the observed maximum and the composition of remaining units that yields a proper EM algorithm. However, the manuscript does not demonstrate that this representation correctly encodes the conditional distribution of the unobserved units given the observed maximum (i.e., enforcing that all other units are smaller than the observed max under the NS mechanism). Any mismatch here would render the E-step expectations incorrect and the weighted-likelihood procedure inconsistent.

Authors: We appreciate the referee's emphasis on this foundational aspect. The latent representation in Section 3 is constructed precisely to respect the maxima nomination mechanism: the observed value is modeled as the maximum from the set, with the remaining units' values drawn conditionally from the distribution truncated below the observed maximum, and class memberships assigned via the latent composition counts. This ensures the conditional distributions match those induced by NS. To address the concern directly, we have added an explicit verification (new Proposition 1 in the revised Section 3) showing that the joint distribution over the observed maximum and latent composition reproduces the required truncation and ordering constraints. revision: yes
Referee: [Section 3 and Section 4] No explicit form is given for the complete-data likelihood, the observed-data likelihood under NS, or the E-step conditional expectations (e.g., posterior probabilities for class labels and latent compositions). Without these, it is impossible to verify that the procedure is a 'proper EM algorithm' as claimed in the abstract and Section 3.

Authors: We agree that the explicit expressions are essential for verification. In the revised manuscript we now state the complete-data likelihood (Equation 3.4), the observed-data likelihood under the NS design (Equation 3.5), and the closed-form E-step expectations for both the class indicators and the latent set compositions (Equations 3.6–3.8). These additions confirm that the algorithm is a standard EM procedure applied to the augmented complete-data model. revision: yes
Referee: [Section 5 (simulations)] The simulation study (Section 5) reports substantial improvement over the misspecified alternative, but provides no information on the number of Monte Carlo replications, standard errors or confidence intervals for the reported metrics (e.g., classification error rates), or sensitivity to the set size in the NS design. This leaves the evidence for improvement unquantified and potentially sensitive to design choices.

Authors: We thank the referee for this observation on reporting standards. The original simulations used 500 Monte Carlo replications; we have now added this information, together with standard errors and 95% confidence intervals for all performance metrics in the revised tables of Section 5. We have also included a new sensitivity study examining performance across set sizes 2, 5, and 10, confirming that the reported gains remain stable. revision: yes

Circularity Check

0 steps flagged

No circularity: new latent representation and EM construction introduced independently for NS data

full rationale

The paper extends FSC to maxima nominated samples by defining a new latent representation that encodes class membership of the observed maximum together with the composition of the remaining units. This representation is used to construct a valid EM algorithm and weighted-likelihood procedure. No equations, claims, or self-citations in the provided text reduce the new procedure to a fitted quantity defined by the same data or to a prior result by construction. The derivation is presented as self-contained and independent of the target result, consistent with the default expectation for non-circular methodological extensions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard mixture-model assumptions for the data-generating process and on the correctness of the newly introduced latent representation for the nomination design; no ad-hoc free parameters or invented entities are described in the abstract.

axioms (2)

domain assumption Observations arise from a finite mixture of distributions (illustrated with normal components for rare-event contamination).
Invoked when the methodology is illustrated for normal mixtures.
domain assumption The maxima nomination mechanism selects the largest value from each independent set of fixed size.
Fundamental to the change in likelihood that motivates the new latent representation.

pith-pipeline@v0.9.0 · 5495 in / 1335 out tokens · 63511 ms · 2026-05-07T15:40:17.281356+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 2 canonical work pages

[1]

B., and Raftery, A

Bouveyron, C., Celeux, G., Murphy, T. B., and Raftery, A. E. (2019).Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press

2019
[2]

and Cover, T

Castelli, V. and Cover, T. L. (1996). The relative value of labeled and unlabeled samples in pattern 23 recognition with an unknown mixing parameter.IEEE Transactions on Information Theory,42(6), 2102–2117

1996
[3]

Cozman, F. G. and Cohen, I. (2002). Unlabeled data can degrade classification performance of gener- ative classifiers. InProceedings of the Fifteenth International Florida Artificial Intelligence Research Society Conference, pages 327–331. AAAI Press

2002
[4]

B., and Downey, G

Dean, N., Murphy, T. B., and Downey, G. (2006). Using unlabelled data to update classification rules with applications in food authenticity studies.Journal of the Royal Statistical Society: Series C (Applied Statistics),55(1), 1–14

2006
[5]

Dell, T. R. and Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics,28(2), 545–555

1972
[6]

P., Laird, N

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm.Journal of the Royal Statistical Society: Series B (Methodological),39(1), 1–38

1977
[7]

Gallaugher, M. P. B. and McNicholas, P. D. (2019). On fractionally-supervised classification: Weight selection and extension to the multivariatet-distribution.Journal of Classification,36(2), 232–265

2019
[8]

Hanley, J. A. and McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve.Radiology,143(1), 29–36

1982
[9]

Hatefi, A., Jafari Jozani, M., and Ziou, D. (2014). Estimation and classification for finite mixture models under ranked set sampling.Statistica Sinica,24, 675–698

2014
[10]

Hatefi, A., Reid, N., Jafari Jozani, M., and Ozturk, O. (2020). Finite mixture modeling, classification and statistical learning with order statistics.Statistica Sinica,30(4), 1881–1903

2020
[11]

and Garcia, E

He, H. and Garcia, E. A. (2009). Learning from imbalanced data.IEEE Transactions on Knowledge and Data Engineering,21(9), 1263–1284

2009
[12]

and Arabie, P

Hubert, L. and Arabie, P. (1985). Comparing partitions.Journal of Classification,2(1), 193–218. Jafari Jozani, M. and Johnson, B.C. (2012). Randomized nomination sampling for finite populations. Journal of Statistical Planning and Inference,142(7), 2103-2115

1985
[13]

and Zeng, L

King, G. and Zeng, L. (2001). Logistic regression in rare events data.Political Analysis,9(2), 137–163

2001
[14]

L., Street, W

Mangasarian, O. L., Street, W. N., and Wolberg, W. H. (1995). Breast cancer diagnosis and prognosis via linear programming.Operations Research,43(4), 570–577

1995
[15]

Pepe, M. S. (2003).The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford

2003
[16]

Steinley, D. (2004). Properties of the Hubert–Arabie adjusted Rand index.Psychological Methods, 9(3), 386–396

2004
[17]

and McNicholas, P

Vrbik, I. and McNicholas, P. D. (2015). Fractionally-supervised classification.Journal of Classification, 32(3), 359–381. 24

2015
[18]

(2021).On Fractionally-Supervised Classification with Nominated Samples

Wang, J. (2021).On Fractionally-Supervised Classification with Nominated Samples. M.Sc. thesis, Uni- versity of Manitoba, Winnipeg, Canada. Available athttps://mspace.lib.umanitoba.ca/items/ 7fb2d0d0-62d5-4521-b43d-af9f2c285afe. Wang J., Li F., Li J., Hou C., Qian Y., Liang J. (2025). RSS-Bagging: Improving Generalization Through the Fisher Information of...

work page doi:10.1109/tnnls.2023.3270559 2021
[19]

Willemain, T. R. (1980). Estimating the population median by nomination sampling.Journal of the American Statistical Association,75(372), 908–911

1980
[20]

Wolberg, W., Mangasarian, O., Street, N., and Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset].UCI Machine Learning Repository. doi:10.24432/C5DW2B. 25

work page doi:10.24432/c5dw2b 1993

[1] [1]

B., and Raftery, A

Bouveyron, C., Celeux, G., Murphy, T. B., and Raftery, A. E. (2019).Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press

2019

[2] [2]

and Cover, T

Castelli, V. and Cover, T. L. (1996). The relative value of labeled and unlabeled samples in pattern 23 recognition with an unknown mixing parameter.IEEE Transactions on Information Theory,42(6), 2102–2117

1996

[3] [3]

Cozman, F. G. and Cohen, I. (2002). Unlabeled data can degrade classification performance of gener- ative classifiers. InProceedings of the Fifteenth International Florida Artificial Intelligence Research Society Conference, pages 327–331. AAAI Press

2002

[4] [4]

B., and Downey, G

Dean, N., Murphy, T. B., and Downey, G. (2006). Using unlabelled data to update classification rules with applications in food authenticity studies.Journal of the Royal Statistical Society: Series C (Applied Statistics),55(1), 1–14

2006

[5] [5]

Dell, T. R. and Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics,28(2), 545–555

1972

[6] [6]

P., Laird, N

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm.Journal of the Royal Statistical Society: Series B (Methodological),39(1), 1–38

1977

[7] [7]

Gallaugher, M. P. B. and McNicholas, P. D. (2019). On fractionally-supervised classification: Weight selection and extension to the multivariatet-distribution.Journal of Classification,36(2), 232–265

2019

[8] [8]

Hanley, J. A. and McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve.Radiology,143(1), 29–36

1982

[9] [9]

Hatefi, A., Jafari Jozani, M., and Ziou, D. (2014). Estimation and classification for finite mixture models under ranked set sampling.Statistica Sinica,24, 675–698

2014

[10] [10]

Hatefi, A., Reid, N., Jafari Jozani, M., and Ozturk, O. (2020). Finite mixture modeling, classification and statistical learning with order statistics.Statistica Sinica,30(4), 1881–1903

2020

[11] [11]

and Garcia, E

He, H. and Garcia, E. A. (2009). Learning from imbalanced data.IEEE Transactions on Knowledge and Data Engineering,21(9), 1263–1284

2009

[12] [12]

and Arabie, P

Hubert, L. and Arabie, P. (1985). Comparing partitions.Journal of Classification,2(1), 193–218. Jafari Jozani, M. and Johnson, B.C. (2012). Randomized nomination sampling for finite populations. Journal of Statistical Planning and Inference,142(7), 2103-2115

1985

[13] [13]

and Zeng, L

King, G. and Zeng, L. (2001). Logistic regression in rare events data.Political Analysis,9(2), 137–163

2001

[14] [14]

L., Street, W

Mangasarian, O. L., Street, W. N., and Wolberg, W. H. (1995). Breast cancer diagnosis and prognosis via linear programming.Operations Research,43(4), 570–577

1995

[15] [15]

Pepe, M. S. (2003).The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford

2003

[16] [16]

Steinley, D. (2004). Properties of the Hubert–Arabie adjusted Rand index.Psychological Methods, 9(3), 386–396

2004

[17] [17]

and McNicholas, P

Vrbik, I. and McNicholas, P. D. (2015). Fractionally-supervised classification.Journal of Classification, 32(3), 359–381. 24

2015

[18] [18]

(2021).On Fractionally-Supervised Classification with Nominated Samples

Wang, J. (2021).On Fractionally-Supervised Classification with Nominated Samples. M.Sc. thesis, Uni- versity of Manitoba, Winnipeg, Canada. Available athttps://mspace.lib.umanitoba.ca/items/ 7fb2d0d0-62d5-4521-b43d-af9f2c285afe. Wang J., Li F., Li J., Hou C., Qian Y., Liang J. (2025). RSS-Bagging: Improving Generalization Through the Fisher Information of...

work page doi:10.1109/tnnls.2023.3270559 2021

[19] [19]

Willemain, T. R. (1980). Estimating the population median by nomination sampling.Journal of the American Statistical Association,75(372), 908–911

1980

[20] [20]

Wolberg, W., Mangasarian, O., Street, N., and Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset].UCI Machine Learning Repository. doi:10.24432/C5DW2B. 25

work page doi:10.24432/c5dw2b 1993