Turtle shell clustering: A mixture approach to discriminative clustering with applications to flow cytometry and other data

Arthur White; Mackenzie R. Neal; Paul D. McNicholas

arxiv: 2604.23083 · v1 · submitted 2026-04-25 · 📊 stat.ML · cs.LG· stat.ME

Turtle shell clustering: A mixture approach to discriminative clustering with applications to flow cytometry and other data

Mackenzie R. Neal , Paul D. McNicholas , Arthur White This is my paper

Pith reviewed 2026-05-08 07:28 UTC · model grok-4.3

classification 📊 stat.ML cs.LGstat.ME

keywords clusteringdiscriminative clusteringmixture modelsmutual informationflow cytometryunsupervised learningGaussian mixturescomponent selection

0 comments

The pith

A mixture of Gaussians and uniform distributions under a regularized mutual information objective draws non-linear cluster boundaries and selects the number of groups without labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a clustering procedure that combines generative ideas about cluster geometry with discriminative ideas about boundaries between groups. It optimizes a regularized mutual information objective using a mixture model that places Gaussian components on cluster interiors and uniform components on noise or background regions. The regularization term plus an explicit merge step lets the algorithm decide how many components to keep, producing a fully unsupervised procedure. Tests on simulated data and flow cytometry experiments show the method recovering intuitive groupings even when clusters are irregular or contaminated by noise.

Core claim

The turtle shell method is a probabilistic discriminative clustering procedure based on a regularized mutual information objective function. It employs a mixture of mixtures of Gaussian and uniform distributions to model the conditional distribution, enabling the estimation of non-linear boundaries. Automatic component selection is achieved through the regularizing term and a merge step analogous to reversible jump MCMC techniques.

What carries the argument

The regularized mutual information objective function applied to a mixture model of Gaussian and uniform distributions, together with a merge step for automatic component number selection.

If this is right

The method estimates non-linear decision boundaries between clusters without any supervision.
The regularizer and merge step together determine the number of clusters automatically.
Clusters with irregular shapes or embedded noise are still recovered as intuitive groups.
The approach applies directly to flow cytometry data to separate distinct cell populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same regularized objective could be paired with other base distributions to handle different types of noise beyond uniforms.
The uniform components offer a built-in mechanism for identifying points that do not belong to any cluster.
Embedding the method inside a dimensionality-reduction pipeline might extend its use to higher-dimensional biological or image datasets.

Load-bearing premise

The regularized mutual information objective combined with a mixture of Gaussian and uniform distributions will produce meaningful, non-linear boundaries and automatic component selection in a fully unsupervised setting without requiring labeled data.

What would settle it

A dataset with known ground-truth clusters of highly irregular non-convex shapes where the method either selects the wrong number of components or fails to recover the true assignments better than a standard Gaussian mixture model.

Figures

Figures reproduced from arXiv: 2604.23083 by Arthur White, Mackenzie R. Neal, Paul D. McNicholas.

**Figure 1.** Figure 1: Clusters from the EM estimation of a GMM when (a) the BIC is used to select view at source ↗

**Figure 2.** Figure 2: Histogram of data generated from a mixture of Gaussian and uniform distributions. view at source ↗

**Figure 3.** Figure 3: Example of RIM clustering results when a multi-logit is assumed. view at source ↗

**Figure 4.** Figure 4: Estimated number of clusters for each initialization method. view at source ↗

**Figure 5.** Figure 5: An example result on a simulated dataset from Section 3.3. view at source ↗

**Figure 6.** Figure 6: An example result from each tested method on a simulated dataset from the cross view at source ↗

**Figure 7.** Figure 7: An example result from each tested method on a simulated dataset from the view at source ↗

**Figure 8.** Figure 8: An example results from each tested method on a simulated dataset from the view at source ↗

**Figure 9.** Figure 9: ARI values obtained from each method on each benchmark clustering dataset view at source ↗

**Figure 10.** Figure 10: ARI values obtained from each method on each flow cytometry dataset under view at source ↗

read the original abstract

Generative approaches to clustering provide information on geometric properties of clusters, whereas discriminative approaches provide boundaries between clusters. Ideas from both approaches are incorporated to present a fully unsupervised, probabilistic, and discriminative clustering method via a regularized mutual information objective function, wherein a mixture of mixtures of Gaussian and uniform distributions is used for formulation of the conditional model. Automatic selection of the number of components is established with the introduction of the regularizing term and a merge step, similar to those applied in reversible jump Markov chain Monte Carlo methods used in Bayesian clustering. Consequently, the turtle shell method -- a fully unsupervised clustering method capable of estimating non-linear boundary lines, automatically selecting the number of components, and capturing intuitive clusters in the presence of data abnormalities such as noise and/or irregular cluster shapes -- is introduced. We test this method on various simulated and real datasets commonly explored in clustering research, and extend the analysis to datasets arising from flow cytometry experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Turtle shell clustering blends a Gaussian-uniform mixture with regularized mutual information and a merge step for automatic unsupervised clustering, but the abstract leaves the core derivations and validation thin.

read the letter

The main contribution is a fully unsupervised method that mixes generative modeling (Gaussians plus uniforms to capture shape and noise) with a discriminative regularized mutual information objective, plus a merge step to choose the number of components without fixing it in advance. This targets the practical issue in flow cytometry where clusters are often irregular or contaminated by noise, and the uniform component plus regularization is a reasonable way to let the model absorb outliers while still drawing boundaries. The merge step, modeled after reversible jump ideas, is presented as enabling automatic selection, which is a useful direction if it works without heavy tuning. What the paper does well is frame the problem clearly for applied domains and test on both simulated cases and real cytometry data, showing the method can produce intuitive groupings where standard approaches might fail on non-convex shapes. The combination itself is not revolutionary but the specific integration looks like honest incremental progress within mixture-based clustering. The soft spots are the missing details on how the regularization actually produces non-linear boundaries without circular dependence on the fitted components, and whether the single free regularization parameter stays stable across datasets or requires case-by-case adjustment. The abstract mentions experiments but gives no quantitative comparisons or sensitivity checks, so it is hard to judge whether gains over existing GMM variants or other discriminative clusterers are meaningful. This paper is aimed at statisticians and bioinformaticians who work with noisy, high-dimensional cytometry or similar data and want a probabilistic tool that avoids pre-specifying cluster count. A reader focused on mixture models would find the application section useful even if the math needs expansion. It deserves peer review because the idea is coherent and the target application is concrete; referees can check the derivations, the merge implementation, and the empirical results.

Referee Report

3 major / 3 minor

Summary. The paper introduces the turtle shell clustering method as a fully unsupervised probabilistic discriminative clustering approach. It formulates the problem via a regularized mutual information objective that employs a mixture-of-mixtures model consisting of Gaussian and uniform distributions to capture cluster geometry and boundaries. A regularizing term together with a merge step modeled after reversible-jump MCMC is used to achieve automatic selection of the number of components. The method is claimed to recover non-linear boundaries and to remain robust to noise and irregular shapes; it is evaluated on simulated data, standard clustering benchmarks, and flow-cytometry datasets.

Significance. If the central claims are substantiated, the work offers a principled bridge between generative mixture modeling and discriminative boundary estimation within a single unsupervised objective. Automatic component selection without labeled data or exhaustive hyper-parameter search would be a practical advance for noisy, high-dimensional applications such as flow cytometry. The explicit use of uniform components to model background or outliers is a concrete modeling choice that could generalize to other domains with contamination.

major comments (3)

[§3.2] §3.2, Eq. (7): the regularized mutual-information objective contains a free regularization parameter λ whose selection procedure is not fully specified; the text states that λ is 'chosen once' yet provides neither a data-driven rule nor a sensitivity analysis showing that downstream cluster count and boundaries remain stable across a plausible range of λ.
[§4.3] §4.3, Algorithm 1 (merge step): the acceptance probability for the merge operation is stated to be 'analogous to reversible-jump MCMC' but the precise Metropolis-Hastings ratio, proposal distribution, and Jacobian term are not derived; without these quantities it is impossible to verify that the merge step yields a consistent estimator of the number of components rather than an ad-hoc post-processing rule.
[Table 2] Table 2 and Figure 4: the reported ARI and NMI values on the flow-cytometry data are given for a single run; no standard errors across random initializations or cross-validation folds are supplied, making it difficult to assess whether the apparent superiority over k-means and GMM is statistically reliable.

minor comments (3)

[§2.1] Notation for the uniform component density is introduced in §2.1 but never given an explicit functional form; adding the support and normalization constant would remove ambiguity.
[Figure 3] The caption of Figure 3 does not indicate the value of λ used for the displayed partition; this information should be added for reproducibility.
[§3.1] Several references to 'mutual information' in §3.1 omit the base of the logarithm; consistency with the information-theoretic literature would be improved by stating whether nats or bits are used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive comments on our paper. We respond to each major comment in turn and indicate the revisions we plan to make to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2] §3.2, Eq. (7): the regularized mutual-information objective contains a free regularization parameter λ whose selection procedure is not fully specified; the text states that λ is 'chosen once' yet provides neither a data-driven rule nor a sensitivity analysis showing that downstream cluster count and boundaries remain stable across a plausible range of λ.

Authors: We agree that the selection procedure for the regularization parameter λ requires more detail. In the revised manuscript, we will specify a data-driven rule for choosing λ, such as selecting the value that maximizes the objective on a held-out subset or a default based on data dimensionality, and include a sensitivity analysis demonstrating that the cluster count and boundaries are stable over a range of λ values. revision: yes
Referee: [§4.3] §4.3, Algorithm 1 (merge step): the acceptance probability for the merge operation is stated to be 'analogous to reversible-jump MCMC' but the precise Metropolis-Hastings ratio, proposal distribution, and Jacobian term are not derived; without these quantities it is impossible to verify that the merge step yields a consistent estimator of the number of components rather than an ad-hoc post-processing rule.

Authors: The merge step is a deterministic post-processing rule applied after the main optimization to automatically select the number of components by merging those that do not improve the objective. It is inspired by but not a direct implementation of reversible-jump MCMC. We will revise the manuscript to remove the MCMC analogy, explicitly state that the acceptance is based on whether the regularized mutual information increases after the merge, and clarify that this is a heuristic procedure rather than a theoretically consistent MCMC estimator. revision: yes
Referee: [Table 2] Table 2 and Figure 4: the reported ARI and NMI values on the flow-cytometry data are given for a single run; no standard errors across random initializations or cross-validation folds are supplied, making it difficult to assess whether the apparent superiority over k-means and GMM is statistically reliable.

Authors: We acknowledge the need for measures of variability. In the revised version, we will repeat the flow-cytometry experiments across multiple random initializations (e.g., 20 runs) and report the mean ARI and NMI along with standard errors in Table 2. Error bars will be added to Figure 4 to reflect this variability. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper defines a new unsupervised clustering procedure by combining a regularized mutual information objective with a mixture-of-mixtures model (Gaussians plus uniforms) and an explicit merge step for component selection. These elements are introduced as part of the method construction itself rather than derived from or fitted to a target quantity that is then re-labeled as a prediction. No load-bearing self-citation chains, self-definitional loops, or renamings of known results appear in the abstract or high-level description; the merge step is presented as an algorithmic addition analogous to existing RJMCMC techniques but not claimed to be forced by prior author work. The derivation therefore remains self-contained and independent of its own outputs.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the modeling choice that data clusters can be represented as mixtures of Gaussians plus uniforms and that regularization plus merging suffices for automatic unsupervised selection; these are domain assumptions without independent evidence supplied in the abstract.

free parameters (1)

regularization parameter
Controls the strength of the regularizing term in the mutual information objective; its value must be chosen or tuned to achieve automatic component selection.

axioms (2)

domain assumption Data can be adequately modeled by a mixture of mixtures of Gaussian and uniform distributions for the conditional model.
Used to formulate the probabilistic discriminative clustering objective.
ad hoc to paper A merge step analogous to reversible jump MCMC will correctly determine the number of components without supervision.
Introduced to enable automatic selection of the number of components.

pith-pipeline@v0.9.0 · 5468 in / 1468 out tokens · 31032 ms · 2026-05-08T07:28:45.090699+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Chattopadhyay, M

Aghaeepour, N., P. Chattopadhyay, M. Chikina, T. Dhaene, S. Van Gassen, M. Kursa, B. N. Lambrecht, M. Malek, G. J. McLachlan, Y. Qian, P. Qiu, Y. Saeys, R. Stanton, D. Tong, C. Vens, S. Walkowiak, K. Wang, G. Finak, R. Gottardo, T. Mosmann, G. P. Nolan, R. H. Scheuermann, and R. R. Brinkman (2016). A benchmark for evaluation of algorithms for identificati...

work page 2016
[2]

Finak, F

Aghaeepour, N., G. Finak, F. Consortium, D. Consortium, H. Hoos, T. R. Mosmann, R. Brinkman, R. Gottardo, and R. H. Scheuermann (2013). Critical assessment of automated flow cytometry data analysis techniques. Nature methods\/ 10\/ (3), 228--238

work page 2013
[3]

Baudry, J.-P. (2015). Estimation and model selection for model-based clustering with the conditional classification likelihood. Electronic Journal of Statistics\/ 9 , 1041--1077

work page 2015
[4]

Celeux, and G

Biernacki, C., G. Celeux, and G. Govaert (2002). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 22\/ (7), 719--725

work page 2002
[5]

D., J.-L

Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment\/ 2008\/ (10), P10008

work page 2008
[6]

Heading, and D

Bridle, J., A. Heading, and D. MacKay (1991). Unsupervised classifiers, mutual information and phantom targets. Advances in neural information processing systems\/ 4

work page 1991
[7]

Browne, R. P., P. D. McNicholas, and M. D. Sparling (2011). Model-based learning using a mixture of mixtures of G aussian and uniform distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 34\/ (4), 814--817

work page 2011
[8]

Byrd, R. H., P. Lu, J. Nocedal, and C. Zhu (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on scientific computing\/ 16\/ (5), 1190--1208

work page 1995
[9]

Cardoso, M. (2013). Wholesale customers . UCI Machine Learning Repository. DOI : https://doi.org/10.24432/C5030X

work page doi:10.24432/c5030x 2013
[10]

Carnell, R. (2024). lhs: Latin Hypercube Samples . R package version 1.2.0

work page 2024
[11]

Niewczas, P

Charytanowicz, M., J. Niewczas, P. Kulczycki, P. Kowalski, and S. Lukasik (2010). Seeds . UCI Machine Learning Repository. DOI : https://doi.org/10.24432/C5H30K

work page doi:10.24432/c5h30k 2010
[12]

Alkhassim, R

Commenges, D., C. Alkhassim, R. Gottardo, B. Hejblum, and R. Thi \'e baut (2018). cytometree: A binary tree algorithm for automatic gating in cytometry analysis. Cytometry Part A\/ 93\/ (11), 1132--1140

work page 2018
[13]

Alkhassim, R

Commenges, D., C. Alkhassim, R. Gottardo, B. P. Hejblum, and Rodolphe Thi\'ebaut (2018). cytometree: a binary tree algorithm for automatic gating in cytometry analysis [software]. Cytometry Part A\/ 93\/ (11), 1132--1140. Describes the R package version 2.0.6

work page 2018
[14]

Nepusz, V

Csárdi, G., T. Nepusz, V. Traag, S. Horvát, F. Zanini, D. Noom, K. Müller, D. Schoch, and M. Salmon (2026). igraph : Network Analysis and Visualization in R . R package version 2.2.1

work page 2026
[15]

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B\/ 39\/ (1), 1--38

work page 1977
[16]

Doherty, U. P., R. M. McLoughlin, and A. White (2025). Challenges and adaptations of model-based clustering for flow and mass cytometry. WIREs Computational Statistics\/ 17\/ (1), e70017

work page 2025
[17]

Grandvalet, Y. and Y. Bengio (2004). Semi-supervised learning by entropy minimization. Advances in neural information processing systems\/ 17

work page 2004
[18]

Hejblum, B. P., C. Alkhassim, R. Gottardo, F. Caron, and R. Thi \'e baut (2019). Sequential Dirichlet process mixtures of multivariate skew t -distributions for model-based clustering of flow cytometry data . The Annals of Applied Statistics\/ 13\/ (1), 638 -- 660

work page 2019
[19]

Hubert, L. and P. Arabie (1985). Comparing partitions. Journal of Classification\/ 2\/ (1), 193--218

work page 1985
[20]

Hung, Y., Y. Wang, V. Zarnitsyna, C. Zhu, and C. J. Wu (2013). Hidden M arkov models with applications in cell adhesion experiments. Journal of the American Statistical Association\/ 108\/ (504), 1469--1479

work page 2013
[21]

Hurley, C. (2025). gclus: Clustering Graphics . R package version 1.3.3

work page 2025
[22]

Pucella, H

Khodadadi-Jamayran, A., J. Pucella, H. Zhou, N. Doudican, J. C. D. driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosisand Adriana Heguy, B. Reizis, and A. Tsirigos (2020). i C ell R : Combined coverage correction and principal component alignment for batch alignment in single-cell sequencing analysis. bioRxiv\/

work page 2020
[23]

Perona, and R

Krause, A., P. Perona, and R. Gomes (2010). Discriminative clustering by regularized information maximization. Advances in neural information processing systems\/ 23

work page 2010
[24]

Levine, J. H., E. F. Simonds, S. C. Bendall, K. L. Davis, D. A. El-ad, M. D. Tadmor, O. Litvin, H. G. Fienberg, A. Jager, E. R. Zunder, et al. (2015). Data-driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosis. Cell\/ 162\/ (1), 184--197

work page 2015
[25]

Liu, X., W. Song, B. Wong, T. Zhang, S. Yu, G. Lin, and X. Ding (2019). A comparison framework and guideline of clustering methods for mass cytometry data. Genome Biology\/ 20\/ (297)

work page 2019
[26]

Lun, A. (2025). bluster: Clustering Algorithms for Bioconductor . R package version 1.18.0

work page 2025
[27]

Marin, D., M. Tang, I. B. Ayed, and Y. Boykov (2017). Kernel clustering: Density biases and solutions. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 41\/ (1), 136--147

work page 2017
[28]

McNicholas, P. D. (2016a). Mixture Model-Based Classification . Boca Raton: Chapman & Hall/CRC Press

work page
[29]

McNicholas, P. D. (2016b). Model-based clustering. Journal of Classification\/ 33\/ (3), 331--373

work page
[30]

Moulavi, D., P. A. Jaskowiak, R. J. Campello, A. Zimek, and J. Sander (2014). Density-based clustering validation. In Proceedings of the 2014 SIAM international conference on data mining , pp.\ 839--847. SIAM

work page 2014
[31]

Mattei, C

Ohl, L., P.-A. Mattei, C. Bouveyron, W. Harchaoui, M. Leclercq, A. Droit, and F. Precioso (2022). Generalised mutual information for discriminative clustering. Advances in Neural Information Processing Systems\/ 35 , 3377--3390

work page 2022
[32]

Qian, Y., C. Wei, F. Eun-Hyung Lee, J. Campbell, J. Halliley, J. A. Lee, J. Cai, Y. M. Kong, E. Sadat, E. Thomson, et al. (2010). Elucidation of seventeen human peripheral blood b-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data...

work page 2010
[33]

R: A Language and Environment for Statistical Computing

R Core Team (2025). R: A Language and Environment for Statistical Computing . Vienna, Austria: R Foundation for Statistical Computing

work page 2025
[34]

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics\/ 6\/ (2), 461--464

work page 1978
[35]

Fraley, T

Scrucca, L., C. Fraley, T. B. Murphy, and A. E. Raftery (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R . Chapman and Hall/CRC

work page 2023
[36]

Tortora, C., R. P. Browne, A. ElSherbiny, B. C. Franczak, and P. D. McNicholas (2021). Model-based clustering, classification, and discriminant analysis using the generalized hyperbolic distribution: MixGHD R package. Journal of Statistical Software\/ 98\/ (3), 1--24

work page 2021
[37]

Callebaut, M

Van Gassen, S., B. Callebaut, M. J. Van Helden, B. N. Lambrecht, P. Demeester, T. Dhaene, and Y. Saeys (2015). Flowsom: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry Part A\/ 87\/ (7), 636--645

work page 2015
[38]

Zhang, Z., K. L. Chan, Y. Wu, and C. Chen (2004). Learning a multivariate gaussian mixture model with the reversible jump mcmc algorithm. Statistics and Computing\/ 14\/ (4), 343--355

work page 2004
[39]

Zhang, Z., C. Chen, J. Sun, and K. L. Chan (2003). EM algorithms for G aussian mixtures with split-and-merge operation. Pattern recognition\/ 36\/ (9), 1973--1983

work page 2003
[40]

Lin, and X

Zou, Y., Y. Lin, and X. Song (2024). Bayesian heterogeneous hidden M arkov models with an unknown number of states. Journal of Computational and Graphical Statistics\/ 33\/ (1), 15--24

work page 2024

[1] [1]

Chattopadhyay, M

Aghaeepour, N., P. Chattopadhyay, M. Chikina, T. Dhaene, S. Van Gassen, M. Kursa, B. N. Lambrecht, M. Malek, G. J. McLachlan, Y. Qian, P. Qiu, Y. Saeys, R. Stanton, D. Tong, C. Vens, S. Walkowiak, K. Wang, G. Finak, R. Gottardo, T. Mosmann, G. P. Nolan, R. H. Scheuermann, and R. R. Brinkman (2016). A benchmark for evaluation of algorithms for identificati...

work page 2016

[2] [2]

Finak, F

Aghaeepour, N., G. Finak, F. Consortium, D. Consortium, H. Hoos, T. R. Mosmann, R. Brinkman, R. Gottardo, and R. H. Scheuermann (2013). Critical assessment of automated flow cytometry data analysis techniques. Nature methods\/ 10\/ (3), 228--238

work page 2013

[3] [3]

Baudry, J.-P. (2015). Estimation and model selection for model-based clustering with the conditional classification likelihood. Electronic Journal of Statistics\/ 9 , 1041--1077

work page 2015

[4] [4]

Celeux, and G

Biernacki, C., G. Celeux, and G. Govaert (2002). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 22\/ (7), 719--725

work page 2002

[5] [5]

D., J.-L

Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment\/ 2008\/ (10), P10008

work page 2008

[6] [6]

Heading, and D

Bridle, J., A. Heading, and D. MacKay (1991). Unsupervised classifiers, mutual information and phantom targets. Advances in neural information processing systems\/ 4

work page 1991

[7] [7]

Browne, R. P., P. D. McNicholas, and M. D. Sparling (2011). Model-based learning using a mixture of mixtures of G aussian and uniform distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 34\/ (4), 814--817

work page 2011

[8] [8]

Byrd, R. H., P. Lu, J. Nocedal, and C. Zhu (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on scientific computing\/ 16\/ (5), 1190--1208

work page 1995

[9] [9]

Cardoso, M. (2013). Wholesale customers . UCI Machine Learning Repository. DOI : https://doi.org/10.24432/C5030X

work page doi:10.24432/c5030x 2013

[10] [10]

Carnell, R. (2024). lhs: Latin Hypercube Samples . R package version 1.2.0

work page 2024

[11] [11]

Niewczas, P

Charytanowicz, M., J. Niewczas, P. Kulczycki, P. Kowalski, and S. Lukasik (2010). Seeds . UCI Machine Learning Repository. DOI : https://doi.org/10.24432/C5H30K

work page doi:10.24432/c5h30k 2010

[12] [12]

Alkhassim, R

Commenges, D., C. Alkhassim, R. Gottardo, B. Hejblum, and R. Thi \'e baut (2018). cytometree: A binary tree algorithm for automatic gating in cytometry analysis. Cytometry Part A\/ 93\/ (11), 1132--1140

work page 2018

[13] [13]

Alkhassim, R

Commenges, D., C. Alkhassim, R. Gottardo, B. P. Hejblum, and Rodolphe Thi\'ebaut (2018). cytometree: a binary tree algorithm for automatic gating in cytometry analysis [software]. Cytometry Part A\/ 93\/ (11), 1132--1140. Describes the R package version 2.0.6

work page 2018

[14] [14]

Nepusz, V

Csárdi, G., T. Nepusz, V. Traag, S. Horvát, F. Zanini, D. Noom, K. Müller, D. Schoch, and M. Salmon (2026). igraph : Network Analysis and Visualization in R . R package version 2.2.1

work page 2026

[15] [15]

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B\/ 39\/ (1), 1--38

work page 1977

[16] [16]

Doherty, U. P., R. M. McLoughlin, and A. White (2025). Challenges and adaptations of model-based clustering for flow and mass cytometry. WIREs Computational Statistics\/ 17\/ (1), e70017

work page 2025

[17] [17]

Grandvalet, Y. and Y. Bengio (2004). Semi-supervised learning by entropy minimization. Advances in neural information processing systems\/ 17

work page 2004

[18] [18]

Hejblum, B. P., C. Alkhassim, R. Gottardo, F. Caron, and R. Thi \'e baut (2019). Sequential Dirichlet process mixtures of multivariate skew t -distributions for model-based clustering of flow cytometry data . The Annals of Applied Statistics\/ 13\/ (1), 638 -- 660

work page 2019

[19] [19]

Hubert, L. and P. Arabie (1985). Comparing partitions. Journal of Classification\/ 2\/ (1), 193--218

work page 1985

[20] [20]

Hung, Y., Y. Wang, V. Zarnitsyna, C. Zhu, and C. J. Wu (2013). Hidden M arkov models with applications in cell adhesion experiments. Journal of the American Statistical Association\/ 108\/ (504), 1469--1479

work page 2013

[21] [21]

Hurley, C. (2025). gclus: Clustering Graphics . R package version 1.3.3

work page 2025

[22] [22]

Pucella, H

Khodadadi-Jamayran, A., J. Pucella, H. Zhou, N. Doudican, J. C. D. driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosisand Adriana Heguy, B. Reizis, and A. Tsirigos (2020). i C ell R : Combined coverage correction and principal component alignment for batch alignment in single-cell sequencing analysis. bioRxiv\/

work page 2020

[23] [23]

Perona, and R

Krause, A., P. Perona, and R. Gomes (2010). Discriminative clustering by regularized information maximization. Advances in neural information processing systems\/ 23

work page 2010

[24] [24]

Levine, J. H., E. F. Simonds, S. C. Bendall, K. L. Davis, D. A. El-ad, M. D. Tadmor, O. Litvin, H. G. Fienberg, A. Jager, E. R. Zunder, et al. (2015). Data-driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosis. Cell\/ 162\/ (1), 184--197

work page 2015

[25] [25]

Liu, X., W. Song, B. Wong, T. Zhang, S. Yu, G. Lin, and X. Ding (2019). A comparison framework and guideline of clustering methods for mass cytometry data. Genome Biology\/ 20\/ (297)

work page 2019

[26] [26]

Lun, A. (2025). bluster: Clustering Algorithms for Bioconductor . R package version 1.18.0

work page 2025

[27] [27]

Marin, D., M. Tang, I. B. Ayed, and Y. Boykov (2017). Kernel clustering: Density biases and solutions. IEEE Transactions on Pattern Analysis and Machine Intelligence\/ 41\/ (1), 136--147

work page 2017

[28] [28]

McNicholas, P. D. (2016a). Mixture Model-Based Classification . Boca Raton: Chapman & Hall/CRC Press

work page

[29] [29]

McNicholas, P. D. (2016b). Model-based clustering. Journal of Classification\/ 33\/ (3), 331--373

work page

[30] [30]

Moulavi, D., P. A. Jaskowiak, R. J. Campello, A. Zimek, and J. Sander (2014). Density-based clustering validation. In Proceedings of the 2014 SIAM international conference on data mining , pp.\ 839--847. SIAM

work page 2014

[31] [31]

Mattei, C

Ohl, L., P.-A. Mattei, C. Bouveyron, W. Harchaoui, M. Leclercq, A. Droit, and F. Precioso (2022). Generalised mutual information for discriminative clustering. Advances in Neural Information Processing Systems\/ 35 , 3377--3390

work page 2022

[32] [32]

Qian, Y., C. Wei, F. Eun-Hyung Lee, J. Campbell, J. Halliley, J. A. Lee, J. Cai, Y. M. Kong, E. Sadat, E. Thomson, et al. (2010). Elucidation of seventeen human peripheral blood b-cell subsets and quantification of the tetanus response using a density-based method for the automated identification of cell populations in multidimensional flow cytometry data...

work page 2010

[33] [33]

R: A Language and Environment for Statistical Computing

R Core Team (2025). R: A Language and Environment for Statistical Computing . Vienna, Austria: R Foundation for Statistical Computing

work page 2025

[34] [34]

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics\/ 6\/ (2), 461--464

work page 1978

[35] [35]

Fraley, T

Scrucca, L., C. Fraley, T. B. Murphy, and A. E. Raftery (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R . Chapman and Hall/CRC

work page 2023

[36] [36]

Tortora, C., R. P. Browne, A. ElSherbiny, B. C. Franczak, and P. D. McNicholas (2021). Model-based clustering, classification, and discriminant analysis using the generalized hyperbolic distribution: MixGHD R package. Journal of Statistical Software\/ 98\/ (3), 1--24

work page 2021

[37] [37]

Callebaut, M

Van Gassen, S., B. Callebaut, M. J. Van Helden, B. N. Lambrecht, P. Demeester, T. Dhaene, and Y. Saeys (2015). Flowsom: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry Part A\/ 87\/ (7), 636--645

work page 2015

[38] [38]

Zhang, Z., K. L. Chan, Y. Wu, and C. Chen (2004). Learning a multivariate gaussian mixture model with the reversible jump mcmc algorithm. Statistics and Computing\/ 14\/ (4), 343--355

work page 2004

[39] [39]

Zhang, Z., C. Chen, J. Sun, and K. L. Chan (2003). EM algorithms for G aussian mixtures with split-and-merge operation. Pattern recognition\/ 36\/ (9), 1973--1983

work page 2003

[40] [40]

Lin, and X

Zou, Y., Y. Lin, and X. Song (2024). Bayesian heterogeneous hidden M arkov models with an unknown number of states. Journal of Computational and Graphical Statistics\/ 33\/ (1), 15--24

work page 2024