FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson; Ricardo J. G. B. Campello

arxiv: 2606.18972 · v1 · pith:WEOZC3FAnew · submitted 2026-06-17 · 📊 stat.ML · cs.LG

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson , Ricardo J. G. B. Campello This is my paper

Pith reviewed 2026-06-26 19:14 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords hierarchical clusteringoptimal cutsdynamic programmingtop-M extractionflat clusteringnon-horizontal cutscluster selection

0 comments

The pith

FOSC-X extracts the top-M globally optimal flat clusterings from a hierarchy in linear time, with or without limits on cluster count.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that solves for the best M flat clusterings obtainable as non-horizontal cuts on a given hierarchical cluster tree. Without a limit on cluster number, dynamic programming over subtrees combines locally optimal partial solutions into globally optimal ones. Adding a constraint on the total number of clusters breaks the simple optimality property, so the method instead tracks compact sets of feasible candidates using lower and upper bounds and prunes dominated or infeasible combinations. The result is a guaranteed ranking of the top-M solutions whose running time remains linear in the number of tree nodes and the size of the data set. This lets users surface multiple distinct high-quality partitions that a single-solution extractor would never report.

Core claim

FOSC-X solves the top-M optimal local-cut problem on clustering hierarchies both with and without a cardinality constraint on the number of clusters. It does so by dynamic programming that, in the constrained case, maintains compact sets of feasible partial candidates bounded by feasibility intervals and discards combinations that cannot lead to a feasible global solution, thereby preserving optimality while achieving linear time.

What carries the argument

Dynamic programming over the cluster tree that maintains compact sets of feasible candidates with lower and upper feasibility bounds to combine local optima into globally optimal feasible solutions.

If this is right

Users obtain an ordered list of the M best flat clusterings rather than a single optimum.
The same linear-time procedure works whether or not a hard limit on the number of clusters is imposed.
Alternative clusterings that reflect different structural aspects of the data become directly accessible from one hierarchy.
The method scales linearly with both tree size and data-set size, enabling routine use on large hierarchies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bounded-candidate dynamic-programming pattern could be reused for other tree-structured selection problems that mix optimality with cardinality constraints.
In domains where the 'right' number of clusters is application-dependent, the top-M list supplies a ready menu of alternatives for downstream validation.
Empirical checks on hierarchies whose depth or branching factor is extreme would test whether the linear bound remains tight in practice.

Load-bearing premise

Any hierarchical cluster tree admits a dynamic-programming decomposition in which locally optimal partial candidates, once filtered by feasibility bounds, can be recombined without ever missing a better globally optimal feasible solution.

What would settle it

A concrete hierarchy and cluster-count constraint for which the algorithm's reported top-M list omits at least one feasible solution whose objective value is strictly better than the M-th entry on that list.

read the original abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FOSC-X gives a workable DP extension for top-M constrained cuts from hierarchies, but the optimality and linearity claims need the full proof details to hold up.

read the letter

The main thing here is a practical extension of the FOSC framework to pull the top-M flat clusterings from a hierarchy, with an optional hard constraint on the total number of clusters. Without the count constraint the top-M problem is already solvable by standard tree DP. The paper's contribution is showing how to keep the optimality guarantee once you add the count limit, by tracking compact sets of feasible partial solutions and pruning with lower/upper bounds.

That is useful for applied work. Single-solution extractors often return a clustering with an awkward number of groups; being able to list the best few alternatives under a user-specified count is a direct fix for that pain point. The abstract is clear that the unconstrained case re-uses the local-optimality property on subtrees, and the constrained case needs the extra bookkeeping.

The soft spot is exactly the one the stress-test note flags. The guarantee of both optimality and linear time rests on the sets of candidates staying small and on the domination rule never discarding a globally best feasible combination. The abstract states that lower/upper feasibility bounds plus pruning restore the property, but supplies no argument that the sets remain O(1) per node or that every surviving path can be extended to a global optimum. If either fails, both the ranking and the complexity claim break. Experiments are mentioned but not described, so we cannot yet see whether the pruning is tight in practice.

This is the kind of paper that belongs in a methods journal or a data-mining venue. Readers who routinely extract flat partitions from dendrograms will get immediate value from the algorithm once the implementation is public. It is worth sending to referees; the core idea is sound enough that a careful review can check the missing proof steps and the empirical scaling.

Referee Report

1 major / 0 minor

Summary. The paper introduces FOSC-X, an extension of prior FOSC work, for extracting the top-M globally optimal flat clusterings from non-horizontal cuts in a hierarchical cluster tree. It formulates the task as an optimization problem solved via dynamic programming. Without cluster-count constraints the top-M problem is solved in polynomial time by combining locally optimal partial candidates; with constraints the optimality property breaks, so the method maintains compact sets of feasible candidates via lower/upper feasibility bounds and prunes infeasible or dominated combinations, claiming to restore both optimality and linear-time complexity in the number of cluster nodes and dataset size. Experiments illustrate that the approach reveals alternative clusterings missed by single-solution extractors.

Significance. If the pruning argument is shown to preserve all globally optimal top-M solutions without discarding feasible combinations, the result would be significant for cluster analysis: it supplies an efficient, constraint-aware method to surface multiple high-quality alternative flat partitions from a single hierarchy, directly addressing a practical limitation of existing single-solution approaches. The linear-time claim (with and without constraints) would be a strong technical contribution if the set sizes remain bounded as asserted.

major comments (1)

[Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the need for greater explicitness around the optimality and complexity arguments in the constrained case. We address the major comment point-by-point below and will strengthen the manuscript accordingly.

read point-by-point responses

Referee: [Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.

Authors: We agree that the abstract is necessarily concise and that the DP strategy description would benefit from more explicit formal statements. In the full manuscript, domination for top-M is defined (Section 3.3) as a partial candidate A dominating B when the objective of A is at least as good as B and the feasibility interval of A strictly contains that of B for the remaining subtree; the maintained sets are the non-dominated feasible partial solutions. The optimality argument proceeds by induction on the tree: at each node the union of combinations from child sets is filtered by the bounds, and any globally optimal top-M solution must have a non-dominated prefix at every subtree (otherwise a better or equal alternative would exist, contradicting optimality). Pruning therefore cannot discard a path to a global optimum. Set-size boundedness follows from the fact that only the top-M candidates per feasible cluster-count interval are retained and the number of distinct intervals is bounded by a small constant (dependent on M but independent of n); this yields O(n) total time. Nevertheless, to make these arguments fully self-contained and address the referee's concern directly, we will insert a dedicated subsection containing the formal definition of domination, the complete induction proof, and the explicit O(1) set-size bound in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: standard DP optimality on trees with explicit feasibility pruning

full rationale

The paper's derivation chain rests on a dynamic-programming decomposition of the hierarchy that maintains compact feasible-candidate sets via lower/upper bounds and domination pruning. This is presented as an algorithmic extension of the known unconstrained optimality property (locally optimal partial solutions combine to global optima) to the constrained case. No parameter is fitted and then renamed as a prediction, no result is defined in terms of itself, and no load-bearing step reduces to a self-citation or ansatz smuggled from prior work by the same authors. The central guarantee is therefore an independent algorithmic claim rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard assumptions of hierarchical clustering and dynamic programming on trees; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption A hierarchical cluster tree admits a dynamic-programming decomposition where locally optimal partial candidates can be combined into globally optimal solutions.
Invoked when the paper states that unconstrained top-M can be solved by combining local optima.

pith-pipeline@v0.9.1-grok · 5802 in / 1093 out tokens · 17298 ms · 2026-06-26T19:14:03.862676+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 23 canonical work pages

[1]

Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011

Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011 . Award winning papers from the 19th International Conference on Pattern Recognition (ICPR)

work page doi:10.1016/j.patrec.2009.09.011 2010
[2]

UTLW’11, vol

Von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop. UTLW’11, vol. 27, pp. 65–79. JMLR.org, Washington, USA (2011).https://dl.acm.org/doi/10.5555/3045796.3045803

work page doi:10.5555/3045796.3045803 2011
[3]

Philosophical Aspects of Pattern Recognition

Hennig, C.: What are the true clusters? Pattern Recognition Letters64, 53– 62 (2015) https://doi.org/10.1016/j.patrec.2015.04.009 . Philosophical Aspects of Pattern Recognition

work page doi:10.1016/j.patrec.2015.04.009 2015
[4]

Prentice-Hall, Inc., USA (1988)

Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., USA (1988)

1988
[5]

Arnold, London (2001)

Everitt, B.S., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)

2001
[6]

Wiley Series in Probability and Statistics

Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Clus- ter Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, USA (1990). https://doi.org/10.1002/9780470316801

work page doi:10.1002/9780470316801 1990
[7]

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 160–172. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2 14

work page doi:10.1007/978-3-642-37456-2 2013
[8]

ACM Trans

Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data10(1) (2015) https://doi.org/10.1145/2733381

work page doi:10.1145/2733381 2015
[9]

Engineering Applications of Artificial Intelligence , volume =

Ezugwu, A.E., Ikotun, A.M., Oyelade, O.O., Abualigah, L., Agushaka, J.O., Eke, C.I., Akinyelu, A.A.: A comprehensive survey of clustering algorithms: State-of- the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence110, 104743 (2022) https://doi.org/10.1016/j.engappai....

work page doi:10.1016/j.engappai.2022.104743 2022
[10]

Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

Ran, X., Xi, Y., Lu, Y., Wang, X., Lu, Z.: Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

work page doi:10.1007/s10462-022-10366-3 2023
[11]

30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology

Segal, E., Koller, D.: Probabilistic hierarchical clustering for biological data. 30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology. RECOMB ’02, pp. 273–280. Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/565196.565232 . https://doi.org/10.1145/565196.565232

work page doi:10.1145/565196.565232 2002
[12]

Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

work page doi:10.1073/pnas.95.25.14863 1998
[13]

BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

Han, W., Zhang, S., Gao, H., Bu, D.: Clustering on hierarchical heterogeneous data with prior pairwise relationships. BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

work page doi:10.1186/s12859-024-05652-6 2024
[14]

Nielsen, F.: Hierarchical Clustering, pp. 195–211. Springer, Cham (2016). https: //doi.org/10.1007/978-3-319-21903-5 8

work page doi:10.1007/978-3-319-21903-5 2016
[15]

Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

work page doi:10.1007/bf02294245 1985
[16]

(eds.): Handbook of Cluster Analysis

Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.): Handbook of Cluster Analysis. Chapman and Hall/CRC, New York (2015). https://doi.org/10.1201/ b19706

2015
[17]

Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: A framework for semi- supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

2013
[18]

ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

Marti, G.: How and Where Should You Cut a Dendrogram? https://www.marti. ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

2017
[19]

Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

Ge, J., Tibshirani, R.: Optimal pruning of hierarchical clustering dendrograms. Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

work page doi:10.1080/03610926.2025.2543191 2026
[20]

https://arxiv.org/abs/2512.08741

Boucherie, L., Ahn, Y.-Y., Lehmann, S.: Adaptive cut reveals multiscale com- plexity in networks (2025). https://arxiv.org/abs/2512.08741

arXiv 2025
[21]

Awasthi, P., Blum, A., Sheffet, O.: Center-based clustering under perturbation stability. Inf. Process. Lett.112(1–2), 49–54 (2012) https://doi.org/10.1016/j.ipl. 2011.10.006

work page doi:10.1016/j.ipl 2012
[22]

Routledge, New York (2017)

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification And Regression Trees. Routledge, New York (2017). https://doi.org/10.1201/ 31 9781315139470

2017
[23]

Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

work page doi:10.1002/sam.10080 2010
[24]

Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

Simpson, C., Campello, R.J.G.B., Stojanovski, E.: Benchmarking of cluster- ing validity measures revisited. Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

work page doi:10.1002/sam.70061 2026
[25]

Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

Simpson, C., Mu˜ noz, M.A., Campello, R.J.G.B.: Instance space of clustering val- idation measures. Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

work page doi:10.1007/s10618-026-01195-x 2026
[26]

In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z

Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B.: A modularity- based measure for cluster selection from clustering hierarchies. In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z. (eds.) Data Mining, pp. 253–265. Springer, Singapore (2019). https://doi.org/10.1007/ 978-981-13-6661-1 20

2019
[27]

Master’s thesis, University of Alberta, Edmonton, Canada (2025)

Thompson, J.M.: Partition-free cluster evaluation: Extending cluster validation and cluster extraction from hierarchies. Master’s thesis, University of Alberta, Edmonton, Canada (2025). https://doi.org/10.7939/81776 . https://ualberta. scholaris.ca/items/7740b0c9-91dd-4517-88e8-2a57eed4899b

work page doi:10.7939/81776 2025
[28]

Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10

Castro Gertrudes, J., Zimek, A., Sander, J., Campello, R.J.G.B.: A unified view of density-based methods for semi-supervised clustering and classification. Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10. 1007/s10618-019-00651-1

1952
[29]

http://cs.uef.fi/sipu/datasets/

Fr¨ anti, P., Sieranoja, S.: K-means properties on six clustering benchmark datasets (2018). http://cs.uef.fi/sipu/datasets/

2018
[30]

Journal of computational and applied mathematics , volume=

Rousseeuw, P.J.: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics20, 53–65 (1987) https://doi.org/10.1016/0377-0427(87)90125-7

work page doi:10.1016/0377-0427(87)90125-7 1987
[31]

Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and cor- rection for chance. J. Mach. Learn. Res.11, 2837–2854 (2010) https://dl.acm.org/doi/10.5555/1756006.1953024

work page doi:10.5555/1756006.1953024 2010
[32]

Journal of Machine Learning Research12, 2825–2830 (2011) 32

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011) 32

2011
[33]

PerMetrics: A Framework of Performance Metrics forMachine Learning Models

McInnes, L., Healy, J., Astels, S.: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software2(11) (2017) https://doi.org/10.21105/joss. 00205

work page doi:10.21105/joss 2017
[34]

Demˇ sar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.7, 1–30 (2006) https://dl.acm.org/doi/10.5555/1248547.1248548 33 A Complexity Analysis A.1 Complexity underk min–kmax constraints We analyse the worst-case time complexity of FOSC-X under the cluster-count con- straintk min ≤ |z Ci| ≤k max. The analysis is co...

work page doi:10.5555/1248547.1248548 2006

[1] [1]

Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011

Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011 . Award winning papers from the 19th International Conference on Pattern Recognition (ICPR)

work page doi:10.1016/j.patrec.2009.09.011 2010

[2] [2]

UTLW’11, vol

Von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop. UTLW’11, vol. 27, pp. 65–79. JMLR.org, Washington, USA (2011).https://dl.acm.org/doi/10.5555/3045796.3045803

work page doi:10.5555/3045796.3045803 2011

[3] [3]

Philosophical Aspects of Pattern Recognition

Hennig, C.: What are the true clusters? Pattern Recognition Letters64, 53– 62 (2015) https://doi.org/10.1016/j.patrec.2015.04.009 . Philosophical Aspects of Pattern Recognition

work page doi:10.1016/j.patrec.2015.04.009 2015

[4] [4]

Prentice-Hall, Inc., USA (1988)

Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., USA (1988)

1988

[5] [5]

Arnold, London (2001)

Everitt, B.S., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)

2001

[6] [6]

Wiley Series in Probability and Statistics

Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Clus- ter Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, USA (1990). https://doi.org/10.1002/9780470316801

work page doi:10.1002/9780470316801 1990

[7] [7]

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 160–172. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2 14

work page doi:10.1007/978-3-642-37456-2 2013

[8] [8]

ACM Trans

Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data10(1) (2015) https://doi.org/10.1145/2733381

work page doi:10.1145/2733381 2015

[9] [9]

Engineering Applications of Artificial Intelligence , volume =

Ezugwu, A.E., Ikotun, A.M., Oyelade, O.O., Abualigah, L., Agushaka, J.O., Eke, C.I., Akinyelu, A.A.: A comprehensive survey of clustering algorithms: State-of- the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence110, 104743 (2022) https://doi.org/10.1016/j.engappai....

work page doi:10.1016/j.engappai.2022.104743 2022

[10] [10]

Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

Ran, X., Xi, Y., Lu, Y., Wang, X., Lu, Z.: Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

work page doi:10.1007/s10462-022-10366-3 2023

[11] [11]

30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology

Segal, E., Koller, D.: Probabilistic hierarchical clustering for biological data. 30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology. RECOMB ’02, pp. 273–280. Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/565196.565232 . https://doi.org/10.1145/565196.565232

work page doi:10.1145/565196.565232 2002

[12] [12]

Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

work page doi:10.1073/pnas.95.25.14863 1998

[13] [13]

BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

Han, W., Zhang, S., Gao, H., Bu, D.: Clustering on hierarchical heterogeneous data with prior pairwise relationships. BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

work page doi:10.1186/s12859-024-05652-6 2024

[14] [14]

Nielsen, F.: Hierarchical Clustering, pp. 195–211. Springer, Cham (2016). https: //doi.org/10.1007/978-3-319-21903-5 8

work page doi:10.1007/978-3-319-21903-5 2016

[15] [15]

Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

work page doi:10.1007/bf02294245 1985

[16] [16]

(eds.): Handbook of Cluster Analysis

Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.): Handbook of Cluster Analysis. Chapman and Hall/CRC, New York (2015). https://doi.org/10.1201/ b19706

2015

[17] [17]

Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: A framework for semi- supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

2013

[18] [18]

ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

Marti, G.: How and Where Should You Cut a Dendrogram? https://www.marti. ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

2017

[19] [19]

Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

Ge, J., Tibshirani, R.: Optimal pruning of hierarchical clustering dendrograms. Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

work page doi:10.1080/03610926.2025.2543191 2026

[20] [20]

https://arxiv.org/abs/2512.08741

Boucherie, L., Ahn, Y.-Y., Lehmann, S.: Adaptive cut reveals multiscale com- plexity in networks (2025). https://arxiv.org/abs/2512.08741

arXiv 2025

[21] [21]

Awasthi, P., Blum, A., Sheffet, O.: Center-based clustering under perturbation stability. Inf. Process. Lett.112(1–2), 49–54 (2012) https://doi.org/10.1016/j.ipl. 2011.10.006

work page doi:10.1016/j.ipl 2012

[22] [22]

Routledge, New York (2017)

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification And Regression Trees. Routledge, New York (2017). https://doi.org/10.1201/ 31 9781315139470

2017

[23] [23]

Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

work page doi:10.1002/sam.10080 2010

[24] [24]

Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

Simpson, C., Campello, R.J.G.B., Stojanovski, E.: Benchmarking of cluster- ing validity measures revisited. Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

work page doi:10.1002/sam.70061 2026

[25] [25]

Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

Simpson, C., Mu˜ noz, M.A., Campello, R.J.G.B.: Instance space of clustering val- idation measures. Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

work page doi:10.1007/s10618-026-01195-x 2026

[26] [26]

In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z

Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B.: A modularity- based measure for cluster selection from clustering hierarchies. In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z. (eds.) Data Mining, pp. 253–265. Springer, Singapore (2019). https://doi.org/10.1007/ 978-981-13-6661-1 20

2019

[27] [27]

Master’s thesis, University of Alberta, Edmonton, Canada (2025)

Thompson, J.M.: Partition-free cluster evaluation: Extending cluster validation and cluster extraction from hierarchies. Master’s thesis, University of Alberta, Edmonton, Canada (2025). https://doi.org/10.7939/81776 . https://ualberta. scholaris.ca/items/7740b0c9-91dd-4517-88e8-2a57eed4899b

work page doi:10.7939/81776 2025

[28] [28]

Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10

Castro Gertrudes, J., Zimek, A., Sander, J., Campello, R.J.G.B.: A unified view of density-based methods for semi-supervised clustering and classification. Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10. 1007/s10618-019-00651-1

1952

[29] [29]

http://cs.uef.fi/sipu/datasets/

Fr¨ anti, P., Sieranoja, S.: K-means properties on six clustering benchmark datasets (2018). http://cs.uef.fi/sipu/datasets/

2018

[30] [30]

Journal of computational and applied mathematics , volume=

Rousseeuw, P.J.: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics20, 53–65 (1987) https://doi.org/10.1016/0377-0427(87)90125-7

work page doi:10.1016/0377-0427(87)90125-7 1987

[31] [31]

Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and cor- rection for chance. J. Mach. Learn. Res.11, 2837–2854 (2010) https://dl.acm.org/doi/10.5555/1756006.1953024

work page doi:10.5555/1756006.1953024 2010

[32] [32]

Journal of Machine Learning Research12, 2825–2830 (2011) 32

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011) 32

2011

[33] [33]

PerMetrics: A Framework of Performance Metrics forMachine Learning Models

McInnes, L., Healy, J., Astels, S.: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software2(11) (2017) https://doi.org/10.21105/joss. 00205

work page doi:10.21105/joss 2017

[34] [34]

Demˇ sar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.7, 1–30 (2006) https://dl.acm.org/doi/10.5555/1248547.1248548 33 A Complexity Analysis A.1 Complexity underk min–kmax constraints We analyse the worst-case time complexity of FOSC-X under the cluster-count con- straintk min ≤ |z Ci| ≤k max. The analysis is co...

work page doi:10.5555/1248547.1248548 2006