pith. sign in

arxiv: 2606.18972 · v1 · pith:WEOZC3FAnew · submitted 2026-06-17 · 📊 stat.ML · cs.LG

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Pith reviewed 2026-06-26 19:14 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords hierarchical clusteringoptimal cutsdynamic programmingtop-M extractionflat clusteringnon-horizontal cutscluster selection
0
0 comments X

The pith

FOSC-X extracts the top-M globally optimal flat clusterings from a hierarchy in linear time, with or without limits on cluster count.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that solves for the best M flat clusterings obtainable as non-horizontal cuts on a given hierarchical cluster tree. Without a limit on cluster number, dynamic programming over subtrees combines locally optimal partial solutions into globally optimal ones. Adding a constraint on the total number of clusters breaks the simple optimality property, so the method instead tracks compact sets of feasible candidates using lower and upper bounds and prunes dominated or infeasible combinations. The result is a guaranteed ranking of the top-M solutions whose running time remains linear in the number of tree nodes and the size of the data set. This lets users surface multiple distinct high-quality partitions that a single-solution extractor would never report.

Core claim

FOSC-X solves the top-M optimal local-cut problem on clustering hierarchies both with and without a cardinality constraint on the number of clusters. It does so by dynamic programming that, in the constrained case, maintains compact sets of feasible partial candidates bounded by feasibility intervals and discards combinations that cannot lead to a feasible global solution, thereby preserving optimality while achieving linear time.

What carries the argument

Dynamic programming over the cluster tree that maintains compact sets of feasible candidates with lower and upper feasibility bounds to combine local optima into globally optimal feasible solutions.

If this is right

  • Users obtain an ordered list of the M best flat clusterings rather than a single optimum.
  • The same linear-time procedure works whether or not a hard limit on the number of clusters is imposed.
  • Alternative clusterings that reflect different structural aspects of the data become directly accessible from one hierarchy.
  • The method scales linearly with both tree size and data-set size, enabling routine use on large hierarchies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bounded-candidate dynamic-programming pattern could be reused for other tree-structured selection problems that mix optimality with cardinality constraints.
  • In domains where the 'right' number of clusters is application-dependent, the top-M list supplies a ready menu of alternatives for downstream validation.
  • Empirical checks on hierarchies whose depth or branching factor is extreme would test whether the linear bound remains tight in practice.

Load-bearing premise

Any hierarchical cluster tree admits a dynamic-programming decomposition in which locally optimal partial candidates, once filtered by feasibility bounds, can be recombined without ever missing a better globally optimal feasible solution.

What would settle it

A concrete hierarchy and cluster-count constraint for which the algorithm's reported top-M list omits at least one feasible solution whose objective value is strictly better than the M-th entry on that list.

read the original abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces FOSC-X, an extension of prior FOSC work, for extracting the top-M globally optimal flat clusterings from non-horizontal cuts in a hierarchical cluster tree. It formulates the task as an optimization problem solved via dynamic programming. Without cluster-count constraints the top-M problem is solved in polynomial time by combining locally optimal partial candidates; with constraints the optimality property breaks, so the method maintains compact sets of feasible candidates via lower/upper feasibility bounds and prunes infeasible or dominated combinations, claiming to restore both optimality and linear-time complexity in the number of cluster nodes and dataset size. Experiments illustrate that the approach reveals alternative clusterings missed by single-solution extractors.

Significance. If the pruning argument is shown to preserve all globally optimal top-M solutions without discarding feasible combinations, the result would be significant for cluster analysis: it supplies an efficient, constraint-aware method to surface multiple high-quality alternative flat partitions from a single hierarchy, directly addressing a practical limitation of existing single-solution approaches. The linear-time claim (with and without constraints) would be a strong technical contribution if the set sizes remain bounded as asserted.

major comments (1)
  1. [Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the need for greater explicitness around the optimality and complexity arguments in the constrained case. We address the major comment point-by-point below and will strengthen the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract / dynamic programming strategy for constrained case] Abstract and DP strategy description: the claim that 'maintaining compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations' restores the global optimality guarantee for top-M rankings (both with and without cluster-count constraints) is load-bearing for the central result. No explicit definition of domination for top-M ranking, no proof that every feasible global optimum has a surviving path through the maintained sets, and no argument establishing that set sizes remain O(1) per node (required for true linearity) are supplied. If any optimal combination is discarded by the bounds or pruning rule, both the optimality and linear-time claims fail.

    Authors: We agree that the abstract is necessarily concise and that the DP strategy description would benefit from more explicit formal statements. In the full manuscript, domination for top-M is defined (Section 3.3) as a partial candidate A dominating B when the objective of A is at least as good as B and the feasibility interval of A strictly contains that of B for the remaining subtree; the maintained sets are the non-dominated feasible partial solutions. The optimality argument proceeds by induction on the tree: at each node the union of combinations from child sets is filtered by the bounds, and any globally optimal top-M solution must have a non-dominated prefix at every subtree (otherwise a better or equal alternative would exist, contradicting optimality). Pruning therefore cannot discard a path to a global optimum. Set-size boundedness follows from the fact that only the top-M candidates per feasible cluster-count interval are retained and the number of distinct intervals is bounded by a small constant (dependent on M but independent of n); this yields O(n) total time. Nevertheless, to make these arguments fully self-contained and address the referee's concern directly, we will insert a dedicated subsection containing the formal definition of domination, the complete induction proof, and the explicit O(1) set-size bound in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: standard DP optimality on trees with explicit feasibility pruning

full rationale

The paper's derivation chain rests on a dynamic-programming decomposition of the hierarchy that maintains compact feasible-candidate sets via lower/upper bounds and domination pruning. This is presented as an algorithmic extension of the known unconstrained optimality property (locally optimal partial solutions combine to global optima) to the constrained case. No parameter is fitted and then renamed as a prediction, no result is defined in terms of itself, and no load-bearing step reduces to a self-citation or ansatz smuggled from prior work by the same authors. The central guarantee is therefore an independent algorithmic claim rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard assumptions of hierarchical clustering and dynamic programming on trees; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • domain assumption A hierarchical cluster tree admits a dynamic-programming decomposition where locally optimal partial candidates can be combined into globally optimal solutions.
    Invoked when the paper states that unconstrained top-M can be solved by combining local optima.

pith-pipeline@v0.9.1-grok · 5802 in / 1093 out tokens · 17298 ms · 2026-06-26T19:14:03.862676+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 23 canonical work pages

  1. [1]

    Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011

    Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recognition Let- ters31(8), 651–666 (2010) https://doi.org/10.1016/j.patrec.2009.09.011 . Award winning papers from the 19th International Conference on Pattern Recognition (ICPR)

  2. [2]

    UTLW’11, vol

    Von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop. UTLW’11, vol. 27, pp. 65–79. JMLR.org, Washington, USA (2011).https://dl.acm.org/doi/10.5555/3045796.3045803

  3. [3]

    Philosophical Aspects of Pattern Recognition

    Hennig, C.: What are the true clusters? Pattern Recognition Letters64, 53– 62 (2015) https://doi.org/10.1016/j.patrec.2015.04.009 . Philosophical Aspects of Pattern Recognition

  4. [4]

    Prentice-Hall, Inc., USA (1988)

    Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., USA (1988)

  5. [5]

    Arnold, London (2001)

    Everitt, B.S., Landau, S., Leese, M.: Cluster Analysis, 4th edn. Arnold, London (2001)

  6. [6]

    Wiley Series in Probability and Statistics

    Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Clus- ter Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, USA (1990). https://doi.org/10.1002/9780470316801

  7. [7]

    In: Advances in Knowledge Discovery and Data Mining (PAKDD 2013)

    Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 160–172. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2 14

  8. [8]

    ACM Trans

    Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data10(1) (2015) https://doi.org/10.1145/2733381

  9. [9]

    Engineering Applications of Artificial Intelligence , volume =

    Ezugwu, A.E., Ikotun, A.M., Oyelade, O.O., Abualigah, L., Agushaka, J.O., Eke, C.I., Akinyelu, A.A.: A comprehensive survey of clustering algorithms: State-of- the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence110, 104743 (2022) https://doi.org/10.1016/j.engappai....

  10. [10]

    Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

    Ran, X., Xi, Y., Lu, Y., Wang, X., Lu, Z.: Comprehensive survey on hierarchical clustering algorithms and the recent developments. Artificial Intelligence Review 56(8), 8219–8264 (2023) https://doi.org/10.1007/s10462-022-10366-3

  11. [11]

    30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology

    Segal, E., Koller, D.: Probabilistic hierarchical clustering for biological data. 30 In: Proceedings of the Sixth Annual International Conference on Computa- tional Biology. RECOMB ’02, pp. 273–280. Association for Computing Machin- ery, New York, NY, USA (2002). https://doi.org/10.1145/565196.565232 . https://doi.org/10.1145/565196.565232

  12. [12]

    Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

    Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences95(25), 14863–14868 (1998) https://doi.org/10.1073/pnas.95.25.14863 https://www.pnas.org/doi/pdf/10.1073/pnas.95.25.14863

  13. [13]

    BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

    Han, W., Zhang, S., Gao, H., Bu, D.: Clustering on hierarchical heterogeneous data with prior pairwise relationships. BMC Bioinformatics25(1) (2024) https: //doi.org/10.1186/s12859-024-05652-6

  14. [14]

    Nielsen, F.: Hierarchical Clustering, pp. 195–211. Springer, Cham (2016). https: //doi.org/10.1007/978-3-319-21903-5 8

  15. [15]

    Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

    Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters in a data set. Psychometrika50(2), 159–179 (1985) https: //doi.org/10.1007/BF02294245

  16. [16]

    (eds.): Handbook of Cluster Analysis

    Hennig, C., Meila, M., Murtagh, F., Rocci, R. (eds.): Handbook of Cluster Analysis. Chapman and Hall/CRC, New York (2015). https://doi.org/10.1201/ b19706

  17. [17]

    Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

    Campello, R.J.G.B., Moulavi, D., Zimek, A., Sander, J.: A framework for semi- supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery27(3), 344–371 (2013) https://doi.org/10.1007/ s10618-013-0311-4

  18. [18]

    ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

    Marti, G.: How and Where Should You Cut a Dendrogram? https://www.marti. ai/ml/2017/05/12/cut-a-dendrogram.html (2017)

  19. [19]

    Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

    Ge, J., Tibshirani, R.: Optimal pruning of hierarchical clustering dendrograms. Communications in Statistics - Theory and Methods 55(7), 2105–2128 (2026) https://doi.org/10.1080/03610926.2025.2543191 https://doi.org/10.1080/03610926.2025.2543191

  20. [20]

    https://arxiv.org/abs/2512.08741

    Boucherie, L., Ahn, Y.-Y., Lehmann, S.: Adaptive cut reveals multiscale com- plexity in networks (2025). https://arxiv.org/abs/2512.08741

  21. [21]

    Awasthi, P., Blum, A., Sheffet, O.: Center-based clustering under perturbation stability. Inf. Process. Lett.112(1–2), 49–54 (2012) https://doi.org/10.1016/j.ipl. 2011.10.006

  22. [22]

    Routledge, New York (2017)

    Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification And Regression Trees. Routledge, New York (2017). https://doi.org/10.1201/ 31 9781315139470

  23. [23]

    Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

    Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(4), 209–235 (2010) https://doi.org/10.1002/sam.10080

  24. [24]

    Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

    Simpson, C., Campello, R.J.G.B., Stojanovski, E.: Benchmarking of cluster- ing validity measures revisited. Statistical Analysis and Data Mining: An ASA Data Science Journal19(1), 70061 (2026) https://doi.org/10.1002/sam.70061 https://onlinelibrary.wiley.com/doi/pdf/10.1002/sam.70061

  25. [25]

    Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

    Simpson, C., Mu˜ noz, M.A., Campello, R.J.G.B.: Instance space of clustering val- idation measures. Data Mining and Knowledge Discovery40(3) (2026) https: //doi.org/10.1007/s10618-026-01195-x

  26. [26]

    In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z

    Anjos, F.d.A.R., Gertrudes, J.C., Sander, J., Campello, R.J.G.B.: A modularity- based measure for cluster selection from clustering hierarchies. In: Islam, R., Koh, Y.S., Zhao, Y., Warwick, G., Stirling, D., Li, C.-T., Islam, Z. (eds.) Data Mining, pp. 253–265. Springer, Singapore (2019). https://doi.org/10.1007/ 978-981-13-6661-1 20

  27. [27]

    Master’s thesis, University of Alberta, Edmonton, Canada (2025)

    Thompson, J.M.: Partition-free cluster evaluation: Extending cluster validation and cluster extraction from hierarchies. Master’s thesis, University of Alberta, Edmonton, Canada (2025). https://doi.org/10.7939/81776 . https://ualberta. scholaris.ca/items/7740b0c9-91dd-4517-88e8-2a57eed4899b

  28. [28]

    Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10

    Castro Gertrudes, J., Zimek, A., Sander, J., Campello, R.J.G.B.: A unified view of density-based methods for semi-supervised clustering and classification. Data Mining and Knowledge Discovery33(6), 1894–1952 (2019) https://doi.org/10. 1007/s10618-019-00651-1

  29. [29]

    http://cs.uef.fi/sipu/datasets/

    Fr¨ anti, P., Sieranoja, S.: K-means properties on six clustering benchmark datasets (2018). http://cs.uef.fi/sipu/datasets/

  30. [30]

    Journal of computational and applied mathematics , volume=

    Rousseeuw, P.J.: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics20, 53–65 (1987) https://doi.org/10.1016/0377-0427(87)90125-7

  31. [31]

    Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and cor- rection for chance. J. Mach. Learn. Res.11, 2837–2854 (2010) https://dl.acm.org/doi/10.5555/1756006.1953024

  32. [32]

    Journal of Machine Learning Research12, 2825–2830 (2011) 32

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011) 32

  33. [33]

    PerMetrics: A Framework of Performance Metrics forMachine Learning Models

    McInnes, L., Healy, J., Astels, S.: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software2(11) (2017) https://doi.org/10.21105/joss. 00205

  34. [34]

    Demˇ sar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res.7, 1–30 (2006) https://dl.acm.org/doi/10.5555/1248547.1248548 33 A Complexity Analysis A.1 Complexity underk min–kmax constraints We analyse the worst-case time complexity of FOSC-X under the cluster-count con- straintk min ≤ |z Ci| ≤k max. The analysis is co...