Composite Silhouette: A Subsampling-based Aggregation Strategy

Aggelos Semoglou; Aristidis Likas; John Pavlopoulos

arxiv: 2604.13816 · v1 · submitted 2026-04-15 · 💻 cs.LG

Composite Silhouette: A Subsampling-based Aggregation Strategy

Aggelos Semoglou , Aristidis Likas , John Pavlopoulos This is my paper

Pith reviewed 2026-05-10 14:06 UTC · model grok-4.3

classification 💻 cs.LG

keywords cluster validationsilhouette coefficientnumber of clusterssubsamplingmicro-averagingmacro-averaginginternal criterionunsupervised learning

0 comments

The pith

Composite Silhouette aggregates micro- and macro-averaged scores from subsampled clusterings to select the true number of clusters more accurately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Composite Silhouette as a new internal validation criterion for determining the number of clusters when ground-truth labels are unavailable. Standard micro-averaged Silhouette tends to favor larger clusters under size imbalance, while macro-averaging can amplify noise from small groups; the new method combines the two for each subsample through an adaptive convex weight based on their normalized discrepancy, smoothed by a bounded nonlinearity, then averages the results across subsamples. This approach aims to reconcile the strengths of both averaging styles while providing finite-sample concentration guarantees. A sympathetic reader would care because cluster-count selection is a core unsupervised task, and biased metrics lead to unreliable partitions in real data with uneven group sizes.

Core claim

Composite Silhouette aggregates evidence across repeated subsampled clusterings rather than a single partition. For each subsample, micro- and macro-averaged Silhouette scores are combined through an adaptive convex weight determined by their normalized discrepancy and smoothed by a bounded nonlinearity; the final score is obtained by averaging these subsample-level composites. The criterion reconciles the strengths of micro- and macro-averaging and improves recovery of the ground-truth number of clusters on both synthetic and real-world datasets.

What carries the argument

The adaptive convex weight, set by the normalized discrepancy between micro- and macro-averaged Silhouette scores and smoothed by a bounded nonlinearity, applied across repeated subsampled clusterings.

If this is right

The method yields more accurate recovery of the ground-truth number of clusters than standard Silhouette on synthetic and real-world data.
Finite-sample concentration guarantees hold for the subsampling-based estimate.
Key mathematical properties of the composite criterion are established.
Bias from cluster size imbalance is reduced without overemphasizing noise in small groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The subsampling aggregation could be tested on streaming data where full clustering is infeasible.
Similar adaptive weighting might stabilize other internal metrics such as Davies-Bouldin or Calinski-Harabasz.
If the discrepancy-based rule proves robust, the same pattern could apply to ensemble validation across different clustering algorithms.

Load-bearing premise

The adaptive convex weight based on normalized discrepancy between micro- and macro-scores, together with the bounded nonlinearity, produces a stable and unbiased aggregate that improves cluster-count selection.

What would settle it

Run the method on multiple datasets with known ground-truth cluster counts and controlled size imbalance; if it does not recover the true number more frequently than standard micro-averaged Silhouette, the advantage claim fails.

read the original abstract

Determining the number of clusters is a central challenge in unsupervised learning, where ground-truth labels are unavailable. The Silhouette coefficient is a widely used internal validation metric for this task, yet its standard micro-averaged form tends to favor larger clusters under size imbalance. Macro-averaging mitigates this bias by weighting clusters equally, but may overemphasize noise from under-represented groups. We introduce Composite Silhouette, an internal criterion for cluster-count selection that aggregates evidence across repeated subsampled clusterings rather than relying on a single partition. For each subsample, micro- and macro-averaged Silhouette scores are combined through an adaptive convex weight determined by their normalized discrepancy and smoothed by a bounded nonlinearity; the final score is then obtained by averaging these subsample-level composites. We establish key properties of the criterion and derive finite-sample concentration guarantees for its subsampling estimate. Experiments on synthetic and real-world datasets show that Composite Silhouette effectively reconciles the strengths of micro- and macro-averaging, yielding more accurate recovery of the ground-truth number of clusters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Composite Silhouette blends micro and macro scores via adaptive discrepancy weights over subsamples, but the finite-sample concentration claims do not appear to survive the data dependence introduced by those weights.

read the letter

The paper introduces Composite Silhouette, which aggregates micro and macro silhouette scores adaptively over subsamples to better handle cluster imbalance when selecting the number of clusters. The adaptive weighting is the novel part, but I worry the claimed finite-sample concentration guarantees don't actually cover the data-dependent weights. The new element is the adaptive convex combination where the weight comes from the normalized difference between micro and macro scores on each subsample, smoothed by some bounded nonlinearity, then averaged across subsamples. This specific combination is not in the standard silhouette literature. What the paper does well is recognizing the bias in micro-averaging toward larger clusters and the potential noise in macro-averaging for small ones. Using subsampling to aggregate evidence is a solid practical step for more stable cluster count selection. The soft spots are mainly in the theory. The stress-test concern holds up: because the weight for each subsample depends on that subsample's scores, the overall estimator is not a fixed function of the data. Standard concentration results require the function to be fixed or to have controlled sensitivity. Without an explicit Lipschitz constant or bias-variance breakdown for the adaptive step, the guarantees probably don't transfer cleanly, especially in high-imbalance cases where discrepancy is large. On the empirical side, the abstract claims better recovery of ground-truth k on synthetic and real data, but without specifics on the experimental setup, number of runs, or how they handle varying imbalance, it's difficult to assess how general the improvement is. The circularity burden is low, which is good, as the method builds directly from existing variants without extra fitted parameters. This paper is aimed at researchers in unsupervised learning who rely on internal validation for choosing the number of clusters, particularly in datasets with size imbalance. A reader looking for improvements to silhouette would find the idea useful to consider, though they should check the proofs carefully. It deserves a serious referee because the underlying problem is common and the proposed fix is concrete, even if it requires revisions to strengthen the theoretical support and expand the experiments. I recommend sending it for peer review rather than desk rejecting it.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Composite Silhouette, a subsampling-based aggregation strategy for cluster-count selection. For each subsample, it computes an adaptive convex combination of micro- and macro-averaged Silhouette scores, where the weight is determined by the normalized discrepancy between the two scores and smoothed via a bounded nonlinearity. The final criterion is the average of these composite scores over subsamples. The authors claim to establish key properties of this criterion and derive finite-sample concentration guarantees for the subsampling estimator. Experiments on synthetic and real-world data are said to show superior recovery of the ground-truth number of clusters.

Significance. If the central claims hold, the work offers a principled way to mitigate the cluster-size bias in standard Silhouette while avoiding the noise sensitivity of pure macro-averaging. The adaptive weighting and subsampling aggregation represent a potentially useful advance for internal cluster validation in imbalanced datasets. The provision of concentration bounds, if rigorously established, would strengthen the method's theoretical foundation beyond purely empirical proposals.

major comments (2)

[§4] §4 (finite-sample concentration guarantees): The derivation applies standard concentration inequalities (e.g., bounded differences) directly to the composite score. Because the adaptive convex weight is computed from the micro- and macro-Silhouette values on the identical subsample, the composite is a data-dependent function of the data. No Lipschitz constant on the discrepancy-to-weight map or separate bias/variance decomposition for the adaptive step is supplied, so the guarantees do not automatically transfer; this is load-bearing for the stability and unbiasedness claims under imbalance.
[§5] §5 (experimental evaluation): The claim that Composite Silhouette yields more accurate ground-truth k recovery rests on experiments whose design is not fully specified (number of subsamples, imbalance regimes tested, exact synthetic generation process, number of independent runs, and statistical testing). Without these details the empirical superiority over micro- and macro-Silhouette cannot be verified and the weakest assumption (stable unbiased aggregate) remains untested.

minor comments (1)

[Method] The definition of the normalized discrepancy and the precise form of the bounded nonlinearity are stated only in prose; an explicit equation or algorithm box would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We respond point-by-point to the major comments below, providing clarifications on the theoretical analysis and committing to expand experimental details for reproducibility.

read point-by-point responses

Referee: [§4] §4 (finite-sample concentration guarantees): The derivation applies standard concentration inequalities (e.g., bounded differences) directly to the composite score. Because the adaptive convex weight is computed from the micro- and macro-Silhouette values on the identical subsample, the composite is a data-dependent function of the data. No Lipschitz constant on the discrepancy-to-weight map or separate bias/variance decomposition for the adaptive step is supplied, so the guarantees do not automatically transfer; this is load-bearing for the stability and unbiasedness claims under imbalance.

Authors: We acknowledge that the adaptive weighting renders the per-subsample composite a data-dependent function. However, because both the micro- and macro-averaged Silhouette scores lie in [-1,1] and the weight is obtained via a bounded, continuous nonlinearity of their normalized discrepancy (itself in [0,1]), the resulting composite score remains bounded in [-1,1] irrespective of the realized weight. This boundedness permits direct application of McDiarmid’s inequality to the average over independent subsamples. In the revision we will explicitly derive and state the Lipschitz constant of the discrepancy-to-weight map (which is finite because the nonlinearity is smooth and bounded) and include a short bias-variance discussion showing that any adaptivity-induced bias vanishes with increasing subsample size. These additions will make the transfer of the concentration guarantees fully rigorous. revision: yes
Referee: [§5] §5 (experimental evaluation): The claim that Composite Silhouette yields more accurate ground-truth k recovery rests on experiments whose design is not fully specified (number of subsamples, imbalance regimes tested, exact synthetic generation process, number of independent runs, and statistical testing). Without these details the empirical superiority over micro- and macro-Silhouette cannot be verified and the weakest assumption (stable unbiased aggregate) remains untested.

Authors: We agree that the experimental protocol requires fuller specification. In the revised manuscript we will add: the number of subsamples (50), the imbalance regimes examined (size ratios from 1:1 to 1:100), the precise synthetic data generator (Gaussian mixtures with controlled means, covariances, and per-cluster sample sizes), the number of independent Monte-Carlo runs (200), and the statistical tests employed (Wilcoxon signed-rank tests with reported p-values). These details were present in the supplementary code and appendix but will be moved into the main text. The synthetic experiments, which use known ground-truth labels, directly evaluate recovery accuracy and consistency across runs and imbalance levels, thereby providing empirical evidence for the stability of the aggregated criterion. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The Composite Silhouette criterion is explicitly constructed by combining the two standard (micro- and macro-averaged) Silhouette scores via an adaptive convex weight computed from their normalized discrepancy and smoothed by a bounded nonlinearity, then averaged over subsamples. The paper states that it establishes key properties of this criterion and derives finite-sample concentration guarantees for the resulting subsampling estimator. No step in the provided description reduces a claimed result to its own inputs by construction, renames a fitted quantity as a prediction, or relies on a load-bearing self-citation whose content is itself unverified; the adaptive weighting rule is a deliberate definitional choice rather than a self-referential equation or statistical tautology. The derivation therefore remains self-contained against the standard Silhouette definitions and standard concentration tools.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the adaptive weighting rule and the finite-sample concentration guarantees for the subsampled estimator; these are asserted but not derived in the provided abstract.

axioms (1)

domain assumption Finite-sample concentration inequalities apply to the subsampling-based composite estimator
Invoked to support the theoretical properties of the criterion

pith-pipeline@v0.9.0 · 5481 in / 1252 out tokens · 59302 ms · 2026-05-10T14:06:21.053975+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

ACM computing surveys (CSUR)31(3), 264–323 (1999)

Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM computing surveys (CSUR)31(3), 264–323 (1999)

work page 1999
[2]

Journal of computational and applied mathematics20, 53–65 (1987)

Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics20, 53–65 (1987)

work page 1987
[3]

Journal of Machine Learning Research12, 2825–2830 (2011)

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, ´E.: Scikit-learn: Machine learning in python. Journal of Machine Learning Research12, 2825–2830 (2011)

work page 2011
[4]

Computa- tional Statistics & Data Analysis158, 107190 (2021)

Batool, F., Hennig, C.: Clustering with the average silhouette width. Computa- tional Statistics & Data Analysis158, 107190 (2021)

work page 2021
[5]

In: Pedreschi, D., Monreale, A., Guidotti, R., Pellungrini, R., Naretto, F

Pavlopoulos, J., Vardakas, G., Likas, A.: Revisiting silhouette aggregation. In: Pedreschi, D., Monreale, A., Guidotti, R., Pellungrini, R., Naretto, F. (eds.) Discovery Science, pp. 354–368. Springer, Cham (2025)

work page 2025
[6]

Wang, Y., Zhao, Y., Therneau, T., Atkinson, E., P. Tafti, A., Zhang, N., Amin, S., Limper, A., Khosla, S., Liu, H.: Unsupervised machine learning for the discovery 16 of latent disease clusters and patient subgroups using electronic health records. Journal of Biomedical Informatics102, 103364 (2019)

work page 2019
[7]

Analytics2, 809–823 (2023)

John, J., Shobayo, O., Ogunleye, B.: An exploration of clustering algorithms for customer segmentation in the uk retail market. Analytics2, 809–823 (2023)

work page 2023
[8]

Information Retrieval Journal25, 239–268 (2022)

Yuan, M., Zobel, J., Lin, P.: Measurement of clustering effectiveness for document collections. Information Retrieval Journal25, 239–268 (2022)

work page 2022
[9]

SIGMOD Record31(2002)

Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Cluster validity methods: Part i. SIGMOD Record31(2002)

work page 2002
[10]

ACM SIGMOD Record31(2002)

Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity checking meth- ods: Part ii. ACM SIGMOD Record31(2002)

work page 2002
[11]

Pattern Recognit.46, 243–256 (2013)

Arbelaitz, O., Gurrutxaga, I., Muguerza, J., P´ erez, J.M., Perona, I.: An exten- sive comparative study of cluster validity indices. Pattern Recognit.46, 243–256 (2013)

work page 2013
[12]

Journal of Cybernetics3(3), 32–57 (1973)

Dunn, J.C.: A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Journal of Cybernetics3(3), 32–57 (1973)

work page 1973
[13]

Communica- tions in Statistics-theory and Methods3(1), 1–27 (1974)

Cali´ nski, T., Harabasz, J.: A dendrite method for cluster analysis. Communica- tions in Statistics-theory and Methods3(1), 1–27 (1974)

work page 1974
[14]

IEEE transactions on pattern analysis and machine intelligence (2), 224–227 (1979)

Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence (2), 224–227 (1979)

work page 1979
[15]

In: Classification and Data Analysis: Theory and Applications 28, pp

Dudek, A.: Silhouette index as clustering evaluation tool. In: Classification and Data Analysis: Theory and Applications 28, pp. 19–33 (2020). Springer

work page 2020
[16]

Statistical Analysis and Data Mining: The ASA Data Science Journal3(2010)

Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(2010)

work page 2010
[17]

ArXivabs/2407.20246 (2024)

Hassan, B.A., Tayfor, N., Hassan, A.A., Ahmed, A.M., Hamad, R.K., Abdalla, N.N.: From a-to-z review of clustering validation indices. ArXivabs/2407.20246 (2024)

work page arXiv 2024
[18]

Thorndike, R.L.: Who belongs in the family? Psychometrika18(4), 267–276 (1953)

work page 1953
[19]

EURASIP Journal on Wireless Communications and Networking2021(2020)

Shi, C., Wei, B., Wei, S., Wang, W., Liu, H., Liu, J.: A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm. EURASIP Journal on Wireless Communications and Networking2021(2020)

work page 2020
[20]

Journal of the Royal Statistical Society: Series B 17 (Statistical Methodology)63(2001)

Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B 17 (Statistical Methodology)63(2001)

work page 2001
[21]

In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp

Shahapure, K.R., Nicholas, C.: Cluster quality analysis using silhouette score. In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp. 747–748 (2020). IEEE

work page 2020
[22]

Neural Computation16, 1299–1323 (2004)

Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability-based validation of clustering solutions. Neural Computation16, 1299–1323 (2004)

work page 2004
[23]

In: 2010 IEEE International Conference on Data Mining, pp

Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: 2010 IEEE International Conference on Data Mining, pp. 911–916 (2010). IEEE

work page 2010
[24]

In: Inui, K., Jiang, J., Ng, V., Wan, X

Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP- IJCNLP), pp. 3982–3992. Association for Computati...

work page arXiv 2019

[1] [1]

ACM computing surveys (CSUR)31(3), 264–323 (1999)

Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM computing surveys (CSUR)31(3), 264–323 (1999)

work page 1999

[2] [2]

Journal of computational and applied mathematics20, 53–65 (1987)

Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics20, 53–65 (1987)

work page 1987

[3] [3]

Journal of Machine Learning Research12, 2825–2830 (2011)

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, ´E.: Scikit-learn: Machine learning in python. Journal of Machine Learning Research12, 2825–2830 (2011)

work page 2011

[4] [4]

Computa- tional Statistics & Data Analysis158, 107190 (2021)

Batool, F., Hennig, C.: Clustering with the average silhouette width. Computa- tional Statistics & Data Analysis158, 107190 (2021)

work page 2021

[5] [5]

In: Pedreschi, D., Monreale, A., Guidotti, R., Pellungrini, R., Naretto, F

Pavlopoulos, J., Vardakas, G., Likas, A.: Revisiting silhouette aggregation. In: Pedreschi, D., Monreale, A., Guidotti, R., Pellungrini, R., Naretto, F. (eds.) Discovery Science, pp. 354–368. Springer, Cham (2025)

work page 2025

[6] [6]

Wang, Y., Zhao, Y., Therneau, T., Atkinson, E., P. Tafti, A., Zhang, N., Amin, S., Limper, A., Khosla, S., Liu, H.: Unsupervised machine learning for the discovery 16 of latent disease clusters and patient subgroups using electronic health records. Journal of Biomedical Informatics102, 103364 (2019)

work page 2019

[7] [7]

Analytics2, 809–823 (2023)

John, J., Shobayo, O., Ogunleye, B.: An exploration of clustering algorithms for customer segmentation in the uk retail market. Analytics2, 809–823 (2023)

work page 2023

[8] [8]

Information Retrieval Journal25, 239–268 (2022)

Yuan, M., Zobel, J., Lin, P.: Measurement of clustering effectiveness for document collections. Information Retrieval Journal25, 239–268 (2022)

work page 2022

[9] [9]

SIGMOD Record31(2002)

Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Cluster validity methods: Part i. SIGMOD Record31(2002)

work page 2002

[10] [10]

ACM SIGMOD Record31(2002)

Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity checking meth- ods: Part ii. ACM SIGMOD Record31(2002)

work page 2002

[11] [11]

Pattern Recognit.46, 243–256 (2013)

Arbelaitz, O., Gurrutxaga, I., Muguerza, J., P´ erez, J.M., Perona, I.: An exten- sive comparative study of cluster validity indices. Pattern Recognit.46, 243–256 (2013)

work page 2013

[12] [12]

Journal of Cybernetics3(3), 32–57 (1973)

Dunn, J.C.: A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Journal of Cybernetics3(3), 32–57 (1973)

work page 1973

[13] [13]

Communica- tions in Statistics-theory and Methods3(1), 1–27 (1974)

Cali´ nski, T., Harabasz, J.: A dendrite method for cluster analysis. Communica- tions in Statistics-theory and Methods3(1), 1–27 (1974)

work page 1974

[14] [14]

IEEE transactions on pattern analysis and machine intelligence (2), 224–227 (1979)

Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence (2), 224–227 (1979)

work page 1979

[15] [15]

In: Classification and Data Analysis: Theory and Applications 28, pp

Dudek, A.: Silhouette index as clustering evaluation tool. In: Classification and Data Analysis: Theory and Applications 28, pp. 19–33 (2020). Springer

work page 2020

[16] [16]

Statistical Analysis and Data Mining: The ASA Data Science Journal3(2010)

Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: A comparative overview. Statistical Analysis and Data Mining: The ASA Data Science Journal3(2010)

work page 2010

[17] [17]

ArXivabs/2407.20246 (2024)

Hassan, B.A., Tayfor, N., Hassan, A.A., Ahmed, A.M., Hamad, R.K., Abdalla, N.N.: From a-to-z review of clustering validation indices. ArXivabs/2407.20246 (2024)

work page arXiv 2024

[18] [18]

Thorndike, R.L.: Who belongs in the family? Psychometrika18(4), 267–276 (1953)

work page 1953

[19] [19]

EURASIP Journal on Wireless Communications and Networking2021(2020)

Shi, C., Wei, B., Wei, S., Wang, W., Liu, H., Liu, J.: A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm. EURASIP Journal on Wireless Communications and Networking2021(2020)

work page 2020

[20] [20]

Journal of the Royal Statistical Society: Series B 17 (Statistical Methodology)63(2001)

Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B 17 (Statistical Methodology)63(2001)

work page 2001

[21] [21]

In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp

Shahapure, K.R., Nicholas, C.: Cluster quality analysis using silhouette score. In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp. 747–748 (2020). IEEE

work page 2020

[22] [22]

Neural Computation16, 1299–1323 (2004)

Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability-based validation of clustering solutions. Neural Computation16, 1299–1323 (2004)

work page 2004

[23] [23]

In: 2010 IEEE International Conference on Data Mining, pp

Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: 2010 IEEE International Conference on Data Mining, pp. 911–916 (2010). IEEE

work page 2010

[24] [24]

In: Inui, K., Jiang, J., Ng, V., Wan, X

Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP- IJCNLP), pp. 3982–3992. Association for Computati...

work page arXiv 2019