Associativity-Peakiness Metric for Contingency Tables
Pith reviewed 2026-05-08 12:20 UTC · model grok-4.3
The pith
A new Associativity Peakiness metric for contingency tables provides higher dynamic range and computational efficiency than existing measures for evaluating clustering algorithms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Associativity Peakiness metric characterizes aspects of clustering algorithm performance that are critical for predicting deployed behavior, analogous to quality measures for supervised learning confusion matrices, by delivering higher dynamic range than publicly available metrics while requiring less computation.
What carries the argument
The Associativity Peakiness (AP) metric, which measures the peakiness of associations within contingency tables to expose detailed clustering performance features not captured by vector-based alternatives.
If this is right
- Clustering algorithms can be ranked more precisely when their table outputs are scored with this metric instead of vector-pair alternatives.
- Comparative studies of clustering methods gain the ability to identify subtle performance differences that standard metrics overlook.
- Faster computation allows repeated evaluations during algorithm tuning without added overhead.
- Deployment forecasts for clustering systems become more accurate by incorporating table-specific features.
Where Pith is reading between the lines
- The metric might extend naturally to evaluating other table-based outputs, such as those from association rule mining or community detection.
- Standard benchmarking libraries could incorporate it to replace or supplement current clustering scores for greater sensitivity.
- Testing on edge cases like highly imbalanced tables would clarify whether the dynamic range holds across all realistic scenarios.
- If adopted, it could shift focus from vector metrics to table metrics in algorithm papers, surfacing previously hidden error patterns.
Load-bearing premise
The 500 simulated contingency tables adequately represent the range of outputs from real clustering algorithms, and higher dynamic range directly improves predictions of deployed performance without further checks on actual datasets.
What would settle it
A direct comparison on real-world clustering datasets showing that the AP metric correlates no better with independent performance indicators than existing metrics would disprove the claimed advantage in dynamic range.
read the original abstract
For the use case of comparing the performance of clustering algorithms whose output is a contingency table, a single performance metric for contingency tables is needed. Such a metric is vital for comparative performance analysis of clustering algorithms. A survey of publicly available literature did not show the presence of such a metric. Metrics do exist for vector pairs of truth values and predicted values, which are an alternative form of output of clustering algorithms. However, the metrics for vector pairs do not reveal the presence of detailed features that are apparent in contingency tables. This paper presents the Associativity Peakiness (AP) metric, which characterizes aspects of clustering algorithm performance that are critical for predicting a clustering algorithm's performance when deployed. The AP metric is analogous to measures of quality for confusion matrices that are outputs of supervised learning algorithms. This paper presents results from simulations in which 500 contingency tables were generated for multiple test scenarios. The results show that for the use case of evaluating clustering algorithms, the AP metric characterizes performance of contingency tables with higher dynamic range than publicly available metrics, and that it is computationally more efficient than comparable publicly available metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Associativity-Peakiness (AP) metric as a single performance measure for contingency tables arising from clustering algorithms. It surveys existing metrics for vector-pair outputs and argues that they fail to capture detailed features visible in contingency tables. The paper presents simulation results from 500 generated contingency tables across multiple test scenarios, claiming that AP exhibits higher dynamic range than publicly available metrics and is computationally more efficient, making it better suited for predicting deployed clustering performance.
Significance. A well-defined metric with demonstrably higher dynamic range for contingency tables could aid comparative evaluation of clustering algorithms where table structure (e.g., off-diagonal associations) matters. The simulation-based approach is a reasonable starting point, but the absence of real clustering outputs, explicit definitions, and statistical validation means the claimed advantages remain untested against deployed use cases. If the metric were shown to correlate with actual performance on real data, it would fill a noted gap; as presented, its significance is limited by the simulation-only evidence.
major comments (4)
- The manuscript provides no explicit mathematical definition or formula for the AP metric (e.g., how associativity and peakiness are computed from a contingency table). Without this, the central claims about dynamic range and efficiency cannot be verified or reproduced, and the analogy to confusion-matrix measures remains ungrounded.
- The procedure for generating the 500 simulated contingency tables is not described (e.g., distribution parameters, cluster sizes, noise models, or how they represent real clustering outputs). This makes it impossible to assess whether the reported higher dynamic range is an artifact of the simulation design rather than a general property.
- No results from actual clustering runs on real datasets are presented, nor is there any validation that higher dynamic range on simulations predicts deployed performance. The claim that AP is 'vital for comparative performance analysis' therefore rests on untested extrapolation from synthetic tables.
- No error bars, variance estimates, or statistical tests are reported for the dynamic-range comparisons, nor are baseline metrics and their implementations specified. This undermines the efficiency and superiority assertions.
minor comments (2)
- The abstract and text should include at least one concrete example contingency table with the computed AP value to illustrate the metric.
- Clarify the exact definition of 'dynamic range' used in the comparisons (e.g., score range, variance, or separation metric) and cite the specific publicly available metrics being compared.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, indicating where revisions will be made to improve clarity, reproducibility, and scope. Our responses focus on strengthening the manuscript without altering its core simulation-based contribution.
read point-by-point responses
-
Referee: The manuscript provides no explicit mathematical definition or formula for the AP metric (e.g., how associativity and peakiness are computed from a contingency table). Without this, the central claims about dynamic range and efficiency cannot be verified or reproduced, and the analogy to confusion-matrix measures remains ungrounded.
Authors: We agree that an explicit mathematical definition is required for verification and reproducibility. The submitted version omitted a dedicated derivation section. We will add the full formulas for associativity (normalized association measure across table entries) and peakiness (concentration metric on diagonal and off-diagonal elements), along with the combined AP expression and pseudocode for computation from a contingency table. revision: yes
-
Referee: The procedure for generating the 500 simulated contingency tables is not described (e.g., distribution parameters, cluster sizes, noise models, or how they represent real clustering outputs). This makes it impossible to assess whether the reported higher dynamic range is an artifact of the simulation design rather than a general property.
Authors: We will expand the simulation methodology section with complete details on the generation process, including the specific distributions for cluster sizes, noise injection models, parameter ranges, and the rationale for how the scenarios capture structural features typical of real clustering contingency tables. revision: yes
-
Referee: No results from actual clustering runs on real datasets are presented, nor is there any validation that higher dynamic range on simulations predicts deployed performance. The claim that AP is 'vital for comparative performance analysis' therefore rests on untested extrapolation from synthetic tables.
Authors: We acknowledge the value of real-data validation for stronger claims about deployed performance. The present work is intentionally a controlled simulation study to isolate metric properties across diverse table structures. We will revise the discussion and abstract to clarify the simulation scope, better map scenarios to real clustering characteristics, and qualify the 'vital' claim as applying to the metric's demonstrated properties in synthetic settings, with a note that real-dataset experiments are planned for follow-up work. revision: partial
-
Referee: No error bars, variance estimates, or statistical tests are reported for the dynamic-range comparisons, nor are baseline metrics and their implementations specified. This undermines the efficiency and superiority assertions.
Authors: We will update all comparison figures to include error bars and variance estimates across the 500 tables, add appropriate statistical tests (e.g., paired comparisons with significance levels), and explicitly list the baseline metrics along with their public implementations or references used in the experiments. revision: yes
Circularity Check
No circularity: newly defined metric with independent simulation evaluation
full rationale
The paper defines the Associativity-Peakiness (AP) metric from first principles as a new characterization tool for contingency tables and evaluates it directly on 500 independently generated simulated tables. No equations, derivations, or load-bearing steps are shown that reduce the metric or its claimed advantages (higher dynamic range, efficiency) to fitted parameters, self-citations, or ansatzes. The central claim is an empirical comparison on the provided simulations rather than a prediction forced by the definition itself. This is self-contained against external benchmarks with no reduction to inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Journal of the Royal Statistical Society
Altham, P.M.E.: The measurement of association in a contingency table: three extensions of the cross-ratios. Journal of the Royal Statistical Society. Series B (Methodological)32(3), 395–407 (1970)
work page 1970
-
[2]
Journal of the Royal Statistical Society
Altham, P.M.E.: The measurement of association of rows and columns for an r×s contingency table. Journal of the Royal Statistical Society. Series B (Methodological) 32(1), 53–73 (1970) ArcGIS Architecture Center: View the output confusion matrix. https://pro.arcgis. com/en/pro-app/latest/help/analysis/image-analyst/accuracy-assessment.htm (2024)
work page 1970
-
[3]
https:// www.statology.org/what-is-a-strong-correlation/ (2020)
Bobbitt, Z.: What is considered to be a “strong” correlation? – statology. https:// www.statology.org/what-is-a-strong-correlation/ (2020)
work page 2020
-
[4]
Bouchet-Valat, M.: General marginal-free association indices for contingency tables: from the altham index to the intrinsic association coefficient. Sociological Methods & Research51(1), 203–236 (2022) Data Science With Chris: Evaluate clustering algorithms. https:// datasciencewithchris.com/evaluate-clustering-algorithms/ (2023)
work page 2022
-
[5]
Dickinson, M.: Python - generate random numbers summing to a pre- defined value - stack overflow. https://stackoverflow.com/questions/3589214/ generate-random-numbers-summing-to-a-predefined-value (2023)
-
[6]
In: Confer- ence on Uncertainty in Artificial Intelligence (2002)
Dom, B.E.: An information-theoretic external cluster-validity measure. In: Confer- ence on Uncertainty in Artificial Intelligence (2002). https://api.semanticscholar. org/CorpusID:402174 37
work page 2002
-
[7]
Pfitzner, D., Leibbrandt, R., Powers, D.: Characterization and evaluation of similarity measures for pairs of clusterings. Knowledge and Information Systems19(3), 361– 394 (2009) Python Software Foundation: Random - generate pseudo-random numbers - Python 3.11.4. documentation. https://docs.python.org/3/library/random.html#random. choice (2023)
work page 2009
-
[8]
Rosenberg, A., Hirschberg, J.B.: V-measure: a conditional entropy-based external cluster evaluation measure. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Proceedings of EMNLP, Prague (2007) Scikit-Learn Developers: 2.3 Clustering - scikit-learn 1.3.0 documentation; 2.3.11.4 Fowlkes-M...
work page 2007
-
[9]
International Journal of Computer Applications17(1), 25–30 (2011)
Sayal, R., Kumar, D.V.V.: A novel similarity measure for clustering categorical data sets. International Journal of Computer Applications17(1), 25–30 (2011)
work page 2011
-
[10]
Beahvior Research Methods55, 3326–3347 (2023)
Sliveira, P.S.P., Siqueira, J.O.: Better to be in agreement than in bad company. Beahvior Research Methods55, 3326–3347 (2023)
work page 2023
-
[11]
In: International Conference on Machine Learning, pp
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings com- parison. In: International Conference on Machine Learning, pp. 1073–1080 (2009). https://doi.org/10.1145/1553374.1553511
-
[12]
Technical Report AD1216913, Defense Technical Information Center (December 2023)
Zirkind, N.E., Diehl, W.J.: Associativity–peakiness metrics for contingency tables. Technical Report AD1216913, Defense Technical Information Center (December 2023). https://discover.dtic.mil/technical-reports/ 38
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.