Modern hierarchical, agglomerative clustering algorithms
This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise dissimilarities between data points, but extensions to vector data are also discussed; (2) the output is a "stepwise dendrogram", a data structure which is shared by all implementations in current standard software. We present algorithms (old and new) which perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) We present a new algorithm which is suitable for any distance update scheme and performs significantly better than the existing algorithms. (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which is necessary in each case for different reasons. (3) We give well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
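To make the setting in the abstract concrete, here is a minimal sketch of the input and output it describes: a symmetric matrix of pairwise dissimilarities goes in, and a list of merge steps (a stepwise dendrogram) comes out. This is a naive cubic-time baseline for illustration only, not one of the paper's algorithms. The function name `stepwise_dendrogram`, the tie-breaking rule, and the labelling convention (the cluster formed at step i gets label n + i, as in SciPy-style linkage output) are assumptions made for this example.

```python
def stepwise_dendrogram(d, method="single"):
    """Naive agglomerative clustering on a symmetric dissimilarity matrix.

    d      : n x n list of lists of pairwise dissimilarities
    method : "single", "complete", or "average" (distance update scheme)

    Returns n - 1 tuples (a, b, delta, size): clusters a and b were merged
    at dissimilarity delta into a cluster of `size` points. Labelling the
    new cluster n + step is an assumption borrowed from SciPy-style output.
    """
    if method not in ("single", "complete", "average"):
        raise ValueError("unknown method: " + method)
    n = len(d)
    size = {i: 1 for i in range(n)}  # active cluster label -> point count
    # Dissimilarities between active clusters, keyed by (smaller, larger) label.
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    steps = []
    for step in range(n - 1):
        # Globally closest pair; ties broken by smallest labels (assumption).
        (a, b), delta = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        new = n + step
        na, nb = size.pop(a), size.pop(b)
        del dist[(a, b)]
        for c in size:  # every remaining cluster gets an updated distance
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            if method == "single":
                dnew = min(dac, dbc)
            elif method == "complete":
                dnew = max(dac, dbc)
            else:  # average linkage, weighted by cluster sizes
                dnew = (na * dac + nb * dbc) / (na + nb)
            dist[(c, new)] = dnew  # c < new always holds
        size[new] = na + nb
        steps.append((a, b, delta, na + nb))
    return steps


# Four points on a line at 0, 1, 3, 7; dissimilarity is absolute difference.
example = [[0, 1, 3, 7],
           [1, 0, 2, 6],
           [3, 2, 0, 4],
           [7, 6, 4, 0]]
print(stepwise_dendrogram(example, "single"))
# [(0, 1, 1, 2), (2, 4, 2, 3), (3, 5, 4, 4)]
```

Each step rescans all remaining pairs, which is what makes this sketch cubic in the number of points; the abstract's claim is that better algorithms exist for the same input and output contract.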
Forward citations
Cited by 11 Pith papers
- Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering
  Claim2Vec is a contrastively fine-tuned multilingual encoder that improves claim clustering performance and embedding space structure on multilingual fact-check datasets.
- No-regret optimization of time-varying bilevel problems
  W-SparQ-BL models time-varying lower-level responses with multi-output GPs and sparse approximations to achieve sublinear dynamic regret in bilevel optimization under noise.
- VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection
  VecCISC filters equivalent, degenerate, or hallucinated reasoning traces via semantic clustering before critic evaluation, reducing token use by 47% with no loss in accuracy versus standard CISC.
- Multiscale Cochran-Mantel-Haenszel Scanning for Conditional Dependency
  Multiscale CMH scanning generalizes the classic test to continuous spaces, achieving consistency for conditional independence testing by conditioning on marginal order statistics without requiring large stratum sizes.
- ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis
  ClusterChirp is a freely available web tool for scalable interactive visualization, hierarchical clustering, and natural-language-guided analysis of high-dimensional omics datasets.
- Causal Unsupervised Semantic Segmentation
  CAUSE uses frontdoor adjustment with a discretized concept clusterbook mediator to perform unsupervised semantic segmentation and reports state-of-the-art results.
- CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
  CVSearch proposes an Assess-then-Search workflow combining expert-assisted search with Semantic Guided Adaptive Patching and Dynamic Bottom-Up Search to improve efficiency and accuracy on high-resolution image tasks f...
- GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion
  GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabu...
- Improve Large Language Model Systems with User Logs
  UNO distills user logs into semi-structured rules and preferences, applies query-and-feedback clustering to handle heterogeneity, quantifies cognitive gaps to filter noise, and builds primary and reflective modules th...
- Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming
  OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
- Graph-based Complexity Forecasts in UK En Route Airspace Using Relevant Aircraft Interactions
  Graph-based probabilistic forecasts of relevant aircraft pairs in UK airspace achieve higher correlation (ρ=0.68) with workload proxies than standard volume predictions (ρ=0.55).