Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering

Anthony Bagnall; Christopher Holder

arxiv: 2411.17838 · v1 · submitted 2024-11-26 · 💻 cs.LG


Christopher Holder, Anthony Bagnall

Pith reviewed 2026-05-23 08:13 UTC · model grok-4.3

classification 💻 cs.LG

keywords: time series clustering · k-means · MSM distance · stochastic subgradient · barycentre averaging · elastic distance · clustering algorithm


The pith

KASBA is a k-means time series clusterer using MSM distance, stochastic subgradient barycentres and metric pruning for better accuracy at much higher speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KASBA as a new time series clustering method built on k-means. It applies the Move-Split-Merge elastic distance at every step, computes centroids via randomised stochastic subgradient descent, connects successive clustering stages to speed convergence, and skips many distance calculations by using the metric properties of MSM. The design lets users trade off runtime against quality. Experiments show it outperforms faster existing clusterers on benchmark accuracy while running orders of magnitude faster than the most accurate prior k-means variants.
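
For orientation, the Move-Split-Merge distance mentioned above can be sketched with the standard dynamic-programming recurrence from the MSM literature. This is a minimal illustration, not the authors' implementation: the function names `msm_cost` and `msm_distance` and the default split/merge penalty `c=1.0` are assumptions made for this sketch, and the paper's own parameter choices are not stated in the text available here.

```python
def msm_cost(new, a, b, c):
    """Cost of a split/merge that introduces `new` next to values a and b."""
    # If `new` lies between its neighbours, only the fixed penalty c applies.
    if a <= new <= b or b <= new <= a:
        return c
    # Otherwise add the gap to the nearer neighbour.
    return c + min(abs(new - a), abs(new - b))


def msm_distance(x, y, c=1.0):
    """Move-Split-Merge distance between two univariate series (O(n*m))."""
    n, m = len(x), len(y)
    D = [[0.0] * m for _ in range(n)]
    D[0][0] = abs(x[0] - y[0])
    # First column and first row: only merge/split operations are possible.
    for i in range(1, n):
        D[i][0] = D[i - 1][0] + msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        D[0][j] = D[0][j - 1] + msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            D[i][j] = min(
                D[i - 1][j - 1] + abs(x[i] - y[j]),                  # move
                D[i - 1][j] + msm_cost(x[i], x[i - 1], y[j], c),     # merge
                D[i][j - 1] + msm_cost(y[j], x[i], y[j - 1], c),     # split
            )
    return D[n - 1][m - 1]
```

Because MSM satisfies the triangle inequality, a distance computed this way can feed the kind of metric pruning the summary describes; a pure-Python version like this one is only suitable for short series.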

Core claim

KASBA produces significantly better clustering than the faster state of the art clusterers and offers orders of magnitude improvement in run time over the most performant k-means alternatives.

What carries the argument

The KASBA algorithm, which integrates MSM distance throughout clustering, randomised stochastic subgradient descent for barycentre centroids, stage linking for convergence, and metric-based distance pruning.

If this is right

Practitioners gain a tunable method that can serve as a fast preprocessing step for anomaly detection or segmentation on time series.
Clustering quality no longer forces a choice between slow accurate k-means and fast but low-quality alternatives.
The same acceleration techniques can be applied to other elastic distances that satisfy the metric properties used for pruning.
Stage linking reduces the number of full iterations needed, lowering total distance computations across repeated runs.
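
The metric-pruning idea in the points above can be illustrated with a generic triangle-inequality test in the k-means assignment step. This is a sketch under stated assumptions, not KASBA's actual rule, which the available text does not spell out: the helper names `manhattan` and `assign_with_pruning` are hypothetical, and Manhattan distance stands in for MSM purely to keep the example short. Any distance that is a true metric could be passed as `dist`.

```python
def manhattan(a, b):
    """Stand-in metric for the demo; any true metric could replace it."""
    return sum(abs(p - q) for p, q in zip(a, b))


def assign_with_pruning(series, centroids, dist=manhattan):
    """Assign each series to its nearest centroid, skipping provably
    useless distance calls via the triangle inequality."""
    k = len(centroids)
    # Centroid-to-centroid distances are computed once per assignment pass.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for x in series:
        best, best_d = 0, dist(x, centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then
            # d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best),
            # so centroid j cannot be strictly closer and is skipped.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The bound is only valid when `dist` obeys the triangle inequality, which is why this style of pruning does not transfer to non-metric elastic distances without modification.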

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar stochastic subgradient and pruning ideas could reduce the cost of other centroid-based methods that currently rely on full pairwise distances.
If the observed speedups hold on streaming data, KASBA could support online clustering tasks where prior accurate methods are too slow.
The balance between runtime and accuracy may encourage wider use of time series clustering as an exploratory tool rather than only in offline batch settings.

Load-bearing premise

The experimental benchmarks used to demonstrate accuracy and runtime gains are representative of the real-world time series distributions on which practitioners would deploy the algorithm.

What would settle it

A head-to-head test on a fresh collection of large, diverse time series datasets where KASBA shows neither clear accuracy gains over fast baselines nor clear runtime gains over accurate k-means variants.

Original abstract

Time series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the $k$-means (K) accelerated (A) Stochastic subgradient (S) Barycentre (B) Average (A) (KASBA) clustering algorithm. KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies a randomised stochastic subgradient gradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence and exploits the metric property of MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world TSCL applications. It allows practitioners to balance run time and clustering performance. We demonstrate through extensive experimentation that KASBA produces significantly better clustering than the faster state of the art clusterers and offers orders of magnitude improvement in run time over the most performant $k$-means alternatives.
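
To make the phrase "stochastic subgradient" barycentre averaging more concrete, the sketch below shows one generic form of the technique. It deliberately swaps in DTW with squared pointwise cost in place of MSM, because the MSM-specific update used by KASBA is not given in the abstract. The names `dtw_path` and `ssg_barycentre`, the decaying step size, and the choice to initialise from the first series are all assumptions made for this illustration.

```python
import random


def dtw_path(a, b):
    """DTW with squared pointwise cost; returns (cost, alignment path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[n][m], path


def ssg_barycentre(series, epochs=30, lr=0.05, seed=0):
    """Stochastic subgradient averaging: one series per update, random order."""
    rng = random.Random(seed)
    centre = list(series[0])  # assumption: initialise from the first series
    order = list(range(len(series)))
    for epoch in range(epochs):
        rng.shuffle(order)
        step = lr / (1 + epoch)  # simple decaying step size (an assumption)
        for idx in order:
            _, path = dtw_path(centre, series[idx])
            # Subgradient of the squared-cost DTW with respect to the centre:
            # each centre point is pulled towards the points aligned to it.
            grad = [0.0] * len(centre)
            for ci, xj in path:
                grad[ci] += 2.0 * (centre[ci] - series[idx][xj])
            centre = [c - step * g for c, g in zip(centre, grad)]
    return centre
```

Each update touches a single series rather than the whole cluster, which is the source of the speed advantage over full barycentre averaging; the price is a noisier path to the centroid that depends on the step-size schedule.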

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

KASBA is a new k-means variant using MSM distance, stochastic subgradient barycentres, stage linking and metric pruning, but the abstract supplies zero data or implementation details to support its speed and accuracy claims.


The paper describes KASBA as a k-means algorithm that applies MSM distance at every stage, uses randomised stochastic subgradient descent for barycentre centroids, links clustering stages for faster convergence, and prunes distance calculations via the metric property of MSM. This specific combination inside a single pipeline appears new based on the abstract. It directly targets the known practical tradeoff in time series clustering where fast methods tend to be inaccurate and accurate methods tend to be slow, and it positions the algorithm as tunable for real-world use. That framing is clear and useful on its own terms. The main weakness is that every performance claim rests on unspecified extensive experimentation. No datasets, no accuracy metrics such as ARI or NMI, no runtime tables, no baseline implementations, and no ablation results are provided. Without those, it is impossible to judge whether the reported gains are real, whether the benchmarks are representative, or whether the comparisons are fair. The assumption that the test problems reflect distributions practitioners actually encounter cannot be checked. This work would interest researchers already working on time series clustering who want to see the full experimental section and code. Based on the abstract alone it does not yet have enough substance to justify sending to peer review.
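
As a reminder of what one of the missing metrics would measure, the adjusted Rand index (ARI) can be computed from the contingency counts between true and predicted labels. The sketch below uses the standard pair-counting formula; the function name `adjusted_rand_index` is chosen for this example, and nothing here reflects how the paper itself evaluates clusterings.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index via pair counting (1.0 means identical partitions)."""
    n = len(true_labels)
    if n != len(pred_labels):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs placed together in both partitions, and in each one separately.
    together = sum(comb(v, 2) for v in Counter(zip(true_labels, pred_labels)).values())
    in_true = sum(comb(v, 2) for v in Counter(true_labels).values())
    in_pred = sum(comb(v, 2) for v in Counter(pred_labels).values())
    expected = in_true * in_pred / comb(n, 2)  # chance-level agreement
    max_index = 0.5 * (in_true + in_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together - expected) / (max_index - expected)
```

ARI is invariant to how cluster labels are numbered, so `[0, 0, 1, 1]` against `[1, 1, 0, 0]` scores 1.0, while values near 0 indicate agreement no better than chance.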

Referee Report

2 major / 1 minor

Summary. The manuscript introduces KASBA, a k-means time series clustering algorithm that applies the Move-Split-Merge (MSM) elastic distance at all stages, uses randomised stochastic subgradient descent for barycentre centroid computation, links clustering stages to accelerate convergence, and exploits MSM's metric property for distance pruning. It claims that extensive experimentation shows KASBA yields significantly better clustering than faster state-of-the-art methods while delivering orders-of-magnitude runtime gains over performant k-means alternatives, making it suitable for real-world applications.

Significance. If the empirical claims were substantiated, KASBA could represent a meaningful advance in scalable time series clustering by reconciling accuracy with efficiency through consistent use of an elastic distance and stochastic optimization, potentially benefiting downstream tasks such as anomaly detection and segmentation. The approach of combining MSM with subgradient barycentres and pruning is technically interesting, but the complete absence of any experimental evidence in the manuscript prevents any determination of whether these benefits are realized or generalizable.

major comments (2)

[Abstract] Abstract: The central claims that 'extensive experimentation' demonstrates significantly better clustering than faster SOTA clusterers and orders-of-magnitude runtime improvements over performant k-means alternatives are unsupported by any datasets, metrics (e.g., ARI, NMI), runtime tables, baseline implementations, ablation results, or statistical tests in the provided manuscript. This directly prevents evaluation of the primary contribution.
[Abstract] Abstract: The algorithm is described only at the level of component names (MSM distance at all stages, randomised stochastic subgradient gradient descent for barycentres, stage linking, metric pruning) with no equations, pseudocode, convergence analysis, or implementation details, rendering it impossible to verify correctness, novelty, or how the claimed efficiency is achieved.

minor comments (1)

[Abstract] Abstract: The forced acronym expansion for KASBA repeats the letter 'A' and is presented in a manner that may reduce readability.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for their review and for identifying the core issues with the manuscript as presented. We acknowledge that the provided text consists solely of the abstract, which contains unsubstantiated claims and lacks technical detail. Below we respond to each major comment.


Referee: [Abstract] Abstract: The central claims that 'extensive experimentation' demonstrates significantly better clustering than faster SOTA clusterers and orders-of-magnitude runtime improvements over performant k-means alternatives are unsupported by any datasets, metrics (e.g., ARI, NMI), runtime tables, baseline implementations, ablation results, or statistical tests in the provided manuscript. This directly prevents evaluation of the primary contribution.

Authors: We agree that the abstract alone provides no supporting evidence. The manuscript excerpt supplied here contains only the abstract and therefore cannot substantiate the claims of extensive experimentation, specific metrics, runtime tables, or statistical tests. Without the full paper body, these claims remain unsupported in the material under review.
revision: yes

Referee: [Abstract] Abstract: The algorithm is described only at the level of component names (MSM distance at all stages, randomised stochastic subgradient gradient descent for barycentres, stage linking, metric pruning) with no equations, pseudocode, convergence analysis, or implementation details, rendering it impossible to verify correctness, novelty, or how the claimed efficiency is achieved.

Authors: We agree that the abstract provides only high-level component names and no equations, pseudocode, or analysis. The supplied manuscript text is limited to the abstract, so no such technical details are present and the referee cannot verify the algorithm.
revision: yes

standing simulated objections not resolved

Absence of any experimental results, datasets, metrics, or statistical tests (only abstract available)
Absence of equations, pseudocode, convergence analysis, or implementation details (only abstract available)

Circularity Check

0 steps flagged

No circularity; performance claims rest on external experiments with no internal derivations or self-referential reductions

full rationale

The provided text consists solely of the abstract with no equations, derivations, fitted parameters, or mathematical claims. The central assertion of superior clustering and runtime performance is justified solely by reference to 'extensive experimentation' whose results are not shown. No self-citations, ansatzes, uniqueness theorems, or renamings appear. Because there is no derivation chain at all, none of the enumerated circularity patterns can be exhibited by quoting the paper and showing a reduction to inputs by construction. The paper is therefore self-contained against the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no equations, parameters or background assumptions are stated, so the ledger is empty.

pith-pipeline@v0.9.0 · 5764 in / 1055 out tokens · 19605 ms · 2026-05-23T08:13:55.064130+00:00 · methodology
