HIVE-COTE 2.0: a new meta ensemble for time series classification
Pith reviewed 2026-05-24 14:03 UTC · model grok-4.3
The pith
HIVE-COTE 2.0 replaces key ensemble members with TDE, DrCIF and Arsenal to raise accuracy on time series benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HIVE-COTE 2.0 forms its ensemble from classifiers drawn from multiple domains, including phase-independent shapelets, bag-of-words dictionaries and phase-dependent intervals. It now incorporates the new TDE and DrCIF components plus the Arsenal, an ensemble of ROCKET classifiers, and is reported to be significantly more accurate than prior state-of-the-art methods on 112 univariate UCR datasets and 26 multivariate UEA datasets.
What carries the argument
The HIVE-COTE heterogeneous meta-ensemble that votes across transformation-based classifiers from different domains, with TDE, DrCIF and Arsenal as the updated constituents that supply complementary signals.
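As a rough illustration of how a meta-ensemble can vote across heterogeneous components, the sketch below combines per-component class probabilities using weights derived from each component's estimated training accuracy. The weighting rule, the exponent `alpha = 4`, the component name `shapelet`, and all numbers are assumptions made for illustration; this page does not specify the exact scheme used in the paper.

```python
from typing import Dict, List


def weighted_vote(
    probas: Dict[str, List[float]],
    train_acc: Dict[str, float],
    alpha: float = 4.0,  # assumed exponent, not taken from this page
) -> List[float]:
    """Combine per-component class probabilities into one distribution.

    Each component's probabilities are scaled by its estimated training
    accuracy raised to `alpha`, summed per class, and renormalised.
    """
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, p in probas.items():
        weight = train_acc[name] ** alpha
        for c in range(n_classes):
            combined[c] += weight * p[c]
    total = sum(combined)
    return [v / total for v in combined]


# Hypothetical outputs of four components for one test series, two classes.
probas = {
    "shapelet": [0.60, 0.40],
    "TDE": [0.30, 0.70],
    "DrCIF": [0.20, 0.80],
    "Arsenal": [0.55, 0.45],
}
# Hypothetical accuracy estimates obtained on the training data only.
train_acc = {"shapelet": 0.80, "TDE": 0.90, "DrCIF": 0.85, "Arsenal": 0.88}

combined = weighted_vote(probas, train_acc)
predicted = max(range(len(combined)), key=combined.__getitem__)
print(predicted, [round(v, 3) for v in combined])
```

Raising the accuracy to a power above 1 widens the gap between strong and weak components, so a component that is only slightly better on the training data gets a noticeably larger say in the vote.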
If this is right
- HIVE-COTE 2.0 establishes a new accuracy reference point for both univariate and multivariate time series classification.
- The three new components can be combined with other base classifiers to produce further ensembles.
- Accuracy improvements hold across the full range of dataset characteristics represented in the UCR and UEA archives.
- The updated ensemble remains practical to run on standard hardware while delivering the reported gains.
Where Pith is reading between the lines
- The pattern of incremental replacement of ensemble members suggests that continued search for complementary representations could yield additional gains on the same benchmarks.
- Because the method aggregates information from shape, dictionary and interval domains, it may transfer to related sequence-labeling tasks outside strict classification.
- If the complementarity holds, similar meta-ensemble designs could be tested on streaming or irregularly sampled time series without major redesign.
Load-bearing premise
The accuracy gains from TDE, DrCIF and Arsenal arise from genuine complementarity rather than dataset-specific tuning or properties of the UCR and UEA collections.
What would settle it
A head-to-head evaluation on a fresh collection of time series datasets showing no statistically significant accuracy advantage for HIVE-COTE 2.0 over the previous leading methods would falsify the central claim.
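One simple way to run such a head-to-head check is a paired sign test over per-dataset accuracies. The sketch below is a stand-in chosen for brevity: this page does not say which statistical test the paper uses, and the accuracy values are invented for illustration.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(
    acc_a: Sequence[float], acc_b: Sequence[float]
) -> Tuple[int, int, float]:
    """Two-sided sign test on paired per-dataset accuracies.

    Returns (wins for A, wins for B, p-value). Tied datasets are dropped.
    """
    wins_a = sum(a > b for a, b in zip(acc_a, acc_b))
    wins_b = sum(b > a for a, b in zip(acc_a, acc_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Hypothetical accuracies of a new and an old method on eight fresh datasets.
acc_new = [0.91, 0.85, 0.78, 0.88, 0.93, 0.70, 0.81, 0.96]
acc_old = [0.89, 0.85, 0.75, 0.86, 0.90, 0.72, 0.79, 0.94]

wins_new, wins_old, p_value = sign_test(acc_new, acc_old)
print(wins_new, wins_old, p_value)
```

With only a handful of datasets the sign test has little power, so a non-significant result on a small collection would be weak evidence either way; a rank-based test over many datasets would be a more sensitive choice.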
Original abstract
The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble (TDE) and Diverse Representation Canonical Interval Forest (DrCIF), which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents HIVE-COTE 2.0, an updated meta-ensemble for time series classification. It introduces the Temporal Dictionary Ensemble (TDE) and Diverse Representation Canonical Interval Forest (DrCIF) as replacements for existing components, along with the Arsenal (an ensemble of ROCKET classifiers). The central claim is that HIVE-COTE 2.0 significantly outperforms the current state of the art on 112 univariate datasets from the UCR archive and 26 multivariate datasets from the UEA archive.
Significance. If the accuracy improvements are confirmed to stem from the new components rather than tuning artifacts, this would be a significant contribution to time series classification research, as HIVE-COTE has been a benchmark method since 2016. The new classifiers TDE and DrCIF add to the toolkit of transformation-based ensembles.
major comments (2)
- [Section 4 (Experimental Results)] The manuscript does not report an ablation study isolating the marginal accuracy contribution of TDE, DrCIF, and Arsenal (e.g., by successively adding or removing each while holding other hyperparameters fixed). This is load-bearing for the claim that the three components deliver complementary gains rather than collective hyperparameter optimization over the UCR/UEA collections.
- [Section 4 (Experimental Setup)] No description is given of the hyperparameter search procedure used for TDE, DrCIF, and Arsenal, nor whether search was performed on the same 112+26 datasets used for final reporting. This leaves open the possibility that reported wins are artifacts of benchmark-specific tuning.
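The kind of ablation the first major comment asks for can be sketched as a leave-one-out loop over ensemble members. The example below is a toy under stated assumptions: it uses an unweighted probability average rather than the paper's voting scheme, and the per-component probabilities and labels are hypothetical. A real ablation would re-evaluate on every benchmark dataset with hyperparameters held fixed.

```python
from typing import Dict, List, Tuple

Probas = List[List[float]]  # one class-probability vector per test case


def ensemble_accuracy(members: Dict[str, Probas], labels: List[int]) -> float:
    """Accuracy of an unweighted probability sum over `members`."""
    correct = 0
    for i, y in enumerate(labels):
        n_classes = len(next(iter(members.values()))[i])
        summed = [sum(p[i][c] for p in members.values()) for c in range(n_classes)]
        if max(range(n_classes), key=summed.__getitem__) == y:
            correct += 1
    return correct / len(labels)


def leave_one_out_ablation(
    members: Dict[str, Probas], labels: List[int]
) -> Tuple[float, Dict[str, float]]:
    """Return full-ensemble accuracy and the accuracy lost by removing each member."""
    full = ensemble_accuracy(members, labels)
    drops = {}
    for name in members:
        reduced = {k: v for k, v in members.items() if k != name}
        drops[name] = full - ensemble_accuracy(reduced, labels)
    return full, drops


# Hypothetical predictions of three components on four test cases, two classes.
labels = [0, 1, 1, 0]
members = {
    "TDE": [[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.7, 0.3]],
    "DrCIF": [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]],
    "Arsenal": [[0.7, 0.3], [0.65, 0.35], [0.3, 0.7], [0.55, 0.45]],
}

full, drops = leave_one_out_ablation(members, labels)
print(full, drops)
```

A member whose removal costs no accuracy (here the toy `Arsenal` entry) contributes nothing complementary on that dataset, which is exactly the pattern the referee wants reported across the whole archive.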
minor comments (2)
- [Abstract] The abstract states 'significantly more accurate' without naming the statistical test, correction for multiple comparisons, or significance threshold employed.
- Ensure all baseline implementations (including prior HIVE-COTE versions) are referenced to the same public codebase or repository to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate clarifications and additional analysis where appropriate.
- Referee: [Section 4 (Experimental Results)] The manuscript does not report an ablation study isolating the marginal accuracy contribution of TDE, DrCIF, and Arsenal (e.g., by successively adding or removing each while holding other hyperparameters fixed). This is load-bearing for the claim that the three components deliver complementary gains rather than collective hyperparameter optimization over the UCR/UEA collections.
  Authors: We agree that an explicit ablation study would strengthen the evidence for complementary contributions. The current results compare HIVE-COTE 2.0 to HIVE-COTE 1.0 and other methods, but do not isolate each new component. In the revised manuscript we will add an ablation study evaluating HIVE-COTE 1.0 augmented successively with TDE, DrCIF, and Arsenal (hyperparameters held fixed to those in the component papers) on the same 112+26 datasets.
  revision: yes
- Referee: [Section 4 (Experimental Setup)] No description is given of the hyperparameter search procedure used for TDE, DrCIF, and Arsenal, nor whether search was performed on the same 112+26 datasets used for final reporting. This leaves open the possibility that reported wins are artifacts of benchmark-specific tuning.
  Authors: We agree a description is needed. Hyperparameters for TDE, DrCIF and Arsenal follow the procedures in their original papers: independent per-dataset cross-validation on each training set only. No collective search or tuning across the full UCR/UEA collections occurred. We will insert this description into Section 4 of the revised manuscript.
  revision: yes
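The phrase "per-dataset cross-validation on each training set only" can be made concrete with a small sketch. The k-nearest-neighbour classifier, the candidate values of `k`, and the one-dimensional data below are hypothetical stand-ins; they are not the tuning procedures of TDE, DrCIF or Arsenal, which this page does not describe in detail. The point shown is only that the test split is never consulted while choosing the hyperparameter.

```python
from typing import List, Sequence


def knn_predict(train_x: List[float], train_y: List[int], x: float, k: int) -> int:
    """Majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs: List[float], ys: List[int], k: int, n_folds: int) -> float:
    """Cross-validated accuracy on (xs, ys); folds are assigned by index modulo."""
    correct = 0
    for fold in range(n_folds):
        kept = [i for i in range(len(xs)) if i % n_folds != fold]
        held = [i for i in range(len(xs)) if i % n_folds == fold]
        fit_x = [xs[i] for i in kept]
        fit_y = [ys[i] for i in kept]
        for i in held:
            correct += int(knn_predict(fit_x, fit_y, xs[i], k) == ys[i])
    return correct / len(xs)


def select_k(
    train_x: List[float],
    train_y: List[int],
    candidates: Sequence[int] = (1, 3, 5),
    n_folds: int = 3,
) -> int:
    """Pick k using the training split only."""
    return max(candidates, key=lambda k: cv_accuracy(train_x, train_y, k, n_folds))


# Hypothetical single dataset with its own train/test split.
train_x = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
test_x = [0.15, 0.35, 0.65, 0.85]
test_y = [0, 0, 1, 1]

best_k = select_k(train_x, train_y)
# The test split is used exactly once, after k has been fixed.
test_acc = sum(
    knn_predict(train_x, train_y, x, best_k) == y for x, y in zip(test_x, test_y)
) / len(test_x)
print(best_k, test_acc)
```

Repeating this independently for each dataset keeps the tuning local, which is what separates per-dataset selection from a collective search over the whole benchmark.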
Circularity Check
Empirical benchmark update with no circular derivation
This is an empirical machine-learning paper that proposes algorithmic updates (TDE, DrCIF, Arsenal) to an existing ensemble and reports accuracy on fixed external benchmark collections (UCR/UEA). No equations, uniqueness theorems, or derivations are present that could reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. The central claims rest on direct experimental comparison rather than any closed-loop construction, satisfying the self-contained-against-external-benchmarks criterion.
Axiom & Free-Parameter Ledger
free parameters (1)
- Hyperparameters of TDE, DrCIF, and Arsenal
axioms (1)
- Domain assumption: performance on the UCR and UEA archives is a reliable indicator of general time series classification superiority.
Forward citations
Cited by 1 Pith paper
- Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series
  Soft-MSM is a smooth, gradient-enabled version of the context-aware MSM distance for time series alignment that outperforms Soft-DTW alternatives in clustering and nearest-centroid classification.