pith. sign in

arxiv: 2104.07551 · v1 · submitted 2021-04-15 · 💻 cs.LG

HIVE-COTE 2.0: a new meta ensemble for time series classification

Pith reviewed 2026-05-24 14:03 UTC · model grok-4.3

classification 💻 cs.LG
keywords time series classificationensemble methodsHIVE-COTEUCR archiveUEA archiveROCKETmeta-ensembledictionary ensemble
0
0 comments X

The pith

HIVE-COTE 2.0 replaces key ensemble members with TDE, DrCIF and Arsenal to raise accuracy on time series benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper updates the HIVE-COTE meta-ensemble for time series classification by introducing the Temporal Dictionary Ensemble and Diverse Representation Canonical Interval Forest as replacements for prior members and adding an Arsenal of ROCKET classifiers. It reports that the resulting HIVE-COTE 2.0 achieves statistically higher accuracy than the previous state of the art across 112 univariate datasets from the UCR archive and 26 multivariate datasets from the UEA archive. A reader would care because time series classification supports applications in medicine, finance and monitoring, where even modest accuracy gains can improve downstream decisions. The changes target both accuracy and practical usability of the ensemble.

Core claim

HIVE-COTE 2.0 forms its ensemble from classifiers drawn from multiple domains including phase-independent shapelets, bag-of-words dictionaries and phase-dependent intervals, now incorporating the new TDE and DrCIF components plus the Arsenal ensemble of ROCKET classifiers, and demonstrates significantly higher accuracy than prior state-of-the-art methods on the stated UCR and UEA collections.

What carries the argument

The HIVE-COTE heterogeneous meta-ensemble that votes across transformation-based classifiers from different domains, with TDE, DrCIF and Arsenal as the updated constituents that supply complementary signals.

If this is right

  • HIVE-COTE 2.0 establishes a new accuracy reference point for both univariate and multivariate time series classification.
  • The three new components can be combined with other base classifiers to produce further ensembles.
  • Accuracy improvements hold across the full range of dataset characteristics represented in the UCR and UEA archives.
  • The updated ensemble remains practical to run on standard hardware while delivering the reported gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pattern of incremental replacement of ensemble members suggests that continued search for complementary representations could yield additional gains on the same benchmarks.
  • Because the method aggregates information from shape, dictionary and interval domains, it may transfer to related sequence-labeling tasks outside strict classification.
  • If the complementarity holds, similar meta-ensemble designs could be tested on streaming or irregularly sampled time series without major redesign.

Load-bearing premise

The accuracy gains from TDE, DrCIF and Arsenal arise from genuine complementarity rather than dataset-specific tuning or properties of the UCR and UEA collections.

What would settle it

A head-to-head evaluation on a fresh collection of time series datasets showing no statistically significant accuracy advantage for HIVE-COTE 2.0 over the previous leading methods would falsify the central claim.

read the original abstract

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble (TDE) and Diverse Representation Canonical Interval Forest (DrCIF), which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents HIVE-COTE 2.0, an updated meta-ensemble for time series classification. It introduces the Temporal Dictionary Ensemble (TDE) and Diverse Representation Canonical Interval Forest (DrCIF) as replacements for existing components, along with the Arsenal (an ensemble of ROCKET classifiers). The central claim is that HIVE-COTE 2.0 significantly outperforms the current state of the art on 112 univariate datasets from the UCR archive and 26 multivariate datasets from the UEA archive.

Significance. If the accuracy improvements are confirmed to stem from the new components rather than tuning artifacts, this would be a significant contribution to time series classification research, as HIVE-COTE has been a benchmark method since 2016. The new classifiers TDE and DrCIF add to the toolkit of transformation-based ensembles.

major comments (2)
  1. [Section 4 (Experimental Results)] The manuscript does not report an ablation study isolating the marginal accuracy contribution of TDE, DrCIF, and Arsenal (e.g., by successively adding or removing each while holding other hyperparameters fixed). This is load-bearing for the claim that the three components deliver complementary gains rather than collective hyperparameter optimization over the UCR/UEA collections.
  2. [Section 4 (Experimental Setup)] No description is given of the hyperparameter search procedure used for TDE, DrCIF, and Arsenal, nor whether search was performed on the same 112+26 datasets used for final reporting. This leaves open the possibility that reported wins are artifacts of benchmark-specific tuning.
minor comments (2)
  1. [Abstract] The abstract states 'significantly more accurate' without naming the statistical test, correction for multiple comparisons, or significance threshold employed.
  2. Ensure all baseline implementations (including prior HIVE-COTE versions) are referenced to the same public codebase or repository to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate clarifications and additional analysis where appropriate.

read point-by-point responses
  1. Referee: [Section 4 (Experimental Results)] The manuscript does not report an ablation study isolating the marginal accuracy contribution of TDE, DrCIF, and Arsenal (e.g., by successively adding or removing each while holding other hyperparameters fixed). This is load-bearing for the claim that the three components deliver complementary gains rather than collective hyperparameter optimization over the UCR/UEA collections.

    Authors: We agree that an explicit ablation study would strengthen the evidence for complementary contributions. The current results compare HIVE-COTE 2.0 to HIVE-COTE 1.0 and other methods, but do not isolate each new component. In the revised manuscript we will add an ablation study evaluating HIVE-COTE 1.0 augmented successively with TDE, DrCIF, and Arsenal (hyperparameters held fixed to those in the component papers) on the same 112+26 datasets. revision: yes

  2. Referee: [Section 4 (Experimental Setup)] No description is given of the hyperparameter search procedure used for TDE, DrCIF, and Arsenal, nor whether search was performed on the same 112+26 datasets used for final reporting. This leaves open the possibility that reported wins are artifacts of benchmark-specific tuning.

    Authors: We agree a description is needed. Hyperparameters for TDE, DrCIF and Arsenal follow the procedures in their original papers: independent per-dataset cross-validation on each training set only. No collective search or tuning across the full UCR/UEA collections occurred. We will insert this description into Section 4 of the revised manuscript. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark update with no circular derivation

full rationale

This is an empirical machine-learning paper that proposes algorithmic updates (TDE, DrCIF, Arsenal) to an existing ensemble and reports accuracy on fixed external benchmark collections (UCR/UEA). No equations, uniqueness theorems, or derivations are present that could reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. The central claims rest on direct experimental comparison rather than any closed-loop construction, satisfying the self-contained-against-external-benchmarks criterion.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review limited to abstract; full methods section unavailable. Free parameters and axioms inferred at high level from typical ensemble practice and benchmark reliance.

free parameters (1)
  • Hyperparameters of TDE, DrCIF, and Arsenal
    Ensemble classifiers of this type require numerous tunable parameters fitted during training on the target data.
axioms (1)
  • domain assumption Performance on the UCR and UEA archives is a reliable indicator of general time series classification superiority.
    The central accuracy claim rests entirely on results from these standard benchmark collections.

pith-pipeline@v0.9.0 · 5771 in / 1173 out tokens · 33671 ms · 2026-05-24T14:03:17.032481+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

    cs.LG 2026-04 unverdicted novelty 7.0

    Soft-MSM is a smooth, gradient-enabled version of the context-aware MSM distance for time series alignment that outperforms Soft-DTW alternatives in clustering and nearest-centroid classification.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Engineering Structures 228:111564

    Arul M, Kareem A (2021) Applications of shapelet transform to time series classification of earthquake, wind and wave data. Engineering Structures 228:111564

  2. [2]

    Data Mining and Knowledge Discovery 31(3):606--660

    Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31(3):606--660

  3. [3]

    The UEA multivariate time series classification archive, 2018

    Bagnall A, Dau H, Lines J, Flynn M, Large J, Bostrom A, Southam P, Keogh E (2018) The UEA multivariate time series classification archive, 2018. ArXiv e-prints arXiv:1811.00075, ://arxiv.org/abs/1809.06705

  4. [4]

    In: proceedings of the 5th Workshop on Advances Analytics and Learning on Temporal Data, Lecture Notes in Artificial Intelligence, vol 12588

    Bagnall A, Flynn M, Large J, Lines J, Middlehurst M (2020) On the usage and performance of HIVE-COTE v1.0 . In: proceedings of the 5th Workshop on Advances Analytics and Learning on Temporal Data, Lecture Notes in Artificial Intelligence, vol 12588

  5. [5]

    Data Mining and Knowledge Discovery 28(3):634--669

    Batista G, Keogh E, Tataw O, deSouza V (2014) CID : an efficient complexity-invariant distance measure for time series. Data Mining and Knowledge Discovery 28(3):634--669

  6. [6]

    Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? Journal of Machine Learning Research 17:1--10

  7. [7]

    Transactions on Large-Scale Data and Knowledge Centered Systems 32:24--46

    Bostrom A, Bagnall A (2017) Binary shapelet transform for multiclass time series classification. Transactions on Large-Scale Data and Knowledge Centered Systems 32:24--46

  8. [8]

    In: proceedings of the IEEE International Conference on Data Mining

    Cabello N, Naghizade E, Qi J, Kulik L (2020) Fast and accurate time series classification through supervised interval search. In: proceedings of the IEEE International Conference on Data Mining

  9. [9]

    In: Proc

    Caruana R, Niculescu-Mizil A (2004) Ensemble selection from libraries of models. In: Proc. of the 21st International Conference on Machine learning

  10. [10]

    Annals of Operations Research 148(1):227--250

    Chaovalitwongse WA, Prokopyev OA, Pardalos PM (2006) Electroencephalogram (eeg) time series classification: Applications in epilepsy. Annals of Operations Research 148(1):227--250

  11. [11]

    IEEE/CAA Journal of Automatica Sinica 6(6):1293--1305

    Dau H, Bagnall A, Kamgar K, Yeh M, Zhu Y, Gharghabi S, Ratanamahatana C, Chotirat A, Keogh E (2019) The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6(6):1293--1305

  12. [12]

    Data Mining and Knowledge Discovery 34:1454--1495

    Dempster A, Petitjean F, Webb G (2020) ROCKET : Exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery 34:1454--1495

  13. [13]

    Dem s ar J (2006) Statistical comparisons of classifiers over multiple data sets 7:1--30

  14. [14]

    Information Sciences 239:142--153

    Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Information Sciences 239:142--153

  15. [15]

    Data Mining and Knowledge Discovery 33(4):917--963

    Fawaz H, Forestier G, Weber J, Idoumghar L, Muller P (2019) Deep learning for time series classification: a review. Data Mining and Knowledge Discovery 33(4):917--963

  16. [16]

    Data Mining and Knowledge Discovery 34(6):1936--1962

    Fawaz H, Lucas B, Forestier G, Pelletier C, Schmidt D, Weber J, Webb G, Idoumghar L, Muller P, Petitjean F (2020) InceptionTime : finding AlexNet for time series classification. Data Mining and Knowledge Discovery 34(6):1936--1962

  17. [17]

    Cell Systems 5(5):527--531

    Fulcher B, Jones N (2017) hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5(5):527--531

  18. [18]

    statistical comparisons of classifiers over multiple data sets

    Garc\' i a S, Herrera F (2008) An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons 9:2677--2694

  19. [19]

    arXiv preprint arXiv:201110996

    Guillaume A, Vrain C, Wael E (2020) Time series classification for predictive maintenance on event logs. arXiv preprint arXiv:201110996

  20. [20]

    Data Mining and Knowledge Discovery 28(4):851--881

    Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery 28(4):851--881

  21. [21]

    Intelligent Data Analysis 23(5)

    Large J, Bagnall A, Malinowski S, Tavenard R (2019 a ) On time series classification with dictionary-based classifiers. Intelligent Data Analysis 23(5)

  22. [22]

    Data Mining and Knowledge Discovery 33(6):1674--–1709

    Large J, Lines J, Bagnall A (2019 b ) A probabilistic classifier ensemble weighting scheme based on cross validated accuracy estimates. Data Mining and Knowledge Discovery 33(6):1674--–1709

  23. [23]

    In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), IEEE, vol 2, pp 2169--2178

    Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), IEEE, vol 2, pp 2169--2178

  24. [24]

    Data Mining and Knowledge Discovery 29:565--592

    Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery 29:565--592

  25. [25]

    In: proceedings of 16th IEEE International Conference on Data Mining

    Lines J, Taylor S, Bagnall A (2016) HIVE-COTE : The hierarchical vote collective of transformation-based ensembles for time series classification. In: proceedings of 16th IEEE International Conference on Data Mining

  26. [26]

    ACM Transactions Knowledge Discovery from Data 12(5):1--36

    Lines J, Taylor S, Bagnall A (2018) Time series classification with HIVE-COTE : The hierarchical vote collective of transformation-based ensembles. ACM Transactions Knowledge Discovery from Data 12(5):1--36

  27. [27]

    Data Mining and Knowledge Discovery 33(6):1821--1852

    Lubba C, Sethi S, Knaute P, Schultz S, Fulcher B, Jones N (2019) catch22: canonical time-series characteristics. Data Mining and Knowledge Discovery 33(6):1821--1852

  28. [28]

    Data Mining and Knowledge Discovery 33(3):607--635

    Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb G (2019) Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery 33(3):607--635

  29. [29]

    In: proceedings of Intelligent Data Engineering and Automated Learning, Lecture Notes in Computer Science, vol 11871, pp 11--19

    Middlehurst M, Vickers W, Bagnall A (2019) Scalable dictionary classifiers for time series classification. In: proceedings of Intelligent Data Engineering and Automated Learning, Lecture Notes in Computer Science, vol 11871, pp 11--19

  30. [30]

    In: proceedings of the IEEE International Conference on Big Data

    Middlehurst M, Large J, Bagnall A (2020 a ) The canonical interval forest (CIF) classifier for time series classification. In: proceedings of the IEEE International Conference on Big Data

  31. [31]

    In: proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

    Middlehurst M, Large J, Cawley G, Bagnall A (2020 b ) The temporal dictionary ensemble (TDE) classifier for time series classification. In: proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

  32. [32]

    Data Mining and Knowledge Discovery 33(4):1183--1222

    Nguyen TL, Gsponer S, Ilie I, O'Reilly M, Ifrim G (2019) Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations. Data Mining and Knowledge Discovery 33(4):1183--1222

  33. [33]

    Ecological informatics 21:40--49

    Potamitis I (2014) Classifying insects on the fly. Ecological informatics 21:40--49

  34. [34]

    IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10):1619--1630

    Rodriguez J, Kuncheva L, Alonso C (2006) Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10):1619--1630

  35. [35]

    Data Mining and Knowledge Discovery 35(2):401--–449

    Ruiz AP, Flynn M, Large J, Middlehurst M, Bagnall A (2021) The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 35(2):401--–449

  36. [36]

    Data Mining and Knowledge Discovery 29(6):1505--1530

    Sch \"a fer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery 29(6):1505--1530

  37. [37]

    a fer P, H \

    Sch \"a fer P, H \"o gqvist M (2012) SFA: a symbolic Fourier approximation and index for similarity search in high dimensional datasets . In: proceedings of the 15th International Conference on Extending Database Technology, pp 516--527

  38. [38]

    In: proceedings of the ACM on Conference on Information and Knowledge Management, pp 637--646

    Sch \"a fer P, Leser U (2017 a ) Fast and accurate time series classification with WEASEL . In: proceedings of the ACM on Conference on Information and Knowledge Management, pp 637--646

  39. [39]

    arXiv preprint arXiv:171111343

    Sch \"a fer P, Leser U (2017 b ) Multivariate time series classification with weasel+ muse. arXiv preprint arXiv:171111343

  40. [40]

    Data Mining and Knowledge Discovery pp 1--34

    Shifaz A, Pelletier C, Petitjean F, Webb G (2020) TS-CHIEF : A scalable and accurate forest algorithm for time series classification. Data Mining and Knowledge Discovery pp 1--34

  41. [41]

    Data mining and knowledge discovery 31(1):1--31

    Shokoohi-Yekta M, Hu B, Jin H, Wang J, Keogh E (2017) Generalizing dtw to the multi-dimensional case requires an adaptive approach. Data mining and knowledge discovery 31(1):1--31

  42. [42]

    In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)