Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression
Pith reviewed 2026-05-24 08:23 UTC · model grok-4.3
The pith
FreshPRINCE and DrCIF are the only regressors that significantly outperform rotation forest on an expanded set of 63 TSER problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FreshPRINCE is a pipeline that transforms each time series into a wide range of summary features and passes them to a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics computed over random intervals. On the expanded archive of 63 problems these two algorithms, together with InceptionTime, significantly outperform the other 18 regressors tested; crucially, only FreshPRINCE and DrCIF also significantly outperform the standard rotation forest regressor.
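The FreshPRINCE description above (summary features first, then a regressor) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. The five statistics in `summary_features` are hand-picked stand-ins for the much wider feature set the paper describes, and `KNNRegressor` is a hypothetical nearest-neighbour stand-in for the rotation forest regressor, which has no standard-library equivalent. All function and class names are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def summary_features(series):
    """Map one series (length >= 2) to a fixed-length feature vector.

    Assumption: these five statistics stand in for a far wider feature set.
    """
    n = len(series)
    series_mean = mean(series)
    time_mean = (n - 1) / 2
    # Least-squares slope of the values against their time index.
    numerator = sum((t - time_mean) * (x - series_mean) for t, x in enumerate(series))
    denominator = sum((t - time_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    return [series_mean, pstdev(series), min(series), max(series), slope]


class KNNRegressor:
    """Hypothetical stand-in for the rotation forest regressor (not the real thing)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, features, targets):
        self.features = list(features)
        self.targets = list(targets)
        return self

    def predict(self, features):
        predictions = []
        for row in features:
            # Sort training cases by Euclidean distance in feature space.
            nearest = sorted(
                range(len(self.features)),
                key=lambda i: math.dist(row, self.features[i]),
            )[: self.k]
            predictions.append(mean(self.targets[i] for i in nearest))
        return predictions


def fit_predict(train_series, train_targets, test_series, k=1):
    """Pipeline: transform every series to features, then fit and predict."""
    model = KNNRegressor(k).fit(
        [summary_features(s) for s in train_series], train_targets
    )
    return model.predict([summary_features(s) for s in test_series])
```

The point of the design is that the transform is unsupervised: `summary_features` never sees the targets, so any tabular regressor could be swapped in for the stand-in without touching the feature step.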
What carries the argument
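The argument rests on FreshPRINCE and DrCIF, which adapt unsupervised feature extraction and interval-based summary statistics from time series classification to regression. FreshPRINCE passes its features to a rotation forest regressor, while DrCIF builds its own tree ensemble on statistics computed over random intervals.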
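The interval idea behind DrCIF can be illustrated with a short sketch. It covers only the feature-construction step (summary statistics over randomly placed intervals) and leaves out the tree ensemble entirely. The choice of three statistics, the `min_length` parameter, and the function names are assumptions made for this example, not details taken from the paper.

```python
import random
from statistics import mean, pstdev


def sample_intervals(series_length, n_intervals, min_length=3, seed=0):
    """Draw random (start, end) intervals; end is exclusive.

    Assumes series_length >= min_length.
    """
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        intervals.append((start, start + length))
    return intervals


def interval_features(series, intervals):
    """Concatenate mean, standard deviation and net change for each interval."""
    features = []
    for start, end in intervals:
        window = series[start:end]
        features.extend([mean(window), pstdev(window), window[-1] - window[0]])
    return features
```

Applying the same `intervals` to every series keeps the feature columns aligned across cases, so the resulting rows can be handed to a tree-based learner.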
If this is right
- Rotation forest remains a competitive baseline that new TSER methods must demonstrably beat.
- Summary-feature pipelines and interval-based statistics transfer successfully from classification to extrinsic regression.
- InceptionTime is strong but does not significantly surpass rotation forest on this benchmark.
- The larger archive makes performance differences clearer than the original 19-problem set.
Where Pith is reading between the lines
- Hybrid models that combine FreshPRINCE feature sets with DrCIF interval statistics could be tested next.
- The same feature constructions might improve regression performance in related time-series tasks such as forecasting.
- Releasing the 63-problem archive allows direct head-to-head testing of any future TSER proposal against these results.
Load-bearing premise
The 63 problems in the expanded TSER archive are representative enough of real-world extrinsic regression tasks that performance rankings will hold more generally.
What would settle it
A follow-up experiment on a fresh collection of real-world TSER problems in which neither FreshPRINCE nor DrCIF shows a statistically significant advantage over rotation forest.
Original abstract
Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, these two proposals (DrCIF and FreshPRINCE) models are the only ones that significantly outperform the standard rotation forest regressor.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to expand the TSER archive from 19 to 63 problems, reproduce prior baseline comparisons, evaluate a wider range of regressors, and introduce two new algorithms (FreshPRINCE: a summary-feature pipeline with rotation forest; DrCIF: an interval-based tree ensemble). It reports that DrCIF, FreshPRINCE, and InceptionTime significantly outperform the other 18 regressors tested, and that only DrCIF and FreshPRINCE significantly beat the rotation forest regressor.
Significance. If the empirical claims hold after clarification of protocols, the work would be useful for enlarging the TSER benchmark and highlighting competitive feature-based methods. The reproduction of prior results, identification of rotation forest as a strong baseline, and derivation of DrCIF/FreshPRINCE from classification techniques are positive contributions. The value is primarily in the expanded archive and the performance ranking, provided the archive is representative and the statistics are transparent.
major comments (2)
- [TSER Archive expansion section] No description is provided of how the 44 new problems were sourced, selected, or validated for diversity in domains, lengths, or feature characteristics. This directly affects the load-bearing claim that the observed rankings (DrCIF and FreshPRINCE as the only significant outperformers of rotation forest) generalize.
- [Experiments and Results section] The manuscript omits the full experimental protocol, including data splits, the exact statistical tests (e.g., name of test, p-value threshold, multiple-comparison correction), and any pre-specification details. Without these, the abstract's significance claims cannot be verified and post-hoc selection cannot be ruled out.
minor comments (1)
- [Abstract] The phrasing 'these two proposals (DrCIF and FreshPRINCE) models are the only ones' is grammatically awkward; rephrase for clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight opportunities to improve transparency. We address each major comment below and will revise the manuscript to incorporate the requested details.
- Referee: [TSER Archive expansion section] No description is provided of how the 44 new problems were sourced, selected, or validated for diversity in domains, lengths, or feature characteristics. This directly affects the load-bearing claim that the observed rankings (DrCIF and FreshPRINCE as the only significant outperformers of rotation forest) generalize.
  Authors: We acknowledge the omission and will add a dedicated subsection to the TSER Archive expansion section. This will describe the sourcing of the 44 problems (primarily from the UEA/UCR time series classification archive and other public repositories), the inclusion criteria (continuous extrinsic targets not directly derived from the series), and summary statistics demonstrating diversity across domains, lengths, and characteristics. These additions will support the generalizability of the performance rankings.
  Revision: yes
- Referee: [Experiments and Results section] The manuscript omits the full experimental protocol, including data splits, the exact statistical tests (e.g., name of test, p-value threshold, multiple-comparison correction), and any pre-specification details. Without these, the abstract's significance claims cannot be verified and post-hoc selection cannot be ruled out.
  Authors: We agree that the protocol details should be stated explicitly. The revised Experiments and Results section will specify the data splitting procedure (using provided train/test splits where available, otherwise 10-fold cross-validation), the Wilcoxon signed-rank test with Holm correction at p < 0.05, and confirmation that the comparisons were pre-specified according to the study design. We will also provide a link to the full code repository and results to enable independent verification.
  Revision: yes
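The testing procedure named in the rebuttal (a Wilcoxon signed-rank test with Holm correction at p < 0.05) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes one error score per problem for each regressor, uses the large-sample normal approximation without tie or continuity corrections, and the function names are hypothetical.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided p-value for paired scores via the normal approximation.

    Assumption: no tie or continuity correction is applied to the variance.
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted
```

In use, each candidate regressor would be compared against the baseline with `wilcoxon_signed_rank`, the resulting p-values passed together to `holm_adjust`, and a difference called significant only if its adjusted value falls below 0.05.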
Circularity Check
No circularity: purely empirical benchmark study
The paper expands the TSER archive from 19 to 63 problems and performs comparative evaluation of regressors, including two new proposals (FreshPRINCE and DrCIF) adapted from classification work. All claims concern observed performance rankings on the external benchmark collection. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains that reduce the central result to its own inputs appear in the provided text. The evaluation is statistically independent of any internal construction that would trigger the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters
axioms (2)
- domain assumption: The expanded archive of 63 problems is representative of TSER tasks.
- domain assumption: Standard cross-validation and significance testing procedures were applied without post-hoc data selection.
David Guijo-Rubio received the BSc degree in Computer Science in 2016 and the MSc degree in Artificial Intelligence in 2017. In 2021, he received the...