EvoTSC: Evolving Feature Learning Models for Time Series Classification via Genetic Programming
Pith reviewed 2026-05-07 16:35 UTC · model grok-4.3
The pith
EvoTSC automatically evolves lightweight feature learning models for time series classification using genetic programming and outperforms eleven benchmark methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EvoTSC is a genetic programming approach for evolving lightweight feature learning models for time series classification. Its core is a multi-layer program structure that embeds diverse prior expert knowledge to guide the search toward effective time series operations, along with a tailored Pareto tournament selection strategy that favors models performing consistently across varying training data subsets. Experiments on univariate datasets show that EvoTSC significantly outperforms eleven benchmark methods in most comparisons and produces resource-efficient models.
What carries the argument
A multi-layer program structure embedding prior expert knowledge about time series operations, combined with Pareto tournament selection to promote generalizability.
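The review does not reproduce the paper's actual function and terminal sets, so the representation can only be sketched. Assuming a layered, strongly typed structure in which a raw series flows through series-to-series transformation primitives before a series-to-scalar feature primitive, a minimal Python sketch might look like this (every primitive name here is hypothetical, not taken from the paper):

```python
import random
import statistics

# Hypothetical layered primitive sets; the paper's actual
# function/terminal sets are not specified in this review.
TRANSFORMS = {                      # layer 1: series -> series
    "diff": lambda s: [b - a for a, b in zip(s, s[1:])] or [0.0],
    "norm": lambda s: [(x - min(s)) / ((max(s) - min(s)) or 1.0) for x in s],
    "abs":  lambda s: [abs(x) for x in s],
}
FEATURES = {                        # layer 2: series -> scalar
    "mean": statistics.mean,
    "std":  lambda s: statistics.pstdev(s) if len(s) > 1 else 0.0,
    "max":  max,
}

def random_program(depth=2):
    """Sample a layered program: a chain of transforms ending in one feature."""
    chain = [random.choice(list(TRANSFORMS)) for _ in range(depth)]
    return chain, random.choice(list(FEATURES))

def run_program(program, series):
    """Apply each transform in order, then extract the scalar feature."""
    chain, feat = program
    for name in chain:
        series = TRANSFORMS[name](series)
    return FEATURES[feat](series)

prog = (["diff", "abs"], "mean")    # a fixed example program
print(run_program(prog, [1.0, 3.0, 2.0, 5.0]))  # mean absolute first difference
```

Restricting each layer to type-compatible primitives is what lets a strongly typed representation embed prior knowledge: crossover and mutation can only produce programs that remain valid series-to-feature pipelines.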
If this is right
- Automatically discovered models can classify time series data more accurately than many existing methods.
- The evolved models use fewer computational resources during training and inference.
- Less labeled data is needed to achieve strong performance on time series tasks.
- Feature engineering for time series can be automated rather than done manually.
Where Pith is reading between the lines
- Extending the method to multivariate time series might yield similar gains if the program structure is adapted accordingly.
- The resource efficiency of the models suggests they could run effectively on devices with limited processing power.
- Integrating the evolved models into larger systems could improve performance in applications like medical signal analysis or financial forecasting.
Load-bearing premise
Embedding diverse forms of prior expert knowledge into the multi-layer program structure will successfully direct the evolutionary process to useful time series analysis operations.
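The referee notes below that the exact Pareto objectives are left ambiguous. Assuming, as this premise suggests, that each candidate is scored by its accuracy on several training subsets, a minimal sketch of subset-wise Pareto tournament selection could be (all model names and numbers are illustrative, not from the paper):

```python
import random

def dominates(a, b):
    """a Pareto-dominates b if it is no worse on every subset and
    strictly better on at least one (higher accuracy is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_tournament(population, k=4, rng=random):
    """Sample k candidates and return one that no other sampled
    candidate dominates, favoring consistently strong models."""
    pool = rng.sample(population, k)
    non_dominated = [c for c in pool
                     if not any(dominates(o["subset_acc"], c["subset_acc"])
                                for o in pool if o is not c)]
    return rng.choice(non_dominated)  # at least one candidate is non-dominated

# Toy population: per-subset accuracies on three training subsets.
pop = [
    {"name": "m1", "subset_acc": (0.90, 0.60, 0.70)},  # strong but inconsistent
    {"name": "m2", "subset_acc": (0.80, 0.78, 0.79)},  # consistent
    {"name": "m3", "subset_acc": (0.70, 0.55, 0.60)},  # dominated by m2 and m4
    {"name": "m4", "subset_acc": (0.85, 0.77, 0.78)},  # non-dominated trade-off
]
```

Under this reading, a model that collapses on some subsets survives only while nothing dominates it, so consistently strong candidates such as `m2` are reliably retained while uniformly weak ones such as `m3` are filtered out.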
What would settle it
If EvoTSC is tested on new univariate time series classification datasets and does not outperform the eleven benchmarks in accuracy while remaining lightweight, the performance superiority claim would not hold.
Original abstract
Time series classification is an important analytical task across diverse domains. However, its practical application is often hindered by the scarcity of labeled data and the requirement for substantial computational resources. To address these challenges, this paper proposes EvoTSC, a novel genetic programming approach designed to automatically evolve lightweight feature learning models for time series classification. The core of EvoTSC is a carefully designed multi-layer program structure that strategically embeds diverse forms of prior expert knowledge into the evolutionary process, effectively guiding the search toward operations known to be highly effective for time series analysis. To mitigate the common overfitting problem in time series classification, a tailored Pareto tournament selection strategy is proposed to favor models that perform consistently well across varying training data subsets, promoting the discovery of highly generalizable models. Extensive experiments conducted on univariate time series classification datasets demonstrate that EvoTSC significantly outperforms eleven benchmark methods in most comparisons. Further analyses verify the contribution of each component and the resource efficiency of the evolved models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EvoTSC, a genetic programming method that evolves lightweight feature-learning models for univariate time series classification. Its core innovations are a multi-layer program structure that embeds diverse forms of expert prior knowledge and a Pareto tournament selection mechanism intended to favor models that generalize across training-data subsets. The central empirical claim is that EvoTSC significantly outperforms eleven benchmark methods on standard univariate TSC datasets, with additional analyses purporting to verify the contribution of each component and the resource efficiency of the evolved models.
Significance. If the performance claims are statistically substantiated, the work would demonstrate a practical route to automated, knowledge-guided feature extraction for TSC that addresses labeled-data scarcity and computational cost. The emphasis on lightweight, generalizable models and the explicit incorporation of domain knowledge into the GP representation are potentially valuable additions to the evolutionary computation and time-series literature.
major comments (3)
- [Results / Experimental evaluation] Results section (and abstract): The headline claim that EvoTSC 'significantly outperforms' eleven benchmarks is not supported by any reported statistical tests (Friedman, Wilcoxon signed-rank, or critical-difference diagrams), p-values, or multiple-comparison corrections. Raw win counts or mean accuracies alone are insufficient to justify the qualifier 'significantly' across multiple datasets and methods.
- [Experimental setup] Experimental protocol: No information is provided on train/test splits, number of independent runs, random seeds, hyper-parameter settings for the eleven baselines, or whether the benchmark implementations match published code. These omissions make it impossible to assess whether the reported gains are reproducible or artifacts of experimental choices.
- [Ablation studies] Component analysis: The verification that each element (multi-layer structure, Pareto selection, etc.) contributes to performance relies on ablation tables whose statistical reliability is likewise unaddressed; without error bars or significance tests on the ablations, the contribution claims remain tentative.
minor comments (2)
- [Method] The precise syntax and terminal/function sets of the multi-layer program structure should be illustrated with a concrete example program in Section 3.
- [Pareto tournament selection] Clarify whether the Pareto tournament operates on accuracy versus model size or includes additional objectives; the current description leaves the exact Pareto front definition ambiguous.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of statistical rigor and reproducibility. We address each major comment below and will revise the manuscript accordingly to strengthen these elements.
Point-by-point responses
- Referee: [Results / Experimental evaluation] Results section (and abstract): The headline claim that EvoTSC 'significantly outperforms' eleven benchmarks is not supported by any reported statistical tests (Friedman, Wilcoxon signed-rank, or critical-difference diagrams), p-values, or multiple-comparison corrections. Raw win counts or mean accuracies alone are insufficient to justify the qualifier 'significantly' across multiple datasets and methods.
  Authors: We agree that the qualifier 'significantly' requires formal statistical support, which is absent from the current manuscript despite consistent outperformance in mean accuracy and win counts. In the revision, we will add a Friedman test across all methods and datasets, followed by post-hoc Wilcoxon signed-rank tests with Holm correction, and include a critical difference diagram. The abstract and results text will be updated to reflect only those claims supported by the new tests. Revision: yes.
- Referee: [Experimental setup] Experimental protocol: No information is provided on train/test splits, number of independent runs, random seeds, hyper-parameter settings for the eleven baselines, or whether the benchmark implementations match published code. These omissions make it impossible to assess whether the reported gains are reproducible or artifacts of experimental choices.
  Authors: We acknowledge that complete protocol details are necessary for reproducibility and were omitted from the submitted version. The revised manuscript will expand the experimental setup section to specify: standard UCR train/test splits, 30 independent runs with listed random seeds, full hyper-parameter values for each of the eleven baselines (with citations to original sources), and confirmation that publicly available code repositories were used where applicable. Revision: yes.
- Referee: [Ablation studies] Component analysis: The verification that each element (multi-layer structure, Pareto selection, etc.) contributes to performance relies on ablation tables whose statistical reliability is likewise unaddressed; without error bars or significance tests on the ablations, the contribution claims remain tentative.
  Authors: We recognize that the ablation results lack measures of variability and statistical tests. In the revision, we will augment the component analysis with standard deviations across the 30 runs for each ablation variant and add pairwise Wilcoxon signed-rank tests between the full EvoTSC and each ablated version, reporting p-values to substantiate the contribution of the multi-layer structure and Pareto selection. Revision: yes.
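The analysis the authors promise can be sketched end-to-end with SciPy. The accuracy table below is randomly generated for illustration only and does not reflect the paper's results; column 0 merely plays the role of the proposed method:

```python
import numpy as np
from scipy import stats

# Illustrative accuracy table: rows = 20 datasets, columns = 3 methods.
rng = np.random.default_rng(0)
acc = np.clip(rng.normal([0.85, 0.80, 0.78], 0.03, size=(20, 3)), 0.0, 1.0)

# 1. Friedman test: do the methods differ at all across datasets?
chi2, p_friedman = stats.friedmanchisquare(*acc.T)

# 2. Post-hoc: Wilcoxon signed-rank of method 0 vs each other method,
#    with Holm step-down correction for multiple comparisons.
raw_p = [stats.wilcoxon(acc[:, 0], acc[:, j]).pvalue
         for j in range(1, acc.shape[1])]
order = np.argsort(raw_p)
holm = np.empty(len(raw_p))
running = 0.0
for rank, idx in enumerate(order):
    # Holm: multiply by the number of remaining hypotheses, keep monotone.
    running = max(running, (len(raw_p) - rank) * raw_p[idx])
    holm[idx] = min(running, 1.0)

print(f"Friedman p = {p_friedman:.4g}, Holm-adjusted Wilcoxon p = {holm}")
```

The average ranks underlying the Friedman statistic are also what a critical-difference diagram in the style of Demšar (2006) would plot, one rank per method across datasets.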
Circularity Check
No circularity: empirical method with external benchmark validation
full rationale
The paper proposes EvoTSC, a genetic programming method using a multi-layer program structure to embed expert knowledge and a Pareto tournament selection for generalization. Its central claims rest on experimental comparisons against 11 benchmark methods on univariate TSC datasets, with no mathematical derivation chain, equations, or predictions that reduce to fitted parameters or self-citations by construction. The approach is validated against external benchmarks; performance is measured via direct accuracy comparisons rather than internal redefinitions. No load-bearing steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: genetic programming can search effectively for feature extractors when guided by domain knowledge.
- Domain assumption: models that perform consistently across training subsets will generalize better to unseen data.