
arXiv: 2607.00154 · v1 · pith:KD3LD7GF · submitted 2026-06-30 · 💻 cs.LG · cs.AI · cs.NE

EVOTS: Evolutionary Transformer Search for Time Series Forecasting

Pith reviewed 2026-07-02 19:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NE
keywords evolutionary neural architecture search · transformer models · time series forecasting · multivariate forecasting · neural architecture search · ETT benchmarks

The pith

Evolutionary search discovers competitive Transformer architectures for time series forecasting without hand-crafted rules.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an evolutionary neural architecture search framework that encodes Transformer-like models via a modular genome for multivariate time series forecasting tasks. It shows that the resulting architectures achieve mean squared error competitive with or better than a fixed Transformer baseline specifically in the multivariate-to-multivariate setting across ETT family datasets and multiple forecast horizons. This matters because it indicates that search can adapt model structure to forecasting needs rather than relying on fixed designs. The framework includes a repair mechanism to keep generated architectures valid during evolution and reports training times as a measure of practicality.

Core claim

EVOTS encodes architectures as modular genomes allowing flexible composition of attention, feed-forward, and projection components, then applies evolutionary search with a repair mechanism that enforces validity. In experiments on ETTh1, ETTh2, ETTm1, and ETTm2 under univariate-to-univariate, multivariate-to-univariate, and multivariate-to-multivariate regimes with horizons 96 to 720, the evolved models reach competitive or improved MSE relative to a strong Transformer baseline in the multivariate-to-multivariate case.
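To make the three regimes concrete, here is a minimal sketch of how forecasting windows could be cut from an ETT table. It assumes the standard ETT column layout (six load features plus the oil-temperature target `OT`) and an illustrative lookback of 96 steps. The regime labels `U2U`, `M2U`, `M2M` and the function `make_windows` are hypothetical names. The paper's own lookback, preprocessing, and split ratios are not stated here.

```python
from typing import List, Tuple

# Standard ETT column order after the timestamp; OT (oil temperature) is the usual target.
ETT_COLUMNS = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]
TARGET = "OT"

Window = Tuple[List[List[float]], List[List[float]]]


def make_windows(rows: List[List[float]], setting: str,
                 lookback: int, horizon: int) -> List[Window]:
    """Cut sliding (input, target) windows for one forecasting regime.

    rows: time-ordered records with one value per column in ETT_COLUMNS.
    setting: "U2U", "M2U", or "M2M" (hypothetical labels for the three regimes).
    """
    target = ETT_COLUMNS.index(TARGET)
    everything = list(range(len(ETT_COLUMNS)))
    in_cols, out_cols = {
        "U2U": ([target], [target]),      # univariate in, univariate out
        "M2U": (everything, [target]),    # multivariate in, univariate out
        "M2M": (everything, everything),  # multivariate in, multivariate out
    }[setting]
    windows = []
    for start in range(len(rows) - lookback - horizon + 1):
        past = rows[start:start + lookback]
        future = rows[start + lookback:start + lookback + horizon]
        windows.append(([[r[c] for c in in_cols] for r in past],
                        [[r[c] for c in out_cols] for r in future]))
    return windows


toy = [[float(t + c) for c in range(7)] for t in range(300)]  # 300 steps of fake data
for horizon in (96, 192):
    x, y = make_windows(toy, "M2U", lookback=96, horizon=horizon)[0]
    print(horizon, len(x), len(x[0]), len(y), len(y[0]))
```

Longer horizons shrink the number of windows available from a fixed-length series, so on a real ETT split the 720-step horizon yields noticeably fewer training samples than the 96-step one.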

What carries the argument

Modular genome representation with a repair mechanism that enforces structural validity, enabling exploration of diverse architectures
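A minimal sketch of what a modular genome with a deterministic repair pass might look like. The gene kinds mirror the attention, feed-forward, and projection components named above. The specific validity rules are illustrative assumptions, not the rules EVOTS uses: residual blocks preserve width, the head count divides the width, at least one attention block exists, and a final projection reaches the output width. Repair here is deterministic, consistent with the paper's statement that it introduces no additional stochasticity.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Gene:
    kind: str       # "attention", "ffn", or "projection" (hypothetical gene vocabulary)
    width: int      # working width; for "projection" this is the output width
    heads: int = 1  # only meaningful for "attention"


def largest_divisor_at_most(n: int, k: int) -> int:
    """Largest d <= k that divides n (at least 1)."""
    for d in range(min(n, k), 0, -1):
        if n % d == 0:
            return d
    return 1


def repair(genome: List[Gene], in_width: int, out_width: int) -> List[Gene]:
    """Deterministically map an arbitrary genome to a structurally valid one (illustrative rules)."""
    repaired: List[Gene] = []
    width = in_width
    for gene in genome:
        if gene.kind == "projection":
            repaired.append(gene)                 # projections may change the width
            width = gene.width
        elif gene.kind in ("attention", "ffn"):
            gene = replace(gene, width=width)     # residual blocks must preserve the width
            if gene.kind == "attention":          # head count must divide the width
                gene = replace(gene, heads=largest_divisor_at_most(width, gene.heads))
            repaired.append(gene)
        # genes of any other kind are dropped
    if not any(g.kind == "attention" for g in repaired):
        repaired.insert(0, Gene("attention", width=in_width, heads=1))
    if width != out_width:                        # final projection onto the output channels
        repaired.append(Gene("projection", width=out_width))
    return repaired


raw = [Gene("attention", 64, heads=5), Gene("projection", 48), Gene("ffn", 10)]
print(repair(raw, in_width=7, out_width=7))
```

Because the rules are deterministic and idempotent, repairing an already valid genome leaves it unchanged, which is a cheap property to test for any repair operator.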

If this is right

  • Evolved architectures remain competitive across univariate-to-univariate and multivariate-to-univariate settings as well.
  • Performance holds for forecast horizons of 96, 192, 336, and 720 steps.
  • Wall-clock training times stay within ranges that indicate practical computational cost.
  • The search process operates without hand-crafted design rules yet produces valid high-performing models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same genome-plus-repair approach could be tested on other sequence tasks such as natural language or audio modeling.
  • If the evolved models generalize to unseen datasets outside the ETT family, the method would reduce dependence on task-specific manual tuning.
  • Extending the genome to include additional operator types or different evolutionary selection pressures would be a direct next experiment.

Load-bearing premise

The modular genome representation together with the repair mechanism enables effective exploration of a diverse architecture space without relying on hand-crafted design rules.

What would settle it

If repeated evolutionary runs on the ETTm1 or ETTm2 datasets in the multivariate-to-multivariate setting produce architectures whose MSE is consistently higher than the fixed Transformer baseline across the tested horizons, the performance claim would be falsified.
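The falsification criterion above can be stated as a small check. This sketch reads "consistently higher" strictly, as every independent run exceeding the baseline MSE at every tested horizon. That reading, the function names, and the numbers in the usage lines are assumptions for illustration, not results from the paper.

```python
from typing import Dict, List


def fraction_worse(evolved_runs: Dict[int, List[float]],
                   baseline: Dict[int, float]) -> Dict[int, float]:
    """Per horizon, the share of independent runs whose test MSE exceeds the baseline's."""
    return {
        h: sum(mse > baseline[h] for mse in evolved_runs[h]) / len(evolved_runs[h])
        for h in sorted(baseline)
    }


def claim_falsified(evolved_runs: Dict[int, List[float]],
                    baseline: Dict[int, float]) -> bool:
    """Strict reading: falsified only if every run is worse at every tested horizon."""
    return all(share == 1.0 for share in fraction_worse(evolved_runs, baseline).values())


# Made-up MSE values, for illustration only.
baseline = {96: 0.50, 192: 0.60, 336: 0.70, 720: 0.80}
runs = {96: [0.52, 0.49, 0.55], 192: [0.61, 0.63, 0.62],
        336: [0.71, 0.74, 0.72], 720: [0.81, 0.79, 0.85]}
print(fraction_worse(runs, baseline))
print(claim_falsified(runs, baseline))
```

Here the check does not fire: at horizons 96 and 720 one run beats the baseline, even though most runs are worse.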

Figures

Figures reproduced from arXiv: 2607.00154 by AbdElRahman ElSaid, Damir Pulatov.

Figure 1. High-level schematic of the modular architecture search space. The surrounding text adds that repair does not introduce additional stochasticity and does not alter the evolutionary objective, enabling unrestricted exploration of the search space while guaranteeing that all evaluated individuals correspond to valid neural networks.
Figure 2. Distribution of MSE across forecasting horizons (…)
Original abstract

Evolutionary neural architecture design for multivariate time-series forecasting remains underexplored, with most approaches relying on fixed Transformer architectures despite substantial variation across tasks and forecasting settings. This paper introduces an evolutionary neural architecture search framework for discovering task-adaptive Transformer-like models for time-series forecasting (EVOTS). Architectures are encoded using a modular genome representation that enables flexible composition of attention, feed-forward, and projection components, while a repair mechanism enforces structural validity throughout the evolutionary process. This formulation allows effective exploration of a diverse architecture space without relying on hand-crafted design rules. The proposed approach is evaluated on four benchmark datasets from the ETT family (ETTh1, ETTh2, ETTm1, and ETTm2) under multiple forecasting settings, including univariate-to-univariate, multivariate-to-univariate, and multivariate-to-multivariate prediction, with horizons of 96, 192, 336, and 720. In the multivariate-to-multivariate setting, the evolved architectures achieve competitive and, in several cases, improved mean squared error relative to a strong Transformer-based baseline. Additional analyses examine performance differences across forecasting settings and report wall-clock training time to provide a coarse indication of computational cost. Overall, the results demonstrate that evolutionary search can effectively discover flexible and high-performing Transformer-like architectures for multivariate time-series forecasting within practical runtime constraints.
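The abstract describes an evolutionary process over genomes with repair applied throughout. The sketch below shows one generic way such a loop can be organized: a steady-state genetic algorithm with tournament selection, one-point crossover, point mutation, and a repair call before every evaluation. It is not the EVOTS algorithm. The integer genome, the toy fitness, and all parameter values are placeholders. In practice `fitness` would train a decoded model and return its validation MSE.

```python
import random
from typing import Callable, List

Genome = List[int]


def evolve(fitness: Callable[[Genome], float],
           repair: Callable[[Genome], Genome],
           genome_len: int,
           choices: List[int],
           pop_size: int = 20,
           generations: int = 300,
           mutation_rate: float = 0.3,
           seed: int = 0) -> Genome:
    """Steady-state GA that minimizes `fitness`; every genome is repaired before it is scored."""
    rng = random.Random(seed)
    pop = [repair([rng.choice(choices) for _ in range(genome_len)]) for _ in range(pop_size)]
    scores = [fitness(g) for g in pop]

    def tournament(k: int = 3) -> Genome:
        # Pick k distinct individuals at random and return the fittest (lowest score).
        picked = rng.sample(range(pop_size), k)
        return pop[min(picked, key=lambda i: scores[i])]

    for _ in range(generations):
        a, b = tournament(), tournament()
        cut = rng.randrange(1, genome_len)          # one-point crossover
        child = a[:cut] + b[cut:]
        if rng.random() < mutation_rate:            # point mutation
            child[rng.randrange(genome_len)] = rng.choice(choices)
        child = repair(child)
        score = fitness(child)
        worst = max(range(pop_size), key=lambda i: scores[i])
        if score < scores[worst]:                   # replace the worst member if the child beats it
            pop[worst], scores[worst] = child, score

    return pop[min(range(pop_size), key=lambda i: scores[i])]


def toy_fitness(g: Genome) -> float:
    # Placeholder objective: squared distance of each gene from 5.
    return float(sum((x - 5) ** 2 for x in g))


# Placeholder repair: keep genomes sorted, so every evaluated individual satisfies one invariant.
best = evolve(toy_fitness, repair=sorted, genome_len=6, choices=list(range(10)), seed=1)
print(best, toy_fitness(best))
```

Fixing `seed` makes a run reproducible, but different seeds follow different trajectories through initialization, selection, crossover, and mutation, which is why several independent runs per setting are needed to judge whether a search result is typical.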

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces EVOTS, an evolutionary neural architecture search framework for discovering task-adaptive Transformer-like models for multivariate time-series forecasting. Architectures are encoded via a modular genome representation with a repair mechanism to enforce validity; the method is evaluated on the four ETT benchmark datasets under univariate-to-univariate, multivariate-to-univariate, and multivariate-to-multivariate settings with horizons 96/192/336/720. The central empirical claim is that, in the multivariate-to-multivariate setting, the evolved models achieve competitive and in several cases improved MSE relative to a strong Transformer baseline, while additional analyses address performance differences across settings and wall-clock training time.

Significance. If the empirical claims hold under proper statistical controls, the work would provide evidence that evolutionary search with a modular genome can discover flexible, high-performing Transformer variants for time-series forecasting without hand-crafted rules, addressing an underexplored area. The practical runtime results and multi-setting evaluation would further strengthen the contribution, though the current presentation leaves the robustness of the performance gains open to question.

major comments (2)
  1. [Abstract / Experimental results] Abstract and experimental results section: the reported MSE improvements in the multivariate-to-multivariate setting are presented as single point estimates from the evolutionary search process. Because initialization, mutation, crossover, and selection are stochastic, the absence of multiple independent runs, standard deviations, or statistical significance tests (e.g., paired t-tests or Wilcoxon tests against the baseline) makes it impossible to determine whether observed gains reflect systematic superiority or lucky trajectories; this directly undermines the central claim.
  2. [Experimental results] Experimental setup (baseline implementation and data handling): the abstract states competitive MSE results but provides no details on how the Transformer baseline was implemented, whether hyperparameter search was performed for it, the precise train/validation/test splits, or any statistical testing protocol. These omissions are load-bearing for the comparison claim and must be supplied with sufficient specificity to allow reproduction.
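As a sketch of the reporting the referee asks for, per-horizon results from independent search runs can be summarized as mean ± sample standard deviation. The function name and the input layout (horizon mapped to a list of test MSEs, one per seed) are assumptions; the values in the usage line are made up.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_runs(runs: Dict[int, List[float]]) -> Dict[int, str]:
    """Format per-horizon test MSE over independent seeds as 'mean ± sample std'."""
    summary = {}
    for horizon in sorted(runs):
        values = runs[horizon]
        if len(values) < 2:
            raise ValueError(f"horizon {horizon}: need at least 2 runs for a standard deviation")
        summary[horizon] = f"{mean(values):.3f} ± {stdev(values):.3f}"
    return summary


print(summarize_runs({96: [0.40, 0.42, 0.44], 192: [0.47, 0.45, 0.49]}))
```

`statistics.stdev` is the sample (n − 1) estimator, the usual choice when the runs are a small sample of possible search outcomes.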
minor comments (2)
  1. [Method] The description of the repair mechanism and genome representation would benefit from a small illustrative example (e.g., a before/after repair diagram) to clarify how structural validity is maintained without hand-crafted rules.
  2. [Experimental results] Wall-clock training time is reported as a coarse cost indicator; it would be clearer to also report the number of evaluated architectures or total search budget in FLOPs or GPU-hours for context.
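One way to report the total search budget the referee mentions is in GPU-hours. The sketch assumes every candidate is trained once on a fixed number of GPUs and that one search is run per (dataset, setting, horizon) cell. Both the accounting scheme and the example numbers are hypothetical.

```python
def search_gpu_hours(n_searches: int, evals_per_search: int,
                     mean_minutes_per_eval: float, gpus_per_eval: int = 1) -> float:
    """Total GPU-hours = searches × evaluated architectures × minutes per evaluation × GPUs / 60."""
    return n_searches * evals_per_search * mean_minutes_per_eval * gpus_per_eval / 60.0


# 4 ETT datasets × 3 settings × 4 horizons = 48 hypothetical searches.
n_cells = 4 * 3 * 4
print(search_gpu_hours(n_cells, evals_per_search=100, mean_minutes_per_eval=5.0))  # 400.0
```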

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of statistical robustness and reproducibility. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [Abstract / Experimental results] Abstract and experimental results section: the reported MSE improvements in the multivariate-to-multivariate setting are presented as single point estimates from the evolutionary search process. Because initialization, mutation, crossover, and selection are stochastic, the absence of multiple independent runs, standard deviations, or statistical significance tests (e.g., paired t-tests or Wilcoxon tests against the baseline) makes it impossible to determine whether observed gains reflect systematic superiority or lucky trajectories; this directly undermines the central claim.

    Authors: We agree that single-run point estimates limit the strength of the claims given the stochastic nature of the evolutionary process. In the revised manuscript, we will report results from multiple independent evolutionary runs (at least five per setting), include means and standard deviations, and add statistical significance tests (Wilcoxon signed-rank tests) comparing evolved models to the baseline. These additions will be placed in the experimental results section and reflected in an updated abstract. revision: yes

  2. Referee: [Experimental results] Experimental setup (baseline implementation and data handling): the abstract states competitive MSE results but provides no details on how the Transformer baseline was implemented, whether hyperparameter search was performed for it, the precise train/validation/test splits, or any statistical testing protocol. These omissions are load-bearing for the comparison claim and must be supplied with sufficient specificity to allow reproduction.

    Authors: We acknowledge the need for greater specificity. The revised version will expand the experimental setup subsection to include: (i) the exact Transformer baseline architecture and hyperparameters, (ii) whether and how hyperparameter tuning was performed for the baseline, (iii) the precise train/validation/test split ratios and any preprocessing steps for the ETT datasets, and (iv) the full statistical testing protocol. This information will enable full reproduction of the comparisons. revision: yes
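A minimal, dependency-free sketch of the exact one-sided Wilcoxon signed-rank test the rebuttal promises. It assumes paired observations (for example, evolved and baseline MSE matched by seed or by dataset-horizon cell), drops zero differences, uses average ranks for ties, and counts the exact permutation distribution of W+ by dynamic programming. The function names are hypothetical. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_less(evolved: Sequence[float], baseline: Sequence[float]) -> Tuple[float, float]:
    """Exact one-sided signed-rank test of H1: evolved MSE tends to be lower than baseline MSE.

    Returns (W+, p), where W+ is the rank sum of pairs in which the evolved model is worse.
    """
    if len(evolved) != len(baseline):
        raise ValueError("paired samples must have the same length")
    diffs = [e - b for e, b in zip(evolved, baseline) if e != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = float(sum(r for r, d in zip(ranks, diffs) if d > 0))

    # Null distribution: each rank is independently signed + or - with probability 1/2.
    # Doubling the ranks makes tied (half-integer) ranks integral so sums can be counted exactly.
    doubled = [round(2 * r) for r in ranks]
    counts = {0: 1}
    for step in doubled:
        updated = dict(counts)
        for total, ways in counts.items():
            updated[total + step] = updated.get(total + step, 0) + ways
        counts = updated
    threshold = round(2 * w_plus)
    p_value = sum(ways for total, ways in counts.items() if total <= threshold) / 2 ** n
    return w_plus, p_value


# Made-up paired MSEs (e.g., five seeds), for illustration only.
print(wilcoxon_less([0.40, 0.50, 0.60, 0.70, 0.80], [0.41, 0.52, 0.63, 0.74, 0.85]))
```

With n = 5 pairs the smallest attainable one-sided p-value is 1/32 ≈ 0.031 and the smallest two-sided value is 1/16 = 0.0625, so five runs can reach significance at α = 0.05 only with a one-sided test and only when every pair favors the same model.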

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external baseline comparisons

Full rationale

The paper introduces an evolutionary NAS framework (EVOTS) for Transformer-like models in time-series forecasting and evaluates it empirically on ETT benchmarks under multiple settings. The central claim of competitive or improved MSE in the multivariate-to-multivariate case is supported by direct comparison to a fixed Transformer baseline, with no mathematical derivation, fitted parameters renamed as predictions, or self-citation chains that reduce the result to its own inputs. The modular genome and repair mechanism are design choices whose effectiveness is assessed via external benchmarks rather than by construction. This is a standard empirical NAS study with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on standard evolutionary search assumptions and the validity of the ETT benchmark family.

pith-pipeline@v0.9.1-grok · 5772 in / 1045 out tokens · 20195 ms · 2026-07-02T19:48:45.211552+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1]

    iTransformer: Inverted transformers are effective for time series forecasting

    Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTransformer: Inverted transformers are effective for time series forecasting,” in International Conference on Learning Representations (B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, eds.), vol. 2024, pp. 11116–11140, 2024

  2. [2]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017

  3. [3]

    Informer: Beyond efficient transformer for long sequence time-series forecasting

    H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11106–11115, 2021

  4. [4]

    Temporal fusion transformers for interpretable multi-horizon time series forecasting

    B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, “Temporal fusion transformers for interpretable multi-horizon time series forecasting,” International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021

  5. [5]

    Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting

    H. Wu, J. Xu, J. Wang, and M. Long, “Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting,” Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, 2021

  6. [6]

    Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting

    Y. Zhang and J. Yan, “Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting,” in The Eleventh International Conference on Learning Representations, 2023

  7. [7]

    Large language models are zero-shot time series forecasters

    N. Gruver, M. Finzi, S. Qiu, and A. G. Wilson, “Large language models are zero-shot time series forecasters,” Advances in Neural Information Processing Systems, vol. 36, pp. 19622–19635, 2023

  8. [8]

    PromptCast: A new prompt-based learning paradigm for time series forecasting

    H. Xue and F. D. Salim, “PromptCast: A new prompt-based learning paradigm for time series forecasting,” IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 11, pp. 6851–6864, 2023

  9. [9]

    One fits all: Power general time series analysis by pretrained LM

    T. Zhou, P. Niu, L. Sun, R. Jin, et al., “One fits all: Power general time series analysis by pretrained LM,” Advances in Neural Information Processing Systems, vol. 36, pp. 43322–43355, 2023

  10. [10]

    Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

    M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, Y. Liang, Y.-F. Li, S. Pan, et al., “Time-LLM: Time series forecasting by reprogramming large language models,” arXiv preprint arXiv:2310.01728, 2023

  11. [11]

    Lag-Llama: Towards foundation models for time series forecasting

    K. Rasul, A. Ashok, A. R. Williams, A. Khorasani, G. Adamopoulos, R. Bhagwatkar, M. Biloš, H. Ghonia, N. Hassen, A. Schneider, et al., “Lag-Llama: Towards foundation models for time series forecasting,” in R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models, 2023

  12. [12]

    MOMENT: A family of open time-series foundation models

    M. Goswami, K. Szafer, A. Choudhry, Y. Cai, S. Li, and A. Dubrawski, “MOMENT: A family of open time-series foundation models,” arXiv preprint arXiv:2402.03885, 2024

  13. [13]

    A decoder-only foundation model for time-series forecasting

    A. Das, W. Kong, R. Sen, and Y. Zhou, “A decoder-only foundation model for time-series forecasting,” in Forty-first International Conference on Machine Learning, 2024

  14. [14]

    Unified training of universal time series forecasting transformers

    G. Woo, C. Liu, A. Kumar, C. Xiong, S. Savarese, and D. Sahoo, “Unified training of universal time series forecasting transformers,” 2024. URL https://arxiv.org/abs/2402.02592

  15. [15]

    Chronos: Learning the language of time series

    A. F. Ansari, L. Stella, C. Turkmen, X. Zhang, P. Mercado, H. Shen, O. Shchur, S. S. Rangapuram, S. Pineda Arango, S. Kapoor, J. Zschiegner, D. C. Maddix, M. W. Mahoney, K. Torkkola, A. Gordon Wilson, M. Bohlke-Schneider, and Y. Wang, “Chronos: Learning the language of time series,” Transactions on Machine Learning Research, 2024

  16. [16]

    Chronos-2: From Univariate to Universal Forecasting

    A. F. Ansari, O. Shchur, J. Küken, A. Auer, B. Han, P. Mercado, S. S. Rangapuram, H. Shen, L. Stella, X. Zhang, M. Goswami, S. Kapoor, D. C. Maddix, P. Guerron, T. Hu, J. Yin, N. Erickson, P. M. Desai, H. Wang, H. Rangwala, G. Karypis, Y. Wang, and M. Bohlke-Schneider, “Chronos-2: From univariate to universal forecasting,” arXiv preprint arXiv:2510.15821, 2025

  17. [17]

    ForecastPFN: Synthetically-trained zero-shot forecasting

    S. Dooley, G. S. Khurana, C. Mohapatra, S. V. Naidu, and C. White, “ForecastPFN: Synthetically-trained zero-shot forecasting,” in Advances in Neural Information Processing Systems, 2023

  18. [18]

    From tables to time: How TabPFN-v2 outperforms specialized time series forecasting models

    S. B. Hoo, S. Müller, D. Salinas, and F. Hutter, “From tables to time: How TabPFN-v2 outperforms specialized time series forecasting models,” 2025

  19. [19]

    Neural architecture search: A survey

    T. Elsken, J. H. Metzen, and F. Hutter, “Neural architecture search: A survey,” Journal of Machine Learning Research, vol. 20, no. 55, pp. 1–21, 2019

  20. [20]

    Evolving neural networks through augmenting topologies

    K. O. Stanley and R. Miikkulainen, “Evolving neural networks through augmenting topologies,” Evolutionary Computation, vol. 10, no. 2, pp. 99–127, 2002

  21. [21]

    Investigating recurrent neural network memory structures using neuroevolution

    A. Ororbia, A. ElSaid, and T. Desell, “Investigating recurrent neural network memory structures using neuroevolution,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 446–455, 2019

  22. [22]

    EXA-GP: Unifying graph-based genetic programming and neuroevolution for explainable time series forecasting

    J. Murphy, D. Kar, J. Karns, and T. Desell, “EXA-GP: Unifying graph-based genetic programming and neuroevolution for explainable time series forecasting,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 523–526, 2024

  23. [23]

    Ant-based neural topology search (ANTS) for optimizing recurrent networks

    A. ElSaid, A. G. Ororbia, and T. J. Desell, “Ant-based neural topology search (ANTS) for optimizing recurrent networks,” in International Conference on the Applications of Evolutionary Computation (Part of EvoStar), pp. 626–641, Springer, 2020

  24. [24]

    Continuous ant-based neural topology search

    A. ElSaid, J. Karns, Z. Lyu, A. G. Ororbia, and T. Desell, “Continuous ant-based neural topology search,” in International Conference on the Applications of Evolutionary Computation (Part of EvoStar), pp. 291–306, Springer, 2021

  25. [25]

    Cg-cants-n: A versatile graph-based framework for scalable and adaptive problem solving across domains

    A. ElSaid and T. Desell, “Cg-cants-n: A versatile graph-based framework for scalable and adaptive problem solving across domains,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 263–266, 2025

  26. [26]

    NSGA-Net: Neural architecture search using multi-objective genetic algorithm

    Z. Lu, I. Whalen, V. Boddeti, Y. Dhebar, K. Deb, E. Goodman, and W. Banzhaf, “NSGA-Net: Neural architecture search using multi-objective genetic algorithm,” in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 419–427, 2019

  27. [27]

    Regularized evolution for image classifier architecture search

    E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized evolution for image classifier architecture search,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4780–4789, 2019

  28. [28]

    Efficient neural architecture search via parameters sharing

    H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean, “Efficient neural architecture search via parameters sharing,” in International Conference on Machine Learning, pp. 4095–4104, PMLR, 2018

  29. [29]

    DARTS: Differentiable Architecture Search

    H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018

  30. [30]

    The evolved transformer

    D. So, Q. Le, and C. Liang, “The evolved transformer,” in International Conference on Machine Learning, pp. 5877–5886, PMLR, 2019

  31. [31]

    Searching the search space of vision transformer

    M. Chen, K. Wu, B. Ni, H. Peng, B. Liu, J. Fu, H. Chao, and H. Ling, “Searching the search space of vision transformer,” Advances in Neural Information Processing Systems, vol. 34, pp. 8714–8726, 2021

  32. [32]

    NASViT: Neural architecture search for efficient vision transformers with gradient conflict-aware supernet training

    C. Gong and D. Wang, “NASViT: Neural architecture search for efficient vision transformers with gradient conflict-aware supernet training,” in International Conference on Learning Representations, 2022

  33. [33]

    M. D. Vose, The Simple Genetic Algorithm: Foundations and Theory. The MIT Press, 1999

  34. [34]

    Designing neural networks through neuroevolution

    K. O. Stanley, J. Clune, J. Lehman, and R. Miikkulainen, “Designing neural networks through neuroevolution,” Nature Machine Intelligence, vol. 1, no. 1, pp. 24–35, 2019