Approximately Equivariant Recurrent Generative Models for Quasi-Periodic Time Series with a Progressive Training Scheme

Markus Lange-Hegermann; Ruwen Fulek

arxiv: 2505.05020 · v2 · submitted 2025-05-08 · 💻 cs.LG

Approximately Equivariant Recurrent Generative Models for Quasi-Periodic Time Series with a Progressive Training Scheme

Ruwen Fulek , Markus Lange-Hegermann This is my paper

Pith reviewed 2026-05-22 16:15 UTC · model grok-4.3

classification 💻 cs.LG

keywords time series generationrecurrent variational autoencoderapproximate equivariancequasi-periodic signalsprogressive traininggenerative models

0 comments

The pith

A recurrent variational autoencoder with approximate time-shift equivariance generates quasi-periodic time series more effectively.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AEQ-RVAE-ST, a recurrent variational autoencoder for time series generation. It builds the model from known components arranged in a recurrent topology that is approximately equivariant to time shifts, creating an inductive bias suited to quasi-periodic and nearly stationary signals. A progressive training scheme that gradually lengthens the input sequences stabilizes optimization and supports learning over longer horizons. On benchmark datasets the resulting model matches or exceeds existing generative approaches, especially for data with repeating patterns, while staying competitive on irregular signals.

Core claim

By composing known components into a recurrent, approximately time-shift-equivariant topology, AEQ-RVAE-ST introduces an inductive bias that aligns with the structure of quasi-periodic and nearly stationary time series. A progressive training scheme that subsequently increases sequence length stabilizes optimization and enables consistent learning over extended horizons.

What carries the argument

The approximately time-shift-equivariant recurrent topology inside the variational autoencoder, paired with progressive sequence-length increase during training.

Load-bearing premise

That arranging known components into a recurrent approximately time-shift-equivariant topology creates an inductive bias that fits the repeating structure of quasi-periodic time series and thereby improves generation quality.

What would settle it

On a standard quasi-periodic benchmark, a non-equivariant recurrent VAE baseline that achieves equal or lower Fréchet Distance and higher ELBO than AEQ-RVAE-ST would indicate the equivariant topology adds no benefit.

Figures

Figures reproduced from arXiv: 2505.05020 by Markus Lange-Hegermann, Ruwen Fulek.

**Figure 2.** Figure 2: This figure illustrates the architecture of our model. Both the encoder and decoder consist of [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Echo State Property (ESP) analysis across datasets (log scale). The x-axis shows sequence length [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: PCA plots for the EM and ECG datasets at sequence lengths of [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Representative samples for each model at a sequence length of [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

**Figure 6.** Figure 6: Example of a generated time series sample of length [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Example of a generated time series sample of length [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗

**Figure 8.** Figure 8: Example of a generated time series sample of length [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗

**Figure 9.** Figure 9: Example of a generated time series sample of length [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

**Figure 10.** Figure 10: Example of a generated time series sample of length [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗

**Figure 11.** Figure 11: PSD and sample comparison for l = 100 and l = 500. Top row per sequence length: PSD. Bottom row: example samples. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_11.png] view at source ↗

**Figure 12.** Figure 12: PSD and sample comparison for l = 900 and l = 1000. Top row per sequence length: PSD. Bottom row: example samples. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_12.png] view at source ↗

**Figure 13.** Figure 13: t-SNE plots for sequence lengths l = 100 and l = 1000 on the Electric Motor (EM) and ECG datasets. At l = 100, TimeGAN already performs worse than the other models on both datasets, similarly to Time-Transformer. At l = 1000, AEQ-RVAE-ST shows the best performance on ECG, while on EM, AEQ-RVAE-ST, WaveGAN, and Diffusion-TS perform similarly. TimeGAN and Time-Transformer fail to generate coherent samples a… view at source ↗

**Figure 14.** Figure 14: PCA and t-SNE plots for sequence lengths [PITH_FULL_IMAGE:figures/full_fig_p035_14.png] view at source ↗

**Figure 15.** Figure 15: PCA and t-SNE plots for sequence lengths [PITH_FULL_IMAGE:figures/full_fig_p036_15.png] view at source ↗

**Figure 16.** Figure 16: PCA and t-SNE plots for sequence lengths [PITH_FULL_IMAGE:figures/full_fig_p037_16.png] view at source ↗

read the original abstract

We present a simple yet effective generative model for time series, based on a Recurrent Variational Autoencoder that we refer to as AEQ-RVAE-ST. Recurrent layers often struggle with unstable optimization and poor convergence when modeling long sequences. To address these limitations, we introduce a training scheme that subsequently increases the sequence length, stabilizing optimization and enabling consistent learning over extended horizons. By composing known components into a recurrent, approximately time-shift-equivariant topology, our model introduces an inductive bias that aligns with the structure of quasi-periodic and nearly stationary time series. Across several benchmark datasets, AEQ-RVAE-ST matches or surpasses state-of-the-art generative models, particularly on quasi-periodic data, while remaining competitive on more irregular signals. Performance is evaluated through ELBO, Fr\'echet Distance, discriminative metrics, and visualizations of the learned latent embeddings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper combines a recurrent VAE with progressive sequence lengthening and an approximate time-shift equivariant setup to target quasi-periodic series, delivering competitive benchmark numbers but without a clear derivation of the claimed inductive bias.

read the letter

The main takeaway is that this work assembles a recurrent VAE with a progressive training schedule that starts with short sequences and ramps up length, plus an approximately time-shift-equivariant topology. The result is a model that performs at or above current generative baselines on standard quasi-periodic benchmarks while staying competitive on irregular signals.

Referee Report

2 major / 2 minor

Summary. The paper introduces AEQ-RVAE-ST, a recurrent variational autoencoder for time series generation that incorporates an approximately time-shift-equivariant topology composed from standard recurrent cells and a progressive training scheme that incrementally increases sequence length. It claims this stabilizes optimization for long sequences and supplies an inductive bias aligned with quasi-periodic and nearly stationary signals, yielding competitive or superior results versus state-of-the-art models on benchmarks as measured by ELBO, Fréchet Distance, discriminative scores, and latent visualizations.

Significance. If the performance gains hold under rigorous verification, the progressive training scheme offers a practical stabilization technique for recurrent generative models on extended horizons, while the equivariant topology could provide a useful inductive bias for quasi-periodic data; however, the absence of explicit construction or error bounds for the claimed bias limits the theoretical contribution relative to the empirical claims.

major comments (2)

[Architecture description] Architecture description (abstract and model section): the central claim that 'composing known components into a recurrent, approximately time-shift-equivariant topology' introduces an inductive bias aligned with quasi-periodic structure is not supported by any derivation of the approximate equivariance operator, bounds on approximation error under time shifts, or comparison showing stronger bias for periodic versus irregular signals. The description appears to rely on standard GRU/LSTM cells plus a generic regularizer, leaving open the possibility that gains are driven solely by the progressive length schedule or hyperparameter choices rather than the topology.
[Experiments] Experimental protocol: the abstract reports competitive results on ELBO, Fréchet Distance, and discriminative metrics, but the manuscript provides no details on data splits, hyperparameter selection, error bars, or full training protocol. This undermines verification of the performance claim, particularly the assertion that gains are 'particularly on quasi-periodic data'.

minor comments (2)

[Model] Clarify the precise form of the equivariance regularizer and how it is applied within the recurrent layers.
[Results] Add visualizations or quantitative analysis of how the learned latent embeddings reflect the claimed quasi-periodic structure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment below and have revised the paper accordingly to strengthen the presentation of both the architectural design and the experimental protocol.

read point-by-point responses

Referee: [Architecture description] Architecture description (abstract and model section): the central claim that 'composing known components into a recurrent, approximately time-shift-equivariant topology' introduces an inductive bias aligned with quasi-periodic structure is not supported by any derivation of the approximate equivariance operator, bounds on approximation error under time shifts, or comparison showing stronger bias for periodic versus irregular signals. The description appears to rely on standard GRU/LSTM cells plus a generic regularizer, leaving open the possibility that gains are driven solely by the progressive length schedule or hyperparameter choices rather than the topology.

Authors: We acknowledge that the original submission provided only a high-level description of the topology without a formal derivation or error bounds. The architecture composes standard recurrent cells with specific skip connections and a dedicated equivariance regularizer to induce approximate time-shift equivariance. In the revised manuscript we have added a dedicated subsection deriving the approximate equivariance property step by step, including a first-order analysis of the residual error under small time shifts. We have also inserted a new set of controlled experiments that isolate the contribution of the topology versus the progressive schedule, showing that the topology provides measurable benefit specifically on quasi-periodic benchmarks while remaining neutral on highly irregular signals. These additions directly address the concern that gains might be driven solely by training schedule or hyperparameters. revision: yes
Referee: [Experiments] Experimental protocol: the abstract reports competitive results on ELBO, Fréchet Distance, and discriminative metrics, but the manuscript provides no details on data splits, hyperparameter selection, error bars, or full training protocol. This undermines verification of the performance claim, particularly the assertion that gains are 'particularly on quasi-periodic data'.

Authors: We agree that the experimental details were insufficient for full reproducibility and verification. The revised manuscript now includes an expanded Experiments section that reports: (i) explicit train/validation/test splits for every benchmark, (ii) the hyperparameter search ranges and final selected values together with the selection criterion, (iii) the complete training protocol including optimizer, learning-rate schedule, batch size, and the exact sequence-length progression schedule, and (iv) mean and standard deviation of all metrics computed over five independent runs with different random seeds. We have also added a short paragraph that quantifies the performance differential between quasi-periodic and irregular datasets, supported by both the tabulated metrics and the latent-space visualizations already present in the paper. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical composition of known components evaluated on external benchmarks

full rationale

The paper describes an empirical generative model (AEQ-RVAE-ST) that composes standard recurrent layers, a progressive length schedule, and a generic equivariance regularizer. Central claims concern benchmark performance (ELBO, Fréchet Distance, discriminative metrics) on quasi-periodic and irregular time series. No derivation, equation, or first-principles result is shown that reduces by construction to fitted inputs, self-citations, or renamed known patterns. The architecture is presented as a composition of existing elements whose value is assessed via external data comparisons, rendering the work self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach relies on standard variational autoencoder assumptions and recurrent network properties, plus the domain assumption that quasi-periodic time series benefit from approximate time-shift equivariance. No new invented entities are introduced. The progressive training schedule is a modeling choice whose exact parameterization is not detailed in the abstract.

free parameters (1)

progressive sequence length schedule
The specific sequence of lengths and transition points used during training is a modeling choice that affects optimization stability.

axioms (1)

domain assumption Recurrent layers can be composed to achieve approximate time-shift equivariance that aligns with quasi-periodic structure.
Stated as the source of the inductive bias in the model topology description.

pith-pipeline@v0.9.0 · 5680 in / 1414 out tokens · 61824 ms · 2026-05-22T16:15:52.456973+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 6 internal anchors

[1]

doi: 10.1145/1247480.1247511

ACM. doi: 10.1145/1247480.1247511. URL https://doi.org/10.1145/1247480.1247511. Dzmitry Bahdanau. Neural machine translation by jointly learning to align and translate.arXiv preprint arXiv:1409.0473,

work page doi:10.1145/1247480.1247511
[2]

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling.arXiv preprint arXiv:1803.01271,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Understanding disentangling in $\beta$-VAE

Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling inβ-vae.arXiv preprint arXiv:1804.03599,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver

doi: 10.1109/DSAA53316.2021.9564181. Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.arXiv preprint arXiv:2111.08095, 2021a. Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.https:...

work page doi:10.1109/dsaa53316.2021.9564181 2021
[5]

Variational Recurrent Auto-Encoders

Otto Fabius and Joost R Van Amersfoort. Variational recurrent auto-encoders.arXiv preprint arXiv:1412.6581,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

doi: 10.1561/2200000089

ISSN 1935-8245. doi: 10.1561/2200000089. URLhttp://dx.doi.org/ 10.1561/2200000089. A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, and H. E. Stanley. Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220,

work page doi:10.1561/2200000089 1935
[7]

doi: 10.1161/01.CIR.101.23.e215. Online. 17 Published in Transactions on Machine Learning Research (03/2026) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11): 139–144,

work page doi:10.1161/01.cir.101.23.e215 2026
[8]

Neural Comput.9(8), 1735–1780 (1997) https://doi.org/10.1162/neco.1997.9.8.1735

doi: 10.1162/neco.1997.9.8.1735. Harold Hotelling. Analysis of a complex of statistical variables into principal components.Journal of educational psychology, 24(6):417,

work page doi:10.1162/neco.1997.9.8.1735 1997
[9]

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen

Updated 2010 with erratum. Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation,

work page 2010
[10]

Progressive Growing of GANs for Improved Quality, Stability, and Variation

URLhttps://arxiv.org/abs/1710.10196. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. InInternational Conference on Machine Learning, pp. 5156–5165. PMLR,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

URLhttps://doi.org/10.1109/TKDE.2020.3014806

doi: 10.1109/ TKDE.2020.3014806. URLhttps://doi.org/10.1109/TKDE.2020.3014806. Yuansan Liu, Sudanthi Wijewickrema, Ang Li, Christofer Bester, Stephen O’Leary, and James Bailey. Time- transformer: Integrating local and global features for better time series generation,

work page doi:10.1109/tkde.2020.3014806 2020
[12]

Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, et al

URLhttps: //arxiv.org/abs/2312.11714. Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, et al. Multivariate time series imputation with generative adversarial networks.Advances in neural information processing systems, 31,

work page arXiv
[13]

Adversarial Autoencoders

URLhttps://arxiv.org/abs/1511.05644. Gopalakishna Manjunath and Herbert Jaeger. Echo state property linked to an input: Exploring a funda- mental characteristic of recurrent neural networks.Neural Computation, 25(3):671–696,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

URL https://arxiv.org/abs/1611.09904. 18 Published in Transactions on Machine Learning Research (03/2026) Philipp N. Mueller. Attention-enhanced conditional-diffusion-based data synthesis for data augmentation in machine fault diagnosis.Engineering Applications of Artificial Intelligence, 131:107696,

work page internal anchor Pith review Pith/arXiv arXiv 2026
[15]

doi: https://doi.org/10.1016/j.engappai.2023.107696. Kevin P. Murphy.Probabilistic Machine Learning: An introduction. MIT Press,

work page doi:10.1016/j.engappai.2023.107696 2023
[16]

Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, and Mohamed Jmaiel

URLprobml.ai. Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, and Mohamed Jmaiel. Diffecg: A versatile probabilistic diffusion model for ecg signals synthesis.arXiv preprint arXiv:2306.01875,

work page arXiv
[17]

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al

URLhttps://arxiv.org/abs/2108.00981. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9,

work page arXiv
[18]

URLhttps://doi.org/ 10.1007/s10489-023-04814-y

doi: 10.1007/s10489-023-04814-y. URLhttps://doi.org/ 10.1007/s10489-023-04814-y. Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34: 24804–24816,

work page doi:10.1007/s10489-023-04814-y
[19]

org/10.5281/zenodo.14762423

URLhttps://doi. org/10.5281/zenodo.14762423. KaiYang, ShaoyuDou, PanLuo, XinWang, andH.VincentPoor. Robustgroupanomalydetectionforquasi- periodic network time series.arXiv preprint arXiv:2506.16815,

work page doi:10.5281/zenodo.14762423
[20]

URLhttps://arxiv.org/abs/2506.16815

doi: 10.48550/arXiv.2506.16815. URLhttps://arxiv.org/abs/2506.16815. Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, et al. A survey on diffusion models for time series and spatio-temporal data.arXiv preprint arXiv:2404.18886,

work page doi:10.48550/arxiv.2506.16815
[21]

A segment-wise method for pseudo periodic time series prediction

Ning Yin, Shanshan Wang, Shenda Hong, and Hongyan Li. A segment-wise method for pseudo periodic time series prediction. InAdvanced Data Mining and Applications (ADMA 2014), volume 8933 ofLecture Notes in Computer Science, pp. 461–474, Guilin, China,

work page 2014
[22]

doi: 10.1007/978-3-319-14717-8_36

Springer. doi: 10.1007/978-3-319-14717-8_36. URLhttps://doi.org/10.1007/978-3-319-14717-8_36. Jinsung Yoon, Daniel Jarrett, and Mihaela Van der Schaar. Time-series generative adversarial networks. Advances in neural information processing systems, 32,

work page doi:10.1007/978-3-319-14717-8_36
[23]

19 Published in Transactions on Machine Learning Research (03/2026) Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu

URLhttps://openreview.net/ forum?id=4h1apFjO99. 19 Published in Transactions on Machine Learning Research (03/2026) Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. Ts2vec: Towards universal representation of time series,

work page 2026
[24]

URL https://doi.org/10.1186/s42162-022-00230-7

doi: 10.1186/s42162-022-00230-7. URL https://doi.org/10.1186/s42162-022-00230-7. Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. In- former: Beyond efficient transformer for long sequence time-series forecasting. InThe Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, vol...

work page doi:10.1186/s42162-022-00230-7 2021
[25]

In our experience, starting directly with large sequence lengths (or making large jumps without this warm start) leads to substantially worse optimization and less stable training

before applying larger sequence length increments. In our experience, starting directly with large sequence lengths (or making large jumps without this warm start) leads to substantially worse optimization and less stable training. For each training schedule in Table 4, we train five independently initialized models. Table 4 indicates that performance is ...

work page 2026
[26]

Still, a stable state emerges, with periodic patterns that, while not identical, remain strongly similar over time

The key characteristics of the data, particularly the heartbeat-like patterns across both channels, continue to be well synthesized in the extended sequence. Still, a stable state emerges, with periodic patterns that, while not identical, remain strongly similar over time. 21 Published in Transactions on Machine Learning Research (03/2026) Figure 8: Examp...

work page 2026
[27]

Upon closer inspection, a slight decrease in amplitude can be observed in channels 2 and 4 compared to the initial segment (up tol= 1000)

The sine curves are extended very consistently beyond the trained length, maintaining the dataset’s structure. Upon closer inspection, a slight decrease in amplitude can be observed in channels 2 and 4 compared to the initial segment (up tol= 1000). 22 Published in Transactions on Machine Learning Research (03/2026) Figure 10: Example of a generated time ...

work page 2026
[28]

Still, a stable state emerges, with the model settling into repetitive, low-variation patterns resembling noisy flatlines across all channels

While the generation follows the original data up to this length, no meaningful structure is preserved in the extended part. Still, a stable state emerges, with the model settling into repetitive, low-variation patterns resembling noisy flatlines across all channels. This behavior is expected, as the MetroPT3 dataset exhibits low quasi-periodicity. A.3 Ab...

work page 2026
[29]

Each score is based on 5000 generated samples; results are averaged over 15 independently trained models and reported with 1-sigma confidence intervals

Lower scores indicate better performance. Each score is based on 5000 generated samples; results are averaged over 15 independently trained models and reported with 1-sigma confidence intervals. Sequence lengths Dataset Model 100 300 500 1000 Electric Motor AEQ-RVAE-ST (ours)0.35±0.04 0.12±0.01 0.10±0.01 0.24±0.02 RVAE Control 0.81±0.09 0.74±0.07 1.44±0.1...

work page 2026
[30]

Bottom row: example samples

Top row per sequence length: PSD. Bottom row: example samples. 25 Published in Transactions on Machine Learning Research (03/2026) Original Data Generated Data (AEQ-RVAE-ST) l= 900 0.0 0.1 0.2 0.3 0.4 0.5 Frequency 10 9 10 7 10 5 10 3 10 1 101 PSD Average Train Data PSD (30,000 Samples) 0.0 0.1 0.2 0.3 0.4 0.5 Frequency 10 9 10 7 10 5 10 3 10 1 101 PSD Av...

work page 2026
[31]

Bottom row: example samples

Top row per sequence length: PSD. Bottom row: example samples. 26 Published in Transactions on Machine Learning Research (03/2026) and now also appear belowf1. The spectral content betweenf1 andf 2 is clearly reduced compared to the original, showing that the model isolates the two dominant peaks while filtering out surrounding frequencies. Atl= 900, thes...

work page 2026
[32]

The fastest models are the convolution- based TimeVAE and WaveGAN, whereas the recurrent models (AEQ-RVAE-ST and TimeGAN) require substantially longer training times

Overall, the results suggest clear runtime differences in our setup. The fastest models are the convolution- based TimeVAE and WaveGAN, whereas the recurrent models (AEQ-RVAE-ST and TimeGAN) require substantially longer training times. In addition, diffusion-based models require additional time for sample generation after training. 27 Published in Transac...

work page 2026
[33]

We perform min-max scaling with(−1,1)

We use Adam optimizer with learning rateα= 10 −4,β1 = 0.9, β2 = 0.999,ϵ= 10−7. We perform min-max scaling with(−1,1). After scaling we do a train/validation split with a ratio of 9:1. We use the loss function Lθ,ϕ=α·SSE+β·DKL,(5) where the reconstruction loss, SSE, represents the sum of squared errors, computed for each individual sample within a batch: S...

work page 2017
[34]

allows us to evaluate the (relative) short term consistency of synthetic data in high accuracy and low variance. To ensure reliable assessment of sample quality, we prevented overfitting of theELBO 29 Published in Transactions on Machine Learning Research (03/2026) Table 8: AverageELBOscoreE( ˜X)of synthetic time series for six models (see 4.2), computed ...

work page 2026
[35]

For each score,1000generated samples were evaluated by anELBO model(based on the TimeVAE architecture) and the results are reported with 1-sigma confidence intervals

A higher score indicates better performance. For each score,1000generated samples were evaluated by anELBO model(based on the TimeVAE architecture) and the results are reported with 1-sigma confidence intervals. The interpretation must follow analogously to the explanation provided in Section A.8 of the main paper, where the specifics and limitations of t...

work page 2026
[36]

Implementation details and hyperparameters for each model are provided in Appendix A.11.1

A.10 Baseline Model Details This section provides detailed descriptions of the baseline models, including their architectural properties and equivariance characteristics. Implementation details and hyperparameters for each model are provided in Appendix A.11.1. TimeGAN(Yoon et al., 2019):A GAN-based model that is considered state-of-the-art in generation ...

work page 2019
[37]

WaveGAN’s generator is based on convolutional layers

WaveGAN(Donahue et al., 2019):A GAN-based model developed for generation of raw audio waveforms. WaveGAN’s generator is based on convolutional layers. It doesn’t rely on typical audio processing techniques like spectrogram representations and is instead directly working in the time domain, making it also suitable for learning time series data. It is desig...

work page 2019
[38]

In our experiments, it was trained with the lowest possible sequence length of214, and the generated samples were subsequently split to match the required sequence length

Notably, WaveGAN loses it’s equivariance on a dense layer between the latent dimension and the generator, however the generator itself completely maintains equivariance with its upscaling approach. In our experiments, it was trained with the lowest possible sequence length of214, and the generated samples were subsequently split to match the required sequ...

work page 2019
[39]

When we generate samples, we cut them into equal parts which correspond to the desired sequence lengthl

We chose2 14 = 16384because it is the smallest possible length. When we generate samples, we cut them into equal parts which correspond to the desired sequence lengthl. The rest of the hyperparameters were set to default. On the sine dataset training, wie created10,000samples with a length of16,384. We used the ported pytorch implementation6. A.11.4 TimeV...

work page 2026
[40]

Since TimeGAN and WaveGAN show consistent performance across sequence lengths within a given dataset, these observations will not be explicitly mentioned in each figure caption

are not repeated here. Since TimeGAN and WaveGAN show consistent performance across sequence lengths within a given dataset, these observations will not be explicitly mentioned in each figure caption. 8https://github.com/Lysarthas/Time-Transformer 9https://github.com/Y-debug-sys/Diffusion-TS 33 Published in Transactions on Machine Learning Research (03/20...

work page 2026
[41]

AEQ-RVAE-ST, TimeVAE and Diffusion-TS perform similarly and the best across all sequence lengths

35 Published in Transactions on Machine Learning Research (03/2026) AEQ- RVAE-ST (ours) TimeGAN WaveGAN TimeVAE Diffusion-TS Time- Transformer PCA,l=100PCA,l=1000t-SNE,l=100t-SNE,l=1000 Figure 15: PCA and t-SNE plots for sequence lengthsl= 100andl= 1000on the MetroPT3 dataset. AEQ-RVAE-ST, TimeVAE and Diffusion-TS perform similarly and the best across all...

work page 2026

[1] [1]

doi: 10.1145/1247480.1247511

ACM. doi: 10.1145/1247480.1247511. URL https://doi.org/10.1145/1247480.1247511. Dzmitry Bahdanau. Neural machine translation by jointly learning to align and translate.arXiv preprint arXiv:1409.0473,

work page doi:10.1145/1247480.1247511

[2] [2]

An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling.arXiv preprint arXiv:1803.01271,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Understanding disentangling in $\beta$-VAE

Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling inβ-vae.arXiv preprint arXiv:1804.03599,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver

doi: 10.1109/DSAA53316.2021.9564181. Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.arXiv preprint arXiv:2111.08095, 2021a. Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.https:...

work page doi:10.1109/dsaa53316.2021.9564181 2021

[5] [5]

Variational Recurrent Auto-Encoders

Otto Fabius and Joost R Van Amersfoort. Variational recurrent auto-encoders.arXiv preprint arXiv:1412.6581,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

doi: 10.1561/2200000089

ISSN 1935-8245. doi: 10.1561/2200000089. URLhttp://dx.doi.org/ 10.1561/2200000089. A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, and H. E. Stanley. Physiobank, physiotoolkit, and physionet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220,

work page doi:10.1561/2200000089 1935

[7] [7]

doi: 10.1161/01.CIR.101.23.e215. Online. 17 Published in Transactions on Machine Learning Research (03/2026) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM, 63(11): 139–144,

work page doi:10.1161/01.cir.101.23.e215 2026

[8] [8]

Neural Comput.9(8), 1735–1780 (1997) https://doi.org/10.1162/neco.1997.9.8.1735

doi: 10.1162/neco.1997.9.8.1735. Harold Hotelling. Analysis of a complex of statistical variables into principal components.Journal of educational psychology, 24(6):417,

work page doi:10.1162/neco.1997.9.8.1735 1997

[9] [9]

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen

Updated 2010 with erratum. Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation,

work page 2010

[10] [10]

Progressive Growing of GANs for Improved Quality, Stability, and Variation

URLhttps://arxiv.org/abs/1710.10196. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are rnns: Fast autoregressive transformers with linear attention. InInternational Conference on Machine Learning, pp. 5156–5165. PMLR,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

URLhttps://doi.org/10.1109/TKDE.2020.3014806

doi: 10.1109/ TKDE.2020.3014806. URLhttps://doi.org/10.1109/TKDE.2020.3014806. Yuansan Liu, Sudanthi Wijewickrema, Ang Li, Christofer Bester, Stephen O’Leary, and James Bailey. Time- transformer: Integrating local and global features for better time series generation,

work page doi:10.1109/tkde.2020.3014806 2020

[12] [12]

Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, et al

URLhttps: //arxiv.org/abs/2312.11714. Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, et al. Multivariate time series imputation with generative adversarial networks.Advances in neural information processing systems, 31,

work page arXiv

[13] [13]

Adversarial Autoencoders

URLhttps://arxiv.org/abs/1511.05644. Gopalakishna Manjunath and Herbert Jaeger. Echo state property linked to an input: Exploring a funda- mental characteristic of recurrent neural networks.Neural Computation, 25(3):671–696,

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

URL https://arxiv.org/abs/1611.09904. 18 Published in Transactions on Machine Learning Research (03/2026) Philipp N. Mueller. Attention-enhanced conditional-diffusion-based data synthesis for data augmentation in machine fault diagnosis.Engineering Applications of Artificial Intelligence, 131:107696,

work page internal anchor Pith review Pith/arXiv arXiv 2026

[15] [15]

doi: https://doi.org/10.1016/j.engappai.2023.107696. Kevin P. Murphy.Probabilistic Machine Learning: An introduction. MIT Press,

work page doi:10.1016/j.engappai.2023.107696 2023

[16] [16]

Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, and Mohamed Jmaiel

URLprobml.ai. Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, and Mohamed Jmaiel. Diffecg: A versatile probabilistic diffusion model for ecg signals synthesis.arXiv preprint arXiv:2306.01875,

work page arXiv

[17] [17]

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al

URLhttps://arxiv.org/abs/2108.00981. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9,

work page arXiv

[18] [18]

URLhttps://doi.org/ 10.1007/s10489-023-04814-y

doi: 10.1007/s10489-023-04814-y. URLhttps://doi.org/ 10.1007/s10489-023-04814-y. Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34: 24804–24816,

work page doi:10.1007/s10489-023-04814-y

[19] [19]

org/10.5281/zenodo.14762423

URLhttps://doi. org/10.5281/zenodo.14762423. KaiYang, ShaoyuDou, PanLuo, XinWang, andH.VincentPoor. Robustgroupanomalydetectionforquasi- periodic network time series.arXiv preprint arXiv:2506.16815,

work page doi:10.5281/zenodo.14762423

[20] [20]

URLhttps://arxiv.org/abs/2506.16815

doi: 10.48550/arXiv.2506.16815. URLhttps://arxiv.org/abs/2506.16815. Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, et al. A survey on diffusion models for time series and spatio-temporal data.arXiv preprint arXiv:2404.18886,

work page doi:10.48550/arxiv.2506.16815

[21] [21]

A segment-wise method for pseudo periodic time series prediction

Ning Yin, Shanshan Wang, Shenda Hong, and Hongyan Li. A segment-wise method for pseudo periodic time series prediction. InAdvanced Data Mining and Applications (ADMA 2014), volume 8933 ofLecture Notes in Computer Science, pp. 461–474, Guilin, China,

work page 2014

[22] [22]

doi: 10.1007/978-3-319-14717-8_36

Springer. doi: 10.1007/978-3-319-14717-8_36. URLhttps://doi.org/10.1007/978-3-319-14717-8_36. Jinsung Yoon, Daniel Jarrett, and Mihaela Van der Schaar. Time-series generative adversarial networks. Advances in neural information processing systems, 32,

work page doi:10.1007/978-3-319-14717-8_36

[23] [23]

19 Published in Transactions on Machine Learning Research (03/2026) Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu

URLhttps://openreview.net/ forum?id=4h1apFjO99. 19 Published in Transactions on Machine Learning Research (03/2026) Zhihan Yue, Yujing Wang, Juanyong Duan, Tianmeng Yang, Congrui Huang, Yunhai Tong, and Bixiong Xu. Ts2vec: Towards universal representation of time series,

work page 2026

[24] [24]

URL https://doi.org/10.1186/s42162-022-00230-7

doi: 10.1186/s42162-022-00230-7. URL https://doi.org/10.1186/s42162-022-00230-7. Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. In- former: Beyond efficient transformer for long sequence time-series forecasting. InThe Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, vol...

work page doi:10.1186/s42162-022-00230-7 2021

[25] [25]

In our experience, starting directly with large sequence lengths (or making large jumps without this warm start) leads to substantially worse optimization and less stable training

before applying larger sequence length increments. In our experience, starting directly with large sequence lengths (or making large jumps without this warm start) leads to substantially worse optimization and less stable training. For each training schedule in Table 4, we train five independently initialized models. Table 4 indicates that performance is ...

work page 2026

[26] [26]

Still, a stable state emerges, with periodic patterns that, while not identical, remain strongly similar over time

The key characteristics of the data, particularly the heartbeat-like patterns across both channels, continue to be well synthesized in the extended sequence. Still, a stable state emerges, with periodic patterns that, while not identical, remain strongly similar over time. 21 Published in Transactions on Machine Learning Research (03/2026) Figure 8: Examp...

work page 2026

[27] [27]

Upon closer inspection, a slight decrease in amplitude can be observed in channels 2 and 4 compared to the initial segment (up tol= 1000)

The sine curves are extended very consistently beyond the trained length, maintaining the dataset’s structure. Upon closer inspection, a slight decrease in amplitude can be observed in channels 2 and 4 compared to the initial segment (up tol= 1000). 22 Published in Transactions on Machine Learning Research (03/2026) Figure 10: Example of a generated time ...

work page 2026

[28] [28]

Still, a stable state emerges, with the model settling into repetitive, low-variation patterns resembling noisy flatlines across all channels

While the generation follows the original data up to this length, no meaningful structure is preserved in the extended part. Still, a stable state emerges, with the model settling into repetitive, low-variation patterns resembling noisy flatlines across all channels. This behavior is expected, as the MetroPT3 dataset exhibits low quasi-periodicity. A.3 Ab...

work page 2026

[29] [29]

Each score is based on 5000 generated samples; results are averaged over 15 independently trained models and reported with 1-sigma confidence intervals

Lower scores indicate better performance. Each score is based on 5000 generated samples; results are averaged over 15 independently trained models and reported with 1-sigma confidence intervals. Sequence lengths Dataset Model 100 300 500 1000 Electric Motor AEQ-RVAE-ST (ours)0.35±0.04 0.12±0.01 0.10±0.01 0.24±0.02 RVAE Control 0.81±0.09 0.74±0.07 1.44±0.1...

work page 2026

[30] [30]

Bottom row: example samples

Top row per sequence length: PSD. Bottom row: example samples. 25 Published in Transactions on Machine Learning Research (03/2026) Original Data Generated Data (AEQ-RVAE-ST) l= 900 0.0 0.1 0.2 0.3 0.4 0.5 Frequency 10 9 10 7 10 5 10 3 10 1 101 PSD Average Train Data PSD (30,000 Samples) 0.0 0.1 0.2 0.3 0.4 0.5 Frequency 10 9 10 7 10 5 10 3 10 1 101 PSD Av...

work page 2026

[31] [31]

Bottom row: example samples

Top row per sequence length: PSD. Bottom row: example samples. 26 Published in Transactions on Machine Learning Research (03/2026) and now also appear belowf1. The spectral content betweenf1 andf 2 is clearly reduced compared to the original, showing that the model isolates the two dominant peaks while filtering out surrounding frequencies. Atl= 900, thes...

work page 2026

[32] [32]

The fastest models are the convolution- based TimeVAE and WaveGAN, whereas the recurrent models (AEQ-RVAE-ST and TimeGAN) require substantially longer training times

Overall, the results suggest clear runtime differences in our setup. The fastest models are the convolution- based TimeVAE and WaveGAN, whereas the recurrent models (AEQ-RVAE-ST and TimeGAN) require substantially longer training times. In addition, diffusion-based models require additional time for sample generation after training. 27 Published in Transac...

work page 2026

[33] [33]

We perform min-max scaling with(−1,1)

We use Adam optimizer with learning rateα= 10 −4,β1 = 0.9, β2 = 0.999,ϵ= 10−7. We perform min-max scaling with(−1,1). After scaling we do a train/validation split with a ratio of 9:1. We use the loss function Lθ,ϕ=α·SSE+β·DKL,(5) where the reconstruction loss, SSE, represents the sum of squared errors, computed for each individual sample within a batch: S...

work page 2017

[34] [34]

allows us to evaluate the (relative) short term consistency of synthetic data in high accuracy and low variance. To ensure reliable assessment of sample quality, we prevented overfitting of theELBO 29 Published in Transactions on Machine Learning Research (03/2026) Table 8: AverageELBOscoreE( ˜X)of synthetic time series for six models (see 4.2), computed ...

work page 2026

[35] [35]

For each score,1000generated samples were evaluated by anELBO model(based on the TimeVAE architecture) and the results are reported with 1-sigma confidence intervals

A higher score indicates better performance. For each score,1000generated samples were evaluated by anELBO model(based on the TimeVAE architecture) and the results are reported with 1-sigma confidence intervals. The interpretation must follow analogously to the explanation provided in Section A.8 of the main paper, where the specifics and limitations of t...

work page 2026

[36] [36]

Implementation details and hyperparameters for each model are provided in Appendix A.11.1

A.10 Baseline Model Details This section provides detailed descriptions of the baseline models, including their architectural properties and equivariance characteristics. Implementation details and hyperparameters for each model are provided in Appendix A.11.1. TimeGAN(Yoon et al., 2019):A GAN-based model that is considered state-of-the-art in generation ...

work page 2019

[37] [37]

WaveGAN’s generator is based on convolutional layers

WaveGAN(Donahue et al., 2019):A GAN-based model developed for generation of raw audio waveforms. WaveGAN’s generator is based on convolutional layers. It doesn’t rely on typical audio processing techniques like spectrogram representations and is instead directly working in the time domain, making it also suitable for learning time series data. It is desig...

work page 2019

[38] [38]

In our experiments, it was trained with the lowest possible sequence length of214, and the generated samples were subsequently split to match the required sequence length

Notably, WaveGAN loses it’s equivariance on a dense layer between the latent dimension and the generator, however the generator itself completely maintains equivariance with its upscaling approach. In our experiments, it was trained with the lowest possible sequence length of214, and the generated samples were subsequently split to match the required sequ...

work page 2019

[39] [39]

When we generate samples, we cut them into equal parts which correspond to the desired sequence lengthl

We chose2 14 = 16384because it is the smallest possible length. When we generate samples, we cut them into equal parts which correspond to the desired sequence lengthl. The rest of the hyperparameters were set to default. On the sine dataset training, wie created10,000samples with a length of16,384. We used the ported pytorch implementation6. A.11.4 TimeV...

work page 2026

[40] [40]

Since TimeGAN and WaveGAN show consistent performance across sequence lengths within a given dataset, these observations will not be explicitly mentioned in each figure caption

are not repeated here. Since TimeGAN and WaveGAN show consistent performance across sequence lengths within a given dataset, these observations will not be explicitly mentioned in each figure caption. 8https://github.com/Lysarthas/Time-Transformer 9https://github.com/Y-debug-sys/Diffusion-TS 33 Published in Transactions on Machine Learning Research (03/20...

work page 2026

[41] [41]

AEQ-RVAE-ST, TimeVAE and Diffusion-TS perform similarly and the best across all sequence lengths

35 Published in Transactions on Machine Learning Research (03/2026) AEQ- RVAE-ST (ours) TimeGAN WaveGAN TimeVAE Diffusion-TS Time- Transformer PCA,l=100PCA,l=1000t-SNE,l=100t-SNE,l=1000 Figure 15: PCA and t-SNE plots for sequence lengthsl= 100andl= 1000on the MetroPT3 dataset. AEQ-RVAE-ST, TimeVAE and Diffusion-TS perform similarly and the best across all...

work page 2026