Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models

Kathy Razmadze; Noam Major; Yoli Shavit

arxiv: 2605.19462 · v1 · pith:NOZX2NOEnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 07:07 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: self-supervised learning, time series, pre-training, anomaly detection, forecasting, foundation models, latent alignment, generative models

The pith

Pre-training boosts time series anomaly detection by up to 375% but adds little to forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures the extra value that self-supervised pre-training adds to time series models on a range of tasks. It compares standard generative pre-training against latent alignment methods that use wavelet transforms to build invariance to small changes. The results show large improvements on anomaly detection and classification but only small changes on forecasting. The difference traces to a trade-off: some tasks need fine-grained signal detail while others need features that ignore local noise. This pattern holds across data sources and stops improving once models reach moderate size.

Core claim

We establish a controlled framework to evaluate the pre-training dividend across diverse temporal tasks. Our analysis reveals that the pre-training dividend is highly asymmetric: SSL yields gains of up to 375% for anomaly detection and classification, yet remains marginal for forecasting. We demonstrate that representational utility is non-universal, governed by a precision-invariance trade-off where the specific signal resolution required by the task must align with the objective. Finally, we show that representation quality is largely independent of data origin and saturates at moderate architectural depths, suggesting a path to scaling via massive synthetic generation.

What carries the argument

A controlled comparison of generative versus latent self-supervised objectives, using Discrete Wavelet Transform augmentations to enforce invariance to local fluctuations.
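To make the idea of a wavelet-based view concrete, here is a minimal sketch of one possible DWT augmentation. It is an illustration under stated assumptions, not the paper's implementation: the single-level Haar wavelet, the random attenuation of detail coefficients, and the names `haar_dwt`, `haar_idwt`, and `dwt_view` are all hypothetical choices. The page does not state which wavelet family, decomposition depth, or perturbation the authors use.

```python
import math
import random


def haar_dwt(signal):
    """One-level Haar DWT: return (approximation, detail) coefficient lists."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient lists."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def dwt_view(signal, detail_scale_range=(0.0, 1.0), rng=None):
    """Build an augmented view by shrinking the high-frequency detail band.

    Assumption: scaling detail coefficients by a random factor is one way to
    produce views that share coarse structure but differ in local fluctuations.
    """
    rng = rng or random.Random()
    approx, detail = haar_dwt(signal)
    scale = rng.uniform(*detail_scale_range)
    return haar_idwt(approx, [scale * d for d in detail])
```

Two views drawn from `dwt_view` keep the same approximation band, so an objective that pulls their embeddings together would, in this sketch, push the encoder to ignore local fluctuations while preserving the coarse shape of the series.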

If this is right

Anomaly detection and classification receive large accuracy lifts from either generative or latent pre-training.
Forecasting performance shows only marginal improvement after the same pre-training.
Representation quality stays roughly constant whether the pre-training data comes from real or synthetic sources.
Further increases in model depth beyond moderate sizes produce little additional benefit.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For forecasting applications, simpler supervised training or purely generative objectives may be more efficient than broad latent pre-training.
Large synthetic datasets could serve as a practical substitute for scarce real time-series data during pre-training.
A hybrid objective that balances precision and invariance might produce more general-purpose time series representations.

Load-bearing premise

The specific DWT adaptations of LeJEPA and DINO together with the fixed fine-tuning protocol isolate the effect of the pre-training objective itself.

What would settle it

If the same large gaps between tasks disappear when the same models are fine-tuned with identical procedures but without the DWT augmentations, the claim that the objective alone drives the asymmetry would be falsified.

Figures

Figures reproduced from arXiv: 2605.19462 by Kathy Razmadze, Noam Major, Yoli Shavit.

**Figure 2.** Forecasting performance (MSE) as a function of backbone depth.

**Figure 3.** Impact of pre-training data composition on forecasting performance (MSE) for ETTh

**Figure 4.** Impact of pre-training data composition on forecasting performance (MSE) for ETTm

**Figure 5.** Impact of pre-training data composition on forecasting performance (MSE) for Weather

**Figure 6.** Impact of pre-training data composition on forecasting performance (MSE) for Traffic.

**Figure 7.** Embedding Analysis

For forecasting, DWT achieves the best or near-best results across datasets (e.g., lowest MSE on ETTH1, ETTM1, and ETTM2), while alternative transformations exhibit inconsistent behavior and often degrade performance. On average, non-DWT augmentations increase forecasting error by approximately 1–3% relative to DWT. Discussion. These results indicate that the effectiveness of augmentatio…

**Figure 8.** Training loss curves for 24-layer models pre-trained on the Synthetic dataset. All methods

**Figure 9.** SIGReg-induced embedding geometry for Le-JEPA on the SpokenArabicDigits dataset. (1)

Original abstract

The success of self-supervised learning (SSL) in vision and NLP has motivated its rapid adoption for time series. However, research has focused primarily on Generative paradigms and forecasting tasks, leaving the broader utility of learned representations unquantified. We establish a controlled framework to evaluate the "pre-training dividend": the value added by SSL across diverse temporal tasks. We systematically compare Generative paradigms against Latent Alignment architectures, introducing adaptations of LeJEPA and DINO for time series. These adaptations utilize Discrete Wavelet Transform (DWT) augmentations to enforce invariance to local fluctuations. Our analysis reveals that the pre-training dividend is highly asymmetric: SSL yields gains of up to 375% for anomaly detection and classification, yet remains marginal for forecasting. We demonstrate that representational utility is non-universal, governed by a precision-invariance trade-off where the specific signal resolution required by the task must align with the objective. Finally, we show that representation quality is largely independent of data origin and saturates at moderate architectural depths, suggesting a path to scaling via massive synthetic generation. Our code is available at: https://github.com/noammajor/Models

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows latent SSL adaptations deliver large gains on anomaly and classification tasks but only marginal ones on forecasting, though DWT augmentations applied only to latent methods may partly explain the asymmetry.

The central result is that pre-training helps a lot more on some time series tasks than others. Latent methods adapted from LeJEPA and DINO produce gains up to 375% on anomaly detection and classification while adding little for forecasting. The authors link this to a precision-invariance trade-off that depends on matching the representation to what each task actually needs from the signal resolution.

Referee Report

1 major / 2 minor

Summary. The paper claims to establish a controlled framework comparing generative and latent self-supervised learning (SSL) for time series, adapting LeJEPA and DINO with DWT augmentations. It reports asymmetric pre-training benefits: up to 375% gains in anomaly detection and classification tasks, but only marginal gains in forecasting. This is attributed to a precision-invariance trade-off in representations, with additional findings on data origin independence and architectural depth saturation.

Significance. Should the experimental controls prove robust, the results would significantly advance understanding of SSL utility in time series by demonstrating that representational benefits are task-dependent rather than universal. This has implications for developing foundation models tailored to specific temporal tasks like anomaly detection versus forecasting. The public code release aids in verifying and extending these findings.

major comments (1)

[Experimental Framework (as described in abstract and methods)] The DWT augmentations are introduced specifically for the latent adaptations (LeJEPA and DINO) to enforce invariance to local fluctuations. However, the generative baselines appear to use standard augmentations without this multi-resolution component. This raises the possibility that observed differences in performance, particularly the large gains for anomaly detection and classification, stem from unequal handling of signal resolution rather than the core generative vs. latent objective. To support the central claim of isolating the pre-training dividend and the non-universality conclusion, ablations or explicit comparisons of augmentation effects across paradigms are necessary.

minor comments (2)

[Abstract] The specific baseline for the '375%' gain (e.g., compared to from-scratch training) and the evaluation metric should be clarified for better interpretability.
[Overall] The manuscript would benefit from more details on statistical significance testing and variance across runs to substantiate the reported percentage gains.
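Both minor comments turn on how a percentage gain is computed and how much it varies across runs. The sketch below shows one common convention, relative improvement over a baseline, together with a per-seed mean and sample standard deviation. The function names and the choice of baseline are assumptions for illustration. The page does not say which baseline or metric underlies the 375% figure, and the numbers in the usage note are invented.

```python
from statistics import mean, stdev


def relative_gain_percent(baseline, pretrained, higher_is_better=True):
    """Relative improvement of `pretrained` over `baseline`, in percent.

    For error metrics such as MSE, pass higher_is_better=False so that a
    lower score counts as a positive gain.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = pretrained - baseline if higher_is_better else baseline - pretrained
    return 100.0 * delta / abs(baseline)


def summarize_gain(baseline_runs, pretrained_runs, higher_is_better=True):
    """Pair runs by seed and return (mean gain %, sample std of the gain)."""
    gains = [
        relative_gain_percent(b, p, higher_is_better)
        for b, p in zip(baseline_runs, pretrained_runs)
    ]
    spread = stdev(gains) if len(gains) > 1 else 0.0
    return mean(gains), spread
```

Under this convention, a hypothetical F1 score rising from 0.20 to 0.95 is a 375% gain, while a hypothetical MSE falling from 0.40 to 0.38 is a 5% gain. A large percentage can therefore reflect a very weak baseline as much as a strong pre-trained model, which is why the baseline needs to be stated.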

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for providing detailed and thoughtful feedback on our manuscript. We have addressed the major comment regarding the experimental framework below, and we plan to incorporate revisions to enhance the robustness of our claims.

Referee: The DWT augmentations are introduced specifically for the latent adaptations (LeJEPA and DINO) to enforce invariance to local fluctuations. However, the generative baselines appear to use standard augmentations without this multi-resolution component. This raises the possibility that observed differences in performance, particularly the large gains for anomaly detection and classification, stem from unequal handling of signal resolution rather than the core generative vs. latent objective. To support the central claim of isolating the pre-training dividend and the non-universality conclusion, ablations or explicit comparisons of augmentation effects across paradigms are necessary.

Authors: We appreciate the referee pointing out this potential issue in our controlled comparison. The use of DWT augmentations is integral to the latent SSL adaptations, as these methods are designed to learn representations that are invariant to local fluctuations at multiple resolutions, which aligns with the hierarchical nature of time series signals. Generative approaches, by contrast, typically aim to reconstruct the original signal and thus employ standard augmentations focused on temporal shifts or noise addition. Nevertheless, we acknowledge that this design choice could introduce a confound. To strengthen our isolation of the pre-training objective's effect, we will add explicit ablations in the revised version. Specifically, we will evaluate the generative baselines with DWT augmentations and the latent methods with standard augmentations, reporting the resulting performance changes on the anomaly detection and classification tasks. This will allow us to quantify the contribution of the augmentation strategy separately from the SSL paradigm and better support our conclusions on the precision-invariance trade-off.

Revision: yes
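The ablation described in this response amounts to a 2 × 2 crossing of paradigm and augmentation, evaluated on each affected task. A minimal sketch of that grid follows. The labels, the `scores` dictionary layout, and the helper names are hypothetical and serve only to show how the augmentation effect could be read off with the paradigm held fixed.

```python
from itertools import product

# Hypothetical labels for the factors named in the rebuttal.
PARADIGMS = ("generative", "latent")
AUGMENTATIONS = ("standard", "dwt")
TASKS = ("anomaly_detection", "classification")


def ablation_grid():
    """Enumerate every paradigm x augmentation x task configuration."""
    return [
        {"paradigm": p, "augmentation": a, "task": t}
        for p, a, t in product(PARADIGMS, AUGMENTATIONS, TASKS)
    ]


def augmentation_effect(scores, paradigm, task):
    """Score change from switching standard -> dwt, paradigm and task fixed.

    `scores` maps (paradigm, augmentation, task) tuples to a metric value.
    """
    return scores[(paradigm, "dwt", task)] - scores[(paradigm, "standard", task)]
```

If `augmentation_effect` came out similar for both paradigms, the remaining gap between them could more plausibly be attributed to the objective. If it differed sharply, the augmentation would be carrying part of the reported asymmetry.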

Circularity Check

0 steps flagged

Empirical benchmarking study with no circular derivation chain

The paper is a controlled empirical comparison of generative versus latent SSL pre-training on time series tasks, reporting performance deltas on anomaly detection, classification, and forecasting. Central claims rest on observed accuracy gains (e.g., up to 375%) and the precision-invariance trade-off inferred from task-specific results, not on any mathematical derivation, parameter fitting that is then relabeled as prediction, or self-referential definitions. No equations or self-citation chains are presented that reduce the reported asymmetries to quantities defined in terms of the same fitted values or prior author results. The framework description emphasizes isolation of pre-training objectives via DWT adaptations, but this is an experimental design choice rather than a circular reduction. The study is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen downstream tasks fairly represent real-world utility and that the DWT augmentations cleanly implement the desired invariance without side effects; no new physical entities are postulated.

free parameters (1)

DWT augmentation hyperparameters
Parameters controlling the wavelet scales and levels used to generate views are chosen to enforce invariance and are likely tuned on validation data.

axioms (1)

domain assumption: The selected tasks (anomaly detection, classification, forecasting) are representative proxies for the broader utility of time series representations.
The framework measures the pre-training dividend exclusively through performance on these tasks.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We establish a controlled framework to evaluate the 'pre-training dividend'... Latent Alignment paradigms prioritize global structural characteristics... governed by a precision-invariance trade-off

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

These adaptations utilize Discrete Wavelet Transform (DWT) augmentations to enforce invariance to local fluctuations.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 7 internal anchors

[1] Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, et al. Chronos: Learning the language of time series. arXiv preprint arXiv:2403.07815, 2024.

[2] Randall Balestriero and Yann LeCun. Lejepa: Provable and scalable self-supervised learning without the heuristics, 2025. URL https://arxiv.org/abs/2511.08544.

[3] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, 2022.

[4] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In International Conference on Computer Vision (ICCV), 2021.

[5] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688, 2023.

[6] Qingsong Dong, Yuxuan Ma, Yu Wang, Jie Chen, and Jun Wang. Simmtm: A simple pre-training framework for masked time-series modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[7] Qingsong Dong, Yuxuan Ma, Yu Wang, Jie Chen, and Jun Wang. Timesiam: Siamese self-supervised learning for time series. In International Conference on Machine Learning (ICML), 2024.

[8] Sofiane Ennadir, Siavash Golkar, and Leopoldo Sarra. Joint embeddings go temporal. arXiv preprint arXiv:2509.25449, 2025.

[9] Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I Webb, Rob J Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. arXiv preprint arXiv:2105.06643, 2021.

[10] Yann LeCun. Joint embedding predictive architectures: Self-supervised learning without reconstruction. arXiv preprint arXiv:2301.08243, 2023.

[11] Ziyu Liu, Azadeh Alavi, Minyi Li, and Xiang Zhang. Self-supervised learning for time series: Contrastive or generative? arXiv preprint arXiv:2403.09809, 2024.

[12] Yessin Moakher, Youssef Attia El Hili, and Vasilii Feofanov. Utica: Multi-objective self-distillation foundation model pretraining for time series classification. arXiv preprint arXiv:2603.01348, 2026.

[14] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022.

[15] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

[16] Ege Onur Taga, Muhammed Emrullah Ildiz, and Samet Oymak. Timepfn: Effective multivariate time series forecasting with synthetic data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20761–20769, 2025.

[17] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

[18] Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Chen Wang, Mingsheng Long, and Jianmin Wang. Deep time series models: A comprehensive survey and benchmark. arXiv preprint arXiv:2407.13278, 2024.

[19] Zhihan Yue, Haoyue Liu, Yan Zhou, Huan Yu, and Wenwu Sun. Ts2vec: Towards universal representation of time series. In AAAI Conference on Artificial Intelligence, 2022.

[20] George Zerveas, S Jayaraman, Dhaval Patel, Anurag Bhamidipaty, and Carsten Eickhoff. Self-supervised learning for time series analysis: Taxonomy, progress, and prospects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.

[21] Qingsong Zhang et al. Timedart: Diffusion-based autoregressive representation learning for time series. arXiv preprint arXiv:2410.05711, 2024.

[22] Qingsong Zhang et al. Universal time-series representation learning: A survey. arXiv preprint arXiv:2401.03717, 2024.

A Appendix. We provide extended results and discussion that did not fit within the page limitation of the main text and a detailed documentation of our experimental setup to support reproducibility: • Sections A.1-A.3 provide extended Linea...
