A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Mixed citation behavior. Most common role is background (64%).

60 Pith papers citing it
Background 64% of classified citations
abstract

We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring masked pre-trained representations from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
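
The two components named in the abstract, patching and channel-independence, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `make_patches`, the look-back window of 512, the patch length of 16, the stride of 8, and the optional end-padding (repeating the last value `stride` times) are all assumptions chosen for illustration.

```python
import numpy as np

def make_patches(x, patch_len, stride, pad_end=False):
    """Split each channel of x into patches (hypothetical helper).

    x: array of shape (batch, channels, length).
    Returns an array of shape (batch * channels, num_patches, patch_len).
    """
    if pad_end:
        # Assumption: repeat the last value `stride` times so the tail of
        # the series is covered by one extra patch.
        tail = np.repeat(x[..., -1:], stride, axis=-1)
        x = np.concatenate([x, tail], axis=-1)

    batch, channels, length = x.shape

    # Channel-independence: fold channels into the batch axis so every
    # univariate series is processed with the same (shared) weights.
    x = x.reshape(batch * channels, length)

    # Start index of each patch, then the indices inside each patch.
    num_patches = (length - patch_len) // stride + 1
    starts = np.arange(num_patches) * stride
    idx = starts[:, None] + np.arange(patch_len)[None, :]

    # Fancy indexing yields (batch * channels, num_patches, patch_len).
    return x[:, idx]

# Illustrative values only: 2 samples, 3 channels, look-back window 512.
x = np.random.randn(2, 3, 512)
tokens = make_patches(x, patch_len=16, stride=8, pad_end=True)
print(tokens.shape)  # (6, 64, 16)
```

Under these assumed settings, each series becomes 64 tokens instead of 512 time steps, so an attention map has 64 × 64 entries rather than 512 × 512, which is the quadratic reduction the abstract refers to.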

citation-role summary

background 8 · baseline 2 · method 1

representative citing papers

Olivia: Harmonizing Time Series Foundation Models with Power Spectral Density

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

Olivia harmonizes time series datasets via normalized power spectral density using a Harmonizer module and resonator-based HarmonicAttention, achieving state-of-the-art zero-shot, few-shot, and full-shot forecasting on TSLib, GIFT-Eval, and GluonTS benchmarks.

How Do Electrocardiogram Models Scale?

cs.LG · 2026-05-17 · conditional · novelty 6.0

An empirical scaling study of ECG models finds that SSL scales robustly, while ResNets show 1.3-2.5x better parameter efficiency and SSL shows up to 16x better data efficiency than supervised baselines on out-of-distribution tasks.
