arxiv: 2605.18534 · v1 · pith:XX367GU5new · submitted 2026-05-18 · 💻 cs.LG

XCTFormer: Leveraging Cross-Channel and Cross-Time Dependencies for Enhanced Time-Series Analysis

Pith reviewed 2026-05-20 11:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords multivariate time series · channel-dependent modeling · transformer attention · imputation · cross-channel dependencies · forecasting · anomaly detection

The pith

XCTFormer improves multivariate time-series modeling by using direct pairwise attention across both channels and time steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents XCTFormer as a channel-dependent transformer that addresses shortcomings in prior models by explicitly modeling dependencies between every pair of tokens from different variables and time points. It claims that earlier channel-dependent approaches relied on indirect strategies that missed key relationships, while channel-independent methods succeeded partly because of those modeling gaps. The new architecture adds a Cross-Relational Attention Block to increase the model's ability to express cross-channel and cross-temporal links, plus an optional compression step for efficiency. Experiments across three benchmarks show competitive or superior results on forecasting, imputation, and anomaly detection, with the largest gains on imputation. If correct, this suggests that careful direct attention can make channel-dependent designs reliably better than both indirect alternatives and independent baselines.

Core claim

XCTFormer operates in a token-to-token manner to capture pairwise dependencies across time and channels through its Cross-Relational Attention Block, combined with a data processing module and optional Dependency Compression Plugin, delivering state-of-the-art imputation accuracy that exceeds the second-best method by an average of 20.8 percent in MSE and 15.3 percent in MAE on standard benchmarks.
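
The quoted averages are relative error reductions. As a minimal sketch of how such a figure is commonly computed, the snippet below assumes the per-dataset gain is (second_best - ours) / second_best and that gains are averaged across datasets; the paper's exact averaging scheme is not stated on this page. The function name and all numbers are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def relative_improvement(ours: float, second_best: float) -> float:
    """Percent reduction in error relative to the second-best method."""
    if second_best <= 0:
        raise ValueError("second_best error must be positive")
    return 100.0 * (second_best - ours) / second_best


# Hypothetical (ours, second_best) MSE pairs; NOT values from the paper.
hypothetical_mse = [(0.040, 0.050), (0.090, 0.100), (0.024, 0.032)]

per_dataset = [relative_improvement(o, s) for o, s in hypothetical_mse]
average_gain = mean(per_dataset)
print([round(g, 1) for g in per_dataset], round(average_gain, 1))
```

Averaging per-dataset percentages weights every dataset equally; averaging raw errors first and then taking the ratio would generally give a different number, so the choice should be stated alongside any such headline figure.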

What carries the argument

The Cross-Relational Attention Block (CRAB), which computes explicit pairwise attention between all tokens spanning both temporal and channel dimensions to directly represent inter-variable and cross-time relationships.
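
To make "pairwise attention between all tokens spanning both temporal and channel dimensions" concrete, here is a minimal NumPy sketch. It is not the authors' CRAB: it uses plain single-head softmax attention and omits the learnable masks and signed attention weights that the figure captions mention. The function name, the tensor sizes, and the random projection matrices are assumptions made for illustration.

```python
import numpy as np


def joint_token_attention(x, w_q, w_k, w_v):
    """Single-head softmax attention over all (channel, patch) tokens at once.

    x has shape (C, N, d): C channels, N patches per channel, d features.
    Returns the output with shape (C, N, d_v) and the (C*N, C*N) attention map.
    """
    C, N, d = x.shape
    tokens = x.reshape(C * N, d)                   # flatten channel and time axes
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # one score per ordered token pair
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # each row sums to 1
    out = attn @ v
    return out.reshape(C, N, -1), attn


rng = np.random.default_rng(0)
C, N, d = 3, 4, 8                                  # hypothetical sizes
x = rng.normal(size=(C, N, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = joint_token_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because channels and patches share one token axis, every entry of the attention map links one (channel, time) token to another, including pairs that differ in both channel and time. That is also why the map grows quadratically in C*N.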

If this is right

  • The model attains state-of-the-art results on imputation while remaining competitive on forecasting and anomaly detection.
  • Direct pairwise modeling across channels and time increases model capacity and expressiveness compared with indirect channel-dependent strategies.
  • The optional Dependency Compression Plugin allows the approach to scale without prohibitive compute demands on the evaluated tasks.
  • Explicit cross-relational attention can be added to transformer pipelines for multivariate sequences without assuming independence between variables.
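
A back-of-the-envelope sketch of why a compression step matters for scalability. It assumes patches are cut with a sliding window and no padding, using the lookback 96, patch_len 16, and stride 8 quoted in the figure captions on this page. It also assumes, purely for illustration, that a compressed variant scores each token against k summary slots rather than against every other token; how DeCoP actually compresses is not described here. The channel counts and k = 64 are hypothetical.

```python
def num_patches(lookback: int, patch_len: int, stride: int) -> int:
    """Patches per channel for a sliding window without padding (assumption)."""
    return (lookback - patch_len) // stride + 1


def score_entries_full(channels: int, patches: int) -> int:
    """Entries in a full token-to-token score matrix."""
    n_tokens = channels * patches
    return n_tokens * n_tokens


def score_entries_compressed(channels: int, patches: int, k: int) -> int:
    """Entries if each token attends to k compressed slots (illustrative only)."""
    return channels * patches * k


patches = num_patches(lookback=96, patch_len=16, stride=8)
for channels in (7, 21, 321):          # hypothetical channel counts
    full = score_entries_full(channels, patches)
    small = score_entries_compressed(channels, patches, k=64)
    print(channels, patches, full, small)
```

Under these assumptions the full score matrix grows quadratically with the channel count while the compressed one grows linearly, so the gap is small for a handful of channels and large for hundreds.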

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same explicit pairwise token modeling could be tested on sequence tasks outside time series, such as video frames or sensor networks, where variables share a common physical context.
  • If the performance edge holds on larger or noisier datasets, practitioners might shift away from channel-independent defaults toward lightweight cross-channel attention layers.
  • The gap between indirect and direct dependency modeling suggests a broader design principle: when variables arise from a shared process, explicit cross terms should be the default starting point rather than an add-on.

Load-bearing premise

That an explicit token-to-token attention mechanism will capture the true underlying dependencies without introducing noise, overfitting, or computational costs that outweigh the gains on typical time-series benchmarks.

What would settle it

A new time-series dataset or benchmark in which XCTFormer fails to match or exceed the imputation accuracy of the current second-best method while also showing no clear advantage over strong channel-independent baselines.

Figures

Figures reproduced from arXiv: 2605.18534 by Israel Zexer, Omri Azencot.

Figure 1: XCTFormer model overview. Multivariate inputs are divided into patches per channel, tokenized, …
Figure 2: Potential cross-channel and temporal dependencies for token at channel 3 at time …
Figure 3: Interpretable Learned Mask Structure: The data permutation step places the patch sequence …
Figure 4: Analysis of learnable attention masks on ETTm1 dataset. Top row: 96 …
Figure 5: DeCoP k sensitivity on Traffic (forecasting). Average MAE/MSE across horizons {96, 192, 336, 720} for different compressed representation sizes k.
  Adjacent text (C.2 DeCoP compression size analysis): As recalled, DeCoP compresses token-to-token interactions into a low-dimensional representation; the choice of k directly controls the expressive capacity of this bottleneck and thus can affect both accuracy and efficiency. In…
Figure 6: DeCoP k sensitivity on ECL (forecasting). Average MAE/MSE across horizons {96, 192, 336, 720} for different compressed representation sizes k.
  [Plot panels: "Test MSE imputation - Electricity" and "Test MAE imputation - Electricity"; x-axis k; series Test and Validation.]
Figure 7: DeCoP k sensitivity on ECL (imputation). Average MAE/MSE across mask ratios {0.125, 0.25, 0.375, 0.5} for different compressed representation sizes k.
Figure 8: Patch length sensitivity on ETTm1 (forecasting). Average validation and test losses across horizons {96, 192, 336, 720} for different patch_len values with stride = patch_len/2.
  Adjacent text: Following PatchTST's (Nie et al., 2023) best practice, we kept the patching configuration fixed in most of our main experiments. For forecasting and anomaly detection, we used patch_len = 16 and stride = 8. For imputation, where the …
Figure 9: Patch length sensitivity on Weather (forecasting). Average validation and test losses across horizons {96, 192, 336, 720} for different patch_len values with stride = patch_len/2.
  [Plot panels: "Mask: Layer 1" (range [-0.618, 0.636], mean -0.0001) and "Mask: Layer 2" (range [-0.708, 0.696], mean 0.0018); axes Value and Density.]
Figure 10: Distributions of mask and signed-attention weights. Histograms of the learnable mask values M and the resulting activated attention weights. Results are shown for the forecasting task upon the ETTm1 dataset with lookback L=96 and horizon H=192. The left panel shows the distributions after the first training epoch, and the right panel after the final (10th) epoch. The distributions remain approximately Gau…
Figure 11: Scalability with respect to channel dimensionality ( …
Figure 12: Scalability with respect to sequence length ( …
Figure 13: Synthetic dataset visualization: source signals (var_1 through var_6) and the constructed target …
Figure 14: Prediction examples on the synthetic dataset: ground-truth target vs. model forecasts for selected …
Original abstract

Multivariate time-series analysis involves extracting informative representations from sequences of multiple interdependent variables, supporting tasks such as forecasting, imputation, and anomaly detection. In real-world scenarios, these variables are typically collected from a shared context or underlying phenomenon, suggesting the presence of latent dependencies across time and channels that can be leveraged to improve performance. However, recent findings show that channel-independent (CI) models, which assume no inter-variable dependencies, often outperform channel-dependent (CD) models that explicitly model such relationships. This surprising result indicates that current CD models may not fully exploit their potential due to limitations in how dependencies are captured. Recent studies have revisited channel dependence modeling with various approaches; however, these methods often employ indirect modeling strategies, which can lead to meaningful dependencies being overlooked. To address this issue, we introduce XCTFormer, a transformer-based channel-dependent (CD) model that explicitly captures cross-temporal and cross-channel dependencies via an enhanced attention mechanism. The model operates in a token-to-token fashion, modeling pairwise dependencies between every pair of tokens across time and channels. The architecture comprises (i) a data processing module, (ii) a novel Cross-Relational Attention Block (CRAB) that increases capacity and expressiveness, and (iii) an optional Dependency Compression Plugin (DeCoP) that improves scalability. Through extensive experiments on three time-series benchmarks, we show that XCTFormer achieves strong results compared to widely recognized baselines; in particular, it attains state-of-the-art performance on the imputation task, outperforming the second-best method by an average of 20.8% in MSE and 15.3% in MAE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces XCTFormer, a channel-dependent Transformer for multivariate time-series analysis. It uses a data-processing module, a Cross-Relational Attention Block (CRAB) that performs explicit token-to-token pairwise attention over cross-channel and cross-time tokens, and an optional Dependency Compression Plugin (DeCoP) for scalability. The central claim is that this explicit modeling overcomes limitations of prior indirect CD strategies and yields state-of-the-art imputation performance on three public benchmarks, outperforming the second-best method by 20.8% MSE and 15.3% MAE on average.

Significance. If the reported gains can be shown to stem specifically from the uncompressed token-to-token attention in CRAB rather than from the data-processing module, residual connections, or DeCoP, the work would supply concrete empirical counter-evidence to the recent preference for channel-independent models. It would also demonstrate that direct pairwise dependency modeling is feasible on standard benchmarks without prohibitive cost, potentially guiding future CD architectures.

major comments (2)
  1. [Abstract / §3 (Architecture)] Abstract and architecture description: the central claim attributes the 20.8% MSE / 15.3% MAE imputation gains to the explicit token-to-token pairwise attention inside the CRAB block. However, DeCoP is introduced as an optional plugin that improves scalability (implying reduction of the O((T·C)^2) cost). The manuscript does not state which configuration—full pairwise CRAB or the compressed DeCoP variant—was used to obtain the reported SOTA numbers. This distinction is load-bearing: if DeCoP was active, the results cannot be read as direct validation that explicit pairwise attention resolves the shortcomings of indirect CD modeling.
  2. [Experimental evaluation] Experimental section: the abstract asserts strong results and SOTA imputation performance, yet supplies no information on the exact baselines, ablation studies isolating CRAB, statistical significance tests, train/validation/test splits, or error bars across random seeds. These omissions prevent attribution of the quoted percentage improvements to the proposed mechanism and therefore undermine the empirical support for the main thesis.
minor comments (2)
  1. [§2 (Preliminaries)] The tokenization scheme that flattens time and channel dimensions into a single sequence could be illustrated with a small diagram or explicit indexing equations to aid readability.
  2. [Introduction] A few sentences in the introduction are overly long; splitting them would improve clarity without changing content.
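
One way to supply the explicit indexing the first minor comment asks for is an index map between (channel, patch) pairs and positions in the flattened token sequence. The sketch below assumes channel-major order (all patches of channel 0 first) and a sliding window without padding. The paper's actual ordering, including the data permutation step mentioned in the Figure 3 caption, may differ, and the helper names are hypothetical.

```python
import numpy as np


def patchify(series, patch_len, stride):
    """Split a (T, C) series into per-channel patches of shape (C, N, patch_len)."""
    T, C = series.shape
    starts = range(0, T - patch_len + 1, stride)
    return np.array([[series[s:s + patch_len, c] for s in starts] for c in range(C)])


def flat_index(channel, patch, n_patches):
    """Position of token (channel, patch) in the channel-major flattened sequence."""
    return channel * n_patches + patch


def unflatten(index, n_patches):
    """Inverse of flat_index: returns (channel, patch)."""
    return divmod(index, n_patches)


series = np.arange(96 * 3, dtype=float).reshape(96, 3)   # toy data, T=96, C=3
patches = patchify(series, patch_len=16, stride=8)
C, N, P = patches.shape
tokens = patches.reshape(C * N, P)                        # one row per token

i = flat_index(channel=2, patch=5, n_patches=N)
assert np.array_equal(tokens[i], patches[2, 5])
print(patches.shape, i, unflatten(i, N))
```

In equation form, under the same assumptions, token (c, n) sits at position i = c·N + n, and (c, n) = (⌊i / N⌋, i mod N).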

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us clarify key aspects of the work. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our results and experimental details.

Point-by-point responses
  1. Referee: [Abstract / §3 (Architecture)] Abstract and architecture description: the central claim attributes the 20.8% MSE / 15.3% MAE imputation gains to the explicit token-to-token pairwise attention inside the CRAB block. However, DeCoP is introduced as an optional plugin that improves scalability (implying reduction of the O((T·C)^2) cost). The manuscript does not state which configuration—full pairwise CRAB or the compressed DeCoP variant—was used to obtain the reported SOTA numbers. This distinction is load-bearing: if DeCoP was active, the results cannot be read as direct validation that explicit pairwise attention resolves the shortcomings of indirect CD modeling.

    Authors: We thank the referee for identifying this critical point of clarification. The reported SOTA imputation results were obtained using the full CRAB block with uncompressed token-to-token pairwise attention; the DeCoP plugin was not activated. This choice was made because the three standard benchmarks have moderate dimensions (T and C) for which the quadratic complexity remains computationally feasible. DeCoP is presented as an optional module specifically for larger-scale settings. We have revised the abstract and Section 3 to explicitly state the configuration used for the main results and have added a brief discussion of DeCoP's role and when it would be applied. revision: yes

  2. Referee: [Experimental evaluation] Experimental section: the abstract asserts strong results and SOTA imputation performance, yet supplies no information on the exact baselines, ablation studies isolating CRAB, statistical significance tests, train/validation/test splits, or error bars across random seeds. These omissions prevent attribution of the quoted percentage improvements to the proposed mechanism and therefore undermine the empirical support for the main thesis.

    Authors: We agree that these details are essential for reproducibility and for rigorously attributing performance gains to the proposed mechanisms. In the revised version we have substantially expanded the experimental section to include: (i) the complete list of baselines with full citations, (ii) dedicated ablation studies that isolate the CRAB block (including variants with and without cross-relational attention), (iii) statistical significance tests (paired t-tests and Wilcoxon signed-rank tests across runs), (iv) explicit descriptions of the train/validation/test splits, and (v) error bars showing mean and standard deviation over five random seeds. These additions directly support the claim that the observed improvements stem from the explicit modeling in CRAB. revision: yes
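
    A minimal standard-library sketch of the kind of per-seed reporting described in this response: mean and standard deviation over seeds, plus the paired t statistic on per-seed differences. All numbers are hypothetical placeholders, not results from the paper, and the helper names are illustrative. The last helper records a known limitation: with five paired runs and no ties, an exact two-sided Wilcoxon signed-rank test cannot return a p-value below 2/2^5.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-seed test MSE for two models; NOT values from the paper.
model_a = [0.051, 0.049, 0.050, 0.052, 0.048]
model_b = [0.060, 0.058, 0.061, 0.059, 0.062]


def summarize(runs):
    """Mean and sample standard deviation across seeds."""
    return mean(runs), stdev(runs)


def paired_t_statistic(a, b):
    """t statistic of the per-seed differences a[i] - b[i] (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def wilcoxon_min_two_sided_p(n_pairs):
    """Smallest two-sided p an exact signed-rank test can give with n untied pairs."""
    return 2 / 2 ** n_pairs


print(summarize(model_a), summarize(model_b))
print(paired_t_statistic(model_a, model_b))
print(wilcoxon_min_two_sided_p(5))
```

    Pairing by seed only makes sense if both models share the same seeds and data splits; otherwise an unpaired test is the appropriate comparison.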

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with independent benchmark evaluation

full rationale

The paper introduces an architectural design (CRAB block for explicit token-to-token attention plus optional DeCoP) and supports its claims solely through experimental results on public time-series benchmarks. No derivation, equation, or first-principles argument reduces the reported performance gains to a quantity defined by the model's own fitted parameters or to a self-citation chain. The central performance numbers (20.8% MSE / 15.3% MAE on imputation) are presented as measured outcomes rather than as algebraic consequences of the model definition itself. Self-citations, if present, are not load-bearing for the empirical claims.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the empirical superiority of explicit pairwise token modeling; the architecture introduces two new components whose effectiveness is demonstrated only through benchmark comparisons rather than theoretical guarantees.

free parameters (1)
  • Transformer hyperparameters (layers, heads, embedding size, etc.)
    Standard architectural choices that are tuned on the evaluation benchmarks to achieve the reported performance.
axioms (1)
  • Domain assumption: Multivariate time-series variables collected from a shared context exhibit latent cross-channel and cross-time dependencies that explicit modeling can exploit.
    Invoked in the opening paragraphs of the abstract as the motivation for moving beyond channel-independent baselines.
invented entities (2)
  • Cross-Relational Attention Block (CRAB) · no independent evidence
    purpose: To increase model capacity by computing pairwise dependencies between every pair of tokens across time and channels.
    New architectural component introduced to address limitations of prior indirect dependency modeling.
  • Dependency Compression Plugin (DeCoP) · no independent evidence
    purpose: To improve scalability of the full pairwise attention computation.
    Optional module proposed to mitigate computational cost of the token-to-token design.

pith-pipeline@v0.9.0 · 5829 in / 1566 out tokens · 50996 ms · 2026-05-20T11:52:13.003004+00:00 · methodology

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

  1. Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu. Thirty-Seventh Conference on Artificial Intelligence.
  2. Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long. The Twelfth International Conference on Learning Representations.
  3. Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. Annual Conference on Neural Information Processing Systems 2021, NeurIPS.
  4. Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam. The Eleventh International Conference on Learning Representations.
  5. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Annual Conference on Neural Information Processing Systems, NeurIPS.
  6. George E. P. Box, Gwilym M. Jenkins. 1970.
  7. Long short-term memory. Neural Computation.
  8. Jean… Annual Conference on Neural Information Processing Systems, NeurIPS.
  9. Guoqi Yu, Jing Zou, Xiaowei Hu, Angelica I. Avil… Forty-first International Conference on Machine Learning.
  10. Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang. Thirty-Fifth Conference on Artificial Intelligence.
  11. Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu… Annual Conference on Neural Information Processing Systems, NeurIPS.
  12. Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin. International Conference on Machine Learning.
  13. Xue Wang, Tian Zhou, Qingsong Wen, Jinyang Gao, Bolin Ding, Rong Jin.
  14. Xinlin Wang, Hao Wang, Binayak Bhandari, Leming Cheng.
  15. C. Bui, N. Pham, A. Vo, A. Tran, A. Nguyen, T. Le. 2018.
  16. Aristeidis Mystakidis, Paraskevas Koukaras, Nikolaos Tsalikidis, Dimosthenis Ioannidis, Christos Tjortjis. 2024.
  17. Diego Duarte, Chris Walshaw, Nadarajah Ramesh. 2021.
  18. Gilbert Brunet, David B. Parsons, Dimitar Ivanov, Boram Lee, Peter Bauer, Natacha B. Bernier, Veronique Bouchet, Andy Brown, Antonio Busalacchi, Georgina Campbell Flatter, and others. 2023.
  19. Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X. Liu, Schahram Dustdar. The Tenth International Conference on Learning Representations.
  20. Yunhao Zhang, Junchi Yan. The Eleventh International Conference on Learning Representations.
  21. Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, Rose Yu. Transactions on Machine Learning Research.
  22. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
  23. Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou. The Twelfth International Conference on Learning Representations.
  24. Shiyu Wang, Jiawei Li, Xiaoming Shi, Zhou Ye, Baichuan Mo, Wenze Lin, Shengtong Ju, Zhixuan Chu, Ming Jin. The Thirteenth International Conference on Learning Representations.
  25. Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang… The Tenth International Conference on Learning Representations.
  26. Patara Trirat, Yooju Shin, Junhyeok Kang, Youngeun Nam, Jihye Na, Minyoung Bae, Joeun Kim, Byunghyun Kim, Jae… arXiv.
  27. Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo. The Thirteenth International Conference on Learning Representations.
  28. Pedro Domingos. Communications of the ACM.
  29. Ming Jin, Huan Yee Koh, Qingsong Wen, Daniele Zambon, Cesare Alippi, Geoffrey I. Webb, Irwin King, Shirui Pan.
  30. Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan.
  31. Huiqiang Wang, Jian Peng, Feihu Huang, Jince Wang, Junhui Chen, Yifei Xiao. The Eleventh International Conference on Learning Representations.
  32. Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, Qiang Xu. Annual Conference on Neural Information Processing Systems, NeurIPS.
  33. Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long. The Eleventh International Conference on Learning Representations.
  34. Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey. 2410.18613.
  35. Lifan Zhao, Yanyan Shen. The Twelfth International Conference on Learning Representations.
  36. Xiaodan Chen, Xiucheng Li, Xinyang Chen, Zhijun Li. Annual Conference on Neural Information Processing Systems, NeurIPS.
  37. Liran Nochumsohn, Hedi Zisling, Omri Azencot. International Conference on Artificial Intelligence and Statistics.
  38. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K… Annual Conference on Neural Information Processing Systems, NeurIPS.
  39. Diederik P. Kingma, Jimmy Ba. 3rd International Conference on Learning Representations.
  40. Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan.
  41. Nikita Kitaev, Kaiser… International Conference on Learning Representations (ICLR).
  42. Jiehui Xu, Haixu Wu, Jianmin Wang, Mingsheng Long.
  43. Albert Gu, Karan Goel, R… International Conference on Learning Representations (ICLR).
  44. Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting. Advances in Neural Information Processing Systems (NeurIPS).
  45. Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Steven Hoi.
  46. Tianping Zhang, Yizhuo Zhang, Wei Cao, Jiang Bian, Xiaohan Yi, Shun Zheng, Jian Li.
  47. Gerald F. Reed, Frances Lynn, Brian D. Meade. Clinical and Diagnostic Laboratory Immunology.
  48. Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, Dan Pei.
  49. Aditya P. Mathur, Nils Ole Tippenhauer.
  50. Ahmed Abdulaal, Zhuanghua Liu, Tomer Lancewicki.
  51. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  52. Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou. Forty-first International Conference on Machine Learning.
  53. Abdul Fatir Ansari, Lorenzo Stella, Ali Caner T… Trans. Mach. Learn. Res.
  54. Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin. The Thirteenth International Conference on Learning Representations.
  55. Abdul Fatir Ansari, Oleksandr Shchur, Jaris K… CoRR.
  56. Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo. Forty-first International Conference on Machine Learning.
  57. Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian B… TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning.
  58. Facebook Research. 2021.
  59. Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting. arXiv preprint arXiv:2509.15105.
  60. Beyond data scarcity: A frequency-driven framework for zero-shot forecasting. arXiv preprint arXiv:2411.15743.
  61. Zero-shot Forecasting by Simulation Alone. arXiv preprint arXiv:2601.00970.
  62. Utilizing image transforms and diffusion models for generative modeling of short and long time series. Advances in Neural Information Processing Systems.
  63. Gal Fadlon, Idan Arbiv, Nimrod Berman, Omri Azencot. Advances in Neural Information Processing Systems (NeurIPS) 39.
  64. Tal Gonen, Itai Pemper, Ilan Naiman, Nimrod Berman, Omri Azencot. Advances in Neural Information Processing Systems (NeurIPS) 39.
  65. Ilan Naiman, N. Benjamin Erichson, Pu Ren, Michael W. Mahoney, Omri Azencot. Generative Modeling of Regular and Irregular Time Series Data via …
  66. Liran Nochumsohn, Omri Azencot. Trans. Mach. Learn. Res.