XCTFormer: Leveraging Cross-Channel and Cross-Time Dependencies for Enhanced Time-Series Analysis

Israel Zexer; Omri Azencot

arxiv: 2605.18534 · v1 · pith:XX367GU5new · submitted 2026-05-18 · 💻 cs.LG

Pith reviewed 2026-05-20 11:52 UTC · model grok-4.3

keywords multivariate time series · channel-dependent modeling · transformer attention · imputation · cross-channel dependencies · forecasting · anomaly detection

The pith

XCTFormer improves multivariate time-series modeling by using direct pairwise attention across both channels and time steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents XCTFormer as a channel-dependent transformer that addresses shortcomings in prior models by explicitly modeling dependencies between every pair of tokens from different variables and time points. It claims that earlier channel-dependent approaches relied on indirect strategies that missed key relationships, while channel-independent methods succeeded partly because of those modeling gaps. The new architecture adds a Cross-Relational Attention Block to increase the model's ability to express cross-channel and cross-temporal links, plus an optional compression step for efficiency. Experiments across three benchmarks show competitive or superior results on forecasting, imputation, and anomaly detection, with the largest gains on imputation. If correct, this suggests that careful direct attention can make channel-dependent designs reliably better than both indirect alternatives and independent baselines.

Core claim

XCTFormer operates in a token-to-token manner to capture pairwise dependencies across time and channels through its Cross-Relational Attention Block, combined with a data processing module and optional Dependency Compression Plugin, delivering state-of-the-art imputation accuracy that exceeds the second-best method by an average of 20.8 percent in MSE and 15.3 percent in MAE on standard benchmarks.
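The 20.8 and 15.3 percent figures are relative error reductions against the runner-up. The sketch below shows one way such an average could be computed, assuming it is the mean of per-setting relative reductions; the source does not state the aggregation, and the error values are placeholders, not numbers from the paper.

```python
def relative_reduction(ours: float, runner_up: float) -> float:
    """Fractional error reduction of `ours` relative to `runner_up` (lower error is better)."""
    if runner_up <= 0:
        raise ValueError("runner_up error must be positive")
    return (runner_up - ours) / runner_up


def mean_relative_reduction(pairs):
    """Average the per-setting reductions; `pairs` holds (ours, runner_up) errors."""
    reductions = [relative_reduction(o, r) for o, r in pairs]
    return sum(reductions) / len(reductions)


# Placeholder errors for illustration only; these are NOT values from the paper.
example_mse = [(0.040, 0.050), (0.030, 0.040), (0.090, 0.100)]
avg = mean_relative_reduction(example_mse)
print(f"average MSE reduction: {100 * avg:.1f}%")
```

Averaging per-setting ratios and taking the ratio of averaged errors generally give different numbers, so the aggregation choice matters when comparing against the quoted percentages.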

What carries the argument

The Cross-Relational Attention Block (CRAB), which computes explicit pairwise attention between all tokens spanning both temporal and channel dimensions to directly represent inter-variable and cross-time relationships.
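To make the all-pairs structure concrete, here is a minimal sketch of attention over tokens flattened across channels and patches. It uses ordinary single-head softmax attention purely to show the shape of the computation; it is not CRAB itself, and the sizes, weight matrices, and function name are illustrative assumptions.

```python
import numpy as np


def pairwise_token_attention(x, w_q, w_k, w_v):
    """Single-head softmax attention over all (channel, patch) tokens at once.

    x: array of shape (C, P, d), one d-dimensional embedding per channel and patch.
    Returns the attended tokens with shape (C, P, d) and the (C*P, C*P) weight matrix.
    """
    c, p, d = x.shape
    tokens = x.reshape(c * p, d)                  # flatten channels and patches into one sequence
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])       # one score per ordered token pair
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    return out.reshape(c, p, -1), weights


rng = np.random.default_rng(0)
C, P, D = 3, 4, 8                                 # hypothetical sizes
x = rng.normal(size=(C, P, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
out, weights = pairwise_token_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Every row of `weights` spans all channels and all patches, which is what separates this layout from attention applied along the time axis or the channel axis alone.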

If this is right

The model attains state-of-the-art results on imputation while remaining competitive on forecasting and anomaly detection.
Direct pairwise modeling across channels and time increases model capacity and expressiveness compared with indirect channel-dependent strategies.
The optional Dependency Compression Plugin allows the approach to scale without prohibitive compute demands on the evaluated tasks.
Explicit cross-relational attention can be added to transformer pipelines for multivariate sequences without assuming independence between variables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same explicit pairwise token modeling could be tested on sequence tasks outside time series, such as video frames or sensor networks, where variables share a common physical context.
If the performance edge holds on larger or noisier datasets, practitioners might shift away from channel-independent defaults toward lightweight cross-channel attention layers.
The gap between indirect and direct dependency modeling suggests a broader design principle: when variables arise from a shared process, explicit cross terms should be the default starting point rather than an add-on.

Load-bearing premise

That an explicit token-to-token attention mechanism will capture the true underlying dependencies without introducing noise, overfitting, or computational costs that outweigh the gains on typical time-series benchmarks.

What would settle it

A new time-series dataset or benchmark in which XCTFormer fails to match or exceed the imputation accuracy of the current second-best method while also showing no clear advantage over strong channel-independent baselines.

Figures

Figures reproduced from arXiv: 2605.18534 by Israel Zexer, Omri Azencot.

**Figure 2.** Potential cross-channel and temporal dependencies for token at channel 3 at time …

**Figure 3.** Interpretable Learned Mask Structure: The data permutation step places the patch sequence …

**Figure 4.** Analysis of learnable attention masks on ETTm1 dataset. Top row: 96 …

**Figure 5.** DeCoP k sensitivity on Traffic (forecasting). Average MAE/MSE across horizons {96, 192, 336, 720} for different compressed representation sizes k.
Spilled body text (C.2 DeCoP compression size analysis): As recalled, DeCoP compresses token-to-token interactions into a low-dimensional representation; the choice of k directly controls the expressive capacity of this bottleneck and thus can affect both accuracy and efficiency. In…

**Figure 6.** DeCoP k sensitivity on ECL (forecasting). Average MAE/MSE across horizons {96, 192, 336, 720} for different compressed representation sizes k.
[Plot residue: panels "Test MSE imputation - Electricity" and "Test MAE imputation - Electricity"; x-axis k; curves Test and Validation.]

**Figure 7.** DeCoP k sensitivity on ECL (imputation). Average MAE/MSE across mask ratios {0.125, 0.25, 0.375, 0.5} for different compressed representation sizes k.

**Figure 8.** Patch length sensitivity on ETTm1 (forecasting). Average validation and test losses across horizons {96, 192, 336, 720} for different patch_len values with stride = patch_len/2.
Spilled body text: Following PatchTST’s (Nie et al., 2023) best practice, we kept the patching configuration fixed in most of our main experiments. For forecasting and anomaly detection, we used patch_len = 16 and stride = 8. For imputation, where the …

**Figure 9.** Patch length sensitivity on Weather (forecasting). Average validation and test losses across horizons {96, 192, 336, 720} for different patch_len values with stride = patch_len/2.
[Plot residue: histogram panels "Mask: Layer 1" ([-0.618, 0.636], Mean=-0.0001) and "Mask: Layer 2" ([-0.708, 0.696], Mean=0.0018); axes Value and Density.]

**Figure 10.** Distributions of mask and signed-attention weights. Histograms of the learnable mask values M and the resulting activated attention weights. Results are shown for the forecasting task upon the ETTm1 dataset with lookback L=96 and horizon H=192. The left panel shows the distributions after the first training epoch, and the right panel after the final (10th) epoch. The distributions remain approximately Gau…

**Figure 11.** Scalability with respect to channel dimensionality (…

**Figure 12.** Scalability with respect to sequence length (…

**Figure 13.** Synthetic dataset visualization: source signals (var_1 through var_6) and the constructed target …

**Figure 14.** Prediction examples on the synthetic dataset: ground-truth target vs. model forecasts for selected …

Original abstract

Multivariate time-series analysis involves extracting informative representations from sequences of multiple interdependent variables, supporting tasks such as forecasting, imputation, and anomaly detection. In real-world scenarios, these variables are typically collected from a shared context or underlying phenomenon, suggesting the presence of latent dependencies across time and channels that can be leveraged to improve performance. However, recent findings show that channel-independent (CI) models, which assume no inter-variable dependencies, often outperform channel-dependent (CD) models that explicitly model such relationships. This surprising result indicates that current CD models may not fully exploit their potential due to limitations in how dependencies are captured. Recent studies have revisited channel dependence modeling with various approaches; however, these methods often employ indirect modeling strategies, which can lead to meaningful dependencies being overlooked. To address this issue, we introduce XCTFormer, a transformer-based channel-dependent (CD) model that explicitly captures cross-temporal and cross-channel dependencies via an enhanced attention mechanism. The model operates in a token-to-token fashion, modeling pairwise dependencies between every pair of tokens across time and channels. The architecture comprises (i) a data processing module, (ii) a novel Cross-Relational Attention Block (CRAB) that increases capacity and expressiveness, and (iii) an optional Dependency Compression Plugin (DeCoP) that improves scalability. Through extensive experiments on three time-series benchmarks, we show that XCTFormer achieves strong results compared to widely recognized baselines; in particular, it attains state-of-the-art performance on the imputation task, outperforming the second-best method by an average of 20.8% in MSE and 15.3% in MAE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

XCTFormer adds explicit pairwise token attention via CRAB to handle cross-channel and cross-time dependencies, but the big imputation gains may trace more to DeCoP or other unablated pieces than to the core direct-modeling claim.

The one thing to know is that this paper pushes channel-dependent transformers by replacing indirect dependency modeling with a direct token-to-token attention block called CRAB, plus an optional compression plugin. That combination is the actual novelty here, and it sits on top of existing work rather than replacing the whole framework. The authors correctly note that many prior CD models still fall short even when they try to capture inter-variable links, and they respond with a mechanism that attends over every pair of tokens across both time and channels. That design choice is straightforward and increases capacity in a way that matches the stated goal. The optional DeCoP module is a practical addition for keeping compute reasonable on longer sequences. Those pieces together give the paper a clear technical contribution that readers working on multivariate forecasting or imputation will recognize as an incremental but honest step forward. The experiments claim solid wins on three public benchmarks, especially imputation, which is the strongest part of the story if the numbers check out.

The soft spots are mostly around evidence. The abstract reports large average gains over the second-best method, yet the description leaves open whether the published runs used the full O((T·C)^2) attention or the compressed DeCoP variant. If the results rely on compression, the performance does not directly test the claim that explicit pairwise attention fixes the shortcomings of indirect CD strategies. Without ablations that isolate CRAB from the data-processing module and residual connections, it is difficult to attribute the improvement to the new attention block itself. The paper also does not appear to include statistical significance tests or error bars in the summary, which weakens the SOTA assertion until those details are verified.

This work is aimed at people already building or tuning transformer models for time-series tasks who want a concrete alternative to current CD designs. A reader who cares about architectural tweaks that target dependency modeling will find usable ideas here, even if they end up modifying the compression choice. It is worth sending to peer review so the experimental controls and ablation results can be examined in full; the core idea is coherent enough that referees can give targeted feedback rather than a desk rejection.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces XCTFormer, a channel-dependent Transformer for multivariate time-series analysis. It uses a data-processing module, a Cross-Relational Attention Block (CRAB) that performs explicit token-to-token pairwise attention over cross-channel and cross-time tokens, and an optional Dependency Compression Plugin (DeCoP) for scalability. The central claim is that this explicit modeling overcomes limitations of prior indirect CD strategies and yields state-of-the-art imputation performance on three public benchmarks, outperforming the second-best method by 20.8% MSE and 15.3% MAE on average.

Significance. If the reported gains can be shown to stem specifically from the uncompressed token-to-token attention in CRAB rather than from the data-processing module, residual connections, or DeCoP, the work would supply concrete empirical counter-evidence to the recent preference for channel-independent models. It would also demonstrate that direct pairwise dependency modeling is feasible on standard benchmarks without prohibitive cost, potentially guiding future CD architectures.

major comments (2)

[Abstract / §3 (Architecture)] Abstract and architecture description: the central claim attributes the 20.8% MSE / 15.3% MAE imputation gains to the explicit token-to-token pairwise attention inside the CRAB block. However, DeCoP is introduced as an optional plugin that improves scalability (implying reduction of the O((T·C)^2) cost). The manuscript does not state which configuration—full pairwise CRAB or the compressed DeCoP variant—was used to obtain the reported SOTA numbers. This distinction is load-bearing: if DeCoP was active, the results cannot be read as direct validation that explicit pairwise attention resolves the shortcomings of indirect CD modeling.
[Experimental evaluation] Experimental section: the abstract asserts strong results and SOTA imputation performance, yet supplies no information on the exact baselines, ablation studies isolating CRAB, statistical significance tests, train/validation/test splits, or error bars across random seeds. These omissions prevent attribution of the quoted percentage improvements to the proposed mechanism and therefore undermine the empirical support for the main thesis.
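The O((T·C)^2) concern raised in the first major comment can be put in numbers. The sketch below assumes tokens are patches, that the patch count follows the usual sliding-window formula without end padding, and uses the lookback of 96 with patch_len 16 and stride 8 quoted in the figure captions; the channel counts are hypothetical.

```python
def num_patches(lookback: int, patch_len: int, stride: int) -> int:
    """Patches per channel for a sliding window without end padding (an assumption)."""
    if lookback < patch_len:
        raise ValueError("lookback must be at least patch_len")
    return (lookback - patch_len) // stride + 1


def attention_pairs(channels: int, patches: int) -> int:
    """Entries in the full token-to-token score matrix: (C * P) squared."""
    return (channels * patches) ** 2


patches = num_patches(lookback=96, patch_len=16, stride=8)
for channels in (8, 64, 512):  # hypothetical channel counts
    print(channels, patches, attention_pairs(channels, patches))
```

Because the pair count grows with the square of the channel count, an eightfold increase in channels multiplies the score matrix by 64, which is the regime where a compression step becomes relevant.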

minor comments (2)

[§2 (Preliminaries)] The tokenization scheme that flattens time and channel dimensions into a single sequence could be illustrated with a small diagram or explicit indexing equations to aid readability.
[Introduction] A few sentences in the introduction are overly long; splitting them would improve clarity without changing content.
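As an illustration of the explicit indexing the first minor comment asks for, one possible flattening is written out below. It assumes C channels, P patches per channel, and a channel-major order; the paper's actual ordering, including its data permutation step, may differ.

```latex
% Assumed notation: c \in \{0,\dots,C-1\} is the channel, p \in \{0,\dots,P-1\} the patch index.
n(c, p) = c \cdot P + p, \qquad n \in \{0, \dots, CP - 1\}
% Inverse map from the flat token index back to (channel, patch):
c = \lfloor n / P \rfloor, \qquad p = n \bmod P
% Full pairwise attention then scores every ordered pair (n, m), i.e. (CP)^2 entries.
```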

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us clarify key aspects of the work. We address each major comment below and have revised the manuscript accordingly to strengthen the presentation of our results and experimental details.

Referee: [Abstract / §3 (Architecture)] Abstract and architecture description: the central claim attributes the 20.8% MSE / 15.3% MAE imputation gains to the explicit token-to-token pairwise attention inside the CRAB block. However, DeCoP is introduced as an optional plugin that improves scalability (implying reduction of the O((T·C)^2) cost). The manuscript does not state which configuration—full pairwise CRAB or the compressed DeCoP variant—was used to obtain the reported SOTA numbers. This distinction is load-bearing: if DeCoP was active, the results cannot be read as direct validation that explicit pairwise attention resolves the shortcomings of indirect CD modeling.

Authors: We thank the referee for identifying this critical point of clarification. The reported SOTA imputation results were obtained using the full CRAB block with uncompressed token-to-token pairwise attention; the DeCoP plugin was not activated. This choice was made because the three standard benchmarks have moderate dimensions (T and C) for which the quadratic complexity remains computationally feasible. DeCoP is presented as an optional module specifically for larger-scale settings. We have revised the abstract and Section 3 to explicitly state the configuration used for the main results and have added a brief discussion of DeCoP's role and when it would be applied.
revision: yes

Referee: [Experimental evaluation] Experimental section: the abstract asserts strong results and SOTA imputation performance, yet supplies no information on the exact baselines, ablation studies isolating CRAB, statistical significance tests, train/validation/test splits, or error bars across random seeds. These omissions prevent attribution of the quoted percentage improvements to the proposed mechanism and therefore undermine the empirical support for the main thesis.

Authors: We agree that these details are essential for reproducibility and for rigorously attributing performance gains to the proposed mechanisms. In the revised version we have substantially expanded the experimental section to include: (i) the complete list of baselines with full citations, (ii) dedicated ablation studies that isolate the CRAB block (including variants with and without cross-relational attention), (iii) statistical significance tests (paired t-tests and Wilcoxon signed-rank tests across runs), (iv) explicit descriptions of the train/validation/test splits, and (v) error bars showing mean and standard deviation over five random seeds. These additions directly support the claim that the observed improvements stem from the explicit modeling in CRAB.
revision: yes
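For readers who want to see what the seed-level checks named in the response involve, here is a minimal standard-library sketch of a paired t statistic plus mean and standard deviation over five seeds. The per-seed errors are placeholders, not results from the paper, and the helper name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t statistic for paired samples (same seeds); degrees of freedom = n - 1."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally sized samples with at least two pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder per-seed MSEs for illustration only; NOT values from the paper.
baseline = [0.052, 0.050, 0.051, 0.053, 0.049]
proposed = [0.041, 0.042, 0.040, 0.043, 0.039]
t = paired_t_statistic(baseline, proposed)
print(f"proposed: {mean(proposed):.4f} ± {stdev(proposed):.4f}, t = {t:.2f}, df = {len(proposed) - 1}")
```

Where SciPy is available, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` compute the same paired comparisons and also return p-values.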

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with independent benchmark evaluation

The paper introduces an architectural design (CRAB block for explicit token-to-token attention plus optional DeCoP) and supports its claims solely through experimental results on public time-series benchmarks. No derivation, equation, or first-principles argument reduces the reported performance gains to a quantity defined by the model's own fitted parameters or to a self-citation chain. The central performance numbers (20.8% MSE / 15.3% MAE on imputation) are presented as measured outcomes rather than as algebraic consequences of the model definition itself. Self-citations, if present, are not load-bearing for the empirical claims.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the empirical superiority of explicit pairwise token modeling; the architecture introduces two new components whose effectiveness is demonstrated only through benchmark comparisons rather than theoretical guarantees.

free parameters (1)

Transformer hyperparameters (layers, heads, embedding size, etc.)
Standard architectural choices that are tuned on the evaluation benchmarks to achieve the reported performance.

axioms (1)

domain assumption: Multivariate time-series variables collected from a shared context exhibit latent cross-channel and cross-time dependencies that explicit modeling can exploit.
Invoked in the opening paragraphs of the abstract as the motivation for moving beyond channel-independent baselines.

invented entities (2)

Cross-Relational Attention Block (CRAB) · no independent evidence
purpose: To increase model capacity by computing pairwise dependencies between every pair of tokens across time and channels.
New architectural component introduced to address limitations of prior indirect dependency modeling.

Dependency Compression Plugin (DeCoP) · no independent evidence
purpose: To improve scalability of the full pairwise attention computation.
Optional module proposed to mitigate computational cost of the token-to-token design.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear

CRAB extends the standard attention block with a learnable non-boolean masking mechanism and replaces softmax with AbsAct normalization that allows negative weights while preserving bounded Frobenius norm.
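The passage above names two changes to standard attention: a learnable real-valued mask and a normalization that keeps negative weights while bounding the Frobenius norm. The sketch below is one plausible reading only. The exact form of AbsAct is not given here, so dividing each row by its L1 norm is an assumption, as are the multiplicative use of the mask and all names and sizes.

```python
import numpy as np


def signed_masked_attention(q, k, v, mask, eps=1e-9):
    """Attention with a real-valued mask and L1 row normalisation instead of softmax.

    ASSUMPTION: this normalisation is a stand-in for AbsAct, whose exact form
    is not given in the text above.
    """
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = scores * mask                          # real-valued mask rescales or flips each pair
    weights = scores / (np.abs(scores).sum(axis=-1, keepdims=True) + eps)
    return weights @ v, weights


rng = np.random.default_rng(0)
N, D = 6, 4                                         # hypothetical token count and width
q, k, v = (rng.normal(size=(N, D)) for _ in range(3))
mask = rng.normal(scale=0.3, size=(N, N))           # stands in for a learned parameter
out, weights = signed_masked_attention(q, k, v, mask)
print(np.abs(weights).sum(axis=-1))                 # absolute row sums, each at most 1
print(np.linalg.norm(weights), np.sqrt(N))          # Frobenius norm next to its sqrt(N) bound
```

Under this reading each row has absolute sum at most 1, so its Euclidean norm is at most 1 and the Frobenius norm of the N-by-N weight matrix cannot exceed sqrt(N), while individual weights remain free to be negative.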
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear

DeCoP compresses quadratic attention to linear form via a learnable matrix C for datasets with >60 channels.
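A compression of this kind can be pictured with a low-rank projection of keys and values. The sketch below follows a Linformer-style construction as an assumption; the source only says that a learnable matrix C reduces the quadratic interaction to a linear one with a bottleneck of size k, so the placement of C, the softmax, and all sizes here are illustrative.

```python
import numpy as np


def compressed_attention(q, k, v, c):
    """Attention where N keys/values are first projected to k_dim summaries by `c`.

    q, k, v: (N, d) arrays; c: (k_dim, N) compression matrix (learned in practice).
    The score matrix is (N, k_dim) instead of (N, N).
    """
    k_small, v_small = c @ k, c @ v                 # (k_dim, d) each
    scores = q @ k_small.T / np.sqrt(q.shape[-1])   # (N, k_dim)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilise the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_small, weights


rng = np.random.default_rng(0)
N, D, K_DIM = 200, 16, 8                            # hypothetical sizes
q, k, v = (rng.normal(size=(N, D)) for _ in range(3))
c = rng.normal(size=(K_DIM, N)) / np.sqrt(N)
out, weights = compressed_attention(q, k, v, c)
print(out.shape, weights.shape)
```

With the bottleneck size fixed, the score matrix grows linearly in the number of tokens, at the price of each token attending to learned summaries rather than to every other token directly.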

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

[1]

Thirty-Seventh Conference on Artificial Intelligence , publisher =

Ailing Zeng and Muxi Chen and Lei Zhang and Qiang Xu , title =. Thirty-Seventh Conference on Artificial Intelligence , publisher =

work page
[2]

The Twelfth International Conference on Learning Representations,

Yong Liu and Tengge Hu and Haoran Zhang and Haixu Wu and Shiyu Wang and Lintao Ma and Mingsheng Long , title =. The Twelfth International Conference on Learning Representations,

work page
[3]

Annual Conference on Neural Information Processing Systems 2021, NeurIPS , year =

Haixu Wu and Jiehui Xu and Jianmin Wang and Mingsheng Long , title =. Annual Conference on Neural Information Processing Systems 2021, NeurIPS , year =

work page 2021
[4]

Nguyen and Phanwadee Sinthong and Jayant Kalagnanam , title =

Yuqi Nie and Nam H. Nguyen and Phanwadee Sinthong and Jayant Kalagnanam , title =. The Eleventh International Conference on Learning Representations,

work page
[5]

Gomez and Lukasz Kaiser and Illia Polosukhin , title =

Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , title =. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[6]

Box, George E. P. and Jenkins, Gwilym M. , title =. 1970 , edition =

work page 1970
[7]

Neural computation , year=

Long short-term memory , author=. Neural computation , year=

work page
[8]

Annual Conference on Neural Information Processing Systems, NeurIPS , year =

Jean. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[9]

Guoqi Yu and Jing Zou and Xiaowei Hu and Angelica I. Avil. Forty-first International Conference on Machine Learning,

work page
[10]

Thirty-Fifth Conference on Artificial Intelligence , publisher =

Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang , title =. Thirty-Fifth Conference on Artificial Intelligence , publisher =

work page
[11]

Annual Conference on Neural Information Processing Systems, NeurIPS , year =

Shiyang Li and Xiaoyong Jin and Yao Xuan and Xiyou Zhou and Wenhu Chen and Yu. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[12]

International Conference on Machine Learning,

Tian Zhou and Ziqing Ma and Qingsong Wen and Xue Wang and Liang Sun and Rong Jin , title =. International Conference on Machine Learning,

work page
[13]

Xue Wang and Tian Zhou and Qingsong Wen and Jinyang Gao and Bolin Ding and Rong Jin , booktitle=

work page
[14]

Wang, Xinlin and Wang, Hao and Bhandari, Binayak and Cheng, Leming , journal =

work page
[15]

2018 , organization=

Bui, C and Pham, N and Vo, A and Tran, A and Nguyen, A and Le, T , booktitle=. 2018 , organization=

work page 2018
[16]

2024 , publisher=

Mystakidis, Aristeidis and Koukaras, Paraskevas and Tsalikidis, Nikolaos and Ioannidis, Dimosthenis and Tjortjis, Christos , journal=. 2024 , publisher=

work page 2024
[17]

2021 , publisher=

Duarte, Diego and Walshaw, Chris and Ramesh, Nadarajah , journal=. 2021 , publisher=

work page 2021
[18]

2023 , publisher=

Brunet, Gilbert and Parsons, David B and Ivanov, Dimitar and Lee, Boram and Bauer, Peter and Bernier, Natacha B and Bouchet, Veronique and Brown, Andy and Busalacchi, Antonio and Flatter, Georgina Campbell and others , journal=. 2023 , publisher=

work page 2023
[19]

Liu and Schahram Dustdar , title =

Shizhan Liu and Hang Yu and Cong Liao and Jianguo Li and Weiyao Lin and Alex X. Liu and Schahram Dustdar , title =. The Tenth International Conference on Learning Representations,

work page
[20]

The Eleventh International Conference on Learning Representations,

Yunhao Zhang and Junchi Yan , title =. The Eleventh International Conference on Learning Representations,

work page
[21]

Transactions on Machine Learning Research , year =

Abhimanyu Das and Weihao Kong and Andrew Leach and Shaan Mathur and Rajat Sen and Rose Yu , title =. Transactions on Machine Learning Research , year =

work page
[22]

Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun , title =

work page
[23]

Zhang and Jun Zhou , title =

Shiyu Wang and Haixu Wu and Xiaoming Shi and Tengge Hu and Huakun Luo and Lintao Ma and James Y. Zhang and Jun Zhou , title =. The Twelfth International Conference on Learning Representations,

work page
[24]

The Thirteenth International Conference on Learning Representations,

Shiyu Wang and Jiawei Li and Xiaoming Shi and Zhou Ye and Baichuan Mo and Wenze Lin and Shengtong Ju and Zhixuan Chu and Ming Jin , title =. The Thirteenth International Conference on Learning Representations,

work page
[25]

The Tenth International Conference on Learning Representations,

Taesung Kim and Jinhee Kim and Yunwon Tae and Cheonbok Park and Jang. The Tenth International Conference on Learning Representations,

work page
[26]

arXiv , year =

Patara Trirat and Yooju Shin and Junhyeok Kang and Youngeun Nam and Jihye Na and Minyoung Bae and Joeun Kim and Byunghyun Kim and Jae. arXiv , year =

work page
[27]

The Thirteenth International Conference on Learning Representations,

Berivan Isik and Natalia Ponomareva and Hussein Hazimeh and Dimitris Paparas and Sergei Vassilvitskii and Sanmi Koyejo , title =. The Thirteenth International Conference on Learning Representations,

work page
[28]

Communications of the ACM , year =

Pedro Domingos , title =. Communications of the ACM , year =

work page
[29]

Webb and Irwin King and Shirui Pan , journal =

Ming Jin and Huan Yee Koh and Qingsong Wen and Daniele Zambon and Cesare Alippi and Geoffrey I. Webb and Irwin King and Shirui Pan , journal =

work page
[30]

Lv, Ang and Xie, Ruobing and Li, Shuaipeng and Liao, Jiayi and Sun, Xingwu and Kang, Zhanhui and Wang, Di and Yan, Rui , journal=

work page
[31]

The Eleventh International Conference on Learning Representations,

Huiqiang Wang and Jian Peng and Feihu Huang and Jince Wang and Junhui Chen and Yifei Xiao , title =. The Eleventh International Conference on Learning Representations,

work page
[32]

Annual Conference on Neural Information Processing Systems, NeurIPS , year =

Minhao Liu and Ailing Zeng and Muxi Chen and Zhijian Xu and Qiuxia Lai and Lingna Ma and Qiang Xu , title =. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[33]

The Eleventh International Conference on Learning Representations,

Haixu Wu and Tengge Hu and Yong Liu and Hang Zhou and Jianmin Wang and Mingsheng Long , title =. The Eleventh International Conference on Learning Representations,

work page
[34]

2410.18613 , archivePrefix=

Hemanth Saratchandran and Jianqiao Zheng and Yiping Ji and Wenbo Zhang and Simon Lucey , year=. 2410.18613 , archivePrefix=

work page arXiv
[35]

The Twelfth International Conference on Learning Representations,

Lifan Zhao and Yanyan Shen , title =. The Twelfth International Conference on Learning Representations,

work page
[36]

Annual Conference on Neural Information Processing Systems, NeurIPS , year =

Xiaodan Chen and Xiucheng Li and Xinyang Chen and Zhijun Li , title =. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[37]

International Conference on Artificial Intelligence and Statistics,

Liran Nochumsohn and Hedi Zisling and Omri Azencot , title =. International Conference on Artificial Intelligence and Statistics,

work page
[38]

Annual Conference on Neural Information Processing Systems, NeurIPS , year =

Adam Paszke and Sam Gross and Francisco Massa and Adam Lerer and James Bradbury and Gregory Chanan and Trevor Killeen and Zeming Lin and Natalia Gimelshein and Luca Antiga and Alban Desmaison and Andreas K. Annual Conference on Neural Information Processing Systems, NeurIPS , year =

work page
[39]

Kingma and Jimmy Ba , title =

Diederik P. Kingma and Jimmy Ba , title =. 3rd International Conference on Learning Representations,

work page
[40]

Li, Shiyang and Jin, Xiaoyong and Xuan, Yao and Zhou, Xiyou and Chen, Wenhu and Wang, Yu-Xiang and Yan, Xifeng , booktitle =

work page
[41]

International Conference on Learning Representations (ICLR) , year =

Kitaev, Nikita and Kaiser,. International Conference on Learning Representations (ICLR) , year =

work page
[42]

Xu, Jiehui and Wu, Haixu and Wang, Jianmin and Long, Mingsheng , booktitle =

work page
[43]

International Conference on Learning Representations (ICLR) , year =

Gu, Albert and Goel, Karan and R. International Conference on Learning Representations (ICLR) , year =

work page
[44]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[45]

Woo, Gerald and Liu, Chenghao and Sahoo, Doyen and Kumar, Akshat and Hoi, Steven , journal =

work page
[46]

Zhang, Tianping and Zhang, Yizhuo and Cao, Wei and Bian, Jiang and Yi, Xiaohan and Zheng, Shun and Li, Jian , journal =

work page
[47]

and Lynn, Frances and Meade, Brian D

Reed, Gerald F. and Lynn, Frances and Meade, Brian D. , title =. Clinical and Diagnostic Laboratory Immunology , year =

work page
[48]

Su, Ya and Zhao, Youjian and Niu, Chenhao and Liu, Rong and Sun, Wei and Pei, Dan , booktitle =

work page
[49]

and Tippenhauer, Nils Ole , booktitle =

Mathur, Aditya P. and Tippenhauer, Nils Ole , booktitle =

work page
[50]

Abdulaal, Ahmed and Liu, Zhuanghua and Lancewicki, Tomer , booktitle =

work page
[51]

Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

[52]

Abhimanyu Das and Weihao Kong and Rajat Sen and Yichen Zhou. Forty-first International Conference on Machine Learning.

[53]

Abdul Fatir Ansari and Lorenzo Stella and Ali Caner T. Trans. Mach. Learn. Res.

[54]

Xiaoming Shi and Shiyu Wang and Yuqi Nie and Dianqi Li and Zhou Ye and Qingsong Wen and Ming Jin. The Thirteenth International Conference on Learning Representations.

[55]

Abdul Fatir Ansari and Oleksandr Shchur and Jaris K. CoRR.

[56]

Gerald Woo and Chenghao Liu and Akshat Kumar and Caiming Xiong and Silvio Savarese and Doyen Sahoo. Forty-first International Conference on Machine Learning.

[57]

Andreas Auer and Patrick Podest and Daniel Klotz and Sebastian B. TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning.

[58]

Facebook Research. 2021.

[59]

Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting. arXiv preprint arXiv:2509.15105.

[60]

Beyond data scarcity: A frequency-driven framework for zero-shot forecasting. arXiv preprint arXiv:2411.15743.

[61]

Zero-shot Forecasting by Simulation Alone. arXiv preprint arXiv:2601.00970.

[62]

Utilizing image transforms and diffusion models for generative modeling of short and long time series. Advances in Neural Information Processing Systems.

[63]

Fadlon, Gal and Arbiv, Idan and Berman, Nimrod and Azencot, Omri. Advances in Neural Information Processing Systems (NeurIPS) 39.

[64]

Gonen, Tal and Pemper, Itai and Naiman, Ilan and Berman, Nimrod and Azencot, Omri. Advances in Neural Information Processing Systems (NeurIPS) 39.

[65]

Ilan Naiman and N. Benjamin Erichson and Pu Ren and Michael W. Mahoney and Omri Azencot. Generative Modeling of Regular and Irregular Time Series Data via

[66]

Liran Nochumsohn and Omri Azencot. Trans. Mach. Learn. Res.
