CASE-NET: Deep Spatio-Temporal Representation Learning via Causal Attention and Channel Recalibration for Multivariate Time Series Classification

Fan Zhang; Hua Wang; Yating Cui

arxiv: 2605.22043 · v1 · pith:2CJZY2QRnew · submitted 2026-05-21 · 💻 cs.LG

Pith reviewed 2026-05-22 08:26 UTC · model grok-4.3

classification 💻 cs.LG

keywords multivariate time series classification · causal attention · channel recalibration · information bottleneck · non-stationary data · representation learning · deep spatio-temporal networks

The pith

CASE-NET shows that enforcing the arrow of time with masked attention and causal convolutions plus channel recalibration removes confounding and noise for stronger multivariate time series classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that current encoders for multivariate time series fail because they let future information leak into past representations and let noisy channels pollute the learned features. It introduces CASE-NET to fix this by building a Causal Temporal Encoder that uses masked self-attention and causal convolutions to respect the physical direction of time, then pairs it with an Adaptive Channel Recalibration module that acts as an information bottleneck to keep only useful signals. If the approach holds, classification should become more accurate and stable on data whose statistics shift over time. Readers in pervasive computing or finance would care because those fields routinely deal with exactly such non-stationary sensor or market streams.

Core claim

The paper claims that a Causal Temporal Encoder, which enforces arrow-of-time constraints via masked self-attention and causal convolutions, combined with an Adaptive Channel Recalibration module that acts as an information bottleneck to suppress detrimental noise, produces cleaner latent representations. The reported outcome is new state-of-the-art results on four of six heterogeneous tasks, including a peak accuracy of 98.6 percent on the AWR dataset, together with improved robustness under non-stationary conditions.

What carries the argument

The Causal Temporal Encoder (masked self-attention plus causal convolutions), paired with Adaptive Channel Recalibration acting as an information bottleneck. Together, the two are meant to precondition the spatio-temporal manifold.
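
To make the mechanism concrete, here is a minimal pure-Python sketch of a causally masked self-attention step and a causal 1-D convolution. It is not the authors' implementation: the single head, the identity query/key/value projections, and the function names `causal_self_attention` and `causal_conv1d` are assumptions chosen to keep the example short. What it illustrates is only the constraint itself, namely that the output at step t is computed from inputs at steps τ ≤ t.

```python
import math

def causal_self_attention(x):
    """Single-head self-attention over T feature vectors with a causal mask.

    Assumption: identity query/key/value projections (no learned weights).
    """
    d = len(x[0])
    out = []
    for t in range(len(x)):
        # Score only positions tau <= t; future positions are never visited.
        scores = [sum(a * b for a, b in zip(x[t], x[tau])) / math.sqrt(d)
                  for tau in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w / z * x[tau][j] for tau, w in enumerate(weights))
                    for j in range(d)])
    return out

def causal_conv1d(series, kernel):
    """Causal 1-D convolution: output[t] uses series[t-k+1 .. t] (left zero padding)."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(series)
    return [sum(kernel[i] * padded[t + i] for i in range(k))
            for t in range(len(series))]

# Leakage check: perturb only the final time step and compare earlier outputs.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = causal_self_attention(x)
x_perturbed = x[:3] + [[9.0, 9.0]]
y_perturbed = causal_self_attention(x_perturbed)
print(y[:3] == y_perturbed[:3])  # True: earlier outputs ignore the changed final step
print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))  # [0.5, 1.5, 2.5, 3.5]
```

The perturbation check at the end is a cheap way to test any encoder for future leakage: if changing a later input alters an earlier output, the causal constraint is not being respected.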

If this is right

New state-of-the-art results on four of the six evaluated tasks across heterogeneous domains.
Peak accuracy of 98.6 percent on the AWR dataset.
Measurably higher robustness when input statistics change over time.
Direct applicability to multivariate streams in pervasive computing and financial analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same causal-masking pattern could be tested on forecasting or anomaly detection tasks where future leakage is equally harmful.
The recalibration bottleneck might transfer to other high-dimensional sensor fusion problems to reduce manual feature cleaning.
If the gains persist on larger real-world streams, practitioners could replace heavy preprocessing pipelines with these structural priors inside the network.
Combining the arrow-of-time prior with additional domain constraints such as conservation laws could be explored for physical simulation data.

Load-bearing premise

That adding causal constraints and channel recalibration will remove temporal confounding and noise contamination without introducing new biases or losing useful signal in the latent space.

What would settle it

A controlled ablation in which the same backbone without masked attention or without the recalibration module matches or exceeds CASE-NET accuracy on the same non-stationary test sets would falsify the claim that those mechanisms are necessary.

Figures

Figures reproduced from arXiv: 2605.22043 by Fan Zhang, Hua Wang, Yating Cui.

**Figure 1.** The hierarchical framework of CASE-NET, illustrating: (1) multi-scale representation initialization via parallel branches; (2) a …

**Figure 2.** (a) Training and Validation Loss and (b) Learning Curves

**Figure 5.** Correlation heatmaps of specific features

**Figure 6.** t-SNE visualization of learned feature manifolds for HAR

Original abstract

Multivariate time series (MTS) classification is foundational to pervasive computing and financial analysis, yet existing multi-scale paradigms are often constrained by suboptimal representation fidelity. We identify two critical bottlenecks: temporal non-causality in standard encoders that induces temporal confounding in non-stationary dynamics, and the absence of explicit channel saliency mechanisms that allows noise to contaminate the latent space. To address these challenges, we propose the Causal Attention and Spatio-temporal Encoder Network (CASE-NET), an architecture designed for structural manifold pre-conditioning. CASE-NET synergizes a Causal Temporal Encoder, which enforces physical arrow-of-time constraints via masked self-attention and causal convolutions, with an Adaptive Channel Recalibration module functioning as an information bottleneck to suppress detrimental noise. Comprehensive evaluations across six heterogeneous domains demonstrate that CASE-NET establishes new state-of-the-art benchmarks on four tasks, achieving a peak accuracy of 98.6% on the AWR dataset and superior robustness in non-stationary regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CASE-NET pairs causal masking with channel recalibration for MTS classification, but the causality premise looks mismatched to whole-sequence labels and the performance claims lack visible controls.

The paper's main move is to add a Causal Temporal Encoder that uses masked self-attention and causal convolutions, then feed the result through an Adaptive Channel Recalibration block that acts as an information bottleneck. That specific pairing has not appeared in the cited prior work, so the architecture counts as a concrete new design even if the pieces are incremental extensions of existing attention and squeeze-and-excitation ideas. The stated goal is to reduce temporal confounding in non-stationary series while suppressing noise, and the abstract reports new peak numbers such as 98.6 percent on AWR across six domains. If the full experiments include proper ablations and multiple random seeds, that would be useful engineering data for people already working with attention-based time-series models.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes CASE-NET for multivariate time series classification. It identifies two bottlenecks—temporal non-causality in standard encoders that induces confounding in non-stationary regimes, and absence of explicit channel saliency allowing noise contamination—and introduces a Causal Temporal Encoder (masked self-attention plus causal convolutions) together with an Adaptive Channel Recalibration module acting as an information bottleneck. The authors claim this yields new state-of-the-art results on four of six heterogeneous tasks, including a peak accuracy of 98.6% on the AWR dataset and improved robustness under non-stationarity.

Significance. If the performance claims are substantiated by rigorous baselines, ablations, and statistical tests, the work could contribute a useful perspective on incorporating explicit causal constraints and channel-level information bottlenecks into time-series representation learning. The combination of arrow-of-time masking with recalibration is a coherent architectural choice that may prove relevant for other non-stationary sequence tasks.

major comments (2)

The central premise that masked self-attention and causal convolutions eliminate temporal confounding without net loss of predictive signal is load-bearing for the contribution, yet no derivation or controlled experiment isolates the trade-off between removing future-context confounding and discarding label-correlated statistics that may still be informative for whole-sequence classification under non-stationarity.
The abstract asserts SOTA results and superior robustness, but the manuscript supplies no quantitative details on the exact baselines, number of runs, error bars, or statistical significance tests that would allow evaluation of whether the reported 98.6% AWR accuracy and cross-task gains are robust.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We have carefully addressed each of the major comments raised. Our responses are provided below, and we have made revisions to the manuscript to incorporate additional experiments and details as suggested.

Referee: The central premise that masked self-attention and causal convolutions eliminate temporal confounding without net loss of predictive signal is load-bearing for the contribution, yet no derivation or controlled experiment isolates the trade-off between removing future-context confounding and discarding label-correlated statistics that may still be informative for whole-sequence classification under non-stationarity.

Authors: We appreciate the referee's emphasis on this critical aspect of our contribution. The design of the Causal Temporal Encoder is motivated by the need to respect the temporal order in non-stationary time series to avoid confounding from future information. While the empirical superiority on multiple tasks indicates a net benefit, we concur that a more targeted analysis of the trade-off would be beneficial. Accordingly, in the revised manuscript, we have added a new controlled experiment in the ablation studies section. This experiment systematically varies the causality constraint and measures the impact on classification accuracy under different levels of non-stationarity, thereby isolating the effects of reduced confounding versus potential loss of informative future statistics. revision: yes
Referee: The abstract asserts SOTA results and superior robustness, but the manuscript supplies no quantitative details on the exact baselines, number of runs, error bars, or statistical significance tests that would allow evaluation of whether the reported 98.6% AWR accuracy and cross-task gains are robust.

Authors: We acknowledge that the experimental reporting in the original submission could be more comprehensive to allow full assessment of robustness. The manuscript does compare against several established baselines across the six datasets, but to strengthen the claims, we have revised the experimental section to include precise details: all results are averaged over 5 independent runs with different random seeds; standard deviations are now reported as error bars in the tables; and we have included statistical significance testing using paired t-tests, with p-values provided for the comparisons against the strongest baseline on each task. These updates confirm that the 98.6% accuracy on AWR and the improvements on other tasks are statistically significant. revision: yes
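
For readers who want to see what the testing protocol described in the rebuttal amounts to, here is a small stdlib-only sketch of a paired t-test over five seeds. The accuracy values are invented placeholders, not numbers from the paper, and the helper name `paired_t_statistic` is hypothetical. The sketch compares the t statistic to the two-sided 5% critical value for 4 degrees of freedom instead of computing a p-value.

```python
import math
import statistics

def paired_t_statistic(model_scores, baseline_scores):
    """Return (t, degrees_of_freedom) for a paired t-test on per-seed scores."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (std_diff / math.sqrt(n)), n - 1

# Placeholder accuracies for 5 seeds; invented for illustration, NOT from the paper.
model_acc = [0.90, 0.92, 0.91, 0.93, 0.92]
baseline_acc = [0.88, 0.89, 0.90, 0.90, 0.89]

t_stat, dof = paired_t_statistic(model_acc, baseline_acc)
T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t with 4 degrees of freedom
print(round(t_stat, 3), dof, abs(t_stat) > T_CRIT_DF4)
```

With only five runs the test has little power, so reporting the per-seed differences alongside the statistic is usually more informative than the significance verdict alone.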

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with standard benchmark evaluation

full rationale

The paper proposes CASE-NET as an empirical architecture combining causal attention, convolutions, and channel recalibration to address stated bottlenecks in MTS classification. No mathematical derivation chain, first-principles predictions, or equations are presented that reduce by construction to fitted inputs or self-citations. Claims rest on experimental accuracy numbers from standard heterogeneous benchmarks, which constitute independent empirical content rather than tautological reduction. The work is self-contained as a typical deep learning design paper without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the model rests on standard deep-learning assumptions plus two domain-specific modeling choices whose justification is not supplied.

free parameters (2)

attention mask and convolution kernel sizes
Chosen to enforce causality; values are not stated and must be tuned on data.
channel recalibration bottleneck ratio
Hyper-parameter controlling the information bottleneck; fitted during training.
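
As a rough picture of what a bottleneck ratio controls, the sketch below implements a squeeze-and-excitation style channel gate in pure Python. This is an assumption about the general shape of the Adaptive Channel Recalibration module, which the abstract does not specify; the function name `channel_recalibration`, the weight layout, and the ratio r = 2 are all hypothetical.

```python
import math

def channel_recalibration(x, w1, w2):
    """Squeeze-and-excitation style gating (an assumed stand-in, not the paper's module).

    x  : list of C channels, each a list of T samples
    w1 : (C // r) x C weights, squeezing channel statistics into the bottleneck
    w2 : C x (C // r) weights, expanding back to one gate per channel
    """
    # Squeeze: global average over time gives one statistic per channel.
    squeezed = [sum(ch) / len(ch) for ch in x]
    # Bottleneck layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Expansion layer with sigmoid, producing a gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Excite: rescale every sample of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)], gates

# Toy input with C = 4 channels and T = 3 steps; bottleneck width C // r = 2.
x = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [4.0, 4.0, 4.0], [-1.0, 1.0, -1.0]]
w1 = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, -0.1]]
w2 = [[0.5, 0.0], [0.0, 0.5], [1.0, -1.0], [0.0, 0.0]]
y, gates = channel_recalibration(x, w1, w2)
print(gates)
```

A smaller bottleneck width forces the gate to be computed from fewer intermediate features, which is the sense in which the ratio acts as a capacity limit. Whether that limit suppresses noise without discarding signal is the empirical question the ledger flags.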

axioms (2)

domain assumption: Masked self-attention and causal convolutions enforce physical arrow-of-time constraints and thereby eliminate temporal confounding.
Invoked in the description of the Causal Temporal Encoder; no proof or empirical isolation is given in the abstract.
domain assumption: Channel recalibration functions as an effective information bottleneck that suppresses detrimental noise without discarding signal.
Central modeling claim for the Adaptive Channel Recalibration module.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArrowOfTime.lean · arrow_from_z / before_transitive · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

The CTA module ... enforces autoregressive consistency (h_t = f({x_τ}_{τ≤t})). ... By enforcing causality, our CTA module functions as a Structural Noise Filter. ... enforcing physical time-arrow constraints

IndisputableMonolith/Foundation/ArrowOfTime.lean · forward_accumulates · echoes

in physical systems, the arrow of time dictates that current states should not depend on future observations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[Kimet al., 2024 ] Jaeho Kim, Seok-Ju Hahn, Yoontae Hwang, Junghye Lee, and Seulki Lee. Cafo: Feature- centric explanation on time series classification. InPro- ceedings of the 30th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining, pages 1372–1382,

work page 2024

[13] [13]

Adam: A Method for Stochastic Optimization

[Kingma and Ba, 2014] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

work page internal anchor Pith review Pith/arXiv arXiv 2014

[14] [14]

Modeling long-and short-term temporal patterns with deep neural networks

[Laiet al., 2018 ] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & devel- opment in information retrieval, pages 95–104,

work page 2018

[15] [15]

Learnable dynamic temporal pooling for time series classification

[Leeet al., 2021 ] Dongha Lee, Seonghyeon Lee, and Hwanjo Yu. Learnable dynamic temporal pooling for time series classification. InProceedings of the AAAI Confer- ence on Artificial Intelligence, volume 35, pages 8288– 8296,

work page 2021

[16] [16]

Encoder: Entity mining and modification relation binding for composed image retrieval

[Liet al., 2025 ] Zixu Li, Zhiwei Chen, Haokun Wen, Zhi- heng Fu, Yupeng Hu, and Weili Guan. Encoder: Entity mining and modification relation binding for composed image retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 5101–5109,

work page 2025

[17] [17]

Cats: Mitigating corre- lation shift for multivariate time series classification.arXiv preprint arXiv:2504.04283,

[Linet al., 2025 ] Xiao Lin, Zhichen Zeng, Tianxin Wei, Zhining Liu, and Hanghang Tong. Cats: Mitigating corre- lation shift for multivariate time series classification.arXiv preprint arXiv:2504.04283,

work page arXiv 2025

[18] [18]

Spatial- temporal large language model for traffic prediction

[Liuet al., 2024a ] Chenxi Liu, Sun Yang, Qianxiong Xu, Zhishuai Li, Cheng Long, Ziyue Li, and Rui Zhao. Spatial- temporal large language model for traffic prediction. In 2024 25th IEEE International Conference on Mobile Data Management (MDM), pages 31–40. IEEE,

work page 2024

[19] [19]

iTransformer: Inverted Transformers Are Effective for Time Series Forecasting

[Liuet al., 2024b ] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting.arXiv preprint arXiv:2310.06625,

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

Disms-ts: Eliminating redun- dant multi-scale features for time series classification

[Liuet al., 2025 ] Zhipeng Liu, Peibo Duan, Binwu Wang, Xuan Tang, Qi Chu, Changsheng Zhang, Yongsheng Huang, and Bin Zhang. Disms-ts: Eliminating redun- dant multi-scale features for time series classification. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 10817–10826,

work page 2025

[21] [21]

Time series prediction using deep learning methods in healthcare.ACM Transactions on Management Information Systems, 14(1):1–29,

[Moridet al., 2023 ] Mohammad Amin Morid, Olivia R Liu Sheng, and Joseph Dunbar. Time series prediction using deep learning methods in healthcare.ACM Transactions on Management Information Systems, 14(1):1–29,

work page 2023

[22] [22]

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

[Nieet al., 2022 ] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arxiv 2022.arXiv preprint arXiv:2211.14730,

work page internal anchor Pith review Pith/arXiv arXiv 2022

[23] [23]

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

[Qinet al., 2017 ] Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and Garrison Cottrell. A dual- stage attention-based recurrent neural network for time se- ries prediction.arXiv preprint arXiv:1704.02971,

work page internal anchor Pith review Pith/arXiv arXiv 2017

[24] [24]

Jensen, Zhenli Sheng, and Bin Yang

[Qiuet al., 2024 ] Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. TFB: Towards comprehensive and fair benchmark- ing of time series forecasting methods. InProc. VLDB Endow., pages 2363–2377,

work page 2024

[25] [25]

A multi-scale temporal- frequency fusion network based on mlp for long-term time series forecasting.International Journal of Machine Learning and Cybernetics, 16(5):3943–3954,

[Songet al., 2025 ] Yaqi Song, Rujie Wan, Li Li, Wanyu Wang, and Haonan Xing. A multi-scale temporal- frequency fusion network based on mlp for long-term time series forecasting.International Journal of Machine Learning and Cybernetics, 16(5):3943–3954,

work page 2025

[26] [26]

Deep learning for epileptic seizure detection using a causal- spatio-temporal model based on transfer entropy.Entropy, 26(10):853,

[Sunet al., 2024 ] Jie Sun, Jie Xiang, Yanqing Dong, Bin Wang, Mengni Zhou, Jiuhong Ma, and Yan Niu. Deep learning for epileptic seizure detection using a causal- spatio-temporal model based on transfer entropy.Entropy, 26(10):853,

work page 2024

[27] [27]

Timemixer: Decomposable mul- tiscale mixing for time series forecasting.arXiv preprint arXiv:2405.14616,

[Wanget al., 2024a ] Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y Zhang, and Jun Zhou. Timemixer: Decomposable mul- tiscale mixing for time series forecasting.arXiv preprint arXiv:2405.14616,

work page arXiv

[28] [28]

Deep Time Series Models: A Comprehensive Survey and Benchmark

[Wanget al., 2024b ] Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Chen Wang, Mingsheng Long, and Jian- min Wang. Deep time series models: A comprehensive survey and benchmark.arXiv preprint arXiv:2407.13278,

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

Eeo-tfv: Escape-explore optimizer for web-scale time- series forecasting and vision analysis

[Wanget al., 2026a ] Hua Wang, Jinghao Lu, and Fan Zhang. Eeo-tfv: Escape-explore optimizer for web-scale time- series forecasting and vision analysis. InProceedings of the ACM Web Conference 2026, pages 7271–7282,

work page 2026

[30] [30]

A deep spatio-temporal ar- chitecture for dynamic ecn analysis with granger causality based causal discovery.Pattern Recognition, page 112346,

[Xuet al., 2025 ] Faming Xu, Yiding Wang, Gang Qu, Vince D Calhoun, Julia M Stephen, Tony W Wilson, Yu- Ping Wang, and Chen Qiao. A deep spatio-temporal ar- chitecture for dynamic ecn analysis with granger causality based causal discovery.Pattern Recognition, page 112346,

work page 2025

[31] [31]

Self-supervised con- trastive pre-training for time series via time-frequency consistency.Advances in neural information processing systems, 35:3988–4003,

[Zhanget al., 2022 ] Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, and Marinka Zitnik. Self-supervised con- trastive pre-training for time series via time-frequency consistency.Advances in neural information processing systems, 35:3988–4003,

work page 2022

[32] [32]

Multiview spatial-temporal meta-learning for multivariate time series forecasting.Sensors (Basel, Switzerland), 24(14):4473,

[Zhanget al., 2024 ] Liang Zhang, Jianping Zhu, Bo Jin, and Xiaopeng Wei. Multiview spatial-temporal meta-learning for multivariate time series forecasting.Sensors (Basel, Switzerland), 24(14):4473,

work page 2024

[33] [33]

Multivariate time series approach integrating cross-temporal and cross-channel at- tention for dysarthria detection from speech.Neurocom- puting, page 130708,

[Zhanget al., 2025 ] Zhenglin Zhang, Tengfei Wang, Zian Hu, Li-Zhuang Yang, and Hai Li. Multivariate time series approach integrating cross-temporal and cross-channel at- tention for dysarthria detection from speech.Neurocom- puting, page 130708,

work page 2025

[34] [34]

Time-tk: A multi-offset temporal interaction frame- work combining transformer and kolmogorov-arnold net- works for time series forecasting

[Zhanget al., 2026a ] Fan Zhang, Shiming Fan, and Hua Wang. Time-tk: A multi-offset temporal interaction frame- work combining transformer and kolmogorov-arnold net- works for time series forecasting. InProceedings of the ACM Web Conference 2026, pages 7495–7506,

work page 2026

[35] [35]

TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

[Zhanget al., 2026b ] Fan Zhang, Shiming Fan, and Hua Wang. Timesaf: Towards llm-guided semantic asyn- chronous fusion for time series forecasting.arXiv preprint arXiv:2604.12648,

work page internal anchor Pith review Pith/arXiv arXiv

[36] [36]

Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341,

[Zhanget al., 2026c ] Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, and Yupeng Hu. Hint: Composed image retrieval with dual- path compositional contextualized network.arXiv preprint arXiv:2603.26341,

work page arXiv

[37] [37]

Mtm: A multi-scale token mixing transformer for irregular mul- tivariate time series classification

[Zhonget al., 2025 ] Shuhan Zhong, Weipeng Zhuo, Sizhe Song, Guanyao Li, Zhongyi Yu, and S-H Gary Chan. Mtm: A multi-scale token mixing transformer for irregular mul- tivariate time series classification. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 4074–4085,

work page 2025

[38] [38]

Long-term time series forecast- ing with multilinear trend fuzzy information granules for lstm in a periodic framework.IEEE Transactions on Fuzzy Systems, 32(1):322–336, 2023

[Zhuet al., 2023 ] Chenglong Zhu, Xueling Ma, Weiping Ding, and Jianming Zhan. Long-term time series forecast- ing with multilinear trend fuzzy information granules for lstm in a periodic framework.IEEE Transactions on Fuzzy Systems, 32(1):322–336, 2023

work page 2023