arxiv: 2606.20055 · v1 · pith:5VWCKSEUnew · submitted 2026-06-18 · 💻 cs.LG

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

Pith reviewed 2026-06-26 18:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords time series anomaly detection · multiscale encoding · cross-variable attention · patch representation learning · lightweight model · self-supervised pretext task · TSB-AD benchmark

The pith

PaAno+ adds multiscale convolutions and cross-variable attention to improve time series anomaly detection accuracy and efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PaAno+, a lightweight model for time series anomaly detection aimed at industrial and medical monitoring. It builds a multiscale feature-extraction backbone from convolutional kernels with different receptive fields, aggregates the scales with cross-scale adaptive attention and residual connections, and adds a cross-variable fusion attention module to model dependencies between variables. A custom pretext task, temporal patch-window sorting, is combined with a triplet loss to learn more discriminative patch embeddings. On the TSB-AD benchmark the model reports state-of-the-art results for both univariate and multivariate tasks, with gains across metrics including VUS-PR, while keeping computational cost low enough for edge deployment.

Core claim

The central claim is that a patch-oriented encoder using differentiated convolutional kernels for multiscale temporal features, followed by cross-scale adaptive attention aggregation and a dedicated cross-variable fusion attention module, together with a patch-window sorting pretext task and triplet loss, produces superior anomaly detection accuracy on the TSB-AD benchmark for univariate and multivariate series while remaining computationally lightweight.
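To make the multiscale part of this claim concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes fixed moving-average kernels of odd sizes 3, 5 and 7 in place of learned filters, and a softmax over each branch's mean absolute activation in place of the learned cross-scale attention. The names `conv1d_same` and `multiscale_encode` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(series, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(series) + [0.0] * half
    return [
        sum(k * padded[t + j] for j, k in enumerate(kernel))
        for t in range(len(series))
    ]


def multiscale_encode(series, kernel_sizes=(3, 5, 7)):
    """Run one branch per kernel size, fuse the branches, add a residual.

    Assumption: moving-average kernels stand in for learned filters, and the
    per-scale score is the branch's mean absolute activation rather than a
    learned attention scorer.
    """
    branches = [conv1d_same(series, [1.0 / k] * k) for k in kernel_sizes]
    scores = [sum(abs(v) for v in b) / len(b) for b in branches]
    weights = softmax(scores)  # one attention weight per scale
    fused = [
        sum(w * b[t] for w, b in zip(weights, branches))
        for t in range(len(series))
    ]
    # Residual connection: keep the raw signal alongside the fused features.
    encoded = [x + f for x, f in zip(series, fused)]
    return encoded, weights
```

Small kernels react to short spikes and large kernels to slower drifts; the per-scale weights decide how much each view contributes before the residual is added.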

What carries the argument

The multiscale convolutional backbone with cross-scale adaptive attention aggregation combined with the cross-variable fusion attention module, which captures hierarchical temporal patterns and explicit inter-variable correlations.
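The cross-variable half of this mechanism can be illustrated with plain scaled dot-product self-attention across variables. This is a sketch under stated assumptions, not the paper's module: queries, keys and values are the raw per-variable embeddings with no learned projections, and the function name `cross_variable_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_variable_attention(var_embeddings):
    """Mix per-variable embeddings with scaled dot-product self-attention.

    var_embeddings: list of V vectors (one per variable), each of length D.
    Assumption: identity projections, i.e. Q = K = V = the input embeddings.
    """
    dim = len(var_embeddings[0])
    scale = math.sqrt(dim)
    fused = []
    for query in var_embeddings:
        # Similarity of this variable to every variable (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in var_embeddings
        ]
        weights = softmax(scores)
        # Weighted average of all variables' embeddings.
        context = [
            sum(w * value[d] for w, value in zip(weights, var_embeddings))
            for d in range(dim)
        ]
        # Residual connection keeps the variable's own features.
        fused.append([q + c for q, c in zip(query, context)])
    return fused
```

Each output row is the variable's own embedding plus a mixture of all variables' embeddings, so a sensor whose behavior departs from its usual companions ends up with a visibly different fused representation.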

If this is right

  • The model enables real-time anomaly inference on resource-limited hardware.
  • Detection performance improves on both univariate and multivariate tasks relative to prior lightweight approaches.
  • The learned patch embeddings become more discriminative through the sorting pretext and triplet loss.
  • The architecture remains compact enough for practical deployment without the overhead of large transformer models.
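The triplet loss mentioned in the list above can be sketched as follows. The Figure 3 excerpt reproduced on this page states that the distance is the cosine distance, dist(a, b) = 1 − cos(a, b), that the margin is δ = 0.5, and that the loss is averaged over a batch of size M. The excerpt is truncated before the full formula, so the standard hinge form used here is an assumption, and the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """dist(a, b) = 1 - cos(a, b); assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchors, positives, negatives, margin=0.5):
    """Mean hinge loss over a batch of (anchor, positive, negative) triplets.

    Assumption: the usual form max(0, d(a, p) - d(a, n) + margin), which is
    zero once the negative is at least `margin` farther away than the positive.
    """
    losses = [
        max(0.0, cosine_distance(a, p) - cosine_distance(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)
```

A triplet contributes nothing once it is already separated by the margin, so training effort concentrates on patches that are still hard to tell apart.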

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multiscale-plus-cross-variable pattern could be tested on other sequential data such as sensor streams in robotics.
  • The patch-sorting pretext task might transfer to self-supervised pretraining for forecasting or imputation tasks.
  • Hybrid systems could combine this lightweight encoder with occasional calls to larger models only on uncertain cases.
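For readers unfamiliar with sorting-style pretext tasks, here is one way a patch-window sorting example could be generated. The page does not describe the paper's exact formulation, so this is only an illustrative assumption: a window is cut into non-overlapping patches, the patches are shuffled, and the training target is each patch's original position. All function names are hypothetical.

```python
import random


def split_into_patches(window, patch_len):
    """Cut a window into consecutive non-overlapping patches (tail is dropped)."""
    n = len(window) // patch_len
    return [window[i * patch_len:(i + 1) * patch_len] for i in range(n)]


def make_sorting_example(window, patch_len, rng):
    """Return (shuffled_patches, target).

    target[i] is the original position of the patch now at position i, which
    is what a model would be trained to predict.
    """
    patches = split_into_patches(window, patch_len)
    order = list(range(len(patches)))
    rng.shuffle(order)
    shuffled = [patches[i] for i in order]
    return shuffled, order


def restore(shuffled, target):
    """Undo the shuffle given the true or predicted original positions."""
    restored = [None] * len(shuffled)
    for position, original_index in enumerate(target):
        restored[original_index] = shuffled[position]
    return restored
```

Because the labels come for free from the data itself, the same generator could in principle feed a forecasting or imputation backbone, which is the transfer the bullet above speculates about.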

Load-bearing premise

The TSB-AD benchmark and its chosen metrics including VUS-PR represent real-world industrial and medical time series anomaly detection under complex conditions.

What would settle it

A controlled experiment showing that PaAno+ does not outperform strong baselines on a fresh collection of industrial or medical time series drawn from settings absent from TSB-AD would falsify the claim of broad superiority.

Figures

Figures reproduced from arXiv: 2606.20055 by Hongbing Wang, Wenchao Liu, Xiangguang Xiong, XiaoDong Liu, Youji Zhu.

Figure 1
Classification of common defects found in the current dataset. Anomalies are marked in red.
3. Method 3.1. Problem Definition This work addresses the task of semi-supervised time series anomaly detection. Under this paradigm, the training set consists exclusively of normal time series samples. The model learns the distribution and temporal evolution patterns of normal data to identify anomalous time points…
Figure 2
Architecture of the PaAno+ multivariate time-series anomaly detection system.
Insufficient contextual information: Several abnormal samples lack adjacent normal background data, hindering the model from capturing feature discrepancies between normal and abnormal patterns. Unrealistic anomaly ratio: Benchmark datasets assume an excessively high proportion of anomalous samples, which is inconsistent with the…
Figure 3
Training workflow of the PaAno+ model.
where 𝑀 represents the batch size, and the distance function dist(⋅, ⋅) uses the cosine distance, dist(𝑎, 𝑏) = 1 − cos(𝑎, 𝑏). 𝛿 = 0.5 is the margin parameter, used to constrain the minimum difference in feature distances between positive and negative samples. This loss function requires that the feature distance between anchor points and negative samples be at least 𝛿…
Figure 4
Parameter sensitivity analysis of the PaAno+ model’s Top-𝑘 values and memory bank size on the TSB-AD-U and TSB-AD-M Eval datasets.
4.7.2. Cross-Variable Attention Contributions To explore the contribution of cross-variable fusion attention to multivariate anomaly detection, an ablated variant (w/o Attention) is established by removing the cross-variable attention module while preserving the multiscale enco…
Figure 5
Sensitivity analysis of the performance of univariate and multivariate time-series anomaly detection with respect to window length 𝑇. All results are presented as percentages (%).
The nearest-neighbor number 𝑘 and the memory compression ratio are two critical control parameters for the updating mechanism. Model performance fluctuates slightly when 𝑘 varies from 1 to 5. The setting 𝑘 = 3 achieves a favorab…
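The Figure 4 and Figure 5 excerpts above mention a Top-k value, a memory bank, and a nearest-neighbor number k with k = 3 singled out. The excerpts do not state the scoring rule, so the sketch below is an assumption about how such a memory bank is commonly used: the anomaly score of a patch embedding is its mean cosine distance to its k nearest neighbors among stored embeddings of normal patches. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """dist(a, b) = 1 - cos(a, b); assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_anomaly_score(query, memory_bank, k=3):
    """Mean distance from `query` to its k closest entries in the memory bank.

    memory_bank: embeddings of patches taken from normal training data.
    A larger score means the query looks less like anything seen in training.
    """
    distances = sorted(cosine_distance(query, m) for m in memory_bank)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```

Averaging over a few neighbors instead of taking only the closest one makes the score less sensitive to a single unusual entry in the bank, which is one plausible reading of why performance changes little for k between 1 and 5.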
Original abstract

Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes PaAno+, a lightweight patch-oriented model for time-series anomaly detection. It introduces a multiscale convolutional encoder with differentiated receptive fields, cross-scale adaptive attention aggregation with residuals, a cross-variable fusion attention module, and a pretext task based on temporal patch-window sorting optimized via triplet loss. The central claim is that PaAno+ achieves state-of-the-art detection accuracy on both univariate and multivariate tasks on the TSB-AD benchmark, with significant gains (including on VUS-PR) over the original PaAno while remaining computationally efficient for resource-limited deployment.

Significance. If the reported gains hold under scrutiny, the work could supply a practical, deployable alternative to heavy Transformer-based detectors for industrial and medical monitoring. The multiscale backbone and explicit cross-variable modeling target documented weaknesses in prior lightweight methods, and the emphasis on efficiency is a clear strength for real-time inference.

major comments (1)
  1. [§4] §4 (Experiments) and abstract: the SOTA claim and practical-value framing rest on the untested assumption that TSB-AD faithfully captures 'intricate operational conditions,' variable correlations, and anomaly patterns from the target domains. No analysis of anomaly-type diversity, length distributions, or noise characteristics is supplied; a concrete test would be to stratify results by these factors or evaluate on a controlled perturbation of TSB-AD.
minor comments (1)
  1. [Abstract] Abstract: the abstract introduces the model as 'PaAno' and later refers to 'the proposed PaAno', while the title uses PaAno+; standardize nomenclature for clarity.
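The stratified analysis requested in the major comment could start from something as simple as the sketch below. It assumes per-series results are available as dictionaries; the field names `anomaly_type` and `vus_pr` and the function `stratified_mean` are hypothetical and not taken from the paper or from TSB-AD tooling.

```python
from collections import defaultdict


def stratified_mean(records, key, metric="vus_pr"):
    """Average `metric` separately for each distinct value of `key`.

    records: one dict per evaluated time series, e.g. holding a stratum label
    (anomaly type, length bucket, noise level) and a score for that series.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[metric])
    return {group: sum(values) / len(values) for group, values in groups.items()}
```

Running the same grouping for the proposed model and for each baseline would show whether the reported average gain is spread across strata or concentrated in one of them.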

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive comment on the experimental section. We address the concern point-by-point below and outline the planned revisions.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments) and abstract: the SOTA claim and practical-value framing rest on the untested assumption that TSB-AD faithfully captures 'intricate operational conditions,' variable correlations, and anomaly patterns from the target domains. No analysis of anomaly-type diversity, length distributions, or noise characteristics is supplied; a concrete test would be to stratify results by these factors or evaluate on a controlled perturbation of TSB-AD.

    Authors: We acknowledge that the manuscript does not include an explicit stratification or perturbation analysis of TSB-AD. TSB-AD aggregates multiple established real-world datasets chosen to reflect diverse operational conditions, anomaly types, and variable correlations across domains; our consistent gains (including on VUS-PR) across its univariate and multivariate subsets provide supporting evidence for the claims. To directly address the point, the revised version will add a short subsection in §4 summarizing TSB-AD's documented characteristics (anomaly-type coverage, length distributions, and noise profiles) based on the benchmark's original construction and metadata. We view a full controlled perturbation study as valuable future work rather than a requirement for the current claims, as it would entail new experiments outside the scope of the present evaluation. This constitutes a partial revision. revision: partial

Circularity Check

0 steps flagged

No mathematical derivation or self-referential predictions present

full rationale

The paper is entirely empirical: it describes a model architecture (multiscale encoder, cross-variable attention, pretext task) and reports benchmark results on TSB-AD. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim reduces to experimental performance numbers rather than any construction that equates output to input by definition. This is the normal non-circular outcome for a benchmark-driven methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no equations, derivations, or modeling choices, so no free parameters, axioms, or invented entities can be identified.
