arXiv: 2507.15066 · v5 · submitted 2025-07-20 · 💻 cs.LG · cs.AI · cs.MM

Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback

Pith reviewed 2026-05-19 03:26 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.MM
keywords time series anomaly detection · generative reasoning · multimodal benchmark · large language models · transferability · interpretability · anomaly diagnosis · RATs40K

The pith

Time-RA reframes anomaly detection as generative reasoning over multimodal time series data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Time-RA, a task that shifts time series anomaly detection from binary classification to generating step-by-step diagnostic explanations. To enable this shift it releases RATs40K, a benchmark of roughly 40,000 real-world samples spanning ten domains that combine raw series, text context, visual plots, and structured reasoning annotations. Experiments demonstrate that supervised fine-tuning on this data, especially when visual plots are included, raises both diagnostic accuracy and the consistency of the generated reasoning. Fine-tuned models then transfer directly to previously unseen real-world datasets and surpass conventional detection baselines. The work therefore positions interpretable, multimodal reasoning as a practical next stage for time series analysis.

Core claim

Reformulating time series anomaly detection as the Time-RA generative task and releasing the RATs40K benchmark with structured reasoning annotations allows supervised fine-tuning plus visual inputs to improve diagnostic accuracy and reasoning consistency while delivering strong plug-and-play transferability that outperforms traditional baselines on unseen real-world datasets.

What carries the argument

RATs40K, the multimodal dataset that supplies raw time series, textual context, visual plots, and structured reasoning annotations to support generative anomaly diagnosis.
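To make the four ingredients concrete, here is a minimal sketch of how one such sample might be represented and turned into a text prompt. The class name, field names, and prompt wording are illustrative assumptions, not the dataset's actual schema or the authors' prompt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeRASample:
    """Hypothetical record layout; field names are illustrative only."""
    series: List[List[float]]        # one inner list per channel
    context: str                     # textual description of the series
    source: str                      # domain tag for the sample
    label: str                       # "Normal" or "Anomaly"
    plot_path: Optional[str] = None  # rendered plot image, if available
    action: Optional[str] = None     # fine-grained anomaly category
    thought: Optional[str] = None    # free-text diagnostic reasoning


def build_prompt(sample: TimeRASample, max_points: int = 64) -> str:
    """Serialize the non-visual inputs into a text prompt (assumed format)."""
    lines = [f"Source: {sample.source}", f"Context: {sample.context}"]
    for i, channel in enumerate(sample.series):
        # Truncate each channel so the prompt stays short.
        shown = ", ".join(f"{v:.2f}" for v in channel[:max_points])
        lines.append(f"Channel {i}: {shown}")
    lines.append("Answer with Label, Action, and Thought.")
    return "\n".join(lines)
```

Keeping the annotation fields optional lets the same record type serve both annotated training samples and unlabeled inputs at inference time.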

If this is right

  • Supervised fine-tuning on the benchmark raises both diagnostic accuracy and reasoning consistency.
  • Inclusion of visual plot representations further improves diagnostic accuracy and reasoning consistency, although the abstract notes that performance still varies across complex scenarios.
  • Fine-tuned models transfer without retraining to unseen real-world datasets.
  • These transferred models outperform conventional anomaly detection baselines on the new data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Explanatory outputs could be directly integrated into operational dashboards so analysts receive both an alert and its supporting reasoning.
  • The same multimodal annotation style may generalize to related time series tasks such as forecasting with justifications.
  • Expanding the benchmark to additional domains would test how far the observed transferability extends.

Load-bearing premise

The structured reasoning annotations in RATs40K accurately and consistently represent valid diagnostic reasoning for the anomalies.

What would settle it

A fine-tuned model that scored lower accuracy than traditional baselines, or produced inconsistent reasoning, on a new held-out real-world time series collection would falsify the transferability result.
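The accuracy half of that check can be sketched as a comparison on held-out labels. The choice of binary F1 as the metric and both function names are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for the positive (anomaly = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def transfer_claim_survives(y_true: Sequence[int],
                            finetuned_pred: Sequence[int],
                            baseline_pred: Sequence[int]) -> bool:
    """True if the fine-tuned model is at least as good as the baseline."""
    return binary_f1(y_true, finetuned_pred) >= binary_f1(y_true, baseline_pred)
```

A single comparison like this is only a point estimate; repeating it across several held-out collections would give a firmer answer.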

Figures

Figures reproduced from arXiv: 2507.15066 by Jiang Bian, Kai Ying, Lei Song, Qingsong Wen, Svitlana Vyetrenko, Tom Bamford, Yiyuan Yang, Zhiguang Wang, Zichuan Liu.

Figure 1. Comparison between traditional TSAD task and the …
Figure 2. The Time-RA workflow: A novel multimodal reasoning framework for multi-task time series anomaly detection.
Figure 3. RATs40K dataset construction pipeline. We ensure diversity by sampling anomalous thoughts and decision actions from a large model pool, then we query GPT-4 with detailed definitions and prompts for preference selection and correction to generate high-quality, fine-grained annotations.
Figure 4. Comparison of LLM-generated labels with expert …
Figure 5. Textual reasoning quality based on Likert scale.
Figure 6. Univariate Time-RA case study.
Figure 7. Multivariate Time-RA case study.
Figure 8. Global statistics of model response in the model pool.
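The two-stage construction described in the Figure 3 caption (sample candidate annotations from a model pool, then let a judge select one) can be sketched as below. All names are hypothetical, and `majority_judge` is a plainly swapped-in stand-in for the GPT-4 preference selection and correction step, which this sketch does not reproduce.

```python
from collections import Counter
from typing import Callable, List, Tuple

Annotation = Tuple[str, str]  # (action, thought)
Annotator = Callable[[List[float]], Annotation]
Judge = Callable[[List[float], List[Annotation]], Annotation]


def majority_judge(series: List[float],
                   candidates: List[Annotation]) -> Annotation:
    """Stand-in judge: keep the first candidate with the most common action."""
    top_action = Counter(a for a, _ in candidates).most_common(1)[0][0]
    return next(c for c in candidates if c[0] == top_action)


def annotate(series: List[float],
             pool: List[Annotator],
             judge: Judge = majority_judge) -> Annotation:
    """Collect one candidate per pool model, then let the judge pick."""
    candidates = [model(series) for model in pool]
    return judge(series, candidates)
```

Passing the judge as a parameter keeps the sampling stage independent of how the final annotation is chosen.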
Original abstract

Time series anomaly detection (TSAD) has traditionally focused on binary classification and often lacks the fine-grained categorization and explanatory reasoning required for transparent decision-making. To address these limitations, we propose Time-series Reasoning for Anomaly (Time-RA), a novel task that reformulates TSAD from a discriminative into a generative, reasoning-intensive paradigm. To facilitate this, we introduce RATs40K, the first real-world large-scale multimodal benchmark with ~40,000 samples across 10 domains, integrating raw time series, textual context, and visual plots with structured reasoning annotations. Extensive benchmarking shows that while supervised fine-tuning and visual representations boost diagnostic accuracy and reasoning consistency, performance varies across complex scenarios. Notably, fine-tuned models demonstrate strong "plug-and-play" transferability, outperforming traditional baselines on unseen real-world datasets. Our work establishes a foundation for interpretable, multimodal time series analysis. All code (https://github.com/yyysjz1997/Time-RA) and the RATs40K dataset (https://huggingface.co/datasets/Time-RA/RATs40K) are fully open-sourced to facilitate future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to reformulate time series anomaly detection (TSAD) as a generative reasoning task called Time-RA. It introduces RATs40K, a multimodal benchmark dataset with approximately 40,000 samples from 10 domains that includes raw time series data, textual context, visual plots, and structured reasoning annotations. Benchmarking experiments indicate that supervised fine-tuning and the use of visual representations enhance diagnostic accuracy and reasoning consistency. Importantly, fine-tuned models exhibit strong plug-and-play transferability, outperforming traditional baselines on unseen real-world datasets. The work aims to establish a foundation for interpretable multimodal time series analysis and provides open access to code and data.

Significance. If the results hold, this contribution could significantly impact the field by shifting focus from binary anomaly detection to explanatory reasoning, improving transparency and usability in practical applications across diverse domains. The provision of a large-scale real-world dataset and open-sourced resources strengthens the potential for follow-up work and reproducibility.

major comments (2)
  1. The abstract reports positive benchmarking outcomes for fine-tuning and visual inputs, but lacks details on statistical significance, error bars, or controls for annotation quality, leaving only moderate support for the central transferability claim.
  2. The structured reasoning annotations in RATs40K are central to the training and evaluation, yet the manuscript provides limited information on the annotation process and quality assurance, which is critical given that the transferability relies on these annotations accurately representing valid diagnostic reasoning.
minor comments (1)
  1. Clarify the distinction between the proposed Time-RA task and existing TSAD methods more explicitly in the introduction.
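The error bars and significance tests requested in the first major comment could be computed along these lines. This is a minimal standard-library sketch under the assumption that each model is scored on the same set of runs or datasets; the function names are illustrative. It returns only the paired t statistic, since converting it to a p-value needs a t-distribution (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean, usable as an error bar."""
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.mean(scores), sem


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired scores a[i] vs b[i] (df = len(a) - 1)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with >= 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

A paired test fits here because both models are evaluated on the same items, so per-item difficulty cancels out in the differences.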

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive overall assessment of the work. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

  1. Referee: The abstract reports positive benchmarking outcomes for fine-tuning and visual inputs, but lacks details on statistical significance, error bars, or controls for annotation quality, leaving only moderate support for the central transferability claim.

    Authors: We agree that the current presentation would benefit from greater statistical rigor. In the revised manuscript we will add error bars to all reported metrics, include statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the key comparisons that support transferability, and explicitly reference the expanded annotation-quality controls described in response to the second comment. These additions will be made both in the abstract and in the experimental sections. revision: yes

  2. Referee: The structured reasoning annotations in RATs40K are central to the training and evaluation, yet the manuscript provides limited information on the annotation process and quality assurance, which is critical given that the transferability relies on these annotations accurately representing valid diagnostic reasoning.

    Authors: We acknowledge that the manuscript currently provides only a high-level description of the annotation process. In the revision we will expand the dataset-construction section to detail: the annotation guidelines and task instructions given to annotators, the domain expertise of the annotators, inter-annotator agreement statistics (e.g., Cohen’s kappa or Krippendorff’s alpha), and the multi-stage quality-assurance pipeline (initial annotation, expert review, and resolution of disagreements). These additions will directly support the validity of the transferability experiments. revision: yes
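Of the agreement statistics named in the second response, Cohen's kappa is the simplest to compute for two raters. The sketch below is a plain-Python illustration under that two-rater assumption, not the authors' procedure; the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable],
                 rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined, and this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Kappa near 0 means agreement no better than chance, and values near 1 mean near-perfect agreement; with more than two raters, Krippendorff's alpha is the usual generalization.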

Circularity Check

0 steps flagged

No significant circularity: empirical benchmark and new dataset

The paper defines a new task (Time-RA) and releases a new multimodal dataset (RATs40K) with structured annotations, then reports empirical benchmarking results on fine-tuned models and their transfer performance to unseen datasets. No equations, fitted parameters, or mathematical derivations appear in the provided text. Central claims rest on experimental outcomes rather than any self-referential reduction of a prediction to its own inputs or to a self-citation chain. The work is self-contained against external benchmarks because the dataset and code are open-sourced and the evaluations use held-out and unseen real-world data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

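The central claims rest on the validity of the new task definition and the quality of the structured reasoning annotations in the benchmark, which the Figure 3 caption describes as sampled from a model pool and refined through GPT-4 preference selection and correction; no free parameters or invented entities are evident from the abstract.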

axioms (1)
  • [domain assumption] Multimodal inputs (raw time series, text context, and visual plots) can be jointly used to elicit consistent diagnostic reasoning from LLMs.
    This underpins the benchmark design and the reported gains from visual representations.

pith-pipeline@v0.9.0 · 5765 in / 1205 out tokens · 38892 ms · 2026-05-19T03:26:48.138161+00:00 · methodology

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 6 internal anchors

  1. [1]

    Ahmed Abdulaal, Zhuanghua Liu, and Tomer Lancewicki. 2021. Practical approach to asynchronous multivariate time series anomaly detection and localization. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 2485–2494

  2. [2]

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

  3. [3]

    Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. 2017. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262 (2017), 134–147

  4. [4]

    Marc Bachlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M Hausdorff, Nir Giladi, and Gerhard Troster. 2009. Wearable assistant for Parkinson’s disease patients with the freezing of gait symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2009), 436–446

  5. [5]

    Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, and Davide Taibi. 2025. LO2: Microservice API Anomaly Dataset of Logs and Metrics. arXiv preprint arXiv:2504.12067 (2025)

  6. [6]

    Guillermo Barrenetxea. 2019. Sensorscope Data. https://doi.org/10.5281/zenodo.2654726 Data set

  7. [7]

    Ane Blázquez-García, Angel Conde, Usue Mori, and Jose A Lozano. 2021. A review on outlier/anomaly detection in time series data. ACM computing surveys (CSUR) 54, 3 (2021), 1–33

  8. [8]

    Paul Boniol, Qinghua Liu, Mingyi Huang, Themis Palpanas, and John Paparrizos. 2024. Dive into time-series anomaly detection: A decade review. arXiv preprint arXiv:2412.20512 (2024)

  10. [10]

    Paul Boniol, John Paparrizos, Themis Palpanas, and Michael J Franklin. 2021. Sand: streaming subsequence anomaly detection. Proceedings of the VLDB Endowment 14, 10 (2021), 1717–1729

  11. [11]

    George EP Box and David A Pierce. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American statistical Association 65, 332 (1970), 1509–1526

  12. [12]

    Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM computing surveys (CSUR) 41, 3 (2009), 1–58

  13. [13]

    Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, and Rex Ying. 2025. Mtbench: A multimodal time series benchmark for temporal reasoning and question answering. arXiv preprint arXiv:2503.16858 (2025)

  14. [14]

    Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023. URL https://lmsys.org/blog/2023-03-30-vicuna 3, 5 (2023)

  15. [15]

    Nwodo Benita Chikodili, Mohammed D Abdulmalik, Opeyemi A Abisoye, and Sulaimon A Bashir. 2020. Outlier detection in multivariate time series data using a fusion of K-medoid, standardized euclidean distance and Z-score. In International Conference on Information and Communication Technology and Applications. Springer, 259–271

  16. [16]

    Winnie Chow, Lauren Gardiner, Haraldur T Hallgrímsson, Maxwell A Xu, and Shirley You Ren. 2024. Towards time series reasoning with llms. arXiv preprint arXiv:2409.11376 (2024)

  17. [17]

    Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)

  18. [18]

    Andrew A Cook, Göksel Mısırlı, and Zhong Fan. 2019. Anomaly detection for IoT time-series data: A survey. IEEE Internet of Things Journal 7, 7 (2019), 6481–6494

  19. [19]

    Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2024. ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback. In ICML. 9722–9744

  20. [20]

    Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. 2019. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

  22. [22]

    Pavel Filonov, Andrey Lavrentyev, and Artem Vorontsov. 2016. Multivariate industrial time series with cyber-attack simulation: Fault detection using an lstm-based predictive data model. arXiv preprint arXiv:1612.06676 (2016)

  23. [23]

    P Fleith. 2023. Controlled Anomalies Time Series (CATS) Dataset. Solenix Engineering GmbH, Feb (2023)

  24. [24]

    Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, and Huan Xu. 2020. Robusttad: Robust time series anomaly detection via decomposition and convolutional neural networks. arXiv preprint arXiv:2002.09545 (2020)

  25. [25]

    Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. circulation 101, 23 (2000), e215–e220. KDD ’26, August 9-13, 2026, ...

  26. [26]

    Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning

  27. [27]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

  28. [28]

    Scott David Greenwald, Ramesh S Patil, and Roger G Mark. 1990. Improved detection and classification of arrhythmias in noise-corrupted electrocardiograms using contextual information. IEEE

  29. [29]

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

  30. [30]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3

  31. [31]

    Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 387–395

  32. [32]

    Vincent Jacob, Fei Song, Arnaud Stiegler, Bijan Rad, Yanlei Diao, and Nesime Tatbul. 2020. Exathlon: A benchmark for explainable anomaly detection over time series. arXiv preprint arXiv:2010.05073 (2020)

  33. [33]

    Ming Jin, Yifan Zhang, Wei Chen, Kexin Zhang, Yuxuan Liang, Bin Yang, Jindong Wang, Shirui Pan, and Qingsong Wen. 2024. Position: What can large language models tell us about time series analysis. In Forty-first International Conference on Machine Learning

  34. [34]

    Neha Kant and Manish Mahajan. 2019. Time-series outlier detection using enhanced k-means in combination with pso algorithm. In Engineering Vibration, Communication and Information Processing: ICoEVCI 2018, India. Springer, 363–373

  35. [35]

    Paweł Karczmarek, Adam Kiersztyn, Witold Pedrycz, and Ebru Al. 2020. K-means-based isolation forest. Knowledge-based systems 195 (2020), 105659

  36. [36]

    Maryam Mahsal Khan and Mohammed Alkhathami. 2024. Anomaly detection in IoT-based healthcare: machine learning for enhanced security. Scientific reports 14, 1 (2024), 5872

  37. [37]

    Yaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin, and Qingsong Wen. 2025. Time-mqa: Time series multi-task question answering with context enhancement. arXiv preprint arXiv:2503.01875 (2025)

  38. [38]

    Yaxuan Kong, Yiyuan Yang, Shiyu Wang, Chenghao Liu, Yuxuan Liang, Ming Jin, Stefan Zohren, Dan Pei, Yan Liu, and Qingsong Wen. 2025. Position: Empowering Time Series Reasoning with Multimodal LLMs. arXiv preprint arXiv:2502.01477 (2025)

  39. [39]

    Kwei-Herng Lai, Daochen Zha, Junjie Xu, Yue Zhao, Guanchu Wang, and Xia Hu. 2021. Revisiting time series outlier detection: Definitions and benchmarks. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 1)

  40. [40]

    N Laptev, S Amizadeh, and Y Billawala. 2015. S5-A labeled anomaly detection dataset, version 1.0 (16M)

  41. [41]

    Gen Li and Jason J Jung. 2023. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Information Fusion 91 (2023), 93–102

  42. [42]

    Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. 2024. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. 6555–6565

  43. [43]

    Chen Liu, Shibo He, Shizhong Li, Zhenyu Shi, and Wenchao Meng. 2025. Detecting Both Seen and Unseen Anomalies in Time Series. ACM Transactions on Knowledge Discovery from Data 19, 4 (2025), 1–29

  44. [44]

    Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, and Wenchao Meng. 2024. Large language model guided knowledge distillation for time series anomaly detection. arXiv preprint arXiv:2401.15123 (2024)

  45. [45]

    Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, Shangqing Xu, Shiyu Wang, Qingsong Wen, Tom Hartvigsen, Fei Wang, and B Aditya Prakash. 2025. How can time series analysis benefit from multiple modalities? a survey and outlook. arXiv preprint arXiv:2503.11835 (2025)

  46. [46]

    Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Prabhakar Kamarthi, Aditya Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, et al. 2024. Time-mmd: Multi-domain multimodal dataset for time series analysis. Advances in Neural Information Processing Systems 37 (2024), 77888–77933

  47. [47]

    Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, and Dongmei Zhang. 2024. Large language models can deliver accurate and interpretable time series anomaly detection. arXiv preprint arXiv:2405.15370 (2024)

  48. [48]

    Qinghua Liu and John Paparrizos. 2024. The elephant in the room: Towards a reliable time-series anomaly detection benchmark. Advances in Neural Information Processing Systems 37 (2024), 108231–108261

  49. [49]

    Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, and Dongsheng Luo. 2024. TIMEX++ learning time-series explanations with information bottleneck. In ICML. 32062–32082

  50. [50]

    Zichuan Liu, Yingying ZHANG, Tianchun Wang, Zefan Wang, Dongsheng Luo, Mengnan Du, Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, et al. [n. d.]. Explaining Time Series via Contrastive and Locally Sparse Perturbations. In ICLR

  51. [51]

    Daehyung Park, Yuuna Hoshi, and Charles C Kemp. 2018. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robotics and Automation Letters 3, 3 (2018), 1544–1551

  52. [52]

    Peter CB Phillips and Sainan Jin. 2021. Business cycles, trend elimination, and the HP filter. International Economic Review 62, 2 (2021), 469–520

  53. [53]

    Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. 2019. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 3009–3017

  54. [54]

    Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. 2018. Deep one-class classification. In International conference on machine learning. PMLR, 4393–4402

  55. [55]

    Mayu Sakurada and Takehisa Yairi. 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis. 4–11

  56. [56]

    Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. 2022. Anomaly detection in time series: a comprehensive evaluation. Proceedings of the VLDB Endowment 15, 9 (2022), 1779–1797

  57. [57]

    Youjin Shin, Sangyup Lee, Shahroz Tariq, Myeong Shin Lee, Okchul Jung, Daewon Chung, and Simon S Woo. 2020. Itad: integrative tensor-based anomaly detection system for reducing false positives of satellite systems. In Proceedings of the 29th ACM international conference on information & knowledge management. 2733–2740

  58. [58]

    Haotian Si, Jianhui Li, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, et al. 2024. Timeseriesbench: An industrial-grade benchmark for time series anomaly detection models. In 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 61–72

  59. [59]

    Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. 2019. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2828–2837

  60. [60]

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

  61. [61]

    Markus Thill, Wolfgang Konen, and Thomas Bäck. 2020. MarkusThill/MGAB: the mackey-glass anomaly benchmark. Version v1. 0.1. Zenodo. doi 10 (2020)

  62. [62]

    Luan Tran, Liyue Fan, and Cyrus Shahabi. 2016. Distance-based outlier detection in data streams. Proceedings of the VLDB Endowment 9, 12 (2016), 1089–1100

  63. [63]

    Hubert Truchan and Zahra Ahmadi. 2025. Nonastreda: Multimodal Dataset for Identifying Tool Wear Condition. https://doi.org/10.17632/m892d2wtzh.1. doi:10.17632/m892d2wtzh.1

  64. [64]

    Bingxing Wang, Yuxin Dong, Jianhua Yao, Honglin Qin, and Jiajing Wang. 2024. Exploring anomaly detection and risk assessment in financial markets using deep neural networks. International Journal of Innovative Research in Computer Science and Technology 12, 4 (2024)

  65. [65]

    Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. 2025. ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data. In AAAI Conference on Artificial Intelligence

  66. [66]

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations

  67. [67]

    Renjie Wu and Eamonn J Keogh. 2021. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE transactions on knowledge and data engineering 35, 3 (2021), 2421–2429

  68. [68]

    Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2021. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv preprint arXiv:2110.02642 (2021)

  69. [69]

    Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S Yu, Yue Zhao, and Kai Shu. 2025. Can Multimodal LLMs Perform Time Series Anomaly Detection? arXiv preprint arXiv:2502.17812 (2025)

  71. [71]

    Kun Yang, Samory Kpotufe, and Nick Feamster. 2021. An efficient one-class SVM for anomaly detection in the Internet of Things. arXiv preprint arXiv:2104.11146 (2021)

  72. [72]

    Yiyuan Yang, Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2023. Dcdetector: Dual attention contrastive representation learning for time series anomaly detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discove...

  73. [73]

    Zahra Zamanzadeh Darban, Geoffrey I Webb, Shirui Pan, Charu Aggarwal, and Mahsa Salehi. 2024. Deep learning for time series anomaly detection: A survey. Comput. Surveys 57, 1 (2024), 1–42

  74. [74]

    Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2022. Tfad: A decomposition time series anomaly detection architecture with time-frequency analysis. In Proceedings of the 31st ACM international conference on information & knowledge management. 2497–2507

  75. [75]

    Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, and Kaixiang Yang. 2024. PatchAD: A lightweight patch-based MLP-mixer for time series anomaly detection. arXiv preprint arXiv:2401.09793 (2024)

  77. [77]

    Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. 2023. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36 (2023), 43322–43355

  78. [78]

    Zihao Zhou and Rose Yu. 2024. Can LLMs Understand Time Series Anomalies? arXiv:2410.05440 [cs.LG] https://arxiv.org/abs/2410.05440

  79. [79]

    **Irrelevant**: No alignment

  80. [80]

    **Partial Focus**: Poor handling in a certain aspect, such as misclassification of exceptions

Showing first 80 references.