Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback

arxiv: 2507.15066 · v5 · submitted 2025-07-20 · cs.LG · cs.AI · cs.MM

KDD ’26, August 9-13, 2026, Jeju, South Korea


Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen


Pith reviewed 2026-05-19 03:26 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.MM

keywords: time series anomaly detection, generative reasoning, multimodal benchmark, large language models, transferability, interpretability, anomaly diagnosis, RATs40K


The pith

Time-RA reframes anomaly detection as generative reasoning over multimodal time series data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Time-RA, a task that shifts time series anomaly detection from binary classification to generating step-by-step diagnostic explanations. To enable this shift it releases RATs40K, a benchmark of roughly 40,000 real-world samples spanning ten domains that combine raw series, text context, visual plots, and structured reasoning annotations. Experiments demonstrate that supervised fine-tuning on this data, especially when visual plots are included, raises both diagnostic accuracy and the consistency of the generated reasoning. Fine-tuned models then transfer directly to previously unseen real-world datasets and surpass conventional detection baselines. The work therefore positions interpretable, multimodal reasoning as a practical next stage for time series analysis.

Core claim

Reformulating time series anomaly detection as the Time-RA generative task and releasing the RATs40K benchmark with structured reasoning annotations allows supervised fine-tuning plus visual inputs to improve diagnostic accuracy and reasoning consistency while delivering strong plug-and-play transferability that outperforms traditional baselines on unseen real-world datasets.

What carries the argument

RATs40K, the multimodal dataset that supplies raw time series, textual context, visual plots, and structured reasoning annotations to support generative anomaly diagnosis.
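To make that sample layout concrete, here is a minimal Python sketch of what one multimodal record might hold. The class `TimeRASample` and its field names (`series`, `context`, `plot_path`, `label`, `source`, `action`, `thought`) are hypothetical. They are loosely modeled on the modalities listed in the abstract and on the Label / Source / Action / Thought fields visible in the case-study figures, not on the published RATs40K schema, which should be checked on the dataset's Hugging Face page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeRASample:
    """Hypothetical layout of one multimodal Time-RA example (not the official RATs40K schema)."""

    series: List[List[float]]        # raw time series, one inner list per channel
    context: str                     # textual context describing the domain or setting
    plot_path: Optional[str] = None  # rendered plot of the series (the visual modality)
    label: str = "Normal"            # coarse decision: "Normal" or "Anomaly"
    source: str = ""                 # domain of origin, e.g. "Medical-ECG"
    action: str = ""                 # fine-grained anomaly category
    thought: str = ""                # step-by-step diagnostic reasoning

    @property
    def num_channels(self) -> int:
        return len(self.series)

    def validate(self) -> None:
        """Basic consistency checks a loader might run before fine-tuning."""
        if self.label not in ("Normal", "Anomaly"):
            raise ValueError(f"unexpected label: {self.label!r}")
        if len({len(channel) for channel in self.series}) > 1:
            raise ValueError("all channels must have the same length")
        if self.label == "Anomaly" and not (self.action and self.thought):
            raise ValueError("anomalous samples need a category and a reasoning trace")


# Usage with made-up values shaped like the multivariate ECG case study.
sample = TimeRASample(
    series=[[0.1, 0.2, 0.1, 1.9], [0.0, 0.1, 0.0, 1.7], [0.2, 0.2, 0.1, 2.1]],
    context="Three-channel ECG recording.",
    label="Anomaly",
    source="Medical-ECG",
    action="Temporal Dependency Anomaly",
    thought="All channels oscillate stably, then deviate sharply at the last step.",
)
sample.validate()
print(sample.num_channels)  # 3
```

Keeping the label, category, and reasoning trace as separate fields lets the same record serve binary detection, fine-grained classification, and reasoning generation.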

If this is right

Supervised fine-tuning on the benchmark raises both diagnostic accuracy and reasoning consistency.
Including visual plot representations further improves performance, though the gains vary across complex anomaly scenarios.
Fine-tuned models transfer without retraining to unseen real-world datasets.
These transferred models outperform conventional anomaly detection baselines on the new data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Explanatory outputs could be directly integrated into operational dashboards so analysts receive both an alert and its supporting reasoning.
The same multimodal annotation style may generalize to related time series tasks such as forecasting with justifications.
Expanding the benchmark to additional domains would test how far the observed transferability extends.

Load-bearing premise

The structured reasoning annotations in RATs40K accurately and consistently represent valid diagnostic reasoning for the anomalies.

What would settle it

A fine-tuned model that scores below traditional baselines in detection accuracy, or whose generated reasoning is inconsistent, on a new held-out real-world time series collection would falsify the transferability result.

Figures

Figures reproduced from arXiv: 2507.15066 by Jiang Bian, Kai Ying, Lei Song, Qingsong Wen, Svitlana Vyetrenko, Tom Bamford, Yiyuan Yang, Zhiguang Wang, Zichuan Liu.

**Figure 2.** The Time-RA workflow: A novel multimodal reasoning framework for multi-task time series anomaly detection.

**Figure 3.** RATs40K dataset construction pipeline. We ensure diversity by sampling anomalous thoughts and decision actions from a large model pool, then we query GPT-4 with detailed definitions and prompts for preference selection and correction to generate high-quality, fine-grained annotations.
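The construction loop in the Figure 3 caption (sample candidates from a model pool, let a judge pick the preferred one, then let the judge correct it) can be sketched as follows, assuming each pool model and the judge are plain Python callables. `build_annotation`, `judge_score`, and `judge_correct` are hypothetical names; the judge functions stand in for the GPT-4 preference-selection and correction prompts, and no real model API is called. The stub models and the "Spike" category are placeholders.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, str]          # e.g. {"action": ..., "thought": ...}
Model = Callable[[str], Candidate]  # one pool model: prompt -> candidate annotation


def build_annotation(
    prompt: str,
    model_pool: List[Model],
    judge_score: Callable[[str, Candidate], float],
    judge_correct: Callable[[str, Candidate], Candidate],
) -> Candidate:
    """Sample one candidate per pool model, keep the judge's preferred one, then let the judge correct it."""
    if not model_pool:
        raise ValueError("model pool is empty")
    candidates = [model(prompt) for model in model_pool]          # diversity via several models
    best = max(candidates, key=lambda c: judge_score(prompt, c))  # preference selection
    return judge_correct(prompt, best)                            # correction pass


# Stub pool and judge; a real pipeline would call LLMs at these points.
pool: List[Model] = [
    lambda p: {"action": "Spike", "thought": "single spike"},
    lambda p: {"action": "Temporal Dependency Anomaly", "thought": "channels diverge after step 30"},
]


def score(prompt: str, cand: Candidate) -> float:
    return float(len(cand["thought"]))  # placeholder preference: longer reasoning wins


def correct(prompt: str, cand: Candidate) -> Candidate:
    return {**cand, "thought": cand["thought"].capitalize()}  # placeholder "correction"


result = build_annotation("ECG sample", pool, score, correct)
print(result["action"])  # Temporal Dependency Anomaly
```

Separating generation (many models) from judgment (one stronger model) is what gives the pipeline its diversity while keeping a single quality bar.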

**Figure 4.** Comparison of LLM-generated labels with expert …

**Figure 5.** Textual reasoning quality based on Likert scale.

**Figure 6.** Univariate Time-RA case study. Panel text: Multivariate Time Series Example (Normal image, Anomaly image). Label: Anomaly. Source: Medical-ECG. Action: Temporal Dependency Anomaly. Thought: "The three ECG channels exhibit a complex pattern with notable deviations in their dynamics. Initially, all channels show relatively stable oscillations, but around time step 30, there is a significan…"

**Figure 7.** Multivariate Time-RA case study.

**Figure 8.** Global statistics of model response in the model pool.

Original abstract

Time series anomaly detection (TSAD) has traditionally focused on binary classification and often lacks the fine-grained categorization and explanatory reasoning required for transparent decision-making. To address these limitations, we propose Time-series Reasoning for Anomaly (Time-RA), a novel task that reformulates TSAD from a discriminative into a generative, reasoning-intensive paradigm. To facilitate this, we introduce RATs40K, the first real-world large-scale multimodal benchmark with ~40,000 samples across 10 domains, integrating raw time series, textual context, and visual plots with structured reasoning annotations. Extensive benchmarking shows that while supervised fine-tuning and visual representations boost diagnostic accuracy and reasoning consistency, performance varies across complex scenarios. Notably, fine-tuned models demonstrate strong "plug-and-play" transferability, outperforming traditional baselines on unseen real-world datasets. Our work establishes a foundation for interpretable, multimodal time series analysis. All code (https://github.com/yyysjz1997/Time-RA) and the RATs40K dataset (https://huggingface.co/datasets/Time-RA/RATs40K) are fully open-sourced to facilitate future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds a reasoning task and a large open multimodal benchmark to time series anomaly detection, with transfer results that look usable but rest on thin statistical reporting.


The punchline is that this work introduces a reasoning-focused task for time series anomalies along with a sizable new benchmark, and the transfer results look decent at first glance. They reframe the usual detection problem as generating explanations, which they call Time-RA. To support it they built RATs40K, a collection of about 40,000 multimodal examples spanning 10 domains. Each sample includes the time series data, some text context, visual plots, and structured reasoning annotations. That combination is new compared with standard TSAD benchmarks, which stick to binary labels. They also released all the code and the dataset publicly, which is a practical plus for anyone wanting to experiment.

The experiments show gains from supervised fine-tuning and from feeding in visual representations. The models appear to generalize reasonably to datasets they weren't trained on, beating traditional methods in those cases. This plug-and-play aspect is the part that could matter for real applications.

Where it feels thinner is the supporting evidence. The abstract reports positive benchmarking but does not include error bars, p-values, or details on how the reasoning annotations were checked for quality and consistency. If annotation quality is uneven across samples, that could affect how much we trust the fine-tuning and transfer claims. The authors acknowledge that results are not uniform across complex scenarios, so the benefits might be narrower than the headline suggests.

Overall, this is aimed at researchers who work on making anomaly detection more useful in practice by adding interpretability, especially through multimodal and language model approaches. Someone building tools for domain experts in finance or healthcare might find the benchmark useful. The combination of a new task definition, fresh data, and open resources gives it enough substance to merit peer review, and I would recommend putting it through the review process.
The core setup is straightforward and the data release helps, even though the current writeup leaves some questions about the strength of the empirical support.

Referee Report

2 major / 1 minor

Summary. The paper claims to reformulate time series anomaly detection (TSAD) as a generative reasoning task called Time-RA. It introduces RATs40K, a multimodal benchmark dataset with approximately 40,000 samples from 10 domains that includes raw time series data, textual context, visual plots, and structured reasoning annotations. Benchmarking experiments indicate that supervised fine-tuning and the use of visual representations enhance diagnostic accuracy and reasoning consistency. Importantly, fine-tuned models exhibit strong plug-and-play transferability, outperforming traditional baselines on unseen real-world datasets. The work aims to establish a foundation for interpretable multimodal time series analysis and provides open access to code and data.

Significance. If the results hold, this contribution could significantly impact the field by shifting focus from binary anomaly detection to explanatory reasoning, improving transparency and usability in practical applications across diverse domains. The provision of a large-scale real-world dataset and open-sourced resources strengthens the potential for follow-up work and reproducibility.

major comments (2)

The abstract reports positive benchmarking outcomes for fine-tuning and visual inputs, but lacks details on statistical significance, error bars, or controls for annotation quality, leaving only moderate support for the central transferability claim.
The structured reasoning annotations in RATs40K are central to the training and evaluation, yet the manuscript provides limited information on the annotation process and quality assurance, which is critical given that the transferability relies on these annotations accurately representing valid diagnostic reasoning.

minor comments (1)

Clarify the distinction between the proposed Time-RA task and existing TSAD methods more explicitly in the introduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive overall assessment of the work. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.


Referee: The abstract reports positive benchmarking outcomes for fine-tuning and visual inputs, but lacks details on statistical significance, error bars, or controls for annotation quality, leaving only moderate support for the central transferability claim.

Authors: We agree that the current presentation would benefit from greater statistical rigor. In the revised manuscript we will add error bars to all reported metrics, include statistical significance tests (e.g., paired t-tests or Wilcoxon tests) for the key comparisons that support transferability, and explicitly reference the expanded annotation-quality controls described in response to the second comment. These additions will be made both in the abstract and in the experimental sections. revision: yes
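As a rough illustration of the promised error bars and paired comparison, here is a standard-library sketch. It swaps in an exact paired sign-flip permutation test for the paired t-test or Wilcoxon test named above, because it needs no distribution tables; the function names and the per-dataset scores are hypothetical and are not results from the paper.

```python
from itertools import product
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_and_std(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across runs, e.g. for error bars over random seeds."""
    return mean(scores), stdev(scores)


def paired_sign_flip_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired permutation (sign-flip) test on per-pair score differences.

    All 2**n sign patterns are enumerated, so this is only practical for small n
    (a handful of seeds or datasets); larger n would call for random sampling.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    extreme = 0
    for signs in patterns:
        # Count sign patterns at least as extreme as the observed mean difference.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / len(patterns)


# Made-up per-dataset accuracies for a fine-tuned model and a baseline.
finetuned = [0.80, 0.82, 0.79, 0.85, 0.81]
baseline = [0.70, 0.72, 0.71, 0.74, 0.69]
print(mean_and_std(finetuned))
print(paired_sign_flip_test(finetuned, baseline))  # 0.0625, i.e. 2 of 32 patterns
```

With only five pairs, 2/32 = 0.0625 is the smallest two-sided p-value this test can return even when every pair favors the same model, which is one concrete reason to report results over more seeds or datasets.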
Referee: The structured reasoning annotations in RATs40K are central to the training and evaluation, yet the manuscript provides limited information on the annotation process and quality assurance, which is critical given that the transferability relies on these annotations accurately representing valid diagnostic reasoning.

Authors: We acknowledge that the manuscript currently provides only a high-level description of the annotation process. In the revision we will expand the dataset-construction section to detail: the annotation guidelines and task instructions given to annotators, the domain expertise of the annotators, inter-annotator agreement statistics (e.g., Cohen’s kappa or Krippendorff’s alpha), and the multi-stage quality-assurance pipeline (initial annotation, expert review, and resolution of disagreements). These additions will directly support the validity of the transferability experiments. revision: yes
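For the inter-annotator agreement statistic mentioned above, Cohen's kappa compares observed agreement $p_o$ with the agreement $p_e$ expected by chance from each rater's label frequencies, $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below is a plain-Python version; the function name and the example labels are hypothetical. The same computation could quantify agreement between LLM-generated and expert labels of the kind compared in Figure 4. Krippendorff's alpha would be the choice for more than two raters or missing labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal frequencies, summed over categories.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # both raters used one identical category throughout; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up labels for six windows, e.g. LLM-generated vs. expert.
llm = ["Anomaly", "Anomaly", "Normal", "Normal", "Anomaly", "Normal"]
expert = ["Anomaly", "Normal", "Normal", "Normal", "Anomaly", "Normal"]
print(round(cohens_kappa(llm, expert), 3))  # 0.667
```

Here the raters agree on 5 of 6 items ($p_o \approx 0.833$), but chance alone would give $p_e = 0.5$, so kappa is a more modest 0.667.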

Circularity Check

0 steps flagged

No significant circularity: empirical benchmark and new dataset


The paper defines a new task (Time-RA) and releases a new multimodal dataset (RATs40K) with structured annotations, then reports empirical benchmarking results on fine-tuned models and their transfer performance to unseen datasets. No equations, fitted parameters, or mathematical derivations appear in the provided text. Central claims rest on experimental outcomes rather than any self-referential reduction of a prediction to its own inputs or to a self-citation chain. The work is self-contained against external benchmarks because the dataset and code are open-sourced and the evaluations use held-out and unseen real-world data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of the new task definition and the quality of the structured reasoning annotations in the benchmark, which were sampled from a pool of LLMs and then selected and corrected with GPT-4; no free parameters or invented entities are evident from the abstract.

axioms (1)

Domain assumption: Multimodal inputs (raw time series, text context, and visual plots) can be jointly used to elicit consistent diagnostic reasoning from LLMs.
This underpins the benchmark design and the reported gains from visual representations.

pith-pipeline@v0.9.0 · 5765 in / 1205 out tokens · 38892 ms · 2026-05-19T03:26:48.138161+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We propose Time-series Reasoning for Anomaly (Time-RA) that transforms classical time series anomaly detection from a discriminative into a generative, reasoning-intensive task leveraging Large Language Models (LLMs)."

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We introduce RATs40K, the first real-world multimodal benchmark dataset with ~40,000 samples across 10 domains, integrating raw time series, textual context, and visual plots with structured reasoning annotations."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 6 internal anchors

[1] Ahmed Abdulaal, Zhuanghua Liu, and Tomer Lancewicki. 2021. Practical approach to asynchronous multivariate time series anomaly detection and localization. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 2485–2494

[2] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[3] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. 2017. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262 (2017), 134–147

[4] Marc Bachlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M Hausdorff, Nir Giladi, and Gerhard Troster. 2009. Wearable assistant for Parkinson’s disease patients with the freezing of gait symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2009), 436–446

[5] Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, and Davide Taibi. 2025. LO2: Microservice API Anomaly Dataset of Logs and Metrics. arXiv preprint arXiv:2504.12067 (2025)

[6] Guillermo Barrenetxea. 2019. Sensorscope Data. https://doi.org/10.5281/zenodo.2654726 Data set

[7] Ane Blázquez-García, Angel Conde, Usue Mori, and Jose A Lozano. 2021. A review on outlier/anomaly detection in time series data. ACM computing surveys (CSUR) 54, 3 (2021), 1–33

[8] Paul Boniol, Qinghua Liu, Mingyi Huang, Themis Palpanas, and John Paparrizos. 2024. Dive into time-series anomaly detection: A decade review. arXiv preprint arXiv:2412.20512 (2024)

[9] Paul Boniol, John Paparrizos, Themis Palpanas, and Michael J Franklin. 2021. Sand: streaming subsequence anomaly detection. Proceedings of the VLDB Endowment 14, 10 (2021), 1717–1729

[10] George EP Box and David A Pierce. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American statistical Association 65, 332 (1970), 1509–1526

[11] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM computing surveys (CSUR) 41, 3 (2009), 1–58

[12] Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, and Rex Ying. 2025. Mtbench: A multimodal time series benchmark for temporal reasoning and question answering. arXiv preprint arXiv:2503.16858 (2025)

[13] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023. URL https://lmsys.org/blog/2023-03-30-vicuna 3, 5 (2023)

[14] Nwodo Benita Chikodili, Mohammed D Abdulmalik, Opeyemi A Abisoye, and Sulaimon A Bashir. 2020. Outlier detection in multivariate time series data using a fusion of K-medoid, standardized euclidean distance and Z-score. In International Conference on Information and Communication Technology and Applications. Springer, 259–271

[15] Winnie Chow, Lauren Gardiner, Haraldur T Hallgrímsson, Maxwell A Xu, and Shirley You Ren. 2024. Towards time series reasoning with llms. arXiv preprint arXiv:2409.11376 (2024)

[16] Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)

[17] Andrew A Cook, Göksel Mısırlı, and Zhong Fan. 2019. Anomaly detection for IoT time-series data: A survey. IEEE Internet of Things Journal 7, 7 (2019), 6481–6494

[18] Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2024. ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback. In ICML. 9722–9744

[19] Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. 2019. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

[20] Pavel Filonov, Andrey Lavrentyev, and Artem Vorontsov. 2016. Multivariate industrial time series with cyber-attack simulation: Fault detection using an lstm-based predictive data model. arXiv preprint arXiv:1612.06676 (2016)

[21] P Fleith. 2023. Controlled Anomalies Time Series (CATS) Dataset. Solenix Engineering GmbH, Feb (2023)

[22] Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, and Huan Xu. 2020. Robusttad: Robust time series anomaly detection via decomposition and convolutional neural networks. arXiv preprint arXiv:2002.09545 (2020)

[23] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. circulation 101, 23 (2000), e215–e220

[24] Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning

[25] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

[26] Scott David Greenwald, Ramesh S Patil, and Roger G Mark. 1990. Improved detection and classification of arrhythmias in noise-corrupted electrocardiograms using contextual information. IEEE

[27] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

[28] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3

[29] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 387–395

[30] Vincent Jacob, Fei Song, Arnaud Stiegler, Bijan Rad, Yanlei Diao, and Nesime Tatbul. 2020. Exathlon: A benchmark for explainable anomaly detection over time series. arXiv preprint arXiv:2010.05073 (2020)

[31] Ming Jin, Yifan Zhang, Wei Chen, Kexin Zhang, Yuxuan Liang, Bin Yang, Jindong Wang, Shirui Pan, and Qingsong Wen. 2024. Position: What can large language models tell us about time series analysis. In Forty-first International Conference on Machine Learning

[32] Neha Kant and Manish Mahajan. 2019. Time-series outlier detection using enhanced k-means in combination with pso algorithm. In Engineering Vibration, Communication and Information Processing: ICoEVCI 2018, India. Springer, 363–373

[33] Paweł Karczmarek, Adam Kiersztyn, Witold Pedrycz, and Ebru Al. 2020. K-means-based isolation forest. Knowledge-based systems 195 (2020), 105659

[34] Maryam Mahsal Khan and Mohammed Alkhathami. 2024. Anomaly detection in IoT-based healthcare: machine learning for enhanced security. Scientific reports 14, 1 (2024), 5872

[35] Yaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin, and Qingsong Wen. 2025. Time-mqa: Time series multi-task question answering with context enhancement. arXiv preprint arXiv:2503.01875 (2025)

[36] Yaxuan Kong, Yiyuan Yang, Shiyu Wang, Chenghao Liu, Yuxuan Liang, Ming Jin, Stefan Zohren, Dan Pei, Yan Liu, and Qingsong Wen. 2025. Position: Empowering Time Series Reasoning with Multimodal LLMs. arXiv preprint arXiv:2502.01477 (2025)

[37] Kwei-Herng Lai, Daochen Zha, Junjie Xu, Yue Zhao, Guanchu Wang, and Xia Hu. 2021. Revisiting time series outlier detection: Definitions and benchmarks. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 1)

[38] N Laptev, S Amizadeh, and Y Billawala. 2015. S5-A labeled anomaly detection dataset, version 1.0 (16M)

[39] Gen Li and Jason J Jung. 2023. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Information Fusion 91 (2023), 93–102

[40] Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. 2024. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. 6555–6565

[41] Chen Liu, Shibo He, Shizhong Li, Zhenyu Shi, and Wenchao Meng. 2025. Detecting Both Seen and Unseen Anomalies in Time Series. ACM Transactions on Knowledge Discovery from Data 19, 4 (2025), 1–29

[42] Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, and Wenchao Meng. 2024. Large language model guided knowledge distillation for time series anomaly detection. arXiv preprint arXiv:2401.15123 (2024)

[43] Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, Shangqing Xu, Shiyu Wang, Qingsong Wen, Tom Hartvigsen, Fei Wang, and B Aditya Prakash. 2025. How can time series analysis benefit from multiple modalities? a survey and outlook. arXiv preprint arXiv:2503.11835 (2025)

[44] Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Prabhakar Kamarthi, Aditya Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, et al. 2024. Time-mmd: Multi-domain multimodal dataset for time series analysis. Advances in Neural Information Processing Systems 37 (2024), 77888–77933

[45] Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, and Dongmei Zhang. 2024. Large language models can deliver accurate and interpretable time series anomaly detection. arXiv preprint arXiv:2405.15370 (2024)

[46] Qinghua Liu and John Paparrizos. 2024. The elephant in the room: Towards a reliable time-series anomaly detection benchmark. Advances in Neural Information Processing Systems 37 (2024), 108231–108261

[47] Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, and Dongsheng Luo. 2024. TIMEX++ learning time-series explanations with information bottleneck. In ICML. 32062–32082

[48] Zichuan Liu, Yingying Zhang, Tianchun Wang, Zefan Wang, Dongsheng Luo, Mengnan Du, Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, et al. [n. d.]. Explaining Time Series via Contrastive and Locally Sparse Perturbations. In ICLR

[49] Daehyung Park, Yuuna Hoshi, and Charles C Kemp. 2018. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robotics and Automation Letters 3, 3 (2018), 1544–1551

[50] Peter CB Phillips and Sainan Jin. 2021. Business cycles, trend elimination, and the HP filter. International Economic Review 62, 2 (2021), 469–520

[51] Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. 2019. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 3009–3017

[52] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. 2018. Deep one-class classification. In International conference on machine learning. PMLR, 4393–4402

[53] Mayu Sakurada and Takehisa Yairi. 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis. 4–11

[54] Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. 2022. Anomaly detection in time series: a comprehensive evaluation. Proceedings of the VLDB Endowment 15, 9 (2022), 1779–1797

[55] Youjin Shin, Sangyup Lee, Shahroz Tariq, Myeong Shin Lee, Okchul Jung, Daewon Chung, and Simon S Woo. 2020. Itad: integrative tensor-based anomaly detection system for reducing false positives of satellite systems. In Proceedings of the 29th ACM international conference on information & knowledge management. 2733–2740

[56] Haotian Si, Jianhui Li, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, et al. 2024. Timeseriesbench: An industrial-grade benchmark for time series anomaly detection models. In 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 61–72

[57] Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. 2019. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2828–2837

[58] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

[59] Markus Thill, Wolfgang Konen, and Thomas Bäck. 2020. MarkusThill/MGAB: the mackey-glass anomaly benchmark. Version v1.0.1. Zenodo. doi 10 (2020)

[60] Luan Tran, Liyue Fan, and Cyrus Shahabi. 2016. Distance-based outlier detection in data streams. Proceedings of the VLDB Endowment 9, 12 (2016), 1089–1100

[61] Hubert Truchan and Zahra Ahmadi. 2025. Nonastreda: Multimodal Dataset for Identifying Tool Wear Condition. doi:10.17632/m892d2wtzh.1

[62] Bingxing Wang, Yuxin Dong, Jianhua Yao, Honglin Qin, and Jiajing Wang. 2024. Exploring anomaly detection and risk assessment in financial markets using deep neural networks. International Journal of Innovative Research in Computer Science and Technology 12, 4 (2024)

[63] Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. 2025. ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data. In AAAI Conference on Artificial Intelligence

[64] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations

[65] Renjie Wu and Eamonn J Keogh. 2021. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE transactions on knowledge and data engineering 35, 3 (2021), 2421–2429

[66] Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2021. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv preprint arXiv:2110.02642 (2021)

[67] Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S Yu, Yue Zhao, and Kai Shu. 2025. Can Multimodal LLMs Perform Time Series Anomaly Detection? arXiv preprint arXiv:2502.17812 (2025)

[68] Kun Yang, Samory Kpotufe, and Nick Feamster. 2021. An efficient one-class SVM for anomaly detection in the Internet of Things. arXiv preprint arXiv:2104.11146 (2021)

[69] Yiyuan Yang, Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2023. Dcdetector: Dual attention contrastive representation learning for time series anomaly detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discove...

[70] Zahra Zamanzadeh Darban, Geoffrey I Webb, Shirui Pan, Charu Aggarwal, and Mahsa Salehi. 2024. Deep learning for time series anomaly detection: A survey. Comput. Surveys 57, 1 (2024), 1–42

[71] Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2022. Tfad: A decomposition time series anomaly detection architecture with time-frequency analysis. In Proceedings of the 31st ACM international conference on information & knowledge management. 2497–2507

work page 2022
[75]

Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, and Kaixiang Yang

work page
[76]

arXiv preprint arXiv:2401.09793 (2024)

PatchAD: A lightweight patch-based MLP-mixer for time series anomaly detection. arXiv preprint arXiv:2401.09793 (2024)

work page arXiv 2024
[77]

Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al . 2023. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36 (2023), 43322–43355

work page 2023
[78]

Zihao Zhou and Rose Yu. 2024. Can LLMs Understand Time Series Anomalies? arXiv:2410.05440 [cs.LG] https://arxiv.org/abs/2410.05440 KDD ’26, August 9-13, 2026, Jeju, South Korea Yang et al. A LIMITATION AND FURTHER DIRECTION For the limitations, there are some worth noting: (i) Detection of Multiple Anomaly Types in a Single Sequence. In cases where multip...

work page arXiv 2024
[79]

**Irrelevant**: No alignment

work page
[80]

**Partial Focus**: Poor handling in a certain aspect, such as misclassifica- tion of exceptions

work page

Showing first 80 references.

[1] Ahmed Abdulaal, Zhuanghua Liu, and Tomer Lancewicki. 2021. Practical approach to asynchronous multivariate time series anomaly detection and localization. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. 2485–2494

[2] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[3] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. 2017. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262 (2017), 134–147

[4] Marc Bachlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M Hausdorff, Nir Giladi, and Gerhard Troster. 2009. Wearable assistant for Parkinson’s disease patients with the freezing of gait symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2009), 436–446

[5] Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, and Davide Taibi. 2025. LO2: Microservice API Anomaly Dataset of Logs and Metrics. arXiv preprint arXiv:2504.12067 (2025)

[6] Guillermo Barrenetxea. 2019. Sensorscope Data. https://doi.org/10.5281/zenodo.2654726 Data set

[7] Ane Blázquez-García, Angel Conde, Usue Mori, and Jose A Lozano. 2021. A review on outlier/anomaly detection in time series data. ACM Computing Surveys (CSUR) 54, 3 (2021), 1–33

[8] Paul Boniol, Qinghua Liu, Mingyi Huang, Themis Palpanas, and John Paparrizos. 2024. Dive into time-series anomaly detection: A decade review. arXiv preprint arXiv:2412.20512 (2024)

[10] Paul Boniol, John Paparrizos, Themis Palpanas, and Michael J Franklin. 2021. SAND: streaming subsequence anomaly detection. Proceedings of the VLDB Endowment 14, 10 (2021), 1717–1729

[11] George EP Box and David A Pierce. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65, 332 (1970), 1509–1526

[12] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41, 3 (2009), 1–58

[13] Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, and Rex Ying. 2025. MTBench: A multimodal time series benchmark for temporal reasoning and question answering. arXiv preprint arXiv:2503.16858 (2025)

[14] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, March 2023. URL https://lmsys.org/blog/2023-03-30-vicuna 3, 5 (2023)

[15] Nwodo Benita Chikodili, Mohammed D Abdulmalik, Opeyemi A Abisoye, and Sulaimon A Bashir. 2020. Outlier detection in multivariate time series data using a fusion of K-medoid, standardized euclidean distance and Z-score. In International Conference on Information and Communication Technology and Applications. Springer, 259–271

[16] Winnie Chow, Lauren Gardiner, Haraldur T Hallgrímsson, Maxwell A Xu, and Shirley You Ren. 2024. Towards time series reasoning with LLMs. arXiv preprint arXiv:2409.11376 (2024)

[17] Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)

[18] Andrew A Cook, Göksel Mısırlı, and Zhong Fan. 2019. Anomaly detection for IoT time-series data: A survey. IEEE Internet of Things Journal 7, 7 (2019), 6481–6494

[19] Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2024. ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback. In ICML. 9722–9744

[20] Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. 2019. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

[22] Pavel Filonov, Andrey Lavrentyev, and Artem Vorontsov. 2016. Multivariate industrial time series with cyber-attack simulation: Fault detection using an LSTM-based predictive data model. arXiv preprint arXiv:1612.06676 (2016)

[23] P Fleith. 2023. Controlled Anomalies Time Series (CATS) Dataset. Solenix Engineering GmbH, Feb (2023)

[24] Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, and Huan Xu. 2020. RobustTAD: Robust time series anomaly detection via decomposition and convolutional neural networks. arXiv preprint arXiv:2002.09545 (2020)

[25] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, 23 (2000), e215–e220

[26] Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. In International Conference on Machine Learning

[27] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

[28] Scott David Greenwald, Ramesh S Patil, and Roger G Mark. 1990. Improved detection and classification of arrhythmias in noise-corrupted electrocardiograms using contextual information. IEEE

[29] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)

[30] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3

[31] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom. 2018. Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 387–395

[32] Vincent Jacob, Fei Song, Arnaud Stiegler, Bijan Rad, Yanlei Diao, and Nesime Tatbul. 2020. Exathlon: A benchmark for explainable anomaly detection over time series. arXiv preprint arXiv:2010.05073 (2020)

[33] Ming Jin, Yifan Zhang, Wei Chen, Kexin Zhang, Yuxuan Liang, Bin Yang, Jindong Wang, Shirui Pan, and Qingsong Wen. 2024. Position: What can large language models tell us about time series analysis. In Forty-first International Conference on Machine Learning

[34] Neha Kant and Manish Mahajan. 2019. Time-series outlier detection using enhanced k-means in combination with PSO algorithm. In Engineering Vibration, Communication and Information Processing: ICoEVCI 2018, India. Springer, 363–373

[35] Paweł Karczmarek, Adam Kiersztyn, Witold Pedrycz, and Ebru Al. 2020. K-means-based isolation forest. Knowledge-Based Systems 195 (2020), 105659

[36] Maryam Mahsal Khan and Mohammed Alkhathami. 2024. Anomaly detection in IoT-based healthcare: machine learning for enhanced security. Scientific Reports 14, 1 (2024), 5872

[37] Yaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin, and Qingsong Wen. 2025. Time-MQA: Time series multi-task question answering with context enhancement. arXiv preprint arXiv:2503.01875 (2025)

[38] Yaxuan Kong, Yiyuan Yang, Shiyu Wang, Chenghao Liu, Yuxuan Liang, Ming Jin, Stefan Zohren, Dan Pei, Yan Liu, and Qingsong Wen. 2025. Position: Empowering Time Series Reasoning with Multimodal LLMs. arXiv preprint arXiv:2502.01477 (2025)

[39] Kwei-Herng Lai, Daochen Zha, Junjie Xu, Yue Zhao, Guanchu Wang, and Xia Hu. 2021. Revisiting time series outlier detection: Definitions and benchmarks. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)

[40] N Laptev, S Amizadeh, and Y Billawala. 2015. S5-A labeled anomaly detection dataset, version 1.0 (16M)

[41] Gen Li and Jason J Jung. 2023. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Information Fusion 91 (2023), 93–102

[42] Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. 2024. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. 6555–6565

[43] Chen Liu, Shibo He, Shizhong Li, Zhenyu Shi, and Wenchao Meng. 2025. Detecting Both Seen and Unseen Anomalies in Time Series. ACM Transactions on Knowledge Discovery from Data 19, 4 (2025), 1–29

[44] Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, and Wenchao Meng. 2024. Large language model guided knowledge distillation for time series anomaly detection. arXiv preprint arXiv:2401.15123 (2024)

[45] Haoxin Liu, Harshavardhan Kamarthi, Zhiyuan Zhao, Shangqing Xu, Shiyu Wang, Qingsong Wen, Tom Hartvigsen, Fei Wang, and B Aditya Prakash. 2025. How can time series analysis benefit from multiple modalities? A survey and outlook. arXiv preprint arXiv:2503.11835 (2025)

[46] Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Prabhakar Kamarthi, Aditya Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, et al. 2024. Time-MMD: Multi-domain multimodal dataset for time series analysis. Advances in Neural Information Processing Systems 37 (2024), 77888–77933

[47] Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, and Dongmei Zhang. 2024. Large language models can deliver accurate and interpretable time series anomaly detection. arXiv preprint arXiv:2405.15370 (2024)

[48] Qinghua Liu and John Paparrizos. 2024. The elephant in the room: Towards a reliable time-series anomaly detection benchmark. Advances in Neural Information Processing Systems 37 (2024), 108231–108261

[49] Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, and Dongsheng Luo. 2024. TimeX++: learning time-series explanations with information bottleneck. In ICML. 32062–32082

[50] Zichuan Liu, Yingying Zhang, Tianchun Wang, Zefan Wang, Dongsheng Luo, Mengnan Du, Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, et al. [n. d.]. Explaining Time Series via Contrastive and Locally Sparse Perturbations. In ICLR

[51] Daehyung Park, Yuuna Hoshi, and Charles C Kemp. 2018. A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robotics and Automation Letters 3, 3 (2018), 1544–1551

[52] Peter CB Phillips and Sainan Jin. 2021. Business cycles, trend elimination, and the HP filter. International Economic Review 62, 2 (2021), 469–520

[53] Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. 2019. Time-series anomaly detection service at Microsoft. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 3009–3017

[54] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. 2018. Deep one-class classification. In International Conference on Machine Learning. PMLR, 4393–4402

[55] Mayu Sakurada and Takehisa Yairi. 2014. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis. 4–11

[56] Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. 2022. Anomaly detection in time series: a comprehensive evaluation. Proceedings of the VLDB Endowment 15, 9 (2022), 1779–1797

[57] Youjin Shin, Sangyup Lee, Shahroz Tariq, Myeong Shin Lee, Okchul Jung, Daewon Chung, and Simon S Woo. 2020. ITAD: integrative tensor-based anomaly detection system for reducing false positives of satellite systems. In Proceedings of the 29th ACM international conference on information & knowledge management. 2733–2740

[58] Haotian Si, Jianhui Li, Changhua Pei, Hang Cui, Jingwen Yang, Yongqian Sun, Shenglin Zhang, Jingjing Li, Haiming Zhang, Jing Han, et al. 2024. TimeSeriesBench: An industrial-grade benchmark for time series anomaly detection models. In 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 61–72

[59] Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. 2019. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. 2828–2837

[60] Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

[61] Markus Thill, Wolfgang Konen, and Thomas Bäck. 2020. MarkusThill/MGAB: the Mackey-Glass anomaly benchmark. Version v1.0.1. Zenodo. doi 10 (2020)

[62] Luan Tran, Liyue Fan, and Cyrus Shahabi. 2016. Distance-based outlier detection in data streams. Proceedings of the VLDB Endowment 9, 12 (2016), 1089–1100

[63] Hubert Truchan and Zahra Ahmadi. 2025. Nonastreda: Multimodal Dataset for Identifying Tool Wear Condition. https://doi.org/10.17632/m892d2wtzh.1. doi:10.17632/m892d2wtzh.1

[64] Bingxing Wang, Yuxin Dong, Jianhua Yao, Honglin Qin, and Jiajing Wang. 2024. Exploring anomaly detection and risk assessment in financial markets using deep neural networks. International Journal of Innovative Research in Computer Science and Technology 12, 4 (2024)

[65] Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, and Jianxin Liao. 2025. ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data. In AAAI Conference on Artificial Intelligence

[66] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning Representations

[67] Renjie Wu and Eamonn J Keogh. 2021. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE Transactions on Knowledge and Data Engineering 35, 3 (2021), 2421–2429

[68] Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2021. Anomaly Transformer: Time series anomaly detection with association discrepancy. arXiv preprint arXiv:2110.02642 (2021)

[69] Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S Yu, Yue Zhao, and Kai Shu. 2025. Can Multimodal LLMs Perform Time Series Anomaly Detection? arXiv preprint arXiv:2502.17812 (2025)

[71] Kun Yang, Samory Kpotufe, and Nick Feamster. 2021. An efficient one-class SVM for anomaly detection in the Internet of Things. arXiv preprint arXiv:2104.11146 (2021)

[72] Yiyuan Yang, Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2023. DCdetector: Dual attention contrastive representation learning for time series anomaly detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discove...

Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback KDD ’26, August 9-13, 2026, Jeju, South Korea

[73] Zahra Zamanzadeh Darban, Geoffrey I Webb, Shirui Pan, Charu Aggarwal, and Mahsa Salehi. 2024. Deep learning for time series anomaly detection: A survey. Comput. Surveys 57, 1 (2024), 1–42

[74] Chaoli Zhang, Tian Zhou, Qingsong Wen, and Liang Sun. 2022. TFAD: A decomposition time series anomaly detection architecture with time-frequency analysis. In Proceedings of the 31st ACM international conference on information & knowledge management. 2497–2507

[75] Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, and Kaixiang Yang. 2024. PatchAD: A lightweight patch-based MLP-mixer for time series anomaly detection. arXiv preprint arXiv:2401.09793 (2024)

[77] Tian Zhou, Peisong Niu, Liang Sun, Rong Jin, et al. 2023. One fits all: Power general time series analysis by pretrained LM. Advances in Neural Information Processing Systems 36 (2023), 43322–43355

[78] Zihao Zhou and Rose Yu. 2024. Can LLMs Understand Time Series Anomalies? arXiv:2410.05440 [cs.LG] https://arxiv.org/abs/2410.05440

A LIMITATION AND FURTHER DIRECTION

For the limitations, several are worth noting: (i) Detection of Multiple Anomaly Types in a Single Sequence. In cases where multip...

- **Irrelevant**: No alignment

- **Partial Focus**: Poor handling of a particular aspect, such as misclassification of exceptions