
arXiv: 2511.18739 · v2 · pith:ZANPQFQEnew · submitted 2025-11-24 · cs.AI · cs.LG · stat.ML

A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection

Pith reviewed 2026-05-17 06:53 UTC · model grok-4.3

classification: cs.AI, cs.LG, stat.ML
keywords: time series anomaly detection, evaluation metrics, taxonomy, robustness to random scores, event-level metrics, NAB, Point-Adjust, discriminative ability

The pith

A problem-oriented taxonomy of time series anomaly detection metrics finds that most of them separate genuine detections from noise, but that NAB and Point-Adjust are easily inflated by random scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a framework that sorts more than twenty existing metrics into six dimensions according to the practical evaluation problems each one targets, such as rewarding timely alerts or accounting for the cost of human review. Experiments compare metric scores across genuine detections, purely random outputs, and perfect oracle results to measure how well each one distinguishes useful performance from noise. A sympathetic reader would care because the choice of metric directly affects whether an anomaly detector appears effective in IoT or cyber-physical systems, and a weak metric can mask poor performance or reward meaningless outputs. The results indicate strong separability for most event-level metrics while highlighting limited resistance to random inflation in several widely adopted ones.
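The comparison of score distributions can be made concrete with two summary numbers. The figure captions mention effect size and AUC; the sketch below assumes Cohen's d as the effect size and a pairwise-comparison AUC, which may differ from the paper's exact definitions. The score lists are hypothetical placeholders, not results from the paper.

```python
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

def separation_auc(a, b):
    """Probability that a value drawn from `a` exceeds one drawn from `b`.
    Ties count as half a win."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))

# Hypothetical metric scores over five runs each (illustrative only).
genuine = [0.71, 0.68, 0.75, 0.80, 0.66]
random_scores = [0.12, 0.20, 0.09, 0.15, 0.18]

print("effect size:", cohens_d(genuine, random_scores))
print("AUC:", separation_auc(genuine, random_scores))
```

An AUC near 0.5 together with a small effect size would indicate that the metric cannot tell genuine detectors from random ones; an AUC near 1 with a large effect size would indicate clear separation.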

Core claim

By organizing metrics according to the evaluation challenges they address instead of their mathematical definitions, the taxonomy places them into six dimensions covering basic accuracy, timeliness rewards, tolerance for imprecise labels, audit-cost penalties, robustness against random or inflated scores, and parameter-free cross-dataset use. Tests under genuine, random, and oracle scenarios show that most event-level metrics produce clearly separated score distributions, yet NAB and Point-Adjust exhibit limited resistance to random-score inflation, supporting the conclusion that metric choice must match the operational objectives of each application.
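Why Point-Adjust is prone to random-score inflation can be seen in a toy setting. The sketch below is an illustration under assumed conditions (three labeled segments of 500 points each, a detector that flags about 1% of points at random), not the paper's experimental protocol. Point-Adjust marks a whole labeled segment as detected once any point in it is flagged, so long segments are almost always credited to a random detector.

```python
import random

def point_adjust(pred, label):
    """If any point inside a labeled anomaly segment is flagged,
    mark every point of that segment as detected."""
    adjusted = list(pred)
    n = len(label)
    i = 0
    while i < n:
        if label[i] == 1:
            j = i
            while j < n and label[j] == 1:
                j += 1                      # j is one past the segment end
            if any(pred[i:j]):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted

def f1(pred, label):
    """Point-wise F1 score for binary predictions."""
    tp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

rng = random.Random(0)
n = 10_000
label = [0] * n
for start in (1000, 4000, 7000):            # assumed toy anomaly segments
    for k in range(start, start + 500):
        label[k] = 1

# Random detector: each point is flagged independently with probability 0.01.
pred = [1 if rng.random() < 0.01 else 0 for _ in range(n)]

raw_f1 = f1(pred, label)
pa_f1 = f1(point_adjust(pred, label), label)
print("point-wise F1:", raw_f1)
print("Point-Adjust F1:", pa_f1)
```

The gap between the two printed values grows with segment length, since the chance that a random detector misses every point of a segment shrinks geometrically.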

What carries the argument

The problem-oriented taxonomy, which reinterprets metrics by the specific evaluation challenges they target rather than by their formulas.

If this is right

  • Metric selection for time series anomaly detection must align with the application's specific priorities such as timeliness or audit cost.
  • Widely used metrics like NAB and Point-Adjust can produce misleadingly high scores when detectors output random or inflated values.
  • Parameter-free metrics support more reliable comparisons of detectors across different datasets.
  • Evaluation protocols should incorporate tests for robustness against random-score inflation as a standard check.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Application developers would benefit from first identifying which of the six dimensions matter most for their use case before picking an evaluation metric.
  • The random-and-oracle testing approach could be reused to assess metric reliability in related sequential-data tasks such as fault detection.
  • If the dimensions prove comprehensive, future metric design could focus on strengthening the areas where current popular options show weakness.

Load-bearing premise

The six dimensions cover the main evaluation challenges and the genuine-random-oracle test scenarios represent real application behavior without important confounding factors.

What would settle it

A new set of experiments on independent real-world datasets showing that NAB and Point-Adjust maintain clear separation between genuine and random detections would challenge the claim of their limited resistance to random-score inflation.

Figures

Figures reproduced from arXiv: 2511.18739 by Jiarong Liu, Kaixiang Yang, Shuanghua Yang, Yujue Zhou, Yupeng Song.

Figure 1: Problem-oriented Taxonomy of Time Series Anomaly Detection Metrics. This figure illustrates the proposed problem-oriented taxonomy of anomaly …
Figure 2: Concentrated vs. Distributed False Alarms under Batch-level Eval …
Figure 3: Distribution of metric scores under genuine detectors, random guessing, and oracle-based attacks. Metrics whose genuine and random score distributions …
Figure 4: Comprehensive comparison of 20+ time series anomaly detection metrics based on average effect size, AUC, genuine/random scores, and monotonicity.
Figure 5: Joint analysis of effect size and AUC across all metrics. Metrics located in the left region exhibit high discriminative ability, clearly separating genuine …
Original abstract

Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumptions. This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address, rather than their mathematical forms or output structures. We categorize over twenty commonly used metrics into six dimensions: 1) basic accuracy-driven evaluation; 2) timeliness-aware reward mechanisms; 3) tolerance to labeling imprecision; 4) penalties reflecting human-audit cost; 5) robustness against random or inflated scores; and 6) parameter-free comparability for cross-dataset benchmarking. Comprehensive experiments are conducted to examine metric behavior under genuine, random, and oracle detection scenarios. By comparing their resulting score distributions, we quantify each metric's discriminative ability -- its capability to distinguish meaningful detections from random noise. The results show that while most event-level metrics exhibit strong separability, several widely used metrics (e.g., NAB, Point-Adjust) demonstrate limited resistance to random-score inflation. These findings reveal that metric suitability must be inherently task-dependent and aligned with the operational objectives of IoT applications. The proposed framework offers a unified analytical perspective for understanding existing metrics and provides practical guidance for selecting or developing more context-aware, robust, and fair evaluation methodologies for time series anomaly detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a problem-oriented taxonomy that reinterprets over twenty time series anomaly detection metrics across six dimensions (basic accuracy, timeliness-aware rewards, tolerance to labeling imprecision, human-audit cost penalties, robustness to random/inflated scores, and parameter-free cross-dataset comparability). It reports experiments comparing score distributions under genuine, random, and oracle detection scenarios, claiming strong separability for most event-level metrics but limited resistance to random-score inflation for widely used ones such as NAB and Point-Adjust, and concludes that metric suitability is inherently task-dependent for IoT applications.

Significance. If the empirical findings on separability hold after addressing setup details, the work provides a useful unified analytical lens for metric selection in time series anomaly detection, moving beyond mathematical form to problem-specific challenges. The multi-scenario experimental design (genuine/random/oracle) is a constructive element that directly quantifies discriminative ability and offers practical guidance for more robust, context-aware evaluation in cyber-physical systems.

major comments (2)
  1. [Experiments] Experimental setup (described at high level in the abstract and results): the random detection scenario lacks specification of whether scores are drawn independently or preserve temporal autocorrelation, anomaly density, and score calibration typical of real detectors. If the former, the limited resistance observed for NAB and Point-Adjust may be an artifact of the synthetic construction rather than an intrinsic metric property, directly undermining the central claim that suitability is task-dependent.
  2. [Results] Results section: no dataset details, statistical tests, or exclusion criteria are reported for the score-distribution comparisons across the three scenarios. This leaves the reported separability differences potentially sensitive to unstated choices and weakens the robustness of the conclusion that most event-level metrics exhibit strong separability.
minor comments (1)
  1. [Taxonomy] The derivation or validation of the six dimensions could be clarified with a brief justification of completeness for IoT use cases.
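Major comment 1 turns on how the random scores are generated. A minimal sketch of the contrast it raises, assuming uniform independent scores on one side and AR(1) noise with coefficient 0.95 on the other (both are illustrative choices, not the paper's setup): at the same alarm density, autocorrelated scores produce far fewer, longer alarm runs, which event-level and segment-based metrics may treat very differently.

```python
import random

def iid_scores(n, rng):
    """Independent uniform scores, one per time step."""
    return [rng.random() for _ in range(n)]

def ar1_scores(n, rng, phi=0.95):
    """AR(1) noise: x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    x = 0.0
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def flag_top_fraction(scores, fraction):
    """Flag the highest-scoring `fraction` of points, so that different
    score generators are compared at the same alarm density."""
    k = max(1, int(round(fraction * len(scores))))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= cutoff else 0 for s in scores]

def count_runs(flags):
    """Number of contiguous runs of flagged points."""
    runs, prev = 0, 0
    for f in flags:
        if f == 1 and prev == 0:
            runs += 1
        prev = f
    return runs

rng = random.Random(0)
n, density = 10_000, 0.05                   # assumed length and alarm density
iid_flags = flag_top_fraction(iid_scores(n, rng), density)
ar1_flags = flag_top_fraction(ar1_scores(n, rng), density)

print("alarm runs, independent scores:", count_runs(iid_flags))
print("alarm runs, autocorrelated scores:", count_runs(ar1_flags))
```

Reporting metric behavior under both kinds of random baseline would show whether an observed inflation depends on the independence assumption.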

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our paper. The comments highlight areas where additional clarity can strengthen the presentation of our taxonomy and experimental results. We respond to each major comment below and indicate the changes we will make in the revised manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experimental setup (described at high level in the abstract and results): the random detection scenario lacks specification of whether scores are drawn independently or preserve temporal autocorrelation, anomaly density, and score calibration typical of real detectors. If the former, the limited resistance observed for NAB and Point-Adjust may be an artifact of the synthetic construction rather than an intrinsic metric property, directly undermining the central claim that suitability is task-dependent.

    Authors: We agree that more detailed specification of the random detection scenario is necessary to fully substantiate our claims. In the revised version of the manuscript, we will provide an explicit description of how the random scores are generated, including confirmation that they are drawn independently per time step while matching the anomaly density of the genuine scenario and using a uniform distribution for calibration. This controlled approach allows us to isolate the effect of random inflation on the metrics. We maintain that this does not render the findings an artifact, because the differential performance across metrics (strong separability for most event-level metrics versus limited for NAB and Point-Adjust) demonstrates inherent differences in their design, which aligns with our conclusion that suitability is task-dependent. The added details will help readers evaluate this aspect more thoroughly. revision: yes

  2. Referee: [Results] Results section: no dataset details, statistical tests, or exclusion criteria are reported for the score-distribution comparisons across the three scenarios. This leaves the reported separability differences potentially sensitive to unstated choices and weakens the robustness of the conclusion that most event-level metrics exhibit strong separability.

    Authors: We thank the referee for noting this gap in reporting. To address it, the revised manuscript will expand the Results section to include: (1) detailed information on the datasets employed, such as their origins, sizes, number of anomalies, and any preprocessing steps; (2) the statistical tests used to compare score distributions across the genuine, random, and oracle scenarios (e.g., appropriate non-parametric tests for separability); and (3) any exclusion criteria applied, such as for metrics that require tuning parameters or specific data conditions. These enhancements will increase the transparency and reproducibility of our experiments, thereby reinforcing the robustness of the separability conclusions. revision: yes
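The second response promises non-parametric tests without naming one. As one possible choice, a two-sided permutation test on the difference in mean metric score could look like the sketch below; the function name, the number of permutations, and the example scores are all hypothetical.

```python
import random

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means.
    Returns the share of label shuffles whose absolute mean difference
    is at least as large as the observed one (with add-one smoothing)."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical metric scores for genuine and random detectors.
genuine = [0.71, 0.68, 0.75, 0.80, 0.66]
random_scores = [0.12, 0.20, 0.09, 0.15, 0.18]
p_value = permutation_p_value(genuine, random_scores)
print("p-value:", p_value)
```

A permutation test makes no distributional assumption about metric scores, which suits bounded and often skewed metrics; a rank-based test such as Mann-Whitney U would be a common alternative.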

Circularity Check

0 steps flagged

No circularity; taxonomy and separability claims grounded in new experiments

Full rationale

The paper defines a six-dimension taxonomy by reinterpreting existing metrics according to evaluation challenges they address, then quantifies discriminative ability via direct comparison of score distributions under genuine, random, and oracle detection scenarios. These empirical comparisons constitute independent evidence rather than any reduction to fitted parameters, self-citations, or definitional equivalences. No load-bearing steps invoke prior author work as a uniqueness theorem or smuggle ansatzes; the framework remains self-contained against the described experimental benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that metrics can be usefully grouped by the evaluation problems they target and that the three experimental regimes distinguish meaningful from random performance.

axioms (1)
  • Domain assumption: Metrics can be meaningfully reinterpreted and grouped according to the specific evaluation challenges they address rather than their mathematical definitions.
    This is the foundational premise of the problem-oriented framework stated in the abstract.
invented entities (1)
  • Six-dimensional problem-oriented taxonomy (no independent evidence)
    Purpose: To provide a unified analytical perspective for understanding and selecting metrics.
    New categorization introduced by the authors; no independent evidence outside the paper is supplied.

pith-pipeline@v0.9.0 · 5557 in / 1243 out tokens · 41583 ms · 2026-05-17T06:53:41.964049+00:00 · methodology

