pith. sign in

arxiv: 2605.30046 · v1 · pith:7MJJZIHEnew · submitted 2026-05-28 · 💻 cs.LG · cs.AI

Masked Diffusion Modeling for Anomaly Detection

Pith reviewed 2026-06-29 09:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords anomaly detectionmasked diffusiontabular anomaly detectioncategorical datadiscrete sequencesreconstruction scorediffusion models
0
0 comments X

The pith

Masked diffusion models detect anomalies by scoring the difficulty of reconstructing masked coordinates in nominal data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes MaskDiff-AD, which applies masked diffusion models to anomaly detection for categorical, mixed-type, and sequence data. The model is trained only on normal samples and uses the challenge of reconstructing randomly masked parts as an anomaly score. This avoids reverse sampling and works directly on discrete data. It includes theory on error types and shows top average performance across many tabular and text datasets compared to other methods.

Core claim

MaskDiff-AD is a forward-only method based on masked diffusion models trained only on nominal data. Anomaly scores come from the difficulty of reconstructing randomly masked coordinates, creating a content-sensitive score for discrete state spaces without reverse-time sampling. A non-parametric variant is provided along with theoretical guarantees on Type-I and Type-II errors under a fixed threshold. It achieves the best overall average rank on fourteen categorical and mixed-type tabular datasets and four text datasets, outperforming twelve tabular baselines.

What carries the argument

Masked diffusion model for scoring anomaly via reconstruction difficulty of masked coordinates.

If this is right

  • Outperforms all twelve tabular baseline methods on the fourteen datasets from ADBench and UADAD.
  • Applies to four text anomaly detection datasets from NLP-ADBench with competitive results.
  • Provides theoretical guarantees characterizing Type-I and Type-II errors.
  • Non-parametric variant available for use without parametric assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such reconstruction-based scoring might generalize to other discrete data types like graphs or code if the masking strategy is adapted.
  • The forward-only design could reduce computational cost compared to methods requiring full diffusion sampling at test time.
  • Further validation on imbalanced or high-dimensional datasets would test the score's robustness beyond the reported experiments.

Load-bearing premise

That the difficulty of reconstructing randomly masked coordinates in a model trained only on nominal data yields a reliable, content-sensitive anomaly score that separates anomalies from normal samples across the tested data distributions.

What would settle it

Finding a dataset of categorical or mixed data where the reconstruction errors for masked coordinates do not differ significantly between nominal and anomalous samples.

Figures

Figures reproduced from arXiv: 2605.30046 by Lixing Zhang, Liyan Xie, Yuchen Liang.

Figure 1
Figure 1. Figure 1: Overview of MaskDiff-AD. We first generate masked probe views of a test sample at [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Synthetic heatmaps of the expectation of non-parametric and parametric reconstruction [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Sensitivity of the ROC-AUC to the probe mask rate on the Vehicle Claims dataset. Each [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Additional sensitivity analysis of Parametric MaskDiff-AD to the probe mask rate [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗
read the original abstract

Anomaly detection aims to identify samples that deviate from the nominal data distribution and is central to many safety-critical applications. However, developing effective anomaly detection methods for categorical, mixed-type, and discrete sequence data remains challenging and relatively underexplored. Masked diffusion models provide a natural way to model such data by learning to recover masked values from the remaining visible context. In this paper, we propose Masked Diffusion for Anomaly Detection (MaskDiff-AD), a forward-only method based on masked diffusion models trained only on nominal data. Given a test sample, MaskDiff-AD constructs anomaly scores from the difficulty of reconstructing randomly masked coordinates, yielding a content-sensitive score that operates directly on discrete state spaces while avoiding reverse-time sampling. We also develop a non-parametric variant of MaskDiff-AD and provide theoretical guarantees by characterizing Type-I and Type-II errors under a fixed detection threshold. Experiments on fourteen categorical and mixed-type tabular datasets from ADBench and UADAD, as well as four text anomaly detection datasets from NLP-ADBench, show that MaskDiff-AD achieves competitive performance against classical, diffusion-based, and recent tabular/text anomaly detection baselines. Notably, MaskDiff-AD achieves the best overall average rank, outperforming all twelve tabular baseline methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MaskDiff-AD, a forward-only anomaly detection method based on masked diffusion models trained exclusively on nominal data. Anomaly scores are computed from the reconstruction difficulty of randomly masked coordinates in test samples, operating directly on discrete spaces without reverse-time sampling. A non-parametric variant is introduced, along with theoretical characterization of Type-I and Type-II errors under a fixed threshold. Experiments across 14 categorical/mixed tabular datasets (ADBench, UADAD) and 4 text datasets (NLP-ADBench) report that MaskDiff-AD achieves the best overall average rank, outperforming 12 baselines including classical, diffusion-based, and recent tabular/text methods.

Significance. If the reported empirical results hold under proper statistical validation, the work provides a useful contribution to anomaly detection for discrete and mixed-type data by offering a content-sensitive scoring mechanism that avoids reverse diffusion sampling. The inclusion of theoretical error bounds and a non-parametric option strengthens the proposal relative to purely empirical diffusion baselines.

major comments (2)
  1. [§4] §4 (theoretical analysis): the Type-I/II error characterization is stated under a fixed detection threshold, but the precise assumptions on the data distribution and masking process required for the bounds to hold are not enumerated, making it difficult to assess the scope of the guarantee relative to the empirical benchmarks.
  2. [§5] §5 (experiments): while average ranks are reported across 18 datasets, no per-dataset statistical significance tests (e.g., paired t-tests or Wilcoxon with correction) or variance estimates across random seeds are provided, which is load-bearing for the central claim of outperforming all twelve baselines.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a one-sentence statement of the key modeling assumption (nominal-only training) to clarify the unsupervised setting.
  2. [§3] Notation for the masking probability and diffusion schedule should be unified between the method description and the non-parametric variant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We appreciate the recognition of the method's contribution to discrete and mixed-type anomaly detection and address each major comment below.

read point-by-point responses
  1. Referee: [§4] §4 (theoretical analysis): the Type-I/II error characterization is stated under a fixed detection threshold, but the precise assumptions on the data distribution and masking process required for the bounds to hold are not enumerated, making it difficult to assess the scope of the guarantee relative to the empirical benchmarks.

    Authors: We agree that explicitly enumerating the assumptions would improve clarity. In the revised manuscript, we will insert a dedicated 'Assumptions' paragraph in §4 that lists the precise conditions under which the Type-I/II bounds hold: i.i.d. sampling from the nominal distribution, independent coordinate masking with fixed probability p, and convergence of the masked diffusion model to the true conditional distributions. This addition will directly relate the theoretical scope to the ADBench, UADAD, and NLP-ADBench empirical settings. revision: yes

  2. Referee: [§5] §5 (experiments): while average ranks are reported across 18 datasets, no per-dataset statistical significance tests (e.g., paired t-tests or Wilcoxon with correction) or variance estimates across random seeds are provided, which is load-bearing for the central claim of outperforming all twelve baselines.

    Authors: We acknowledge that statistical validation would strengthen the central empirical claim. In the revision we will augment §5 with (i) standard deviation estimates across five random seeds for all methods on the tabular datasets and (ii) per-dataset Wilcoxon signed-rank tests (Bonferroni-corrected) comparing MaskDiff-AD against the top three baselines, reported in an expanded Table 2 and a new appendix. Average rank will remain as the primary aggregate metric, supplemented by these tests. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces MaskDiff-AD as a forward-only masked diffusion approach for anomaly scoring on categorical/mixed/tabular and text data, with the score defined directly from per-coordinate reconstruction difficulty under random masking and a model trained solely on nominal samples. It further supplies Type-I/II error bounds under a fixed threshold. No load-bearing step reduces to a self-citation, a fitted parameter renamed as a prediction, or an ansatz imported from prior author work; the derivation chain is self-contained against the external ADBench/UADAD/NLP-ADBench benchmarks and does not rely on internal redefinitions that would force the reported performance.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; full model architecture, training objective, and hyperparameter choices are not visible, so the ledger is necessarily incomplete.

free parameters (1)
  • masking probability or diffusion schedule
    The masked diffusion procedure requires at least one such choice to define the training and scoring process; value not stated in abstract.
axioms (1)
  • domain assumption A masked diffusion model trained solely on nominal data captures the distribution sufficiently to make reconstruction difficulty a valid anomaly signal.
    This premise underpins both the scoring rule and the claimed separation of anomalies.

pith-pipeline@v0.9.1-grok · 5744 in / 1278 out tokens · 37881 ms · 2026-06-29T09:12:04.101326+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

55 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

    Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

  2. [2]

    A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

    Lukas Ruff, Jacob R Kauffmann, Robert A Vandermeulen, Grégoire Montavon, Wojciech Samek, Marius Kloft, Thomas G Dietterich, and Klaus-Robert Müller. A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

  3. [3]

    Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

    Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

  4. [4]

    Sensor fault and patient anomaly detection and classification in medical wireless sensor networks

    Osman Salem, Alexey Guerassimov, Ahmed Mehaoua, Anthony Marcus, and Borko Furht. Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. In2013 IEEE International Conference on Communications (ICC), pages 4373–4378. IEEE, 2013

  5. [5]

    Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

    Girik Pachauri and Sandeep Sharma. Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

  6. [6]

    A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

    Mohiuddin Ahmed, Abdun Naser Mahmood, and Md Rafiqul Islam. A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

  7. [7]

    Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

    Gian Antonio Susto, Matteo Terzi, and Alessandro Beghi. Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

  8. [8]

    Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

    Katherine Fraser, Samuel Homiller, Rashmish K Mishra, Bryan Ostdiek, and Matthew D Schwartz. Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

  9. [9]

    Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems

    Takehisa Yairi, Yoshinobu Kawahara, Ryohei Fujimaki, Yuichi Sato, and Kazuo Machida. Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems. In2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT’06), pages 8–pp. IEEE, 2006

  10. [10]

    Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

    Ayman Taha and Ali S Hadi. Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

  11. [11]

    Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

    Songqiao Han, Xiyang Hu, Hailiang Huang, Minqi Jiang, and Yue Zhao. Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

  12. [12]

    Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

    Yuangang Li, Jiaqi Li, Zhuo Xiao, Tiankai Yang, Yi Nian, Xiyang Hu, and Yue Zhao. Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

  13. [13]

    Deep unsuper- vised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015

  14. [14]

    Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  15. [15]

    Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

    Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

  16. [16]

    Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

    Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

  17. [17]

    Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling

    Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. InInternational Conference on Learning Representations (ICLR), 2025

  18. [18]

    Dif- fusion beats autoregressive in data-constrained settings

    Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, and Deepak Pathak. Dif- fusion beats autoregressive in data-constrained settings. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  19. [19]

    Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

    Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, and Ferdinando Fioretto. Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

  20. [20]

    arXiv preprint arXiv:2407.13734 , year=

    Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review.arXiv preprint arXiv:2407.13734, 2024

  21. [21]

    On diffusion model- ing for anomaly detection

    Victor Livernoche, Vineet Jain, Yashar Hezaveh, and Siamak Ravanbakhsh. On diffusion model- ing for anomaly detection. InThe Twelfth International Conference on Learning Representations, 2024

  22. [22]

    Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

    Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

  23. [23]

    Isolation forest

    Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. InProceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM ’08, page 413–422, USA, 2008. IEEE Computer Society. 13

  24. [24]

    Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, and George H. Chen. ECOD: Unsu- pervised outlier detection using empirical cumulative distribution functions.IEEE Transactions on Knowledge and Data Engineering, 35(12):12181–12193, 2023

  25. [25]

    Copod: Copula-based outlier detection

    Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, and Xiyang Hu. Copod: Copula-based outlier detection. In2020 IEEE International Conference on Data Mining (ICDM), page 1118–1123. IEEE, November 2020

  26. [26]

    Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

    Ajay Chawda, Stefanie Grimm, and Marius Kloft. Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

  27. [27]

    MCM: Masked cell modeling for anomaly detection in tabular data

    Jiaxin Yin, Yuanyuan Qiao, Zitang Zhou, Xiangchao Wang, and Jie Yang. MCM: Masked cell modeling for anomaly detection in tabular data. InThe Twelfth International Conference on Learning Representations, 2024

  28. [28]

    Beyond individual input for deep anomaly detection on tabular data

    Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, and Bich-Liên Doan. Beyond individual input for deep anomaly detection on tabular data. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 48097–48123. PMLR, 21–27 Jul 2024

  29. [29]

    Anomaly detection for tabular data with internal contrastive learning

    Tom Shenkar and Lior Wolf. Anomaly detection for tabular data with internal contrastive learning. InInternational Conference on Learning Representations, 2022

  30. [30]

    Fast and reliable anomaly detection in categorical data

    Leman Akoglu, Hanghang Tong, Jilles Vreeken, and Christos Faloutsos. Fast and reliable anomaly detection in categorical data. InProceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, page 415–424, New York, NY, USA,

  31. [31]

    Association for Computing Machinery

  32. [32]

    DRL: Decom- posed representation learning for tabular anomaly detection

    Hangting Ye, He Zhao, Wei Fan, Mingyuan Zhou, Dan dan Guo, and Yi Chang. DRL: Decom- posed representation learning for tabular anomaly detection. InThe Thirteenth International Conference on Learning Representations, 2025

  33. [33]

    Anomaly detection using autoencoders with nonlinear dimensionality reduction

    Mayu Sakurada and Takehisa Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. InProceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis, pages 4–11, 2014

  34. [34]

    Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

    Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

  35. [35]

    Deep autoencoding gaussian mixture model for unsupervised anomaly detection

    Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. InInternational Conference on Learning Representations, 2018

  36. [36]

    Waldstein, Ursula Schmidt-Erfurth, and Georg Langs

    Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. InInformation Processing in Medical Imaging, pages 146–157. Springer, 2017

  37. [37]

    Ganomaly: Semi-supervised anomaly detection via adversarial training

    Samet Akcay, Amir Atapour-Abarghouei, and Toby P Breckon. Ganomaly: Semi-supervised anomaly detection via adversarial training. InAsian conference on computer vision, pages 622–637. Springer, 2018. 14

  38. [38]

    Same same but differnet: Semi- supervised defect detection with normalizing flows

    Marco Rudolph, Bastian Wandt, and Bodo Rosenhahn. Same same but differnet: Semi- supervised defect detection with normalizing flows. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 1907–1916, 2021

  39. [39]

    A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

    Jing Liu, Zhenchao Ma, Zepu Wang, Chenxuanyin Zou, Jiayang Ren, Zehua Wang, Liang Song, Bo Hu, Yang Liu, and Victor Leung. A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

  40. [40]

    Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise

    Julian Wyatt, Adam Leach, Sebastian M Schmon, and Chris G Willcocks. Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 650–656, 2022

  41. [41]

    Anomaly detection with conditioned denoising diffusion models

    Arian Mousakhan, Thomas Brox, and Jawad Tayyub. Anomaly detection with conditioned denoising diffusion models. InDAGM German Conference on Pattern Recognition, pages 181–195. Springer, 2024

  42. [42]

    Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

    Hui Zhang, Zheng Wang, Dan Zeng, Zuxuan Wu, and Yu-Gang Jiang. Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

  43. [43]

    Simplified and generalized masked diffusion for discrete data

    Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

  44. [44]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

  45. [45]

    Deep one-class classification

    Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 4393–4402. PMLR...

  46. [46]

    Classification-based anomaly detection for general data

    Liron Bergman and Yedid Hoshen. Classification-based anomaly detection for general data. In International Conference on Learning Representations (ICLR), 2020

  47. [47]

    DATE: Detecting anomalies in text via self-supervision of transformers

    Andrei Manolache, Florin Brad, and Elena Burceanu. DATE: Detecting anomalies in text via self-supervision of transformers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors,Proceedings of the 2021 Conference of the North American Chapter of t...

  48. [48]

    Few-shot anomaly detection in text with deviation learning

    Anindya Sundar Das, Aravind Ajay, Sriparna Saha, and Monowar Bhuyan. Few-shot anomaly detection in text with deviation learning. InInternational Conference on Neural Information Processing, pages 425–438. Springer, 2023

  49. [49]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volu...

  50. [50]

    New embedding models and api updates

    OpenAI. New embedding models and api updates. https://openai.com/index/ new-embedding-models-and-api-updates/

  51. [51]

    Lof: identifying density-based local outliers

    Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. InProceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000

  52. [52]

    Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

    Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, and Xiangnan He. Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

  53. [53]

    Aggarwal.Outlier Analysis

    Charu C. Aggarwal.Outlier Analysis. Springer, 2 edition, 2017

  54. [54]

    Auto-Encoding Variational Bayes

    Diederik P. Kingma and Max Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

  55. [55]

    Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun

    Adam Goodge, Bryan Hooi, See-Kiong Ng, and Wee Siong Ng. Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun. 2022. A Algorithmic Details In this section, we provide the detailed algorithms for the two variants mentioned in Section 3.2. Algorithm 2 su...