Masked Diffusion Modeling for Anomaly Detection

Lixing Zhang; Liyan Xie; Yuchen Liang

arxiv: 2605.30046 · v1 · pith:7MJJZIHEnew · submitted 2026-05-28 · 💻 cs.LG · cs.AI

Masked Diffusion Modeling for Anomaly Detection

Lixing Zhang , Yuchen Liang , Liyan Xie This is my paper

Pith reviewed 2026-06-29 09:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords anomaly detectionmasked diffusiontabular anomaly detectioncategorical datadiscrete sequencesreconstruction scorediffusion models

0 comments

The pith

Masked diffusion models detect anomalies by scoring the difficulty of reconstructing masked coordinates in nominal data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes MaskDiff-AD, which applies masked diffusion models to anomaly detection for categorical, mixed-type, and sequence data. The model is trained only on normal samples and uses the challenge of reconstructing randomly masked parts as an anomaly score. This avoids reverse sampling and works directly on discrete data. It includes theory on error types and shows top average performance across many tabular and text datasets compared to other methods.

Core claim

MaskDiff-AD is a forward-only method based on masked diffusion models trained only on nominal data. Anomaly scores come from the difficulty of reconstructing randomly masked coordinates, creating a content-sensitive score for discrete state spaces without reverse-time sampling. A non-parametric variant is provided along with theoretical guarantees on Type-I and Type-II errors under a fixed threshold. It achieves the best overall average rank on fourteen categorical and mixed-type tabular datasets and four text datasets, outperforming twelve tabular baselines.

What carries the argument

Masked diffusion model for scoring anomaly via reconstruction difficulty of masked coordinates.

If this is right

Outperforms all twelve tabular baseline methods on the fourteen datasets from ADBench and UADAD.
Applies to four text anomaly detection datasets from NLP-ADBench with competitive results.
Provides theoretical guarantees characterizing Type-I and Type-II errors.
Non-parametric variant available for use without parametric assumptions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such reconstruction-based scoring might generalize to other discrete data types like graphs or code if the masking strategy is adapted.
The forward-only design could reduce computational cost compared to methods requiring full diffusion sampling at test time.
Further validation on imbalanced or high-dimensional datasets would test the score's robustness beyond the reported experiments.

Load-bearing premise

That the difficulty of reconstructing randomly masked coordinates in a model trained only on nominal data yields a reliable, content-sensitive anomaly score that separates anomalies from normal samples across the tested data distributions.

What would settle it

Finding a dataset of categorical or mixed data where the reconstruction errors for masked coordinates do not differ significantly between nominal and anomalous samples.

Figures

Figures reproduced from arXiv: 2605.30046 by Lixing Zhang, Liyan Xie, Yuchen Liang.

**Figure 2.** Figure 2: Synthetic heatmaps of the expectation of non-parametric and parametric reconstruction [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Sensitivity of the ROC-AUC to the probe mask rate on the Vehicle Claims dataset. Each [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Additional sensitivity analysis of Parametric MaskDiff-AD to the probe mask rate [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

read the original abstract

Anomaly detection aims to identify samples that deviate from the nominal data distribution and is central to many safety-critical applications. However, developing effective anomaly detection methods for categorical, mixed-type, and discrete sequence data remains challenging and relatively underexplored. Masked diffusion models provide a natural way to model such data by learning to recover masked values from the remaining visible context. In this paper, we propose Masked Diffusion for Anomaly Detection (MaskDiff-AD), a forward-only method based on masked diffusion models trained only on nominal data. Given a test sample, MaskDiff-AD constructs anomaly scores from the difficulty of reconstructing randomly masked coordinates, yielding a content-sensitive score that operates directly on discrete state spaces while avoiding reverse-time sampling. We also develop a non-parametric variant of MaskDiff-AD and provide theoretical guarantees by characterizing Type-I and Type-II errors under a fixed detection threshold. Experiments on fourteen categorical and mixed-type tabular datasets from ADBench and UADAD, as well as four text anomaly detection datasets from NLP-ADBench, show that MaskDiff-AD achieves competitive performance against classical, diffusion-based, and recent tabular/text anomaly detection baselines. Notably, MaskDiff-AD achieves the best overall average rank, outperforming all twelve tabular baseline methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a forward-only anomaly score from masked reconstruction in diffusion models for categorical data and claims the best average rank on the benchmarks, but the abstract leaves the method details and theory unexamined.

read the letter

The main point is a new anomaly scoring procedure for categorical and mixed tabular data that trains a masked diffusion model on nominal samples only and then scores test points by how hard it is to reconstruct randomly masked coordinates. It skips reverse-time sampling and adds a non-parametric variant plus some Type-I and Type-II error bounds.

The construction is distinct from prior diffusion detectors and targets a setting where many existing methods do not apply directly. The benchmarks cover ADBench, UADAD, and NLP-ADBench, which are reasonable choices for this regime, and the claim of beating all twelve tabular baselines on average rank is concrete.

The idea of using reconstruction difficulty under masking as a content-sensitive score is plausible for discrete data. Training only on normal data and operating directly on discrete states avoids some common pitfalls.

The soft spots are clear from the abstract. No experimental details appear—no masking rates, no aggregation rule for the per-coordinate errors, no statistical tests, and no derivation of the error guarantees or their assumptions. Without those, it is difficult to judge whether the central assumption holds or whether the reported ranks reflect a reliable separation. The low soundness score in the reader's note matches what is visible.

This work is aimed at researchers who need anomaly detection on non-continuous features in safety-critical settings. A reader already working on diffusion models or tabular AD would get value from the forward-only formulation if the full paper supplies the missing specifications.

It deserves peer review so the experiments and theory can be checked in detail.

Referee Report

2 major / 2 minor

Summary. The paper proposes MaskDiff-AD, a forward-only anomaly detection method based on masked diffusion models trained exclusively on nominal data. Anomaly scores are computed from the reconstruction difficulty of randomly masked coordinates in test samples, operating directly on discrete spaces without reverse-time sampling. A non-parametric variant is introduced, along with theoretical characterization of Type-I and Type-II errors under a fixed threshold. Experiments across 14 categorical/mixed tabular datasets (ADBench, UADAD) and 4 text datasets (NLP-ADBench) report that MaskDiff-AD achieves the best overall average rank, outperforming 12 baselines including classical, diffusion-based, and recent tabular/text methods.

Significance. If the reported empirical results hold under proper statistical validation, the work provides a useful contribution to anomaly detection for discrete and mixed-type data by offering a content-sensitive scoring mechanism that avoids reverse diffusion sampling. The inclusion of theoretical error bounds and a non-parametric option strengthens the proposal relative to purely empirical diffusion baselines.

major comments (2)

[§4] §4 (theoretical analysis): the Type-I/II error characterization is stated under a fixed detection threshold, but the precise assumptions on the data distribution and masking process required for the bounds to hold are not enumerated, making it difficult to assess the scope of the guarantee relative to the empirical benchmarks.
[§5] §5 (experiments): while average ranks are reported across 18 datasets, no per-dataset statistical significance tests (e.g., paired t-tests or Wilcoxon with correction) or variance estimates across random seeds are provided, which is load-bearing for the central claim of outperforming all twelve baselines.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a one-sentence statement of the key modeling assumption (nominal-only training) to clarify the unsupervised setting.
[§3] Notation for the masking probability and diffusion schedule should be unified between the method description and the non-parametric variant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We appreciate the recognition of the method's contribution to discrete and mixed-type anomaly detection and address each major comment below.

read point-by-point responses

Referee: [§4] §4 (theoretical analysis): the Type-I/II error characterization is stated under a fixed detection threshold, but the precise assumptions on the data distribution and masking process required for the bounds to hold are not enumerated, making it difficult to assess the scope of the guarantee relative to the empirical benchmarks.

Authors: We agree that explicitly enumerating the assumptions would improve clarity. In the revised manuscript, we will insert a dedicated 'Assumptions' paragraph in §4 that lists the precise conditions under which the Type-I/II bounds hold: i.i.d. sampling from the nominal distribution, independent coordinate masking with fixed probability p, and convergence of the masked diffusion model to the true conditional distributions. This addition will directly relate the theoretical scope to the ADBench, UADAD, and NLP-ADBench empirical settings. revision: yes
Referee: [§5] §5 (experiments): while average ranks are reported across 18 datasets, no per-dataset statistical significance tests (e.g., paired t-tests or Wilcoxon with correction) or variance estimates across random seeds are provided, which is load-bearing for the central claim of outperforming all twelve baselines.

Authors: We acknowledge that statistical validation would strengthen the central empirical claim. In the revision we will augment §5 with (i) standard deviation estimates across five random seeds for all methods on the tabular datasets and (ii) per-dataset Wilcoxon signed-rank tests (Bonferroni-corrected) comparing MaskDiff-AD against the top three baselines, reported in an expanded Table 2 and a new appendix. Average rank will remain as the primary aggregate metric, supplemented by these tests. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces MaskDiff-AD as a forward-only masked diffusion approach for anomaly scoring on categorical/mixed/tabular and text data, with the score defined directly from per-coordinate reconstruction difficulty under random masking and a model trained solely on nominal samples. It further supplies Type-I/II error bounds under a fixed threshold. No load-bearing step reduces to a self-citation, a fitted parameter renamed as a prediction, or an ansatz imported from prior author work; the derivation chain is self-contained against the external ADBench/UADAD/NLP-ADBench benchmarks and does not rely on internal redefinitions that would force the reported performance.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; full model architecture, training objective, and hyperparameter choices are not visible, so the ledger is necessarily incomplete.

free parameters (1)

masking probability or diffusion schedule
The masked diffusion procedure requires at least one such choice to define the training and scoring process; value not stated in abstract.

axioms (1)

domain assumption A masked diffusion model trained solely on nominal data captures the distribution sufficiently to make reconstruction difficulty a valid anomaly signal.
This premise underpins both the scoring rule and the claimed separation of anomalies.

pith-pipeline@v0.9.1-grok · 5744 in / 1278 out tokens · 37881 ms · 2026-06-29T09:12:04.101326+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 5 canonical work pages · 1 internal anchor

[1]

Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

2009
[2]

A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

Lukas Ruff, Jacob R Kauffmann, Robert A Vandermeulen, Grégoire Montavon, Wojciech Samek, Marius Kloft, Thomas G Dietterich, and Klaus-Robert Müller. A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

2021
[3]

Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

2021
[4]

Sensor fault and patient anomaly detection and classification in medical wireless sensor networks

Osman Salem, Alexey Guerassimov, Ahmed Mehaoua, Anthony Marcus, and Borko Furht. Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. In2013 IEEE International Conference on Communications (ICC), pages 4373–4378. IEEE, 2013

2013
[5]

Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

Girik Pachauri and Sandeep Sharma. Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

2015
[6]

A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

Mohiuddin Ahmed, Abdun Naser Mahmood, and Md Rafiqul Islam. A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

2016
[7]

Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

Gian Antonio Susto, Matteo Terzi, and Alessandro Beghi. Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

2018
[8]

Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

Katherine Fraser, Samuel Homiller, Rashmish K Mishra, Bryan Ostdiek, and Matthew D Schwartz. Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

2022
[9]

Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems

Takehisa Yairi, Yoshinobu Kawahara, Ryohei Fujimaki, Yuichi Sato, and Kazuo Machida. Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems. In2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT’06), pages 8–pp. IEEE, 2006

2006
[10]

Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

Ayman Taha and Ali S Hadi. Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

2019
[11]

Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

Songqiao Han, Xiyang Hu, Hailiang Huang, Minqi Jiang, and Yue Zhao. Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

2022
[12]

Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Yuangang Li, Jiaqi Li, Zhuo Xiao, Tiankai Yang, Yi Nian, Xiyang Hu, and Yue Zhao. Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2025
[13]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015

2015
[14]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

2020
[15]

Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

2024
[16]

Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

2021
[17]

Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling

Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. InInternational Conference on Learning Representations (ICLR), 2025

2025
[18]

Dif- fusion beats autoregressive in data-constrained settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, and Deepak Pathak. Dif- fusion beats autoregressive in data-constrained settings. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025
[19]

Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, and Ferdinando Fioretto. Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

work page arXiv 2025
[20]

arXiv preprint arXiv:2407.13734 , year=

Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review.arXiv preprint arXiv:2407.13734, 2024

work page arXiv 2024
[21]

On diffusion model- ing for anomaly detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh, and Siamak Ravanbakhsh. On diffusion model- ing for anomaly detection. InThe Twelfth International Conference on Learning Representations, 2024

2024
[22]

Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

2000
[23]

Isolation forest

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. InProceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM ’08, page 413–422, USA, 2008. IEEE Computer Society. 13

2008
[24]

Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, and George H. Chen. ECOD: Unsu- pervised outlier detection using empirical cumulative distribution functions.IEEE Transactions on Knowledge and Data Engineering, 35(12):12181–12193, 2023

2023
[25]

Copod: Copula-based outlier detection

Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, and Xiyang Hu. Copod: Copula-based outlier detection. In2020 IEEE International Conference on Data Mining (ICDM), page 1118–1123. IEEE, November 2020

2020
[26]

Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

Ajay Chawda, Stefanie Grimm, and Marius Kloft. Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

work page arXiv 2022
[27]

MCM: Masked cell modeling for anomaly detection in tabular data

Jiaxin Yin, Yuanyuan Qiao, Zitang Zhou, Xiangchao Wang, and Jie Yang. MCM: Masked cell modeling for anomaly detection in tabular data. InThe Twelfth International Conference on Learning Representations, 2024

2024
[28]

Beyond individual input for deep anomaly detection on tabular data

Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, and Bich-Liên Doan. Beyond individual input for deep anomaly detection on tabular data. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 48097–48123. PMLR, 21–27 Jul 2024

2024
[29]

Anomaly detection for tabular data with internal contrastive learning

Tom Shenkar and Lior Wolf. Anomaly detection for tabular data with internal contrastive learning. InInternational Conference on Learning Representations, 2022

2022
[30]

Fast and reliable anomaly detection in categorical data

Leman Akoglu, Hanghang Tong, Jilles Vreeken, and Christos Faloutsos. Fast and reliable anomaly detection in categorical data. InProceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, page 415–424, New York, NY, USA,
[31]

Association for Computing Machinery
[32]

DRL: Decom- posed representation learning for tabular anomaly detection

Hangting Ye, He Zhao, Wei Fan, Mingyuan Zhou, Dan dan Guo, and Yi Chang. DRL: Decom- posed representation learning for tabular anomaly detection. InThe Thirteenth International Conference on Learning Representations, 2025

2025
[33]

Anomaly detection using autoencoders with nonlinear dimensionality reduction

Mayu Sakurada and Takehisa Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. InProceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis, pages 4–11, 2014

2014
[34]

Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

2015
[35]

Deep autoencoding gaussian mixture model for unsupervised anomaly detection

Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. InInternational Conference on Learning Representations, 2018

2018
[36]

Waldstein, Ursula Schmidt-Erfurth, and Georg Langs

Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. InInformation Processing in Medical Imaging, pages 146–157. Springer, 2017

2017
[37]

Ganomaly: Semi-supervised anomaly detection via adversarial training

Samet Akcay, Amir Atapour-Abarghouei, and Toby P Breckon. Ganomaly: Semi-supervised anomaly detection via adversarial training. InAsian conference on computer vision, pages 622–637. Springer, 2018. 14

2018
[38]

Same same but differnet: Semi- supervised defect detection with normalizing flows

Marco Rudolph, Bastian Wandt, and Bodo Rosenhahn. Same same but differnet: Semi- supervised defect detection with normalizing flows. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 1907–1916, 2021

1907
[39]

A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

Jing Liu, Zhenchao Ma, Zepu Wang, Chenxuanyin Zou, Jiayang Ren, Zehua Wang, Liang Song, Bo Hu, Yang Liu, and Victor Leung. A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

work page arXiv 2025
[40]

Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise

Julian Wyatt, Adam Leach, Sebastian M Schmon, and Chris G Willcocks. Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 650–656, 2022

2022
[41]

Anomaly detection with conditioned denoising diffusion models

Arian Mousakhan, Thomas Brox, and Jawad Tayyub. Anomaly detection with conditioned denoising diffusion models. InDAGM German Conference on Pattern Recognition, pages 181–195. Springer, 2024

2024
[42]

Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

Hui Zhang, Zheng Wang, Dan Zeng, Zuxuan Wu, and Yu-Gang Jiang. Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

2025
[43]

Simplified and generalized masked diffusion for discrete data

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[44]

Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

2019
[45]

Deep one-class classification

Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 4393–4402. PMLR...

2018
[46]

Classification-based anomaly detection for general data

Liron Bergman and Yedid Hoshen. Classification-based anomaly detection for general data. In International Conference on Learning Representations (ICLR), 2020

2020
[47]

DATE: Detecting anomalies in text via self-supervision of transformers

Andrei Manolache, Florin Brad, and Elena Burceanu. DATE: Detecting anomalies in text via self-supervision of transformers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors,Proceedings of the 2021 Conference of the North American Chapter of t...

2021
[48]

Few-shot anomaly detection in text with deviation learning

Anindya Sundar Das, Aravind Ajay, Sriparna Saha, and Monowar Bhuyan. Few-shot anomaly detection in text with deviation learning. InInternational Conference on Neural Information Processing, pages 425–438. Springer, 2023

2023
[49]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volu...

2019
[50]

New embedding models and api updates

OpenAI. New embedding models and api updates. https://openai.com/index/ new-embedding-models-and-api-updates/
[51]

Lof: identifying density-based local outliers

Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. InProceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000

2000
[52]

Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, and Xiangnan He. Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

2019
[53]

Aggarwal.Outlier Analysis

Charu C. Aggarwal.Outlier Analysis. Springer, 2 edition, 2017

2017
[54]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[55]

Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun

Adam Goodge, Bryan Hooi, See-Kiong Ng, and Wee Siong Ng. Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun. 2022. A Algorithmic Details In this section, we provide the detailed algorithms for the two variants mentioned in Section 3.2. Algorithm 2 su...

2022

[1] [1]

Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey.ACM computing surveys (CSUR), 41(3):1–58, 2009

2009

[2] [2]

A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

Lukas Ruff, Jacob R Kauffmann, Robert A Vandermeulen, Grégoire Montavon, Wojciech Samek, Marius Kloft, Thomas G Dietterich, and Klaus-Robert Müller. A unifying review of deep and shallow anomaly detection.Proceedings of the IEEE, 109(5):756–795, 2021

2021

[3] [3]

Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. Deep learning for anomaly detection: A review.ACM Computing Surveys (CSUR), 54(2):1–38, 2021

2021

[4] [4]

Sensor fault and patient anomaly detection and classification in medical wireless sensor networks

Osman Salem, Alexey Guerassimov, Ahmed Mehaoua, Anthony Marcus, and Borko Furht. Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. In2013 IEEE International Conference on Communications (ICC), pages 4373–4378. IEEE, 2013

2013

[5] [5]

Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

Girik Pachauri and Sandeep Sharma. Anomaly detection in medical wireless sensor networks using machine learning algorithms.Procedia Computer Science, 70:325–333, 2015

2015

[6] [6]

A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

Mohiuddin Ahmed, Abdun Naser Mahmood, and Md Rafiqul Islam. A survey of anomaly detection techniques in financial domain.Future Generation Computer Systems, 55:278–288, 2016

2016

[7] [7]

Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

Gian Antonio Susto, Matteo Terzi, and Alessandro Beghi. Anomaly detection approaches for semiconductor manufacturing.Procedia Manufacturing, 11:2018–2024, 2017

2018

[8] [8]

Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

Katherine Fraser, Samuel Homiller, Rashmish K Mishra, Bryan Ostdiek, and Matthew D Schwartz. Challenges for unsupervised anomaly detection in particle physics.Journal of High Energy Physics, 2022(3):66, 2022

2022

[9] [9]

Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems

Takehisa Yairi, Yoshinobu Kawahara, Ryohei Fujimaki, Yuichi Sato, and Kazuo Machida. Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis 12 for space systems. In2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT’06), pages 8–pp. IEEE, 2006

2006

[10] [10]

Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

Ayman Taha and Ali S Hadi. Anomaly detection methods for categorical data: A review.ACM Computing Surveys (CSUR), 52(2):1–35, 2019

2019

[11] [11]

Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

Songqiao Han, Xiyang Hu, Hailiang Huang, Minqi Jiang, and Yue Zhao. Adbench: Anomaly detection benchmark.Advances in Neural Information Processing Systems, 35:32142–32159, 2022

2022

[12] [12]

Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

Yuangang Li, Jiaqi Li, Zhuo Xiao, Tiankai Yang, Yi Nian, Xiyang Hu, and Yue Zhao. Nlp- adbench: Nlp anomaly detection benchmark.Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

2025

[13] [13]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. pmlr, 2015

2015

[14] [14]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

2020

[15] [15]

Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

Subham S Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models.Advances in Neural Information Processing Systems, 37:130136–130184, 2024

2024

[16] [16]

Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces.Advances in Neural Information Processing Systems, 34:17981–17993, 2021

2021

[17] [17]

Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling

Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. InInternational Conference on Learning Representations (ICLR), 2025

2025

[18] [18]

Dif- fusion beats autoregressive in data-constrained settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh, Katerina Fragkiadaki, and Deepak Pathak. Dif- fusion beats autoregressive in data-constrained settings. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025

[19] [19]

Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, and Ferdinando Fioretto. Constrained discrete diffusion.arXiv preprint arXiv:2503.09790, 2025

work page arXiv 2025

[20] [20]

arXiv preprint arXiv:2407.13734 , year=

Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review.arXiv preprint arXiv:2407.13734, 2024

work page arXiv 2024

[21] [21]

On diffusion model- ing for anomaly detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh, and Siamak Ravanbakhsh. On diffusion model- ing for anomaly detection. InThe Twelfth International Conference on Learning Representations, 2024

2024

[22] [22]

Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets.SIGMOD Record, 29(2):427–438, June 2000

2000

[23] [23]

Isolation forest

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. InProceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM ’08, page 413–422, USA, 2008. IEEE Computer Society. 13

2008

[24] [24]

Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, and George H. Chen. ECOD: Unsu- pervised outlier detection using empirical cumulative distribution functions.IEEE Transactions on Knowledge and Data Engineering, 35(12):12181–12193, 2023

2023

[25] [25]

Copod: Copula-based outlier detection

Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, and Xiyang Hu. Copod: Copula-based outlier detection. In2020 IEEE International Conference on Data Mining (ICDM), page 1118–1123. IEEE, November 2020

2020

[26] [26]

Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

Ajay Chawda, Stefanie Grimm, and Marius Kloft. Unsupervised anomaly detection for auditing data and impact of categorical encodings.arXiv preprint arXiv:2210.14056, 2022

work page arXiv 2022

[27] [27]

MCM: Masked cell modeling for anomaly detection in tabular data

Jiaxin Yin, Yuanyuan Qiao, Zitang Zhou, Xiangchao Wang, and Jie Yang. MCM: Masked cell modeling for anomaly detection in tabular data. InThe Twelfth International Conference on Learning Representations, 2024

2024

[28] [28]

Beyond individual input for deep anomaly detection on tabular data

Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, and Bich-Liên Doan. Beyond individual input for deep anomaly detection on tabular data. InProceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 48097–48123. PMLR, 21–27 Jul 2024

2024

[29] [29]

Anomaly detection for tabular data with internal contrastive learning

Tom Shenkar and Lior Wolf. Anomaly detection for tabular data with internal contrastive learning. InInternational Conference on Learning Representations, 2022

2022

[30] [30]

Fast and reliable anomaly detection in categorical data

Leman Akoglu, Hanghang Tong, Jilles Vreeken, and Christos Faloutsos. Fast and reliable anomaly detection in categorical data. InProceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, page 415–424, New York, NY, USA,

[31] [31]

Association for Computing Machinery

[32] [32]

DRL: Decom- posed representation learning for tabular anomaly detection

Hangting Ye, He Zhao, Wei Fan, Mingyuan Zhou, Dan dan Guo, and Yi Chang. DRL: Decom- posed representation learning for tabular anomaly detection. InThe Thirteenth International Conference on Learning Representations, 2025

2025

[33] [33]

Anomaly detection using autoencoders with nonlinear dimensionality reduction

Mayu Sakurada and Takehisa Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. InProceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis, pages 4–11, 2014

2014

[34] [34]

Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using recon- struction probability.Special lecture on IE, 2(1):1–18, 2015

2015

[35] [35]

Deep autoencoding gaussian mixture model for unsupervised anomaly detection

Bo Zong, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. InInternational Conference on Learning Representations, 2018

2018

[36] [36]

Waldstein, Ursula Schmidt-Erfurth, and Georg Langs

Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. InInformation Processing in Medical Imaging, pages 146–157. Springer, 2017

2017

[37] [37]

Ganomaly: Semi-supervised anomaly detection via adversarial training

Samet Akcay, Amir Atapour-Abarghouei, and Toby P Breckon. Ganomaly: Semi-supervised anomaly detection via adversarial training. InAsian conference on computer vision, pages 622–637. Springer, 2018. 14

2018

[38] [38]

Same same but differnet: Semi- supervised defect detection with normalizing flows

Marco Rudolph, Bastian Wandt, and Bodo Rosenhahn. Same same but differnet: Semi- supervised defect detection with normalizing flows. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 1907–1916, 2021

1907

[39] [39]

A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

Jing Liu, Zhenchao Ma, Zepu Wang, Chenxuanyin Zou, Jiayang Ren, Zehua Wang, Liang Song, Bo Hu, Yang Liu, and Victor Leung. A survey on diffusion models for anomaly detection.arXiv preprint arXiv:2501.11430, 2025

work page arXiv 2025

[40] [40]

Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise

Julian Wyatt, Adam Leach, Sebastian M Schmon, and Chris G Willcocks. Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 650–656, 2022

2022

[41] [41]

Anomaly detection with conditioned denoising diffusion models

Arian Mousakhan, Thomas Brox, and Jawad Tayyub. Anomaly detection with conditioned denoising diffusion models. InDAGM German Conference on Pattern Recognition, pages 181–195. Springer, 2024

2024

[42] [42]

Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

Hui Zhang, Zheng Wang, Dan Zeng, Zuxuan Wu, and Yu-Gang Jiang. Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection.IEEE transactions on pattern analysis and machine intelligence, 2025

2025

[43] [43]

Simplified and generalized masked diffusion for discrete data

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024

[44] [44]

Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

2019

[45] [45]

Deep one-class classification

Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 4393–4402. PMLR...

2018

[46] [46]

Classification-based anomaly detection for general data

Liron Bergman and Yedid Hoshen. Classification-based anomaly detection for general data. In International Conference on Learning Representations (ICLR), 2020

2020

[47] [47]

DATE: Detecting anomalies in text via self-supervision of transformers

Andrei Manolache, Florin Brad, and Elena Burceanu. DATE: Detecting anomalies in text via self-supervision of transformers. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors,Proceedings of the 2021 Conference of the North American Chapter of t...

2021

[48] [48]

Few-shot anomaly detection in text with deviation learning

Anindya Sundar Das, Aravind Ajay, Sriparna Saha, and Monowar Bhuyan. Few-shot anomaly detection in text with deviation learning. InInternational Conference on Neural Information Processing, pages 425–438. Springer, 2023

2023

[49] [49]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Jill Burstein, Christy Doran, and Thamar Solorio, editors,Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volu...

2019

[50] [50]

New embedding models and api updates

OpenAI. New embedding models and api updates. https://openai.com/index/ new-embedding-models-and-api-updates/

[51] [51]

Lof: identifying density-based local outliers

Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. InProceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000

2000

[52] [52]

Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, and Xiangnan He. Generative adversarial active learning for unsupervised outlier detection.IEEE Transactions on Knowledge and Data Engineering, 32(8):1517–1528, 2019

2019

[53] [53]

Aggarwal.Outlier Analysis

Charu C. Aggarwal.Outlier Analysis. Springer, 2 edition, 2017

2017

[54] [54]

Auto-Encoding Variational Bayes

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes.arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[55] [55]

Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun

Adam Goodge, Bryan Hooi, See-Kiong Ng, and Wee Siong Ng. Lunar: Unifying local outlier detection methods via graph neural networks.Proceedings of the AAAI Conference on Artificial Intelligence, 36(6):6737–6745, Jun. 2022. A Algorithmic Details In this section, we provide the detailed algorithms for the two variants mentioned in Section 3.2. Algorithm 2 su...

2022