Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces

Bo Chen; Hongwei Liu; Weijiang Lv; Wenchao Chen; Yaoxuan Feng; Yubiao Wang; Yuxin Li; Zixuan Zhao

arxiv: 2605.24402 · v1 · pith:5K5X65FBnew · submitted 2026-05-23 · 💻 cs.CV

Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces

Yaoxuan Feng , Yuxin Li , Weijiang Lv , Zixuan Zhao , Yubiao Wang , Wenchao Chen , Bo Chen , Hongwei Liu This is my paper

Pith reviewed 2026-06-30 13:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords anomaly detectiondiffusion modelprototypesmulti-classunsupervised learningscalabilitycomputer vision

0 comments

The pith

A diffusion model conditioned on dual local and global prototypes enables scalable multi-class anomaly detection across growing category counts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the degradation in multi-class unsupervised anomaly detection as the number of categories increases due to complex normal distributions. It proposes DPDiff-AD which uses local prototypes to capture fine-grained patterns through nearest-prototype aggregation and global prototypes to regulate feature geometry via optimal transport. These define a structured normality space that is refined by a diffusion process with prototype-aware attention. Experiments show improved AUROC on large datasets and stable performance with more categories. This matters because it allows unified models for diverse product categories without retraining per category.

Core claim

DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. This space is refined through diffusion-based reconstruction conditioned on both via prototype-aware attention, achieving precise normality modeling and preserving structured separability as category cardinality grows.

What carries the argument

Dual prototypes (local via nearest-prototype aggregation and global via optimal transport) conditioned into a diffusion model via prototype-aware attention to define and refine a structured normality space.

If this is right

Improves image-level AUROC by 5.3 points and pixel-level by 2.9 points on a 160-category dataset over prior SOTA.
Maintains stable performance as the number of categories increases.
Enables precise normality modeling for heterogeneous distributions across diverse categories.
Supports scalable anomaly discrimination in large category spaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar dual-prototype conditioning could extend to other generative models beyond diffusion for anomaly tasks.
The approach might apply to domains like medical imaging where category diversity is high.
Testing on even larger category sets could reveal the limits of the scalability claim.

Load-bearing premise

The dual prototypes and prototype-aware attention together preserve separability of normal and anomalous distributions as the number of categories grows.

What would settle it

Performance degradation on a dataset with significantly more than 160 categories, or failure of the local and global prototypes to complement each other in maintaining AUROC stability.

read the original abstract

Multi-class anomaly detection aims to build unified models across diverse product categories. However, as the number of categories grows, its performance often degrades due to increasingly complex and heterogeneous normal distributions. To address this challenge, we propose DPDiff-AD, a Dual Prototype-conditioned Diffusion model for large-scale multi-class Anomaly Detection. DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. Together, these dual-scale representations define a structured normality space. This space is refined through diffusion-based reconstruction conditioned on both local and global prototypes via prototype-aware attention. By jointly leveraging dual prototypes during generation, DPDiff-AD achieves precise normality modeling, preserves structured separability as category cardinality grows, and enables scalable anomaly discrimination. Extensive experiments across five benchmarks demonstrate the effectiveness and scalability of DPDiff-AD. On the 160-category large-scale dataset, it improves image- and pixel-level AUROC by 5.3 and 2.9 points over the previous state-of-the-art method Dinomaly+, while maintaining stable performance as category cardinality increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DPDiff-AD combines local nearest-prototype aggregation with global OT regularization inside a diffusion model via prototype-aware attention, and the 5.3/2.9 AUROC gains on 160 categories are the concrete result worth checking.

read the letter

The paper introduces DPDiff-AD, which conditions a diffusion reverse process on both local prototypes (built by nearest-prototype aggregation of fine-grained patterns) and global prototypes (shaped by optimal transport regularization), then uses prototype-aware attention to guide reconstruction for anomaly scoring across many categories.

It does a solid job naming the practical issue—performance drop as category count rises because normal distributions become heterogeneous—and shows measurable improvement over Dinomaly+ on the large-scale 160-category set while claiming stability as cardinality increases. The dual-scale construction is a distinct modeling choice not directly in the referenced prior work.

The soft spot is that the abstract supplies no ablations, derivations, or controls that isolate whether the attention mechanism actually routes the two prototype sets without cross-category interference or mode collapse. The reported gains could be driven mainly by the diffusion backbone or training details rather than the dual conditioning, and the lack of error bars or fuller dataset descriptions leaves the stability claim under-anchored.

This is for people working on industrial multi-class anomaly detection who need models that hold up when the number of product types reaches the dozens or low hundreds. A reader in that niche would get value from the architecture even if the experiments need tightening.

I would send it to peer review. The idea is specific enough and the numbers are concrete enough to merit referee time, though the mechanism verification would be the main point to press.

Referee Report

2 major / 2 minor

Summary. The paper presents DPDiff-AD, a Dual Prototype-conditioned Diffusion model for scalable multi-class unsupervised anomaly detection. It models heterogeneous normal distributions using local prototypes (via nearest-prototype aggregation) and global prototypes (via optimal transport regularization), which are then used to condition a diffusion model through prototype-aware attention. The method claims to achieve improved performance on large category spaces, with specific gains of 5.3 and 2.9 AUROC points on image- and pixel-level metrics over Dinomaly+ on a 160-category dataset, while maintaining stability as the number of categories increases. Experiments are reported across five benchmarks.

Significance. If the central claims hold, this work would be significant for advancing multi-class anomaly detection in settings with large and growing category spaces, such as industrial inspection. By addressing the degradation in performance with increasing category cardinality through a structured normality space defined by dual prototypes, it offers a potential solution to a practical scalability issue. The integration of diffusion models with prototype conditioning is a promising direction, and the reported empirical gains on a large-scale dataset highlight its potential impact if the underlying mechanism is robustly validated.

major comments (2)

[Abstract] Abstract: The scalability to 160 categories is asserted based on the dual-prototype conditioning preserving separability via prototype-aware attention, but the abstract provides no derivation, equation, or ablation demonstrating that the attention mechanism integrates local and global prototypes without introducing cross-category interference or degrading separability; this is load-bearing for the claim that performance remains stable as cardinality grows.
[Experiments] Experiments section: The reported 5.3 and 2.9 point AUROC improvements on the 160-category dataset lack accompanying error bars, details on the dataset composition, or ablations isolating the contribution of the dual prototypes versus the diffusion backbone or training schedule, making it difficult to attribute the gains specifically to the proposed components.

minor comments (2)

The abstract refers to 'five benchmarks' without naming them, which should be clarified for context.
Notation for local and global prototypes could be introduced more clearly with equations in the method section to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below, indicating revisions that will be incorporated to strengthen the presentation and experimental validation.

read point-by-point responses

Referee: [Abstract] Abstract: The scalability to 160 categories is asserted based on the dual-prototype conditioning preserving separability via prototype-aware attention, but the abstract provides no derivation, equation, or ablation demonstrating that the attention mechanism integrates local and global prototypes without introducing cross-category interference or degrading separability; this is load-bearing for the claim that performance remains stable as cardinality grows.

Authors: We agree that the abstract's brevity limits its ability to convey the supporting details. The prototype-aware attention and its role in integrating local and global prototypes while preserving separability are formally derived in Section 3.3 (Equations 7-9) and empirically validated via ablations in Section 4.4. To better support the scalability claim in the abstract itself, we will revise it to include a concise reference to the attention mechanism and its separability-preserving property, cross-referencing the relevant sections. This revision will be made without expanding the abstract beyond typical length constraints. revision: yes
Referee: [Experiments] Experiments section: The reported 5.3 and 2.9 point AUROC improvements on the 160-category dataset lack accompanying error bars, details on the dataset composition, or ablations isolating the contribution of the dual prototypes versus the diffusion backbone or training schedule, making it difficult to attribute the gains specifically to the proposed components.

Authors: The referee correctly identifies gaps in the experimental reporting. In the revised manuscript we will: (i) report mean AUROC with standard deviations computed over five independent runs for all key results on the 160-category dataset; (ii) expand Section 4.1 with a detailed breakdown of the 160-category dataset composition (category distribution, image counts, and source); and (iii) add a dedicated ablation table isolating the dual-prototype conditioning from the diffusion backbone and training schedule. These additions will directly address attribution of the reported gains. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical modeling choice with independent validation

full rationale

The paper introduces DPDiff-AD as a new architectural construction (dual local/global prototypes + prototype-aware attention in diffusion) whose performance is validated empirically on external benchmarks including a 160-category dataset. No equations or claims reduce the reported AUROC gains to quantities fitted from the same data or to self-citation chains; the separability-preservation argument is presented as a modeling hypothesis tested by experiment rather than derived by definition. This is the normal case of a self-contained proposal.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Abstract-only review limits precision; the approach rests on standard diffusion assumptions plus the untested premise that dual prototypes suffice for heterogeneous normality.

free parameters (2)

Number of local and global prototypes
Determines granularity of normality modeling; chosen or tuned per dataset.
Optimal transport regularization strength
Hyperparameter balancing global geometry; fitted or selected to stabilize training.

axioms (1)

domain assumption Heterogeneous normal distributions across categories can be captured by complementary local and global prototypes without loss of separability
Invoked to justify the dual-scale representation and diffusion conditioning.

pith-pipeline@v0.9.1-grok · 5770 in / 1209 out tokens · 37139 ms · 2026-06-30T13:45:42.442117+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 10 canonical work pages · 6 internal anchors

[1]

International Journal of Computer Vision129(4), 1038–1059 (2021)

Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., Steger, C.: The mvtec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. International Journal of Computer Vision129(4), 1038–1059 (2021)

2021
[2]

IEEE transactions on medical imaging43(3), 1102–1112 (2023)

Guo, J., Lu, S., Jia, L., Zhang, W., Li, H.: Encoder-decoder contrast for unsu- pervised anomaly detection in medical images. IEEE transactions on medical imaging43(3), 1102–1112 (2023)

2023
[3]

IEEE transactions on pattern analysis and machine intelli- gence44(5), 2293–2312 (2020)

Ramachandra, B., Jones, M.J., Vatsavai, R.R.: A survey of single-scene video anomaly detection. IEEE transactions on pattern analysis and machine intelli- gence44(5), 2293–2312 (2020)

2020
[4]

arXiv preprint arXiv:2401.16402 (2024)

Cao, Y., Xu, X., Zhang, J., Cheng, Y., Huang, X., Pang, G., Shen, W.: A survey on visual anomaly detection: Challenge, approach, and prospect. arXiv preprint arXiv:2401.16402 (2024)

work page arXiv 2024
[5]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4183–4192 (2020)

2020
[6]

In: International Conference on Pattern Recognition, pp

Defard, T., Setkov, A., Loesch, A., Audigier, R.: Padim: a patch distribution modeling framework for anomaly detection and localization. In: International Conference on Pattern Recognition, pp. 475–489 (2021). Springer

2021
[7]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Li, C.-L., Sohn, K., Yoon, J., Pfister, T.: Cutpaste: Self-supervised learning for anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9664–9674 (2021)

2021
[8]

Advances in Neural Information Processing Systems35, 4571–4584 (2022)

You, Z., Cui, L., Shen, Y., Yang, K., Lu, X., Zheng, Y., Le, X.: A unified model for multi-class anomaly detection. Advances in Neural Information Processing Systems35, 4571–4584 (2022)

2022
[9]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Guo, J., Lu, S., Zhang, W., Chen, F., Li, H., Liao, H.: Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 20405–20415 (2025)

2025
[10]

Pattern Recognition, 113354 (2026)

Zhu, W., Wang, C., Gao, B.-B., Zhang, J., Jiang, G., Hu, J., Gan, Z., Wang, L., Zhou, Z., Zhang, J., et al.: Real-iad variety: Pushing industrial anomaly detection dataset to a modern era. Pattern Recognition, 113354 (2026)

2026
[11]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Roth, K., Pemula, L., Zepeda, J., Sch¨ olkopf, B., Brox, T., Gehler, P.: Towards 26 Preprint total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328 (2022)

2022
[12]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Deng, H., Li, X.: Anomaly detection via reverse distillation from one-class embed- ding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9737–9746 (2022)

2022
[13]

In: Pro- ceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp

Gudovskiy, D., Ishizaka, S., Kozuka, K.: Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Pro- ceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 98–107 (2022)

2022
[14]

IEEE Access10, 78446–78454 (2022)

Lee, S., Lee, S., Song, B.C.: Cfa: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization. IEEE Access10, 78446–78454 (2022)

2022
[15]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Zavrtanik, V., Kristan, M., Skoˇ caj, D.: Draem-a discriminatively trained recon- struction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8330–8339 (2021)

2021
[16]

In: European Conference on Computer Vision, pp

Schl¨ uter, H.M., Tan, J., Hou, B., Kainz, B.: Natural synthetic anomalies for self-supervised anomaly detection and localization. In: European Conference on Computer Vision, pp. 474–489 (2022). Springer

2022
[17]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Liu, Z., Zhou, Y., Xu, Y., Wang, Z.: Simplenet: A simple network for image anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20402–20411 (2023)

2023
[18]

Advances in Neural Information Processing Systems36, 8487–8500 (2023)

Lu, R., Wu, Y., Tian, L., Wang, D., Chen, B., Liu, X., Hu, R.: Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection. Advances in Neural Information Processing Systems36, 8487–8500 (2023)

2023
[19]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Zhang, X., Li, S., Li, X., Huang, P., Shan, J., Chen, T.: Destseg: Segmentation guided denoising student-teacher for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3914– 3923 (2023)

2023
[20]

In: European Conference on Computer Vision, pp

Gao, B.-B.: Learning to detect multi-class anomalies with just one normal image prompt. In: European Conference on Computer Vision, pp. 454–470 (2024). Springer

2024
[21]

In: Pro- ceedings of the AAAI Conference on Artificial Intelligence, vol

He, H., Zhang, J., Chen, H., Chen, X., Li, Z., Chen, X., Wang, Y., Wang, C., Xie, L.: A diffusion-based framework for multi-class anomaly detection. In: Pro- ceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 8472–8480 (2024)

2024
[22]

Advances in Neural Information Processing Systems37, 71162–71187 (2024)

He, H., Bai, Y., Zhang, J., He, Q., Chen, H., Gan, Z., Wang, C., Li, X., Tian, 27 Preprint G., Xie, L.: Mambaad: Exploring state space models for multi-class unsuper- vised anomaly detection. Advances in Neural Information Processing Systems37, 71162–71187 (2024)

2024
[23]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Fan, L., Huang, J., Di, D., Su, A., Song, T., Pagnucco, M., Song, Y.: Sal- vaging the overlooked: Leveraging class-aware contrastive learning for multi-class anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21419–21428 (2025)

2025
[24]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Luo, W., Cao, Y., Yao, H., Zhang, X., Lou, J., Cheng, Y., Shen, W., Yu, W.: Exploring intrinsic normal prototypes within a single image for universal anomaly detection. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 9974–9983 (2025)

2025
[25]

In: European Conference on Computer Vision, pp

Fuˇ cka, M., Zavrtanik, V., Skoˇ caj, D.: Transfusion–a transparency-based diffusion model for anomaly detection. In: European Conference on Computer Vision, pp. 91–108 (2024). Springer

2024
[26]

In: European Conference on Computer Vision, pp

Yao, H., Liu, M., Yin, Z., Yan, Z., Hong, X., Zuo, W.: Glad: Towards better reconstruction with global and local adaptive diffusion models for unsupervised anomaly detection. In: European Conference on Computer Vision, pp. 1–17 (2024). Springer

2024
[27]

In: European Conference on Computer Vision, pp

Zou, Y., Jeong, J., Pemula, L., Zhang, D., Dabeer, O.: Spot-the-difference self- supervised pre-training for anomaly detection and segmentation. In: European Conference on Computer Vision, pp. 392–408 (2022). Springer

2022
[28]

DINOv2: Learning Robust Visual Features without Supervision

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[29]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Gong, D., Liu, L., Le, V., Saha, B., Mansour, M.R., Venkatesh, S., Hengel, A.v.d.: Memorizing normality to detect anomaly: Memory-augmented deep autoen- coder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1705–1714 (2019)

2019
[30]

In: International Conference on Neural Information Processing, pp

You, Z., Yang, K., Luo, W., Cui, L., Zheng, Y., Le, X.: Adtr: Anomaly detection transformer with feature reconstruction. In: International Conference on Neural Information Processing, pp. 298–310 (2022). Springer

2022
[31]

arXiv preprint arXiv:2307.08059 (2023)

Yin, H., Jiao, G., Wu, Q., Karlsson, B.F., Huang, B., Lin, C.Y.: Lafite: Latent dif- fusion model with feature editing for unsupervised multi-class anomaly detection. arXiv preprint arXiv:2307.08059 (2023)

work page arXiv 2023
[32]

arXiv preprint arXiv:2312.07495 (2023) 28 Preprint

Zhang, J., Chen, X., Wang, Y., Wang, C., Liu, Y., Li, X., Yang, M.-H., Tao, D.: Exploring plain vit reconstruction for multi-class unsupervised anomaly detection. arXiv preprint arXiv:2312.07495 (2023) 28 Preprint

work page arXiv 2023
[33]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Ma, W., Zhang, X., Yao, Q., Tang, F., Wu, C., Li, Y., Yan, R., Jiang, Z., Zhou, S.K.: Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 4744–4754 (2025)

2025
[34]

In: International Confer- ence on Machine Learning, pp

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsuper- vised learning using nonequilibrium thermodynamics. In: International Confer- ence on Machine Learning, pp. 2256–2265 (2015). pmlr

2015
[35]

Score-Based Generative Modeling through Stochastic Differential Equations

Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2011
[36]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

2020
[37]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Wyatt, J., Leach, A., Schmon, S.M., Willcocks, C.G.: Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 650–656 (2022)

2022
[38]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Zhang, X., Li, N., Li, J., Dai, T., Jiang, Y., Xia, S.-T.: Unsupervised surface anomaly detection with diffusion probabilistic model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6782–6791 (2023)

2023
[39]

In: DAGM German Conference on Pattern Recognition, pp

Mousakhan, A., Brox, T., Tayyub, J.: Anomaly detection with conditioned denois- ing diffusion models. In: DAGM German Conference on Pattern Recognition, pp. 181–195 (2024). Springer

2024
[40]

Advances in neural information processing systems30(2017)

Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems30(2017)

2017
[41]

arXiv preprint arXiv:2506.21398 (2025)

Tian, L., Li, Y., Dai, Y., Chen, W., Liu, X., Chen, B.: Fastref: Fast pro- totype refinement for few-shot industrial anomaly detection. arXiv preprint arXiv:2506.21398 (2025)

work page arXiv 2025
[42]

In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer

2015
[43]

Advances in neural information processing systems34, 8780–8794 (2021)

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems34, 8780–8794 (2021)

2021
[44]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4195–4205 (2023) 29 Preprint

2023
[45]

Center for Research in Economics and Statistics Working Papers (2017-86) (2017)

Peyr´ e, G., Cuturi, M.: Computational optimal transport. Center for Research in Economics and Statistics Working Papers (2017-86) (2017)

2017
[46]

Denoising Diffusion Implicit Models

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[47]

In: 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp

Jezek, S., Jonak, M., Burget, R., Dvorak, P., Skotak, M.: Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. In: 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 66–71 (2021). IEEE

2021
[48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Wang, C., Zhu, W., Gao, B.-B., Gan, Z., Zhang, J., Gu, Z., Qian, S., Chen, M., Ma, L.: Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22883–22892 (2024)

2024
[49]

Vision Transformers Need Registers

Darcet, T., Oquab, M., Mairal, J., Bojanowski, P.: Vision transformers need registers. arXiv preprint arXiv:2309.16588 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[50]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[51]

Advances in Neural Information Pro- cessing Systems36, 10721–10740 (2023)

Guo, J., Lu, S., Jia, L., Zhang, W., Li, H.: Recontrast: Domain-specific anomaly detection via contrastive reconstruction. Advances in Neural Information Pro- cessing Systems36, 10721–10740 (2023)

2023
[52]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recogni- tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

2016
[53]

In: International Conference on Machine Learning, pp

Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neu- ral networks. In: International Conference on Machine Learning, pp. 6105–6114 (2019). PMLR

2019
[54]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[55]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Caron, M., Touvron, H., Misra, I., J´ egou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660 (2021)

2021
[56]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Bao, F., Nie, S., Xue, K., Cao, Y., Li, C., Su, H., Zhu, J.: All are worth words: A vit backbone for diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22669–22679 (2023) 30 Preprint Appendix A Category Breakdown of S1, S2, and S3 in Real-IAD Variety To systematically evaluate the impact of categor...

2023

[1] [1]

International Journal of Computer Vision129(4), 1038–1059 (2021)

Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., Steger, C.: The mvtec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. International Journal of Computer Vision129(4), 1038–1059 (2021)

2021

[2] [2]

IEEE transactions on medical imaging43(3), 1102–1112 (2023)

Guo, J., Lu, S., Jia, L., Zhang, W., Li, H.: Encoder-decoder contrast for unsu- pervised anomaly detection in medical images. IEEE transactions on medical imaging43(3), 1102–1112 (2023)

2023

[3] [3]

IEEE transactions on pattern analysis and machine intelli- gence44(5), 2293–2312 (2020)

Ramachandra, B., Jones, M.J., Vatsavai, R.R.: A survey of single-scene video anomaly detection. IEEE transactions on pattern analysis and machine intelli- gence44(5), 2293–2312 (2020)

2020

[4] [4]

arXiv preprint arXiv:2401.16402 (2024)

Cao, Y., Xu, X., Zhang, J., Cheng, Y., Huang, X., Pang, G., Shen, W.: A survey on visual anomaly detection: Challenge, approach, and prospect. arXiv preprint arXiv:2401.16402 (2024)

work page arXiv 2024

[5] [5]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Bergmann, P., Fauser, M., Sattlegger, D., Steger, C.: Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4183–4192 (2020)

2020

[6] [6]

In: International Conference on Pattern Recognition, pp

Defard, T., Setkov, A., Loesch, A., Audigier, R.: Padim: a patch distribution modeling framework for anomaly detection and localization. In: International Conference on Pattern Recognition, pp. 475–489 (2021). Springer

2021

[7] [7]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Li, C.-L., Sohn, K., Yoon, J., Pfister, T.: Cutpaste: Self-supervised learning for anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9664–9674 (2021)

2021

[8] [8]

Advances in Neural Information Processing Systems35, 4571–4584 (2022)

You, Z., Cui, L., Shen, Y., Yang, K., Lu, X., Zheng, Y., Le, X.: A unified model for multi-class anomaly detection. Advances in Neural Information Processing Systems35, 4571–4584 (2022)

2022

[9] [9]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Guo, J., Lu, S., Zhang, W., Chen, F., Li, H., Liao, H.: Dinomaly: The less is more philosophy in multi-class unsupervised anomaly detection. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 20405–20415 (2025)

2025

[10] [10]

Pattern Recognition, 113354 (2026)

Zhu, W., Wang, C., Gao, B.-B., Zhang, J., Jiang, G., Hu, J., Gan, Z., Wang, L., Zhou, Z., Zhang, J., et al.: Real-iad variety: Pushing industrial anomaly detection dataset to a modern era. Pattern Recognition, 113354 (2026)

2026

[11] [11]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Roth, K., Pemula, L., Zepeda, J., Sch¨ olkopf, B., Brox, T., Gehler, P.: Towards 26 Preprint total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14318–14328 (2022)

2022

[12] [12]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Deng, H., Li, X.: Anomaly detection via reverse distillation from one-class embed- ding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9737–9746 (2022)

2022

[13] [13]

In: Pro- ceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp

Gudovskiy, D., Ishizaka, S., Kozuka, K.: Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Pro- ceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 98–107 (2022)

2022

[14] [14]

IEEE Access10, 78446–78454 (2022)

Lee, S., Lee, S., Song, B.C.: Cfa: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization. IEEE Access10, 78446–78454 (2022)

2022

[15] [15]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Zavrtanik, V., Kristan, M., Skoˇ caj, D.: Draem-a discriminatively trained recon- struction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8330–8339 (2021)

2021

[16] [16]

In: European Conference on Computer Vision, pp

Schl¨ uter, H.M., Tan, J., Hou, B., Kainz, B.: Natural synthetic anomalies for self-supervised anomaly detection and localization. In: European Conference on Computer Vision, pp. 474–489 (2022). Springer

2022

[17] [17]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Liu, Z., Zhou, Y., Xu, Y., Wang, Z.: Simplenet: A simple network for image anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20402–20411 (2023)

2023

[18] [18]

Advances in Neural Information Processing Systems36, 8487–8500 (2023)

Lu, R., Wu, Y., Tian, L., Wang, D., Chen, B., Liu, X., Hu, R.: Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection. Advances in Neural Information Processing Systems36, 8487–8500 (2023)

2023

[19] [19]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Zhang, X., Li, S., Li, X., Huang, P., Shan, J., Chen, T.: Destseg: Segmentation guided denoising student-teacher for anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3914– 3923 (2023)

2023

[20] [20]

In: European Conference on Computer Vision, pp

Gao, B.-B.: Learning to detect multi-class anomalies with just one normal image prompt. In: European Conference on Computer Vision, pp. 454–470 (2024). Springer

2024

[21] [21]

In: Pro- ceedings of the AAAI Conference on Artificial Intelligence, vol

He, H., Zhang, J., Chen, H., Chen, X., Li, Z., Chen, X., Wang, Y., Wang, C., Xie, L.: A diffusion-based framework for multi-class anomaly detection. In: Pro- ceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 8472–8480 (2024)

2024

[22] [22]

Advances in Neural Information Processing Systems37, 71162–71187 (2024)

He, H., Bai, Y., Zhang, J., He, Q., Chen, H., Gan, Z., Wang, C., Li, X., Tian, 27 Preprint G., Xie, L.: Mambaad: Exploring state space models for multi-class unsuper- vised anomaly detection. Advances in Neural Information Processing Systems37, 71162–71187 (2024)

2024

[23] [23]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Fan, L., Huang, J., Di, D., Su, A., Song, T., Pagnucco, M., Song, Y.: Sal- vaging the overlooked: Leveraging class-aware contrastive learning for multi-class anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21419–21428 (2025)

2025

[24] [24]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Luo, W., Cao, Y., Yao, H., Zhang, X., Lou, J., Cheng, Y., Shen, W., Yu, W.: Exploring intrinsic normal prototypes within a single image for universal anomaly detection. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 9974–9983 (2025)

2025

[25] [25]

In: European Conference on Computer Vision, pp

Fuˇ cka, M., Zavrtanik, V., Skoˇ caj, D.: Transfusion–a transparency-based diffusion model for anomaly detection. In: European Conference on Computer Vision, pp. 91–108 (2024). Springer

2024

[26] [26]

In: European Conference on Computer Vision, pp

Yao, H., Liu, M., Yin, Z., Yan, Z., Hong, X., Zuo, W.: Glad: Towards better reconstruction with global and local adaptive diffusion models for unsupervised anomaly detection. In: European Conference on Computer Vision, pp. 1–17 (2024). Springer

2024

[27] [27]

In: European Conference on Computer Vision, pp

Zou, Y., Jeong, J., Pemula, L., Zhang, D., Dabeer, O.: Spot-the-difference self- supervised pre-training for anomaly detection and segmentation. In: European Conference on Computer Vision, pp. 392–408 (2022). Springer

2022

[28] [28]

DINOv2: Learning Robust Visual Features without Supervision

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[29] [29]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Gong, D., Liu, L., Le, V., Saha, B., Mansour, M.R., Venkatesh, S., Hengel, A.v.d.: Memorizing normality to detect anomaly: Memory-augmented deep autoen- coder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1705–1714 (2019)

2019

[30] [30]

In: International Conference on Neural Information Processing, pp

You, Z., Yang, K., Luo, W., Cui, L., Zheng, Y., Le, X.: Adtr: Anomaly detection transformer with feature reconstruction. In: International Conference on Neural Information Processing, pp. 298–310 (2022). Springer

2022

[31] [31]

arXiv preprint arXiv:2307.08059 (2023)

Yin, H., Jiao, G., Wu, Q., Karlsson, B.F., Huang, B., Lin, C.Y.: Lafite: Latent dif- fusion model with feature editing for unsupervised multi-class anomaly detection. arXiv preprint arXiv:2307.08059 (2023)

work page arXiv 2023

[32] [32]

arXiv preprint arXiv:2312.07495 (2023) 28 Preprint

Zhang, J., Chen, X., Wang, Y., Wang, C., Liu, Y., Li, X., Yang, M.-H., Tao, D.: Exploring plain vit reconstruction for multi-class unsupervised anomaly detection. arXiv preprint arXiv:2312.07495 (2023) 28 Preprint

work page arXiv 2023

[33] [33]

In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp

Ma, W., Zhang, X., Yao, Q., Tang, F., Wu, C., Li, Y., Yan, R., Jiang, Z., Zhou, S.K.: Aa-clip: Enhancing zero-shot anomaly detection via anomaly-aware clip. In: Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 4744–4754 (2025)

2025

[34] [34]

In: International Confer- ence on Machine Learning, pp

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsuper- vised learning using nonequilibrium thermodynamics. In: International Confer- ence on Machine Learning, pp. 2256–2265 (2015). pmlr

2015

[35] [35]

Score-Based Generative Modeling through Stochastic Differential Equations

Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2011

[36] [36]

Advances in neural information processing systems33, 6840–6851 (2020)

Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems33, 6840–6851 (2020)

2020

[37] [37]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Wyatt, J., Leach, A., Schmon, S.M., Willcocks, C.G.: Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 650–656 (2022)

2022

[38] [38]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Zhang, X., Li, N., Li, J., Dai, T., Jiang, Y., Xia, S.-T.: Unsupervised surface anomaly detection with diffusion probabilistic model. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6782–6791 (2023)

2023

[39] [39]

In: DAGM German Conference on Pattern Recognition, pp

Mousakhan, A., Brox, T., Tayyub, J.: Anomaly detection with conditioned denois- ing diffusion models. In: DAGM German Conference on Pattern Recognition, pp. 181–195 (2024). Springer

2024

[40] [40]

Advances in neural information processing systems30(2017)

Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems30(2017)

2017

[41] [41]

arXiv preprint arXiv:2506.21398 (2025)

Tian, L., Li, Y., Dai, Y., Chen, W., Liu, X., Chen, B.: Fastref: Fast pro- totype refinement for few-shot industrial anomaly detection. arXiv preprint arXiv:2506.21398 (2025)

work page arXiv 2025

[42] [42]

In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer

2015

[43] [43]

Advances in neural information processing systems34, 8780–8794 (2021)

Dhariwal, P., Nichol, A.: Diffusion models beat gans on image synthesis. Advances in neural information processing systems34, 8780–8794 (2021)

2021

[44] [44]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4195–4205 (2023) 29 Preprint

2023

[45] [45]

Center for Research in Economics and Statistics Working Papers (2017-86) (2017)

Peyr´ e, G., Cuturi, M.: Computational optimal transport. Center for Research in Economics and Statistics Working Papers (2017-86) (2017)

2017

[46] [46]

Denoising Diffusion Implicit Models

Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[47] [47]

In: 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp

Jezek, S., Jonak, M., Burget, R., Dvorak, P., Skotak, M.: Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. In: 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 66–71 (2021). IEEE

2021

[48] [48]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Wang, C., Zhu, W., Gao, B.-B., Gan, Z., Zhang, J., Gu, Z., Qian, S., Chen, M., Ma, L.: Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22883–22892 (2024)

2024

[49] [49]

Vision Transformers Need Registers

Darcet, T., Oquab, M., Mairal, J., Bojanowski, P.: Vision transformers need registers. arXiv preprint arXiv:2309.16588 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[50] [50]

Adam: A Method for Stochastic Optimization

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[51] [51]

Advances in Neural Information Pro- cessing Systems36, 10721–10740 (2023)

Guo, J., Lu, S., Jia, L., Zhang, W., Li, H.: Recontrast: Domain-specific anomaly detection via contrastive reconstruction. Advances in Neural Information Pro- cessing Systems36, 10721–10740 (2023)

2023

[52] [52]

In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recogni- tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

2016

[53] [53]

In: International Conference on Machine Learning, pp

Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neu- ral networks. In: International Conference on Machine Learning, pp. 6105–6114 (2019). PMLR

2019

[54] [54]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[55] [55]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp

Caron, M., Touvron, H., Misra, I., J´ egou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660 (2021)

2021

[56] [56]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

Bao, F., Nie, S., Xue, K., Cao, Y., Li, C., Su, H., Zhu, J.: All are worth words: A vit backbone for diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22669–22679 (2023) 30 Preprint Appendix A Category Breakdown of S1, S2, and S3 in Real-IAD Variety To systematically evaluate the impact of categor...

2023