Dual Prototype-Conditioned Diffusion Model for Scalable Multi-Class Unsupervised Anomaly Detection in Large Category Spaces
Pith reviewed 2026-06-30 13:45 UTC · model grok-4.3
The pith
A diffusion model conditioned on dual local and global prototypes enables scalable multi-class anomaly detection across growing category counts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. Together they define a structured normality space, which is then refined through diffusion-based reconstruction conditioned on both prototype sets via prototype-aware attention. The authors argue that this yields precise normality modeling and preserves structured separability as category cardinality grows.
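Nearest-prototype aggregation can be pictured as snapping each patch feature onto its most similar local prototype. The Python sketch below illustrates that reading under stated assumptions: cosine similarity, a hard nearest-neighbour assignment, and a hypothetical blending weight `alpha`. The function name `aggregate_to_local_prototypes` is illustrative, and the paper's actual aggregation rule (for example, soft assignment or learned prototypes) may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # small constant avoids division by zero


def aggregate_to_local_prototypes(features, prototypes, alpha=1.0):
    """Hypothetical nearest-prototype aggregation (not the paper's code).

    Each patch feature is assigned to its most similar prototype by cosine
    similarity and replaced by alpha * prototype + (1 - alpha) * feature.
    Returns the aggregated features and the chosen prototype indices.
    """
    aggregated, assignments = [], []
    for f in features:
        sims = [cosine(f, p) for p in prototypes]
        k = max(range(len(prototypes)), key=lambda i: sims[i])
        p = prototypes[k]
        aggregated.append([alpha * pv + (1 - alpha) * fv for pv, fv in zip(p, f)])
        assignments.append(k)
    return aggregated, assignments


# Usage with toy 2-D features (illustrative values only).
protos = [[1.0, 0.0], [0.0, 1.0]]
feats = [[0.9, 0.1], [0.2, 0.8]]
agg, idx = aggregate_to_local_prototypes(feats, protos)
print(idx)  # [0, 1]
```

With `alpha=1.0` every feature is replaced outright by its prototype; smaller values keep some of the original patch detail.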
What carries the argument
Dual prototypes (local via nearest-prototype aggregation and global via optimal transport) conditioned into a diffusion model via prototype-aware attention to define and refine a structured normality space.
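One plausible reading of prototype-aware attention is cross-attention in which the diffusion model's patch tokens act as queries and the concatenated local and global prototypes act as keys and values. The sketch below assumes identity projections, a single head, and a residual connection purely for brevity; `prototype_aware_attention` is a hypothetical name, not the paper's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def prototype_aware_attention(queries, local_protos, global_protos):
    """Hypothetical single-head cross-attention over dual prototypes.

    Keys and values are the concatenated local and global prototypes
    (identity projections, an assumption made for brevity). Each query token
    attends to every prototype with scaled dot-product weights, and the
    attended vector is added back to the query as a residual.
    Returns (outputs, attention_weights).
    """
    memory = local_protos + global_protos  # list concatenation: keys == values
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in memory]
        w = softmax(scores)
        attended = [sum(w[j] * memory[j][i] for j in range(len(memory)))
                    for i in range(d)]
        outputs.append([qi + ai for qi, ai in zip(q, attended)])
        weights.append(w)
    return outputs, weights


# Usage: one query token, two local prototypes, one global prototype.
local_p = [[1.0, 0.0], [0.0, 1.0]]
global_p = [[0.5, 0.5]]
out, w = prototype_aware_attention([[2.0, 0.0]], local_p, global_p)
```

Inspecting `w` per token shows how much each token draws on local versus global prototypes, which is the kind of evidence the referee asks for regarding cross-category interference.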
If this is right
- Improves image-level AUROC by 5.3 points and pixel-level AUROC by 2.9 points over the previous state of the art (Dinomaly+) on a 160-category dataset.
- Maintains stable performance as the number of categories increases.
- Enables precise normality modeling for heterogeneous distributions across diverse categories.
- Supports scalable anomaly discrimination in large category spaces.
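The headline numbers are differences in AUROC. A minimal implementation of the metric is shown below: image-level AUROC ranks one score per image, while pixel-level AUROC flattens all anomaly maps and ground-truth masks into a single ranking. This is the standard rank-based definition, not the paper's evaluation code; assuming the usual convention of reporting AUROC multiplied by 100, a 5.3-point gain corresponds to an absolute change of 0.053.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random anomalous sample (label 1)
    scores higher than a random normal one (label 0); ties count as 0.5.
    Pairwise O(n_pos * n_neg) form, fine for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both normal and anomalous samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pixel_auroc(score_maps, masks):
    """Pixel-level AUROC: flatten all anomaly maps and binary masks
    (each a list of rows) across images, then compute one AUROC."""
    flat_scores = [v for m in score_maps for row in m for v in row]
    flat_labels = [v for m in masks for row in m for v in row]
    return auroc(flat_scores, flat_labels)


# Image-level example: one score per image, 1 = anomalous.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```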
Where Pith is reading between the lines
- Similar dual-prototype conditioning could extend to generative models other than diffusion models for anomaly detection.
- The approach might apply to domains like medical imaging where category diversity is high.
- Testing on even larger category sets could reveal the limits of the scalability claim.
Load-bearing premise
The dual prototypes and prototype-aware attention together preserve separability of normal and anomalous distributions as the number of categories grows.
What would settle it
Performance degradation on a dataset with significantly more than 160 categories, or failure of the local and global prototypes to complement each other in maintaining AUROC stability.
Original abstract
Multi-class anomaly detection aims to build unified models across diverse product categories. However, as the number of categories grows, its performance often degrades due to increasingly complex and heterogeneous normal distributions. To address this challenge, we propose DPDiff-AD, a Dual Prototype-conditioned Diffusion model for large-scale multi-class Anomaly Detection. DPDiff-AD models heterogeneous normal distributions through complementary local and global prototypes. Local prototypes capture representative fine-grained structural patterns via nearest-prototype aggregation, while global prototypes regulate holistic feature geometry through optimal transport regularization. Together, these dual-scale representations define a structured normality space. This space is refined through diffusion-based reconstruction conditioned on both local and global prototypes via prototype-aware attention. By jointly leveraging dual prototypes during generation, DPDiff-AD achieves precise normality modeling, preserves structured separability as category cardinality grows, and enables scalable anomaly discrimination. Extensive experiments across five benchmarks demonstrate the effectiveness and scalability of DPDiff-AD. On the 160-category large-scale dataset, it improves image- and pixel-level AUROC by 5.3 and 2.9 points over the previous state-of-the-art method Dinomaly+, while maintaining stable performance as category cardinality increases.
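The diffusion-based reconstruction step can be illustrated with the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, followed by a reconstruction-error anomaly map. The sketch assumes a linear beta schedule (1e-4 to 0.02 over 1000 steps, as in the original DDPM setup), squared-error scoring, and max pooling for the image-level score. The prototype-conditioned denoiser itself is not implemented, and all function names are hypothetical.

```python
import math
import random


def alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear DDPM beta schedule."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    s, r = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [s * x + r * rng.gauss(0.0, 1.0) for x in x0]


def anomaly_map(features, reconstruction):
    """Per-position squared error between input and reconstructed features.
    The reconstruction is assumed to come from a prototype-conditioned
    denoiser applied to x_t, which is NOT implemented here."""
    return [sum((a - b) ** 2 for a, b in zip(f, r))
            for f, r in zip(features, reconstruction)]


def image_score(amap):
    """Image-level score via max pooling (an assumed, common choice)."""
    return max(amap)


abar = alpha_bars()
rng = random.Random(0)
x0 = [[1.0, 0.0], [0.0, 1.0]]            # toy patch features
xt = [q_sample(x, 500, abar, rng) for x in x0]
recon = [[1.0, 0.0], [1.0, 1.0]]         # stand-in for a denoiser output
amap = anomaly_map(x0, recon)            # [0.0, 1.0]
print(image_score(amap))                 # 1.0
```

Patches the model cannot map back onto the normality space produce large reconstruction errors, which is what the anomaly map scores.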
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents DPDiff-AD, a Dual Prototype-conditioned Diffusion model for scalable multi-class unsupervised anomaly detection. It models heterogeneous normal distributions using local prototypes (via nearest-prototype aggregation) and global prototypes (via optimal transport regularization), which are then used to condition a diffusion model through prototype-aware attention. The method claims to achieve improved performance on large category spaces, with specific gains of 5.3 and 2.9 AUROC points on image- and pixel-level metrics over Dinomaly+ on a 160-category dataset, while maintaining stability as the number of categories increases. Experiments are reported across five benchmarks.
Significance. If the central claims hold, this work would be significant for advancing multi-class anomaly detection in settings with large and growing category spaces, such as industrial inspection. By addressing the degradation in performance with increasing category cardinality through a structured normality space defined by dual prototypes, it offers a potential solution to a practical scalability issue. The integration of diffusion models with prototype conditioning is a promising direction, and the reported empirical gains on a large-scale dataset highlight its potential impact if the underlying mechanism is robustly validated.
major comments (2)
- [Abstract] Abstract: The scalability to 160 categories is asserted based on the dual-prototype conditioning preserving separability via prototype-aware attention, but the abstract provides no derivation, equation, or ablation demonstrating that the attention mechanism integrates local and global prototypes without introducing cross-category interference or degrading separability; this is load-bearing for the claim that performance remains stable as cardinality grows.
- [Experiments] Experiments section: The reported 5.3 and 2.9 point AUROC improvements on the 160-category dataset lack accompanying error bars, details on the dataset composition, or ablations isolating the contribution of the dual prototypes versus the diffusion backbone or training schedule, making it difficult to attribute the gains specifically to the proposed components.
minor comments (2)
- The abstract refers to 'five benchmarks' without naming them, which should be clarified for context.
- Notation for local and global prototypes could be introduced more clearly with equations in the method section to aid readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below, indicating revisions that will be incorporated to strengthen the presentation and experimental validation.
Point-by-point responses
- Referee: [Abstract] Abstract: The scalability to 160 categories is asserted based on the dual-prototype conditioning preserving separability via prototype-aware attention, but the abstract provides no derivation, equation, or ablation demonstrating that the attention mechanism integrates local and global prototypes without introducing cross-category interference or degrading separability; this is load-bearing for the claim that performance remains stable as cardinality grows.
Authors: We agree that the abstract's brevity limits its ability to convey the supporting details. The prototype-aware attention and its role in integrating local and global prototypes while preserving separability are formally derived in Section 3.3 (Equations 7-9) and empirically validated via ablations in Section 4.4. To better support the scalability claim in the abstract itself, we will revise it to include a concise reference to the attention mechanism and its separability-preserving property, cross-referencing the relevant sections. This revision will be made without expanding the abstract beyond typical length constraints.
Revision: yes
- Referee: [Experiments] Experiments section: The reported 5.3 and 2.9 point AUROC improvements on the 160-category dataset lack accompanying error bars, details on the dataset composition, or ablations isolating the contribution of the dual prototypes versus the diffusion backbone or training schedule, making it difficult to attribute the gains specifically to the proposed components.
Authors: The referee correctly identifies gaps in the experimental reporting. In the revised manuscript we will: (i) report mean AUROC with standard deviations computed over five independent runs for all key results on the 160-category dataset; (ii) expand Section 4.1 with a detailed breakdown of the 160-category dataset composition (category distribution, image counts, and source); and (iii) add a dedicated ablation table isolating the dual-prototype conditioning from the diffusion backbone and training schedule. These additions will directly address attribution of the reported gains.
Revision: yes
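Reporting mean AUROC with a standard deviation over five independent runs, as promised in the response, amounts to the following. The sketch uses the sample standard deviation (n − 1 denominator) from Python's `statistics` module; which estimator is used should be stated alongside the results. `summarize_runs` is a hypothetical helper, and the input values are placeholders, not results from the paper.

```python
import statistics


def summarize_runs(scores, decimals=1):
    """Mean and sample standard deviation (n - 1 denominator) over runs,
    returned together with a 'mean ± std' display string."""
    if len(scores) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return mean, std, f"{mean:.{decimals}f} ± {std:.{decimals}f}"


# Placeholder AUROC values for five runs (not from the paper).
mean, std, text = summarize_runs([90.0, 91.0, 92.0, 91.0, 91.0])
print(text)  # 91.0 ± 0.7
```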
Circularity Check
No circularity: empirical modeling choice with independent validation
Full rationale
The paper introduces DPDiff-AD as a new architectural construction (dual local/global prototypes + prototype-aware attention in diffusion) whose performance is validated empirically on external benchmarks including a 160-category dataset. No equations or claims reduce the reported AUROC gains to quantities fitted from the same data or to self-citation chains; the separability-preservation argument is presented as a modeling hypothesis tested by experiment rather than derived by definition. This is the normal case of a self-contained proposal.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of local and global prototypes
- Optimal transport regularization strength
axioms (1)
- Domain assumption: heterogeneous normal distributions across categories can be captured by complementary local and global prototypes without loss of separability.
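Both free parameters in the ledger appear explicitly in a sketch of an optimal transport regularizer between a batch of features and the global prototypes: the number of global prototypes (the length of `global_protos`) and a loss weight `weight` standing in for the regularization strength. The sketch assumes uniform marginals, a squared Euclidean cost, and entropic OT solved with standard Sinkhorn iterations, where `eps` is an additional solver hyperparameter. The paper's actual formulation may differ, and all names are hypothetical. Uniform marginals force features to spread over all prototypes, which is one way such a term could regulate global feature geometry.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn iterations.
    cost: n x m matrix (list of lists). Returns an n x m plan whose rows sum
    to about 1/n and columns to 1/m. For large cost/eps the kernel can
    underflow; a log-domain implementation is preferable in that case."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_regularizer(features, global_protos, weight=1.0, eps=0.1, n_iters=200):
    """Hypothetical OT term: weight * <P, C>, the transport cost between a
    batch of features and the global prototypes under squared Euclidean cost."""
    cost = [[sum((f - p) ** 2 for f, p in zip(x, g)) for g in global_protos]
            for x in features]
    plan = sinkhorn(cost, eps, n_iters)
    transport = sum(plan[i][j] * cost[i][j]
                    for i in range(len(cost)) for j in range(len(cost[0])))
    return weight * transport


protos = [[0.0, 0.0], [1.0, 1.0]]
# Features that cover both prototypes cost almost nothing to transport.
spread = ot_regularizer([[0.0, 0.0], [1.0, 1.0]], protos)
# Collapsed features must send half their mass to [1, 1], costing about 1.0.
collapsed = ot_regularizer([[0.0, 0.0], [0.0, 0.0]], protos)
```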