ASBench: Image Anomalies Synthesis Benchmark for Anomaly Detection
Pith reviewed 2026-05-18 09:11 UTC · model grok-4.3
The pith
ASBench is the first dedicated benchmarking framework for evaluating anomaly synthesis methods using four specific dimensions that previous work had overlooked.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that anomaly synthesis methods must be assessed independently from any particular detection pipeline, and that doing so along the four dimensions of cross-dataset generalization, synthetic-to-real data ratios, correlation between intrinsic image metrics and detection scores, and hybrid synthesis strategies reveals previously hidden limitations and supplies practical guidance for improving anomaly detection.
What carries the argument
ASBench, a benchmarking framework that evaluates anomaly synthesis methods by testing generalization performance across datasets and pipelines, the ratio of synthetic to real data, the correlation of intrinsic synthesis-image metrics with detection performance, and strategies for hybrid synthesis methods.
If this is right
- Synthesis methods that appear strong inside one detection pipeline often lose effectiveness when moved to different datasets or detectors.
- There exists an optimal balance between the number of synthetic anomalies and real anomalies that improves overall detection accuracy.
- Basic measurable properties of the synthesized images can be used to predict how much they will help a downstream anomaly detector.
- Combining multiple synthesis techniques produces better training data than relying on any single method.
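The third consequence above, that measurable image properties can predict downstream usefulness, comes down to a rank-correlation check. A minimal sketch in plain Python follows. It assumes one intrinsic score and one detection score per synthesis method. The function names and all numbers are hypothetical placeholders rather than anything reported in the paper, and Spearman correlation is one reasonable choice here, not necessarily the statistic the authors use.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5

# Placeholder values for illustration only; NOT results from the paper.
intrinsic_metric = [0.2, 0.5, 0.9, 0.4]      # one intrinsic image score per synthesis method
detection_score = [0.71, 0.80, 0.93, 0.78]   # detector score obtained with that method
rho = spearman(intrinsic_metric, detection_score)
```

A value of `rho` near 1 or -1 would suggest the intrinsic metric orders synthesis methods the same way (or the opposite way) as detection performance does, while a value near 0 would suggest it has little predictive use.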
Where Pith is reading between the lines
- The same four-dimension testing approach could be applied to synthetic data generation in medical imaging or autonomous-vehicle perception to find similar hidden weaknesses.
- Teams building industrial inspection systems could run ASBench-style checks during development to pick or tune synthesis methods before collecting more real defects.
- Standardized results from ASBench might eventually support shared libraries of evaluated synthetic anomaly images for the broader research community.
Load-bearing premise
The assumption that these four evaluation dimensions are the key factors whose measurement will reliably expose limitations in existing anomaly synthesis methods.
What would settle it
The core claim would be undermined if large-scale experiments run through ASBench found that all tested synthesis methods yield nearly identical results across the four dimensions and that intrinsic image metrics show no consistent link to detection accuracy.
Original abstract
Anomaly detection plays a pivotal role in manufacturing quality control, yet its application is constrained by limited abnormal samples and high manual annotation costs. While anomaly synthesis offers a promising solution, existing studies predominantly treat anomaly synthesis as an auxiliary component within anomaly detection frameworks, lacking systematic evaluation of anomaly synthesis algorithms. Current research also overlook crucial factors specific to anomaly synthesis, such as decoupling its impact from detection, quantitative analysis of synthetic data and adaptability across different scenarios. To address these limitations, we propose ASBench, the first comprehensive benchmarking framework dedicated to evaluating anomaly synthesis methods. Our framework introduces four critical evaluation dimensions: (i) the generalization performance across different datasets and pipelines (ii) the ratio of synthetic to real data (iii) the correlation between intrinsic metrics of synthesis images and anomaly detection performance metrics, and (iv) strategies for hybrid anomaly synthesis methods. Through extensive experiments, ASBench not only reveals limitations in current anomaly synthesis methods but also provides actionable insights for future research directions in anomaly synthesis.
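Dimension (ii), the ratio of synthetic to real data, can be pictured as a sweep over training sets that differ only in how many synthetic samples are mixed in. The sketch below is one possible way to build such sets. It assumes that every real sample is kept and that the ratio means "synthetic count divided by real count". The helper name `mix_by_ratio`, the sample identifiers, and the ratio grid are all hypothetical and are not taken from the paper.

```python
import random

def mix_by_ratio(real, synthetic, ratio, seed=0):
    """Keep every real sample and draw round(ratio * len(real)) synthetic ones."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    n_syn = round(ratio * len(real))
    if n_syn > len(synthetic):
        raise ValueError("not enough synthetic samples for this ratio")
    rng = random.Random(seed)           # fixed seed so the sweep is repeatable
    chosen = rng.sample(synthetic, n_syn)
    return list(real) + chosen

# Hypothetical sample identifiers standing in for image paths.
real = [f"real_{i}" for i in range(10)]
synthetic = [f"syn_{i}" for i in range(100)]

# Assumed ratio grid; a detector would be trained and scored on each set.
train_sets = {ratio: mix_by_ratio(real, synthetic, ratio)
              for ratio in (0.0, 0.5, 1.0, 2.0, 5.0)}
```

Holding the real samples and the random seed fixed means that any change in detection score across the sweep can be attributed to the amount of synthetic data rather than to resampling noise.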
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ASBench as the first comprehensive benchmarking framework for evaluating anomaly synthesis methods in image anomaly detection, motivated by the scarcity of abnormal samples in manufacturing applications. It defines four evaluation dimensions: (i) generalization performance across datasets and pipelines, (ii) the ratio of synthetic to real data, (iii) correlation between intrinsic synthesis metrics and detection performance, and (iv) strategies for hybrid anomaly synthesis methods. The authors state that extensive experiments using this framework reveal limitations in current synthesis methods and yield actionable insights for future research.
Significance. If the experiments robustly implement the four dimensions and demonstrate that they isolate synthesis effects, ASBench could establish a much-needed standardized protocol for assessing anomaly synthesis techniques. This would be valuable for the field, as it moves beyond treating synthesis as a mere auxiliary tool and instead provides quantitative guidance on generalization, data ratios, metric correlations, and hybrid approaches, potentially improving anomaly detection performance where real anomalies are rare.
major comments (1)
- [Section describing the four evaluation dimensions] Description of dimension (i): The framework's claim to decouple the impact of anomaly synthesis from detection pipelines rests on testing generalization across pipelines. However, if the selected pipelines share similar feature extractors or anomaly scoring heuristics, observed performance differences could reflect those shared structures rather than intrinsic synthesis limitations. The manuscript should explicitly document the diversity of pipelines (e.g., reconstruction-based vs. embedding-based with distinct backbones) and include controls to verify that the dimension isolates synthesis quality as intended.
minor comments (2)
- [Abstract] Abstract: The phrase 'Current research also overlook crucial factors' is grammatically incorrect and should read 'overlooks'.
- [Abstract] Abstract: The four evaluation dimensions are listed in a single sentence without clear separation; numbering or bullet points would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comment on evaluation dimension (i). The observation highlights an important aspect of ensuring that the benchmark truly isolates the effects of anomaly synthesis. We have revised the manuscript to provide the requested documentation and controls.
Referee: [Section describing the four evaluation dimensions] Description of dimension (i): The framework's claim to decouple the impact of anomaly synthesis from detection pipelines rests on testing generalization across pipelines. However, if the selected pipelines share similar feature extractors or anomaly scoring heuristics, observed performance differences could reflect those shared structures rather than intrinsic synthesis limitations. The manuscript should explicitly document the diversity of pipelines (e.g., reconstruction-based vs. embedding-based with distinct backbones) and include controls to verify that the dimension isolates synthesis quality as intended.
Authors: We agree that explicit documentation of pipeline diversity is necessary to support the decoupling claim. In the revised manuscript, we have expanded Section 3.2 to list the exact pipelines employed, which comprise both reconstruction-based approaches (e.g., Autoencoder and GAN-based reconstruction with varying decoder depths) and embedding-based methods (e.g., PatchCore with ResNet-18 backbone and CFLOW with Vision Transformer backbone). These pipelines were chosen to cover distinct feature extraction mechanisms and scoring heuristics. To verify isolation, we added control experiments in Section 4.1 and Appendix C: for each fixed synthesis method, we independently swap feature extractors and scoring functions and measure the variance in detection performance. The results show that synthesis-induced differences remain consistent across these swaps, indicating that the observed generalization gaps are attributable to synthesis quality rather than shared pipeline structures. We believe these additions directly address the concern.
Revision: yes
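The control described in the rebuttal, swapping pipeline components for a fixed synthesis method and checking that synthesis-induced differences persist, can be sketched as two small checks on a score grid. This is an illustrative reading of the rebuttal and not the authors' code. The grid layout, the function names, the method and pipeline labels, and every number below are hypothetical placeholders.

```python
from statistics import pvariance

def variance_across_pipelines(scores):
    """scores[method][pipeline] -> score. Return per-method variance over pipelines."""
    return {m: pvariance(list(per_pipe.values())) for m, per_pipe in scores.items()}

def ranking_is_stable(scores):
    """True if every pipeline orders the synthesis methods identically."""
    pipelines = next(iter(scores.values())).keys()
    orderings = {
        tuple(sorted(scores, key=lambda m: scores[m][p], reverse=True))
        for p in pipelines
    }
    return len(orderings) == 1

# Placeholder grid for illustration only; NOT results from the paper.
scores = {
    "method_a": {"recon+backbone1": 0.90, "embed+backbone2": 0.86},
    "method_b": {"recon+backbone1": 0.82, "embed+backbone2": 0.80},
}
per_method_variance = variance_across_pipelines(scores)
stable = ranking_is_stable(scores)
```

Under this reading, a stable ranking together with small per-method variance would support the decoupling claim, whereas a ranking that flips between pipelines would point to pipeline-specific effects of the kind the referee raised.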
Circularity Check
No circularity: benchmark framework proposal with independent evaluation dimensions
The paper introduces ASBench as an empirical benchmarking framework consisting of four evaluation dimensions for anomaly synthesis methods. No equations, derivations, or fitted parameters are present that reduce any claimed result or prediction to quantities defined by the authors' own inputs or self-citations. The central contribution is the definition and application of the benchmark itself, which stands as an external evaluation tool rather than a closed derivation. Self-citations, if any, are not load-bearing for the framework's validity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard practices in anomaly detection and image synthesis evaluation are assumed valid.