AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
Pith reviewed 2026-05-16 10:43 UTC · model grok-4.3
The pith
AnomalyVFM converts any vision foundation model into a zero-shot anomaly detector using synthetic data and efficient adaptation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AnomalyVFM is a framework that combines a three-stage synthetic dataset generation scheme with parameter-efficient adaptation using low-rank feature adapters and a confidence-weighted pixel loss to transform pretrained vision foundation models into strong zero-shot anomaly detectors, achieving superior performance on multiple datasets.
What carries the argument
The three-stage synthetic anomaly dataset generation combined with low-rank feature adapters and confidence-weighted pixel loss, which adapts the VFM features for anomaly scoring without full retraining.
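The paper describes its adaptation only as "low-rank feature adapters". The sketch below assumes they follow the common LoRA pattern: a frozen projection W plus a trainable rank-r product BA. The class name, `alpha`, the zero initialisation of `B`, and the 1024 width are illustrative choices, not details taken from the paper.

```python
import numpy as np


class LowRankAdapter:
    """LoRA-style low-rank update around a frozen linear projection (illustrative).

    y = x W^T + (alpha / rank) * x A^T B^T, where W stays frozen and only the
    small matrices A (rank x d_in) and B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                               # frozen pretrained projection
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # down-projection to rank r
        self.B = np.zeros((d_out, rank))                   # up-projection, zero at init
        self.scale = alpha / rank

    def __call__(self, x):
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_params(self):
        # Only A and B count; W is never updated.
        return self.A.size + self.B.size


# Example: a 1024-wide projection (a ViT-L-sized width, chosen only for illustration).
d = 1024
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))
adapter = LowRankAdapter(W, rank=8)
tokens = rng.normal(size=(4, d))  # four patch-token features

# With B = 0 the adapted features equal the frozen ones, so training starts from the VFM.
assert np.allclose(adapter(tokens), tokens @ W.T)
print(adapter.trainable_params(), W.size)  # 16384 1048576
```

Trainable parameters scale as r(d_in + d_out). In this example that is 16,384 against about 1.05 million for the full matrix (roughly 1.6%), which is why adapter rank appears as a free parameter that trades capacity against efficiency.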
If this is right
- Any pretrained VFM can be turned into a zero-shot anomaly detector without domain-specific training images.
- Zero-shot anomaly detection performance improves substantially: with a RADIO backbone, the average image-level AUROC reaches 94.1% across nine diverse datasets, 3.3 percentage points above previous methods.
- The method closes the gap with VLM-based approaches by using only vision models.
- Adaptation is parameter-efficient, making it practical for large models.
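The headline number above is an image-level AUROC: the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by pairwise comparison and then averages across datasets. It assumes an unweighted mean of per-dataset scores; the paper's exact averaging scheme (for example, over categories within a dataset) is not stated here, and the toy scores are made up. On this scale, a 3.3-point margin at 94.1% implies that the strongest prior method averaged about 90.8%.

```python
def auroc(normal_scores, anomalous_scores):
    """Image-level AUROC = P(score of an anomalous image > score of a normal one).

    Computed by comparing every (anomalous, normal) pair; ties count as one half.
    """
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


def mean_auroc(per_dataset):
    """Unweighted mean of per-dataset AUROCs (assumed averaging scheme)."""
    scores = [auroc(normal, anomalous) for normal, anomalous in per_dataset]
    return sum(scores) / len(scores)


# Two toy datasets: (scores of normal images, scores of anomalous images).
toy = [
    ([0.1, 0.2, 0.3], [0.25, 0.9]),  # 5 of 6 pairs ranked correctly -> 0.8333
    ([0.1, 0.4], [0.4, 0.8]),        # one tie counts 0.5 -> 3.5 / 4 = 0.875
]
print(round(mean_auroc(toy), 4))  # 0.8542
```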
Where Pith is reading between the lines
- Such synthetic data methods could extend to other vision tasks where real anomalies are scarce.
- Future work might explore combining this with multi-modal models for even better localization.
- Testing on more industrial or medical domains could reveal if the synthetic generation generalizes further.
Load-bearing premise
The synthetic anomalies generated in three stages have statistical properties close enough to real anomalies in the nine test domains.
What would settle it
Evaluating AnomalyVFM on a new dataset whose real anomalies differ markedly in appearance or distribution from the generated synthetic ones, and finding that its performance drops below that of prior methods.
Original abstract
Zero-shot anomaly detection aims to detect and localise abnormal regions in an image without access to any in-domain training images. While recent approaches leverage vision-language models (VLMs), such as CLIP, to transfer high-level concept knowledge, methods based on purely vision foundation models (VFMs), like DINOv2, have lagged behind in performance. We argue that this gap stems from two practical issues: (i) limited diversity in existing auxiliary anomaly detection datasets and (ii) overly shallow VFM adaptation strategies. To address both challenges, we propose AnomalyVFM, a general and effective framework that turns any pretrained VFM into a strong zero-shot anomaly detector. Our approach combines a robust three-stage synthetic dataset generation scheme with a parameter-efficient adaptation mechanism, utilising low-rank feature adapters and a confidence-weighted pixel loss. Together, these components enable modern VFMs to substantially outperform current state-of-the-art methods. More specifically, with RADIO as a backbone, AnomalyVFM achieves an average image-level AUROC of 94.1% across 9 diverse datasets, surpassing previous methods by a significant 3.3 percentage points. Project Page: https://maticfuc.github.io/anomaly_vfm/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AnomalyVFM, a framework that adapts pretrained vision foundation models (VFMs) such as RADIO and DINOv2 into zero-shot anomaly detectors. It combines a three-stage synthetic anomaly dataset generation scheme with parameter-efficient low-rank feature adapters and a confidence-weighted pixel loss. The central empirical claim is that this yields an average image-level AUROC of 94.1% across nine diverse datasets with the RADIO backbone, outperforming prior methods by 3.3 percentage points.
Significance. If the performance claims hold under rigorous validation, the work would be significant for zero-shot anomaly detection. It shows that modern VFMs can close the gap with VLM-based approaches through synthetic data augmentation and efficient adaptation rather than direct parameter fitting to target domains. This could enable more practical deployment in domains with scarce real anomalies, such as industrial inspection.
major comments (3)
- [§3] Synthetic Dataset Generation: The three-stage procedure is load-bearing for the zero-shot claim, yet the manuscript provides no quantitative validation (e.g., feature distribution distances, texture statistics, or scale histograms) that the generated anomalies match the visual properties of real anomalies across the nine evaluation domains. Without this, the reported gains may reflect memorization of the generation recipe rather than transferable anomaly cues.
- [Results section, Table 1] The 94.1% average AUROC and 3.3 pp improvement are presented without statistical significance tests, standard deviations across runs, or explicit train/test split details. This leaves moderate uncertainty about whether the central performance claim is robust.
- [§4.2] Ablation Studies: The ablations on adapter rank and loss weighting coefficient do not include controls that vary the synthetic generation parameters, making it impossible to isolate whether gains arise from the adaptation mechanism or from the specific synthetic distribution.
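The first major comment asks for a feature-distribution distance between synthetic and real anomalies. A minimal sketch of the standard unbiased squared-MMD estimator with a Gaussian kernel follows. The random arrays only stand in for VFM embeddings, and the fixed bandwidth sqrt(d) is an illustrative choice (a median-distance heuristic is a common alternative); none of this is taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (x**2).sum(1)[:, None] + (y**2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD between samples x (m x d) and y (n x d)."""
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    # Drop the diagonal self-similarities so the within-sample terms are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * kxy.mean())


# Stand-ins for embeddings of anomalous patches (random vectors, not real features).
rng = np.random.default_rng(0)
d = 16
real = rng.normal(size=(200, d))
synthetic_close = rng.normal(size=(200, d))         # same distribution as `real`
synthetic_far = rng.normal(loc=1.0, size=(200, d))  # shifted distribution
bandwidth = np.sqrt(d)

print(mmd2_unbiased(real, synthetic_close, bandwidth))  # near zero
print(mmd2_unbiased(real, synthetic_far, bandwidth))    # clearly positive
```

A small MMD between synthetic and real anomaly embeddings would support the load-bearing premise; a large one would point to the memorization risk the referee raises.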
minor comments (2)
- [Abstract] The abstract calls the 3.3 percentage-point margin over previous methods "significant" without naming the exact baselines or giving a per-dataset breakdown.
- [§4.1] Notation for the confidence-weighted loss could be clarified with an explicit equation reference to avoid ambiguity in the weighting term.
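The loss itself is not written out here, so the sketch below is only one plausible reading of a confidence-weighted pixel loss. It assumes a per-pixel confidence map c in [0, 1] that scales a binary cross-entropy and is normalised by the total weight. The function name, the normalisation, and where c would come from are all assumptions rather than the paper's definition, and the ledger's loss weighting coefficient is not modelled.

```python
import numpy as np


def confidence_weighted_bce(pred, target, confidence, eps=1e-7):
    """Hypothetical confidence-weighted pixel loss.

    L = sum_i c_i * BCE(p_i, y_i) / sum_i c_i, with c_i in [0, 1]:
    low-confidence pixels contribute less, and c_i = 0 removes a pixel entirely.
    """
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    total_weight = confidence.sum()
    if total_weight == 0:
        return 0.0
    return float((confidence * bce).sum() / total_weight)


pred = np.array([[0.9, 0.2], [0.6, 0.1]])    # predicted anomaly probabilities
target = np.array([[1.0, 0.0], [0.0, 0.0]])  # synthetic anomaly mask
uniform = np.ones_like(pred)                 # every pixel trusted equally
masked = np.array([[1.0, 1.0], [0.0, 1.0]])  # hypothetical: distrust the 0.6 pixel

print(confidence_weighted_bce(pred, target, uniform) > confidence_weighted_bce(pred, target, masked))  # True
```

With uniform confidence the expression reduces to the ordinary mean BCE, which is one way an explicit equation would make the role of the weighting term unambiguous.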
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects of rigor in validating the synthetic data, statistical robustness, and ablation design. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [§3] Synthetic Dataset Generation: The three-stage procedure is load-bearing for the zero-shot claim, yet the manuscript provides no quantitative validation (e.g., feature distribution distances, texture statistics, or scale histograms) that the generated anomalies match the visual properties of real anomalies across the nine evaluation domains. Without this, the reported gains may reflect memorization of the generation recipe rather than transferable anomaly cues.
Authors: We agree that explicit quantitative validation of the synthetic anomalies would better support the zero-shot transferability claim. Although the consistent gains across nine diverse, unseen datasets provide indirect evidence against pure memorization, we will add analyses in the revised manuscript, including feature distribution distances (e.g., MMD between VFM embeddings of synthetic vs. real anomalies), texture statistics (e.g., LBP histograms), and scale histograms. These will be reported for representative datasets to demonstrate alignment with real anomaly properties. Revision: yes.
- Referee: [Results section, Table 1] The 94.1% average AUROC and 3.3 pp improvement are presented without statistical significance tests, standard deviations across runs, or explicit train/test split details. This leaves moderate uncertainty about whether the central performance claim is robust.
Authors: We acknowledge that reporting variability and significance strengthens the central claim. In the revision, we will add standard deviations computed over multiple random seeds (at least 3 runs), paired statistical significance tests (e.g., t-tests) against prior methods, and explicit clarification of the evaluation protocol: zero-shot adaptation uses only the synthetic dataset with no in-domain real images, while test splits follow the standard benchmarks from prior zero-shot anomaly detection literature. Revision: yes.
- Referee: [§4.2] Ablation Studies: The ablations on adapter rank and loss weighting coefficient do not include controls that vary the synthetic generation parameters, making it impossible to isolate whether gains arise from the adaptation mechanism or from the specific synthetic distribution.
Authors: We will expand §4.2 with new ablation experiments that systematically vary synthetic generation parameters (e.g., anomaly density, scale ranges, and blending factors in the three-stage pipeline) while holding the adapter and loss fixed. This will help disentangle the contributions of the low-rank feature adapters and confidence-weighted pixel loss from the synthetic data characteristics, with results added as an additional table or figure. Revision: yes.
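The rebuttal commits to seed-level standard deviations and paired tests against prior methods. A minimal sketch of both follows, with one named substitution: an exact paired sign-flip permutation test stands in for the proposed t-test, because it needs no normality assumption and no SciPy dependency. The function names and all numbers are illustrative placeholders, not values from the paper.

```python
import itertools
import statistics


def mean_std(values):
    """Mean and sample standard deviation, e.g. of one method's AUROC over seeds."""
    return statistics.mean(values), statistics.stdev(values)


def paired_sign_flip_test(a, b):
    """Exact two-sided paired permutation test on per-dataset differences a_i - b_i.

    Under the null hypothesis each difference is equally likely to have either sign,
    so all 2^n sign patterns are enumerated and the p-value is the fraction whose
    |mean difference| is at least the observed one.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * dif for s, dif in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance guards against float round-off
            hits += 1
    return hits / 2**n


# Illustrative numbers only, not results from the paper.
seed_runs = [0.90, 0.92, 0.91]
method = [0.95, 0.91, 0.97, 0.93, 0.96, 0.90, 0.94, 0.92, 0.98]
baseline = [0.92, 0.90, 0.93, 0.91, 0.92, 0.88, 0.90, 0.91, 0.93]

print(mean_std(seed_runs))
print(paired_sign_flip_test(method, baseline))  # 0.00390625
```

With nine datasets there are only 2^9 = 512 sign patterns, so the smallest attainable two-sided p-value is 2/512 ≈ 0.0039, reached only when one method wins on every dataset. This bounds how strong a per-dataset significance claim can be on a nine-dataset benchmark.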
Circularity Check
No circularity: method uses independent synthetic data and external pretrained backbones
Full rationale
The paper's core pipeline trains low-rank adapters on procedurally generated synthetic anomalies (the three-stage scheme) and applies them zero-shot to nine real evaluation datasets. No equations or steps reduce by construction to parameters fitted on the target distributions, no self-citation chains are used to justify uniqueness or ansatzes, and no predictions are statistically forced by fitting to the inputs. The 94.1% AUROC claim is an external evaluation result, not a tautology. This is the normal self-contained case.
Axiom & Free-Parameter Ledger
free parameters (2)
- adapter rank
- loss weighting coefficient
axioms (1)
- Domain assumption: Pretrained vision foundation models contain transferable features useful for anomaly localization without task-specific fine-tuning.
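The two free parameters in this ledger, adapter rank and loss weighting coefficient, are the settings the existing ablations vary. The control the referee asks for, and the rebuttal promises, inverts this: hold both fixed and sweep only the synthetic generation settings. A minimal sketch of enumerating such a grid follows; the knob names echo the rebuttal's examples (anomaly density, scale ranges, blending factors), but every key, value, and function name is hypothetical.

```python
import itertools

# Hypothetical knobs named after the rebuttal's examples; the values are placeholders.
GENERATION_GRID = {
    "anomaly_density": [0.5, 1.0, 2.0],
    "scale_range": [(0.01, 0.05), (0.05, 0.20)],
    "blend_factor": [0.3, 0.7, 1.0],
}
# The two ledger free parameters, held fixed for every run (placeholder values).
FIXED = {"adapter_rank": 8, "loss_weight": 1.0}


def ablation_configs(grid, fixed):
    """Yield one config per point of the generation grid, adaptation settings unchanged."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(fixed)
        config.update(zip(keys, values))
        yield config


configs = list(ablation_configs(GENERATION_GRID, FIXED))
print(len(configs))  # 18
```

Because the adaptation settings never change across the 18 runs, any spread in AUROC across this grid is attributable to the synthetic distribution rather than to the adapters or the loss.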