Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation
Recent vision-language models (VLMs) like CLIP have shown impressive anomaly detection performance under significant distribution shift by utilizing high-level semantic information through text prompts. However, these models often overlook fine-grained defect cues, e.g., hole, cut, or scratch, that are essential for understanding the anomaly's nature. Moreover, the modality gap between images and text can lead to subtle visual evidence being poorly captured in textual descriptions. To address these gaps, we enhance the representation of "abnormal" with structured semantics, bridging coarse anomaly signals and fine-grained defect categories. We propose a hybrid prompting mechanism that combines human-readable descriptions of defect types with learnable token embeddings. Building on these ideas, we introduce DAPO, a Defect-aware Prompt Optimization framework for zero-shot multi-type and binary anomaly detection and segmentation under distribution shift. DAPO aligns anomaly-relevant visual features with their corresponding textual semantics by learning hybrid defect-aware prompts that combine fixed textual anchors with trainable token embeddings. We conducted experiments on public benchmarks (MPDD, VisA, MVTec-AD, MAD, and Real-IAD) and an internal dataset. The results suggest that, compared to the baseline models, DAPO achieves a 3.6% average improvement in AUROC and average precision at the image level under distribution shift, and a 5.2% average improvement in AUROC and F1 when localizing novel anomaly types under zero-shot settings.
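The hybrid prompting idea can be illustrated with a small sketch in the style of CoOp-like prompt learning: a few learnable context vectors are prepended to the fixed word embeddings of a human-readable defect description, and an image patch feature is scored against one prompt per class. Everything below is an assumption, not DAPO's implementation: the toy 16-dimensional embeddings, the hypothetical `word_embedding` lookup, the mean-pooling "text encoder" standing in for a frozen VLM text transformer, the anchor phrases, and the temperature `tau`. Only `ctx` would be trainable; the anchor word embeddings stay frozen.

```python
import math
import random
import zlib

DIM = 16  # toy embedding width (assumption; real VLM text encoders are much wider)


def word_embedding(word: str) -> list[float]:
    """Hypothetical frozen word embedding: a deterministic pseudo-random vector per word."""
    rng = random.Random(zlib.crc32(word.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def encode_prompt(ctx: list[list[float]], anchor: str) -> list[float]:
    """Toy text encoder: mean-pool learnable context vectors plus fixed anchor-word embeddings.

    A real system would pass these tokens through the VLM's frozen text transformer instead.
    """
    tokens = ctx + [word_embedding(w) for w in anchor.split()]
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(DIM)]
    return l2_normalize(pooled)


def classify_patch(patch: list[float], prompts: dict[str, list[float]],
                   tau: float = 0.07) -> dict[str, float]:
    """Softmax over temperature-scaled cosine similarities between a patch and each prompt."""
    p = l2_normalize(patch)
    logits = {k: sum(a * b for a, b in zip(p, e)) / tau for k, e in prompts.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


# Hypothetical learnable context vectors (4 tokens); only these would receive gradients.
ctx_rng = random.Random(0)
ctx = [[ctx_rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(4)]

# Fixed, human-readable textual anchors: one "normal" class plus example defect types.
anchors = {
    "normal": "flawless object",
    "hole": "object with a hole",
    "cut": "object with a cut",
    "scratch": "object with a scratch",
}
prompts = {k: encode_prompt(ctx, text) for k, text in anchors.items()}

# Toy patch feature copied from the "scratch" prompt, just to exercise the scoring path.
patch = list(prompts["scratch"])
probs = classify_patch(patch, prompts)
anomaly_score = 1.0 - probs["normal"]  # binary score: total mass on all defect types
predicted_type = max(probs, key=probs.get)  # multi-type label: most likely class
print(predicted_type, round(anomaly_score, 3))
```

Taking one minus the probability of the `normal` prompt gives a binary anomaly score, while the argmax over the defect prompts gives a type label; applying `classify_patch` to every patch feature of an image would give a coarse anomaly map for segmentation. How DAPO actually combines or trains these scores is not specified in the abstract.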
Forward citations
Cited by 1 Pith paper
- GenAU: Language-Grounded Industrial Anomaly Understanding with Vision-Language Models
  GenAU augments a vision-language model with segmentation tokens to unify image-level anomaly detection, pixel-level segmentation, multi-type classification, and language-based defect analysis in a single instruction-f...