MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

Feng Xue; Xurui Li; Yu Zhou

arxiv: 2511.10047 · v2 · submitted 2025-11-13 · 💻 cs.CV

Pith reviewed 2026-05-17 22:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords zero-shot anomaly detection · multimodal industrial anomaly · mutual scoring · anomaly segmentation · 3D anomaly detection · unsupervised classification · cross-modal fusion

The pith

Mutual scoring of unlabeled patches in 2D and 3D separates anomalies for zero-shot industrial detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that normal patches in industrial 2D images and 3D shapes usually match many others, while anomalies stay unique and scattered. It introduces MuSc-V2, a framework that uses this property through mutual scoring within modalities and cross-modal enhancement to classify and segment defects without any labeled training data. This matters because the approach delivers large accuracy gains on benchmark datasets and performs well even on smaller data subsets or with only one modality. A reader interested in practical inspection systems would see value in a label-free method that adapts across different products.

Core claim

The central claim is that the Mutual Scoring framework (MuSc-V2), built on Iterative Point Grouping, Similarity Neighborhood Aggregation with Multi-Degrees, Mutual Scoring Mechanism, Cross-modal Anomaly Enhancement, and Re-scoring with Constrained Neighborhood, leverages the discriminative property of normal patch similarities versus anomaly isolation to deliver strong zero-shot anomaly classification and segmentation performance in multimodal settings.

What carries the argument

The Mutual Scoring Mechanism (MSM) lets samples score each other within each modality. Cross-modal Anomaly Enhancement (CAE) then fuses the 2D and 3D scores to recover anomalies that a single modality misses.
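The idea of samples scoring each other can be illustrated with a minimal sketch. This page does not give the paper's scoring formula, so everything here is an assumption for illustration: the function name `mutual_scores`, the use of Euclidean distance, the nearest-patch "vote" from each other image, and the hypothetical `keep_fraction` knob that averages only the smallest votes.

```python
import math

def mutual_scores(images, keep_fraction=0.3):
    """Score every patch by its distance to the best match in other images.

    images: list of images, each a list of equal-length patch feature tuples.
    keep_fraction: hypothetical knob, share of the smallest votes averaged.
    """
    all_scores = []
    for i, patches in enumerate(images):
        image_scores = []
        for p in patches:
            # One vote per other image: distance from p to its closest patch.
            votes = sorted(
                min(math.dist(p, q) for q in other)
                for j, other in enumerate(images)
                if j != i
            )
            # Average only the smallest votes, so a single odd reference
            # image cannot inflate the score of a normal patch.
            k = max(1, math.ceil(keep_fraction * len(votes)))
            image_scores.append(sum(votes[:k]) / k)
        all_scores.append(image_scores)
    return all_scores

images = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (1.0, 0.9)],
    [(0.0, 0.1), (5.0, 5.0)],  # second patch plays the anomaly
]
scores = mutual_scores(images)
```

In this toy pool the patch at `(5.0, 5.0)` has no close match in any other image and receives the largest score, while every normal patch finds at least one near neighbour elsewhere and stays low.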

If this is right

Delivers a 23.7 percent AP gain on the MVTec 3D-AD dataset.
Delivers a 19.3 percent boost on the Eyecandies dataset.
Surpasses previous zero-shot benchmarks and outperforms most few-shot methods.
Maintains robust performance when applied to the full dataset or smaller subsets.
Supports flexible use with 2D only, 3D only, or combined modalities.
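For readers unfamiliar with the AP metric behind the headline numbers, here is a minimal sketch of average precision computed from ranked anomaly scores. It uses the common step-wise definition (mean of the precision values at each true positive). Whether the paper reports image-level or pixel-level AP, and how it handles ties, is not stated on this page, so treat the function as illustrative only.

```python
def average_precision(labels, scores):
    """Step-wise average precision.

    labels: 1 for anomalous, 0 for normal.
    scores: anomaly scores, higher means more anomalous.
    Ties are broken by input order, which is an arbitrary choice.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            # Precision at the rank where this positive is retrieved.
            precision_sum += true_pos / rank
    return precision_sum / n_pos

ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example the two anomalies are retrieved at ranks 1 and 3, so the value is (1/1 + 2/3) / 2.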

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could extend to other clustering-based unsupervised tasks where normal examples share common features.
A test on datasets containing anomalies that mimic normal patterns would check the limits of the similarity assumption.
Combining the mutual scoring with additional sensor types might improve robustness in real-world manufacturing lines.

Load-bearing premise

Normal image patches across industrial products typically find many other similar patches in both 2D appearance and 3D shapes, while anomalies remain diverse and isolated.

What would settle it

Finding an industrial dataset where anomalous patches show as many mutual similarities as normal patches would falsify the separation mechanism.
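Such a check could be run as a simple neighbor-count comparison. The sketch below is an assumption-laden illustration, not the paper's procedure: `radius` is a hypothetical similarity threshold, patches are plain feature tuples, and ground-truth labels are used only to evaluate the premise, never to score.

```python
import math

def neighbor_counts(patches, radius):
    """For each patch, count the other patches within `radius` of it."""
    return [
        sum(
            1
            for j, q in enumerate(patches)
            if j != i and math.dist(p, q) <= radius
        )
        for i, p in enumerate(patches)
    ]

def mean_count_by_label(patches, labels, radius):
    """Return (mean count for normal patches, mean count for anomalous ones)."""
    counts = neighbor_counts(patches, radius)
    normal = [c for c, y in zip(counts, labels) if y == 0]
    anomalous = [c for c, y in zip(counts, labels) if y == 1]
    return sum(normal) / len(normal), sum(anomalous) / len(anomalous)

patches = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0), (-4.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1]
normal_mean, anomalous_mean = mean_count_by_label(patches, labels, radius=0.5)
```

If the two means were close on a real dataset across a range of radii, the separation the method relies on would be in doubt.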

Figures

Figures reproduced from arXiv: 2511.10047 by Feng Xue, Xurui Li, Yu Zhou.

**Figure 2.** The pipeline of our MuSc-V2. This framework processes 2D images and 3D point clouds through four important innovations: (1) IPG replaces the current grouping strategy in the point transformer to generate groups with continuous surfaces (Sec. III-A). (2) SNAMD improves the abnormal modeling ability with varying sizes for both modals (Sec. III-B). (3) MSM obtains anomaly segmentation results of 2D/3D modals.…

**Figure 3.** Toy example of searching $K_P$ points for the center point $p_c$. The green lines and regions represent the candidate points, and the blue ones indicate the searched points as the group points of $p_c$. A. 2D/3D Patch Representation. 2D Feature Extraction. Following [15], [16], [52], we adopt a vision transformer [23] consisting of $S$ stages to extract hierarchical 2D features. For image $I_i$, we define the patch toke…

**Figure 4.** Similarity-Weighted Pooling (SWPooling) Versus Average Pooling (APooling). Top: One toy example represents feature maps aggregated by two aggregation methods, where blue patches and red patches simulate normal and abnormal tokens, respectively. Bottom: The visualization of segmentation results with SWPooling and APooling by one real example. where $F^{i,s}(m) \in \mathbb{R}^{1 \times C}$ denotes the feature vector of patch $m$, and…

**Figure 6.** Two examples whose anomalies exhibit single-modality prominence: (a) 3D-visible peach anomaly, (b) 2D-detectable carrot anomaly. …tor of the point cloud $P_i$ is denoted as $A_P^{i} = [a_P^{i,1}, \ldots, a_P^{i,M_P}]^{\top}$, where $a_P^{i,n}$ represents the anomaly score of the $n$-th 3D patch. Cross-modal Anomaly Enhancement. Our mutual scoring mechanism achieves strong patch-level anomaly detection within each modality, yet f…

**Figure 8.** Visualization of anomaly segmentation results on MVTec 3D-AD and Eyecandies benchmarks. 3D modal and multimodal (MM) results are displayed.

**Figure 9.** Visualization of anomaly segmentation on MVTec 3D-AD and…

**Figure 10.** Four anomaly segmentation metrics with different normal sample…

**Figure 11.** Experimental results of the influence of four hyperparameters on…
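The Figure 2 caption says IPG replaces the grouping strategy used in the point transformer. As background, the sketch below shows a grouping scheme that is common in point-cloud transformers: farthest point sampling [53] to pick centers, followed by k-nearest-neighbor grouping. This is an assumed baseline for illustration, not IPG itself, and the function names are hypothetical.

```python
import math

def farthest_point_sampling(points, n_centers, start=0):
    """Greedily pick indices so each new center is far from those chosen."""
    centers = [start]
    # Distance from every point to its nearest already-chosen center.
    min_d = [math.dist(p, points[start]) for p in points]
    while len(centers) < n_centers:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], math.dist(p, points[nxt]))
    return centers

def knn_group(points, center_idx, k):
    """Indices of the k points closest to the center (the center included)."""
    c = points[center_idx]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
    return order[:k]

points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 0.0, 0.0), (5.1, 0.0, 0.0), (5.0, 0.1, 0.0),
]
centers = farthest_point_sampling(points, n_centers=2)
groups = [knn_group(points, c, k=3) for c in centers]
```

Because plain k-nearest-neighbor grouping looks only at Euclidean distance, a group can straddle two surfaces that are close in space but not connected. The abstract attributes false positives to such discontinuous surfaces, which is the problem IPG is described as addressing.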

read the original abstract

Zero-shot anomaly classification (AC) and segmentation (AS) methods aim to identify and outline defects without using any labeled samples. In this paper, we reveal a key property that is overlooked by existing methods: normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated. To explicitly leverage this discriminative property, we propose a Mutual Scoring framework (MuSc-V2) for zero-shot AC/AS, which flexibly supports single 2D/3D or multimodality. Specifically, our method begins by improving 3D representation through Iterative Point Grouping (IPG), which reduces false positives from discontinuous surfaces. Then we use Similarity Neighborhood Aggregation with Multi-Degrees (SNAMD) to fuse 2D/3D neighborhood cues into more discriminative multi-scale patch features for mutual scoring. The core comprises a Mutual Scoring Mechanism (MSM) that lets samples within each modality to assign score to each other, and Cross-modal Anomaly Enhancement (CAE) that fuses 2D and 3D scores to recover modality-specific missing anomalies. Finally, Re-scoring with Constrained Neighborhood (RsCon) suppresses false classification based on similarity to more representative samples. Our framework flexibly works on both the full dataset and smaller subsets with consistently robust performance, ensuring seamless adaptability across diverse product lines. In aid of the novel framework, MuSc-V2 achieves significant performance improvements: a +23.7% AP gain on the MVTec 3D-AD dataset and a +19.3% boost on the Eyecandies dataset, surpassing previous zero-shot benchmarks and even outperforming most few-shot methods. The code will be available at The code will be available at https://github.com/HUST-SLOW/MuSc-V2.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MuSc-V2's mutual scoring idea is a reasonable way to use unlabeled data for zero-shot anomaly detection, but the supporting evidence for its key assumption is still missing.

The paper's central move is to notice that normal industrial patches tend to have many similar counterparts in both appearance and shape, while defects are more unique. The authors turn that observation into a scoring system that works without any labeled anomalies. What stands out is the specific stack: Iterative Point Grouping (IPG) to handle 3D surfaces better, Similarity Neighborhood Aggregation with Multi-Degrees (SNAMD), the mutual scoring itself (MSM), cross-modal fusion (CAE), and a final constrained re-scoring step (RsCon). This combination lets the method handle 2D, 3D, or both, and it claims to work even on smaller subsets of the data. The reported gains (+23.7% AP on MVTec 3D-AD and +19.3% on Eyecandies) are large enough to get attention.

The main weakness is that we don't see direct evidence for the neighbor property the method relies on: no plots of how many neighbors normal patches get versus anomalous ones, and no check on how sensitive the results are to the similarity thresholds. The performance numbers could be driven mostly by the 3D representation fixes rather than the scoring logic. If the separation between normal and anomalous neighborhoods is weak on other data, the gains won't transfer.

This work is aimed at computer vision researchers focused on industrial quality control who want zero-shot options. Anyone building inspection systems without defect examples could get practical value if the method proves robust. It has enough of a concrete proposal and claimed results to merit a full referee process. I would recommend sending it out for review, but ask the referees to look closely at whether the mutual scoring actually adds value beyond the feature improvements.

Referee Report

2 major / 2 minor

Summary. The manuscript presents MuSc-V2, a zero-shot framework for multimodal industrial anomaly classification and segmentation. It identifies a key property that normal patches have many similar counterparts in 2D appearance and 3D shapes, while anomalies are diverse and isolated. The method incorporates Iterative Point Grouping (IPG) to improve 3D representations, Similarity Neighborhood Aggregation with Multi-Degrees (SNAMD) for multi-scale features, Mutual Scoring Mechanism (MSM), Cross-modal Anomaly Enhancement (CAE), and Re-scoring with Constrained Neighborhood (RsCon). Experiments on MVTec 3D-AD and Eyecandies datasets report substantial performance gains of +23.7% AP and +19.3% respectively over prior zero-shot methods.

Significance. If the central claims hold, this work would represent a notable advance in zero-shot anomaly detection by explicitly leveraging inter-sample similarities in an unlabeled pool for both 2D and 3D modalities. The reported outperformance over most few-shot methods is particularly striking. The planned code release at https://github.com/HUST-SLOW/MuSc-V2 enhances reproducibility.

major comments (2)

[Introduction] Introduction (key property paragraph): The foundational claim that 'normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated' is presented without any quantitative support such as neighbor-count histograms, average in-neighborhood sizes, or intra-class vs. inter-class similarity distributions on MVTec 3D-AD or Eyecandies. This assumption is load-bearing for the MSM and the reported +23.7% AP gain, as the mutual scoring separation depends on a reliable gap in neighborhood counts.
[Experiments] Experiments section (main results table): The performance tables show overall AP improvements, but no ablation isolates the contribution of MSM + CAE + RsCon from the IPG and SNAMD components alone. Without such controls it remains possible that the gains derive primarily from the 3D representation improvements rather than the mutual-scoring logic itself.

minor comments (2)

[Abstract] Abstract: The sentence 'The code will be available at The code will be available at https://github.com/HUST-SLOW/MuSc-V2' contains a duplicated phrase that should be corrected.
[Method] Method: The aggregation steps in SNAMD would benefit from explicit pseudocode or a small diagram showing how multi-degree neighborhoods are fused across modalities.
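To make the request for SNAMD pseudocode concrete, here is one possible reading of multi-degree aggregation with similarity-weighted pooling on a 2D patch grid. The paper's actual formulation is not given on this page. The weight `exp(cosine similarity to the center patch)`, the window sizes `(1, 3, 5)`, and the function names are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sw_pool(grid, row, col, r):
    """Pool the r x r window at (row, col), weighting each neighbour
    by exp(cosine similarity to the centre patch)."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    center = grid[row][col]
    half = r // 2
    weights, feats = [], []
    for i in range(max(0, row - half), min(h, row + half + 1)):
        for j in range(max(0, col - half), min(w, col + half + 1)):
            weights.append(math.exp(cosine(center, grid[i][j])))
            feats.append(grid[i][j])
    total = sum(weights)
    return [
        sum(wt * f[k] for wt, f in zip(weights, feats)) / total
        for k in range(c)
    ]

def multi_degree_features(grid, degrees=(1, 3, 5)):
    """Return {r: pooled grid} with one pooled feature map per window size."""
    h, w = len(grid), len(grid[0])
    return {
        r: [[sw_pool(grid, i, j, r) for j in range(w)] for i in range(h)]
        for r in degrees
    }

# 3 x 3 grid of identical "normal" patches with one outlier in the centre.
grid = [[(1.0, 0.0)] * 3 for _ in range(3)]
grid[1][1] = (0.0, 1.0)
pooled = multi_degree_features(grid, degrees=(1, 3))
```

With plain average pooling over the 3 x 3 window, the outlier would contribute 1/9 of the pooled feature at the center. Under the similarity weighting assumed here it contributes e / (8 + e), about 0.25, so the anomalous signal is diluted less, which is the behaviour the Figure 4 caption attributes to SWPooling.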

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, proposing targeted revisions to strengthen the manuscript while maintaining its core contributions.

Referee: [Introduction] The foundational claim that 'normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated' is presented without any quantitative support such as neighbor-count histograms, average in-neighborhood sizes, or intra-class vs. inter-class similarity distributions on MVTec 3D-AD or Eyecandies. This assumption is load-bearing for the MSM and the reported +23.7% AP gain, as the mutual scoring separation depends on a reliable gap in neighborhood counts.

Authors: We acknowledge that the key property is introduced as an empirical observation without explicit quantitative backing in the current introduction. This property emerged from our analysis of patch distributions in the target datasets and is validated indirectly through the method's performance. To directly address the concern and reinforce the foundation for MSM, we will add quantitative analyses—including neighbor-count histograms, average in-neighborhood sizes, and intra- vs. inter-class similarity distributions—on MVTec 3D-AD and Eyecandies in the revised introduction and/or a new supplementary section. revision: yes
Referee: [Experiments] The performance tables show overall AP improvements, but no ablation isolates the contribution of MSM + CAE + RsCon from the IPG and SNAMD components alone. Without such controls it remains possible that the gains derive primarily from the 3D representation improvements rather than the mutual-scoring logic itself.

Authors: We agree that a more granular ablation would help isolate the impact of the mutual scoring logic (MSM, CAE, RsCon) from the representation enhancements (IPG, SNAMD). The existing experiments include module-level ablations and overall framework results, but we recognize the value of a dedicated control experiment. In the revision, we will add an ablation study that evaluates the mutual scoring components on top of the base IPG+SNAMD features to clarify their specific contributions to the reported gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity; heuristic framework rests on explicit empirical assumption

The paper states an observed property of normal patches sharing 2D/3D neighbors while anomalies are isolated, then builds MSM + SNAMD + CAE + RsCon to exploit it for scoring. No equations or steps reduce a claimed prediction back to a fitted parameter or self-citation by construction; the reported AP gains are presented as experimental outcomes on MVTec 3D-AD and Eyecandies rather than a closed derivation. The central premise is falsifiable via neighbor-count statistics on the target datasets and does not import uniqueness theorems or rename prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that normal patches exhibit high mutual similarity while anomalies are isolated; the method introduces several new algorithmic components whose internal hyperparameters are not detailed in the abstract.

axioms (1)

Domain assumption: Normal patches find many similar counterparts in 2D and 3D, while anomalies are diverse and isolated.
This property is stated as the key overlooked fact that the mutual scoring framework exploits.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Mutual Scoring Mechanism (MSM) that lets samples within each modality to assign score to each other"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages

[1] Q. Chen, H. Luo, C. Lv, and Z. Zhang, “A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,” in Eur. Conf. Comput. Vis., 2025
[2] Y. Cao, X. Xu, Z. Liu, and W. Shen, “Collaborative discrepancy optimization for reliable image anomaly localization,” IEEE Trans. Ind. Inform., 2023
[3] H. Zhang, Z. Wang, D. Zeng, Z. Wu, and Y.-G. Jiang, “Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2025
[4] Q. Chen, H. Luo, H. Yao, W. Luo, Z. Qu, C. Lv, and Z. Zhang, “Center-aware residual anomaly synthesis for multiclass industrial anomaly detection,” IEEE Trans. Ind. Inform., 2025
[5] N. Madan, N.-C. Ristea, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised masked convolutional transformer block for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2023
[6] H. Li, J. Hu, B. Li, H. Chen, Y. Zheng, and C. Shen, “Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval,” IEEE Trans. Image Process., 2024
[7] C. Qiu, M. Kloft, S. Mandt, and M. Rudolph, “Self-supervised anomaly detection with neural transformations,” IEEE Trans. Pattern Anal. Mach. Intell., 2024
[8] H. Yao, Y. Cao, W. Luo, W. Zhang, W. Yu, and W. Shen, “Prior normality prompt transformer for multiclass industrial image anomaly detection,” IEEE Trans. Ind. Inform., 2024
[9] G. Xie, J. Wang, J. Liu, F. Zheng, and Y. Jin, “Pushing the limits of fewshot anomaly detection in industry vision: Graphcore,” in Int. Conf. Learn. Represent., 2023
[10] S. Ma, K. Song, M. Niu, H. Tian, Y. Wang, and Y. Yan, “Shape-consistent one-shot unsupervised domain adaptation for rail surface defect segmentation,” IEEE Trans. Ind. Inform., 2023
[11] J. Su, H. Shen, L. Peng, and D. Hu, “Few-shot domain-adaptive anomaly detection for cross-site brain images,” IEEE Trans. Pattern Anal. Mach. Intell., 2021
[12] J. Zhu and G. Pang, “Toward generalist anomaly detection via in-context residual learning with few-shot sample prompts,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[13] X. Li, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Xie, and L. Ma, “Promptad: Learning prompts with only normal samples for few-shot anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[14] C. Huang, A. Jiang, J. Feng, Y. Zhang, X. Wang, and Y. Wang, “Adapting visual-language models for generalizable anomaly detection in medical images,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[15] J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “Winclip: Zero-/few-shot anomaly classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023
[16] X. Chen, Y. Han, and J. Zhang, “A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad,” arXiv preprint arXiv:2305.17382, 2023
[17] A. Li, C. Qiu, M. Kloft, P. Smyth, M. Rudolph, and S. Mandt, “Zero-shot anomaly detection via batch normalization,” in Adv. Neural Inform. Process. Syst., 2023
[18] Y. Wang, J. Peng, J. Zhang, R. Yi, Y. Wang, and C. Wang, “Multimodal industrial anomaly detection via hybrid fusion,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023
[19] E. Horwitz and Y. Hoshen, “Back to the feature: classical 3d features are (almost) all you need for 3d anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023
[20] C. Wang, H. Zhu, J. Peng, Y. Wang, R. Yi, Y. Wu, L. Ma, and J. Zhang, “M3dm-nr: Rgb-3d noisy-resistant industrial anomaly detection via multimodal denoising,” IEEE Trans. Pattern Anal. Mach. Intell., 2025
[21] Q. Zhou, J. Yan, S. He, W. Meng, and J. Chen, “Pointad: Comprehending 3d anomalies from points and pixels for zero-shot 3d anomaly detection,” Adv. Neural Inform. Process. Syst., 2024
[22] X. Li, Z. Huang, F. Xue, and Y. Zhou, “Musc: Zero-shot industrial anomaly classification and segmentation with mutual scoring of the unlabeled images,” in Int. Conf. Learn. Represent., 2024
[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. Learn. Represent., 2020
[24] H. Zhao, L. Jiang, J. Jia, P. H. Torr, and V. Koltun, “Point transformer,” in Int. Conf. Comput. Vis., 2021
[25] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Int. Conf. Mach. Learn., 2021
[26] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Int. Conf. Comput. Vis., 2021
[27] Y. Pang, W. Wang, F. E. Tay, W. Liu, Y. Tian, and L. Yuan, “Masked autoencoders for point cloud self-supervised learning,” in Eur. Conf. Comput. Vis., 2022
[28] X. Yu, L. Tang, Y. Rao, T. Huang, J. Zhou, and J. Lu, “Point-bert: Pre-training 3d point cloud transformers with masked point modeling,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022
[29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Int. Conf. Comput. Vis., 2021
[30] H. Ding, C. Liu, S. Wang, and X. Jiang, “Vlt: Vision-language transformer and query generation for referring segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., 2022
[31] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2017
[32] Q. Zhang, J. Hou, Y. Qian, Y. Zeng, J. Zhang, and Y. He, “Flattening-net: Deep regular 2d representation for 3d point cloud analysis,” IEEE Trans. Pattern Anal. Mach. Intell., 2023
[33] S. Chen, H. Zhu, M. Li, X. Chen, P. Guo, Y. Lei, G. Yu, T. Li, and T. Chen, “Vote2cap-detr++: Decoupling localization and describing for end-to-end 3d dense captioning,” IEEE Trans. Pattern Anal. Mach. Intell., 2024
[34] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, and S.-M. Hu, “Pct: Point cloud transformer,” Comput. Vis. Media, 2021
[35] B. Wang, Z. Tian, A. Ye, F. Wen, S. Du, and Y. Gao, “Generative variational-contrastive learning for self-supervised point cloud representation,” IEEE Trans. Pattern Anal. Mach. Intell., 2024
[36] X. Wu, Y. Lao, L. Jiang, X. Liu, and H. Zhao, “Point transformer v2: Grouped vector attention and partition-based pooling,” Adv. Neural Inform. Process. Syst., 2022
[37] X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[38] Y. Cao, J. Zhang, L. Frittoli, Y. Cheng, W. Shen, and G. Boracchi, “Adaclip: Adapting clip with hybrid learnable prompts for zero-shot anomaly detection,” in Eur. Conf. Comput. Vis., 2024
[39] Z. Qu, X. Tao, M. Prasad, F. Shen, Z. Zhang, X. Gong, and G. Ding, “Vcp-clip: A visual context prompting model for zero-shot anomaly segmentation,” in Eur. Conf. Comput. Vis., 2024
[40] Y. Li, A. Goodge, F. Liu, and C.-S. Foo, “Promptad: Zero-shot anomaly detection using text prompts,” in Winter Conf. Appl. Comput. Vis., 2024
[41] Z. Gu, B. Zhu, G. Zhu, Y. Chen, H. Li, M. Tang, and J. Wang, “Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,” in ACM Int. Conf. Multimedia, 2024
[42] T. Aota, L. T. T. Tong, and T. Okatani, “Zero-shot versus many-shot: Unsupervised texture anomaly detection,” in Winter Conf. Appl. Comput. Vis., 2023
[43] Z. Zhou, L. Wang, N. Fang, Z. Wang, L. Qiu, and S. Zhang, “R3d-ad: Reconstruction via diffusion for 3d anomaly detection,” in Eur. Conf. Comput. Vis., 2025
[44] W. Li, X. Xu, Y. Gu, B. Zheng, S. Gao, and Y. Wu, “Towards scalable 3d anomaly detection and localization: A benchmark via 3d anomaly synthesis and a self-supervised learning network,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[45] R. Chen, G. Xie, J. Liu, J. Wang, Z. Luo, J. Wang, and F. Zheng, “Easynet: An easy network for 3d industrial anomaly detection,” in ACM Int. Conf. Multimedia, 2023
[46] Y.-M. Chu, C. Liu, T.-I. Hsieh, H.-T. Chen, and T.-L. Liu, “Shape-guided dual-memory learning for 3d anomaly detection,” in Int. Conf. Mach. Learn., 2023
[47] A. Costanzino, P. Z. Ramirez, G. Lisanti, and L. Di Stefano, “Multimodal industrial anomaly detection by crossmodal feature mapping,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024
[48] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, “Ranking on data manifolds,” Adv. Neural Inform. Process. Syst., 2003
[49] T. Lin and H. Zha, “Riemannian manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2008
[50] B. Wang and Z. Tu, “Affinity learning via self-diffusion for image segmentation and clustering,” in IEEE Conf. Comput. Vis. Pattern Recog., 2012
[51] Z. Zhang, J. Wang, and H. Zha, “Adaptive manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2011
[52] Q. Zhou, G. Pang, Y. Tian, S. He, and J. Chen, “Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection,” in Int. Conf. Learn. Represent., 2024
[53] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi, “The farthest point strategy for progressive image sampling,” IEEE Trans. Image Process., 1997
[54] M. P. Do Carmo, Differential geometry of curves and surfaces: revised and updated second edition. Courier Dover Publications, 2016
[55] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, “Towards total recall in industrial anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022
[56] J. Jiang, B. Wang, and Z. Tu, “Unsupervised metric learning by self-smoothing operator,” in Int. Conf. Comput. Vis., 2011
[57] P. Bergmann, X. Jin, D. Sattlegger, and C. Steger, “The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,” in Int. Conf. Comput. Vis. Theor. Appl., 2021
[58] J. Zhu, Y.-S. Ong, C. Shen, and G. Pang, “Fine-grained abnormality prompt learning for zero-shot anomaly detection,” in Int. Conf. Comput. Vis., 2025
[59] X. Zhu, R. Zhang, B. He, Z. Guo, Z. Zeng, Z. Qin, S. Zhang, and P. Gao, “Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,” in Int. Conf. Comput. Vis., 2023
[60] L. Xue, M. Gao, C. Xing, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, “Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023

work page 2023
[61]

Ulip-2: Towards scalable multimodal pre-training for 3d understanding,

L. Xue, N. Yu, S. Zhang, A. Panagopoulou, J. Li, R. Mart ´ın-Mart´ın, J. Wu, C. Xiong, R. Xu, J. C. Niebleset al., “Ulip-2: Towards scalable multimodal pre-training for 3d understanding,” inIEEE Conf. Comput. Vis. Pattern Recog., 2024

work page 2024
[62]

Towards zero-shot 3d anomaly localization,

Y . Wang, K.-C. Peng, and Y . Fu, “Towards zero-shot 3d anomaly localization,” inWinter Conf. Appl. Comput. Vis., 2025

work page 2025
[63]

The eyecandies dataset for unsupervised multimodal anomaly detection and localization,

L. Bonfiglioli, M. Toschi, D. Silvestri, N. Fioraio, and D. De Gregorio, “The eyecandies dataset for unsupervised multimodal anomaly detection and localization,” inAsian Conf. Comput. Vis., 2022

work page 2022
[64]

Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection,

P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection,” inIEEE Conf. Comput. Vis. Pattern Recog., 2019

work page 2019
[65]

Spot-the- difference self-supervised pre-training for anomaly detection and seg- mentation,

Y . Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the- difference self-supervised pre-training for anomaly detection and seg- mentation,” inEur. Conf. Comput. Vis., 2022

work page 2022
[66]

Beyond single-modal boundary: Cross-modal anomaly detection through visual prototype and harmonization,

K. Mao, P. Wei, Y . Lian, Y . Wang, and N. Zheng, “Beyond single-modal boundary: Cross-modal anomaly detection through visual prototype and harmonization,” inIEEE Conf. Comput. Vis. Pattern Recog., 2025

work page 2025
[67]

Rareclip: Rarity-aware online zero- shot industrial anomaly detection,

J. He, M. Cao, S. Peng, and Q. Xie, “Rareclip: Rarity-aware online zero- shot industrial anomaly detection,” inInt. Conf. Comput. Vis., 2025

work page 2025

[1] Q. Chen, H. Luo, C. Lv, and Z. Zhang, “A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,” in Eur. Conf. Comput. Vis., 2025.

[2] Y. Cao, X. Xu, Z. Liu, and W. Shen, “Collaborative discrepancy optimization for reliable image anomaly localization,” IEEE Trans. Ind. Inform., 2023.

[3] H. Zhang, Z. Wang, D. Zeng, Z. Wu, and Y.-G. Jiang, “Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2025.

[4] Q. Chen, H. Luo, H. Yao, W. Luo, Z. Qu, C. Lv, and Z. Zhang, “Center-aware residual anomaly synthesis for multiclass industrial anomaly detection,” IEEE Trans. Ind. Inform., 2025.

[5] N. Madan, N.-C. Ristea, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised masked convolutional transformer block for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2023.

[6] H. Li, J. Hu, B. Li, H. Chen, Y. Zheng, and C. Shen, “Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval,” IEEE Trans. Image Process., 2024.

[7] C. Qiu, M. Kloft, S. Mandt, and M. Rudolph, “Self-supervised anomaly detection with neural transformations,” IEEE Trans. Pattern Anal. Mach. Intell., 2024.

[8] H. Yao, Y. Cao, W. Luo, W. Zhang, W. Yu, and W. Shen, “Prior normality prompt transformer for multiclass industrial image anomaly detection,” IEEE Trans. Ind. Inform., 2024.

[9] G. Xie, J. Wang, J. Liu, F. Zheng, and Y. Jin, “Pushing the limits of fewshot anomaly detection in industry vision: Graphcore,” in Int. Conf. Learn. Represent., 2023.

[10] S. Ma, K. Song, M. Niu, H. Tian, Y. Wang, and Y. Yan, “Shape-consistent one-shot unsupervised domain adaptation for rail surface defect segmentation,” IEEE Trans. Ind. Inform., 2023.

[11] J. Su, H. Shen, L. Peng, and D. Hu, “Few-shot domain-adaptive anomaly detection for cross-site brain images,” IEEE Trans. Pattern Anal. Mach. Intell., 2021.

[12] J. Zhu and G. Pang, “Toward generalist anomaly detection via in-context residual learning with few-shot sample prompts,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[13] X. Li, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Xie, and L. Ma, “Promptad: Learning prompts with only normal samples for few-shot anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[14] C. Huang, A. Jiang, J. Feng, Y. Zhang, X. Wang, and Y. Wang, “Adapting visual-language models for generalizable anomaly detection in medical images,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[15] J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “Winclip: Zero-/few-shot anomaly classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023.

[16] X. Chen, Y. Han, and J. Zhang, “A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad,” arXiv preprint arXiv:2305.17382, 2023.

[17] A. Li, C. Qiu, M. Kloft, P. Smyth, M. Rudolph, and S. Mandt, “Zero-shot anomaly detection via batch normalization,” in Adv. Neural Inform. Process. Syst., 2023.

[18] Y. Wang, J. Peng, J. Zhang, R. Yi, Y. Wang, and C. Wang, “Multimodal industrial anomaly detection via hybrid fusion,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023.

[19] E. Horwitz and Y. Hoshen, “Back to the feature: classical 3d features are (almost) all you need for 3d anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023.

[20] C. Wang, H. Zhu, J. Peng, Y. Wang, R. Yi, Y. Wu, L. Ma, and J. Zhang, “M3dm-nr: Rgb-3d noisy-resistant industrial anomaly detection via multimodal denoising,” IEEE Trans. Pattern Anal. Mach. Intell., 2025.

[21] Q. Zhou, J. Yan, S. He, W. Meng, and J. Chen, “Pointad: Comprehending 3d anomalies from points and pixels for zero-shot 3d anomaly detection,” Adv. Neural Inform. Process. Syst., 2024.

[22] X. Li, Z. Huang, F. Xue, and Y. Zhou, “Musc: Zero-shot industrial anomaly classification and segmentation with mutual scoring of the unlabeled images,” in Int. Conf. Learn. Represent., 2024.

[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. Learn. Represent., 2020.

[24] H. Zhao, L. Jiang, J. Jia, P. H. Torr, and V. Koltun, “Point transformer,” in Int. Conf. Comput. Vis., 2021.

[25] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Int. Conf. Mach. Learn., 2021.

[26] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Int. Conf. Comput. Vis., 2021.

[27] Y. Pang, W. Wang, F. E. Tay, W. Liu, Y. Tian, and L. Yuan, “Masked autoencoders for point cloud self-supervised learning,” in Eur. Conf. Comput. Vis., 2022.

[28] X. Yu, L. Tang, Y. Rao, T. Huang, J. Zhou, and J. Lu, “Point-bert: Pre-training 3d point cloud transformers with masked point modeling,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022.

[29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Int. Conf. Comput. Vis., 2021.

[30] H. Ding, C. Liu, S. Wang, and X. Jiang, “Vlt: Vision-language transformer and query generation for referring segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., 2022.

[31] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2017.

[32] Q. Zhang, J. Hou, Y. Qian, Y. Zeng, J. Zhang, and Y. He, “Flattening-net: Deep regular 2d representation for 3d point cloud analysis,” IEEE Trans. Pattern Anal. Mach. Intell., 2023.

[33] S. Chen, H. Zhu, M. Li, X. Chen, P. Guo, Y. Lei, G. Yu, T. Li, and T. Chen, “Vote2cap-detr++: Decoupling localization and describing for end-to-end 3d dense captioning,” IEEE Trans. Pattern Anal. Mach. Intell., 2024.

[34] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, and S.-M. Hu, “Pct: Point cloud transformer,” Comput. Vis. Media, 2021.

[35] B. Wang, Z. Tian, A. Ye, F. Wen, S. Du, and Y. Gao, “Generative variational-contrastive learning for self-supervised point cloud representation,” IEEE Trans. Pattern Anal. Mach. Intell., 2024.

[36] X. Wu, Y. Lao, L. Jiang, X. Liu, and H. Zhao, “Point transformer v2: Grouped vector attention and partition-based pooling,” Adv. Neural Inform. Process. Syst., 2022.

[37] X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[38] Y. Cao, J. Zhang, L. Frittoli, Y. Cheng, W. Shen, and G. Boracchi, “Adaclip: Adapting clip with hybrid learnable prompts for zero-shot anomaly detection,” in Eur. Conf. Comput. Vis., 2024.

[39] Z. Qu, X. Tao, M. Prasad, F. Shen, Z. Zhang, X. Gong, and G. Ding, “Vcp-clip: A visual context prompting model for zero-shot anomaly segmentation,” in Eur. Conf. Comput. Vis., 2024.

[40] Y. Li, A. Goodge, F. Liu, and C.-S. Foo, “Promptad: Zero-shot anomaly detection using text prompts,” in Winter Conf. Appl. Comput. Vis., 2024.

[41] Z. Gu, B. Zhu, G. Zhu, Y. Chen, H. Li, M. Tang, and J. Wang, “Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,” in ACM Int. Conf. Multimedia, 2024.

[42] T. Aota, L. T. T. Tong, and T. Okatani, “Zero-shot versus many-shot: Unsupervised texture anomaly detection,” in Winter Conf. Appl. Comput. Vis., 2023.

[43] Z. Zhou, L. Wang, N. Fang, Z. Wang, L. Qiu, and S. Zhang, “R3d-ad: Reconstruction via diffusion for 3d anomaly detection,” in Eur. Conf. Comput. Vis., 2025.

[44] W. Li, X. Xu, Y. Gu, B. Zheng, S. Gao, and Y. Wu, “Towards scalable 3d anomaly detection and localization: A benchmark via 3d anomaly synthesis and a self-supervised learning network,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[45] R. Chen, G. Xie, J. Liu, J. Wang, Z. Luo, J. Wang, and F. Zheng, “Easynet: An easy network for 3d industrial anomaly detection,” in ACM Int. Conf. Multimedia, 2023.

[46] Y.-M. Chu, C. Liu, T.-I. Hsieh, H.-T. Chen, and T.-L. Liu, “Shape-guided dual-memory learning for 3d anomaly detection,” in Int. Conf. Mach. Learn., 2023.

[47] A. Costanzino, P. Z. Ramirez, G. Lisanti, and L. Di Stefano, “Multimodal industrial anomaly detection by crossmodal feature mapping,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[48] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, “Ranking on data manifolds,” Adv. Neural Inform. Process. Syst., 2003.

[49] T. Lin and H. Zha, “Riemannian manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2008.

[50] B. Wang and Z. Tu, “Affinity learning via self-diffusion for image segmentation and clustering,” in IEEE Conf. Comput. Vis. Pattern Recog., 2012.

[51] Z. Zhang, J. Wang, and H. Zha, “Adaptive manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2011.

[52] Q. Zhou, G. Pang, Y. Tian, S. He, and J. Chen, “Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection,” in Int. Conf. Learn. Represent., 2024.

[53] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi, “The farthest point strategy for progressive image sampling,” IEEE Trans. Image Process., 1997.

[54] M. P. Do Carmo, Differential geometry of curves and surfaces: revised and updated second edition. Courier Dover Publications, 2016.

[55] K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, “Towards total recall in industrial anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022.

[56] J. Jiang, B. Wang, and Z. Tu, “Unsupervised metric learning by self-smoothing operator,” in Int. Conf. Comput. Vis., 2011.

[57] P. Bergmann, X. Jin, D. Sattlegger, and C. Steger, “The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,” in Int. Conf. Comput. Vis. Theor. Appl., 2021.

[58] J. Zhu, Y.-S. Ong, C. Shen, and G. Pang, “Fine-grained abnormality prompt learning for zero-shot anomaly detection,” in Int. Conf. Comput. Vis., 2025.

[59] X. Zhu, R. Zhang, B. He, Z. Guo, Z. Zeng, Z. Qin, S. Zhang, and P. Gao, “Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,” in Int. Conf. Comput. Vis., 2023.

[60] L. Xue, M. Gao, C. Xing, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, “Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023.

[61] L. Xue, N. Yu, S. Zhang, A. Panagopoulou, J. Li, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles et al., “Ulip-2: Towards scalable multimodal pre-training for 3d understanding,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024.

[62] Y. Wang, K.-C. Peng, and Y. Fu, “Towards zero-shot 3d anomaly localization,” in Winter Conf. Appl. Comput. Vis., 2025.

[63] L. Bonfiglioli, M. Toschi, D. Silvestri, N. Fioraio, and D. De Gregorio, “The eyecandies dataset for unsupervised multimodal anomaly detection and localization,” in Asian Conf. Comput. Vis., 2022.

[64] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2019.

[65] Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” in Eur. Conf. Comput. Vis., 2022.

[66] K. Mao, P. Wei, Y. Lian, Y. Wang, and N. Zheng, “Beyond single-modal boundary: Cross-modal anomaly detection through visual prototype and harmonization,” in IEEE Conf. Comput. Vis. Pattern Recog., 2025.

[67] J. He, M. Cao, S. Peng, and Q. Xie, “Rareclip: Rarity-aware online zero-shot industrial anomaly detection,” in Int. Conf. Comput. Vis., 2025.