
arXiv: 2511.10047 · v2 · submitted 2025-11-13 · 💻 cs.CV

MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

Pith reviewed 2026-05-17 22:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords zero-shot anomaly detection · multimodal industrial anomaly · mutual scoring · anomaly segmentation · 3D anomaly detection · unsupervised classification · cross-modal fusion

The pith

Mutual scoring of unlabeled patches in 2D and 3D separates anomalies for zero-shot industrial detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that normal patches in industrial 2D images and 3D shapes usually match many others, while anomalies stay unique and scattered. It introduces MuSc-V2, a framework that uses this property through mutual scoring within modalities and cross-modal enhancement to classify and segment defects without any labeled training data. This matters because the approach delivers large accuracy gains on benchmark datasets and performs well even on smaller data subsets or with only one modality. A reader interested in practical inspection systems would see value in a label-free method that adapts across different products.

Core claim

The central claim is that the Mutual Scoring framework (MuSc-V2), built on Iterative Point Grouping, Similarity Neighborhood Aggregation with Multi-Degrees, Mutual Scoring Mechanism, Cross-modal Anomaly Enhancement, and Re-scoring with Constrained Neighborhood, leverages the discriminative property of normal patch similarities versus anomaly isolation to deliver strong zero-shot anomaly classification and segmentation performance in multimodal settings.

What carries the argument

The Mutual Scoring Mechanism (MSM), which lets unlabeled samples score each other within each modality, combined with Cross-modal Anomaly Enhancement (CAE), which fuses 2D and 3D scores to recover anomalies that a single modality misses.
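
The mutual scoring idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, and the plain mean used to combine votes are all assumptions made for illustration. Each patch is scored by every other sample through its nearest-patch distance, so a patch that has a close match in most samples gets a low score.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mutual_scores(samples):
    """Score every patch of every sample using all the other samples.

    samples: list of samples; each sample is a list of patch feature vectors.
    Returns one list of patch scores per sample (higher = more anomalous).
    """
    all_scores = []
    for i, sample in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        scores = []
        for patch in sample:
            # Each other sample "votes" with its nearest-patch distance.
            votes = [min(dist(patch, q) for q in other) for other in others]
            # Assumption: combine the votes with a plain mean.
            scores.append(sum(votes) / len(votes))
        all_scores.append(scores)
    return all_scores

# Toy data (hypothetical): three samples of two patches each.
# The second patch of the last sample has no close match elsewhere.
samples = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], [1.0, 0.9]],
    [[0.0, 0.1], [5.0, 5.0]],
]
scores = mutual_scores(samples)
```

In this toy pool the isolated patch receives the largest score, which is the behaviour the separation argument relies on.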

If this is right

  • Delivers a 23.7 percent AP gain on the MVTec 3D-AD dataset.
  • Delivers a 19.3 percent boost on the Eyecandies dataset.
  • Surpasses previous zero-shot methods and most few-shot methods on these benchmarks.
  • Maintains robust performance when applied to the full dataset or smaller subsets.
  • Supports flexible use with 2D only, 3D only, or combined modalities.
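
For readers unfamiliar with the metric behind these figures, average precision (AP) can be computed from a ranked list as sketched below. This is the standard ranking definition without tie handling; whether the quoted gains are image-level or pixel-level AP, and whether they are percentage points, is not stated on this page, and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """AP of a ranking: mean of precision@k taken at each positive hit.

    labels: 1 for anomalous, 0 for normal. scores: higher = more anomalous.
    Ties in scores are broken by input order (a simplification).
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("AP is undefined without positive labels")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / positives

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
```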

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach could extend to other clustering-based unsupervised tasks where normal examples share common features.
  • A test on datasets containing anomalies that mimic normal patterns would check the limits of the similarity assumption.
  • Combining the mutual scoring with additional sensor types might improve robustness in real-world manufacturing lines.

Load-bearing premise

Normal image patches across industrial products typically find many other similar patches in both 2D appearance and 3D shapes, while anomalies remain diverse and isolated.

What would settle it

Finding an industrial dataset where anomalous patches show as many mutual similarities as normal patches would falsify the separation mechanism.
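
A check of this kind could be run with a simple neighbor-count statistic, sketched below under stated assumptions: patches are feature vectors, labels are available for evaluation only, and the radius is a hypothetical free parameter. If the mean count for anomalous patches approached that of normal patches on a real dataset, the premise would be in trouble.

```python
import math

def neighbor_counts(patches, radius):
    """For each patch, count the other patches within `radius` of it."""
    counts = []
    for i, p in enumerate(patches):
        n = sum(
            1
            for j, q in enumerate(patches)
            if j != i and math.dist(p, q) <= radius
        )
        counts.append(n)
    return counts

def mean_count_by_label(patches, labels, radius):
    """Return (mean count over normal patches, mean count over anomalous)."""
    counts = neighbor_counts(patches, radius)
    normal = [c for c, y in zip(counts, labels) if y == 0]
    anomalous = [c for c, y in zip(counts, labels) if y == 1]
    return sum(normal) / len(normal), sum(anomalous) / len(anomalous)

# Toy data (hypothetical): four clustered normal patches, two isolated anomalies.
patches = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0], [-4.0, 2.0]]
labels = [0, 0, 0, 0, 1, 1]
normal_mean, anomalous_mean = mean_count_by_label(patches, labels, radius=0.5)
```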

Figures

Figures reproduced from arXiv: 2511.10047 by Feng Xue, Xurui Li, Yu Zhou.

Figure 1: (a) Zero-shot AC/AS methods for 2D modal. (b) Zero-shot AC/AS …
Figure 2: The pipeline of our MuSc-V2. This framework processes 2D images and 3D point clouds through four important innovations: (1) IPG replaces the current grouping strategy in the point transformer to generate groups with continuous surfaces (Sec. III-A). (2) SNAMD improves the abnormal modeling ability with varying sizes for both modals (Sec. III-B). (3) MSM obtains anomaly segmentation results of 2D/3D modals.…
Figure 3: Toy example of searching KP points for the center point $p_c$. The green lines and regions represent the candidate points, and the blue ones indicate the searched points as the group points of $p_c$. A. 2D/3D Patch Representation 2D Feature Extraction. Following [15], [16], [52], we adopt a vision transformer [23] consisting of S stages to extract hierarchical 2D features. For image $I_i$, we define the patch toke…
Figure 4: Similarity-Weighted Pooling (SWPooling) Versus Average Pooling (APooling). Top: One toy example represents feature maps aggregated by two aggregation methods, where blue patches and red patches simulate normal and abnormal tokens, respectively. Bottom: The visualization of segmentation results with SWPooling and APooling by one real example. where $F^{i,s}(m) \in \mathbb{R}^{1 \times C}$ denotes the feature vector of patch m, and…
Figure 6: Two examples whose anomalies exhibit single-modality prominence: (a) 3D-visible peach anomaly, (b) 2D-detectable carrot anomaly. …tor of the point cloud $P_i$ is denoted as $A^i_P = [a^{i,1}_P, \ldots, a^{i,M_P}_P]^\top$, where $a^{i,n}_P$ represents the anomaly score of the n-th 3D patch. Cross-modal Anomaly Enhancement. Our mutual scoring mechanism achieves strong patch-level anomaly detection within each modality, yet f…
Figure 8: Visualization of anomaly segmentation results on MVTec 3D-AD and Eyecandies benchmarks. 3D modal and multimodal (MM) results are displayed.
Figure 9: Visualization of anomaly segmentation on MVTec 3D-AD and …
Figure 10: Four anomaly segmentation metrics with different normal sample …
Figure 11: Experimental results of the influence of four hyperparameters on …
Original abstract

Zero-shot anomaly classification (AC) and segmentation (AS) methods aim to identify and outline defects without using any labeled samples. In this paper, we reveal a key property that is overlooked by existing methods: normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated. To explicitly leverage this discriminative property, we propose a Mutual Scoring framework (MuSc-V2) for zero-shot AC/AS, which flexibly supports single 2D/3D or multimodality. Specifically, our method begins by improving 3D representation through Iterative Point Grouping (IPG), which reduces false positives from discontinuous surfaces. Then we use Similarity Neighborhood Aggregation with Multi-Degrees (SNAMD) to fuse 2D/3D neighborhood cues into more discriminative multi-scale patch features for mutual scoring. The core comprises a Mutual Scoring Mechanism (MSM) that lets samples within each modality to assign score to each other, and Cross-modal Anomaly Enhancement (CAE) that fuses 2D and 3D scores to recover modality-specific missing anomalies. Finally, Re-scoring with Constrained Neighborhood (RsCon) suppresses false classification based on similarity to more representative samples. Our framework flexibly works on both the full dataset and smaller subsets with consistently robust performance, ensuring seamless adaptability across diverse product lines. In aid of the novel framework, MuSc-V2 achieves significant performance improvements: a +23.7% AP gain on the MVTec 3D-AD dataset and a +19.3% boost on the Eyecandies dataset, surpassing previous zero-shot benchmarks and even outperforming most few-shot methods. The code will be available at The code will be available at https://github.com/HUST-SLOW/MuSc-V2.
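
The abstract says CAE fuses 2D and 3D scores so that an anomaly visible in only one modality is not lost, but it does not give the rule. The sketch below is a stand-in and not the paper's CAE: it assumes the 2D and 3D patches are already aligned one-to-one, min-max normalises each modality, and takes the element-wise maximum. All names and values are hypothetical.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_max(scores_2d, scores_3d):
    """Element-wise max of the normalised scores (assumed fusion rule)."""
    a, b = min_max(scores_2d), min_max(scores_3d)
    return [max(x, y) for x, y in zip(a, b)]

# Toy data (hypothetical): patch 2 is anomalous only in 2D, patch 1 only in 3D.
scores_2d = [0.1, 0.2, 0.9, 0.1]
scores_3d = [1.0, 8.0, 1.5, 1.0]
fused = fuse_max(scores_2d, scores_3d)
```

Both single-modality anomalies survive the fusion. Note that per-sample min-max scaling would inflate scores on a defect-free sample, so a real system would need a different normalisation.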

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents MuSc-V2, a zero-shot framework for multimodal industrial anomaly classification and segmentation. It identifies a key property that normal patches have many similar counterparts in 2D appearance and 3D shapes, while anomalies are diverse and isolated. The method incorporates Iterative Point Grouping (IPG) to improve 3D representations, Similarity Neighborhood Aggregation with Multi-Degrees (SNAMD) for multi-scale features, Mutual Scoring Mechanism (MSM), Cross-modal Anomaly Enhancement (CAE), and Re-scoring with Constrained Neighborhood (RsCon). Experiments on MVTec 3D-AD and Eyecandies datasets report substantial performance gains of +23.7% AP and +19.3% respectively over prior zero-shot methods.

Significance. If the central claims hold, this work would represent a notable advance in zero-shot anomaly detection by explicitly leveraging inter-sample similarities in an unlabeled pool for both 2D and 3D modalities. The reported outperformance over most few-shot methods is particularly striking. The planned code release at https://github.com/HUST-SLOW/MuSc-V2 enhances reproducibility.

major comments (2)
  1. [Introduction] Introduction (key property paragraph): The foundational claim that 'normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated' is presented without any quantitative support such as neighbor-count histograms, average in-neighborhood sizes, or intra-class vs. inter-class similarity distributions on MVTec 3D-AD or Eyecandies. This assumption is load-bearing for the MSM and the reported +23.7% AP gain, as the mutual scoring separation depends on a reliable gap in neighborhood counts.
  2. [Experiments] Experiments section (main results table): The performance tables show overall AP improvements, but no ablation isolates the contribution of MSM + CAE + RsCon from the IPG and SNAMD components alone. Without such controls it remains possible that the gains derive primarily from the 3D representation improvements rather than the mutual-scoring logic itself.
minor comments (2)
  1. [Abstract] Abstract: The sentence 'The code will be available at The code will be available at https://github.com/HUST-SLOW/MuSc-V2' contains a duplicated phrase that should be corrected.
  2. [Method] Method: The aggregation steps in SNAMD would benefit from explicit pseudocode or a small diagram showing how multi-degree neighborhoods are fused across modalities.
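
As a rough idea of the pseudocode that minor comment 2 asks for, the sketch below shows one way similarity-weighted pooling over several neighborhood sizes could look. It is hypothetical and not the authors' SNAMD: the exponential cosine weighting, the window sizes (1, 3, 5), and all names are assumptions, and cross-modal fusion is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def sw_pool(grid, r):
    """Similarity-weighted pooling over an r x r window (r odd).

    grid: H x W nested list of token feature vectors.
    Neighbours that resemble the centre token get larger weights.
    """
    h, w = len(grid), len(grid[0])
    half = r // 2
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = grid[y][x]
            neigh = [
                grid[v][u]
                for v in range(max(0, y - half), min(h, y + half + 1))
                for u in range(max(0, x - half), min(w, x + half + 1))
            ]
            # Assumed weighting: exp of cosine similarity to the centre.
            weights = [math.exp(cosine(center, n)) for n in neigh]
            total = sum(weights)
            out[y][x] = [
                sum(wt * n[c] for wt, n in zip(weights, neigh)) / total
                for c in range(len(center))
            ]
    return out

def multi_degree(grid, degrees=(1, 3, 5)):
    """One pooled grid per window size; r = 1 reproduces the input."""
    return {r: sw_pool(grid, r) for r in degrees}

# Toy 3 x 3 grid (hypothetical): normal tokens [1, 0], one abnormal centre [0, 1].
grid = [[[1.0, 0.0] for _ in range(3)] for _ in range(3)]
grid[1][1] = [0.0, 1.0]
pooled = multi_degree(grid)
center_r3 = pooled[3][1][1]
```

With plain average pooling the abnormal component at the centre would shrink to 1/9; the similarity weighting keeps it larger, which matches the contrast drawn in the Figure 4 caption.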

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, proposing targeted revisions to strengthen the manuscript while maintaining its core contributions.

Point-by-point responses
  1. Referee: [Introduction] The foundational claim that 'normal image patches across industrial products typically find many other similar patches, not only in 2D appearance but also in 3D shapes, while anomalies remain diverse and isolated' is presented without any quantitative support such as neighbor-count histograms, average in-neighborhood sizes, or intra-class vs. inter-class similarity distributions on MVTec 3D-AD or Eyecandies. This assumption is load-bearing for the MSM and the reported +23.7% AP gain, as the mutual scoring separation depends on a reliable gap in neighborhood counts.

    Authors: We acknowledge that the key property is introduced as an empirical observation without explicit quantitative backing in the current introduction. This property emerged from our analysis of patch distributions in the target datasets and is validated indirectly through the method's performance. To directly address the concern and reinforce the foundation for MSM, we will add quantitative analyses (neighbor-count histograms, average in-neighborhood sizes, and intra- vs. inter-class similarity distributions) on MVTec 3D-AD and Eyecandies in the revised introduction and/or a new supplementary section. revision: yes

  2. Referee: [Experiments] The performance tables show overall AP improvements, but no ablation isolates the contribution of MSM + CAE + RsCon from the IPG and SNAMD components alone. Without such controls it remains possible that the gains derive primarily from the 3D representation improvements rather than the mutual-scoring logic itself.

    Authors: We agree that a more granular ablation would help isolate the impact of the mutual scoring logic (MSM, CAE, RsCon) from the representation enhancements (IPG, SNAMD). The existing experiments include module-level ablations and overall framework results, but we recognize the value of a dedicated control experiment. In the revision, we will add an ablation study that evaluates the mutual scoring components on top of the base IPG+SNAMD features to clarify their specific contributions to the reported gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity; heuristic framework rests on explicit empirical assumption

The paper states an observed property of normal patches sharing 2D/3D neighbors while anomalies are isolated, then builds MSM + SNAMD + CAE + RsCon to exploit it for scoring. No equations or steps reduce a claimed prediction back to a fitted parameter or self-citation by construction; the reported AP gains are presented as experimental outcomes on MVTec 3D-AD and Eyecandies rather than a closed derivation. The central premise is falsifiable via neighbor-count statistics on the target datasets and does not import uniqueness theorems or rename prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that normal patches exhibit high mutual similarity while anomalies are isolated; the method introduces several new algorithmic components whose internal hyperparameters are not detailed in the abstract.

axioms (1)
  • domain assumption: Normal patches find many similar counterparts in 2D and 3D, while anomalies are diverse and isolated.
    This property is stated as the key overlooked fact that the mutual scoring framework exploits.

pith-pipeline@v0.9.0 · 5672 in / 1318 out tokens · 26836 ms · 2026-05-17T22:52:03.741632+00:00 · methodology

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages

  1. [1]

    A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,

    Q. Chen, H. Luo, C. Lv, and Z. Zhang, “A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,” in Eur. Conf. Comput. Vis., 2025

  2. [2]

    Collaborative discrepancy optimization for reliable image anomaly localization,

    Y. Cao, X. Xu, Z. Liu, and W. Shen, “Collaborative discrepancy optimization for reliable image anomaly localization,” IEEE Trans. Ind. Inform., 2023

  3. [3]

    Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection,

    H. Zhang, Z. Wang, D. Zeng, Z. Wu, and Y.-G. Jiang, “Diffusionad: Norm-guided one-step denoising diffusion for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2025

  4. [4]

    Center-aware residual anomaly synthesis for multiclass industrial anomaly detection,

    Q. Chen, H. Luo, H. Yao, W. Luo, Z. Qu, C. Lv, and Z. Zhang, “Center-aware residual anomaly synthesis for multiclass industrial anomaly detection,” IEEE Trans. Ind. Inform., 2025

  5. [5]

    Self-supervised masked convolutional transformer block for anomaly detection,

    N. Madan, N.-C. Ristea, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised masked convolutional transformer block for anomaly detection,” IEEE Trans. Pattern Anal. Mach. Intell., 2023

  6. [6]

    Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval,

    H. Li, J. Hu, B. Li, H. Chen, Y. Zheng, and C. Shen, “Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval,” IEEE Trans. Image Process., 2024

  7. [7]

    Self-supervised anomaly detection with neural transformations,

    C. Qiu, M. Kloft, S. Mandt, and M. Rudolph, “Self-supervised anomaly detection with neural transformations,” IEEE Trans. Pattern Anal. Mach. Intell., 2024

  8. [8]

    Prior normality prompt transformer for multiclass industrial image anomaly detection,

    H. Yao, Y. Cao, W. Luo, W. Zhang, W. Yu, and W. Shen, “Prior normality prompt transformer for multiclass industrial image anomaly detection,” IEEE Trans. Ind. Inform., 2024

  9. [9]

    Pushing the limits of fewshot anomaly detection in industry vision: Graphcore,

    G. Xie, J. Wang, J. Liu, F. Zheng, and Y. Jin, “Pushing the limits of fewshot anomaly detection in industry vision: Graphcore,” in Int. Conf. Learn. Represent., 2023

  10. [10]

    Shape-consistent one-shot unsupervised domain adaptation for rail surface defect segmentation,

    S. Ma, K. Song, M. Niu, H. Tian, Y. Wang, and Y. Yan, “Shape-consistent one-shot unsupervised domain adaptation for rail surface defect segmentation,” IEEE Trans. Ind. Inform., 2023

  11. [11]

    Few-shot domain-adaptive anomaly detection for cross-site brain images,

    J. Su, H. Shen, L. Peng, and D. Hu, “Few-shot domain-adaptive anomaly detection for cross-site brain images,” IEEE Trans. Pattern Anal. Mach. Intell., 2021

  12. [12]

    Toward generalist anomaly detection via in-context residual learning with few-shot sample prompts,

    J. Zhu and G. Pang, “Toward generalist anomaly detection via in-context residual learning with few-shot sample prompts,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  13. [13]

    Promptad: Learning prompts with only normal samples for few-shot anomaly detection,

    X. Li, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Xie, and L. Ma, “Promptad: Learning prompts with only normal samples for few-shot anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  14. [14]

    Adapting visual-language models for generalizable anomaly detection in medical images,

    C. Huang, A. Jiang, J. Feng, Y. Zhang, X. Wang, and Y. Wang, “Adapting visual-language models for generalizable anomaly detection in medical images,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  15. [15]

    Winclip: Zero-/few-shot anomaly classification and segmentation,

    J. Jeong, Y. Zou, T. Kim, D. Zhang, A. Ravichandran, and O. Dabeer, “Winclip: Zero-/few-shot anomaly classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023

  16. [16]

    A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad,

    X. Chen, Y. Han, and J. Zhang, “A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad,” arXiv preprint arXiv:2305.17382, 2023

  17. [17]

    Zero-shot anomaly detection via batch normalization,

    A. Li, C. Qiu, M. Kloft, P. Smyth, M. Rudolph, and S. Mandt, “Zero-shot anomaly detection via batch normalization,” in Adv. Neural Inform. Process. Syst., 2023

  18. [18]

    Multimodal industrial anomaly detection via hybrid fusion,

    Y. Wang, J. Peng, J. Zhang, R. Yi, Y. Wang, and C. Wang, “Multimodal industrial anomaly detection via hybrid fusion,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023

  19. [19]

    Back to the feature: classical 3d features are (almost) all you need for 3d anomaly detection,

    E. Horwitz and Y. Hoshen, “Back to the feature: classical 3d features are (almost) all you need for 3d anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023

  20. [20]

    M3dm-nr: Rgb-3d noisy-resistant industrial anomaly detection via multimodal denoising,

    C. Wang, H. Zhu, J. Peng, Y. Wang, R. Yi, Y. Wu, L. Ma, and J. Zhang, “M3dm-nr: Rgb-3d noisy-resistant industrial anomaly detection via multimodal denoising,” IEEE Trans. Pattern Anal. Mach. Intell., 2025

  21. [21]

    Pointad: Comprehending 3d anomalies from points and pixels for zero-shot 3d anomaly detection,

    Q. Zhou, J. Yan, S. He, W. Meng, and J. Chen, “Pointad: Comprehending 3d anomalies from points and pixels for zero-shot 3d anomaly detection,” Adv. Neural Inform. Process. Syst., 2024

  22. [22]

    Musc: Zero-shot industrial anomaly classification and segmentation with mutual scoring of the unlabeled images,

    X. Li, Z. Huang, F. Xue, and Y. Zhou, “Musc: Zero-shot industrial anomaly classification and segmentation with mutual scoring of the unlabeled images,” in Int. Conf. Learn. Represent., 2024

  23. [23]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in Int. Conf. Learn. Represent., 2020

  24. [24]

    Point transformer,

    H. Zhao, L. Jiang, J. Jia, P. H. Torr, and V. Koltun, “Point transformer,” in Int. Conf. Comput. Vis., 2021

  25. [25]

    Learning transferable visual models from natural language supervision,

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Int. Conf. Mach. Learn., 2021

  26. [26]

    Emerging properties in self-supervised vision transformers,

    M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Int. Conf. Comput. Vis., 2021

  27. [27]

    Masked autoencoders for point cloud self-supervised learning,

    Y. Pang, W. Wang, F. E. Tay, W. Liu, Y. Tian, and L. Yuan, “Masked autoencoders for point cloud self-supervised learning,” in Eur. Conf. Comput. Vis., 2022

  28. [28]

    Point-bert: Pre-training 3d point cloud transformers with masked point modeling,

    X. Yu, L. Tang, Y. Rao, T. Huang, J. Zhou, and J. Lu, “Point-bert: Pre-training 3d point cloud transformers with masked point modeling,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022

  29. [29]

    Swin transformer: Hierarchical vision transformer using shifted windows,

    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Int. Conf. Comput. Vis., 2021

  30. [30]

    Vlt: Vision-language transformer and query generation for referring segmentation,

    H. Ding, C. Liu, S. Wang, and X. Jiang, “Vlt: Vision-language transformer and query generation for referring segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., 2022

  31. [31]

    Pointnet: Deep learning on point sets for 3d classification and segmentation,

    C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in IEEE Conf. Comput. Vis. Pattern Recog., 2017

  32. [32]

    Flattening-net: Deep regular 2d representation for 3d point cloud analysis,

    Q. Zhang, J. Hou, Y. Qian, Y. Zeng, J. Zhang, and Y. He, “Flattening-net: Deep regular 2d representation for 3d point cloud analysis,” IEEE Trans. Pattern Anal. Mach. Intell., 2023

  33. [33]

    Vote2cap-detr++: Decoupling localization and describing for end-to-end 3d dense captioning,

    S. Chen, H. Zhu, M. Li, X. Chen, P. Guo, Y. Lei, G. Yu, T. Li, and T. Chen, “Vote2cap-detr++: Decoupling localization and describing for end-to-end 3d dense captioning,” IEEE Trans. Pattern Anal. Mach. Intell., 2024

  34. [34]

    Pct: Point cloud transformer,

    M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R. R. Martin, and S.-M. Hu, “Pct: Point cloud transformer,” Comput. Vis. Media, 2021

  35. [35]

    Generative variational-contrastive learning for self-supervised point cloud representation,

    B. Wang, Z. Tian, A. Ye, F. Wen, S. Du, and Y. Gao, “Generative variational-contrastive learning for self-supervised point cloud representation,” IEEE Trans. Pattern Anal. Mach. Intell., 2024

  36. [36]

    Point transformer v2: Grouped vector attention and partition-based pooling,

    X. Wu, Y. Lao, L. Jiang, X. Liu, and H. Zhao, “Point transformer v2: Grouped vector attention and partition-based pooling,” Adv. Neural Inform. Process. Syst., 2022

  37. [37]

    Point transformer v3: Simpler faster stronger,

    X. Wu, L. Jiang, P.-S. Wang, Z. Liu, X. Liu, Y. Qiao, W. Ouyang, T. He, and H. Zhao, “Point transformer v3: Simpler faster stronger,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  38. [38]

    Adaclip: Adapting clip with hybrid learnable prompts for zero-shot anomaly detection,

    Y. Cao, J. Zhang, L. Frittoli, Y. Cheng, W. Shen, and G. Boracchi, “Adaclip: Adapting clip with hybrid learnable prompts for zero-shot anomaly detection,” in Eur. Conf. Comput. Vis., 2024

  39. [39]

    Vcp-clip: A visual context prompting model for zero-shot anomaly segmentation,

    Z. Qu, X. Tao, M. Prasad, F. Shen, Z. Zhang, X. Gong, and G. Ding, “Vcp-clip: A visual context prompting model for zero-shot anomaly segmentation,” in Eur. Conf. Comput. Vis., 2024

  40. [40]

    Promptad: Zero-shot anomaly detection using text prompts,

    Y. Li, A. Goodge, F. Liu, and C.-S. Foo, “Promptad: Zero-shot anomaly detection using text prompts,” in Winter Conf. Appl. Comput. Vis., 2024

  41. [41]

    Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,

    Z. Gu, B. Zhu, G. Zhu, Y. Chen, H. Li, M. Tang, and J. Wang, “Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization,” in ACM Int. Conf. Multimedia, 2024

  42. [42]

    Zero-shot versus many-shot: Unsupervised texture anomaly detection,

    T. Aota, L. T. T. Tong, and T. Okatani, “Zero-shot versus many-shot: Unsupervised texture anomaly detection,” in Winter Conf. Appl. Comput. Vis., 2023

  43. [43]

    R3d-ad: Reconstruction via diffusion for 3d anomaly detection,

    Z. Zhou, L. Wang, N. Fang, Z. Wang, L. Qiu, and S. Zhang, “R3d-ad: Reconstruction via diffusion for 3d anomaly detection,” in Eur. Conf. Comput. Vis., 2025

  44. [44]

    Towards scalable 3d anomaly detection and localization: A benchmark via 3d anomaly synthesis and a self-supervised learning network,

    W. Li, X. Xu, Y. Gu, B. Zheng, S. Gao, and Y. Wu, “Towards scalable 3d anomaly detection and localization: A benchmark via 3d anomaly synthesis and a self-supervised learning network,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  45. [45]

    Easynet: An easy network for 3d industrial anomaly detection,

    R. Chen, G. Xie, J. Liu, J. Wang, Z. Luo, J. Wang, and F. Zheng, “Easynet: An easy network for 3d industrial anomaly detection,” in ACM Int. Conf. Multimedia, 2023

  46. [46]

    Shape-guided dual-memory learning for 3d anomaly detection,

    Y.-M. Chu, C. Liu, T.-I. Hsieh, H.-T. Chen, and T.-L. Liu, “Shape-guided dual-memory learning for 3d anomaly detection,” in Int. Conf. Mach. Learn., 2023

  47. [47]

    Multimodal industrial anomaly detection by crossmodal feature mapping,

    A. Costanzino, P. Z. Ramirez, G. Lisanti, and L. Di Stefano, “Multimodal industrial anomaly detection by crossmodal feature mapping,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  48. [48]

    Ranking on data manifolds,

    D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf, “Ranking on data manifolds,” Adv. Neural Inform. Process. Syst., 2003

  49. [49]

    Riemannian manifold learning,

    T. Lin and H. Zha, “Riemannian manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2008

  50. [50]

    Affinity learning via self-diffusion for image segmentation and clustering,

    B. Wang and Z. Tu, “Affinity learning via self-diffusion for image segmentation and clustering,” in IEEE Conf. Comput. Vis. Pattern Recog., 2012

  51. [51]

    Adaptive manifold learning,

    Z. Zhang, J. Wang, and H. Zha, “Adaptive manifold learning,” IEEE Trans. Pattern Anal. Mach. Intell., 2011

  52. [52]

    Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection,

    Q. Zhou, G. Pang, Y. Tian, S. He, and J. Chen, “Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection,” in Int. Conf. Learn. Represent., 2024

  53. [53]

    The farthest point strategy for progressive image sampling,

    Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi, “The farthest point strategy for progressive image sampling,” IEEE Trans. Image Process., 1997

  54. [54]

    M. P. Do Carmo, Differential geometry of curves and surfaces: revised and updated second edition. Courier Dover Publications, 2016

  55. [55]

    Towards total recall in industrial anomaly detection,

    K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P. Gehler, “Towards total recall in industrial anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2022

  56. [56]

    Unsupervised metric learning by self-smoothing operator,

    J. Jiang, B. Wang, and Z. Tu, “Unsupervised metric learning by self-smoothing operator,” in Int. Conf. Comput. Vis., 2011

  57. [57]

    The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,

    P. Bergmann, X. Jin, D. Sattlegger, and C. Steger, “The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,” in Int. Conf. Comput. Vis. Theor. Appl., 2021

  58. [58]

    Fine-grained abnormality prompt learning for zero-shot anomaly detection,

    J. Zhu, Y.-S. Ong, C. Shen, and G. Pang, “Fine-grained abnormality prompt learning for zero-shot anomaly detection,” in Int. Conf. Comput. Vis., 2025

  59. [59]

    Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,

    X. Zhu, R. Zhang, B. He, Z. Guo, Z. Zeng, Z. Qin, S. Zhang, and P. Gao, “Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning,” in Int. Conf. Comput. Vis., 2023

  60. [60]

    Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,

    L. Xue, M. Gao, C. Xing, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, “Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding,” in IEEE Conf. Comput. Vis. Pattern Recog., 2023

  61. [61]

    Ulip-2: Towards scalable multimodal pre-training for 3d understanding,

    L. Xue, N. Yu, S. Zhang, A. Panagopoulou, J. Li, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles et al., “Ulip-2: Towards scalable multimodal pre-training for 3d understanding,” in IEEE Conf. Comput. Vis. Pattern Recog., 2024

  62. [62]

    Towards zero-shot 3d anomaly localization,

    Y. Wang, K.-C. Peng, and Y. Fu, “Towards zero-shot 3d anomaly localization,” in Winter Conf. Appl. Comput. Vis., 2025

  63. [63]

    The eyecandies dataset for unsupervised multimodal anomaly detection and localization,

    L. Bonfiglioli, M. Toschi, D. Silvestri, N. Fioraio, and D. De Gregorio, “The eyecandies dataset for unsupervised multimodal anomaly detection and localization,” in Asian Conf. Comput. Vis., 2022

  64. [64]

    Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection,

    P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection,” in IEEE Conf. Comput. Vis. Pattern Recog., 2019

  65. [65]

    Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,

    Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the-difference self-supervised pre-training for anomaly detection and segmentation,” in Eur. Conf. Comput. Vis., 2022

  66. [66]

    Beyond single-modal boundary: Cross-modal anomaly detection through visual prototype and harmonization,

    K. Mao, P. Wei, Y. Lian, Y. Wang, and N. Zheng, “Beyond single-modal boundary: Cross-modal anomaly detection through visual prototype and harmonization,” in IEEE Conf. Comput. Vis. Pattern Recog., 2025

  67. [67]

    Rareclip: Rarity-aware online zero-shot industrial anomaly detection,

    J. He, M. Cao, S. Peng, and Q. Xie, “Rareclip: Rarity-aware online zero-shot industrial anomaly detection,” in Int. Conf. Comput. Vis., 2025