MambaADv2: Evolving Duality-enhanced State Space Model for Unsupervised Anomaly Detection

Bo Yin; Haoyang He; Jiangning Zhang; Lei Xie; Shuicheng Yan; Xiaobin Hu; Yu-Gang Jiang; Yu He

arxiv: 2606.23126 · v1 · pith:MMDYZ6WRnew · submitted 2026-06-22 · 💻 cs.CV

Xiaobin Hu, Haoyang He, Bo Yin, Yu He, Lei Xie, Jiangning Zhang, Yu-Gang Jiang, Shuicheng Yan

Pith reviewed 2026-06-26 09:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords: unsupervised anomaly detection · state space model · Mamba architecture · hybrid state space · feature reconstruction · multi-class detection · progressive scanning · frequency-enhanced convolution

The pith

MambaADv2 uses duality-enhanced state space modules to reconstruct normal features while magnifying anomalies in unsupervised detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a Mamba-inspired decoder with Duality-enhanced State Space modules can overcome the long-range modeling limits of CNNs and the quadratic costs of Transformers for multi-class unsupervised anomaly detection. It does so by integrating Hybrid State Space blocks that follow the SSD-based Mamba lineage with Mamba3-style position awareness and frequency-enhanced convolutions. The dual paths of linear recurrence and parallel matrix formulation are presented as the means to capture local continuity alongside global contextual comparison. This setup is claimed to enable precise normal reconstruction paired with amplified anomaly deviations, supported by a semantics-adaptive progressive scanning strategy across scales. Readers would care if the approach delivers practical gains in accuracy and efficiency on standard detection benchmarks.

Core claim

By critically rethinking the structural evolution across the Mamba lineage 1-3 series, this paper proposes MambaADv2, a framework tailored for multi-class unsupervised anomaly detection. MambaADv2 comprises a pre-trained encoder and a Mamba-inspired decoder, equipped with Duality-enhanced State Space (DSS) modules across multiple scales. The proposed DSS module effectively models both global dependencies and local representations by integrating parallel-cascaded Hybrid State Space (HSS) blocks and frequency-enhanced convolution operations. The structure of the Hybrid State Space (HSS) block is tailored by following the SSD-based Mamba lineage and incorporating Mamba3-style position-aware state-space modeling …

What carries the argument

Duality-enhanced State Space (DSS) module, which integrates parallel-cascaded Hybrid State Space (HSS) blocks and frequency-enhanced convolution operations to model local continuity and global contextual comparison via dual paths of linear recurrence and parallel matrix formulation.
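The abstract names the two computational paths but gives no equations. As a rough illustration only, the sketch below writes a toy scalar-state SSM in both forms, following the general structured state space duality idea from the Mamba-2 line of work. The function names, the scalar state, and the parameter names `a`, `b`, `c` are assumptions made for this example and are not taken from the paper.

```python
# Toy scalar-state SSM written two ways. All names are illustrative assumptions,
# not the paper's HSS block.

def ssm_recurrent(a, b, c, x):
    """Linear recurrence: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t  # carry the state forward one step
        ys.append(c_t * h)
    return ys


def ssm_matrix(a, b, c, x):
    """Parallel form: y = M x, with M[t][s] = c_t * (a_{s+1} * ... * a_t) * b_s for s <= t."""
    n = len(x)
    M = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # empty product when s == t
        for s in range(t, -1, -1):
            M[t][s] = c[t] * decay * b[s]
            decay *= a[s]  # include a_s in the product for the next, smaller s
    return [sum(M[t][s] * x[s] for s in range(n)) for t in range(n)]
```

Both functions return the same outputs. The recurrence touches each position once, while the matrix form compares every position with every earlier one, which is the sense in which one path reads as local continuity and the other as global comparison.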

If this is right

Enables precise reconstruction of normal representations while magnifying anomalous deviations in multi-class unsupervised settings.
Achieves long-range dependency modeling at linear computational complexity.
Supports adaptive scanning that reduces complexity along the feature pyramid.
Combines local continuity modeling with global contextual comparison through dual computational paths.
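The abstract does not state how reconstruction quality is turned into an anomaly score. A minimal sketch, assuming the common practice of comparing encoder features with the decoder's reconstruction by cosine distance at each spatial location, is given below. The function names and the max-pooling rule for the image-level score are assumptions for illustration, not the paper's stated procedure.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """1 - cosine similarity between two channel vectors (eps avoids division by zero)."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def anomaly_map(encoder_feats, decoder_feats):
    """Per-location score at one scale; inputs are H x W grids of channel vectors."""
    return [
        [cosine_distance(e, d) for e, d in zip(e_row, d_row)]
        for e_row, d_row in zip(encoder_feats, decoder_feats)
    ]


def image_score(score_map):
    """Image-level score taken as the largest per-location score (an assumed pooling rule)."""
    return max(max(row) for row in score_map)
```

Under this assumption, a well-reconstructed normal location scores near 0 and a poorly reconstructed one scores closer to 1, so "magnifying anomalous deviations" would show up as a wider gap between the two.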

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dual-path design might transfer to other reconstruction-based vision tasks that need both local detail and global context.
Progressive scanning could lower memory use in high-resolution or video anomaly detection scenarios.
Frequency-enhanced convolutions may help in domains with periodic patterns or texture variations.
The overall architecture could be tested against other linear-complexity sequence models for broader efficiency comparisons.
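The abstract does not define its frequency-enhanced convolution; the Figure 3 caption mentions WTConv for multi-resolution analysis. As a hedged illustration of the kind of sub-band split that wavelet-based convolutions build on, the sketch below performs a one-level orthonormal 2D Haar decomposition. The function name and the choice of Haar are assumptions for this example, not details confirmed by the paper.

```python
def haar_level(img):
    """One-level orthonormal 2D Haar split of an even-sized grid.

    Returns four half-resolution grids: the local average (approx) and the
    horizontal, vertical, and diagonal differences of each 2 x 2 block.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, rows, 2):
        r_a, r_h, r_v, r_d = [], [], [], []
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # low-frequency content
            r_h.append((a - b + c - d) / 2)  # change along the row direction
            r_v.append((a + b - c - d) / 2)  # change along the column direction
            r_d.append((a - b - c + d) / 2)  # diagonal change
        approx.append(r_a)
        horiz.append(r_h)
        vert.append(r_v)
        diag.append(r_d)
    return approx, horiz, vert, diag
```

A flat region yields zero in all three difference bands, while periodic texture concentrates energy in them, which is one way to read the suggestion that frequency operations may help with texture variations.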

Load-bearing premise

That incorporating the described dual computational paths, Mamba lineage elements, and frequency operations will produce superior reconstruction of normal data and magnification of anomalies compared with prior architectures.

What would settle it

Experiments on standard anomaly detection benchmarks where MambaADv2 fails to outperform prior CNN, Transformer, or Mamba-based methods in detection accuracy or computational efficiency.
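Detection accuracy in this literature is commonly reported as AUROC, one of the metrics the referee report asks for. The sketch below computes it directly from its pairwise definition; the function name and the 0/1 label convention (1 for anomalous) are assumptions for illustration, and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(scores, labels):
    """Probability that a random anomalous sample outscores a random normal one.

    labels use 1 for anomalous and 0 for normal; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a settling experiment would compare this number for MambaADv2 against each baseline on the same test split.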

Figures

Figures reproduced from arXiv: 2606.23126 by Bo Yin, Haoyang He, Jiangning Zhang, Lei Xie, Shuicheng Yan, Xiaobin Hu, Yu-Gang Jiang, Yu He.

**Figure 2.** Overview: modalities, benchmarks, and scalability of MambaADv2. (a) Multi-modal capabilities span 2D images, multi-view 2D inputs, …

**Figure 3.** Overview of MambaADv2 architecture. (a) Pyramidal encoder-decoder framework with frozen ResNet34 and Mamba decoder stacking LSS modules at four scales. (b) LSS Block: three-branch design combining Global (HSS blocks), Local (WTConv for multi-resolution analysis), and Freq (Inception Mixer for spectral modeling) branches, fused via 1 × 1 convolution. (c) Redesigned HSS Block: Mamba-3 with RoPE for position …

**Figure 4.** Qualitative anomaly localization results on four representative …

**Figure 5.** Effective receptive field comparison. MambaADv2 covers a …

**Figure 6.** Qualitative ablation of spatial-frequency enhancements.

**Figure 7.** Training convergence comparison on MVTec-AD. MambaADv2 …

Original abstract

While recent advancements in anomaly detection have demonstrated the efficacy of CNN- and Transformer-based approaches, these architectures face inherent limitations: CNNs struggle to capture long-range dependencies, whereas Transformers suffer from quadratic computational complexity. Consequently, Mamba-based architectures have attracted considerable attention, as they successfully combine superior long-range dependency modeling with linear computational complexity. By critically rethinking the structural evolution across the Mamba lineage 1-3 series, this paper proposes MambaADv2, a framework tailored for multi-class unsupervised anomaly detection. MambaADv2 comprises a pre-trained encoder and a Mamba-inspired decoder, equipped with Duality-enhanced State Space (DSS) modules across multiple scales. The proposed DSS module effectively models both global dependencies and local representations by integrating parallel-cascaded Hybrid State Space (HSS) blocks and frequency-enhanced convolution operations. The structure of the Hybrid State Space (HSS) block is tailored by following the SSD-based Mamba lineage and incorporating Mamba3-style position-aware state-space modeling, leveraging the dual computational paths of linear recurrence and parallel matrix formulation to model local continuity and global contextual comparison, thereby better serving the core anomaly detection objective of precisely reconstructing normal representations while magnifying anomalous deviations. Additionally, we propose a semantics-adaptive progressive scanning strategy that decays scanning complexity along the feature pyramid.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MambaADv2 adapts recent Mamba variants to anomaly detection with dual-path HSS blocks but supplies no results or mechanism details to support the performance claims.

The main takeaway is an incremental extension of the Mamba lineage to multi-class unsupervised anomaly detection. The authors describe a decoder built around Duality-enhanced State Space modules that combine parallel-cascaded Hybrid State Space blocks, frequency-enhanced convolutions, and a semantics-adaptive progressive scanning strategy that reduces complexity down the feature pyramid.

What the paper does is spell out a clear motivation from the usual CNN and Transformer drawbacks and then map specific Mamba3-style position-aware modeling and dual computational paths onto the anomaly detection goal of normal reconstruction plus anomaly magnification. The architecture choices are presented coherently at the block level.

The soft spot is the absence of any supporting evidence. The abstract contains no numbers, baselines, ablations, or equations that isolate how the parallel matrix path produces a systematic contrast between normal and anomalous regions. The stress-test concern holds: the claim that these dual paths better magnify deviations rests on an unshown mechanism rather than a derivation or experiment. Without that, the central assertion stays untested.

This is aimed at the narrow set of researchers already tracking Mamba variants for vision tasks. Someone outside that niche will not find a broadly useful new principle. The citation pattern follows the expected recent Mamba papers without obvious gaps.

I would send the full manuscript to peer review if it contains the missing quantitative comparisons and ablations; the structural idea is coherent enough to be worth checking even if the current description leaves the key performance question open.

Referee Report

2 major / 1 minor

Summary. The paper proposes MambaADv2 for multi-class unsupervised anomaly detection. It consists of a pre-trained encoder and Mamba-inspired decoder using Duality-enhanced State Space (DSS) modules at multiple scales. Each DSS module integrates parallel-cascaded Hybrid State Space (HSS) blocks (following the SSD-based Mamba lineage with Mamba3-style position-aware modeling) that employ dual paths—linear recurrence for local continuity and parallel matrix formulation for global contextual comparison—along with frequency-enhanced convolution operations. A semantics-adaptive progressive scanning strategy is introduced to decay scanning complexity along the feature pyramid. The architecture is motivated by limitations of CNNs (long-range dependencies) and Transformers (quadratic complexity) and aims to achieve precise normal reconstruction while magnifying anomalous deviations with linear complexity.

Significance. If the central claims hold and the dual-path HSS design demonstrably improves anomaly magnification over prior Mamba variants, the work could provide a computationally efficient alternative for anomaly detection that better captures both local and global context. The explicit adaptation of recent Mamba3 position-aware modeling and frequency operations to the reconstruction objective would represent a targeted evolution within the state-space model lineage for vision tasks.

major comments (2)

[Abstract] The claim that the dual computational paths of the HSS block 'better serve the core anomaly detection objective of precisely reconstructing normal representations while magnifying anomalous deviations' is not supported by any equation, derivation, or explicit mechanism showing how the parallel matrix formulation supplies a systematic anomaly-specific contrast (as opposed to standard linear recurrence in prior Mamba models). This is load-bearing for the central claim.
[Abstract] No quantitative results (e.g., AUROC, AP), baseline comparisons, ablation studies on the dual paths or frequency-enhanced convolutions, or error analysis are supplied to evaluate whether the DSS module or scanning strategy improves upon prior MambaAD or other architectures. The central performance claims cannot be assessed from the given text.

minor comments (1)

[Abstract] The reference to 'Mamba lineage 1-3 series' is vague; explicit citations to the specific prior works (e.g., Mamba, Mamba2, Mamba3) should be provided for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation of our central claims.

Referee: [Abstract] The claim that the dual computational paths of the HSS block 'better serve the core anomaly detection objective of precisely reconstructing normal representations while magnifying anomalous deviations' is not supported by any equation, derivation, or explicit mechanism showing how the parallel matrix formulation supplies a systematic anomaly-specific contrast (as opposed to standard linear recurrence in prior Mamba models). This is load-bearing for the central claim.

Authors: We agree that the abstract states this interpretive claim without an accompanying equation or derivation. The manuscript describes the dual paths (linear recurrence for local continuity and parallel matrix formulation for global contextual comparison) but does not explicitly derive how the parallel path produces anomaly-specific contrast. We will revise the abstract to include a concise mechanistic explanation and expand the main text with an illustrative derivation or comparison showing the contrast effect relative to standard linear recurrence.

revision: yes

Referee: [Abstract] No quantitative results (e.g., AUROC, AP), baseline comparisons, ablation studies on the dual paths or frequency-enhanced convolutions, or error analysis are supplied to evaluate whether the DSS module or scanning strategy improves upon prior MambaAD or other architectures. The central performance claims cannot be assessed from the given text.

Authors: We agree that the abstract contains no numerical results or ablations. The abstract is intended as a high-level summary, but to allow assessment of the central claims we will revise it to report key quantitative outcomes (AUROC/AP on standard benchmarks), mention the baseline comparisons, and note the ablation findings on the dual paths and frequency convolutions that appear in the experimental section of the manuscript.

revision: yes

Circularity Check

0 steps flagged

No circularity: architectural proposal with independent design choices

The paper describes MambaADv2 as an evolution of the Mamba/SSD lineage with DSS modules that integrate HSS blocks using dual paths (linear recurrence and parallel matrix) plus frequency convolutions. This is presented as a structural tailoring following prior external work, not as a derivation, equation, or fitted quantity that reduces to its own inputs. No self-definitional loops, predictions from fitted parameters, or load-bearing self-citations appear in the abstract or described claims. The central objective (normal reconstruction and anomaly magnification) is a stated goal of the architecture rather than a result forced by construction. The derivation chain is self-contained as an engineering proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.1-grok · 5789 in / 1023 out tokens · 28277 ms · 2026-06-26T09:24:33.119359+00:00 · methodology

Reference graph

Works this paper leans on

114 extracted references · 9 linked inside Pith

[1]

Deep learning for anomaly detection: A review,

G. Pang, C. Shen, L. Cao, and A. V . D. Hengel, “Deep learning for anomaly detection: A review,”ACM computing surveys (CSUR), vol. 54, no. 2, pp. 1–38, 2021. 1

2021
[2]

The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,

P . Bergmann, X. Jin, D. Sattlegger, and C. Steger, “The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization,” inVISAPP, 2022. 1, 7, 8, 9, 16, 17

2022
[3]

The mvtec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection,

P . Bergmann, K. Batzner, M. Fauser, D. Sattlegger, and C. Steger, “The mvtec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection,”International Journal of Computer Vision, vol. 129, no. 4, pp. 1038–1059, 2021. 1, 7, 8, 9, 16, 17

2021
[4]

Spot-the- difference self-supervised pre-training for anomaly detection and segmentation,

Y. Zou, J. Jeong, L. Pemula, D. Zhang, and O. Dabeer, “Spot-the- difference self-supervised pre-training for anomaly detection and segmentation,” inECCV, 2022. 1, 7, 8, 9, 16, 18

2022
[5]

Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection,

C. Wang, W. Zhu, B.-B. Gao, Z. Gan, J. Zhang, Z. Gu, S. Qian, M. Chen, and L. Ma, “Real-iad: A real-world multi-view dataset for benchmarking versatile industrial anomaly detection,” in CVPR, 2024. 1, 7, 8

2024
[6]

Anomaly detection via reverse distillation from one-class embedding,

H. Deng and X. Li, “Anomaly detection via reverse distillation from one-class embedding,” inCVPR, 2022. 1, 2, 3, 8, 9, 12, 18, 19, 20, 21, 22, 23

2022
[7]

Simplenet: A simple network for image anomaly detection and localization,

Z. Liu, Y. Zhou, Y. Xu, and Z. Wang, “Simplenet: A simple network for image anomaly detection and localization,” inCVPR, 2023. 1, 8, 9, 12, 18, 19, 20, 21, 22, 23

2023
[8]

Destseg: Segmentation guided denoising student-teacher for anomaly detection,

X. Zhang, S. Li, X. Li, P . Huang, J. Shan, and T. Chen, “Destseg: Segmentation guided denoising student-teacher for anomaly detection,” inCVPR, 2023. 1, 8, 9, 12, 18, 19, 20, 21, 22, 23

2023
[9]

Efficientad: Accurate visual anomaly detection at millisecond-level latencies,

K. Batzner, L. Heckler, and R. König, “Efficientad: Accurate visual anomaly detection at millisecond-level latencies,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 128–138. 1, 3

2024
[10]

A unified model for multi-class anomaly detection,

Z. You, L. Cui, Y. Shen, K. Yang, X. Lu, Y. Zheng, and X. Le, “A unified model for multi-class anomaly detection,” inNeurlPS,
[11]

1, 3, 4, 8, 9, 12, 18, 19, 20, 21, 22, 23
[12]

A diffusion-based framework for multi- class anomaly detection,

H. He, J. Zhang, H. Chen, X. Chen, Z. Li, X. Chen, Y. Wang, C. Wang, and L. Xie, “A diffusion-based framework for multi- class anomaly detection,” inAAAI, 2024. 1, 3, 4, 8, 12, 18, 19, 20, 21

2024
[13]

Exploring plain vit reconstruction for multi-class unsupervised anomaly detection,

J. Zhang, X. Chen, Y. Wang, C. Wang, Y. Liu, X. Li, M.-H. Yang, and D. Tao, “Exploring plain vit reconstruction for multi-class unsupervised anomaly detection,”arXiv preprint arXiv:2312.07495,

arXiv
[14]

Learning feature inversion for multi-class unsupervised anomaly detection under general-purpose coco-ad benchmark,

J. Zhang, X. Li, G. Tian, Z. Xue, Y. Liu, G. Pang, and D. Tao, “Learning feature inversion for multi-class unsupervised anomaly detection under general-purpose coco-ad benchmark,”arXiv, 2024. 1, 3, 4, 7, 8, 16, 17

2024
[15]

Unimmad: Unified multi-modal and multi-class anomaly de- tection via moe-driven feature decompression,

Y. Zhao, Y. Pang, L. Zhang, H. Liu, J. Zuo, H. Lu, and X. Zhao, “Unimmad: Unified multi-modal and multi-class anomaly de- tection via moe-driven feature decompression,”arXiv preprint arXiv:2509.25934, 2025. 1, 4, 8

arXiv 2025
[16]

A survey on visual anomaly detection: Challenge, approach, and prospect,

Y. Cao, X. Xu, J. Zhang, Y. Cheng, X. Huang, G. Pang, and W. Shen, “A survey on visual anomaly detection: Challenge, approach, and prospect,”arXiv preprint arXiv:2401.16402, 2024. 1

arXiv 2024
[17]

Ader: A comprehensive benchmark for multi- class visual anomaly detection,

J. Zhang, H. He, Z. Gan, Q. He, Y. Cai, Z. Xue, Y. Wang, C. Wang, L. Xie, and Y. Liu, “Ader: A comprehensive benchmark for multi- class visual anomaly detection,”arXiv preprint arXiv:2406.03262,

arXiv
[18]

Towards total recall in industrial anomaly detection,

K. Roth, L. Pemula, J. Zepeda, B. Schölkopf, T. Brox, and P . Gehler, “Towards total recall in industrial anomaly detection,” inCVPR,
[19]

Padim: a patch distribution modeling framework for anomaly detection and localization,

T. Defard, A. Setkov, A. Loesch, and R. Audigier, “Padim: a patch distribution modeling framework for anomaly detection and localization,” inICPR, 2021. 1, 2

2021
[20]

Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings,

P . Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings,” inCVPR, 2020. 1, 2, 8

2020
[21]

Bias: Incorporating biased knowledge to boost unsupervised image anomaly localiza- tion,

Y. Cao, X. Xu, C. Sun, L. Gao, and W. Shen, “Bias: Incorporating biased knowledge to boost unsupervised image anomaly localiza- tion,”IEEE Transactions on Systems, Man, and Cybernetics: Systems,
[22]

Draem-a discriminatively trained reconstruction embedding for surface anomaly detection,

V . Zavrtanik, M. Kristan, and D. Skoˇ caj, “Draem-a discriminatively trained reconstruction embedding for surface anomaly detection,” inICCV, 2021. 1, 3, 8

2021
[23]

Cutpaste: Self-supervised learning for anomaly detection and localization,

C.-L. Li, K. Sohn, J. Yoon, and T. Pfister, “Cutpaste: Self-supervised learning for anomaly detection and localization,” inCVPR, 2021. 1, 3

2021
[24]

Omni- frequency channel-selection representations for unsupervised anomaly detection,

Y. Liang, J. Zhang, S. Zhao, R. Wu, Y. Liu, and S. Pan, “Omni- frequency channel-selection representations for unsupervised anomaly detection,”TIP, 2023. 1, 3

2023
[25]

Informative knowledge distillation for image anomaly segmentation,

Y. Cao, Q. Wan, W. Shen, and L. Gao, “Informative knowledge distillation for image anomaly segmentation,”Knowledge-Based Systems, vol. 248, p. 108846, 2022. 1 IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 14

2022
[26]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017. 1

2017
[27]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020. 1

Pith/arXiv arXiv 2010
[28]

Mamba: Linear-time sequence modeling with selective state spaces,

A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,”arXiv preprint arXiv:2312.00752, 2023. 1, 4, 9

Pith/arXiv arXiv 2023
[29]

Transformers are ssms: Generalized models and efficient algorithms through structured state space duality,

T. Dao and A. Gu, “Transformers are ssms: Generalized models and efficient algorithms through structured state space duality,” arXiv preprint arXiv:2405.21060, 2024. 1, 2, 4, 9

Pith/arXiv arXiv 2024
[30]

Vmamba: Visual state space model,

Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, and Y. Liu, “Vmamba: Visual state space model,”arXiv preprint arXiv:2401.10166, 2024. 1, 4

Pith/arXiv arXiv 2024
[31]

Vision mamba: Efficient visual representation learning with bidirectional state space model,

L. Zhu, B. Liao, Q. Zhang, X. Wang, W. Liu, and X. Wang, “Vision mamba: Efficient visual representation learning with bidirectional state space model,”arXiv preprint arXiv:2401.09417, 2024. 1, 4

Pith/arXiv arXiv 2024
[32]

Vmambair: Visual state space model for image restoration,

Y. Shi, B. Xia, X. Jin, X. Wang, T. Zhao, X. Xia, X. Xiao, and W. Yang, “Vmambair: Visual state space model for image restoration,”arXiv preprint arXiv:2403.11423, 2024. 1, 4

arXiv 2024
[33]

Local- mamba: Visual state space model with windowed selective scan,

T. Huang, X. Pei, S. You, F. Wang, C. Qian, and C. Xu, “Local- mamba: Visual state space model with windowed selective scan,” arXiv preprint arXiv:2403.09338, 2024. 1, 4

arXiv 2024
[34]

Vm-unet: Vision mamba unet for medical image segmentation,

J. Ruan and S. Xiang, “Vm-unet: Vision mamba unet for medical image segmentation,”arXiv preprint arXiv:2402.02491, 2024. 1, 4

arXiv 2024
[35]

Mamba-unet: Unet-like pure visual mamba for medical image segmentation,

Z. Wang, J.-Q. Zheng, Y. Zhang, G. Cui, and L. Li, “Mamba-unet: Unet-like pure visual mamba for medical image segmentation,” arXiv preprint arXiv:2402.05079, 2024. 1, 4

arXiv 2024
[36]

Mamba-nd: Selective state space modeling for multi-dimensional data,

S. Li, H. Singh, and A. Grover, “Mamba-nd: Selective state space modeling for multi-dimensional data,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 75–92. 1, 4

2024
[37]

Mambavision: A hybrid mamba- transformer vision backbone,

A. Hatamizadeh and J. Kautz, “Mambavision: A hybrid mamba- transformer vision backbone,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 25 261–25 270. 1, 4

2025
[38]

Mambaad: Exploring state space models for multi-class unsupervised anomaly detection,

H. He, Y. Bai, J. Zhang, Q. He, H. Chen, Z. Gan, C. Wang, X. Li, G. Tian, and L. Xie, “Mambaad: Exploring state space models for multi-class unsupervised anomaly detection,”Advances in Neural Information Processing Systems, vol. 37, pp. 71 162–71 187, 2024. 1, 2, 3, 4, 5, 8, 9, 12

2024
[39]

Mamba-3: Improved sequence modeling using state space principles,

A. Lahoti, K. Y. Li, B. Chen, C. Wang, A. Bick, J. Z. Kolter, T. Dao, and A. Gu, “Mamba-3: Improved sequence modeling using state space principles,”arXiv preprint arXiv:2603.15569, 2026. 2

arXiv 2026
[40]

Roformer: Enhanced transformer with rotary position embedding,

J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu, “Roformer: Enhanced transformer with rotary position embedding,”Neuro- computing, vol. 568, p. 127063, 2024. 2, 5, 9

2024
[41]

A theory for multiresolution signal decomposition: the wavelet representation,

S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,”IEEE transactions on pattern analysis and machine intelligence, vol. 11, no. 7, pp. 674–693, 2002. 2

2002
[42]

Wavelet convolutions for large receptive fields,

S. E. Finder, R. Amoyal, E. Treister, and O. Freifeld, “Wavelet convolutions for large receptive fields,” inEuropean conference on computer vision. Springer, 2024, pp. 363–380. 2, 5

2024
[43]

Über die stetige abbildung einer linie auf ein flächenstück,

D. Hilbert, “Über die stetige abbildung einer linie auf ein flächenstück,” inDritter Band: Analysis· Grundlagen der Mathematik· Physik Verschiedenes: Nebst Einer Lebensgeschichte. Springer, 1935, pp. 1–2. 2

1935
[44]

Feature pyramid networks for object detection,

T.-Y. Lin, P . Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117–2125. 2

2017
[45]

Sub-image anomaly detection with deep pyramid correspondences,

N. Cohen and Y. Hoshen, “Sub-image anomaly detection with deep pyramid correspondences,”arXiv preprint arXiv:2005.02357,

arXiv 2005
[46]

Same same but differnet: Semi-supervised defect detection with normalizing flows,

M. Rudolph, B. Wandt, and B. Rosenhahn, “Same same but differnet: Semi-supervised defect detection with normalizing flows,” inWACV, 2021. 2

2021
[47]

Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,

D. Gudovskiy, S. Ishizaka, and K. Kozuka, “Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,” inWACV, 2022. 2

2022
[48]

Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows,

J. Yu, Y. Zheng, X. Wang, W. Li, Y. Wu, R. Zhao, and L. Wu, “Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows,”arXiv preprint arXiv:2111.07677, 2021. 2

arXiv 2021
[49]

Msflow: Multiscale flow-based framework for unsupervised anomaly detection,

Y. Zhou, X. Xu, J. Song, F. Shen, and H. T. Shen, “Msflow: Multiscale flow-based framework for unsupervised anomaly detection,”IEEE transactions on neural networks and learning systems,
[50]

Imagenet: A large-scale hierarchical image database,

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” inCVPR,
[51]

Student-teacher feature pyramid matching for anomaly detection,

G. Wang, S. Han, E. Ding, and D. Huang, “Student-teacher feature pyramid matching for anomaly detection,”arXiv preprint arXiv:2103.04257, 2021. 2

arXiv 2021
[52]

Multiresolution knowledge distillation for anomaly detection,

M. Salehi, N. Sadjadi, S. Baselizadeh, M. H. Rohban, and H. R. Rabiee, “Multiresolution knowledge distillation for anomaly detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 14 902–14 912. 2

2021
[53]

Revisiting reverse distillation for anomaly detection,

T. D. Tien, A. T. Nguyen, N. H. Tran, T. D. Huy, S. Duong, C. D. T. Nguyen, and S. Q. Truong, “Revisiting reverse distillation for anomaly detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 24 511–24 520. 2

2023
[54]

Panda: Adapting pretrained features for anomaly detection and segmentation,

T. Reiss, N. Cohen, L. Bergman, and Y. Hoshen, “Panda: Adapting pretrained features for anomaly detection and segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 2806–2814. 2

2021
[55]

Cfa: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization,

S. Lee, S. Lee, and B. C. Song, “Cfa: Coupled-hypersphere-based feature adaptation for target-oriented anomaly localization,”IEEE Access, vol. 10, pp. 78 446–78 454, 2022. 2

2022
[56]

Reconpatch: Contrastive patch representation learning for indus- trial anomaly detection,

J. Hyun, S. Kim, G. Jeon, S. H. Kim, K. Bae, and B. J. Kang, “Reconpatch: Contrastive patch representation learning for indus- trial anomaly detection,” inProceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 2052–2061. 2

2024
[57]

Deep one-class classification,

L. Ruff, R. Vandermeulen, N. Goernitz, L. Deecke, S. A. Siddiqui, A. Binder, E. Müller, and M. Kloft, “Deep one-class classification,” inInternational conference on machine learning. PMLR, 2018, pp. 4393–4402. 3

2018
[58]

Patch svdd: Patch-level svdd for anomaly detection and segmentation,

J. Yi and S. Yoon, “Patch svdd: Patch-level svdd for anomaly detection and segmentation,” inProceedings of the Asian conference on computer vision, 2020. 3

2020
[59]

Pni: Industrial anomaly detection using position and neighborhood information,

J. Bae, J.-H. Lee, and S. Kim, “Pni: Industrial anomaly detection using position and neighborhood information,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6373–6383. 3

2023
[60]

Natural synthetic anomalies for self-supervised anomaly detection and localization,

H. M. Schlüter, J. Tan, B. Hou, and B. Kainz, “Natural synthetic anomalies for self-supervised anomaly detection and localization,” inECCV, 2022. 3

2022
[61]

Anomalydiffusion: Few-shot anomaly image generation with diffusion model,

T. Hu, J. Zhang, R. Yi, Y. Du, X. Chen, L. Liu, Y. Wang, and C. Wang, “Anomalydiffusion: Few-shot anomaly image generation with diffusion model,” inAAAI, 2024. 3

2024
[62]

A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,

Q. Chen, H. Luo, C. Lv, and Z. Zhang, “A unified anomaly synthesis strategy with gradient ascent for industrial anomaly detection and localization,” in European Conference on Computer Vision. Springer, 2024, pp. 37–54. 3

2024
[63]

Dsr–a dual subspace re-projection network for surface anomaly detection,

V. Zavrtanik, M. Kristan, and D. Skočaj, “Dsr–a dual subspace re-projection network for surface anomaly detection,” in European conference on computer vision. Springer, 2022, pp. 539–554. 3

2022
[64]

Collaborative discrepancy optimization for reliable image anomaly localization,

Y. Cao, X. Xu, Z. Liu, and W. Shen, “Collaborative discrepancy optimization for reliable image anomaly localization,” IEEE Transactions on Industrial Informatics, vol. 19, no. 11, pp. 10674–10683, 2023. 3

2023
[65]

Improving unsupervised defect segmentation by applying structural similarity to autoencoders,

P. Bergmann, S. Löwe, M. Fauser, D. Sattlegger, and C. Steger, “Improving unsupervised defect segmentation by applying structural similarity to autoencoders,” arXiv preprint arXiv:1807.02011, 2018. 3

2018
[66]

Reconstruction by inpainting for visual anomaly detection,

V. Zavrtanik, M. Kristan, and D. Skočaj, “Reconstruction by inpainting for visual anomaly detection,” Pattern Recognition, vol. 112, p. 107706, 2021. 3

2021
[67]

Self-supervised predictive convolutional attentive block for anomaly detection,

N.-C. Ristea, N. Madan, R. T. Ionescu, K. Nasrollahi, F. S. Khan, T. B. Moeslund, and M. Shah, “Self-supervised predictive convolutional attentive block for anomaly detection,” in CVPR,
[68]

Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection,

D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. v. d. Hengel, “Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1705–1714. 3

2019
[69]

Inpainting transformer for anomaly detection,

J. Pirnay and K. Chai, “Inpainting transformer for anomaly detection,” in ICIAP, 2022. 3

2022
[70]

f-anogan: Fast unsupervised anomaly detection with generative adversarial networks,

T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “f-anogan: Fast unsupervised anomaly detection with generative adversarial networks,” Medical image analysis, vol. 54, pp. 30–44, 2019. 3

2019
[71]

Ganomaly: Semi-supervised anomaly detection via adversarial training,

S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Asian conference on computer vision. Springer, 2018, pp. 622–637. 3

2018
[72]

Learning semantic context from normal samples for unsupervised anomaly detection,

X. Yan, H. Zhang, X. Xu, X. Hu, and P.-A. Heng, “Learning semantic context from normal samples for unsupervised anomaly detection,” in AAAI, 2021. 3

2021
[73]

Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise,

J. Wyatt, A. Leach, S. M. Schmon, and C. G. Willcocks, “Anoddpm: Anomaly detection with denoising diffusion probabilistic models using simplex noise,” in CVPR, 2022. 3

2022
[74]

Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection,

R. Lu, Y. Wu, L. Tian, D. Wang, B. Chen, X. Liu, and R. Hu, “Hierarchical vector quantized transformer for multi-class unsupervised anomaly detection,” in NeurIPS, 2023. 4

2023
[75]

Promptad: Learning prompts with only normal samples for few-shot anomaly detection,

X. Li, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Xie, and L. Ma, “Promptad: Learning prompts with only normal samples for few-shot anomaly detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 16838–16848. 4, 8, 9, 12

2024
[76]

Combining recurrent, convolutional, and continuous-time models with linear state space layers,

A. Gu, I. Johnson, K. Goel, K. Saab, T. Dao, A. Rudra, and C. Ré, “Combining recurrent, convolutional, and continuous-time models with linear state space layers,” in NeurIPS, 2021. 4

2021
[77]

Efficiently modeling long sequences with structured state spaces,

A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” in ICLR, 2021. 4

2021
[78]

Simplified state space layers for sequence modeling,

J. T. Smith, A. Warrington, and S. W. Linderman, “Simplified state space layers for sequence modeling,” in ICLR, 2022. 4

2022
[79]

Long range language modeling via gated state spaces,

H. Mehta, A. Gupta, A. Cutkosky, and B. Neyshabur, “Long range language modeling via gated state spaces,” in ICLR, 2022. 4

2022
[80]

Hungry hungry hippos: Towards language modeling with state space models,

D. Y. Fu, T. Dao, K. K. Saab, A. W. Thomas, A. Rudra, and C. Ré, “Hungry hungry hippos: Towards language modeling with state space models,” in ICLR, 2022. 4

2022
