Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models
Pith reviewed 2026-05-10 17:21 UTC · model grok-4.3
The pith
Anomaly knowledge in pre-trained vision-language models is concentrated in a sparse subset of sensitive neurons that can be identified and activated using only minimal normal samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We hypothesize that anomaly knowledge is concentrated within a sparse subset of anomaly-sensitive neurons. To validate this, we propose latent anomaly knowledge excavation (LAKE), a training-free framework that identifies and elicits these critical neuronal signals using only a minimal set of normal samples. By isolating these sensitive neurons, LAKE constructs a highly compact normality representation that integrates visual structural deviations with cross-modal semantic activations. Extensive experiments on industrial AD benchmarks demonstrate that LAKE achieves state-of-the-art performance while providing intrinsic, neuron-level interpretability. Ultimately, our work advocates for a paradigm shift: redefining anomaly detection as the targeted activation of latent pre-trained knowledge rather than the acquisition of a downstream task.
What carries the argument
LAKE, the training-free framework that locates sparse anomaly-sensitive neurons using minimal normal samples and elicits their signals to form an integrated visual-semantic normality representation.
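What such an excavation step could look like in code is easy to sketch. The following is a minimal, hypothetical illustration, not LAKE's published procedure: the function names, the stability criterion, and the z-score detector are all assumptions for exposition. It selects a sparse neuron subset from a handful of normal activations, then scores test samples by their deviation on that subset.

```python
import numpy as np

def select_sensitive_neurons(normal_acts: np.ndarray, k: int) -> np.ndarray:
    """normal_acts: (n_samples, n_neurons) activations from a few normal
    images. Keep the k neurons whose normal response is most stable
    (large mean relative to spread), one plausible proxy, not LAKE's
    actual criterion."""
    mu = normal_acts.mean(axis=0)
    sigma = normal_acts.std(axis=0)
    return np.argsort(np.abs(mu) / (sigma + 1e-8))[-k:]

def anomaly_score(test_act: np.ndarray, normal_acts: np.ndarray,
                  idx: np.ndarray) -> float:
    """Deviation of a test activation from the normal-sample statistics,
    measured only on the selected neurons (diagonal z-score)."""
    mu = normal_acts[:, idx].mean(axis=0)
    sigma = normal_acts[:, idx].std(axis=0) + 1e-8
    return float(np.abs((test_act[idx] - mu) / sigma).mean())

# Example: 16 normal samples, 768 neurons, retain roughly 5% of them.
rng = np.random.default_rng(0)
normal = rng.normal(size=(16, 768))   # stand-in for real VLM activations
idx = select_sensitive_neurons(normal, k=38)
print(anomaly_score(rng.normal(size=768), normal, idx))
```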
If this is right
- Anomaly detection can be performed effectively by activating existing pre-trained knowledge rather than acquiring new task-specific components.
- A compact normality representation emerges from the selected neurons that fuses visual structural deviations with cross-modal semantic activations.
- Neuron-level interpretability is obtained directly from the identification process without additional post-hoc analysis.
- State-of-the-art results on industrial anomaly detection benchmarks are attainable using only normal samples for neuron selection.
Where Pith is reading between the lines
- The same sparse-neuron excavation idea could be tested on other zero-shot capabilities of VLMs to see whether latent knowledge is similarly localized for different tasks.
- Avoiding fine-tuning or external banks could reduce computational overhead when deploying these models in new industrial settings.
- Cross-model comparisons might reveal whether the sensitive neurons are consistent across different vision-language architectures.
Load-bearing premise
Anomaly knowledge is intrinsically embedded but latent and concentrated in a sparse subset of neurons that can be reliably identified and elicited using only a minimal set of normal samples without any training or external data.
What would settle it
If deactivating the neurons selected by LAKE leaves anomaly detection accuracy unchanged or if substituting a random set of neurons yields comparable or better results on the same benchmarks, the claim that these specific neurons carry the concentrated knowledge would fail.
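This falsification test is straightforward to operationalize. Below is a minimal, self-contained sketch of the three comparisons the paragraph above describes; the synthetic data, the stability criterion, and the z-score detector are illustrative stand-ins for real VLM activations and LAKE's machinery.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
normal = rng.normal(size=(16, 768))          # few-shot normal activations
test = rng.normal(size=(64, 768))            # held-out test activations
labels = rng.integers(0, 2, size=64)         # 1 = anomalous (stand-in)

mu, sd = normal.mean(axis=0), normal.std(axis=0) + 1e-8

def auroc(idx: np.ndarray) -> float:
    """Detection AUROC when scoring only on the neuron subset idx."""
    scores = np.abs((test[:, idx] - mu[idx]) / sd[idx]).mean(axis=1)
    return roc_auc_score(labels, scores)

selected = np.argsort(np.abs(mu) / sd)[-38:]            # "sensitive" set
random_idx = rng.choice(768, size=38, replace=False)    # matched random
complement = np.setdiff1d(np.arange(768), selected)     # deactivation

# The sparse-knowledge claim fails if auroc(selected) does not clearly
# exceed both auroc(random_idx) and auroc(complement).
print(auroc(selected), auroc(random_idx), auroc(complement))
```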
Original abstract
Large-scale vision-language models (VLMs) exhibit remarkable zero-shot capabilities, yet the internal mechanisms driving their anomaly detection (AD) performance remain poorly understood. Current methods predominantly treat VLMs as black-box feature extractors, assuming that anomaly-specific knowledge must be acquired through external adapters or memory banks. In this paper, we challenge this assumption by arguing that anomaly knowledge is intrinsically embedded within pre-trained models but remains latent and under-activated. We hypothesize that this knowledge is concentrated within a sparse subset of anomaly-sensitive neurons. To validate this, we propose latent anomaly knowledge excavation (LAKE), a training-free framework that identifies and elicits these critical neuronal signals using only a minimal set of normal samples. By isolating these sensitive neurons, LAKE constructs a highly compact normality representation that integrates visual structural deviations with cross-modal semantic activations. Extensive experiments on industrial AD benchmarks demonstrate that LAKE achieves state-of-the-art performance while providing intrinsic, neuron-level interpretability. Ultimately, our work advocates for a paradigm shift: redefining anomaly detection as the targeted activation of latent pre-trained knowledge rather than the acquisition of a downstream task.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that anomaly knowledge is intrinsically embedded but latent in pre-trained vision-language models, concentrated in a sparse subset of anomaly-sensitive neurons. It proposes LAKE, a training-free framework that identifies these neurons using statistics from only a minimal set of normal samples (no anomalies or training), constructs a compact normality representation integrating visual structural deviations and cross-modal semantics, and achieves state-of-the-art performance on industrial anomaly detection benchmarks while offering neuron-level interpretability. The work advocates redefining anomaly detection as targeted activation of latent pre-trained knowledge rather than downstream acquisition.
Significance. If the neuron selection demonstrably isolates anomaly-responsive units rather than general normal-sample features, the result would support a meaningful paradigm shift toward training-free, interpretable anomaly detection in VLMs. The training-free property, compactness of the representation, and focus on intrinsic interpretability are potential strengths that could influence efficient deployment and mechanistic understanding in the field.
major comments (2)
- [LAKE method / neuron identification procedure] The central identification step (the LAKE framework) selects neurons exclusively from intra-normal activation patterns or variance. This leaves open whether the selected units are genuinely anomaly-sensitive or simply encode low-variance/background features common to the chosen normal samples. The manuscript must provide direct evidence, such as differential activation analysis on held-out anomalous samples comparing selected and non-selected neurons, or an ablation replacing the selection criterion with random or variance-only baselines, to substantiate the mapping from normal-only statistics to anomaly sensitivity (a sketch of such an analysis follows this list).
- [Experiments and results] The SOTA performance claims rest on the assumption that the excavated neurons yield a superior normality representation. Without ablations showing that performance degrades when using non-selected neurons or when the sparsity level is altered, and without controls for post-hoc selection bias, the experimental support for the hypothesis remains incomplete.
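The differential activation analysis requested in the first major comment could take the following shape. Everything here, including the effect-size definition, the stand-in neuron selection, and the synthetic activations, is an illustrative assumption rather than the authors' pipeline.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def differential_activation(normal_acts: np.ndarray,
                            anomalous_acts: np.ndarray) -> np.ndarray:
    """Per-neuron effect size: |mean shift| between anomalous and normal
    activations, normalized by the pooled standard deviation."""
    shift = anomalous_acts.mean(axis=0) - normal_acts.mean(axis=0)
    pooled = np.sqrt(0.5 * (anomalous_acts.var(axis=0)
                            + normal_acts.var(axis=0))) + 1e-8
    return np.abs(shift) / pooled

rng = np.random.default_rng(2)
normal_acts = rng.normal(size=(64, 768))       # stand-in VLM activations
anomalous_acts = rng.normal(size=(64, 768))    # held-out anomalous samples
selected = rng.choice(768, size=38, replace=False)  # stand-in selection

d = differential_activation(normal_acts, anomalous_acts)
rest = np.setdiff1d(np.arange(768), selected)
# One-sided test: do the selected neurons shift more under anomalies
# than the non-selected ones?
stat, p = mannwhitneyu(d[selected], d[rest], alternative="greater")
print(f"median effect (selected) = {np.median(d[selected]):.3f}, "
      f"(rest) = {np.median(d[rest]):.3f}, p = {p:.3g}")
```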
minor comments (1)
- [Method] Clarify the precise statistical criterion (e.g., activation threshold, variance metric, or cross-modal integration formula) used to designate 'sensitive' neurons; the current description is high-level and would benefit from an explicit equation or pseudocode (one plausible form is sketched below).
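For illustration only, the requested explicit criterion could take a form like the following, an assumed normal-only sensitivity score consistent with the sketches above, not the paper's actual rule:

```latex
% Hypothetical per-neuron sensitivity score from n normal samples
% x_1, ..., x_n, where a_j(x_i) is the activation of neuron j:
\[
  s_j = \frac{\lvert \bar{a}_j \rvert}{\sigma_j + \varepsilon},
  \qquad
  \bar{a}_j = \frac{1}{n}\sum_{i=1}^{n} a_j(x_i),
  \quad
  \sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(a_j(x_i) - \bar{a}_j\bigr)^2},
\]
% with the sensitive set taken as the top-k neurons by s_j:
\[
  \mathcal{S} = \operatorname*{top\text{-}k}_{j}\; s_j .
\]
```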
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed comments, which highlight important aspects of our neuron selection procedure and experimental validation. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [LAKE method / neuron identification procedure] The central identification step (the LAKE framework) selects neurons exclusively from intra-normal activation patterns or variance. This leaves open whether the selected units are genuinely anomaly-sensitive or simply encode low-variance/background features common to the chosen normal samples. The manuscript must provide direct evidence, such as differential activation analysis on held-out anomalous samples comparing selected and non-selected neurons, or an ablation replacing the selection criterion with random or variance-only baselines, to substantiate the mapping from normal-only statistics to anomaly sensitivity.
  Authors: We agree that additional direct evidence is needed to confirm anomaly sensitivity beyond normal-sample statistics. In the revised manuscript we will add a dedicated analysis section that computes differential activations (normal vs. held-out anomalous samples) for selected versus non-selected neurons, showing statistically higher anomaly responsiveness in the selected set. We will also include ablations that replace our selection criterion with random neuron subsets and with pure variance-based selection; both yield lower detection performance, supporting that our procedure isolates anomaly-sensitive units rather than generic low-variance features. Revision: yes.
- Referee: [Experiments and results] The SOTA performance claims rest on the assumption that the excavated neurons yield a superior normality representation. Without ablations showing that performance degrades when using non-selected neurons or when the sparsity level is altered, and without controls for post-hoc selection bias, the experimental support for the hypothesis remains incomplete.
  Authors: We acknowledge the value of these additional controls. The revised version will incorporate: (i) direct performance comparisons using non-selected neurons, demonstrating clear degradation; (ii) sweeps over different sparsity levels to illustrate the benefit of the compact representation; and (iii) bias-control experiments that evaluate randomly chosen neuron subsets of matched cardinality. These results will be reported alongside the existing benchmarks to provide fuller experimental support for the superiority of the excavated neurons (a sketch of such a sparsity sweep follows these responses). Revision: yes.
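The promised sparsity sweep (point ii) is simple to set up. The self-contained sketch below uses synthetic stand-ins for VLM activations and an assumed stability ranking; real experiments would substitute the model's activations and LAKE's actual criterion.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
normal = rng.normal(size=(16, 768))      # few-shot normal activations
test = rng.normal(size=(64, 768))
labels = rng.integers(0, 2, size=64)     # 1 = anomalous (stand-in)

mu, sd = normal.mean(axis=0), normal.std(axis=0) + 1e-8
rank = np.argsort(np.abs(mu) / sd)       # neurons sorted by stability, ascending

# Sweep the number of retained neurons k and track detection AUROC.
for k in (8, 16, 38, 76, 154, 384, 768):
    idx = rank[-k:]
    scores = np.abs((test[:, idx] - mu[idx]) / sd[idx]).mean(axis=1)
    print(k, roc_auc_score(labels, scores))
# The compactness claim predicts strong performance already at small k,
# with little or no gain from retaining the full neuron population.
```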
Circularity Check
No circularity: empirical validation independent of inputs
Full rationale
The paper advances a hypothesis that anomaly knowledge is latent in sparse neurons of pre-trained VLMs and proposes the training-free LAKE method to identify such neurons from minimal normal samples only. This identification relies on intra-normal activation statistics, with success measured by SOTA empirical performance on industrial AD benchmarks. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or self-definitional reductions appear in the derivation chain. The mapping from normal-sample statistics to anomaly sensitivity is tested externally rather than assumed by construction, leaving the central claim self-contained against the provided benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Anomaly knowledge is intrinsically embedded within pre-trained VLMs but remains latent and under-activated.
invented entities (1)
- anomaly-sensitive neurons (no independent evidence)