Recognition: 2 theorem links
Towards Explainable Industrial Anomaly Detection via Knowledge-Guided Latent Reasoning
Pith reviewed 2026-05-16 02:58 UTC · model grok-4.3
The pith
Reason-IAD improves industrial anomaly detection accuracy and explainability by retrieving category knowledge and applying entropy-driven latent reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reason-IAD is a knowledge-guided dynamic latent reasoning framework. Its retrieval-augmented knowledge module supplies category-specific textual descriptions, and its entropy-driven latent reasoning mechanism uses optimizable think tokens plus dynamic visual injection to direct attention to anomaly-critical regions, yielding consistent outperformance over prior methods on multiple industrial anomaly detection tasks.
What carries the argument
Entropy-driven latent reasoning mechanism that performs iterative exploration in a compact latent space with optimizable think tokens, an entropy-based reward for confident predictions, and dynamic injection of the most informative image patches.
If this is right
- Category-specific textual knowledge allows the model to reason about defects that general pretraining misses.
- The entropy reward encourages the latent reasoning loop to converge on stable, high-confidence anomaly labels.
- Selective injection of informative patches focuses computation on defect-relevant image areas.
- The resulting reasoning traces make model decisions more interpretable for human inspectors.
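As an illustration, the entropy reward can be sketched in a few lines of Python. The form R = 1 - (1/m) Σᵢ H(Pᵢ) follows the formula fragments quoted later in this review; the batch values below are invented, not results from the paper:

```python
import math

def entropy(probs):
    # Shannon entropy H(P) = -sum p log p, with the 0 log 0 = 0 convention.
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_reward(prob_rows):
    # R = 1 - (1/m) * sum_i H(P_i): higher reward for more confident
    # (lower-entropy) per-token predictions, as the mechanism intends.
    m = len(prob_rows)
    return 1.0 - sum(entropy(p) for p in prob_rows) / m

confident = [[0.99, 0.01], [0.98, 0.02]]
uncertain = [[0.5, 0.5], [0.6, 0.4]]
# A confident batch earns a strictly higher reward than an uncertain one.
```

Under this form, the reward is maximized by near-one-hot predictions, which is exactly the "confident and stable" behavior the abstract claims the reward encourages.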
Where Pith is reading between the lines
- The same retrieval-plus-latent-reasoning pattern could be tested on medical imaging where domain knowledge is also critical.
- Adding temporal sequences to the dynamic injection step might extend the method to video-based industrial monitoring.
- Measuring how the entropy reward behaves under domain shift would clarify whether the latent tokens remain stable outside the training distribution.
Load-bearing premise
The entropy-driven mechanism with optimizable think tokens and dynamic visual injection will reliably steer attention to anomaly-critical regions without creating instability or new biases.
What would settle it
Apply the trained Reason-IAD model to a held-out industrial dataset containing subtle defects under changed lighting or from a different manufacturing line and check whether both detection accuracy and explanation quality fall below current state-of-the-art baselines.
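Such a domain-shift check would be scored by computing AUROC on the held-out set and comparing it to the baseline. A minimal rank-based AUROC, with anomaly scores and binary labels as assumed inputs (the score values below are illustrative only):

```python
def auroc(scores, labels):
    # AUROC as the Mann-Whitney statistic: the probability that a randomly
    # chosen anomalous sample (label 1) scores above a normal one (label 0),
    # counting ties as half a win.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores under changed lighting; a value below the baseline's
# AUROC on the same split would answer the question posed above.
shifted = auroc([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0])
```

Explanation quality would need a separate protocol (e.g., human rating of reasoning traces), since AUROC only settles the detection-accuracy half of the test.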
Original abstract
Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often struggle to capture category-specific anomalies, thereby limiting both detection accuracy and interpretability. To address these limitations, we propose Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. Reason-IAD comprises two core components. First, a retrieval-augmented knowledge module incorporates category-specific textual descriptions into the model input, enabling context-aware reasoning over domain-specific defects. Second, an entropy-driven latent reasoning mechanism conducts iterative exploration within a compact latent space using optimizable latent think tokens, guided by an entropy-based reward that encourages confident and stable predictions. Furthermore, a dynamic visual injection strategy selectively incorporates the most informative image patches into the latent sequence, directing the reasoning process toward regions critical for anomaly detection. Extensive experimental results demonstrate that Reason-IAD consistently outperforms state-of-the-art methods across multiple tasks. The code will be publicly available at https://github.com/chenpeng052/Reason-IAD.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Reason-IAD, a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection. It consists of a retrieval-augmented knowledge module incorporating category-specific textual descriptions and an entropy-driven latent reasoning mechanism that performs iterative exploration in a compact latent space via optimizable latent think tokens guided by an entropy-based reward, together with a dynamic visual injection strategy that selectively incorporates informative image patches. The central claim is that this approach consistently outperforms state-of-the-art methods across multiple tasks.
Significance. If the performance gains are substantiated by rigorous, reproducible experiments and the latent reasoning mechanism is shown to be stable, the work could meaningfully advance explainable industrial anomaly detection by adapting multimodal LLMs to domain-specific defects through interpretable latent exploration. The combination of retrieval-augmented knowledge and entropy-guided optimizable tokens offers a concrete direction for directing attention to fine-grained anomalies without relying solely on general pretraining.
major comments (2)
- [Abstract] The claim that Reason-IAD 'consistently outperforms state-of-the-art methods across multiple tasks' is presented without quantitative metrics, ablation results, baseline comparisons, or error analysis. Because the headline claim rests on these results, their absence prevents verification of whether the gains derive from the retrieval module, the entropy reward, or the dynamic injection.
- [Method] The entropy-based reward in the latent reasoning mechanism is introduced to encourage confident predictions, yet no convergence diagnostics, variance across random seeds for the optimizable think tokens, or ablation isolating the reward's contribution from the retrieval module are provided. This analysis is load-bearing for the claim that the mechanism reliably directs attention to anomaly-critical regions without introducing instability.
minor comments (2)
- [Method] The notation for the latent think tokens and the precise formulation of the dynamic visual injection could be clarified with explicit equations or pseudocode to improve reproducibility.
- [Conclusion] The manuscript states that code will be released at the cited GitHub repository; confirming this upon acceptance would strengthen the contribution.
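To illustrate the kind of pseudocode the minor comment on dynamic visual injection asks for, here is a hedged sketch of the strategy as top-k patch selection. The function name, the scoring interface, and the choice of k are assumptions for illustration, not the paper's actual formulation:

```python
def inject_patches(latent_seq, patch_feats, scores, k=4):
    # Dynamic visual injection, sketched: append the k highest-scoring
    # image-patch features to the latent reasoning sequence, so subsequent
    # reasoning steps attend to the most informative regions.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return latent_seq + [patch_feats[i] for i in sorted(top)]

seq = ["z1", "z2"]                       # latent think tokens
patches = ["p0", "p1", "p2", "p3", "p4"]  # candidate patch features
scores = [0.1, 0.9, 0.3, 0.8, 0.2]        # informativeness scores
# With k=2, patches p1 and p3 (scores 0.9 and 0.8) are injected.
```

Explicit equations or pseudocode of this shape in the manuscript would let readers reproduce the injection step exactly.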
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help us strengthen the clarity and rigor of the presentation. We address each major point below and will incorporate revisions as noted.
Point-by-point responses
-
Referee: [Abstract] The claim that Reason-IAD 'consistently outperforms state-of-the-art methods across multiple tasks' is presented without quantitative metrics, ablation results, baseline comparisons, or error analysis. Because the headline claim rests on these results, their absence prevents verification of whether the gains derive from the retrieval module, the entropy reward, or the dynamic injection.
Authors: We agree that the abstract would be strengthened by key quantitative results. The full manuscript contains detailed tables (e.g., Table 1 for overall performance on MVTec AD and VisA, Table 3 for ablations isolating retrieval, entropy reward, and dynamic injection). In the revised version we will update the abstract to report the main average gains (e.g., +2.3% AUROC over the strongest baseline) and explicitly reference the supporting tables and ablation studies so readers can immediately verify the source of the improvements. Revision: yes
-
Referee: [Method] The entropy-based reward in the latent reasoning mechanism is introduced to encourage confident predictions, yet no convergence diagnostics, variance across random seeds for the optimizable think tokens, or ablation isolating the reward's contribution from the retrieval module are provided. This analysis is load-bearing for the claim that the mechanism reliably directs attention to anomaly-critical regions without introducing instability.
Authors: The manuscript already contains an ablation study (Section 4.3) that isolates the entropy reward's contribution from the retrieval module and dynamic injection. However, we did not include convergence curves or multi-seed variance statistics for the optimizable latent think tokens. We will add these diagnostics in the revision (a new figure showing entropy-reward convergence and the standard deviation across five random seeds) to confirm stability and that the mechanism reliably focuses on anomaly-critical regions. Revision: yes
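The promised diagnostic is straightforward to script: run the token optimization under several seeds and report the spread of final rewards. The loop below is a toy stand-in (noisy gradient ascent on a concave surrogate of the reward), not the paper's actual objective; a small spread across seeds is the stability evidence the referee asks for:

```python
import random
import statistics

def train_think_tokens(seed, steps=200):
    # Hypothetical stand-in for optimizing a latent think token: noisy
    # gradient ascent on the surrogate reward 1 - z^2, which peaks at z = 0.
    rng = random.Random(seed)
    z = rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        grad = -2.0 * z                          # d/dz of (1 - z^2)
        z += 0.05 * grad + 0.01 * rng.gauss(0.0, 1.0)
    return 1.0 - z * z                           # final surrogate reward

rewards = [train_think_tokens(seed) for seed in range(5)]
spread = statistics.stdev(rewards)
# Report mean(rewards) +/- spread; a near-zero spread indicates the
# optimization converges to the same reward regardless of initialization.
```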
Circularity Check
No significant circularity in claimed derivations
Full rationale
The paper introduces Reason-IAD as a framework with a retrieval-augmented knowledge module and an entropy-driven latent reasoning mechanism using optimizable think tokens plus dynamic visual injection. No equations, derivations, or self-citations are exhibited that reduce any prediction or performance claim to quantities defined by construction from the paper's own fitted inputs or prior self-referential results. The central claims rest on external retrieval signals, optimization rewards, and reported experimental outperformance rather than any self-definitional loop or fitted-input renaming. This matches the default expectation for non-circular papers.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Category-specific textual descriptions can be effectively retrieved and incorporated to improve domain-specific anomaly reasoning in pretrained MLLMs
invented entities (2)
- optimizable latent think tokens: no independent evidence
- entropy-based reward: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "entropy-guided reward mechanism... H(P_i^(n)) = -∑ P log P... R(Z^(n)) = 1 - (1/m) ∑ H(P_i^(n))... Z'^(n) ← Z^(n) + η ∇_Z J(Z^(n))"
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery from Law of Logic · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear. Passage: "iterative exploration within a compact latent space using optimizable latent think tokens"
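The formula fragments in the quoted passage reconstruct, assuming Pᵢ⁽ⁿ⁾ is a probability distribution over classes c at reasoning step n (the class index c and the bound m on latent tokens are inferred from context), to:

```latex
H\bigl(P_i^{(n)}\bigr) = -\sum_{c} P_{i,c}^{(n)} \log P_{i,c}^{(n)}, \qquad
R\bigl(Z^{(n)}\bigr) = 1 - \frac{1}{m} \sum_{i=1}^{m} H\bigl(P_i^{(n)}\bigr), \qquad
Z'^{(n)} \leftarrow Z^{(n)} + \eta \, \nabla_{Z} J\bigl(Z^{(n)}\bigr)
```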
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.