MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models
Pith reviewed 2026-05-10 15:48 UTC · model grok-4.3
The pith
A new multimodal dataset shows that current generalist MLLMs fall short of industrial standards for anomaly detection, while a reasoning-based model trained on it improves both detection and localization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MMR-AD supplies a comprehensive training and evaluation benchmark of multimodal data tailored to anomaly detection, including chain-of-thought reasoning examples that address gaps between web pretraining and industrial needs. Tests on MMR-AD show that current state-of-the-art generalist MLLMs still perform far below industrial requirements for general anomaly detection. Anomaly-R1, trained on the dataset's CoT data and further improved via reinforcement learning, delivers substantial gains in both anomaly detection and localization over those generalist baselines.
What carries the argument
The MMR-AD dataset, which provides multimodal image-text pairs with anomaly-specific chain-of-thought annotations to enable post-training and benchmarking of MLLMs for general anomaly detection.
If this is right
- Generalist MLLMs can reach usable levels of general anomaly detection once supplied with targeted multimodal reasoning data.
- Chain-of-thought supervision plus reinforcement learning improves both detection accuracy and localization precision on novel classes.
- MMR-AD provides a reusable benchmark that allows direct comparison of future MLLM approaches to general anomaly detection.
- Industrial inspection pipelines could shift toward models that handle new product types without per-class retraining.
Where Pith is reading between the lines
- Dataset construction methods used here could be adapted to create similar benchmarks for other specialized visual reasoning problems such as medical anomaly detection.
- The performance gap likely arises from mismatches in low-level visual features or reasoning style that could be targeted during earlier pretraining stages.
- Real-world deployment would still need separate checks on live production lines to confirm the improvements survive lighting changes, camera angles, and sensor noise.
- Combining the reasoning approach with existing single-class anomaly detectors might create practical hybrid systems for factories with mixed needs.
Load-bearing premise
The scenarios, classes, and annotations inside MMR-AD capture enough of the real diversity and difficulty of industrial anomaly detection that measured gaps and improvements will hold outside the specific test splits.
What would settle it
Evaluating both a generalist MLLM and Anomaly-R1 on a new collection of real factory images from product categories absent from MMR-AD and finding no meaningful difference in detection or localization metrics would falsify the reported improvements.
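The falsification test above can be sketched numerically. Below is a minimal, self-contained image-level AUROC comparison; all scores and labels are invented for illustration, and the `auroc` helper is a hypothetical stand-in for whatever metric code an actual evaluation would use.

```python
# Sketch of the falsification test: score a generalist MLLM and Anomaly-R1
# on held-out images from unseen product categories, then compare AUROC.
# All numbers are synthetic; MMR-AD defines no such API.

def auroc(labels, scores):
    """Image-level AUROC via the Mann-Whitney U statistic (ties get half credit)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical anomaly scores on novel-class images (label 1 = anomalous).
labels            = [1, 1, 1, 0, 0, 0, 0, 1]
generalist_scores = [0.4, 0.6, 0.3, 0.5, 0.2, 0.7, 0.1, 0.5]
anomaly_r1_scores = [0.9, 0.8, 0.7, 0.3, 0.1, 0.4, 0.2, 0.6]

gap = auroc(labels, anomaly_r1_scores) - auroc(labels, generalist_scores)
print(f"AUROC gap: {gap:+.3f}")  # no meaningful gap would falsify the claim
```

A real run of this test would also need localization metrics and enough images per category for the gap (or its absence) to be statistically meaningful.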
Original abstract
In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike the conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise in achieving general anomaly detection due to their revolutionary visual understanding and language reasoning capabilities. However, MLLM's general AD ability remains underexplored due to: (1) MLLMs are pretrained on amounts of data sourced from the Web, these data still have significant gaps with the data in AD scenarios. Moreover, the image-text pairs during pretraining are also not specifically for AD tasks. (2) The current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, which is a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current SOTA generalist MLLMs still falls far behind the industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, which is a reasoning-based AD model that learns from the CoT data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that our Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MMR-AD, a large-scale multimodal dataset for training and benchmarking MLLMs on general anomaly detection (GAD) tasks in industrial settings. It argues that web-pretrained SOTA generalist MLLMs underperform due to domain gaps and lack of AD-specific pretraining data, and proposes Anomaly-R1, a CoT- and RL-enhanced reasoning model that achieves notable gains in detection and localization over baselines.
Significance. If the dataset construction and reported gains hold under scrutiny, the work could meaningfully advance MLLM-based GAD research by supplying a dedicated multimodal benchmark with CoT annotations that targets the web-to-industrial domain shift. The emphasis on generalist models without per-class retraining and the RL-enhanced baseline represent constructive steps toward more capable anomaly reasoning systems.
Major comments (2)
- [Abstract] The central claim that current SOTA generalist MLLMs 'still falls far behind the industrial requirements' is load-bearing for the paper's motivation, yet the abstract supplies no quantitative metrics (e.g., AUROC, localization accuracy), no table references, and no explicit comparison to industrial thresholds; the experiments section must furnish these numbers plus evidence that MMR-AD's imaging conditions and anomaly types lie outside web pretraining distributions.
- [Abstract] The assertion of 'remarkable improvements' for Anomaly-R1 via CoT data and reinforcement learning requires demonstration that gains arise from improved general reasoning rather than distribution-specific adaptation; ablations isolating the CoT and RL components, plus evaluation on external industrial AD datasets beyond MMR-AD test splits, are needed to substantiate generalization.
Minor comments (2)
- [Abstract] The phrase 'amounts of data' in the abstract should read 'large amounts of data' for grammatical precision.
- [Abstract] The abstract would benefit from a concise statement of MMR-AD scale (image count, class count, anomaly categories) to better support the 'comprehensive benchmark' description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on strengthening the motivation and evidence in the abstract. We address each major comment below and will incorporate revisions to improve clarity and substantiation of our claims.
Point-by-point responses
- Referee: [Abstract] The central claim that current SOTA generalist MLLMs 'still falls far behind the industrial requirements' is load-bearing for the paper's motivation, yet the abstract supplies no quantitative metrics (e.g., AUROC, localization accuracy), no table references, and no explicit comparison to industrial thresholds; the experiments section must furnish these numbers plus evidence that MMR-AD's imaging conditions and anomaly types lie outside web pretraining distributions.
  Authors: We agree that the abstract would benefit from explicit quantitative support. In the revised version, we will include key metrics (e.g., AUROC and localization accuracy for SOTA generalist MLLMs on MMR-AD) with direct references to the experiments tables. We will also add a concise comparison to typical industrial thresholds (such as AUROC > 0.95 for practical deployment). For the domain gap, we will expand the introduction and dataset sections with concrete details on MMR-AD's controlled industrial imaging conditions, lighting setups, and fine-grained anomaly types that are underrepresented in web-scale pretraining corpora. (Revision: yes)
- Referee: [Abstract] The assertion of 'remarkable improvements' for Anomaly-R1 via CoT data and reinforcement learning requires demonstration that gains arise from improved general reasoning rather than distribution-specific adaptation; ablations isolating the CoT and RL components, plus evaluation on external industrial AD datasets beyond MMR-AD test splits, are needed to substantiate generalization.
  Authors: We recognize the need to isolate the sources of improvement. We will add ablation experiments that separately remove or vary the CoT annotations and the RL stage to quantify their individual contributions to reasoning quality. To address generalization, we will report Anomaly-R1 results on at least one external industrial benchmark (e.g., MVTec AD) in addition to MMR-AD, allowing direct comparison of performance outside the training distribution. (Revision: yes)
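The localization evaluation promised in this exchange needs a concrete scoring rule. A common choice, assumed here rather than taken from the paper, is bounding-box IoU against ground truth with a hit threshold of 0.5; the boxes below are hypothetical examples.

```python
# Hedged sketch of a localization metric: IoU between a predicted box and a
# ground-truth box, with a prediction counted as correct when IoU >= 0.5.
# MMR-AD's actual localization metric is not specified in the abstract.

def iou(a, b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical predicted vs. ground-truth defect boxes.
pred, gt = [97, 296, 184, 370], [100, 300, 180, 365]
print(f"IoU = {iou(pred, gt):.3f}, hit = {iou(pred, gt) >= 0.5}")
```

An ablation table would then report hit rates at one or more IoU thresholds for the CoT-only, RL-only, and full Anomaly-R1 variants.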
Circularity Check
No circularity: empirical dataset introduction with standard baseline evaluation
Full rationale
The paper presents MMR-AD as a new multimodal dataset for general anomaly detection benchmarking and introduces Anomaly-R1 as a baseline trained via CoT and RL on that dataset. No equations, parameter fits, or derivations are described; performance claims consist of direct empirical comparisons between zero-shot generalist MLLMs and the fine-tuned baseline on the authors' own splits. This structure is self-contained and does not invoke self-citations, uniqueness theorems, or ansatzes that reduce the central results to their own inputs by construction. The reported gaps and improvements are benchmark-specific measurements rather than tautological predictions.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: MLLMs pretrained on web data have significant gaps with industrial AD scenarios.
- Domain assumption: Current mainstream AD datasets are unsuitable for post-training MLLMs.
Invented entities (2)
- MMR-AD dataset: no independent evidence
- Anomaly-R1 model: no independent evidence
Supplementary excerpts
Fragments of the paper's supplementary material recovered from the page: Section A ("More Discussions") addresses the concern that MMR-AD relies on Chain-of-Thought data even though CoT reasoning was primarily designed for language-centric tasks. The supplement also reproduces example CoT traces in which the model compares a normal reference image against a query image step by step, covering surface texture and color (an irregular dark spot on a rim), shape and symmetry, light reflection, cable insulation condition and copper-strand spread (protruding strands in a blue cable), and capsule color and markings ("actavis" and "500"). Each trace closes with an answer of the form <answer>Yes.{'bbox_2d': [x1, y1, x2, y2], 'label': '...'}</answer>.
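The answer format visible in these excerpts can be parsed mechanically. Below is a sketch under the assumption that Anomaly-R1's responses always follow the `<answer>Yes.{'bbox_2d': [...], 'label': '...'}</answer>` pattern quoted above; the model's real output grammar may differ.

```python
import ast
import re

# Parse the answer format shown in the supplementary excerpts, e.g.
# "<answer>Yes.{'bbox_2d': [x1, y1, x2, y2], 'label': '...'}</answer>".
# The payload is a Python dict literal, so ast.literal_eval is used, not JSON.
ANSWER_RE = re.compile(r"<answer>(Yes|No)\.?\s*(\{.*?\})?</answer>", re.DOTALL)

def parse_answer(text):
    """Return (is_anomalous, bbox, label) from a model response, or None."""
    m = ANSWER_RE.search(text)
    if not m:
        return None
    verdict = m.group(1) == "Yes"
    bbox, label = None, None
    if m.group(2):
        payload = ast.literal_eval(m.group(2))
        bbox, label = payload.get("bbox_2d"), payload.get("label")
    return verdict, bbox, label

resp = "</think><answer>Yes.{'bbox_2d': [128, 409, 205, 476], 'label': 'dark spot'}</answer>"
print(parse_answer(resp))  # → (True, [128, 409, 205, 476], 'dark spot')
```

A benchmark harness built on such a parser could then feed the extracted verdicts and boxes directly into detection and localization metrics.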