MMVIAD is the first multi-view continuous video dataset for industrial anomaly detection with four supported tasks, and the VISTA model improves average benchmark scores from 45.0 to 57.5 on unseen data while surpassing GPT-5.4.
Towards total recall in industrial anomaly detection
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 1polarities
background 1representative citing papers
FORGE benchmark shows domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing, with SFT on a 3B model delivering up to 90.8% relative accuracy improvement on held-out scenarios.
VACE learns compact directionally coherent representations for multivariate time series anomaly detection via velocity-consistency training and reports state-of-the-art results on TSB-AD-M.
DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.
EAGLE achieves up to 94.4% anomaly detection accuracy on MVTec-AD and 88.1% on VisA by guiding frozen MLLMs with expert-derived thresholds and confidence-aware attention without parameter updates.
citing papers explorer
-
MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection
MMVIAD is the first multi-view continuous video dataset for industrial anomaly detection with four supported tasks, and the VISTA model improves average benchmark scores from 45.0 to 57.5 on unseen data while surpassing GPT-5.4.
-
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
FORGE benchmark shows domain-specific knowledge, not visual grounding, is the main bottleneck for MLLMs in manufacturing, with SFT on a 3B model delivering up to 90.8% relative accuracy improvement on held-out scenarios.
-
VACE: Learning Geometrically Structured Representations for Time Series Anomaly Detection
VACE learns compact directionally coherent representations for multivariate time series anomaly detection via velocity-consistency training and reports state-of-the-art results on TSB-AD-M.
-
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.
-
EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
EAGLE achieves up to 94.4% anomaly detection accuracy on MVTec-AD and 88.1% on VisA by guiding frozen MLLMs with expert-derived thresholds and confidence-aware attention without parameter updates.