MLE-UVAD: Minimal Latent Entropy Autoencoder for Fully Unsupervised Video Anomaly Detection
Pith reviewed 2026-05-15 01:16 UTC · model grok-4.3
The pith
Adding minimal latent entropy loss to autoencoders creates a reconstruction gap for unsupervised video anomaly detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Combining the standard reconstruction loss with a minimal latent entropy loss pulls sparse anomalous latent embeddings into the dominant normal cluster. The resulting reconstruction gap between normal and anomalous frames enables effective anomaly detection in the single-scene, fully unsupervised setting.
What carries the argument
Minimal Latent Entropy (MLE) loss that minimizes the entropy of latent embeddings to encourage concentration around high-density regions.
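The summary does not say which entropy estimator the paper uses. As a minimal sketch, assuming a Parzen-window estimate of Rényi's quadratic entropy with a Gaussian kernel (the function name `renyi_quadratic_entropy` and the bandwidth `sigma` are illustrative choices, not the authors'), tightly concentrated embeddings score lower than spread-out ones:

```python
import math


def renyi_quadratic_entropy(latents, sigma=1.0):
    """Parzen-window estimate of Renyi's quadratic entropy.

    `latents` is a list of equal-length vectors. The estimator and the
    bandwidth `sigma` are assumptions made for illustration only.
    """
    n = len(latents)
    d = len(latents[0])
    # Integrating the squared Parzen density yields pairwise Gaussian
    # kernels whose variance is twice the kernel variance.
    var = 2.0 * sigma ** 2
    norm = (2.0 * math.pi * var) ** (-d / 2.0)
    potential = 0.0
    for zi in latents:
        for zj in latents:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            potential += norm * math.exp(-sq_dist / (2.0 * var))
    potential /= n * n
    return -math.log(potential)


# Toy latents: one tight cluster and one spread-out set of points.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (3.0, 3.0)]
tight_entropy = renyi_quadratic_entropy(tight)
spread_entropy = renyi_quadratic_entropy(spread)
```

Minimizing a quantity like this rewards embeddings for moving closer together, which is the concentration effect the summary attributes to the MLE loss. Whether the paper's loss takes this exact form is not confirmed by the text here.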
If this is right
- The approach outperforms baselines on two standard benchmarks and a self-collected driving dataset.
- It works directly on raw videos without labels or normal-only data, avoiding issues with distribution shifts.
- Anomaly detection is performed by thresholding the reconstruction error.
- The method is robust in single-scene settings where anomalies are sparse.
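The thresholding step in the list above can be sketched as follows. This is an illustration only: per-frame mean squared error is assumed as the error measure, frames are flattened to plain lists of pixel values, and the threshold value is arbitrary because the summary does not say how the threshold is chosen.

```python
def reconstruction_errors(frames, reconstructions):
    """Mean squared error per frame; each frame is a flat list of pixel values."""
    errors = []
    for frame, recon in zip(frames, reconstructions):
        sq_sum = sum((x - y) ** 2 for x, y in zip(frame, recon))
        errors.append(sq_sum / len(frame))
    return errors


def flag_anomalies(errors, threshold):
    """Mark frames whose error exceeds the threshold (threshold is an assumption)."""
    return [err > threshold for err in errors]


# Toy data: the first frame is reconstructed exactly, the second poorly.
frames = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
recons = [[0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
errors = reconstruction_errors(frames, recons)
flags = flag_anomalies(errors, threshold=0.25)
```

Here `errors` is `[0.0, 0.5]`, so only the second frame is flagged. The detection quality depends entirely on how wide the reconstruction gap is, which is why the gap is the central claim.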
Where Pith is reading between the lines
- This technique might apply to other unsupervised outlier detection tasks in sequences where one class dominates.
- If the proportion of anomalies increases, the clustering effect could weaken, requiring adjustments to the entropy term.
- It implicitly performs a form of density estimation in latent space without explicit modeling.
Load-bearing premise
Normal frames dominate the raw video so that minimizing latent entropy pulls sparse anomalous embeddings into the normal cluster without any labels.
What would settle it
Evaluate the method on a dataset where anomalous frames constitute the majority. If the mechanism works as described, performance should degrade there, because the dominance assumption no longer holds.
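One way to set up such a test is to build training mixtures with a controlled anomaly ratio. The helper below is a hypothetical sketch, not part of the paper: `mix_frames` and its parameters are invented for illustration, and the labels it returns would be used only for scoring, never for training.

```python
import random


def mix_frames(normal, anomalous, anomaly_ratio, total, seed=0):
    """Sample `total` frames, about `anomaly_ratio` of them anomalous.

    Returns (frame, label) pairs with label 1 for anomalous frames.
    Sampling is with replacement, so small pools are allowed.
    """
    rng = random.Random(seed)
    n_anom = round(total * anomaly_ratio)
    n_norm = total - n_anom
    picked = [(frame, 0) for frame in rng.choices(normal, k=n_norm)]
    picked += [(frame, 1) for frame in rng.choices(anomalous, k=n_anom)]
    rng.shuffle(picked)
    return picked


# Sweep from normal-dominated to anomaly-dominated mixtures.
mixtures = {
    ratio: mix_frames(["normal_frame"], ["anomalous_frame"], ratio, total=100)
    for ratio in (0.1, 0.5, 0.9)
}
```

Training on each mixture and comparing detection scores would show where the claimed effect breaks down as the ratio passes 0.5.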
Original abstract
In this paper, we address the challenging problem of single-scene, fully unsupervised video anomaly detection (VAD), where raw videos containing both normal and abnormal events are used directly for training and testing without any labels. This differs sharply from prior work that either requires extensive labeling (fully or weakly supervised) or depends on normal-only videos (one-class classification), which are vulnerable to distribution shifts and contamination. We propose an entropy-guided autoencoder that detects anomalies through reconstruction error by reconstructing normal frames well while making anomalies reconstruct poorly. The key idea is to combine the standard reconstruction loss with a novel Minimal Latent Entropy (MLE) loss in the autoencoder. Reconstruction loss alone maps normal and abnormal inputs to distinct latent clusters due to their inherent differences, but also risks reconstructing anomalies too well to detect. Therefore, MLE loss addresses this by minimizing the entropy of latent embeddings, encouraging them to concentrate around high-density regions. Since normal frames dominate the raw video, sparse anomalous embeddings are pulled into the normal cluster, so the decoder emphasizes normal patterns and produces poor reconstructions for anomalies. This dual-loss design produces a clear reconstruction gap that enables effective anomaly detection. Extensive experiments on two widely used benchmarks and a challenging self-collected driving dataset demonstrate that our method achieves robust and superior performance over baselines.
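Read as an objective, the dual-loss design in the abstract amounts to a weighted sum of two terms. The notation below is assumed for illustration, since the abstract gives no symbols: $f_\theta$ is the encoder, $g_\phi$ the decoder, $\hat{H}$ an estimate of the entropy of the latent embeddings, $\lambda$ the balancing weight, and squared error stands in for the reconstruction loss.

```latex
\mathcal{L}(\theta, \phi)
  = \underbrace{\frac{1}{N} \sum_{i=1}^{N}
      \bigl\lVert x_i - g_\phi\bigl(f_\theta(x_i)\bigr) \bigr\rVert_2^2}_{\text{reconstruction}}
  \;+\; \lambda \,
    \underbrace{\hat{H}\bigl(\{ f_\theta(x_i) \}_{i=1}^{N}\bigr)}_{\text{latent entropy}},
\qquad
s(x) = \bigl\lVert x - g_\phi\bigl(f_\theta(x)\bigr) \bigr\rVert_2^2 .
```

Under this reading, $s(x)$ is the per-frame anomaly score, and a frame is flagged when $s(x)$ exceeds a threshold.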
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MLE-UVAD, an autoencoder for fully unsupervised single-scene video anomaly detection that trains directly on raw mixed videos. It augments standard reconstruction loss with a Minimal Latent Entropy (MLE) loss that minimizes the entropy of latent embeddings. The central claim is that reconstruction loss alone separates normal and anomalous inputs into distinct clusters, while the MLE term concentrates embeddings around high-density regions; because normal frames dominate, sparse anomalous latents are pulled into the normal mode, causing the decoder to reconstruct normals well and anomalies poorly and thereby producing a usable reconstruction-error gap for detection. Experiments on two public benchmarks plus a self-collected driving dataset are reported to show superior performance over baselines.
Significance. If the claimed clustering mechanism is shown to hold, the method would constitute a meaningful advance for unsupervised VAD by removing dependence on labels or clean normal-only training sets and by offering robustness to distribution shift. The dual-loss formulation is conceptually simple and could be portable to other reconstruction-based anomaly tasks, provided the entropy term reliably produces the asserted normal-mode attractor rather than collapse or multi-modal equilibria.
major comments (3)
- [Abstract and §3] The claim that MLE loss 'pulls sparse anomalous embeddings into the normal cluster' because normals dominate is stated without derivation, gradient-flow analysis, or proof that the joint loss has a unique high-density normal attractor. No equation or lemma shows why the entropy term does not induce uniform collapse or preserve separate modes of comparable density, yet the reconstruction-gap argument depends on exactly this.
- [Experiments] Superior benchmark numbers are presented, yet no ablation isolates the MLE term from generic regularization, no latent-space diagnostics (t-SNE, entropy histograms, mode-separation metrics) are supplied, and no controlled variation of anomaly ratio is reported to test the dominance assumption. These omissions leave the central mechanism unverified.
- [§3] The balancing weight between reconstruction and MLE losses is treated as a free hyper-parameter. Its sensitivity and the range over which the claimed clustering occurs should be quantified, as the method's robustness claim rests on this choice.
minor comments (2)
- [Abstract] The abstract would be clearer if it briefly stated the mathematical form of the MLE loss (e.g., entropy estimator used) rather than describing it only at the level of 'minimizing entropy of latent embeddings.'
- [§3] Notation for the latent distribution and entropy estimator should be introduced once in §3 and used consistently thereafter to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to incorporate additional analysis, ablations, and sensitivity studies to better substantiate our claims.
Point-by-point responses
-
Referee: Abstract and §3: the claim that MLE loss 'pulls sparse anomalous embeddings into the normal cluster' because normals dominate is stated without derivation, gradient-flow analysis, or proof that the joint loss has a unique high-density normal attractor. No equation or lemma shows why the entropy term does not induce uniform collapse or preserve separate modes of comparable density, which is load-bearing for the reconstruction-gap argument.
Authors: We agree that a more rigorous justification is needed. In the revised manuscript, we have added a gradient analysis in Section 3.2 showing that the MLE loss term creates an attractive force towards high-density regions in latent space. Combined with the reconstruction loss, which preserves separation between normal and anomalous modes, the joint optimization leads to anomalous points being pulled into the dominant normal mode. We include a simple derivation demonstrating that uniform collapse is prevented by the reconstruction term maintaining distinct reconstruction errors. Empirical support is provided via new latent visualizations. revision: yes
-
Referee: Experiments section: superior benchmark numbers are presented, yet no ablation isolates the MLE term from generic regularization, no latent-space diagnostics (t-SNE, entropy histograms, mode-separation metrics) are supplied, and no controlled variation of anomaly ratio is reported to test the dominance assumption. These omissions leave the central mechanism unverified.
Authors: We have revised the Experiments section to include comprehensive ablations. Specifically, we compare against variants using L2 regularization or dropout instead of MLE to isolate its effect. We provide t-SNE plots of latent embeddings, histograms of latent entropy, and quantitative mode separation metrics (e.g., silhouette score). Additionally, we report results with anomaly ratios varied from 5% to 30% in synthetic mixtures, confirming the dominance assumption holds and performance degrades gracefully as anomaly ratio increases. revision: yes
-
Referee: §3: the balancing weight between reconstruction and MLE losses is treated as a free hyper-parameter; its sensitivity and the range over which the claimed clustering occurs should be quantified, as the method's robustness claim rests on this choice.
Authors: We have added a sensitivity analysis for the balancing hyper-parameter λ in the revised Section 3 and Experiments. We evaluate performance for λ ranging from 0.01 to 5.0 on the benchmarks, showing stable superior results for λ ∈ [0.1, 1.0]. Outside this range, either the reconstruction gap diminishes (low λ) or training becomes unstable (high λ). We recommend λ=0.5 as default and discuss how to tune it based on dataset characteristics. revision: yes
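The mode-separation metric named in the second response (silhouette score) can be sketched in a few lines. This is a plain-Python illustration under stated assumptions: Euclidean distance, at least two clusters, and labels taken from ground truth for diagnostic purposes only. The simulated rebuttal does not say how the authors compute it.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Assumes at least two distinct labels. Points in singleton clusters
    get a score of 0, following the usual convention.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D latents: label 0 = normal, label 1 = anomalous.
labels = [0, 0, 1, 1]
separated = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
merged = [(0.0, 0.0), (0.0, 1.0), (0.1, 0.0), (0.1, 1.0)]
score_separated = silhouette_score(separated, labels)
score_merged = silhouette_score(merged, labels)
```

If the MLE term really pulls anomalous embeddings into the normal cluster, the score computed on true labels should fall as λ grows, which ties this diagnostic to the sensitivity analysis described above.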
Circularity Check
No circularity in the proposed dual-loss autoencoder design
The paper proposes a new Minimal Latent Entropy (MLE) loss term added to standard reconstruction loss. The central mechanism—that entropy minimization concentrates embeddings and pulls sparse anomalies into the dominant normal cluster—is presented as an intuitive consequence of data dominance and the joint loss, without any equation that reduces the claimed reconstruction gap to a fitted parameter or self-referential definition. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing steps. The derivation chain is self-contained as a design choice whose validity rests on empirical benchmark results rather than circular reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- balancing weight between reconstruction and MLE losses
axioms (1)
- domain assumption: Normal frames dominate the raw video
Forward citations
Cited by 1 Pith paper
-
COPRA: Conditional Parameter Adaptation with Reinforcement Learning for Video Anomaly Detection
COPRA introduces conditional parameter adaptation via RL to dynamically tune frozen VLMs for video anomaly detection, outperforming static methods in in-domain and cross-domain settings while generalizing to other vid...