Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection

Ning Dong; Xin Dong; Xinnian Guo; Yingna Su; Zhuangzhuang Pan; Ziyun Jiao

arXiv: 2606.20312 · v1 · pith:RG5LXHGKnew · submitted 2026-06-18 · cs.CV

Pith reviewed 2026-06-26 18:20 UTC · model grok-4.3

classification: cs.CV

keywords: video anomaly detection, pose flow, prototype calibration, frozen models, post-hoc calibration, keypoint reliability, one-class classification, latent space geometry

The pith

Reliability-Aware Prototype Calibration adds a gated nearest-prototype deviation to flow scores and raises AUROC on every frozen pose-flow backbone-dataset pair tested.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RPC, a post-hoc calibration for fixed pose-flow anomaly detectors that cannot be retrained. It standardizes both the original flow likelihood and the distance to the nearest prototype in the cached latent space, then adds the prototype term only when keypoint confidence indicates reliable pose data. This combination corrects rankings that a single likelihood score can get wrong when normal behavior is multimodal or observations are noisy. Experiments on two backbones and four datasets show AUROC gains in all eight cases, averaging 2.03 points, with the prototype term identified as the primary driver and reliability gating useful mainly under low-confidence observations.

Core claim

RPC improves frame-level AUROC by adding a standardized nearest-prototype deviation in the frozen latent space to the standardized flow score, with keypoint confidence used only to gate the added geometric term. Across two frozen pose-flow backbones and four datasets, the method raises AUROC in every backbone-dataset pair, with gains from 0.34 to 4.49 percentage points and an average of 2.03 points. Ablations confirm that prototype deviation supplies the main corrective signal, while reliability gating matters most when pose observations are less trustworthy.

What carries the argument

Reliability-Aware Prototype Calibration (RPC), a post-hoc score that adds the standardized distance to the nearest prototype in the frozen latent space to the standardized flow likelihood, gated by keypoint confidence.
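
A minimal numpy sketch of how a score of this shape could be computed. It is not the paper's implementation: it assumes Euclidean nearest-prototype distance, z-scoring with (mean, std) statistics estimated on normal training windows, a flow score oriented so that higher means more anomalous (for example, negative log-likelihood), a hard confidence threshold `tau` as the gate, and a weight `lam` on the prototype term. All function and parameter names are hypothetical.

```python
import numpy as np


def nearest_prototype_distance(z, prototypes):
    """Euclidean distance from each latent vector (N, D) to its closest prototype (K, D)."""
    # Pairwise differences have shape (N, K, D); reduce over D, then take the min over K.
    d = np.linalg.norm(z[:, None, :] - prototypes[None, :, :], axis=-1)
    return d.min(axis=1)


def zscore(x, mean, std, eps=1e-8):
    """Standardize with statistics computed on normal training data (assumed)."""
    return (x - mean) / (std + eps)


def rpc_score(flow_score, z, conf, prototypes, flow_stats, dev_stats, tau=0.5, lam=1.0):
    """Hypothetical RPC-style score: standardized flow score plus a gated,
    standardized nearest-prototype deviation.

    flow_score: (N,) anomaly-oriented flow score (higher = more anomalous).
    z:          (N, D) cached latent vectors from the frozen backbone.
    conf:       (N,) mean keypoint confidence per window, assumed in [0, 1].
    flow_stats, dev_stats: (mean, std) pairs estimated on normal training windows.
    """
    s_flow = zscore(flow_score, *flow_stats)
    s_dev = zscore(nearest_prototype_distance(z, prototypes), *dev_stats)
    # Assumed hard gate: add the geometric term only for windows with reliable poses.
    gate = (conf >= tau).astype(float)
    return s_flow + lam * gate * s_dev
```

A soft gate (for example, weighting by `conf` itself) would be an equally plausible reading; the description above only says that keypoint confidence gates the added term, and that the flow term is always kept.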

If this is right

Frozen detectors can be strengthened without retraining or reproducing the full pose pipeline.
Prototype deviation corrects rankings hidden by multimodal normal behavior in likelihood scores.
Reliability gating limits the geometric correction to trustworthy pose observations.
The approach remains lightweight and compatible with any cached pose-flow system.
Gains hold across multiple backbones and datasets when the latent space is kept fixed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Cached surveillance systems could adopt similar post-hoc corrections to extend their useful life without hardware changes.
The same nearest-prototype idea might transfer to other one-class detectors whose latent spaces are already frozen.
Future tests could measure whether the calibration still helps when normal test-time behavior is drawn from a different distribution than the normal training data used to build the prototypes.
Integration with real-time keypoint trackers would show whether the reliability gate reduces false alarms during camera motion or occlusion.

Load-bearing premise

The premise is that the frozen latent space already contains stable normal-mode structure that nearest prototypes can capture, and that this geometric signal is complementary to the original flow likelihood.

What would settle it

A new backbone-dataset pair in which adding the prototype deviation term either leaves AUROC unchanged or lowers it, even after reliability gating.

Figures

Figures reproduced from arXiv: 2606.20312 by Ning Dong, Xin Dong, Xinnian Guo, Yingna Su, Zhuangzhuang Pan, Ziyun Jiao.

**Figure 2.** Rank distributions of frame scores before and after calibration. Scores are converted to rank percentiles within …

**Figure 3.** Noise robustness gain of the full RPC score over the baseline score. Each cell reports the AUROC gain in …

**Figure 4.** Sensitivity of RPC over five seeds for the number of prototypes …

**Figure 5.** Case studies on selected SHTech and UBN clips. Columns and rows are datasets and frozen backbones, …

Original abstract

Pose-flow video anomaly detectors are attractive for one-class surveillance because they provide likelihood-based rankings for tracked skeleton windows. However, a single likelihood score may hide multimodal normal behavior and be sensitive to pose-observation noise. We study a frozen-detector setting in which the pose-flow backbone, cached skeleton tracks, and evaluation pipeline are fixed. Reliability-Aware Prototype Calibration (RPC) is a post-hoc score calibration method for this setting. It adds a standardized nearest-prototype deviation in the frozen latent space to the standardized flow score, and uses keypoint confidence only to gate this added geometric evidence. Thus, RPC preserves the original density signal while correcting the ranking with empirical normal-mode structure under pose reliability. Across two frozen pose-flow backbones and four datasets, RPC improves frame-level AUROC in all eight backbone-dataset pairs, with gains ranging from 0.34 to 4.49 percentage points and averaging 2.03 points. Ablation and reliability analyses show that prototype deviation is the main corrective signal, while reliability gating is most useful when pose observations are less trustworthy. These results suggest that lightweight post-hoc calibration can strengthen cached pose-flow systems when retraining or reproducing the full pose pipeline is impractical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

RPC is a simple post-hoc calibration that gives small consistent AUROC gains on frozen pose-flow detectors, but the improvements stay modest and narrowly scoped.

The core result is that adding a standardized nearest-prototype deviation term (gated by keypoint confidence) to the frozen flow score lifts frame-level AUROC in every one of the eight backbone-dataset pairs, with an average gain of 2.03 points. The paper shows this holds for two different frozen backbones and four datasets, and the ablations attribute most of the lift to the geometric prototype signal rather than the gating.

What works is the disciplined frozen-detector framing. The method is explicitly post-hoc, so it fits settings where the pose pipeline and cached tracks cannot be retrained. The reliability analysis also makes sense: gating helps most when pose observations are noisy. That matches practical surveillance constraints.

The soft spots are the size of the gains and the missing details. Two-point AUROC lifts are useful but not large, and the abstract gives no information on run-to-run variance, statistical tests, or how the prototypes are actually constructed from the normal data. Without those, it is hard to judge whether the improvements would hold up under different splits or slight changes in prototype count. The work also stays inside pose-flow anomaly detection, so it does not claim or test broader applicability.

This paper is for people who already run cached pose-flow systems and want a lightweight score tweak. It has enough concrete empirical results and clear ablations to deserve a serious referee, even if the expected revisions would ask for more statistical grounding and implementation specifics.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Reliability-Aware Prototype Calibration (RPC), a post-hoc score calibration technique for frozen pose-flow video anomaly detectors. RPC augments the standardized original flow likelihood with a standardized nearest-prototype deviation computed in the frozen latent space, with the added geometric term gated by keypoint confidence. The central empirical claim is that this yields frame-level AUROC gains in all eight backbone-dataset pairs (two frozen backbones, four datasets), ranging from 0.34 to 4.49 percentage points with an average of 2.03 points; ablations attribute the lift primarily to the prototype-deviation term.

Significance. If the reported AUROC gains prove robust under statistical scrutiny and are reproducible from the provided implementation details, the work demonstrates that a lightweight, training-free post-hoc adjustment can meaningfully strengthen cached pose-flow anomaly detectors in surveillance settings where full pipeline retraining is impractical. The reliability-gating analysis and ablation results provide useful guidance on when the geometric correction is most effective.

major comments (2)

[Abstract] The reported AUROC improvements (0.34–4.49 pp) are presented without any mention of statistical significance testing, standard errors, data-split details, or correction for multiple comparisons across the eight backbone-dataset pairs; given the modest size of the gains, this information is required to establish that the central claim of uniform improvement is not attributable to chance or selection effects.
[Methods] Prototype construction and scoring: the description of how normal-mode prototypes are derived from the cached skeleton tracks, the precise definition of the nearest-prototype deviation, and the standardization procedure are insufficiently specified to allow independent verification or reproduction; without these details it is impossible to confirm that the added term is complementary to the flow likelihood rather than partially redundant with it.

minor comments (2)

[Methods] Notation for the combined score (flow term plus gated prototype deviation) should be introduced with an explicit equation early in the methods to improve readability.
[Experiments] Table or figure presenting the per-pair AUROC values should include the original baseline scores alongside the RPC scores for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that both statistical validation and precise methodological specification are necessary to strengthen the central claims. Below we respond point-by-point and indicate the revisions we will make.

Referee: [Abstract] The reported AUROC improvements (0.34–4.49 pp) are presented without any mention of statistical significance testing, standard errors, data-split details, or correction for multiple comparisons across the eight backbone-dataset pairs; given the modest size of the gains, this information is required to establish that the central claim of uniform improvement is not attributable to chance or selection effects.

Authors: We agree that the absence of statistical testing leaves the modest gains open to questions of chance. In the revised manuscript we will add bootstrap standard errors and 95% confidence intervals for each AUROC difference, report paired Wilcoxon signed-rank tests (or equivalent) with p-values, apply Bonferroni correction across the eight pairs, and explicitly state the train/validation/test splits used for prototype construction and evaluation. These additions will be placed in both the abstract (concise summary) and the experimental results section. revision: yes
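
The per-pair bootstrap standard errors and intervals described in this response could be computed with a paired bootstrap over frames, as in the minimal sketch below. It assumes frame-level binary labels and both score vectors are available for one backbone-dataset pair, resamples frames i.i.d. (a clip-level bootstrap would better respect temporal correlation), and widens the percentile interval by a Bonferroni factor of eight pairs. The function names and defaults are hypothetical.

```python
import numpy as np


def auroc(labels, scores):
    """Frame-level AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                      # last 1-based rank of each distinct score
    avg_rank = (ends - counts + 1 + ends) / 2.0   # average rank within each tie group
    ranks = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def paired_bootstrap_gain(labels, base, calibrated, n_boot=2000, n_pairs=8, alpha=0.05, seed=0):
    """Paired bootstrap for the AUROC gain (calibrated minus base).

    Returns the observed gain, its bootstrap standard error, and a percentile
    interval at the Bonferroni-adjusted level alpha / n_pairs.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    base = np.asarray(base, dtype=float)
    calibrated = np.asarray(calibrated, dtype=float)
    if labels.all() or not labels.any():
        raise ValueError("AUROC needs both normal and anomalous frames")
    observed = auroc(labels, calibrated) - auroc(labels, base)
    n = labels.size
    gains = []
    while len(gains) < n_boot:
        idx = rng.integers(0, n, size=n)          # same frames for both scores (paired)
        y = labels[idx]
        if y.all() or not y.any():
            continue                              # skip resamples missing a class
        gains.append(auroc(y, calibrated[idx]) - auroc(y, base[idx]))
    gains = np.asarray(gains)
    a = alpha / n_pairs                           # Bonferroni correction across pairs
    lo, hi = np.quantile(gains, [a / 2, 1 - a / 2])
    return observed, gains.std(ddof=1), (lo, hi)
```

Across pairs, note that with only eight backbone-dataset pairs an exact Wilcoxon signed-rank test on eight all-positive, nonzero gains gives a one-sided p of 1/256 ≈ 0.0039 (two-sided 2/256 ≈ 0.0078), the smallest value eight pairs can produce; the per-pair intervals therefore carry most of the statistical weight.
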
Referee: [Methods] Prototype construction and scoring: the description of how normal-mode prototypes are derived from the cached skeleton tracks, the precise definition of the nearest-prototype deviation, and the standardization procedure are insufficiently specified to allow independent verification or reproduction; without these details it is impossible to confirm that the added term is complementary to the flow likelihood rather than partially redundant with it.

Authors: We accept that the current prose description is not sufficiently formal for independent reproduction. We will revise the Methods section to include: (1) the exact procedure for deriving normal-mode prototypes (k-means or mean-pooling of latent embeddings from cached normal tracks, with the number of modes chosen by silhouette score on the training set); (2) the mathematical definition of nearest-prototype deviation as the Euclidean distance in the frozen latent space to the closest prototype; and (3) the standardization formulas (z-score using mean and standard deviation computed on the normal training set for both the flow likelihood and the prototype deviation). We will also add a short analysis showing the correlation between the two standardized terms to address complementarity. revision: yes
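
A minimal numpy sketch of steps (1) to (3) and the proposed complementarity check, following the k-means option in this response. It assumes plain Lloyd's k-means with a fixed `k` (the silhouette-based choice of `k` is omitted for brevity), Euclidean nearest-prototype distance, and z-score statistics taken from normal training windows only. The names are hypothetical; this illustrates the described procedure, not the authors' code.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on normal latent vectors; returns (k, D) prototypes."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each latent vector to its nearest center.
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def nearest_distance(x, centers):
    """Euclidean distance from each row of x to its closest prototype."""
    return np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1).min(axis=1)


def fit_calibration(z_train, flow_train, k=4):
    """Build prototypes and z-score statistics from normal training windows only."""
    protos = kmeans(z_train, k)
    dev_train = nearest_distance(z_train, protos)
    stats = {
        "flow": (flow_train.mean(), flow_train.std()),
        "dev": (dev_train.mean(), dev_train.std()),
    }
    # Complementarity check. Pearson correlation is unchanged by z-scoring, so the
    # raw terms give the same value as the standardized ones.
    rho = np.corrcoef(flow_train, dev_train)[0, 1]
    return protos, stats, rho
```

A correlation near ±1 on normal data would suggest the prototype term largely duplicates the flow likelihood; a value near zero is consistent with, though not proof of, complementary signal.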

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents RPC as an empirical post-hoc calibration that adds a standardized nearest-prototype deviation (gated by keypoint confidence) to a frozen flow score. All reported results are AUROC gains measured on held-out test sets across eight backbone-dataset pairs, with ablations attributing lift to the prototype term. No equations, fitted parameters, or self-citations are shown that would make the gains equivalent to inputs by construction; the method contains no derivation chain, uniqueness theorem, or ansatz that reduces to its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are stated; the method implicitly relies on the existence of a meaningful latent space and prototype structure, but these are not formalized here.

pith-pipeline@v0.9.1-grok · 5756 in / 1222 out tokens · 27437 ms · 2026-06-26T18:20:58.215030+00:00 · methodology

Reference graph

Works this paper leans on

39 extracted references · 27 canonical work pages

[1]

Real-world anomaly detection in surveillance videos

Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6479–6488. IEEE, 2018. doi: 10.1109/CVPR.2018.00678

work page doi:10.1109/cvpr.2018.00678 2018
[2]

Anomaly detection method of surveillance video based on global-local information.Knowledge-Based Systems, 317:113530, 2025

Yuwei Wu, Haifeng Sang, and Fei Li. Anomaly detection method of surveillance video based on global-local information.Knowledge-Based Systems, 317:113530, 2025

2025
[3]

Generalized video anomaly event detection: Systematic taxonomy and comparison of deep models.ACM Computing Surveys, 56(7):1–38, 2024

Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, and Liang Song. Generalized video anomaly event detection: Systematic taxonomy and comparison of deep models.ACM Computing Surveys, 56(7):1–38, 2024. doi: 10.1145/3645101

work page doi:10.1145/3645101 2024
[4]

Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, and Victor C. M. Leung. Networking systems for video anomaly detection: A tutorial and survey.ACM Computing Surveys, 57(10):1–37, 2025. doi: 10.1145/3729222

work page doi:10.1145/3729222 2025
[5]

Future frame prediction for anomaly detection – a new baseline

Wen Liu, Weixin Luo, Dongze Lian, and Shenghua Gao. Future frame prediction for anomaly detection – a new baseline. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6536–6545. IEEE,
[6]

doi: 10.1109/CVPR.2018.00684

work page doi:10.1109/cvpr.2018.00684 2018
[7]

Deep one-class classification

Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 4393–4402. PMLR, 2018

2018
[8]

Contracting skeletal kinematics for human-related video anomaly detection.Pattern Recognition, 156:110817, 2024

Alessandro Flaborea, Guido Maria D’Amely di Melendugno, Stefano D’Arrigo, Marco Aurelio Sterpa, Alessio Sampieri, and Fabio Galasso. Contracting skeletal kinematics for human-related video anomaly detection.Pattern Recognition, 156:110817, 2024. doi: 10.1016/j.patcog.2024.110817

work page doi:10.1016/j.patcog.2024.110817 2024
[9]

Normalizing flows for human pose anomaly detection

Or Hirschorn and Shai Avidan. Normalizing flows for human pose anomaly detection. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 13499–13508. IEEE, 2023. doi: 10.1109/ICCV5107 0.2023.01246

work page doi:10.1109/iccv5107 2023
[10]

Do deep generative models know what they don’t know? InInternational Conference on Learning Representations, 2019

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Görür, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? InInternational Conference on Learning Representations, 2019

2019
[11]

Why normalizing flows fail to detect out-of- distribution data

Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. Why normalizing flows fail to detect out-of- distribution data. InAdvances in Neural Information Processing Systems, volume 33, 2020

2020
[12]

Visual anomaly detection via partition memory bank module and error estimation

Peng Xing and Zechao Li. Visual anomaly detection via partition memory bank module and error estimation. IEEE Transactions on Circuits and Systems for Video Technology, 33(8):3596–3607, 2023. doi: 10.1109/TCSVT. 2023.3237562

work page doi:10.1109/tcsvt 2023
[13]

Whole-body human pose estimation in the wild

Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. InProceedings of the European Conference on Computer Vision (ECCV), pages 196–214, 2020

2020
[14]

Generalized out-of-distribution detection: A survey

Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized out-of-distribution detection: A survey. International Journal of Computer Vision, 132(12):5635–5662, 2024. doi: 10.1007/s11263-024-02117-4

work page doi:10.1007/s11263-024-02117-4 2024
[15]

IRASim: A fine-grained world model for robot manipulation,

Anja Deli´c, Matej Gr ˇci´c, and Siniša Šegvi ´c. Sequential keypoint density estimator: An overlooked baseline of skeleton-based video anomaly detection. In2025 IEEE/CVF International Conference on Computer Vision (ICCV), pages 11579–11589. IEEE, 2025. doi: 10.1109/ICCV51701.2025.01077

work page doi:10.1109/iccv51701.2025.01077 2025
[16]

PoseTrack: A benchmark for human pose estimation and tracking

Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. PoseTrack: A benchmark for human pose estimation and tracking. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5167–5176, 2018. doi: 10.1109/CVPR.2018.00542. 13 RPC for Frozen Pose-Flow Video An...

work page doi:10.1109/cvpr.2018.00542 2018
[17]

Vpe-wsvad: Visual prompt exemplars for weakly-supervised video anomaly detection.Knowledge-Based Systems, 299:111978, 2024

Yong Su, Yuyu Tan, Meng Xing, and Simin An. Vpe-wsvad: Visual prompt exemplars for weakly-supervised video anomaly detection.Knowledge-Based Systems, 299:111978, 2024

2024
[18]

Hierarchical vision-language model with comprehensive language description for video anomaly detection.Knowledge-Based Systems, page 115466, 2026

Muaz Al Radi and Sajid Javed. Hierarchical vision-language model with comprehensive language description for video anomaly detection.Knowledge-Based Systems, page 115466, 2026

2026
[19]

Nicola, M

Canhui Tang, Sanping Zhou, Haoyue Shi, and Le Wang. Action hints: Semantic typicality and context uniqueness for generalizable skeleton-based video anomaly detection.Pattern Recognition, 179:113898, 2026. doi: 10.1016/j. patcog.2026.113898

work page doi:10.1016/j 2026
[20]

DA-Flow: Dual attention normalizing flow for skeleton-based video anomaly detection.IEEE Transactions on Multimedia, 27:8847–8858, 2025

Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, and Yipeng Liu. DA-Flow: Dual attention normalizing flow for skeleton-based video anomaly detection.IEEE Transactions on Multimedia, 27:8847–8858, 2025. doi: 10.1109/TMM.2025.3607708

work page doi:10.1109/tmm.2025.3607708 2025
[21]

Video anomaly detection guided by clustering learning.Pattern Recognition, 153:110550, 2024

Shaoming Qiu, Jingfeng Ye, Jiancheng Zhao, Lei He, Liangyu Liu, Bicong E., and Xinchen Huang. Video anomaly detection guided by clustering learning.Pattern Recognition, 153:110550, 2024. doi: 10.1016/j.patcog.2 024.110550

work page doi:10.1016/j.patcog.2 2024
[22]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, and Shai Avidan. Graph embedded pose clustering for anomaly detection. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10536–10544. IEEE, 2020. doi: 10.1109/CVPR42600.2020.01055

work page doi:10.1109/cvpr42600.2020.01055 2020
[23]

Weinberger

Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Doina Precup and Yee Whye Teh, editors,Proceedings of the 34th International Conference on Machine Learning, volume 70 ofProceedings of Machine Learning Research, pages 1321–1330. PMLR, 2017

2017
[24]

A revisit of sparse coding based anomaly detection in stacked RNN framework

Weixin Luo, Wen Liu, and Shenghua Gao. A revisit of sparse coding based anomaly detection in stacked RNN framework. In2017 IEEE International Conference on Computer Vision (ICCV), pages 341–349. IEEE, 2017. doi: 10.1109/ICCV.2017.45

work page doi:10.1109/iccv.2017.45 2017
[25]

doi: 10.1109/CVPR52688.2022.00176

Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, and Mubarak Shah. UBnormal: New benchmark for supervised open-set video anomaly detection. In2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20111– 20121. IEEE, 2022. doi: 10.1109/CVPR52688.2022.01951

work page doi:10.1109/cvpr52688.2022.01951 2022
[26]

Anomaly detection in video via self-supervised and multi-task learning

Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, and Mubarak Shah. Anomaly detection in video via self-supervised and multi-task learning. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12742–12752, 2021

2021
[27]

Video anomaly detection via self- supervised and spatio-temporal proxy tasks learning.Pattern Recognition, 158:111021, 2025

Qingyang Yang, Chuanxu Wang, Peng Liu, Zitai Jiang, and Jiajiong Li. Video anomaly detection via self- supervised and spatio-temporal proxy tasks learning.Pattern Recognition, 158:111021, 2025. doi: 10.1016/j.patc og.2024.111021

work page doi:10.1016/j.patc 2025
[28]

Model selection of anomaly detectors in the absence of labeled validation data.arXiv preprint arXiv:2310.10461, 2024

Clement Fung, Chen Qiu, Aodong Li, and Maja Rudolph. Model selection of anomaly detectors in the absence of labeled validation data.arXiv preprint arXiv:2310.10461, 2024. doi: 10.48550/arXiv.2310.10461

work page doi:10.48550/arxiv.2310.10461 2024
[29]

URL https://proceedings.mlr

Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, and Mubarak Shah. Self-distilled masked auto-encoders are efficient video anomaly detectors. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15984–15995. IEEE, 2024. doi: 10.1109/CVPR52733.2024.01513

work page doi:10.1109/cvpr52733.2024.01513 2024
[30]

URL https://proceedings.mlr

Menghao Zhang, Jingyu Wang, Qi Qi, Haifeng Sun, Zirui Zhuang, Pengfei Ren, Ruilong Ma, and Jianxin Liao. Multi-scale video anomaly detection by multi-grained spatio-temporal representation learning. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17385–17394. IEEE, 2024. doi: 10.1109/CVPR52733.2024.01646

work page doi:10.1109/cvpr52733.2024.01646 2024
[31]

VideoPatchCore: An effective method to memorize normality for video anomaly detection

Sunghyun Ahn, Youngwan Jo, Kijung Lee, and Sanghyun Park. VideoPatchCore: An effective method to memorize normality for video anomaly detection. InComputer Vision – ACCV 2024, pages 312–328. Springer Nature Singapore, 2024. doi: 10.1007/978-981-96-0908-6_18

work page doi:10.1007/978-981-96-0908-6_18 2024
[32]

Safeguarding sustainable cities: Unsupervised video anomaly detection through diffusion-based latent pattern learning

Menghao Zhang, Jingyu Wang, Qi Qi, Pengfei Ren, Haifeng Sun, Zirui Zhuang, Lei Zhang, and Jianxin Liao. Safeguarding sustainable cities: Unsupervised video anomaly detection through diffusion-based latent pattern learning. In Kate Larson, editor,Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 7572...
[33]

doi: 10.24963/ijcai.2024/838

work page doi:10.24963/ijcai.2024/838 2024
[34]

Follow the rules: Reasoning for video anomaly detection with large language models

Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, and Shao-Yuan Lo. Follow the rules: Reasoning for video anomaly detection with large language models. InComputer Vision – ECCV 2024, pages 304–322. Springer Nature Switzerland, 2024. doi: 10.1007/978-3-031-73004-7_18. 14 RPC for Frozen Pose-Flow Video Anomaly DetectionA PREPRINT

work page doi:10.1007/978-3-031-73004-7_18 2024
[35]

Spatio-temporal graph-based self-labeling for video anomaly detection.Neuro- computing, 627:129576, 2025

Meng Xing, Zhiyong Feng, Yong Su, Yiming Zhang, Changjae Oh, Valeriya Gribova, Vladimir Fedorovich Filaretov, and Deshuang Huang. Spatio-temporal graph-based self-labeling for video anomaly detection.Neuro- computing, 627:129576, 2025. doi: 10.1016/j.neucom.2025.129576

work page doi:10.1016/j.neucom.2025.129576 2025
[36]

Video anomaly detection with motion and appearance guided patch diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 39(10):10761–10769, 2025

Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song, and Wei Yang. Video anomaly detection with motion and appearance guided patch diffusion model.Proceedings of the AAAI Conference on Artificial Intelligence, 39(10):10761–10769, 2025. doi: 10.1609/aaai.v39i10.33169

work page doi:10.1609/aaai.v39i10.33169 2025
[38]

A video anomaly detection framework based on semantic consistency and multi-attribute feature complementarity.Pattern Recognition, 170:112016, 2026

Chuanxu Wang, Zitai Jiang, Haigang Deng, and Chunjuan Yan. A video anomaly detection framework based on semantic consistency and multi-attribute feature complementarity.Pattern Recognition, 170:112016, 2026. doi: 10.1016/j.patcog.2025.112016

work page doi:10.1016/j.patcog.2025.112016 2026
[40]

In: IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, January 3-8, 2022

Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory prediction. In2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pages 995–1004. IEEE, 2024. doi: 10.1109/W ACVW60836.2024.00109

work page doi:10.1109/w 2024
[41]

PoseWatch: A transformer-based architecture for human-centric video anomaly detection using spatio-temporal pose tokenization, 2024

Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. PoseWatch: A transformer-based architecture for human-centric video anomaly detection using spatio-temporal pose tokenization, 2024. 15

2024

[1] [1]

Real-world anomaly detection in surveillance videos

Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6479–6488. IEEE, 2018. doi: 10.1109/CVPR.2018.00678

work page doi:10.1109/cvpr.2018.00678 2018

[2] [2]

Anomaly detection method of surveillance video based on global-local information.Knowledge-Based Systems, 317:113530, 2025

Yuwei Wu, Haifeng Sang, and Fei Li. Anomaly detection method of surveillance video based on global-local information.Knowledge-Based Systems, 317:113530, 2025

2025

[3] [3]

Generalized video anomaly event detection: Systematic taxonomy and comparison of deep models.ACM Computing Surveys, 56(7):1–38, 2024

Yang Liu, Dingkang Yang, Yan Wang, Jing Liu, Jun Liu, Azzedine Boukerche, Peng Sun, and Liang Song. Generalized video anomaly event detection: Systematic taxonomy and comparison of deep models.ACM Computing Surveys, 56(7):1–38, 2024. doi: 10.1145/3645101

work page doi:10.1145/3645101 2024

[4] [4]

Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Liang Cao, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, and Victor C. M. Leung. Networking systems for video anomaly detection: A tutorial and survey.ACM Computing Surveys, 57(10):1–37, 2025. doi: 10.1145/3729222

work page doi:10.1145/3729222 2025

[5] [5]

Future frame prediction for anomaly detection – a new baseline

Wen Liu, Weixin Luo, Dongze Lian, and Shenghua Gao. Future frame prediction for anomaly detection – a new baseline. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6536–6545. IEEE,

[6] [6]

doi: 10.1109/CVPR.2018.00684

work page doi:10.1109/cvpr.2018.00684 2018

[7] [7]

Deep one-class classification

Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 4393–4402. PMLR, 2018

2018

[8] [8]

Contracting skeletal kinematics for human-related video anomaly detection.Pattern Recognition, 156:110817, 2024

Alessandro Flaborea, Guido Maria D’Amely di Melendugno, Stefano D’Arrigo, Marco Aurelio Sterpa, Alessio Sampieri, and Fabio Galasso. Contracting skeletal kinematics for human-related video anomaly detection.Pattern Recognition, 156:110817, 2024. doi: 10.1016/j.patcog.2024.110817

work page doi:10.1016/j.patcog.2024.110817 2024

[9] [9]

Normalizing flows for human pose anomaly detection

Or Hirschorn and Shai Avidan. Normalizing flows for human pose anomaly detection. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 13499–13508. IEEE, 2023. doi: 10.1109/ICCV5107 0.2023.01246

work page doi:10.1109/iccv5107 2023

[10] Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Görür, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? In International Conference on Learning Representations, 2019.

[11] Polina Kirichenko, Pavel Izmailov, and Andrew Gordon Wilson. Why normalizing flows fail to detect out-of-distribution data. In Advances in Neural Information Processing Systems, volume 33, 2020.

[12] Peng Xing and Zechao Li. Visual anomaly detection via partition memory bank module and error estimation. IEEE Transactions on Circuits and Systems for Video Technology, 33(8):3596–3607, 2023. doi: 10.1109/TCSVT.2023.3237562

[13] Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, and Ping Luo. Whole-body human pose estimation in the wild. In Proceedings of the European Conference on Computer Vision (ECCV), pages 196–214, 2020.

[14] Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. Generalized out-of-distribution detection: A survey. International Journal of Computer Vision, 132(12):5635–5662, 2024. doi: 10.1007/s11263-024-02117-4

[15] Anja Delić, Matej Grčić, and Siniša Šegvić. Sequential keypoint density estimator: An overlooked baseline of skeleton-based video anomaly detection. In 2025 IEEE/CVF International Conference on Computer Vision (ICCV), pages 11579–11589. IEEE, 2025. doi: 10.1109/ICCV51701.2025.01077

[16] Mykhaylo Andriluka, Umar Iqbal, Eldar Insafutdinov, Leonid Pishchulin, Anton Milan, Juergen Gall, and Bernt Schiele. PoseTrack: A benchmark for human pose estimation and tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5167–5176, 2018. doi: 10.1109/CVPR.2018.00542

[17] Yong Su, Yuyu Tan, Meng Xing, and Simin An. VPE-WSVAD: Visual prompt exemplars for weakly-supervised video anomaly detection. Knowledge-Based Systems, 299:111978, 2024.

[18] Muaz Al Radi and Sajid Javed. Hierarchical vision-language model with comprehensive language description for video anomaly detection. Knowledge-Based Systems, page 115466, 2026.

[19] Canhui Tang, Sanping Zhou, Haoyue Shi, and Le Wang. Action hints: Semantic typicality and context uniqueness for generalizable skeleton-based video anomaly detection. Pattern Recognition, 179:113898, 2026. doi: 10.1016/j.patcog.2026.113898

[20] Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, and Yipeng Liu. DA-Flow: Dual attention normalizing flow for skeleton-based video anomaly detection. IEEE Transactions on Multimedia, 27:8847–8858, 2025. doi: 10.1109/TMM.2025.3607708

[21] Shaoming Qiu, Jingfeng Ye, Jiancheng Zhao, Lei He, Liangyu Liu, Bicong E., and Xinchen Huang. Video anomaly detection guided by clustering learning. Pattern Recognition, 153:110550, 2024. doi: 10.1016/j.patcog.2024.110550

[22] Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, and Shai Avidan. Graph embedded pose clustering for anomaly detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10536–10544. IEEE, 2020. doi: 10.1109/CVPR42600.2020.01055

[23] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330. PMLR, 2017.

[24] Weixin Luo, Wen Liu, and Shenghua Gao. A revisit of sparse coding based anomaly detection in stacked RNN framework. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 341–349. IEEE, 2017. doi: 10.1109/ICCV.2017.45

[25] Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, and Mubarak Shah. UBnormal: New benchmark for supervised open-set video anomaly detection. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20111–20121. IEEE, 2022. doi: 10.1109/CVPR52688.2022.01951

[26] Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, and Mubarak Shah. Anomaly detection in video via self-supervised and multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12742–12752, 2021.

[27] Qingyang Yang, Chuanxu Wang, Peng Liu, Zitai Jiang, and Jiajiong Li. Video anomaly detection via self-supervised and spatio-temporal proxy tasks learning. Pattern Recognition, 158:111021, 2025. doi: 10.1016/j.patcog.2024.111021

[28] Clement Fung, Chen Qiu, Aodong Li, and Maja Rudolph. Model selection of anomaly detectors in the absence of labeled validation data. arXiv preprint arXiv:2310.10461, 2024. doi: 10.48550/arXiv.2310.10461

[29] Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, and Mubarak Shah. Self-distilled masked auto-encoders are efficient video anomaly detectors. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15984–15995. IEEE, 2024. doi: 10.1109/CVPR52733.2024.01513

[30] Menghao Zhang, Jingyu Wang, Qi Qi, Haifeng Sun, Zirui Zhuang, Pengfei Ren, Ruilong Ma, and Jianxin Liao. Multi-scale video anomaly detection by multi-grained spatio-temporal representation learning. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17385–17394. IEEE, 2024. doi: 10.1109/CVPR52733.2024.01646

[31] Sunghyun Ahn, Youngwan Jo, Kijung Lee, and Sanghyun Park. VideoPatchCore: An effective method to memorize normality for video anomaly detection. In Computer Vision – ACCV 2024, pages 312–328. Springer Nature Singapore, 2024. doi: 10.1007/978-981-96-0908-6_18

[32] Menghao Zhang, Jingyu Wang, Qi Qi, Pengfei Ren, Haifeng Sun, Zirui Zhuang, Lei Zhang, and Jianxin Liao. Safeguarding sustainable cities: Unsupervised video anomaly detection through diffusion-based latent pattern learning. In Kate Larson, editor, Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 7572... doi: 10.24963/ijcai.2024/838

[34] Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, and Shao-Yuan Lo. Follow the rules: Reasoning for video anomaly detection with large language models. In Computer Vision – ECCV 2024, pages 304–322. Springer Nature Switzerland, 2024. doi: 10.1007/978-3-031-73004-7_18

[35] Meng Xing, Zhiyong Feng, Yong Su, Yiming Zhang, Changjae Oh, Valeriya Gribova, Vladimir Fedorovich Filaretov, and Deshuang Huang. Spatio-temporal graph-based self-labeling for video anomaly detection. Neurocomputing, 627:129576, 2025. doi: 10.1016/j.neucom.2025.129576

[36] Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song, and Wei Yang. Video anomaly detection with motion and appearance guided patch diffusion model. Proceedings of the AAAI Conference on Artificial Intelligence, 39(10):10761–10769, 2025. doi: 10.1609/aaai.v39i10.33169

[38] Chuanxu Wang, Zitai Jiang, Haigang Deng, and Chunjuan Yan. A video anomaly detection framework based on semantic consistency and multi-attribute feature complementarity. Pattern Recognition, 170:112016, 2026. doi: 10.1016/j.patcog.2025.112016

[40] Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory prediction. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), pages 995–1004. IEEE, 2024. doi: 10.1109/WACVW60836.2024.00109

[41] Ghazal Alinezhad Noghre, Armin Danesh Pazho, and Hamed Tabkhi. PoseWatch: A transformer-based architecture for human-centric video anomaly detection using spatio-temporal pose tokenization, 2024.