Bounding-Box Trajectories Matter for Video Anomaly Detection

Inpyo Song; Jangwon Lee

arxiv: 2605.21957 · v1 · pith:N53MV3SAnew · submitted 2026-05-21 · 💻 cs.CV

Bounding-Box Trajectories Matter for Video Anomaly Detection

Inpyo Song , Jangwon Lee This is my paper

Pith reviewed 2026-05-22 07:02 UTC · model grok-4.3

classification 💻 cs.CV

keywords video anomaly detectionbounding box trajectoriesnormalizing flowspose estimationShanghaiTechkinematic patternsMSAD dataset

0 comments

The pith

Bounding-box trajectories alone can model normal video motion well enough to detect anomalies better than pose-based approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that the paths traced by bounding boxes around moving objects contain enough information about normal movement patterns to identify unusual events in videos. By using normalizing flows to learn these patterns from trajectories, the method works without the need for detailed human pose estimation, which is common in other approaches. On the ShanghaiTech dataset, the version relying only on trajectories achieves higher average precision than previous pose-focused methods, and combining both yields even better results. This matters because it points to a simpler, readily available signal that has been overlooked in efforts to improve public safety monitoring.

Core claim

We present TrajVAD, a framework that models multi-class bounding-box trajectories using normalizing flows to learn normal kinematic patterns. Its trajectory-only variant (TrajVAD-T) eliminates pose estimation and surpasses all compared pose-based methods on ShanghaiTech in AP (87.7%), while achieving the best results on MSAD. An extended version (TrajVAD-P) incorporates pose information and further improves performance to 88.6% AUROC and 90.9% AP on ShanghaiTech.

What carries the argument

Normalizing flows for modeling multi-class bounding-box trajectories as a way to capture normal kinematic patterns in videos.

Load-bearing premise

Bounding-box trajectories contain enough information about motion to distinguish normal from anomalous events across the tested video datasets.

What would settle it

Evaluating the trajectory-only model on a new dataset featuring anomalies that change body pose but keep the same bounding box path, such as a person suddenly waving arms abnormally while staying in place, and checking if detection rates fall below those of pose-based methods.

Figures

Figures reproduced from arXiv: 2605.21957 by Inpyo Song, Jangwon Lee.

**Figure 1.** Figure 1: Pose-based VAD methods (left) score anomalies from skeleton sequences and are limited to person-class tracks. TrajVAD (right) treats multi-class bounding-box trajectories as the primary signal, applicable to any detected object. TrajVAD-P adds an optional pose branch (dashed) activated only for human tracks when pose is reliable. Early deep learning approaches addressed this problem through pixel-level rec… view at source ↗

**Figure 2.** Figure 2: TrajVAD pipeline. Multi-class tracks from detection and tracking are encoded as standardized trajectory-derived feature sequences and conditioned on class embeddings. TrajVAD-T (top row) maps segments through a normalizing flow and uses the negative log-likelihood as the anomaly score. TrajVAD-P (both row) adds a pose branch conditioned on the trajectory latent ztraj, gated by pose reliability g. 3 Method… view at source ↗

**Figure 3.** Figure 3: Qualitative comparison on ShanghaiTech and MSAD. Red boxes and anomaly scores (higher means anomaly) indicate detected anomalies. Top: a car in a pedestrian zone is invisible to pose-based STG-NF but flagged by TrajVAD through boundingbox kinematics. Bottom: partial occlusion corrupts skeleton estimation, suppressing the STG-NF anomaly score, while TrajVAD maintains detection from trajectory features. 90%… view at source ↗

**Figure 4.** Figure 4: Effect of flow depth K on AUROC for TrajVAD-T and TrajVAD-P on ShanghaiTech and MSAD. Stars mark the best K per panel. TrajVAD-T is robust across depths, while TrajVAD-P peaks at K=18. 4.7 Ablation Study Feature group ablation [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

read the original abstract

Video anomaly detection is critical for public safety and security, yet remains highly challenging despite extensive research due to large variations in appearance, viewpoint, and scene dynamics. Among existing approaches, human pose-based methods have emerged as a major line of research, showing strong performance since many anomalies in public datasets involve humans and pose representations are robust to appearance changes while providing compact motion descriptions. However, these methods often overlook bounding-box trajectories, although such information is inherently available in pose-based pipelines. In this paper, we explicitly leverage these trajectories as a primary anomaly cue. We present TrajVAD, a framework that models multi-class bounding-box trajectories using normalizing flows to learn normal kinematic patterns. Its trajectory-only variant (TrajVAD-T) eliminates pose estimation and surpasses all compared pose-based methods on ShanghaiTech in AP (87.7%), while achieving the best results on MSAD. An extended version (TrajVAD-P) incorporates pose information and further improves performance to 88.6% AUROC and 90.9% AP on ShanghaiTech, highlighting bounding-box trajectories as an effective yet underexplored modality for video anomaly detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Bounding-box trajectories modeled with normalizing flows can beat pose-based baselines on standard video anomaly benchmarks without needing pose estimation.

read the letter

The main takeaway is that this work shows a trajectory-only approach (TrajVAD-T) reaching 87.7% AP on ShanghaiTech and strong results on MSAD, outperforming the pose-based methods it compares against, while adding pose on top pushes performance higher still. The authors treat bounding-box trajectories as the primary cue rather than a side input, which is a straightforward but useful shift given that pose pipelines already produce boxes anyway. Modeling multi-class normal kinematic patterns via normalizing flows is a clean way to capture the distribution of typical motion without appearance features. This framing and the reported gains are the concrete new element. The paper does well at keeping the argument focused on efficiency and availability of the data source, and the numbers are presented clearly enough to invite direct comparison. The central assumption that trajectories carry enough kinematic signal to detect anomalies holds in the experiments they ran. Soft spots are mostly around missing detail: the abstract and available summary give performance figures and method names but little on the flow architecture choices, exact baseline implementations, dataset statistics, or error breakdowns. That makes it harder to judge whether the gains are robust or tied to specific tuning. Generalization beyond the two datasets is also not addressed in what is shown. This is for people working on practical video surveillance systems who care about dropping pose estimation overhead. A reader looking for incremental but measurable improvements in anomaly detection pipelines would get something usable from it. The work is coherent on its own terms and deserves a serious referee to check the experimental setup and ask for ablations. I would send it to review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces TrajVAD, a framework for video anomaly detection that models multi-class bounding-box trajectories with normalizing flows to capture normal kinematic patterns. Its trajectory-only variant (TrajVAD-T) eliminates pose estimation and reports outperforming all compared pose-based methods on ShanghaiTech (87.7% AP) while achieving the best results on MSAD; the pose-augmented variant (TrajVAD-P) further improves to 88.6% AUROC and 90.9% AP on ShanghaiTech.

Significance. If the empirical results hold under detailed scrutiny, the work establishes bounding-box trajectories as a sufficient and high-performing modality for VAD, reducing dependence on pose estimation while maintaining competitive or superior accuracy on standard benchmarks. The normalizing-flow density estimation on trajectories is a technically appropriate choice for modeling normal patterns, and the provision of both T and P variants enables direct assessment of the trajectories' contribution.

major comments (2)

[§4] §4 (Experiments) and associated tables: the claim that TrajVAD-T surpasses all pose-based methods with 87.7% AP on ShanghaiTech requires explicit listing of the AP scores for every cited baseline (including re-implementation details) so that the ranking can be independently verified; without these numbers the superiority statement cannot be assessed.
[§3.2] §3.2 (multi-class modeling): the assumption that bounding-box trajectories contain sufficient kinematic information is load-bearing for the central claim, yet the paper provides no ablation on class granularity or on trajectory-only versus appearance-augmented inputs; this leaves open whether performance gains are truly attributable to the trajectory modality.

minor comments (1)

[Abstract] Abstract: performance numbers are stated without accompanying dataset statistics (e.g., number of normal/anomalous frames or trajectory counts), which should be added for context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity and experimental rigor that we will address in the revision. We respond to each major comment below.

read point-by-point responses

Referee: [§4] §4 (Experiments) and associated tables: the claim that TrajVAD-T surpasses all pose-based methods with 87.7% AP on ShanghaiTech requires explicit listing of the AP scores for every cited baseline (including re-implementation details) so that the ranking can be independently verified; without these numbers the superiority statement cannot be assessed.

Authors: We agree that explicit numerical comparison strengthens the claim. In the revised manuscript we will add a dedicated table in Section 4 that reports the AP scores of every cited pose-based baseline on ShanghaiTech, together with a short note on any re-implementation settings used. This will allow direct verification of the reported ranking and of the 87.7% AP achieved by TrajVAD-T. revision: yes
Referee: [§3.2] §3.2 (multi-class modeling): the assumption that bounding-box trajectories contain sufficient kinematic information is load-bearing for the central claim, yet the paper provides no ablation on class granularity or on trajectory-only versus appearance-augmented inputs; this leaves open whether performance gains are truly attributable to the trajectory modality.

Authors: We recognize the value of additional ablations. The manuscript already contrasts the trajectory-only variant (TrajVAD-T) with the pose-augmented variant (TrajVAD-P), which isolates the contribution of trajectories versus an additional kinematic cue. To further address class granularity we will include a new ablation that compares single-class versus multi-class normalizing-flow modeling on ShanghaiTech. Regarding appearance-augmented inputs, we will add a brief discussion clarifying that our design deliberately avoids appearance features to isolate kinematic information; if space permits we will also report a lightweight comparison against a simple appearance baseline. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical method that applies standard normalizing flows to multi-class bounding-box trajectories for video anomaly detection, reporting direct benchmark results on ShanghaiTech and MSAD without any derivation steps, equations, or claims that reduce to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The central performance claims (e.g., 87.7% AP for TrajVAD-T) are presented as outcomes of the proposed pipeline rather than tautological restatements of inputs, and the approach remains self-contained against external datasets and comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the approach relies on standard normalizing flows from prior literature without detailing any ad-hoc choices or new postulates.

pith-pipeline@v0.9.0 · 5727 in / 1185 out tokens · 51646 ms · 2026-05-22T07:02:53.418842+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We present TrajVAD, a framework that models multi-class bounding-box trajectories using normalizing flows to learn normal kinematic patterns.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 2 internal anchors

[1]

In: CVPR

Acsintoae, A., Florescu, A., Georgescu, M.I., Mare, T., Sumedrea, P., Ionescu, R.T., Khan, F.S., Shah, M.: UBnormal: New benchmark for supervised open-set video anomaly detection. In: CVPR. pp. 20143–20153 (2022)

work page 2022
[2]

Computer Vision and Image Understanding229, 103656 (2023)

Barbalau, A., Ionescu, R.T., Georgescu, M.I., Dueholm, J., Ramachandra, B., Nas- rollahi, K., Khan, F.S., Moeslund, T.B., Shah, M.: SSMTL++: Revisiting self- supervised multi-task learning for video anomaly detection. Computer Vision and Image Understanding229, 103656 (2023)

work page 2023
[3]

In: CVPR

Dawoud, K., Zaheer, Z., Khan, M., Nandakumar, K., Elsaddik, A., Khan, M.H.: FusedVision: A knowledge-infusing approach for practical anomaly detection in real-world surveillance videos. In: CVPR. pp. 4036–4046 (2025)

work page 2025
[4]

In: ICCV

Delić, A., Grcic, M., Šegvić, S.: Sequential keypoint density estimator: an over- looked baseline of skeleton-based video anomaly detection. In: ICCV. pp. 11579– 11589 (2025)

work page 2025
[5]

Density estimation using Real NVP

Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. arXiv preprint arXiv:1605.08803 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[6]

In: WACV

Doshi, K., Yilmaz, Y.: Towards interpretable video anomaly detection. In: WACV. pp. 2655–2664 (2023)

work page 2023
[7]

IEEE TPAMI45(6), 7157–7173 (2022)

Fang, H.S., Li, J., Tang, H., Xu, C., Zhu, H., Xiu, Y., Li, Y.L., Lu, C.: AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE TPAMI45(6), 7157–7173 (2022)

work page 2022
[8]

In: ICCV

Flaborea, A., Collorone, L., Di Melendugno, G.M.D., D’Arrigo, S., Prenkaj, B., Galasso, F.: Multimodal motion conditioned diffusion model for skeleton-based video anomaly detection. In: ICCV. pp. 10318–10329 (2023)

work page 2023
[9]

PR156, 110817 (2024)

Flaborea, A., di Melendugno, G.M.D., D’Arrigo, S., Sterpa, M.A., Sampieri, A., Galasso, F.: Contracting skeletal kinematics for human-related video anomaly de- tection. PR156, 110817 (2024)

work page 2024
[10]

YOLOX: Exceeding YOLO Series in 2021

Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[11]

In: CVPR

Georgescu, M.I., Barbalau, A., Ionescu, R.T., Khan, F.S., Popescu, M., Shah, M.: Anomaly detection in video via self-supervised and multi-task learning. In: CVPR. pp. 12742–12752 (2021)

work page 2021
[12]

In: ICCV

Gong, D., Liu, L., Le, V., Saha, B., Mansour, M.R., Venkatesh, S., Hengel, A.v.d.: Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In: ICCV. pp. 1705–1714 (2019)

work page 2019
[13]

In: CVPR

Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning temporal regularity in video sequences. In: CVPR. pp. 733–742 (2016)

work page 2016
[14]

In: ICCV

Hinami, R., Mei, T., Satoh, S.: Joint detection and recounting of abnormal events by learning deep generic knowledge. In: ICCV. pp. 3619–3627 (2017)

work page 2017
[15]

In: ICCV

Hirschorn, O., Avidan, S.: Normalizing flows for human pose anomaly detection. In: ICCV. pp. 13545–13554 (2023)

work page 2023
[16]

In: ICPR

Jain, Y., Sharma, A.K., Velmurugan, R., Banerjee, B.: Posecvae: Anomalous hu- man activity detection. In: ICPR. pp. 2927–2934 (2021)

work page 2021
[17]

arXiv preprint arXiv:2207.02281 (2022)

Kanu-Asiegbu, A.M., Vasudevan, R., Du, X.: BiPOCO: Bi-directional trajectory prediction with pose constraints for pedestrian anomaly detection. arXiv preprint arXiv:2207.02281 (2022)

work page arXiv 2022
[18]

In: WACV

Karami, A., Ho, T.K.K., Armanfard, N.: Graph-jigsaw conditioned diffusion model for skeleton-based video anomaly detection. In: WACV. pp. 4237–4247 (2025) 16 I. Song and J. Lee

work page 2025
[19]

NeurIPS31(2018)

Kingma, D.P., Dhariwal, P.: Glow: Generative flow with invertible 1x1 convolu- tions. NeurIPS31(2018)

work page 2018
[20]

Neurocomputing490, 482–494 (2022)

Li, N., Chang, F., Liu, C.: Human-related anomalous event detection via spatial- temporalgraphconvolutionalautoencoderwithembeddedlongshort-termmemory network. Neurocomputing490, 482–494 (2022)

work page 2022
[21]

In: CVPR

Liu,W.,Luo,W.,Lian,D.,Gao,S.:Futureframepredictionforanomalydetection– a new baseline. In: CVPR. pp. 6536–6545 (2018)

work page 2018
[22]

In: ICCV

Liu, Z., Nie, Y., Long, C., Zhang, Q., Li, G.: A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame pre- diction. In: ICCV. pp. 13588–13597 (2021)

work page 2021
[23]

Neurocomputing444, 332–337 (2021)

Luo, W., Liu, W., Gao, S.: Normal graph: Spatial temporal graph convolutional networks based prediction network for skeleton based video anomaly detection. Neurocomputing444, 332–337 (2021)

work page 2021
[24]

In: CVPR

Markovitz, A., Sharir, G., Friedman, I., Zelnik-Manor, L., Avidan, S.: Graph em- bedded pose clustering for anomaly detection. In: CVPR. pp. 10539–10547 (2020)

work page 2020
[25]

In: CVPR

Micorek, J., Possegger, H., Narnhofer, D., Bischof, H., Kozinski, M.: MULDE: Multiscale log-density estimation via denoising score matching for video anomaly detection. In: CVPR. pp. 18868–18877 (2024)

work page 2024
[26]

In: CVPR

Morais, R., Le, V., Tran, T., Saha, B., Mansour, M., Venkatesh, S.: Learning regularity in skeleton trajectories for anomaly detection in videos. In: CVPR. pp. 11996–12004 (2019)

work page 2019
[27]

In: WACV

Noghre, G.A., Pazho, A.D., Tabkhi, H.: An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory predic- tion. In: WACV. pp. 995–1004 (2024)

work page 2024
[28]

IEEE TCSVT18(11), 1544–1554 (2008)

Piciarelli, C., Micheloni, C., Foresti, G.L.: Trajectory-based anomalous event de- tection. IEEE TCSVT18(11), 1544–1554 (2008)

work page 2008
[29]

In: WACV

Rodrigues, R., Bhargava, N., Velmurugan, R., Chaudhuri, S.: Multi-timescale tra- jectory prediction for abnormal human activity detection. In: WACV. pp. 2626– 2634 (2020)

work page 2020
[30]

In: CVPR

Singh, A., Jones, M.J., Learned-Miller, E.G.: EVAL: Explainable video anomaly localization. In: CVPR. pp. 18717–18726 (2023)

work page 2023
[31]

In: CVPR

Singh, A., Jones, M.J., Learned-Miller, E.G.: Tracklet-based explainable video anomaly localization. In: CVPR. pp. 3992–4001 (2024)

work page 2024
[32]

In: ICIP (2025)

Song, I., Lee, J.: Real-time traffic accident anticipation with feature reuse. In: ICIP (2025)

work page 2025
[33]

In: WACV

Song, I., Lee, S., Joo, M., Lee, J.: Anomaly detection for people with visual impair- ments using an egocentric 360-degree camera. In: WACV. pp. 2828–2837 (2025)

work page 2025
[34]

In: WACV

Stergiou, A., De Weerdt, B., Deligiannis, N.: Holistic representation learning for multitask trajectory anomaly detection. In: WACV. pp. 6729–6739 (2024)

work page 2024
[35]

In: ECCV

Wang, G., Wang, Y., Qin, J., Zhang, D., Bao, X., Huang, D.: Video anomaly detection by solving decoupled spatio-temporal jigsaw puzzles. In: ECCV. pp. 494– 511 (2022)

work page 2022
[36]

Cluster Computing25(4), 2715–2737 (2022)

Wu, C., Shao, S., Tunc, C., Satam, P., Hariri, S.: An explainable and efficient deep learning framework for video anomaly detection. Cluster Computing25(4), 2715–2737 (2022)

work page 2022
[37]

IEEE TMM (2025)

Wu, R., Chen, Y., Xiao, J., Li, B., Fan, J., Dufaux, F., Zhu, C., Liu, Y.: DA-flow: Dual attention normalizing flow for skeleton-based video anomaly detection. IEEE TMM (2025)

work page 2025
[38]

In: ICCV

Yan, C., Zhang, S., Liu, Y., Pang, G., Wang, W.: Feature prediction diffusion model for video anomaly detection. In: ICCV. pp. 5527–5537 (2023) Bounding-Box Trajectories Matter for Video Anomaly Detection 17

work page 2023
[39]

In: CVPR

Zaheer, M.Z., Mahmood, A., Khan, M.H., Segu, M., Yu, F., Lee, S.I.: Generative cooperative learning for unsupervised video anomaly detection. In: CVPR. pp. 14744–14754 (2022)

work page 2022
[40]

Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., Wang, X.:ByteTrack:Multi-objecttrackingbyassociatingeverydetectionbox.In:ECCV. pp. 1–21 (2022)

work page 2022
[41]

In: ICCV

Zhou,K.,Yang,Y.,Cavallaro,A.,Xiang,T.:Omni-scalefeaturelearningforperson re-identification. In: ICCV. pp. 3702–3712 (2019)

work page 2019
[42]

NeurIPS37, 89943–89977 (2024)

Zhu, L., Wang, L., Raj, A., Gedeon, T., Chen, C.: Advancing video anomaly de- tection: A concise review and a new dataset. NeurIPS37, 89943–89977 (2024)

work page 2024

[1] [1]

In: CVPR

Acsintoae, A., Florescu, A., Georgescu, M.I., Mare, T., Sumedrea, P., Ionescu, R.T., Khan, F.S., Shah, M.: UBnormal: New benchmark for supervised open-set video anomaly detection. In: CVPR. pp. 20143–20153 (2022)

work page 2022

[2] [2]

Computer Vision and Image Understanding229, 103656 (2023)

Barbalau, A., Ionescu, R.T., Georgescu, M.I., Dueholm, J., Ramachandra, B., Nas- rollahi, K., Khan, F.S., Moeslund, T.B., Shah, M.: SSMTL++: Revisiting self- supervised multi-task learning for video anomaly detection. Computer Vision and Image Understanding229, 103656 (2023)

work page 2023

[3] [3]

In: CVPR

Dawoud, K., Zaheer, Z., Khan, M., Nandakumar, K., Elsaddik, A., Khan, M.H.: FusedVision: A knowledge-infusing approach for practical anomaly detection in real-world surveillance videos. In: CVPR. pp. 4036–4046 (2025)

work page 2025

[4] [4]

In: ICCV

Delić, A., Grcic, M., Šegvić, S.: Sequential keypoint density estimator: an over- looked baseline of skeleton-based video anomaly detection. In: ICCV. pp. 11579– 11589 (2025)

work page 2025

[5] [5]

Density estimation using Real NVP

Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. arXiv preprint arXiv:1605.08803 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[6] [6]

In: WACV

Doshi, K., Yilmaz, Y.: Towards interpretable video anomaly detection. In: WACV. pp. 2655–2664 (2023)

work page 2023

[7] [7]

IEEE TPAMI45(6), 7157–7173 (2022)

Fang, H.S., Li, J., Tang, H., Xu, C., Zhu, H., Xiu, Y., Li, Y.L., Lu, C.: AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE TPAMI45(6), 7157–7173 (2022)

work page 2022

[8] [8]

In: ICCV

Flaborea, A., Collorone, L., Di Melendugno, G.M.D., D’Arrigo, S., Prenkaj, B., Galasso, F.: Multimodal motion conditioned diffusion model for skeleton-based video anomaly detection. In: ICCV. pp. 10318–10329 (2023)

work page 2023

[9] [9]

PR156, 110817 (2024)

Flaborea, A., di Melendugno, G.M.D., D’Arrigo, S., Sterpa, M.A., Sampieri, A., Galasso, F.: Contracting skeletal kinematics for human-related video anomaly de- tection. PR156, 110817 (2024)

work page 2024

[10] [10]

YOLOX: Exceeding YOLO Series in 2021

Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[11] [11]

In: CVPR

Georgescu, M.I., Barbalau, A., Ionescu, R.T., Khan, F.S., Popescu, M., Shah, M.: Anomaly detection in video via self-supervised and multi-task learning. In: CVPR. pp. 12742–12752 (2021)

work page 2021

[12] [12]

In: ICCV

Gong, D., Liu, L., Le, V., Saha, B., Mansour, M.R., Venkatesh, S., Hengel, A.v.d.: Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. In: ICCV. pp. 1705–1714 (2019)

work page 2019

[13] [13]

In: CVPR

Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning temporal regularity in video sequences. In: CVPR. pp. 733–742 (2016)

work page 2016

[14] [14]

In: ICCV

Hinami, R., Mei, T., Satoh, S.: Joint detection and recounting of abnormal events by learning deep generic knowledge. In: ICCV. pp. 3619–3627 (2017)

work page 2017

[15] [15]

In: ICCV

Hirschorn, O., Avidan, S.: Normalizing flows for human pose anomaly detection. In: ICCV. pp. 13545–13554 (2023)

work page 2023

[16] [16]

In: ICPR

Jain, Y., Sharma, A.K., Velmurugan, R., Banerjee, B.: Posecvae: Anomalous hu- man activity detection. In: ICPR. pp. 2927–2934 (2021)

work page 2021

[17] [17]

arXiv preprint arXiv:2207.02281 (2022)

Kanu-Asiegbu, A.M., Vasudevan, R., Du, X.: BiPOCO: Bi-directional trajectory prediction with pose constraints for pedestrian anomaly detection. arXiv preprint arXiv:2207.02281 (2022)

work page arXiv 2022

[18] [18]

In: WACV

Karami, A., Ho, T.K.K., Armanfard, N.: Graph-jigsaw conditioned diffusion model for skeleton-based video anomaly detection. In: WACV. pp. 4237–4247 (2025) 16 I. Song and J. Lee

work page 2025

[19] [19]

NeurIPS31(2018)

Kingma, D.P., Dhariwal, P.: Glow: Generative flow with invertible 1x1 convolu- tions. NeurIPS31(2018)

work page 2018

[20] [20]

Neurocomputing490, 482–494 (2022)

Li, N., Chang, F., Liu, C.: Human-related anomalous event detection via spatial- temporalgraphconvolutionalautoencoderwithembeddedlongshort-termmemory network. Neurocomputing490, 482–494 (2022)

work page 2022

[21] [21]

In: CVPR

Liu,W.,Luo,W.,Lian,D.,Gao,S.:Futureframepredictionforanomalydetection– a new baseline. In: CVPR. pp. 6536–6545 (2018)

work page 2018

[22] [22]

In: ICCV

Liu, Z., Nie, Y., Long, C., Zhang, Q., Li, G.: A hybrid video anomaly detection framework via memory-augmented flow reconstruction and flow-guided frame pre- diction. In: ICCV. pp. 13588–13597 (2021)

work page 2021

[23] [23]

Neurocomputing444, 332–337 (2021)

Luo, W., Liu, W., Gao, S.: Normal graph: Spatial temporal graph convolutional networks based prediction network for skeleton based video anomaly detection. Neurocomputing444, 332–337 (2021)

work page 2021

[24] [24]

In: CVPR

Markovitz, A., Sharir, G., Friedman, I., Zelnik-Manor, L., Avidan, S.: Graph em- bedded pose clustering for anomaly detection. In: CVPR. pp. 10539–10547 (2020)

work page 2020

[25] [25]

In: CVPR

Micorek, J., Possegger, H., Narnhofer, D., Bischof, H., Kozinski, M.: MULDE: Multiscale log-density estimation via denoising score matching for video anomaly detection. In: CVPR. pp. 18868–18877 (2024)

work page 2024

[26] [26]

In: CVPR

Morais, R., Le, V., Tran, T., Saha, B., Mansour, M., Venkatesh, S.: Learning regularity in skeleton trajectories for anomaly detection in videos. In: CVPR. pp. 11996–12004 (2019)

work page 2019

[27] [27]

In: WACV

Noghre, G.A., Pazho, A.D., Tabkhi, H.: An exploratory study on human-centric video anomaly detection through variational autoencoders and trajectory predic- tion. In: WACV. pp. 995–1004 (2024)

work page 2024

[28] [28]

IEEE TCSVT18(11), 1544–1554 (2008)

Piciarelli, C., Micheloni, C., Foresti, G.L.: Trajectory-based anomalous event de- tection. IEEE TCSVT18(11), 1544–1554 (2008)

work page 2008

[29] [29]

In: WACV

Rodrigues, R., Bhargava, N., Velmurugan, R., Chaudhuri, S.: Multi-timescale tra- jectory prediction for abnormal human activity detection. In: WACV. pp. 2626– 2634 (2020)

work page 2020

[30] [30]

In: CVPR

Singh, A., Jones, M.J., Learned-Miller, E.G.: EVAL: Explainable video anomaly localization. In: CVPR. pp. 18717–18726 (2023)

work page 2023

[31] [31]

In: CVPR

Singh, A., Jones, M.J., Learned-Miller, E.G.: Tracklet-based explainable video anomaly localization. In: CVPR. pp. 3992–4001 (2024)

work page 2024

[32] [32]

In: ICIP (2025)

Song, I., Lee, J.: Real-time traffic accident anticipation with feature reuse. In: ICIP (2025)

work page 2025

[33] [33]

In: WACV

Song, I., Lee, S., Joo, M., Lee, J.: Anomaly detection for people with visual impair- ments using an egocentric 360-degree camera. In: WACV. pp. 2828–2837 (2025)

work page 2025

[34] [34]

In: WACV

Stergiou, A., De Weerdt, B., Deligiannis, N.: Holistic representation learning for multitask trajectory anomaly detection. In: WACV. pp. 6729–6739 (2024)

work page 2024

[35] [35]

In: ECCV

Wang, G., Wang, Y., Qin, J., Zhang, D., Bao, X., Huang, D.: Video anomaly detection by solving decoupled spatio-temporal jigsaw puzzles. In: ECCV. pp. 494– 511 (2022)

work page 2022

[36] [36]

Cluster Computing25(4), 2715–2737 (2022)

Wu, C., Shao, S., Tunc, C., Satam, P., Hariri, S.: An explainable and efficient deep learning framework for video anomaly detection. Cluster Computing25(4), 2715–2737 (2022)

work page 2022

[37] [37]

IEEE TMM (2025)

Wu, R., Chen, Y., Xiao, J., Li, B., Fan, J., Dufaux, F., Zhu, C., Liu, Y.: DA-flow: Dual attention normalizing flow for skeleton-based video anomaly detection. IEEE TMM (2025)

work page 2025

[38] [38]

In: ICCV

Yan, C., Zhang, S., Liu, Y., Pang, G., Wang, W.: Feature prediction diffusion model for video anomaly detection. In: ICCV. pp. 5527–5537 (2023) Bounding-Box Trajectories Matter for Video Anomaly Detection 17

work page 2023

[39] [39]

In: CVPR

Zaheer, M.Z., Mahmood, A., Khan, M.H., Segu, M., Yu, F., Lee, S.I.: Generative cooperative learning for unsupervised video anomaly detection. In: CVPR. pp. 14744–14754 (2022)

work page 2022

[40] [40]

Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., Wang, X.:ByteTrack:Multi-objecttrackingbyassociatingeverydetectionbox.In:ECCV. pp. 1–21 (2022)

work page 2022

[41] [41]

In: ICCV

Zhou,K.,Yang,Y.,Cavallaro,A.,Xiang,T.:Omni-scalefeaturelearningforperson re-identification. In: ICCV. pp. 3702–3712 (2019)

work page 2019

[42] [42]

NeurIPS37, 89943–89977 (2024)

Zhu, L., Wang, L., Raj, A., Gedeon, T., Chen, C.: Advancing video anomaly de- tection: A concise review and a new dataset. NeurIPS37, 89943–89977 (2024)

work page 2024