Pistachio: Towards Synthetic, Balanced, and Long-Form Video Anomaly Benchmarks
Pith reviewed 2026-05-17 06:25 UTC · model grok-4.3
The pith
A controlled video generation pipeline produces balanced, diverse long-form anomaly benchmarks without internet data biases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pistachio is a new VAD and VAU benchmark built entirely through a controlled generation-based pipeline. The pipeline combines scene-conditioned anomaly assignment, multi-step storyline generation, and temporally consistent long-form synthesis to create coherent 41-second videos. This gives precise control over scenes, anomaly types, and temporal narratives, removing the biases and limitations of Internet-collected datasets while demonstrating scale, diversity, and complexity that expose new challenges for existing methods.
What carries the argument
The controlled generation-based pipeline integrating scene-conditioned anomaly assignment, multi-step storyline generation, and temporally consistent long-form synthesis to produce 41-second videos.
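As a rough illustration of what "scene-conditioned" and "balanced" could mean together, the sketch below picks, for each scene image, the least-used anomaly type among those allowed for that scene. This is not the authors' procedure: the function name `assign_anomalies`, the scene names, and the anomaly labels are all hypothetical placeholders, and a simple greedy rule stands in for whatever model-driven assignment the paper uses.

```python
from collections import Counter

def assign_anomalies(scene_images, allowed):
    """Greedy balanced assignment (illustrative only).

    scene_images: list of (image_id, scene) pairs.
    allowed: dict mapping scene -> list of anomaly types valid in that scene.
    Returns a dict image_id -> chosen anomaly type.
    """
    counts = Counter()  # how often each anomaly type has been used so far
    assignment = {}
    for image_id, scene in scene_images:
        candidates = allowed[scene]
        # Prefer the least-used type; break ties by the order in `candidates`.
        choice = min(candidates, key=lambda a: (counts[a], candidates.index(a)))
        counts[choice] += 1
        assignment[image_id] = choice
    return assignment

# Hypothetical inputs, not taken from the paper.
images = [("a", "road"), ("b", "road"), ("c", "mall"), ("d", "mall")]
allowed = {"road": ["collision", "jaywalking"], "mall": ["theft", "collision"]}
print(assign_anomalies(images, allowed))
```

Conditioning on the scene restricts the candidate set, while the running counter keeps any single anomaly type from dominating the benchmark.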
If this is right
- Existing VAD and VAU methods encounter new performance challenges on long-form videos with balanced and diverse anomaly coverage.
- Benchmark creation for semantic and causal anomaly reasoning becomes feasible with far less manual annotation effort.
- Research can shift toward models that handle dynamic multi-event sequences and temporal causality in anomalies.
- Precise control over anomaly types and narratives enables targeted testing of method robustness on specific patterns.
Where Pith is reading between the lines
- The same pipeline could generate on-demand benchmarks for other video tasks such as action prediction or event localization by swapping the anomaly assignment step.
- As generation quality rises, fully synthetic datasets might reduce reliance on real-world collection for many video understanding benchmarks.
- Performance differences between Pistachio and real datasets could highlight specific weaknesses in current models for long temporal context.
- Domain-specific versions could be produced quickly by conditioning the scene and anomaly choices on particular environments like traffic or indoor surveillance.
Load-bearing premise
Videos produced by current generation models are realistic enough and match real-world anomaly distributions and timing to act as a reliable stand-in for evaluating detection and understanding methods.
What would settle it
If methods that rank highest on Pistachio videos produce markedly different rankings or much lower accuracy when tested on established real-world anomaly video datasets, the synthetic benchmark would fail to serve as a valid proxy.
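One way to make this test concrete is a rank correlation between per-method scores on Pistachio and on a real dataset. The sketch below computes Spearman's rho with the standard library only; the scores are made-up placeholders, not results from the paper, and the helper names `ranks` and `spearman` are illustrative.

```python
def ranks(scores):
    """1-based ranks with 1 = highest score; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant (all ranks tied).
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

# Placeholder scores for four hypothetical methods (same method order in both lists).
synthetic_scores = [0.71, 0.64, 0.80, 0.55]
real_scores = [0.85, 0.82, 0.90, 0.70]
print(spearman(synthetic_scores, real_scores))
```

A rho near 1 would indicate that the synthetic benchmark orders methods the same way the real dataset does; a low or negative value would point toward the failure case described above.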
Original abstract
Automatically detecting abnormal events in videos is crucial for modern autonomous systems, yet existing Video Anomaly Detection (VAD) benchmarks lack the scene diversity, balanced anomaly coverage, and temporal complexity needed to reliably assess real-world performance. Meanwhile, the community is increasingly moving toward Video Anomaly Understanding (VAU), which requires deeper semantic and causal reasoning but remains difficult to benchmark due to the heavy manual annotation effort it demands. In this paper, we introduce Pistachio, a new VAD/VAU benchmark constructed entirely through a controlled, generation-based pipeline. By leveraging recent advances in video generation models, Pistachio provides precise control over scenes, anomaly types, and temporal narratives, effectively eliminating the biases and limitations of Internet-collected datasets. Our pipeline integrates scene-conditioned anomaly assignment, multi-step storyline generation, and a temporally consistent long-form synthesis strategy that produces coherent 41-second videos with minimal human intervention. Extensive experiments demonstrate the scale, diversity, and complexity of Pistachio, revealing new challenges for existing methods and motivating future research on dynamic and multi-event anomaly understanding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Pistachio, a synthetic VAD/VAU benchmark constructed via a controlled generation pipeline. The pipeline combines scene-conditioned anomaly assignment, multi-step storyline generation, and temporally consistent long-form synthesis to produce 41-second videos with explicit control over scenes, anomaly types, and temporal narratives. The central claim is that this approach eliminates the biases and limitations of Internet-collected datasets while providing the scale, diversity, and complexity needed to expose new challenges for existing VAD and VAU methods.
Significance. If the generated videos prove to be faithful proxies for real-world anomaly distributions and temporal dynamics, Pistachio would offer a scalable, precisely controllable benchmark that reduces annotation burden and enables systematic study of dynamic and multi-event scenarios, directly addressing gaps in current VAD/VAU evaluation.
major comments (2)
- [Experiments] Experiments section: The abstract asserts that 'extensive experiments demonstrate the scale, diversity, and complexity of Pistachio, revealing new challenges,' yet no quantitative results (e.g., VAD/VAU performance metrics, error analysis, or baseline comparisons on Pistachio versus ShanghaiTech/UCF-Crime) are referenced. This absence is load-bearing for the claim that the benchmark exposes new challenges.
- [Pipeline] Pipeline description (Section 3): The claim that the pipeline 'effectively eliminat[es] the biases and limitations of Internet-collected datasets' rests on the untested assumption that generated 41-second videos match real-world motion, physics, and anomaly temporal profiles. No video-level fidelity metrics (FID, FVD), human realism scores, or statistical tests (e.g., Kolmogorov-Smirnov on anomaly duration/frequency distributions) against real benchmarks are reported.
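For reference, the two-sample Kolmogorov-Smirnov statistic mentioned in the second comment is the largest gap between two empirical cumulative distribution functions. The sketch below is a minimal standard-library version that returns the statistic only (no p-value); the function name `ks_statistic` and the sample values are illustrative, not data from either benchmark.

```python
def ks_statistic(x, y):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over all observed values v."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both pointers past every observation equal to v,
        # so i/n and j/m are the empirical CDFs evaluated at v.
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return d

# Placeholder anomaly durations in seconds (illustrative values only).
synthetic_durations = [1, 2, 3, 4]
real_durations = [3, 4, 5, 6]
print(ks_statistic(synthetic_durations, real_durations))
```

In practice a library routine such as `scipy.stats.ks_2samp` would also provide a p-value, which is what a formal comparison of duration or frequency distributions would report.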
minor comments (2)
- [Abstract] Abstract: 'VAU' is expanded on first use, but subsequent references to 'dynamic and multi-event anomaly understanding' would benefit from a brief forward pointer to the specific VAU tasks evaluated.
- [Figures] Figure captions (throughout): Several figures showing generated video frames lack explicit labels for anomaly start/end times or scene conditioning parameters, reducing immediate readability.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
- Referee: [Experiments] Experiments section: The abstract asserts that 'extensive experiments demonstrate the scale, diversity, and complexity of Pistachio, revealing new challenges,' yet no quantitative results (e.g., VAD/VAU performance metrics, error analysis, or baseline comparisons on Pistachio versus ShanghaiTech/UCF-Crime) are referenced. This absence is load-bearing for the claim that the benchmark exposes new challenges.
  Authors: We agree that the current manuscript lacks the quantitative baseline evaluations needed to fully support the claim that Pistachio reveals new challenges. The experiments section in the submitted version focuses on dataset statistics, diversity measures, and qualitative examples of generated videos. In the revision we will add a new subsection reporting VAD and VAU baseline results on Pistachio, including standard metrics such as AUC-ROC, comparisons against ShanghaiTech and UCF-Crime, and an error analysis highlighting failure modes that are more prevalent in our long-form, multi-event setting.
  Revision: yes
- Referee: [Pipeline] Pipeline description (Section 3): The claim that the pipeline 'effectively eliminat[es] the biases and limitations of Internet-collected datasets' rests on the untested assumption that generated 41-second videos match real-world motion, physics, and anomaly temporal profiles. No video-level fidelity metrics (FID, FVD), human realism scores, or statistical tests (e.g., Kolmogorov-Smirnov on anomaly duration/frequency distributions) against real benchmarks are reported.
  Authors: The referee is correct that the manuscript does not yet provide direct quantitative evidence that the generated videos match real-world distributions in motion, physics, or anomaly timing. While the pipeline was designed to reduce collection biases through explicit control, we will add video fidelity measurements (FVD scores), human realism ratings from a user study, and statistical comparisons (including Kolmogorov-Smirnov tests on anomaly duration and frequency) against real benchmarks such as UCF-Crime to substantiate the claim.
  Revision: yes
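The AUC-ROC metric named in the first response can be sketched as the probability that a randomly chosen anomalous frame receives a higher score than a randomly chosen normal frame. The version below is a minimal pairwise implementation using only the standard library; the function name `auc_roc` and the toy labels and scores are illustrative, not baseline results.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0 (normal) or 1 (anomalous), one per frame.
    scores: anomaly scores, higher meaning more anomalous.
    Ties between a positive and a negative count as half a win.
    Runs in O(P * N) time, which is fine for a sketch but slow for long videos.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy frame-level labels and scores (illustrative only).
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For real evaluations, a sort-based routine such as scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.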
Circularity Check
No significant circularity: dataset construction without derivations or fitted predictions
This is a dataset construction paper that describes a procedural pipeline for generating synthetic long-form videos using existing video generation models, scene-conditioned assignment, and storyline generation. No equations, predictions, or first-principles results are presented that could reduce to inputs by construction. The central claims concern control over content and elimination of internet-data biases, but these rest on the external properties of the generators rather than any internal self-definition, fitted-parameter renaming, or load-bearing self-citation chain. The work is self-contained as a benchmark proposal whose utility is to be assessed by downstream users against real-world data, consistent with the reader's assessment of no derivation or fitted quantities.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Video generation models can be conditioned to produce temporally consistent 41-second videos with specified anomaly narratives.
Supplementary Material
Details of the System Prompts. The comprehensive system prompts (Tab. 5) are pivotal for precisely guiding the Large Language Model (LLM) through the multi-stage annotation workflow of the Pistachio dataset. These stage-specific prompts are designed to ensure structured and consistent output generation across four critical phases, aligning with the metho...
Dataset Characterization and Curation Details. To further substantiate the quality, diversity, and robust generation methodology of the Pistachio dataset, we provide additional visualizations and analysis related to our curation process. Our complete generation pipeline, formalized in Algorithm 1, demonstrates a systematic three-stage approach: Sc...
- Public Roads & Transportation Areas
- Enclosed & Indoor Premises
- Commercial & Entertainment Gathering Points
- Industrial & Construction Zones
- Outdoor & Natural Environments
- Critical Infrastructure
Anomaly Type Determination (6 prompts, one per scene). Example for Commercial & Entertainment Gathering Points: You are an expert in multi-image analysis and event inference for "Commercial & Entertainment Gathering Points." You will receive a set of image files. Your task is to assign the most likely and specific anomalous events...
... leverages a dual-branch structure that makes full use of the frozen CLIP model's fine-grained vision-language alignment. The visual features are enhanced by alignment with rich semantic language representations. This cross-modal knowledge acts as a strong semantic prior, providing a generalized understanding of "anomaly" that transcends dataset-specific ...