NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.CV 2
years: 2026 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Zero-shot MLLMs on ShanghaiTech and CHAD exhibit strong conservative bias with high precision but collapsed recall; class-specific prompts raise peak F1 from 0.09 to 0.64 yet recall remains the bottleneck.
citing papers explorer
- iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning
  iPay fuses RGB and skeleton expert streams via dual-attention and a prior-driven Spatial Difference Discriminator to reach 83.45% accuracy on 500+ real-world payment clips from onboard transit cameras.
- Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild
  Zero-shot MLLMs on ShanghaiTech and CHAD exhibit strong conservative bias with high precision but collapsed recall; class-specific prompts raise peak F1 from 0.09 to 0.64 yet recall remains the bottleneck.