pith. sign in

arxiv: 2411.15729 · v2 · pith:PKSQVQ2Unew · submitted 2024-11-24 · 💻 cs.CV

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

classification 💻 cs.CV
keywords occlusionactioncausaloccludenetmodeloccludedrecognitionscenes
0
0 comments X
read the original abstract

The lack of occlusion data in common action recognition video datasets limits model robustness and hinders consistent performance gains. We build OccludeNet, a large-scale occluded video dataset including both real and synthetic occlusion scenes in different natural settings. OccludeNet includes dynamic occlusion, static occlusion, and multi-view interactive occlusion, addressing gaps in current datasets. Our analysis shows occlusion affects action classes differently: actions with low scene relevance and partial body visibility see larger drops in accuracy. To overcome the limits of existing occlusion-aware methods, we propose a structural causal model for occluded scenes and introduce the Causal Action Recognition (CAR) method, which uses backdoor adjustment and counterfactual reasoning. This approach strengthens key actor information and improves model robustness to occlusion. We hope the challenges of OccludeNet will encourage more study of causal links in occluded scenes and lead to a fresh look at class relations, ultimately leading to lasting performance improvements. Our code and data is availibale at: https://github.com/The-Martyr/OccludeNet-Dataset

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.