Segmental Spatiotemporal CNNs for Fine-grained Action Segmentation
Joint segmentation and classification of fine-grained actions is important for applications of human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large-scale action classification, the performance of state-of-the-art fine-grained action recognition approaches remains low. We propose a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier. Our spatiotemporal CNN consists of a spatial component that uses convolutional filters to capture information about objects and their relationships, and a temporal component that uses large 1D convolutional filters to capture information about how object relationships change across time. These features are used in tandem with a semi-Markov model that models transitions from one action to another. We introduce an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach. We highlight the effectiveness of our Segmental Spatiotemporal CNN on cooking and surgical action datasets for which we observe substantially improved performance relative to recent baseline methods.
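The abstract does not spell out the constrained segmental inference algorithm. As a rough illustration only, the sketch below shows one common way to decode a segment-level labeling under an upper bound on the number of segments. It assumes per-frame class scores (for example, log-probabilities from the spatiotemporal CNN), a pairwise transition score between different consecutive actions, and a segment score that is simply the sum of its frame scores. The function name `constrained_segmental_inference` and its parameters are hypothetical, and this is not claimed to be the authors' algorithm.

```python
import math
from typing import List, Tuple


def constrained_segmental_inference(
    scores: List[List[float]],
    transition: List[List[float]],
    max_segments: int,
) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Hypothetical sketch: best labeling using at most `max_segments` segments.

    scores[t][c]: assumed per-frame score of class c at frame t.
    transition[a][b]: assumed score for a segment of class a followed by class b (a != b).
    Returns (total score, segments as (label, start, end_exclusive)).
    """
    if not scores or max_segments < 1:
        raise ValueError("need at least one frame and max_segments >= 1")
    T, C, K = len(scores), len(scores[0]), max_segments
    neg = -math.inf
    # best[t][k][c]: best score for frames 0..t using exactly k+1 segments, frame t labeled c.
    best = [[[neg] * C for _ in range(K)] for _ in range(T)]
    # back[t][k][c]: label of frame t-1 on that best path.
    back = [[[None] * C for _ in range(K)] for _ in range(T)]
    for c in range(C):
        best[0][0][c] = scores[0][c]
    for t in range(1, T):
        for k in range(K):
            for c in range(C):
                # Option 1: frame t extends the current segment of class c.
                cand, arg = best[t - 1][k][c], c
                # Option 2: frame t opens a new segment with a different class.
                if k > 0:
                    for p in range(C):
                        if p != c:
                            v = best[t - 1][k - 1][p] + transition[p][c]
                            if v > cand:
                                cand, arg = v, p
                if cand > neg:
                    best[t][k][c] = cand + scores[t][c]
                    back[t][k][c] = arg
    # Choose the best final state over every allowed segment count.
    total, k, c = max((best[T - 1][k][c], k, c) for k in range(K) for c in range(C))
    # Backtrack: a label change means we stepped back into the previous segment.
    labels = [0] * T
    for t in range(T - 1, -1, -1):
        labels[t] = c
        if t > 0:
            prev = back[t][k][c]
            if prev != c:
                k -= 1
            c = prev
    # Collapse frame labels into (label, start, end_exclusive) segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return total, segments


if __name__ == "__main__":
    frame_scores = [[2, 0], [2, 0], [2, 0], [0, 2], [0, 2], [0, 2]]
    no_penalty = [[0, 0], [0, 0]]
    print(constrained_segmental_inference(frame_scores, no_penalty, 2))
    # → (12, [(0, 0, 3), (1, 3, 6)])
```

Because each segment's score is assumed to be a sum of frame scores, the recursion only decides, frame by frame, whether to continue the current segment or open a new one. That keeps the cost at O(T·K·C²) for T frames, K segments, and C classes, instead of also enumerating every possible segment length as a naive semi-Markov decoder would.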
Forward citations
Cited by 2 Pith papers
- Novel evaluation of surgical activity recognition models using task-based efficiency metrics
  RP-Net-V2 recognizes 12 steps in robotic prostatectomies with Jaccard index 0.85; efficiency metrics computed from its outputs correlate with those from expert labels, supporting metrics-based model evaluation.
- Stabilizing Temporal Inference Dynamics for Online Surgical Phase Recognition
  A framework using Temporal Error-Cascade loss, Evidence-Gated Transition Predictor, and Temporal Fragmentation Index reduces temporal fragmentation in online surgical phase recognition on Cholec80 and AutoLaparo datasets.