Temporal Convolutional Networks: A Unified Approach to Action Segmentation
read the original abstract
The dominant paradigm for video-based action segmentation is composed of two steps: first, for each frame, compute low-level features using Dense Trajectories or a Convolutional Neural Network that encode spatiotemporal information locally, and second, input these features into a classifier that captures high-level temporal relationships, such as a Recurrent Neural Network (RNN). While often effective, this decoupling requires specifying two separate models, each with their own complexities, and prevents capturing more nuanced long-range spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
MoCo-AIS: A Contrastive Learning Framework for Similarity Computation of Vessel Trajectories
MoCo-AIS is a MoCo-based contrastive learning framework that learns vessel trajectory embeddings and improves similarity computation over baselines on large-scale real-world AIS datasets while offering a benchmarking ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.