
arxiv: 2201.02849 · v1 · pith:KNKAHJFPnew · submitted 2022-01-08 · 💻 cs.CV · cs.AI

Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

classification 💻 cs.CV cs.AI
keywords frames joints correlation different spatio-temporal transformer tuples action

Capturing the dependencies between joints is critical in skeleton-based action recognition. Transformers show great potential for modeling the correlation between important joints. However, existing Transformer-based methods cannot capture the correlation between different joints across frames, which is very useful because different body parts (such as the arms and legs in "long jump") move together between adjacent frames. To address this problem, a novel spatio-temporal tuples Transformer (STTFormer) is proposed. The skeleton sequence is divided into several parts, and the consecutive frames contained in each part are encoded. A spatio-temporal tuples self-attention module is then proposed to capture the relationships between different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the ability to distinguish similar actions. Compared with state-of-the-art methods, our method achieves better performance on two large-scale datasets.
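
The tuple idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`split_into_tuples`, `tuple_self_attention`), the tuple length `n`, the single attention head, and the random projection matrices are all assumptions made for illustration. The sketch only shows how flattening the joints of `n` consecutive frames into one token axis lets a standard scaled dot-product attention relate different joints in different frames.

```python
import numpy as np


def split_into_tuples(x, n):
    """Split a skeleton sequence of shape (T, V, C) into T // n tuples.

    T = frames, V = joints, C = channels per joint. Each tuple holds n
    consecutive frames, flattened into n * V tokens, so the result has
    shape (T // n, n * V, C). The name and layout are hypothetical.
    """
    T, V, C = x.shape
    num_tuples = T // n
    x = x[: num_tuples * n]  # drop trailing frames that do not fill a tuple
    return x.reshape(num_tuples, n * V, C)


def tuple_self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention inside each tuple.

    tokens: (P, N, C) with N = n * V; w_q, w_k, w_v: (C, D).
    Returns the attended features (P, N, D) and the weights (P, N, N).
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((20, 25, 3))       # 20 frames, 25 joints, 3D coords
    tokens = split_into_tuples(x, n=4)         # 5 tuples of 4 * 25 = 100 tokens
    w_q, w_k, w_v = (rng.standard_normal((3, 8)) for _ in range(3))
    out, weights = tuple_self_attention(tokens, w_q, w_k, w_v)
    print(out.shape, weights.shape)
```

Because every token in a tuple attends to every other token, the attention matrix of each tuple contains entries linking, for example, a wrist joint in one frame to an ankle joint in the next. A purely per-frame spatial attention has no such entries.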



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Quality-Preserving Imperceptible Adversarial Attack on Skeleton-based Human Action Recognition

    cs.CV 2026-06 unverdicted novelty 6.0

    A distribution-based adversarial attack generates quality-preserving adversarial motions for skeleton action recognition without noise perturbations, outperforming prior methods in success rate and naturalness on two ...